Age | Commit message | Author
2021-12-11 | Abstract out definition of table dumping functions. | Arun Isaac
* dump.scm (define-dump): New macro. (dump-species, dump-strain, dump-mapping-method, dump-inbred-set, dump-phenotype, dump-publication, dump-publish-xref, dump-tissue, dump-investigators, dump-avg-method, dump-gene-chip, dump-info-files): Redefine using define-dump.
2021-12-11 | Use string similarity and check if foreign key is an integer. | Arun Isaac
* dump.scm (<column>): New type. (tables): Use <column> objects to represent columns. (trigrams, jaccard-index, jaccard-string-similarity): New functions. (dump-schema): Use string similarity and check if foreign key is an integer.
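The string-similarity check named in this commit can be illustrated roughly as follows. This is a minimal Python sketch, not the Scheme code in dump.scm: the function names mirror those in the commit message, but the details (lowercasing the input, no padding at string boundaries, returning 0.0 for two empty sets) are assumptions made for illustration.

```python
def trigrams(s):
    """Return the set of all 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard_index(a, b):
    """Size of the intersection of sets a and b divided by the size of their union.

    Returns 0.0 when both sets are empty (an assumption, to avoid dividing by zero).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_string_similarity(s1, s2):
    """Similarity of two strings as the Jaccard index of their trigram sets."""
    return jaccard_index(trigrams(s1.lower()), trigrams(s2.lower()))
```

A score near 1 would suggest that two column names probably refer to the same thing; the threshold at which dump-schema treats a column as a foreign key is not stated in the log.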
2021-12-11 | Remove rdflib python code. | Arun Isaac
* rdf.py: Delete file.
2021-12-11 | Visualize schema. | Arun Isaac
* .dir-locals.el (scheme-mode): Indent set-table-columns correctly.
* dump.scm: Import (srfi srfi-9 gnu). (%database-name): New variable. (<table>): New type. (tables, string-remove-suffix-ci, human-units, graph->dot, dump-schema): New functions. Invoke dump-schema.
* guix.scm: Import (gnu packages bioinformatics). Add ccwl, graphviz and guile-libyaml to the manifest.
2021-12-11 | Use select-query. | Arun Isaac
* dump.scm (get-tables-from-comments, dump-table-fields, dump-species, dump-strain, dump-mapping-method, dump-inbred-set, dump-phenotype, dump-publication, dump-publish-xref, dump-tissue, dump-investigators, dump-avg-method, dump-gene-chip, dump-info-files): Use select-query.
2021-12-11 | Implement S-expression like SQL select query. | Arun Isaac
* dump/sql.scm: Import (srfi srfi-1). Export select-query. (select-query): New macro.
2021-12-04 | Add emacs directory local variables. | Arun Isaac
* .dir-locals.el: New file.
2021-12-04 | Remove redundant camel->lower-camel function. | Arun Isaac
* dump.scm (camel->lower-camel): Delete function. (default-metadata-proc): Do not use camel->lower-camel.
2021-12-04 | Build subjects exclusively with string->identifier. | Arun Isaac
* dump.scm (dump-mapping-method, dump-publication, dump-info-files): Use string->identifier to build subjects.
2021-12-04 | Append an underscore to the identifier prefix. | Arun Isaac
This is slightly more readable.
* dump.scm (string->identifier): Append an underscore to the identifier prefix.
2021-12-04 | Fix indentation. | Arun Isaac
* dump.scm (get-tables-from-comments, dump-table-fields): Fix indentation.
2021-12-04 | Use the map-alist DSL. | Arun Isaac
* dump.scm: Import (dump utils). (string-blank?): New function. (scm->triples): Filter out triples with #f or blank string objects. (process-metadata-alist): Delete function. (default-metadata-proc): New function. (dump-species, dump-strain, mapping-method-name->id, dump-inbred-set, dump-phenotype, dump-publication, dump-publish-xref, dump-tissue, dump-investigators, dump-avg-method, dump-gene-chip, dump-info-files): Use map-alist.
2021-12-04 | Implement the map-alist DSL. | Arun Isaac
map-alist is a DSL to transform one association list into another. These transformations are frequently required when dumping tables, especially metadata tables.
* dump/utils.scm: New file.
2021-12-02 | Construct investigator ID using first and last names too. | Arun Isaac
* dump.scm (investigator-email->id): Rename to investigator-attributes->id. Use first and last names in addition to the email ID. (dump-investigators): Use investigator-attributes->id. Include records that have no email ID. (dump-info-files): Use investigator-attributes->id. Include records that have no email ID.
2021-12-02 | Use string-delete instead of string-replace-substring. | Arun Isaac
For the simple task of removing spaces, string-delete is sufficient. string-replace-substring is overkill.
* dump.scm (fix-email-id): Use string-delete instead of string-replace-substring.
2021-12-02 | Abstract out string->identifier. | Arun Isaac
Building a turtle identifier from a string after removing illegal characters and prefixing is an extremely common operation. Abstract it. Also, mandate identifier prefixes. It is better to play it safe.
* dump.scm (string->identifier): New function. (binomial-name->species-id, dump-strain, mapping-method-name->id, inbred-set-name->id, aphenotype-id->id, tissue-short-name->id, investigator-email->id, avg-method-name->id, gene-chip-name->id): Use string->identifier.
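The operation this commit describes (strip illegal characters, then prepend a mandatory prefix) might look something like the following. This is a hedged Python sketch rather than the actual Scheme definition in dump.scm: the name string_to_identifier, the argument order, and the choice of which characters count as illegal (anything other than ASCII letters, digits and underscore) are all assumptions.

```python
import re


def string_to_identifier(prefix, s):
    """Build an identifier from s by dropping disallowed characters and prefixing.

    The prefix is mandatory, as the commit message requires. Which characters
    are treated as illegal here is an assumption made for illustration.
    """
    if not prefix:
        raise ValueError("an identifier prefix is required")
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", s)
    return prefix + cleaned
```

Under these assumptions, `string_to_identifier("tissue", "Whole Brain (mRNA)")` would yield `tissueWholeBrainmRNA`. A later commit in this log appends an underscore to the prefix for readability.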
2021-12-02 | Document delete-substrings. | Arun Isaac
* dump.scm (delete-substrings): Add docstring.
2021-12-01 | Deal with AvgMethodId = 0. | Arun Isaac
* dump.scm (dump-info-files): Deal with AvgMethodId = 0.
2021-12-01 | Use InfoFileTitle instead of InfoPageTitle for dataset name. | Arun Isaac
Not all datasets have a non-NULL InfoPageTitle field.
* dump.scm (dump-info-files): Use InfoFileTitle instead of InfoPageTitle for dataset name.
2021-12-01 | Extract name of dataset group. | Arun Isaac
* dump.scm (dump-info-files): Extract name of dataset group.
2021-12-01 | Do not link inbred-set to mapping-method. | Arun Isaac
Not all inbred sets have a mapping method, and the mapping method of the inbred set has, so far, not been used anywhere.
* dump.scm (mapping-method-name->id, dump-mapping-method): Mark as unused. (dump-inbred-set): Do not link inbred-set to mapping-method.
2021-12-01 | Allow N/A avg method. | Arun Isaac
* dump.scm (dump-avg-method): Allow N/A in name. (dump-info-files): Allow N/A in avg-method-name. (avg-method-name->id): Replace / with _.
2021-12-01 | Remove gn:geoSeries when value starts with "No Geo Series...". | Arun Isaac
* dump.scm (dump-info-files): Remove gn:geoSeries when value starts with "No Geo Series...".
2021-12-01 | In the gn: prefix, use http instead of https. | Arun Isaac
* dump.scm: In the gn: prefix, use http instead of https.
2021-12-01 | Remove dependency on python-rdflib. | Arun Isaac
* guix.scm: Remove python-rdflib. (python-berkeleydb, python-rdflib-6): Delete variables.
2021-12-01 | Replace guix environment with guix shell. | Arun Isaac
guix environment is deprecated.
* guix.scm: Replace guix environment with guix shell.
2021-12-01 | Remove unrequired rdfs: prefix. | Arun Isaac
* dump.scm: Remove rdfs: prefix.
2021-11-11 | Fix unbalanced parentheses. | Arun Isaac
Parentheses became unbalanced due to my careless git use.
* dump.scm: Fix unbalanced parentheses.
2021-11-09 | Use upstream guile-dbi and guile-dbd-mysql. | Arun Isaac
* guix.scm: Do not prefix guix: in importing (gnu packages guile-xyz). (guile-dbi, guile-dbi-bootstrap, guile-dbd-mysql): Delete variables.
2021-09-14 | Dump InfoFiles. | Arun Isaac
* dump.scm (dump-info-files): New function. [main]: Call dump-info-files.
2021-09-14 | Abstract out deleting substrings. | Arun Isaac
* dump.scm (delete-substrings): New function. (dump-publication): Replace string-replace-substring with delete-substrings.
2021-09-14 | Update python-rdflib package. | Arun Isaac
* guix.scm: Import (gnu packages dbm), (guix build-system python) and (guix download). (python-berkeleydb, python-rdflib-6): New variables. [manifest]: Replace python-rdflib with python-rdflib-6. Remove python-urllib3.
2021-09-14 | Update guile-dbi package. | Arun Isaac
* guix.scm (guile-dbi): Update to 2.1.8. (guile-dbi-bootstrap): Inherit from guile-dbi instead of guix:guile-dbi. Update comments about contributing upstream.
2021-09-14 | Do not define sub-properties of rdfs:label. | Arun Isaac
* dump.scm (dump-publication, dump-tissue): Do not define any property to be a sub-property of rdfs:label.
2021-09-14 | Dump GeneChip. | Arun Isaac
* dump.scm (gene-chip-name->id, dump-gene-chip): New functions. [main]: Call dump-gene-chip.
2021-09-14 | Dump AvgMethod. | Arun Isaac
* dump.scm (avg-method-name->id, dump-avg-method): New functions. [main]: Call dump-avg-method.
2021-09-14 | Delete unused camel->kebab function. | Arun Isaac
* dump.scm (camel->kebab): Delete function.
2021-09-09 | Dump Investigators. | Arun Isaac
* dump.scm (fix-email-id, investigator-email->id, dump-investigators): New functions. Invoke dump-investigators.
2021-09-09 | Dump Tissue. | Arun Isaac
* dump.scm (tissue-short-name->id, dump-tissue): New functions. Invoke dump-tissue.
2021-09-09 | Add foaf prefix. | Arun Isaac
* dump.scm: Add foaf prefix.
2021-09-09 | Abstract out prefix entries. | Arun Isaac
* dump.scm (prefix): New function. Use prefix.
2021-08-27 | Initial commit | Arun Isaac