Age  Commit message  Author
2021-12-20  Move schema visualization to separate script.  Arun Isaac
* dump.scm: Do not import (sxml simple) and (dump string-similarity). (string-remove-suffix-ci, floor-log1024, human-units, human-units-color, sxml->xml-string, sxml->graphviz-html, table-label, table->graphviz-node, column->foreign-table, tables->graphviz-edges): Move to ... (dump-schema): Dump schema to RDF. (main): Call dump-schema without setting schema.dot as the output file.
* visualize-schema.scm: ... here.
2021-12-20  Add guile-sparql to Guix manifest.  Arun Isaac
* guix.scm: Add guile-sparql to manifest.
2021-12-20  Upgrade ccwl to latest commit.  Arun Isaac
* guix.scm (ccwl): Upgrade to commit 51c12b7e58685b70e7cfd9612dac403cf9ee845c.
2021-12-20  Capture full column type.  Arun Isaac
Capture full column type instead of just whether it is an integer.
* dump.scm (dump-data-table): Capture full column type in <column> object.
* dump/table.scm (<column>)[int?]: Delete member. [type]: New member. Export column-type instead of column-int?.
2021-12-20  Move <table> and <column> types to separate module.  Arun Isaac
* dump.scm (<table>, <column>): Move to ...
* dump/table.scm: ... here.
2021-12-17  Indent define-dump better.  Arun Isaac
* dump.scm (define-dump): Indent better.
2021-12-17  Document RDF schema during dumping.  Arun Isaac
* dump.scm (define-dump): Support schema-triples clause. (dump-strain, dump-publish-xref, dump-info-files): Add schema-triples clause. (main): Output rdfs: prefix.
2021-12-17  Make order of clauses in define-dump unspecified.  Arun Isaac
* dump.scm (find-clause): New function. (define-dump): Make order of clauses unspecified.
2021-12-16  Make define-dump syntax more concise.  Arun Isaac
* dump.scm (field->key, field->assoc-ref, collect-fields): New functions. (define-dump): Redefine with more concise syntax.
* dump.scm (dump-species, dump-strain, dump-mapping-method, dump-inbred-set, dump-phenotype, dump-publication, dump-publish-xref, dump-tissue, dump-investigators, dump-avg-method, dump-gene-chip, dump-info-files): Use new define-dump syntax. (default-metadata-proc): Delete function.
* .dir-locals.el (scheme-mode): Indent triples form correctly.
2021-12-16  Add tests.  Arun Isaac
* tests.scm: New file.
2021-12-16  Specify map-alist behaviour for multiple set verbs.  Arun Isaac
* dump/utils.scm (map-alist): Specify behaviour for multiple set verbs.
2021-12-16  Generalize collect-keys and key->assoc-ref.  Arun Isaac
The generalized versions, collect-forms and translate-forms, will be required by other macros.
* dump/utils.scm (collect-forms, translate-forms): New public functions. (collect-keys): Rewrite in terms of collect-forms. (key->assoc-ref): Rewrite in terms of translate-forms.
2021-12-16  Rename away delete from (srfi srfi-1).  Arun Isaac
delete from (srfi srfi-1) somehow interferes with the delete verb of map-alist. It is not clear why.
* dump/utils.scm (dump): Rename delete to srfi:delete while importing.
2021-12-15  Move string similarity functions to separate module.  Arun Isaac
* dump.scm: Use (dump string-similarity). (trigrams, jaccard-index, jaccard-string-similarity, jaccard-string-similar?): Move to ...
* dump/string-similarity.scm: ... here.
2021-12-14  Camel case gn:binomialName.  Arun Isaac
* dump.scm (dump-species): Change gn:binomialname to gn:binomialName.
2021-12-14  Use node ports to indicate foreign key relations precisely.  Arun Isaac
* dump.scm (table-label): Set port attributes on <td> tag. (tables->graphviz-edges): Specify ports on edges.
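To illustrate the port mechanism, here is a minimal Python sketch (not the repository's Guile code) that emits a Graphviz HTML-like label whose <td> cells carry port attributes, plus an edge that names those ports. The function names and the Strain/Species column names are hypothetical; shape=none and the cellborder attribute follow the "Specify appearance using HTML table" commit in this log.

```python
from html import escape


def table_node(table, columns):
    """Return a DOT node statement with one named port per column cell."""
    rows = [f"<tr><td>{escape(table)}</td></tr>"]
    for column in columns:
        name = escape(column)
        # The port attribute lets an edge attach to this cell specifically.
        rows.append(f'<tr><td port="{name}">{name}</td></tr>')
    label = '<table border="0" cellborder="1">' + "".join(rows) + "</table>"
    return f"  {table} [shape=none, label=<{label}>];"


def foreign_key_edge(table, column, foreign_table, foreign_column):
    """Return a DOT edge that connects two column ports (node:port syntax)."""
    return f"  {table}:{column} -> {foreign_table}:{foreign_column};"


def schema_dot(tables, foreign_keys):
    """Assemble a complete digraph from tables and foreign key relations."""
    lines = ["digraph schema {"]
    lines += [table_node(name, columns) for name, columns in tables.items()]
    lines += [foreign_key_edge(*key) for key in foreign_keys]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical two-table schema, for illustration only.
    print(schema_dot({"Strain": ["Id", "SpeciesId"], "Species": ["Id"]},
                     [("Strain", "SpeciesId", "Species", "Id")]))
```

With ports on the edge, the arrow starts at the SpeciesId cell and ends at the Id cell rather than at the node outlines.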
2021-12-14  Specify appearance using HTML table.  Arun Isaac
* dump.scm (table-label): Unset border attribute of <table> tag. Set cellborder and bgcolor attributes of <table> tag. (table->graphviz-node): Unset style and fillcolor node attributes. Set shape node attribute to none.
2021-12-14  Take advantage of bug fixes in bleeding edge (ccwl graphviz).  Arun Isaac
* dump.scm (graph->dot): Delete function. (sxml->graphviz-html): Return a <html-string> object. (dump-schema): Use graph->dot from (ccwl graphviz).
2021-12-13  Upgrade to bleeding edge (ccwl graphviz).  Arun Isaac
This fixes a few bugs and brings in new features from (ccwl graphviz).
* guix.scm: Import (gnu packages autotools), (guix git-download) and (guix packages). Prefix (gnu packages bioinformatics) imports with guix:. (ccwl): New variable.
2021-12-13  Abstract out table to graphviz edge conversion.  Arun Isaac
* dump.scm (column->foreign-table, tables->graphviz-edges): New functions. (dump-schema): Use tables->graphviz-edges.
2021-12-13  Abstract out table to graphviz node conversion.  Arun Isaac
* dump.scm (dumped-table?, table-label, table->graphviz-node): New functions. (dump-schema): Use table->graphviz-node.
2021-12-13  Color table headers by size.  Arun Isaac
* dump.scm (human-units-color): New function. (dump-schema): Use human-units-color.
2021-12-13  Implement human units conversion in terms of log1024.  Arun Isaac
This generalizes better and is mathematically cleaner.
* dump.scm (floor-log1024): New function. (human-units): Use floor-log1024.
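The idea can be sketched in Python as follows. This is not the repository's Guile code: the unit names, the one-decimal formatting and the handling of zero are assumptions made for illustration. The point is that the floor of the base-1024 logarithm selects the unit directly, so no chain of per-unit comparisons is needed.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]  # assumed unit names


def floor_log1024(n: int) -> int:
    """Return floor(log1024(n)) for a positive integer n.

    Integer division avoids the rounding errors of floating-point log.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    exponent = 0
    while n >= 1024:
        n //= 1024
        exponent += 1
    return exponent


def human_units(n_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value >= 1."""
    if n_bytes == 0:
        return "0 B"
    # Clamp so that very large sizes still map to the last known unit.
    exponent = min(floor_log1024(n_bytes), len(UNITS) - 1)
    value = n_bytes / 1024 ** exponent
    return f"{value:.1f} {UNITS[exponent]}"


if __name__ == "__main__":
    for size in (512, 1536, 1024 ** 2, 5 * 1024 ** 3):
        print(size, human_units(size))
```

Supporting a further unit only means appending to the unit list, which is one way to read "generalizes better".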
2021-12-13  Use sxml to construct graphviz HTML strings.  Arun Isaac
Using sxml allows us to stay in the world of S-expressions.
* dump.scm (sxml->xml-string, sxml->graphviz-html): New functions. (dump-schema): Construct graphviz HTML string using sxml.
2021-12-11  Highlight dumped tables and columns.  Arun Isaac
* dump.scm (dump-schema): Highlight tables and columns.
2021-12-11  Fix HTML string handling in dot output.  Arun Isaac
* dump.scm (replace-substrings): New function. (graph->dot): Fix HTML string handling.
2021-12-11  Log dumped tables and columns.  Arun Isaac
* dump.scm (%dumped): New variable. (define-dump): Append to %dumped when a new table dumping function is defined.
2021-12-11  Abstract out definition of table dumping functions.  Arun Isaac
* dump.scm (define-dump): New macro. (dump-species, dump-strain, dump-mapping-method, dump-inbred-set, dump-phenotype, dump-publication, dump-publish-xref, dump-tissue, dump-investigators, dump-avg-method, dump-gene-chip, dump-info-files): Redefine using define-dump.
2021-12-11  Use string similarity and check if foreign key is an integer.  Arun Isaac
* dump.scm (<column>): New type. (tables): Use <column> objects to represent columns. (trigrams, jaccard-index, jaccard-string-similarity): New functions. (dump-schema): Use string similarity and check if foreign key is an integer.
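For readers unfamiliar with the technique, trigram-based Jaccard similarity can be sketched in Python as below. This is an illustration under stated assumptions, not the repository's Guile implementation: the case folding, the 0.0 result for two empty trigram sets and the 0.8 threshold are choices made here. Only the function names mirror those in the commit log.

```python
def trigrams(text: str) -> set:
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard_index(a: set, b: set) -> float:
    """Return |a intersect b| / |a union b|; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_string_similarity(s1: str, s2: str) -> float:
    """Compare two strings by the Jaccard index of their trigram sets."""
    # Case folding is an assumption; column names often differ only in case.
    return jaccard_index(trigrams(s1.lower()), trigrams(s2.lower()))


def jaccard_string_similar(s1: str, s2: str, threshold: float = 0.8) -> bool:
    """Decide similarity against a threshold (0.8 is a hypothetical value)."""
    return jaccard_string_similarity(s1, s2) >= threshold


if __name__ == "__main__":
    # A column such as "SpeciesId" plausibly refers to a table "Species".
    print(jaccard_string_similarity("SpeciesId", "Species"))
```

A score like this can rank candidate tables for a column name, which is one plausible way to guess foreign key relations from naming alone.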
2021-12-11  Remove rdflib python code.  Arun Isaac
* rdf.py: Delete file.
2021-12-11  Visualize schema.  Arun Isaac
* .dir-locals.el (scheme-mode): Indent set-table-columns correctly.
* dump.scm: Import (srfi srfi-9 gnu). (%database-name): New variable. (<table>): New type. (tables, string-remove-suffix-ci, human-units, graph->dot, dump-schema): New functions. Invoke dump-schema.
* guix.scm: Import (gnu packages bioinformatics). Add ccwl, graphviz and guile-libyaml to the manifest.
2021-12-11  Use select-query.  Arun Isaac
* dump.scm (get-tables-from-comments, dump-table-fields, dump-species, dump-strain, dump-mapping-method, dump-inbred-set, dump-phenotype, dump-publication, dump-publish-xref, dump-tissue, dump-investigators, dump-avg-method, dump-gene-chip, dump-info-files): Use select-query.
2021-12-11  Implement S-expression like SQL select query.  Arun Isaac
* dump/sql.scm: Import (srfi srfi-1). Export select-query. (select-query): New macro.
2021-12-04  Add emacs directory local variables.  Arun Isaac
* .dir-locals.el: New file.
2021-12-04  Remove redundant camel->lower-camel function.  Arun Isaac
* dump.scm (camel->lower-camel): Delete function. (default-metadata-proc): Do not use camel->lower-camel.
2021-12-04  Build subjects exclusively with string->identifier.  Arun Isaac
* dump.scm (dump-mapping-method, dump-publication, dump-info-files): Use string->identifier to build subjects.
2021-12-04  Append an underscore to the identifier prefix.  Arun Isaac
This is slightly more readable.
* dump.scm (string->identifier): Append an underscore to the identifier prefix.
2021-12-04  Fix indentation.  Arun Isaac
* dump.scm (get-tables-from-comments, dump-table-fields): Fix indentation.
2021-12-04  Use the map-alist DSL.  Arun Isaac
* dump.scm: Import (dump utils). (string-blank?): New function. (scm->triples): Filter out triples with #f or blank string objects. (process-metadata-alist): Delete function. (default-metadata-proc): New function. (dump-species, dump-strain, mapping-method-name->id, dump-inbred-set, dump-phenotype, dump-publication, dump-publish-xref, dump-tissue, dump-investigators, dump-avg-method, dump-gene-chip, dump-info-files): Use map-alist.
2021-12-04  Implement the map-alist DSL.  Arun Isaac
map-alist is a DSL to transform one association list into another. These transformations are frequently required when dumping tables, especially metadata tables.
* dump/utils.scm: New file.
2021-12-02  Construct investigator ID using first and last names too.  Arun Isaac
* dump.scm (investigator-email->id): Rename to investigator-attributes->id. Use first and last names in addition to the email ID. (dump-investigators): Use investigator-attributes->id. Include records that have no email ID. (dump-info-files): Use investigator-attributes->id. Include records that have no email ID.
2021-12-02  Use string-delete instead of string-replace-substring.  Arun Isaac
For the simple task of removing spaces, string-delete is sufficient. string-replace-substring is overkill.
* dump.scm (fix-email-id): Use string-delete instead of string-replace-substring.
2021-12-02  Abstract out string->identifier.  Arun Isaac
Building a turtle identifier from a string after removing illegal characters and prefixing is an extremely common operation. Abstract it. Also, mandate identifier prefixes. It is better to play it safe.
* dump.scm (string->identifier): New function. (binomial-name->species-id, dump-strain, mapping-method-name->id, inbred-set-name->id, aphenotype-id->id, tissue-short-name->id, investigator-email->id, avg-method-name->id, gene-chip-name->id): Use string->identifier.
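A rough Python sketch of such a helper is shown below. It is not the repository's Guile function: which characters count as illegal (here, anything outside ASCII letters, digits and underscore) is an assumption. The gn: namespace matches predicates seen elsewhere in this log (for example gn:binomialName), and the underscore after the prefix follows the later "Append an underscore to the identifier prefix" commit.

```python
import re


def string_to_identifier(prefix: str, text: str) -> str:
    """Build a prefixed Turtle-style identifier from arbitrary text.

    Assumption: "illegal" means anything other than ASCII letters,
    digits and underscore; such characters are deleted, not replaced.
    """
    if not prefix:
        # Prefixes are mandatory, mirroring "mandate identifier prefixes".
        raise ValueError("an identifier prefix is required")
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", text)
    return f"gn:{prefix}_{cleaned}"


if __name__ == "__main__":
    print(string_to_identifier("species", "Mus musculus"))
```

Requiring a prefix keeps identifiers derived from different tables from colliding when their source strings happen to match.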
2021-12-02  Document delete-substrings.  Arun Isaac
* dump.scm (delete-substring): Add docstring.
2021-12-01  Deal with AvgMethodId = 0.  Arun Isaac
* dump.scm (dump-info-files): Deal with AvgMethodId.
2021-12-01  Use InfoFileTitle instead of InfoPageTitle for dataset name.  Arun Isaac
Not all datasets have a non-NULL InfoPageTitle field.
* dump.scm (dump-info-files): Use InfoFileTitle instead of InfoPageTitle for dataset name.
2021-12-01  Extract name of dataset group.  Arun Isaac
* dump.scm (dump-info-files): Extract name of dataset group.
2021-12-01  Do not link inbred-set to mapping-method.  Arun Isaac
Not all inbred sets have a mapping method, and the mapping method of the inbred set has, so far, not been used anywhere.
* dump.scm (mapping-method-name->id, dump-mapping-method): Mark as unused. (dump-inbred-set): Do not link inbred-set to mapping-method.
2021-12-01  Allow N/A avg method.  Arun Isaac
* dump.scm (dump-avg-method): Allow N/A in name. (dump-info-files): Allow N/A in avg-method-name. (avg-method-name->id): Replace / with _.
2021-12-01  Remove gn:geoSeries when value starts with "No Geo Series...".  Arun Isaac
* dump.scm (dump-info-files): Remove gn:geoSeries when value starts with "No Geo Series...".