Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 60 |
1 files changed, 38 insertions, 22 deletions
@@ -28,22 +28,21 @@ $ make
 or for a container
 
 ```shell
-mkdir test
+mkdir ./tmp
 guix shell -C --network --share=/run/mysqld/ --manifest=manifest.scm
 export GUILE_LOAD_PATH=.:$GUILE_LOAD_PATH
-guile json-dump.scm conn.scm test/
+guile json-to-ttl.scm etc/sample.json tmp/
 ```
 
+That reads the `etc/sample.json` file included in this repository and converts it to an RDF representation that is stored in a file `./tmp/sampledata.ttl`.
+
 ## Set up connection parameters
 
-Describe the database connection parameters in a file *conn.scm* file as
-shown below. Take care to replace the placeholders within angle brackets
-with the appropriate values.
+Describe the database connection parameters in a file *conn.scm* file as shown below. Take care to replace the placeholders within angle brackets with the appropriate values.
 
 ``` scheme
-((generif-data-file . "/path/to/generifs_basic.gz")
- (sql-username . "<sql-username-here>")
+((sql-username . "<sql-username-here>")
  (sql-password . "<sql-password-here>")
  (sql-database . "<sql-database-name-here>")
  (sql-host . "<sql-hostname-here>")
@@ -56,14 +55,9 @@ with the appropriate values.
  (sparql-port . <sparql-endpoint-port-here>))
 ```
 
-Download the GeneRIF data file from
-https://ftp.ncbi.nih.gov/gene/GeneRIF/generifs_basic.gz and specify
-its path in the `generif-data-file` parameter.
-
 Here's a sample *conn.scm*.
 
 ``` scheme
-((generif-data-file . "/home/gn/generifs_basic.gz")
- (sql-username . "webqtlout")
+((sql-username . "webqtlout")
  (sql-password . "my-secret-password")
  (sql-database . "db_webqtl")
  (sql-host . "localhost")
@@ -76,30 +70,52 @@ Here's a sample *conn.scm*.
  (sparql-port . 9082))
 ```
 
-## Dump the database
+## Transform the database
 
-Then, to dump the database to \~/data/dump, run inside shell
+Example: Transform the phenotype from SQL to Terse RDF Triple Language (TTL)
 
 ```sh
-./pre-inst-env ./examples/dump-species-metadata.scm ../conn.scm ~/tmp
+guile -s examples/phenotype.scm \
+    --settings=conn.scm \
+    --output=tmp/phenotype.ttl \
+    --documentation=tmp/phenotype.ttl.md
 ```
 
-``` shell
-$ guix shell -m manifest.scm -- ./pre-inst-env ./examples/dump-dataset-metadata.scm ../conn.scm ~/tmp
+the `-s` option to *guile* runs the `examples/phenotype.scm` file as a script. Everything else on the command line is passed onto the script as command-line arguments.
+
+This should create the files:
+- `tmp/phenotype.ttl`: will contain the data in the database in TTL format
+- `tmp/phenotype.ttl.md`: will contain a short documentation on the data in the file above.
+
+**Note to Devs**: The current `pre-inst-env` script will not work within containers since it assumes the existence of `/usr/bin/env`. We need to fix that if we intend to keep using that.
+
+
+There is a shorter form of the command above:
+
+```sh
+guile -s examples/phenotype.scm \
+    -s conn.scm \
+    -o tmp/phenotype.ttl \
+    -d tmp/phenotype.ttl.md
 ```
 
+which does the same thing, but has the potential to be confusing due to the two `-s` options: the first `-s` option is to guile while the second is to the script itself.
+
 ## Validate and load dump
 
-Then, validate the dumped RDF using `rapper` and load it into
-virtuoso. This will load the dumped RDF into the
-`http://genenetwork.org` graph, and will delete all pre-existing data
-in that graph (FIXME)
+Then, validate the dumped RDF using `rapper`:
 
 ``` shell
 $ guix shell -m manifest.scm -- rapper --input turtle --count ~/data/dump/dump.ttl
+```
+
+If there are no errors, load the relevant RDF files into the `http://genenetwork.org` graph using the `load-rdf.scm` script:
+
+``` shell
 $ guix shell -m manifest.scm -- ./pre-inst-env ./load-rdf.scm conn.scm ~/data/dump/dump.ttl
 ```
 
+This `load-rdf.scm` script replaces the existing graph with the ttl files from: "/var/lib/data", and indexes all the text data for quicker searches.
 
 ## Upload data to virtuoso