Diffstat (limited to 'README.md')
-rw-r--r-- README.md | 66
1 files changed, 44 insertions, 22 deletions
diff --git a/README.md b/README.md
index 8c94ccd..c8efad2 100644
--- a/README.md
+++ b/README.md
@@ -28,22 +28,21 @@ $ make
 or for a container
 
 ```shell
-mkdir test
+mkdir ./tmp
 guix shell -C --network --share=/run/mysqld/ --manifest=manifest.scm
 export GUILE_LOAD_PATH=.:$GUILE_LOAD_PATH
-guile json-dump.scm conn.scm test/
+guile json-to-ttl.scm etc/sample.json tmp/
 ```
 
+This reads the `etc/sample.json` file included in this repository and converts it to an RDF representation, which is stored in `./tmp/sampledata.ttl`.
+
 
 ## Set up connection parameters
 
-Describe the database connection parameters in a file *conn.scm* file as
-shown below. Take care to replace the placeholders within angle brackets
-with the appropriate values.
+Describe the database connection parameters in a *conn.scm* file as shown below. Take care to replace the placeholders within angle brackets with the appropriate values.
 
 ``` scheme
-((generif-data-file . "/path/to/generifs_basic.gz")
- (sql-username . "<sql-username-here>")
+((sql-username . "<sql-username-here>")
  (sql-password . "<sql-password-here>")
  (sql-database . "<sql-database-name-here>")
  (sql-host . "<sql-hostname-here>")
@@ -56,14 +55,9 @@ with the appropriate values.
  (sparql-port . <sparql-endpoint-port-here>))
 ```
 
-Download the GeneRIF data file from
-https://ftp.ncbi.nih.gov/gene/GeneRIF/generifs_basic.gz and specify
-its path in the `generif-data-file` parameter.
-
 Here's a sample *conn.scm*.
 ``` scheme
-((generif-data-file . "/home/gn/generifs_basic.gz")
- (sql-username . "webqtlout")
+((sql-username . "webqtlout")
  (sql-password . "my-secret-password")
  (sql-database . "db_webqtl")
  (sql-host . "localhost")
@@ -76,30 +70,58 @@ Here's a sample *conn.scm*.
  (sparql-port . 9082))
 ```
 
-## Dump the database
+## Transform the database
 
-Then, to dump the database to \~/data/dump, run inside shell
+Example: Transform the phenotype from SQL to Terse RDF Triple Language (TTL)
 
 ```sh
-./pre-inst-env ./examples/dump-species-metadata.scm ../conn.scm ~/tmp
+guile -s examples/phenotype.scm \
+      --settings=conn.scm \
+      --output=tmp/phenotype.ttl \
+      --documentation=tmp/phenotype.ttl.md
 ```
 
-``` shell
-$ guix shell -m manifest.scm -- ./pre-inst-env ./examples/dump-dataset-metadata.scm ../conn.scm ~/tmp
+The `-s` option to *guile* runs the `examples/phenotype.scm` file as a script. Everything else on the command line is passed on to the script as command-line arguments.
+
+This should create the files:
+- `tmp/phenotype.ttl`: contains the data from the database in TTL format.
+- `tmp/phenotype.ttl.md`: contains brief documentation of the data in the file above.
+
+**Note to Devs**: The current `pre-inst-env` script will not work within containers since it assumes the existence of `/usr/bin/env`. We need to fix that if we intend to keep using it.
+
+
+There is a shorter form of the command above:
+
+```sh
+guile -s examples/phenotype.scm \
+      -s conn.scm \
+      -o tmp/phenotype.ttl \
+      -d tmp/phenotype.ttl.md
+```
+
+which does the same thing, but has the potential to be confusing due to the two `-s` options: the first `-s` option is to guile while the second is to the script itself.
+
+There's an extra script that loops through all the Scheme files in `examples` and runs them. To run it:
+
+```sh
+./generate-ttl-files.scm -s conn.scm -o <ttl-output-directory> -d <docs-output-directory>
 ```
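
For readers who want to see what such a loop involves, here is a rough shell sketch of the same idea. It is not the contents of `generate-ttl-files.scm`. The function name `run_examples` is hypothetical, and the sketch assumes that every file in the examples directory accepts the same `--settings`, `--output` and `--documentation` options that `examples/phenotype.scm` does.

```sh
# Sketch only: run every Scheme file in a directory as a Guile script.
# Assumption: each script accepts --settings, --output and --documentation,
# as examples/phenotype.scm does.
run_examples() {
    examples_dir=$1   # directory holding the *.scm scripts, e.g. examples
    settings=$2       # connection settings file, e.g. conn.scm
    ttl_dir=$3        # directory that receives the .ttl files
    docs_dir=$4       # directory that receives the .ttl.md files

    mkdir -p "$ttl_dir" "$docs_dir" || return 1
    for script in "$examples_dir"/*.scm; do
        # With no matches the glob stays literal, so skip non-existent paths.
        [ -e "$script" ] || continue
        name=$(basename "$script" .scm)
        # Stop at the first script that fails.
        guile -s "$script" \
              --settings="$settings" \
              --output="$ttl_dir/$name.ttl" \
              --documentation="$docs_dir/$name.ttl.md" || return 1
    done
}
```

Calling `run_examples examples conn.scm tmp/ttl tmp/docs` would then write one `.ttl` file and one `.ttl.md` file per example script.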
 
 ## Validate and load dump
 
-Then, validate the dumped RDF using `rapper` and load it into
-virtuoso. This will load the dumped RDF into the
-`http://genenetwork.org` graph, and will delete all pre-existing data
-in that graph (FIXME)
+Then, validate the dumped RDF using `rapper`:
 
 ``` shell
 $ guix shell -m manifest.scm -- rapper --input turtle --count ~/data/dump/dump.ttl
+```
+
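When the output consists of several Turtle files, the same `rapper` check can be applied to each of them in turn. The following is a minimal sketch, assuming the files share one directory and end in `.ttl`. The function name `validate_ttl_dir` is hypothetical and not part of this repository.

```sh
# Sketch only: validate every .ttl file in a directory with rapper.
# Returns 0 if all files parse, 1 if at least one does not.
validate_ttl_dir() {
    dir=$1
    status=0
    for ttl in "$dir"/*.ttl; do
        # With no matches the glob stays literal, so skip non-existent paths.
        [ -e "$ttl" ] || continue
        if ! rapper --input turtle --count "$ttl"; then
            echo "invalid: $ttl" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `validate_ttl_dir ~/data/dump` checks every file and reports each one that fails, which makes it easy to see all the problems before loading anything.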
+If there are no errors, load the relevant RDF files into the `http://genenetwork.org` graph using the `load-rdf.scm` script:
+
+``` shell
 $ guix shell -m manifest.scm -- ./pre-inst-env ./load-rdf.scm conn.scm ~/data/dump/dump.ttl
 ```
 
+The `load-rdf.scm` script replaces the existing graph with the TTL files from `/var/lib/data` and indexes all the text data for quicker searches.
 
 ## Upload data to virtuoso