Dump genenetwork database


CI badges: dump-genenetwork-database-tests, dump-genenetwork-database

The GeneNetwork database is being migrated from a relational database to a plain text and RDF database. This repository contains code to dump the relational database to plain text.


Drop into a development environment with

$ guix shell

Build the sources.

$ make

Describe the database connection parameters in a file named conn.scm, as shown below. Replace each placeholder in angle brackets with the appropriate value.

((sql-username . "<sql-username-here>")
 (sql-password . "<sql-password-here>")
 (sql-database . "<sql-database-name-here>")
 (sql-host . "<sql-hostname-here>")
 (sql-port . <sql-port-here>)
 (virtuoso-port . <virtuoso-port-here>)
 (virtuoso-username . "<virtuoso-username-here>")
 (virtuoso-password . "<virtuoso-password-here>")
 (sparql-scheme . <sparql-endpoint-scheme-here>)
 (sparql-host . "<sparql-endpoint-hostname-here>")
 (sparql-port . <sparql-endpoint-port-here>))
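
Since every placeholder above ends in "-here>", a quick check can catch a conn.scm that was only partly filled in. The following is a minimal sketch; the helper name has_placeholders is hypothetical and is not part of this repository.

```shell
# Hypothetical helper, not part of this repository: succeed if the given
# file still contains a template placeholder such as <sql-username-here>.
has_placeholders() {
    grep -q '<[a-z-]*-here>' "$1"
}

# Hypothetical wrapper: list leftover placeholders with their line numbers
# and fail, so that a calling script can stop before connecting.
check_conn() {
    if has_placeholders "$1"; then
        echo "unreplaced placeholders in $1:" >&2
        grep -n '<[a-z-]*-here>' "$1" >&2
        return 1
    fi
}
```

Running check_conn conn.scm before the dump prints nothing and succeeds when every placeholder has been replaced.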

Then, to dump the database to ~/data/dump, run

$ ./pre-inst-env ./dump.scm conn.scm ~/data/dump

Make sure there is enough free space! It's best to dump the database on penguin2 where disk space and bandwidth are not significant constraints.
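
One way to check free space before starting is sketched below. The helper names are hypothetical, and this README does not state how large the dump is, so the threshold passed to have_space is yours to choose.

```shell
# Hypothetical helper: print the free space, in KiB, on the filesystem
# that holds the given path. -P selects the POSIX output format and -k
# reports sizes in 1024-byte units; column 4 is the available space.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed if at least $2 KiB are free under $1.
have_space() {
    [ "$(free_kib "$1")" -ge "$2" ]
}
```

For example, have_space ~/data 10485760 succeeds only if at least 10 GiB are free on the filesystem holding ~/data. That figure is an arbitrary illustration, not a measured dump size.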

Then, validate the dumped RDF using rapper and load it into virtuoso. The load step puts the dumped RDF into the http://genenetwork.org graph and deletes all pre-existing data in that graph.

$ rapper --input turtle --count ~/data/dump/dump.ttl
$ ./pre-inst-env ./load-rdf.scm conn.scm ~/data/dump/dump.ttl
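
Because loading replaces the existing graph, it may be worth running the load only when validation succeeds. The sketch below assumes that rapper exits with a non-zero status when it fails to parse its input. The RAPPER and LOADER variables and the function name are hypothetical; they exist only so that the commands can be swapped out for a dry run.

```shell
# Hypothetical overridable commands; the defaults match the steps above.
RAPPER="${RAPPER:-rapper}"
LOADER="${LOADER:-./pre-inst-env ./load-rdf.scm}"

# Validate the turtle file $2 and, only if that succeeds, load it using
# the connection settings in $1.
validate_and_load() {
    conn="$1"
    ttl="$2"
    # Assumption: a non-zero exit status from rapper means the file is invalid.
    if ! $RAPPER --input turtle --count "$ttl"; then
        echo "validation failed; not loading $ttl" >&2
        return 1
    fi
    # $LOADER is deliberately unquoted so that it splits into command and script.
    $LOADER "$conn" "$ttl"
}
```

Call it as validate_and_load conn.scm ~/data/dump/dump.ttl from the repository root.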

Now, you may query virtuoso to visualize the SQL and RDF schema.

$ ./pre-inst-env ./visualize-schema.scm conn.scm

This will output graphviz dot files sql.dot and rdf.dot describing the schema. Render them into SVG images like so.

$ dot -Tsvg -osql.svg sql.dot
$ dot -Tsvg -ordf.svg rdf.dot

Or, peruse them interactively with xdot.

$ xdot sql.dot
$ xdot rdf.dot
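
The two rendering commands above can also be wrapped in a small loop that stops if either dot file is missing. This is only a convenience sketch; the DOT variable and the function name render_schemas are hypothetical.

```shell
# Hypothetical overridable command; defaults to graphviz dot.
DOT="${DOT:-dot}"

# Render sql.dot and rdf.dot in the current directory to SVG, stopping at
# the first missing input or failed rendering.
render_schemas() {
    for name in sql rdf; do
        if [ ! -f "$name.dot" ]; then
            echo "missing $name.dot; run visualize-schema.scm first" >&2
            return 1
        fi
        $DOT -Tsvg -o"$name.svg" "$name.dot" || return 1
    done
}
```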

The dump-genenetwork-database continuous integration job runs these steps on every commit and publishes its version of sql.svg and rdf.svg.


See bugs and tasks in BUGS.org.