aboutsummaryrefslogtreecommitdiff
path: root/README.md
blob: d558e4d10e5244eb2251826ed602c95958b9882f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
[![dump-genenetwork-database-tests CI
badge](https://ci.genenetwork.org/badge/dump-genenetwork-database-tests.svg)](https://ci.genenetwork.org/jobs/dump-genenetwork-database-tests)
[![dump-genenetwork-database CI
badge](https://ci.genenetwork.org/badge/dump-genenetwork-database.svg)](https://ci.genenetwork.org/jobs/dump-genenetwork-database)

The GeneNetwork database is being migrated from a relational database to
a plain text and RDF database. This repository contains code to dump the
relational database to plain text.

# Using

Drop into a development environment with

``` shell
$ guix shell
```

Describe the database connection parameters in a file *conn.scm* file as
shown below. Take care to replace the placeholders within angle brackets
with the appropriate values.

``` scheme
((sql-username . "<sql-username-here>")
 (sql-password . "<sql-password-here>")
 (sql-database . "<sql-database-name-here>")
 (sql-host . "<sql-hostname-here>")
 (sql-port . <sql-port-here>)
 (virtuoso-port . <virtuoso-port-here>)
 (virtuoso-username . "<virtuoso-username-here>")
 (virtuoso-password . "<virtuoso-password-here>")
 (sparql-scheme . <sparql-endpoint-scheme-here>)
 (sparql-host . "<sparql-endpoint-hostname-here>")
 (sparql-port . <sparql-endpoint-port-here>))
```

Then, to dump the database to \~/data/dump, run

``` shell
$ ./pre-inst-env ./dump.scm conn.scm ~/data/dump
```

Make sure there is enough free space! It\'s best to dump the database on
penguin2 where disk space and bandwidth are not significant
constraints.

Then, load the dumped RDF into virtuoso. This will load the dumped RDF
into the `http://genenetwork.org` graph, and will delete all
pre-existing data in that graph.

``` shell
$ ./pre-inst-env ./load-rdf.scm conn.scm ~/data/dump/dump.ttl
```

Now, you may query virtuoso to visualize the SQL and RDF schema.

``` shell
$ ./pre-inst-env ./visualize-schema.scm conn.scm
```

This will output graphviz dot files `sql.dot` and `rdf.dot` describing
the schema. Render them into SVG images like so.

``` shell
$ dot -Tsvg -osql.svg sql.dot
$ dot -Tsvg -ordf.svg rdf.dot
```

Or, peruse them interactively with `xdot`.

``` shell
$ xdot sql.dot
$ xdot rdf.dot
```

# Contributing

See bugs and tasks in BUGS.org.