From 0c81484eb1d5862f990f6b44ac1df480c3fcc9d9 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Fri, 24 Mar 2023 10:45:54 +0100
Subject: Info on dumping database to RDF

---
 topics/RDF/genenetwork-sql-database-to-rdf.gmi     |  9 +++++
 .../mariadb/precompute-mapping-input-data.gmi      | 11 ++++-
 topics/systems/virtuoso.gmi                        | 47 +++-------------------
 3 files changed, 23 insertions(+), 44 deletions(-)
 create mode 100644 topics/RDF/genenetwork-sql-database-to-rdf.gmi

diff --git a/topics/RDF/genenetwork-sql-database-to-rdf.gmi b/topics/RDF/genenetwork-sql-database-to-rdf.gmi
new file mode 100644
index 0000000..283f94c
--- /dev/null
+++ b/topics/RDF/genenetwork-sql-database-to-rdf.gmi
@@ -0,0 +1,9 @@
+# GeneNetwork SQL Database to RDF
+
+We use RDF in virtuoso to handle metadata for GN using
+
+=> https://github.com/genenetwork/dump-genenetwork-database
+
+See also
+
+=> ../systems/virtuoso.gmi
diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi
index 9ce4c95..d26a97a 100644
--- a/topics/systems/mariadb/precompute-mapping-input-data.gmi
+++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi
@@ -115,7 +115,7 @@ Rob voiced a wish to retain all scores (at least those above 1.0). This is not f
 
 ## Notes on ProbeSetXRef
 
-ProbeSetXRef is pretty small, currently @5.6Gb and 48,307,650 rows, so we could decide to add columns to track different mappers. Something funny
+ProbeSetXRef is pretty small, currently @5.6Gb and 48,307,650 rows, so we could decide to add columns to track different mappers. Something funny
 
 ```
 select count(LRS) from ProbeSetXRef;
@@ -150,5 +150,12 @@ MariaDB [db_webqtl]> select count(*) from ProbeSetXRef where LRS=0 and Locus="rs
 +----------+
 ```
 
-There is obviously more. I think this table can use some cleaning up?
+There is obviously more. I think this table can use some cleaning up?
 
+## Preparing for GEMMA
+
+A good dataset to take apart is
+
+=> http://genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P
+
+because it has 71 BXD samples and 32 other samples.
diff --git a/topics/systems/virtuoso.gmi b/topics/systems/virtuoso.gmi
index f84603f..1665f43 100644
--- a/topics/systems/virtuoso.gmi
+++ b/topics/systems/virtuoso.gmi
@@ -158,49 +158,12 @@ When virtuoso has just been started up with a clean state (that is, the virtuoso
 
 ## Dumping data from a MySQL database
 
-To dump data into a ttl file, first make sure that you are in the guix environment in the "dump-genenetwork-database" repository
-
-=> https://github.com/genenetwork/dump-genenetwork-database/ Dump Genenetwork Database
-
-Next, drop into a development environment with:
-
-```
-$ guix shell -m manifest.scm
-```
-
-Build the sources:
-
-```
-$ make
-```
+See also
 
-Describe the database connection parameters in a file *conn.scm* file as shown below. Take care to replace the placeholders within angle brackets with the appropriate values.
+=> ../RDF/genenetwork-sql-database-to-rdf.gmi
 
-```
-((sql-username . "root")
- (sql-password . "root")
- (sql-database . "db_webqtl_s")
- (sql-host . "localhost")
- (sql-port . 3306)
- (virtuoso-port . 8891)
- (virtuoso-username . "dba")
- (virtuoso-password . "dba")
- (sparql-scheme . http)
- (sparql-host . "localhost")
- (sparql-port . 8892))
-```
-
-Then, to dump the database to \~/data/dump, run:
-
-```
-$ ./pre-inst-env ./dump.scm conn.scm ~/data/dump
-```
-
-Make sure there is enough free space! It\'s best to dump the database on penguin2 where disk space and bandwidth are not significant constraints.
+To dump data into a ttl file, first make sure that you are in the guix environment in the "dump-genenetwork-database" repository
 
-Then, validate the dumped RDF using `rapper` and load it into virtuoso. This will load the dumped RDF into the `http://genenetwork.org` graph, and will delete all pre-existing data in that graph.
 
+=> https://github.com/genenetwork/dump-genenetwork-database/ Dump Genenetwork Database
 
-```
-$ rapper --input turtle --count ~/data/dump/dump.ttl
-$ ./pre-inst-env ./load-rdf.scm conn.scm ~/data/dump/dump.ttl
-```
+See the README for instructions.
-- 
cgit v1.2.3
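The dump instructions removed from virtuoso.gmi describe conn.scm as a Scheme association list. In it, strings are quoted, ports are bare integers and `sparql-scheme` is a bare symbol. They also warn to "make sure there is enough free space". Below is a minimal POSIX shell sketch that generates such a file and checks free disk space before a dump. The `GN_*` environment variable names and the `write_conn` and `has_free_space` helpers are hypothetical. The defaults are simply the example values from the removed text. The dump-genenetwork-database README remains the authoritative description of the format.

```shell
#!/bin/sh
# Sketch only: generate a conn.scm for dump-genenetwork-database.
# The GN_* environment variable names are hypothetical; the defaults mirror
# the example conn.scm removed from virtuoso.gmi by this patch.

# write_conn FILE: write the connection alist to FILE.
# Strings are quoted, ports are bare integers and sparql-scheme is a bare
# symbol, as in the removed example. Values are NOT escaped, so credentials
# containing double quotes or backslashes would produce an invalid file.
write_conn() {
    out=$1
    cat > "$out" <<EOF
((sql-username . "${GN_SQL_USER:-root}")
 (sql-password . "${GN_SQL_PASSWORD:-root}")
 (sql-database . "${GN_SQL_DATABASE:-db_webqtl_s}")
 (sql-host . "${GN_SQL_HOST:-localhost}")
 (sql-port . ${GN_SQL_PORT:-3306})
 (virtuoso-port . ${GN_VIRTUOSO_PORT:-8891})
 (virtuoso-username . "${GN_VIRTUOSO_USER:-dba}")
 (virtuoso-password . "${GN_VIRTUOSO_PASSWORD:-dba}")
 (sparql-scheme . ${GN_SPARQL_SCHEME:-http})
 (sparql-host . "${GN_SPARQL_HOST:-localhost}")
 (sparql-port . ${GN_SPARQL_PORT:-8892}))
EOF
}

# has_free_space DIR MIN_KB: succeed if the filesystem holding DIR has at
# least MIN_KB kilobytes available. `df -Pk` prints POSIX-format output in
# 1024-byte blocks; column 4 of the second line is the available space.
has_free_space() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, `write_conn conn.scm && has_free_space ~/data/dump 50000000 && ./pre-inst-env ./dump.scm conn.scm ~/data/dump` would write the file and require about 50 GB free before running the dump command from the removed text. The 50000000 KB threshold is an arbitrary placeholder, not a measured dump size. Reading values from the environment keeps real credentials out of the repository.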