author      Pjotr Prins   2023-03-24 10:45:54 +0100
committer   Pjotr Prins   2023-03-24 10:45:54 +0100
commit      0c81484eb1d5862f990f6b44ac1df480c3fcc9d9 (patch)
tree        dbd0f00012401b928c66f9a92f60377c73b2a663 /topics/systems
parent      03f29b98d4b7468b63cf6fd0cc6cc6abfd558b3d (diff)
download    gn-gemtext-0c81484eb1d5862f990f6b44ac1df480c3fcc9d9.tar.gz
Info on dumping database to RDF
Diffstat (limited to 'topics/systems')
-rw-r--r-- | topics/systems/mariadb/precompute-mapping-input-data.gmi | 11
-rw-r--r-- | topics/systems/virtuoso.gmi                              | 47
2 files changed, 14 insertions, 44 deletions
diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi
index 9ce4c95..d26a97a 100644
--- a/topics/systems/mariadb/precompute-mapping-input-data.gmi
+++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi
@@ -115,7 +115,7 @@ Rob voiced a wish to retain all scores (at least those above 1.0). This is not f
 
 ## Notes on ProbeSetXRef
 
-ProbeSetXRef is pretty small, currently @5.6Gb and 48,307,650 rows, so we could decide to add columns to track different mappers. Something funny
+ProbeSetXRef is pretty small, currently @5.6Gb and 48,307,650 rows, so we could decide to add columns to track different mappers. Something funny
 
 ```
 select count(LRS) from ProbeSetXRef;
@@ -150,5 +150,12 @@ MariaDB [db_webqtl]> select count(*) from ProbeSetXRef where LRS=0 and Locus="rs
 +----------+
 ```
 
-There is obviously more. I think this table can use some cleaning up?
+There is obviously more. I think this table can use some cleaning up?
 
+## Preparing for GEMMA
+
+A good dataset to take apart is
+
+=> http://genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P
+
+because it has 71 BXD samples and 32 other samples.
diff --git a/topics/systems/virtuoso.gmi b/topics/systems/virtuoso.gmi
index f84603f..1665f43 100644
--- a/topics/systems/virtuoso.gmi
+++ b/topics/systems/virtuoso.gmi
@@ -158,49 +158,12 @@ When virtuoso has just been started up with a clean state (that is, the virtuoso
 
 ## Dumping data from a MySQL database
 
-To dump data into a ttl file, first make sure that you are in the guix environment in the "dump-genenetwork-database" repository
-
-=> https://github.com/genenetwork/dump-genenetwork-database/ Dump Genenetwork Database
-
-Next, drop into a development environment with:
-
-```
-$ guix shell -m manifest.scm
-```
-
-Build the sources:
-
-```
-$ make
-```
+See also
 
-Describe the database connection parameters in a file *conn.scm* file as shown below. Take care to replace the placeholders within angle brackets with the appropriate values.
+=> ../RDF/genenetwork-sql-database-to-rdf.gmi
 
-```
-((sql-username . "root")
- (sql-password . "root")
- (sql-database . "db_webqtl_s")
- (sql-host . "localhost")
- (sql-port . 3306)
- (virtuoso-port . 8891)
- (virtuoso-username . "dba")
- (virtuoso-password . "dba")
- (sparql-scheme . http)
- (sparql-host . "localhost")
- (sparql-port . 8892))
-```
-
-Then, to dump the database to ~/data/dump, run:
-
-```
-$ ./pre-inst-env ./dump.scm conn.scm ~/data/dump
-```
-
-Make sure there is enough free space! It's best to dump the database on penguin2 where disk space and bandwidth are not significant constraints.
+To dump data into a ttl file, first make sure that you are in the guix environment in the "dump-genenetwork-database" repository
 
-Then, validate the dumped RDF using `rapper` and load it into virtuoso. This will load the dumped RDF into the `http://genenetwork.org` graph, and will delete all pre-existing data in that graph.
+=> https://github.com/genenetwork/dump-genenetwork-database/ Dump Genenetwork Database
 
-```
-$ rapper --input turtle --count ~/data/dump/dump.ttl
-$ ./pre-inst-env ./load-rdf.scm conn.scm ~/data/dump/dump.ttl
-```
+See the README for instructions.
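
The rewritten virtuoso.gmi now points to the dump-genenetwork-database README for the actual commands. For quick reference, below is a consolidated sketch of the dump-and-load workflow as it was described in the text removed above; conn.scm and ~/data/dump are the example names from that text, so adjust them to your setup and treat the README as authoritative.

```
# Inside a checkout of dump-genenetwork-database
$ guix shell -m manifest.scm     # enter the development environment
$ make                           # build the sources

# Dump the SQL database to Turtle; conn.scm holds the connection parameters
$ ./pre-inst-env ./dump.scm conn.scm ~/data/dump

# Validate the generated Turtle with rapper, then load it into the
# http://genenetwork.org graph (this replaces existing data in that graph)
$ rapper --input turtle --count ~/data/dump/dump.ttl
$ ./pre-inst-env ./load-rdf.scm conn.scm ~/data/dump/dump.ttl
```

Make sure there is enough free disk space before dumping; the removed notes recommended running this on penguin2, where disk space and bandwidth are not significant constraints.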