author     Pjotr Prins  2023-03-24 10:45:54 +0100
committer  Pjotr Prins  2023-03-24 10:45:54 +0100
commit     0c81484eb1d5862f990f6b44ac1df480c3fcc9d9
tree       dbd0f00012401b928c66f9a92f60377c73b2a663 /topics
parent     03f29b98d4b7468b63cf6fd0cc6cc6abfd558b3d
download   gn-gemtext-0c81484eb1d5862f990f6b44ac1df480c3fcc9d9.tar.gz
Info on dumping database to RDF
Diffstat (limited to 'topics')
-rw-r--r--  topics/RDF/genenetwork-sql-database-to-rdf.gmi             9
-rw-r--r--  topics/systems/mariadb/precompute-mapping-input-data.gmi  11
-rw-r--r--  topics/systems/virtuoso.gmi                               47
3 files changed, 23 insertions, 44 deletions
diff --git a/topics/RDF/genenetwork-sql-database-to-rdf.gmi b/topics/RDF/genenetwork-sql-database-to-rdf.gmi
new file mode 100644
index 0000000..283f94c
--- /dev/null
+++ b/topics/RDF/genenetwork-sql-database-to-rdf.gmi
@@ -0,0 +1,9 @@
+# GeneNetwork SQL Database to RDF
+
+We use RDF in virtuoso to handle metadata for GN using
+
+=> https://github.com/genenetwork/dump-genenetwork-database
+
+See also
+
+=> ../systems/virtuoso.gmi
diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi
index 9ce4c95..d26a97a 100644
--- a/topics/systems/mariadb/precompute-mapping-input-data.gmi
+++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi
@@ -115,7 +115,7 @@ Rob voiced a wish to retain all scores (at least those above 1.0). This is not f
## Notes on ProbeSetXRef
-ProbeSetXRef is pretty small, currently @5.6Gb and 48,307,650 rows, so we could decide to add columns to track different mappers. Something funny
+ProbeSetXRef is pretty small, currently @5.6Gb and 48,307,650 rows, so we could decide to add columns to track different mappers. Something funny
```
select count(LRS) from ProbeSetXRef;
@@ -150,5 +150,12 @@ MariaDB [db_webqtl]> select count(*) from ProbeSetXRef where LRS=0 and Locus="rs
+----------+
```
-There is obviously more. I think this table can use some cleaning up?
+There is obviously more. I think this table can use some cleaning up?
+## Preparing for GEMMA
+
+A good dataset to take apart is
+
+=> http://genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P
+
+because it has 71 BXD samples and 32 other samples.
diff --git a/topics/systems/virtuoso.gmi b/topics/systems/virtuoso.gmi
index f84603f..1665f43 100644
--- a/topics/systems/virtuoso.gmi
+++ b/topics/systems/virtuoso.gmi
@@ -158,49 +158,12 @@ When virtuoso has just been started up with a clean state (that is, the virtuoso
## Dumping data from a MySQL database
-To dump data into a ttl file, first make sure that you are in the guix environment in the "dump-genenetwork-database" repository
-
-=> https://github.com/genenetwork/dump-genenetwork-database/ Dump Genenetwork Database
-
-Next, drop into a development environment with:
-
-```
-$ guix shell -m manifest.scm
-```
-
-Build the sources:
-
-```
-$ make
-```
+See also
-Describe the database connection parameters in a file *conn.scm* file as shown below. Take care to replace the placeholders within angle brackets with the appropriate values.
+=> ../RDF/genenetwork-sql-database-to-rdf.gmi
-```
-((sql-username . "root")
- (sql-password . "root")
- (sql-database . "db_webqtl_s")
- (sql-host . "localhost")
- (sql-port . 3306)
- (virtuoso-port . 8891)
- (virtuoso-username . "dba")
- (virtuoso-password . "dba")
- (sparql-scheme . http)
- (sparql-host . "localhost")
- (sparql-port . 8892))
-```
-
-Then, to dump the database to \~/data/dump, run:
-
-```
-$ ./pre-inst-env ./dump.scm conn.scm ~/data/dump
-```
-
-Make sure there is enough free space! It\'s best to dump the database on penguin2 where disk space and bandwidth are not significant constraints.
+To dump data into a ttl file, first make sure that you are in the guix environment in the "dump-genenetwork-database" repository
-Then, validate the dumped RDF using `rapper` and load it into virtuoso. This will load the dumped RDF into the `http://genenetwork.org` graph, and will delete all pre-existing data in that graph.
+=> https://github.com/genenetwork/dump-genenetwork-database/ Dump Genenetwork Database
-```
-$ rapper --input turtle --count ~/data/dump/dump.ttl
-$ ./pre-inst-env ./load-rdf.scm conn.scm ~/data/dump/dump.ttl
-```
+See the README for instructions.