author Munyoki Kilyungi 2023-07-21 16:28:32 +0300
committer Munyoki Kilyungi 2023-07-21 16:28:32 +0300
commit b823fb72c6eeff134db9ca136fe43de2d0c9d534
tree 54e2e3564d6c356a1e02e4dfb5a0b87adf4950e9
parent 87016ee055fb5151197969b755302d24ee41e80c
Document how to bulk load data
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
-rw-r--r-- topics/systems/virtuoso.gmi 68
1 files changed, 68 insertions, 0 deletions
diff --git a/topics/systems/virtuoso.gmi b/topics/systems/virtuoso.gmi
index b85ab86..548481b 100644
--- a/topics/systems/virtuoso.gmi
+++ b/topics/systems/virtuoso.gmi
@@ -170,6 +170,74 @@ guix shell -N virtuoso-ose -m manifest.scm -- ./pre-inst-env ./load-rdf.scm conn
=> https://github.com/genenetwork/dump-genenetwork-database/blob/master/conn.scm Example conn.scm
+### Bulk Loading Data
+
+Virtuoso has access to the folder /export/data/genenetwork-virtuoso/, so place all the Turtle files for a bulk upload there. To bulk load data:
+
+First, delete any data already in the target graph:
+
+```
+$ isql
+SQL> DELETE FROM rdf_quad WHERE g = iri_to_id('http://genenetwork.org');
+```
+
+Use isql to register all the Turtle files:
+
+```
+SQL> ld_dir('/var/lib/data', '*.ttl', 'http://genenetwork.org');
+```
+
+
+Query the table DB.DBA.load_list to see the datasets registered for loading:
+
+```
+SQL> SELECT * FROM DB.DBA.load_list;
+```
+
+In case you want to empty the list:
+
+```
+SQL> DELETE FROM DB.DBA.load_list WHERE ll_file LIKE '%.ttl';
+```
+
+Perform the bulk load of all data by running:
+
+```
+SQL> rdf_loader_run();
+```
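+
+The loader records per-file progress in the same DB.DBA.load_list table. As a rough sketch, assuming the standard ll_state and ll_error columns described in the bulk loader documentation listed under "Other resources" (ll_state is 0 for pending, 1 for in progress and 2 for done), a query along these lines may help spot files that failed to load:
+
+```
+-- Assumption: load_list has the standard ll_file, ll_state and ll_error columns.
+-- Rows with a non-NULL ll_error are files the loader could not load.
+SQL> SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list WHERE ll_error IS NOT NULL;
+```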
+
+Commit the bulk loaded data to the Virtuoso database file by running:
+
+```
+SQL> checkpoint;
+```
+
+Run a query to confirm that the data has been loaded. For example:
+
+```
+SPARQL
+PREFIX gn: <http://genenetwork.org/id/>
+
+SELECT * FROM <http://genenetwork.org> WHERE {
+gn:Mus_musculus ?p ?o.
+};
+```
+
+In case you want to get a list of all graphs:
+
+```
+SPARQL
+SELECT DISTINCT ?g
+ WHERE { GRAPH ?g {?s ?p ?o} }
+ORDER BY ?g;
+```
+
+Other resources:
+
+=> https://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader Bulk Loading RDF Source Files into one or more Graph IRIs
+
+=> https://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoaderExampleSingle VOS.VirtBulkRDFLoaderExampleSingle
+
## Dumping to RDF from the GeneNetwork MySQL database
See also