# Set-up Virtuoso+Xapian on Production
## Tags
* assigned: bonfacem, zachs, fredm
* priority: high
* type: ops
* keywords: virtuoso
## Description
Virtuoso is already set up on tux02. To interact with RDF in production, Virtuoso must be set up there too. This issue will unblock:
* Global Search in Production
=> https://github.com/genenetwork/genenetwork3/pull/137 Update RDF endpoints
=> https://github.com/genenetwork/genenetwork2/pull/808 UI/RDF frontend
## HOWTO: Updating Virtuoso in Production (Tux01)
Note where the Virtuoso data directory is mapped in the "production.sh" script, as you will use this path in the subsequent steps:
> --share=/export2/guix-containers/genenetwork/var/lib/virtuoso=/var/lib/virtuoso
### Generating the TTL Files
=> https://git.genenetwork.org/gn-transform-databases/tree/generate-ttl-files.scm Run "generate-ttl-files" to generate the TTL files:
```
time guix shell guile-dbi -m manifest.scm -- \
    ./generate-ttl-files.scm --settings conn-dev.scm --output \
    /export2/guix-containers/genenetwork-development/var/lib/virtuoso \
    --documentation /tmp/doc-directory
```
* [Recommended] Alternatively, copy the existing TTL files on Tux01 to the shared directory that is mapped into the container:
```
cp /home/bonfacem/ttl-files/*ttl /export2/guix-containers/genenetwork/var/lib/virtuoso/
```
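Before loading, it can help to confirm that the copy landed where Virtuoso will look. The following is a minimal Python sketch, not part of the GN tooling: the helper name "list_ttl_files" is hypothetical, and the path is the host side of the share noted from "production.sh".

```python
from pathlib import Path


def list_ttl_files(directory):
    """Return (name, size_in_bytes) for every *.ttl file in directory, sorted by name."""
    return sorted((p.name, p.stat().st_size) for p in Path(directory).glob("*.ttl"))


if __name__ == "__main__":
    # Host-side path of the directory shared into the container (from production.sh).
    shared = "/export2/guix-containers/genenetwork/var/lib/virtuoso"
    for name, size in list_ttl_files(shared):
        # A zero-byte file usually means the copy or generation step went wrong.
        flag = "  <-- empty" if size == 0 else ""
        print(f"{size:>12}  {name}{flag}")
```

An empty listing, or any file flagged as empty, is worth investigating before running the loader.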
### Loading the TTL Files
* Make sure that the virtuoso service type has the "dirs-allowed" variable set correctly:
```
(service virtuoso-service-type
         (virtuoso-configuration
          (server-port 7892)
          (http-server-port 7893)
          (dirs-allowed "/var/lib/virtuoso")))
```
* Get into isql:
```
guix shell virtuoso-ose -- isql 7892
```
* Make sure that no pre-existing TTL files exist in "DB.DBA.LOAD_LIST":
```
SQL> select * from DB.DBA.LOAD_LIST;
SQL> delete from DB.DBA.load_list;
```
* Delete the genenetwork graph:
```
SQL> DELETE FROM rdf_quad WHERE g = iri_to_id('http://genenetwork.org');
```
* Load all the TTL files (This takes some time):
```
SQL> ld_dir('/var/lib/virtuoso', '*.ttl', 'http://genenetwork.org');
SQL> rdf_loader_run();
SQL> CHECKPOINT;
SQL> checkpoint_interval(60);
SQL> scheduler_interval(10);
```
* Verify you have some RDF data by running:
```
SQL> SPARQL
PREFIX gn: <http://genenetwork.org/id/>
PREFIX gnc: <http://genenetwork.org/category/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX gnt: <http://genenetwork.org/term/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT * WHERE {
    ?s skos:member gn:Mus_musculus .
    ?s ?p ?o .
};
```
* Update the GN3 configuration to point to the correct Virtuoso instance:
> SPARQL_ENDPOINT="http://localhost:7893/sparql"
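The same sanity check can be run over HTTP against the endpoint that GN3 will use, which also confirms that the HTTP port is reachable. This is a minimal sketch using only the Python standard library. The helper names "build_request" and "first_count" are hypothetical, and the COUNT query is a simplified variant of the verification query above, not something GN3 itself runs.

```python
import json
import urllib.parse
import urllib.request

# Simplified variant of the isql verification query: count the matches only.
QUERY = """
PREFIX gn: <http://genenetwork.org/id/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT (COUNT(*) AS ?n) WHERE { ?s skos:member gn:Mus_musculus . }
"""


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def first_count(results):
    """Extract the ?n binding from a SPARQL JSON results document."""
    return int(results["results"]["bindings"][0]["n"]["value"])


if __name__ == "__main__":
    # Same value as SPARQL_ENDPOINT in the GN3 configuration.
    endpoint = "http://localhost:7893/sparql"
    with urllib.request.urlopen(build_request(endpoint, QUERY), timeout=30) as resp:
        print(first_count(json.load(resp)))
```

A count of zero suggests the graph was not loaded, or was loaded under a different graph IRI.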
## HOWTO: Generating the Xapian Index
* Make sure you are using the correct guix profile or that you have the "PYTHONPATH" pointing to the GN3 repository.
* Generate the Xapian index with the "create-xapian-index" command of the "index-genenetwork" script (in "genenetwork3/scripts/"), writing to the correct output directory (the build takes around 71 minutes on an SSD):
```
time python index-genenetwork create-xapian-index \
    /export/data/genenetwork-xapian/ \
    mysql://<user>:<password>@localhost/db_webqtl \
    http://localhost:7893/sparql
```
* After the build, verify that the index works by inspecting it with "xapian-delve":
```
guix shell xapian -- xapian-delve /export/data/genenetwork-xapian/
```
* Update GN3 configuration files to point to the right Xapian path:
> XAPIAN_DB_PATH="/export/data/genenetwork-xapian/"
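A typo in "XAPIAN_DB_PATH" is easy to miss, so a quick pre-flight check on the configured path may be useful. This is a minimal sketch, assuming the index is a single on-disk Xapian database directory. It relies on the marker file that Xapian's backends write into the database directory ("iamglass" for the glass backend, "iamchert" for the older chert backend). The helper name "looks_like_xapian_db" is hypothetical.

```python
from pathlib import Path

# Marker files written by Xapian's on-disk backends (glass, and the older chert).
BACKEND_MARKERS = ("iamglass", "iamchert")


def looks_like_xapian_db(path):
    """Return True if path is a directory containing a Xapian backend marker file."""
    p = Path(path)
    return p.is_dir() and any((p / marker).is_file() for marker in BACKEND_MARKERS)


if __name__ == "__main__":
    # Same value as XAPIAN_DB_PATH in the GN3 configuration.
    db_path = "/export/data/genenetwork-xapian/"
    print("ok" if looks_like_xapian_db(db_path) else f"no Xapian database at {db_path}")
```

This only checks that a database is present at the path; "xapian-delve" remains the way to inspect its contents.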
## Resolution
@fredm updated Virtuoso, and @zachs updated the Xapian index in production.
* closed