# Virtuoso

We run instances of Virtuoso for our graph databases. Virtuoso is a remarkable piece of software that runs some really large databases, including UniProt.

=> https://github.com/openlink/virtuoso-opensource
=> https://www.uniprot.org/sparql/

On Penguin2, virtuoso runs by default as a shepherd service, see

=> ./shepherd.gmi

```
guix environment --ad-hoc virtuoso-ose -- virtuoso-t -f
```

=> https://git.genenetwork.org/efraim/shepherd-services/src/branch/master/shepherd/init.d/virtuoso.scm penguin2:/home/shepherd/shepherd-services/shepherd/init.d/virtuoso.scm

The database is initialized from 'penguin2:/export/virtuoso/var/lib/virtuoso/db/virtuoso.ini'

### Running virtuoso in a guix system container

We have a Guix virtuoso service in the guix-bioinformatics channel. The easiest way to run virtuoso is to use that service in a guix system container. The only downside is that guix system containers require root privileges to start up, so you will need root privileges on the machine you run this on.

Here is a basic guix system configuration that runs virtuoso listening on port 8891, and with its HTTP server listening on port 8892. Among other things, the HTTP server provides a SPARQL endpoint to interact with.
```
(use-modules (gnu)
             (gn services databases))

(operating-system
  (host-name "virtuoso")
  (timezone "UTC")
  (locale "en_US.utf8")
  (bootloader (bootloader-configuration
               (bootloader grub-bootloader)
               (targets (list "/dev/sdX"))))
  (file-systems (cons (file-system
                        (device "root")
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))
  (users %base-user-accounts)
  (packages %base-packages)
  (services (cons (service virtuoso-service-type
                           (virtuoso-configuration
                            (server-port 8891)
                            (http-server-port 8892)))
                  %base-services)))
```

You can write the above configuration to a file, say virtuoso-os.scm, build a container with it, and run it with the command below. Everything inside the container is ephemeral and vanishes when the container is stopped. In order to persist the database, we mount a host directory /tmp/virtuoso-state at /var/lib/virtuoso in the container. /var/lib/virtuoso is the default state directory used by the Guix virtuoso service.
```
sudo $(guix system container --network --share=/tmp/virtuoso-state=/var/lib/virtuoso virtuoso-os.scm)
```
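
Once the container is up, a quick sanity check is to probe both ports from the host. The following is a minimal Python sketch, assuming the container shares the host network (as with --network above) so that ports 8891 and 8892 show up on localhost. The helper name port_open is made up for this example and is not part of virtuoso or Guix.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


# Example use against the ports from the configuration above:
# for port in (8891, 8892):
#     print(port, "open" if port_open("127.0.0.1", port) else "closed")
```

A port that accepts connections only shows that something is listening, not that the database has finished starting up.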

### Running virtuoso by invoking it on the command line

You may also choose to run virtuoso the traditional way by invoking it on the command line. Managing long-running instances started from the command line is messy. So, this method works best for temporary instances. Let's start from the virtuoso.ini file:

```
mkdir -p ~/services/virtuoso
cd ~/services/virtuoso
cp /export/virtuoso/var/lib/virtuoso/db/virtuoso.ini .
```

and edit it to change paths and ports - use non-privileged ports(!). A full diff is below. Start the server in a screen or tmux session (it may ask to create ./db):

```
penguin2:~/services/virtuoso$ ~/.config/guix/current/bin/guix environment --ad-hoc virtuoso-ose glibc-locales

penguin2:~/services/virtuoso$ /gnu/store/9aqd4jmkafhkdm095hnmxpxzws3ym3wd-virtuoso-ose-7.2.5/bin/virtuoso-t +foreground +configfile virtuoso.ini

03:34:50 HTTP/WebDAV server online at 28890
03:34:50 Server online at 21111 (pid 57078)
```
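
If you set up such temporary instances often, the hand edit of virtuoso.ini from the step above could be scripted. The sketch below uses Python's configparser and assumes the section names of a stock virtuoso.ini ([Database], [TempDatabase], [Parameters], [HTTPServer]). Check them against your copy. The function name apply_overrides and the OVERRIDES table are invented for this example, and the values mirror the diff in the Virtuoso.ini section.

```python
import configparser
import io

# Assumed section layout of a stock virtuoso.ini; values follow this page.
OVERRIDES = {
    "Database": {
        "DatabaseFile": "{root}/db/virtuoso.db",
        "ErrorLogFile": "{root}/db/virtuoso.log",
        "LockFile": "{root}/db/virtuoso.lck",
        "TransactionFile": "{root}/db/virtuoso.trx",
        "xa_persistent_file": "{root}/db/virtuoso.pxa",
    },
    "TempDatabase": {
        "DatabaseFile": "{root}/db/virtuoso-temp.db",
        "TransactionFile": "{root}/db/virtuoso-temp.trx",
    },
    "Parameters": {
        "ServerPort": "21111",
        "NumberOfBuffers": "340000",
        "MaxDirtyBuffers": "250000",
    },
    "HTTPServer": {
        "ServerPort": "28890",
        "ServerRoot": "{root}/vsp",
    },
}


def apply_overrides(ini_text, root):
    """Return ini_text with the paths and ports above set under root."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the original key capitalisation
    parser.read_string(ini_text)
    for section, values in OVERRIDES.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, template in values.items():
            parser.set(section, key, template.format(root=root))
    out = io.StringIO()
    parser.write(out)  # note: comments from the input are not preserved
    return out.getvalue()
```

Because configparser drops comments on output, write the result to a new file and keep the original around for reference.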

Now the server should respond to

```
curl localhost:28890/sparql
```

and the admin interface on

```
curl localhost:28890/conductor
```
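
A bare request to /sparql only shows that the endpoint is alive. To check that it also answers queries, something like the following Python sketch may help. It assumes the endpoint follows the standard SPARQL protocol (query passed as the query parameter, JSON results on request). The function names sparql_url and run_select are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def sparql_url(endpoint, query):
    """Return the endpoint URL with the query URL-encoded as ?query=..."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(endpoint, query, timeout=30):
    """Run a SELECT query and return the list of result bindings."""
    request = urllib.request.Request(
        sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Example use: list up to ten named graphs on the local instance.
# query = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10"
# for row in run_select("http://localhost:28890/sparql", query):
#     print(row["g"]["value"])
```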

To use the service from your remote machine use ssh tunnels:

```
ssh -L 28890:127.0.0.1:28890 -f -N myname@penguin2.genenetwork.org
```

and surf to http://localhost:28890/conductor. A good time to change the default password (dba:dba)!

### Uploading data with curl

Before uploading RDF, I validate the data with rapper. Then delete the existing graph with something like

```
curl -v --digest --user dba:password -X DELETE -G http://localhost:28890/sparql-graph-crud-auth --data-urlencode graph=https://BioHackrXiv.org/graph
```

Next update the graph with

```
curl -v -X PUT --digest -u dba:password -H Content-Type:text/turtle -T test/data/biohackrxiv.ttl -G http://localhost:28890/sparql-graph-crud-auth --data-urlencode graph=https://BioHackrXiv.org/graph
```

Here https://BioHackrXiv.org/graph is the name of the graph (in this example). A Python version can be found in

=> https://github.com/pubseq/bh20-seq-resource/blob/master/scripts/update_virtuoso/check_for_updates.py
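
For orientation, here is a minimal standard-library sketch of the same validate, delete and upload sequence. It is not the linked script. It assumes rapper is on PATH and that the endpoint uses HTTP digest authentication as in the curl commands above. All function names are invented for this example.

```python
import subprocess
import urllib.error
import urllib.parse
import urllib.request


def rapper_command(path):
    """Command that parses a Turtle file and only counts its triples."""
    return ["rapper", "-i", "turtle", "-c", path]


def graph_crud_url(endpoint, graph):
    """Return the graph CRUD URL with the graph name URL-encoded."""
    return endpoint + "?" + urllib.parse.urlencode({"graph": graph})


def digest_opener(endpoint, user, password):
    """Build a urllib opener that answers HTTP digest challenges."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, endpoint, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(passwords)
    return urllib.request.build_opener(handler)


def replace_graph(endpoint, graph, turtle_path, user, password):
    """Validate the file, delete the old graph, then upload the new one."""
    subprocess.run(rapper_command(turtle_path), check=True)
    opener = digest_opener(endpoint, user, password)
    url = graph_crud_url(endpoint, graph)
    try:
        opener.open(urllib.request.Request(url, method="DELETE")).close()
    except urllib.error.HTTPError as error:
        if error.code != 404:  # a missing graph is fine on first upload
            raise
    with open(turtle_path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "text/turtle"},
    )
    with opener.open(request) as response:
        return response.status


# Example use with the values from the curl commands above:
# replace_graph("http://localhost:28890/sparql-graph-crud-auth",
#               "https://BioHackrXiv.org/graph",
#               "test/data/biohackrxiv.ttl", "dba", "password")
```

Running rapper first means a malformed file stops the update before the existing graph is deleted.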

### Virtuoso.ini

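What changed in $HOME/services/virtuoso/virtuoso.ini. Keys that appear twice (DatabaseFile, TransactionFile, ServerPort) belong to different sections of the file: in a stock virtuoso.ini the first occurrence sits under [Database] or [Parameters], the second under [TempDatabase] or [HTTPServer].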

```
+DatabaseFile                   = $HOME/services/virtuoso/db/virtuoso.db
+ErrorLogFile                   = $HOME/services/virtuoso/db/virtuoso.log
+LockFile                       = $HOME/services/virtuoso/db/virtuoso.lck
+TransactionFile                = $HOME/services/virtuoso/db/virtuoso.trx
+xa_persistent_file             = $HOME/services/virtuoso/db/virtuoso.pxa
+DatabaseFile                   = $HOME/services/virtuoso/db/virtuoso-temp.db
+TransactionFile                = $HOME/services/virtuoso/db/virtuoso-temp.trx
-ServerPort                     = 1111
+ServerPort                     = 21111
+NumberOfBuffers                = 340000
+MaxDirtyBuffers                = 250000
+ServerPort                     = 28890
+ServerRoot                     = $HOME/services/virtuoso/vsp
```