# Virtuoso

We run instances of Virtuoso for our graph databases. Virtuoso is remarkable software that powers some really large databases, including UniProt.

=> https://github.com/openlink/virtuoso-opensource
=> https://www.uniprot.org/sparql/

On Penguin2, virtuoso runs by default as a Shepherd service; see

=> ./shepherd.gmi

```
guix shell virtuoso-ose -- virtuoso-t -f
```

=> https://git.genenetwork.org/efraim/shepherd-services/src/branch/master/shepherd/init.d/virtuoso.scm penguin2:/home/shepherd/shepherd-services/shepherd/init.d/virtuoso.scm

The database is initialized from 'penguin2:/export/virtuoso/var/lib/virtuoso/db/virtuoso.ini'

## Virtuoso users and setting their passwords

The most important virtuoso user is the `dba` user. The default password of the `dba` user is `dba`. You can change passwords using the isql command-line client.
=> http://docs.openlinksw.com/virtuoso/defpasschange/ Virtuoso users and how to set their passwords
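Here is a minimal sketch of changing a password non-interactively. It pipes one SQL statement into isql from Python. It assumes the following:

* The client is on PATH as `isql`. Some distributions name it `isql-vt`.
* Virtuoso's `USER_SET_PASSWORD` built-in is available.
* You pass the server's SQL port: 1111 by default, or 21111 for the command-line instance described below.

The helper names are hypothetical.

```python
import subprocess


def sql_quote(value: str) -> str:
    """Return value as a single-quoted SQL string literal (quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def isql_argv(port: int, user: str, password: str) -> list:
    """Command line for Virtuoso's isql client: isql <port> <user> <password>.

    Assumption: the binary is called `isql` and is on PATH.
    """
    return ["isql", str(port), user, password]


def set_password_sql(user: str, new_password: str) -> str:
    """SQL that sets `user`'s password via Virtuoso's USER_SET_PASSWORD."""
    return f"USER_SET_PASSWORD({sql_quote(user)}, {sql_quote(new_password)});\n"


def change_password(port: int, dba_password: str, user: str, new_password: str) -> None:
    """Log in as dba and set `user`'s password."""
    subprocess.run(
        isql_argv(port, "dba", dba_password),
        input=set_password_sql(user, new_password),  # isql reads statements from stdin
        text=True,
        check=True,  # raise if isql exits non-zero
    )
```

For example, `change_password(1111, "dba", "dba", "s3cret")` replaces the default dba password. The dba password is visible in the process list while isql runs, because it is passed on the command line.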

## Running virtuoso
### Running virtuoso in a guix system container

We have a Guix virtuoso service in the guix-bioinformatics channel. The easiest way to run virtuoso is to use this service in a guix system container. The only downside is that guix system containers require root privileges to start, so you need root privileges on the machine you run it on.

Here is a basic guix system configuration that runs virtuoso with its main server (the port isql connects to) listening on port 8891 and its HTTP server listening on port 8892. Among other things, the HTTP server provides a SPARQL endpoint to interact with.
```
(use-modules (gnu)
             (gn services databases))

(operating-system
  (host-name "virtuoso")
  (timezone "UTC")
  (locale "en_US.utf8")
  (bootloader (bootloader-configuration
               (bootloader grub-bootloader)
               (targets (list "/dev/sdX"))))
  (file-systems (cons (file-system
                        (device "root")
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))
  (users %base-user-accounts)
  (packages %base-packages)
  (services (cons (service virtuoso-service-type
                           (virtuoso-configuration
                            (server-port 8891)
                            (http-server-port 8892)))
                  %base-services)))
```

You can write the above configuration to a file, say virtuoso-os.scm, build a container with it, and run it with the command below. Everything inside the container is ephemeral and vanishes when the container is stopped. In order to persist the database, we mount a host directory /tmp/virtuoso-state at /var/lib/virtuoso in the container. /var/lib/virtuoso is the default state directory used by the Guix virtuoso service.
```
sudo $(guix system container --network --share=/tmp/virtuoso-state=/var/lib/virtuoso virtuoso-os.scm)
```

### Running virtuoso by invoking it on the command line

You may also run virtuoso the traditional way, by invoking it on the command line. Long-running instances started from the command line are messy to manage, so this method works best for temporary instances. Start by copying the virtuoso.ini file:

```
mkdir -p ~/services/virtuoso
cd ~/services/virtuoso
cp /export/virtuoso/var/lib/virtuoso/db/virtuoso.ini .
```

Then edit it to change the paths and ports, and use non-privileged ports (1024 and above)! The full diff is in the Virtuoso.ini section below. Start the server in a screen or tmux session (it may ask you to create ./db):

```
penguin2:~/services/virtuoso$ ~/.config/guix/current/bin/guix shell virtuoso-ose glibc-locales
penguin2:~/services/virtuoso [env]$ /gnu/store/9aqd4jmkafhkdm095hnmxpxzws3ym3wd-virtuoso-ose-7.2.5/bin/virtuoso-t +foreground +configfile virtuoso.ini
03:34:50 HTTP/WebDAV server online at 28890
03:34:50 Server online at 21111 (pid 57078)
```

Now the server should respond to
```
curl localhost:28890/sparql
```

and the admin interface on
```
curl localhost:28890/conductor
```
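The same endpoint can also be queried from a program. The sketch below follows the SPARQL 1.1 Protocol. It sends a GET request with the query in the `query` parameter and asks for JSON results through the Accept header. The endpoint URL matches the example instance above. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:28890/sparql"  # HTTP port of the example instance


def query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def select(endpoint: str, query: str) -> list:
    """Run a SELECT query and return its rows as {variable: value} dicts."""
    with urllib.request.urlopen(query_request(endpoint, query)) as response:
        results = json.load(response)
    return [
        {var: binding["value"] for var, binding in row.items()}
        for row in results["results"]["bindings"]
    ]
```

For example, `select(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")` returns up to five triples. Through the ssh tunnel described next, the same URL works from a local machine.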

To use the service from your local machine, set up an ssh tunnel:
```
ssh -L 28890:127.0.0.1:28890 -f -N myname@penguin2.genenetwork.org
```
and surf to http://localhost:28890/conductor. This is a good time to change the default password (dba:dba)!

## Loading data into virtuoso

Virtuoso supports at least three different ways to load RDF.

### Bulk loading using the isql command-line client

=> http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader Bulk loading using the isql command-line client
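Bulk loading using the isql command-line client is usually the fastest method. However, the server reads the files itself. They must therefore be on the server's file system, readable by the Virtuoso process, and in a directory listed under DirsAllowed in virtuoso.ini. For the same reason, this method cannot load files from a remote machine.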
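The linked page describes the loader as a short sequence of isql statements:

* `ld_dir` registers the files.
* `rdf_loader_run` loads them.
* `checkpoint` makes the result durable.

The Python sketch below only generates that script; you still feed it to isql yourself. The directory, pattern and graph in the usage note are hypothetical.

```python
def sql_quote(value: str) -> str:
    """Return value as a single-quoted SQL string literal (quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def bulk_load_script(directory: str, pattern: str, graph: str) -> str:
    """isql script for Virtuoso's RDF bulk loader.

    ld_dir registers every file in `directory` matching `pattern` for loading
    into `graph`; rdf_loader_run loads the registered files; checkpoint
    persists the result. `directory` is read by the server, so it must exist
    on the server and be listed in DirsAllowed.
    """
    statements = [
        f"ld_dir({sql_quote(directory)}, {sql_quote(pattern)}, {sql_quote(graph)});",
        "rdf_loader_run();",
        "checkpoint;",
    ]
    return "\n".join(statements) + "\n"
```

For example, write `bulk_load_script("/export/data/rdf", "*.ttl", "https://example.org/graph")` to load.sql. Then run `isql 1111 dba <password> < load.sql` on the server.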

### SPARQL 1.1 Update

The standard SPARQL 1.1 protocol can also update RDF, using SPARQL Update operations such as INSERT DATA and DELETE DATA.
=> https://www.w3.org/TR/sparql11-update/ SPARQL 1.1 Update
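Here is a minimal sketch of sending an update over HTTP. It follows the SPARQL 1.1 Protocol: a form-encoded POST with an `update` parameter. It assumes Virtuoso's authenticated endpoint `/sparql-auth` with HTTP digest authentication, because the anonymous `/sparql` endpoint normally cannot write. The port, credentials and graph name are placeholders.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:28890/sparql-auth"  # assumed authenticated endpoint


def update_request(endpoint: str, update: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 update-via-POST request (form-encoded body)."""
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,  # a request body makes urllib send POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def run_update(endpoint: str, user: str, password: str, update: str) -> int:
    """Send the update with HTTP digest authentication; return the HTTP status."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, endpoint, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(update_request(endpoint, update)) as response:
        return response.status
```

For example, this call adds one triple to a hypothetical graph:

`run_update(ENDPOINT, "dba", "password", 'INSERT DATA { GRAPH <https://example.org/graph> { <https://example.org/s> <https://example.org/p> "o" } }')`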

### SPARQL 1.1 Graph Store HTTP Protocol

For ease of implementation, SPARQL 1.1 also specifies an additional REST-like API to update data.
=> https://www.w3.org/TR/sparql11-http-rdf-update/ SPARQL 1.1 Graph Store HTTP Protocol
The virtuoso documentation shows examples of using this protocol with cURL.
=> http://vos.openlinksw.com/owiki/wiki/VOS/VirtGraphProtocolCURLExamples Virtuoso SPARQL 1.1 Graph Store HTTP Protocol examples using cURL
We recap them here. First, delete the existing graph with something like
```
curl --digest --user dba:password --verbose -G --url http://localhost:28890/sparql-graph-crud-auth --data-urlencode graph=https://BioHackrXiv.org/graph -X DELETE
```
Next update the graph with
```
curl -v -X PUT --digest -u dba:password -H Content-Type:text/turtle -T test/data/biohackrxiv.ttl -G http://localhost:28890/sparql-graph-crud-auth --data-urlencode graph=https://BioHackrXiv.org/graph
```
where https://BioHackrXiv.org/graph is the name of the graph in this example. A Python version can be found in
=> https://github.com/pubseq/bh20-seq-resource/blob/master/scripts/update_virtuoso/check_for_updates.py
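For comparison, here is a minimal standard-library sketch of the two curl calls above. It is not the linked script. It assumes the same endpoint, digest authentication and `graph` query parameter as the curl examples. The credentials and file name in the usage note are placeholders.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:28890/sparql-graph-crud-auth"


def graph_url(endpoint: str, graph: str) -> str:
    """Endpoint URL that names one graph through the `graph` query parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"graph": graph})


def delete_request(endpoint: str, graph: str) -> urllib.request.Request:
    """DELETE removes the named graph, like the first curl command."""
    return urllib.request.Request(graph_url(endpoint, graph), method="DELETE")


def put_request(endpoint: str, graph: str, turtle: bytes) -> urllib.request.Request:
    """PUT replaces the graph's contents with the Turtle payload."""
    return urllib.request.Request(
        graph_url(endpoint, graph),
        data=turtle,
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )


def send(request: urllib.request.Request, user: str, password: str) -> int:
    """Send a request with HTTP digest authentication; return the HTTP status.

    urllib raises HTTPError for error statuses, e.g. 404 when deleting a
    graph that does not exist.
    """
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(request) as response:
        return response.status
```

For example, call `send(delete_request(ENDPOINT, "https://BioHackrXiv.org/graph"), "dba", "password")` to delete the graph. Then pass `put_request(...)` to `send` with the bytes of test/data/biohackrxiv.ttl to upload it again.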

## Validate data using rapper

TODO
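Here is a minimal sketch, assuming rapper (from the Raptor RDF library) is on PATH. rapper parses a file and reports problems. With `-i turtle` it reads Turtle, and with `-c` it only counts triples instead of printing them. The wrapper below assumes that rapper exits non-zero when parsing fails. The function names are hypothetical.

```python
import subprocess


def rapper_argv(path: str, syntax: str = "turtle") -> list:
    """rapper command that parses `path` and only counts its triples.

    -i selects the input syntax; -c counts triples instead of printing them.
    """
    return ["rapper", "-i", syntax, "-c", path]


def validate(path: str, syntax: str = "turtle") -> tuple:
    """Return (ok, messages) for one RDF file.

    Assumption: rapper exits non-zero when parsing fails. Its output
    (triple count and any errors) is returned as one string.
    """
    result = subprocess.run(rapper_argv(path, syntax), capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

Running `validate("test/data/biohackrxiv.ttl")` before the PUT in the previous section should catch syntax errors before they reach the server.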

## Virtuoso.ini

TODO: Elaborate.

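Here is what changed in $HOME/services/virtuoso/virtuoso.ini for the command-line instance above. Some keys appear twice because they belong to different sections:

* DatabaseFile and TransactionFile appear in both [Database] and [TempDatabase].
* ServerPort appears in both [Parameters] (the SQL port, here 21111) and [HTTPServer] (the HTTP port, here 28890).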

```
+DatabaseFile                   = $HOME/services/virtuoso/db/virtuoso.db
+ErrorLogFile                   = $HOME/services/virtuoso/db/virtuoso.log
+LockFile                       = $HOME/services/virtuoso/db/virtuoso.lck
+TransactionFile                = $HOME/services/virtuoso/db/virtuoso.trx
+xa_persistent_file             = $HOME/services/virtuoso/db/virtuoso.pxa
+DatabaseFile                   = $HOME/services/virtuoso/db/virtuoso-temp.db
+TransactionFile                = $HOME/services/virtuoso/db/virtuoso-temp.trx
-ServerPort                     = 1111
+ServerPort                     = 21111
+NumberOfBuffers                = 340000
+MaxDirtyBuffers                = 250000
+ServerPort                     = 28890
+ServerRoot                     = $HOME/services/virtuoso/vsp
```
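These edits can also be scripted. The sketch below uses Python's configparser to apply the path and port changes from the diff. It sets each key in the section it belongs to: [Database], [TempDatabase], [Parameters] or [HTTPServer]. It is an illustration, not how the file above was produced. In particular, configparser drops the comments of the original file when it writes, and `base` must be an absolute path.

```python
import configparser
import io


def relocate(ini_text: str, base: str, sql_port: int, http_port: int) -> str:
    """Return virtuoso.ini text with database files under `base`/db and new ports."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep Virtuoso's key spelling, e.g. DatabaseFile
    parser.read_string(ini_text)
    db = f"{base}/db"
    changes = {
        "Database": {
            "DatabaseFile": f"{db}/virtuoso.db",
            "ErrorLogFile": f"{db}/virtuoso.log",
            "LockFile": f"{db}/virtuoso.lck",
            "TransactionFile": f"{db}/virtuoso.trx",
            "xa_persistent_file": f"{db}/virtuoso.pxa",
        },
        "TempDatabase": {
            "DatabaseFile": f"{db}/virtuoso-temp.db",
            "TransactionFile": f"{db}/virtuoso-temp.trx",
        },
        "Parameters": {
            "ServerPort": str(sql_port),
            # Buffer sizes copied from the diff; size them to the machine's RAM.
            "NumberOfBuffers": "340000",
            "MaxDirtyBuffers": "250000",
        },
        "HTTPServer": {
            "ServerPort": str(http_port),
            "ServerRoot": f"{base}/vsp",
        },
    }
    for section, values in changes.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in values.items():
            parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

For the instance above, pass the text of the copied virtuoso.ini, `str(Path.home() / "services/virtuoso")`, 21111 and 28890, and write the result back to virtuoso.ini.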