# Set-up Virtuoso+Xapian on Production ## Tags * assigned: bonfacem, zachs, fredm * priority: high * type: ops * keywords: virtuoso ## Description We already have virtuoso set-up in tux02. Right now, to be able to interact with RDF, we need to have virtuoso set-up. This issue will unblock: * Global Search in Production => https://github.com/genenetwork/genenetwork3/pull/137 Update RDF endpoints => https://github.com/genenetwork/genenetwork2/pull/808 UI/RDF frontend ## HOWTO: Updating Virtuoso in Production (Tux01) Note where the virtuoso data directory is mapped from the "production.sh" script as you will use this in the consequent steps: > --share=/export2/guix-containers/genenetwork/var/lib/virtuoso=/var/lib/virtuoso ### Generating the TTL Files => https://git.genenetwork.org/gn-transform-databases/tree/generate-ttl-files.scm Run "generate-ttl-files" to generate the TTL files: ``` time guix shell guile-dbi -m manifest.scm -- \ ./generate-ttl-files.scm --settings conn-dev.scm --output \ /export2/guix-containers/genenetwork-development/var/lib/virtuoso \ --documentation /tmp/doc-directory ``` * [Recommended] Alternatively, copy over the TTL files (in Tux01) to the correct shared directory in the container: ``` cp /home/bonfacem/ttl-files/*ttl /export2/guix-containers/genenetwork/var/lib/virtuoso/ ``` ### Loading the TTL Files * Make sure that the virtuoso service type has the "dirs-allowed" variable set correctly: ``` (service virtuoso-service-type (virtuoso-configuration (server-port 7892) (http-server-port 7893) (dirs-allowed "/var/lib/virtuoso"))) ``` * Get into isql: ``` guix shell virtuoso-ose -- isql 7892 ``` * Make sure that no pre-existing TTL files exist in "DB.DBA.LOAD_LIST": ``` SQL> select * from DB.DBA.LOAD_LIST; SQL> delete from DB.DBA.load_list; ``` * Delete the genenetwork graph: ``` SQL> DELETE FROM rdf_quad WHERE g = iri_to_id('http://genenetwork.org'); ``` * Load all the TTL files (This takes some time): ``` SQL> ld_dir('/var/lib/virtuoso', '*.ttl', 'http://genenetwork.org'); SQL> rdf_loader_run(); SQL> CHECKPOINT; SQL> checkpoint_interval(60); SQL> scheduler_interval(10); ``` * Verify you have some RDF data by running: ``` SQL> SPARQL PREFIX gn: PREFIX gnc: PREFIX owl: PREFIX gnt: PREFIX skos: PREFIX rdf: PREFIX rdfs: PREFIX taxon: SELECT * WHERE { ?s skos:member gn:Mus_musculus . ?s ?p ?o . }; ``` * Update GN3 Configurations to point to the correct Virtuoso instance: > SPARQL_ENDPOINT="http://localhost:7893/sparql" ## HOWTO: Generating the Xapian Index * Make sure you are using the correct guix profile or that you have the "PYTHONPATH" pointing to the GN3 repository. * Generate the Xapian Index using "genenetwork3/scripts/create-xapian-index" against the correct output directory (The build takes around 71 minutes on an SSD Drive): ``` time python index-genenetwork create-xapian-index \ /export/data/genenetwork-xapian/ \ mysql://:@localhost/db_webqtl \ http://localhost:7893/sparql ``` * After the build, you can verify that the index works by: ``` guix shell xapian -- xapian-delve /export/data/genenetwork-xapian/ ``` * Update GN3 configuration files to point to the right Xapian path: > XAPIAN_DB_PATH="/export/data/genenetwork-xapian/" ## Resolution @fredm updated virtuoso; and @zachs updated the xapian index in production. * closed