
GENENETWORK SPARQL endpoint

SPARQL is the query language for our RDF database. This endpoint can export HTML, JSON and TSV(!).
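As a rough sketch of how these export formats could be requested from a script, the snippet below builds a GET URL that carries a query and a result format. The endpoint URL and the `format` parameter name (Virtuoso-style) are assumptions and are not taken from this page, so adjust both to the actual service.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; replace with the address of the actual service.
ENDPOINT = "https://sparql.genenetwork.org/sparql"

# MIME types for the three export formats mentioned above.
FORMATS = {
    "html": "text/html",
    "json": "application/sparql-results+json",
    "tsv": "text/tab-separated-values",
}


def build_query_url(query, fmt="json", endpoint=ENDPOINT):
    """Return a GET URL that runs `query` and asks for the given format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # `query` is the standard SPARQL protocol parameter. `format` is an
    # assumption (Virtuoso-style) and may be named differently elsewhere.
    params = {"query": query, "format": FORMATS[fmt]}
    return f"{endpoint}?{urlencode(params)}"


url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 5", "tsv")
print(url)
```

The resulting URL can be pasted into a browser or passed to any HTTP client.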

Note that we also created a REST API that mirrors these queries. See the REST API.

Example SPARQL queries:

Get species info

  • list_species() - List available species.
    PREFIX gn: <http://genenetwork.org/id/>
    PREFIX gnc: <http://genenetwork.org/category/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX gnt: <http://genenetwork.org/term/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
    
    SELECT DISTINCT * WHERE {
        ?s rdf:type gnc:species .
        ?s ?p ?o .
    }
    


Get 'group' or population info

  • list_groups("drosophila") - List available groups of datasets
        PREFIX gn: <http://genenetwork.org/id/>
        PREFIX gnc: <http://genenetwork.org/category/>
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX gnt: <http://genenetwork.org/term/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

        SELECT ?inbredSet WHERE {
            ?species rdf:type gnc:species .
            ?species skos:altLabel "drosophila" .
            ?inbredSet rdf:type gnc:inbredSet .
            ?inbredSet gnt:belongsToSpecies ?species .
        }

List all sets with species and description:

       SELECT DISTINCT ?set ?species ?descr WHERE {
            ?set rdf:type gnc:inbredSet ;
                 gnt:belongsToSpecies ?species .
            OPTIONAL {?set rdfs:label ?descr } .
        }

And list all 50+ sets for Mouse:

       SELECT DISTINCT * WHERE {
            ?inbredSet rdf:type gnc:inbredSet ;
                       gnt:belongsToSpecies gn:Mus_musculus .
            OPTIONAL {?inbredSet rdfs:label ?descr }.
        }
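The shorter queries on this page leave out the PREFIX block shown in the first examples. Whether the endpoint predefines those prefixes is not stated here; if it does not, a small helper along these lines can prepend them before a query is sent. The function name `with_prefixes` is made up for this sketch.

```python
# Prefix declarations copied from the first examples on this page.
PREFIXES = {
    "gn": "http://genenetwork.org/id/",
    "gnc": "http://genenetwork.org/category/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "gnt": "http://genenetwork.org/term/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "taxon": "http://purl.uniprot.org/taxonomy/",
}


def with_prefixes(body, prefixes=None):
    """Prepend PREFIX declarations to a query body (hypothetical helper)."""
    prefixes = PREFIXES if prefixes is None else prefixes
    header = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()
    )
    return f"{header}\n\n{body.strip()}\n"


mouse_sets = with_prefixes("""
SELECT DISTINCT * WHERE {
    ?inbredSet rdf:type gnc:inbredSet ;
               gnt:belongsToSpecies gn:Mus_musculus .
    OPTIONAL { ?inbredSet rdfs:label ?descr } .
}
""")
print(mouse_sets)
```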


Show set info for one 'group' without tissue info

       SELECT DISTINCT * WHERE {
            gn:inbredSetHsnih-palmer ?p ?o .
            FILTER ( !EXISTS{ gn:inbredSetHsnih-palmer gnt:hasTissue ?o }) .
        }

List all datasets for a group/population:

  • list_datasets("BXD") - List available datasets for a given group (here, "BXD").
        SELECT DISTINCT * WHERE {
            ?dataset gnt:belongsToInbredSet gn:inbredSetBxd ;
                     rdfs:label ?descr .
        }


Pick one, e.g. http://genenetwork.org/id/Devneocortex_ilm6_2p14rinv_1111 (or gn:Devneocortex_ilm6_2p14rinv_1111 in prefixed form), and list its properties:

SELECT DISTINCT * WHERE {
      gn:Devneocortex_ilm6_2p14rinv_1111 ?p ?o .
}

This will show something like:

http://www.w3.org/1999/02/22-rdf-syntax-ns#type    http://genenetwork.org/category/probesetDataset
http://purl.org/dc/terms/created "2011-11-18"
http://www.w3.org/2004/02/skos/core#prefLabel "BIDMC/UTHSC Dev Neocortex P14 ILMv6.2 (Nov10)"
http://genenetwork.org/term/belongsToInbredSet  http://genenetwork.org/id/inbredSetBxd
http://vocab.fairdatacollective.org/gdmt/hasCreatorAffiliation  "Beth Israel Deaconess Medical Center"
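When the same query is exported as JSON, the result should follow the W3C SPARQL 1.1 Query Results JSON layout (`results` / `bindings`, one object per row). The sketch below groups such a result by predicate. The sample document is hand-written from two of the rows listed above and is not real endpoint output, and `predicate_objects` is a made-up name.

```python
import json


def predicate_objects(results_json):
    """Map each ?p value to the list of ?o values found in a JSON result."""
    doc = json.loads(results_json)
    table = {}
    for row in doc["results"]["bindings"]:
        # Every bound variable is an object with a "type" and a "value".
        table.setdefault(row["p"]["value"], []).append(row["o"]["value"])
    return table


# Hand-written sample mirroring two rows of the listing above.
SAMPLE = json.dumps({
    "head": {"vars": ["p", "o"]},
    "results": {"bindings": [
        {"p": {"type": "uri",
               "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
         "o": {"type": "uri",
               "value": "http://genenetwork.org/category/probesetDataset"}},
        {"p": {"type": "uri", "value": "http://purl.org/dc/terms/created"},
         "o": {"type": "literal", "value": "2011-11-18"}},
    ]},
})

for predicate, objects in predicate_objects(SAMPLE).items():
    print(predicate, objects)
```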


Another way to list datasets with the name that is used in GN:

        SELECT DISTINCT ?dataset ?datasetName WHERE {
            ?dataset rdf:type/rdfs:subClassOf gnc:dataset .
            ?dataset rdfs:label ?datasetName .
            ?dataset gnt:belongsToInbredSet ?inbredSet .
            ?inbredSet skos:altLabel "BXD" .
          }
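The group label is the only part of this query that changes from one group to the next, so it can be filled in from a script. This is a minimal sketch with made-up function names; it escapes the label before placing it inside the quoted literal and, like the query above, leaves out the PREFIX declarations.

```python
def escape_literal(text):
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def datasets_for_group(group):
    """Build the dataset listing query for a group label such as "BXD"."""
    label = escape_literal(group)
    return (
        "SELECT DISTINCT ?dataset ?datasetName WHERE {\n"
        "    ?dataset rdf:type/rdfs:subClassOf gnc:dataset .\n"
        "    ?dataset rdfs:label ?datasetName .\n"
        "    ?dataset gnt:belongsToInbredSet ?inbredSet .\n"
        f'    ?inbredSet skos:altLabel "{label}" .\n'
        "}\n"
    )


print(datasets_for_group("BXD"))
```

Escaping matters as soon as the label comes from user input, since an unescaped quote would end the literal early and change the query.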

To list all datasets:

        SELECT DISTINCT  ?dataset ?datasetName WHERE {
            ?dataset rdf:type/rdfs:subClassOf gnc:dataset .
            ?dataset rdfs:label ?datasetName .

          }

And count them!

        SELECT (COUNT(?dataset) AS ?total) WHERE {
            ?dataset rdf:type/rdfs:subClassOf gnc:dataset .
          }

893 at last count(!)

  • info_dataset("CB_M_1004_P") - Get meta information about a data set using the GN name:
        SELECT DISTINCT * WHERE {
            ?s rdfs:label "CB_M_1004_P" .
            ?s ?p ?o .
        }

(Ideally you would use the dataset identifier here rather than the label.)

  • info_datasets("B6D2F2") - Get meta information about all data sets for a group.
        SELECT DISTINCT * WHERE {
            ?s rdf:type/rdfs:subClassOf gnc:dataset .
            ?s gnt:belongsToInbredSet ?inbredSet .
            ?inbredSet skos:altLabel "B6D2F2" .
            ?s ?p ?o .
        }
  • info_pheno("BXD", "10038") - Get summary information for a phenotype

The following works if you change the gnt prefix to terms. This is a bug.

        PREFIX gn: <http://genenetwork.org/id/>
        PREFIX gnc: <http://genenetwork.org/category/>
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX gnt: <http://genenetwork.org/term/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
        PREFIX fabio: <http://purl.org/spar/fabio/>
        PREFIX dct: <http://purl.org/dc/terms/>

        SELECT DISTINCT * WHERE {
            ?s rdf:type gnc:phenotype .
            ?inbredSet skos:altLabel "BXD" .
            ?s gnt:belongsToInbredSet ?inbredSet .
            ?s gnt:traitName "10001" .
            ?s ?p ?o .
            OPTIONAL {
                ?pub fabio:hasPubMedId ?pmid .
                ?s dct:isReferencedBy ?pmid .
                ?pub ?pubTerms ?pubResult .
            }
        }
  • get_pheno("BXD", "10646") - Get phenotype values for a classical trait.

Use LMDB.

  • get_geno("BXD") - Get genotypes for a group.

Use LMDB.

  • run_gemma("BXDPublish", "10015") - Perform a genome scan with gemma
  • run_rqtl("BXDPublish", "10015") - Perform a genome scan with R/qtl
  • run_correlation("HC_M2_0606_P", "BXDPublish", "1427571_at") - Finds traits that are correlated with a given trait.

These are not available in SPARQL.