GENENETWORK SPARQL endpoint

SPARQL is the query language for our RDF database. This endpoint can export HTML, JSON and TSV(!)
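
As a rough illustration of how a client might ask for one of these formats, the sketch below builds (but does not send) an HTTP GET request in the style of the standard SPARQL 1.1 Protocol. The endpoint URL is a placeholder and the helper name build_sparql_request is hypothetical. Whether this endpoint picks the output format from the Accept header, rather than from some other mechanism, is an assumption that should be checked against the server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: substitute the real endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Commonly used media types for the three formats mentioned above.
MEDIA_TYPES = {
    "json": "application/sparql-results+json",
    "tsv": "text/tab-separated-values",
    "html": "text/html",
}

def build_sparql_request(query, fmt="json", endpoint=ENDPOINT):
    """Return an unsent GET request that carries `query` as a URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})

req = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1", fmt="tsv")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the returned object to urllib.request.urlopen would actually send the query.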

Note that we created a reflective REST API that executes similar queries. See the REST API.

SPARQL examples are:

Get species info

  • list_species() - List available species.
    PREFIX gn: <http://genenetwork.org/id/>
    PREFIX gnc: <http://genenetwork.org/category/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX gnt: <http://genenetwork.org/term/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
    
    SELECT DISTINCT * WHERE {
        ?s rdf:type gnc:species .
        ?s ?p ?o .
    }
    

try

Get 'group' or population info

  • list_groups("drosophila") - List available groups of datasets
        PREFIX gn: <http://genenetwork.org/id/>
        PREFIX gnc: <http://genenetwork.org/category/>
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX gnt: <http://genenetwork.org/term/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

        SELECT ?inbredSet WHERE {
            ?species rdf:type gnc:species .
            ?species skos:altLabel "drosophila" .
            ?inbredSet rdf:type gnc:inbredSet .
            ?inbredSet gnt:belongsToSpecies ?species .
        }
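
To run the same query for a different species label, the text of the query can be generated. The following is a minimal sketch, not the implementation behind list_groups(): the helper names sparql_string and list_groups_query are hypothetical, and only the prefixes the query actually uses are included. The label is escaped so that quotes or backslashes in user input cannot break out of the string literal.

```python
def sparql_string(text):
    """Quote `text` as a SPARQL string literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'

def list_groups_query(species_label):
    """Build the group-listing query for one species label (hypothetical helper)."""
    return f"""
PREFIX gnc: <http://genenetwork.org/category/>
PREFIX gnt: <http://genenetwork.org/term/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?inbredSet WHERE {{
    ?species rdf:type gnc:species .
    ?species skos:altLabel {sparql_string(species_label)} .
    ?inbredSet rdf:type gnc:inbredSet .
    ?inbredSet gnt:belongsToSpecies ?species .
}}
"""

print(list_groups_query("drosophila"))
```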

List all sets with species and description:

       SELECT DISTINCT ?set ?species ?descr WHERE {
            ?set rdf:type gnc:inbredSet ;
                 gnt:belongsToSpecies ?species .
            OPTIONAL {?set rdfs:label ?descr } .
        }

And list all 50+ sets for Mouse:

       SELECT DISTINCT * WHERE {
            ?inbredSet rdf:type gnc:inbredSet ;
             gnt:belongsToSpecies gn:Mus_musculus .
            OPTIONAL {?inbredSet rdfs:label ?descr }.
        }
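
The shorter queries above omit the PREFIX block shown in the first examples, so a client that submits them as-is may need to add it back. A minimal sketch of doing so follows. The prefix URIs are copied from this document; the helper name with_prefixes is hypothetical, and whether the endpoint already predefines some of these prefixes is not known here.

```python
# Prefixes copied from the full examples in this document.
PREFIXES = {
    "gn": "http://genenetwork.org/id/",
    "gnc": "http://genenetwork.org/category/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "gnt": "http://genenetwork.org/term/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "taxon": "http://purl.uniprot.org/taxonomy/",
}

def with_prefixes(body, prefixes=PREFIXES):
    """Prepend one PREFIX line per entry to a query body."""
    header = "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items())
    return header + "\n\n" + body.strip() + "\n"

query = with_prefixes("""
SELECT DISTINCT * WHERE {
    ?inbredSet rdf:type gnc:inbredSet ;
               gnt:belongsToSpecies gn:Mus_musculus .
    OPTIONAL { ?inbredSet rdfs:label ?descr } .
}
""")
print(query)
```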

try.

Show set info for one 'group' without tissue info

       SELECT DISTINCT * WHERE {
            gn:inbredSetHsnih-palmer ?p ?o .
            FILTER ( !EXISTS{ gn:inbredSetHsnih-palmer gnt:hasTissue ?o }) .
        }

List all datasets for a group/population:

  • list_datasets("BXD") - List available datasets for a given group (here, "BXD").
        SELECT DISTINCT * WHERE {
            ?dataset gnt:belongsToInbredSet gn:inbredSetBxd ;
                     rdfs:label ?descr .
        }

try

Pick one, e.g. http://genenetwork.org/id/Devneocortex_ilm6_2p14rinv_1111 or gn:Devneocortex_ilm6_2p14rinv_1111

SELECT DISTINCT * WHERE {
      gn:Devneocortex_ilm6_2p14rinv_1111 ?p ?o .
}
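
The full URI and the gn: form name the same resource; the prefixed form is shorthand for the prefix URI followed by the local part. A small sketch of converting between the two is given below. The function names expand and compact are hypothetical, only three of the document's prefixes are listed, and the sketch does not check whether a local part is legal in a prefixed name.

```python
# A subset of the prefixes used in this document.
PREFIXES = {
    "gn": "http://genenetwork.org/id/",
    "gnc": "http://genenetwork.org/category/",
    "gnt": "http://genenetwork.org/term/",
}

def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as gn:X into its full URI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local

def compact(uri, prefixes=PREFIXES):
    """Turn a full URI into a prefixed name when a known prefix matches."""
    for prefix, base in prefixes.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri  # No known prefix: leave the URI unchanged.

full = expand("gn:Devneocortex_ilm6_2p14rinv_1111")
print(full)
print(compact(full))
```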

Will show something like:

http://www.w3.org/1999/02/22-rdf-syntax-ns#type    http://genenetwork.org/category/probesetDataset
http://purl.org/dc/terms/created "2011-11-18"
http://www.w3.org/2004/02/skos/core#prefLabel "BIDMC/UTHSC Dev Neocortex P14 ILMv6.2 (Nov10)"
http://genenetwork.org/term/belongsToInbredSet  http://genenetwork.org/id/inbredSetBxd
http://vocab.fairdatacollective.org/gdmt/hasCreatorAffiliation  "Beth Israel Deaconess Medical Center"
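
When the same result is requested as JSON, a client can turn it into rows like the ones listed above. The sketch below assumes the endpoint returns the standard W3C SPARQL 1.1 JSON results layout. The sample document is hand-written for illustration, using two of the values shown above, and is not captured server output; the function name rows is hypothetical.

```python
import json

# Hand-written sample in the W3C SPARQL JSON results layout (not real output).
SAMPLE = """
{
  "head": {"vars": ["p", "o"]},
  "results": {"bindings": [
    {"p": {"type": "uri", "value": "http://purl.org/dc/terms/created"},
     "o": {"type": "literal", "value": "2011-11-18"}},
    {"p": {"type": "uri", "value": "http://genenetwork.org/term/belongsToInbredSet"},
     "o": {"type": "uri", "value": "http://genenetwork.org/id/inbredSetBxd"}}
  ]}
}
"""

def rows(document):
    """Yield one dict per result row, mapping variable name to plain value."""
    data = json.loads(document)
    names = data["head"]["vars"]
    for binding in data["results"]["bindings"]:
        # Unbound variables (for example from OPTIONAL) are absent from a binding.
        yield {
            name: (binding[name]["value"] if name in binding else None)
            for name in names
        }

for row in rows(SAMPLE):
    print(row["p"], row["o"], sep="\t")
```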

try

Another way to list datasets with the name that is used in GN:

        SELECT DISTINCT ?dataset ?datasetName WHERE {
            ?dataset rdf:type/rdfs:subClassOf gnc:dataset .
            ?dataset rdfs:label ?datasetName .
            ?dataset gnt:belongsToInbredSet ?inbredSet .
            ?inbredSet skos:altLabel "BXD" .
          }

To list all datasets

        SELECT DISTINCT  ?dataset ?datasetName WHERE {
            ?dataset rdf:type/rdfs:subClassOf gnc:dataset .
            ?dataset rdfs:label ?datasetName .

          }

And count them!

        SELECT (COUNT(?dataset) AS ?count) WHERE {
            ?dataset rdf:type/rdfs:subClassOf gnc:dataset .
          }

893 at last count(!)
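
A count query returns a single row with a single column. The sketch below reads that value from a JSON result, again assuming the standard W3C SPARQL JSON results layout. The sample is hand-written around the figure quoted above, and the function name single_count is hypothetical. It takes the only column whatever its name is, since the name of an unaliased aggregate column may be chosen by the server.

```python
import json

# Hand-written sample (not real output) holding the figure quoted above.
SAMPLE = """
{
  "head": {"vars": ["count"]},
  "results": {"bindings": [
    {"count": {"type": "literal",
               "datatype": "http://www.w3.org/2001/XMLSchema#integer",
               "value": "893"}}
  ]}
}
"""

def single_count(document):
    """Return the integer held in a one-row, one-column JSON result."""
    bindings = json.loads(document)["results"]["bindings"]
    if len(bindings) != 1 or len(bindings[0]) != 1:
        raise ValueError("expected exactly one row with one column")
    (cell,) = bindings[0].values()
    return int(cell["value"])

print(single_count(SAMPLE))
```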

  • info_dataset("CB_M_1004_P") - Get meta information about a data set using the GN name:
        SELECT DISTINCT * WHERE {
            ?s rdfs:label "CB_M_1004_P" .
            ?s ?p ?o .
             }

(you should be using the identifier here)

  • info_datasets("B6D2F2") - Get meta information about all data sets for a group.
        SELECT DISTINCT * WHERE {
            ?s rdf:type/rdfs:subClassOf gnc:dataset .
            ?s gnt:belongsToInbredSet ?inbredSet .
            ?inbredSet skos:altLabel "B6D2F2" .
            ?s ?p ?o .
        }
  • info_pheno("BXD", "10038") - Get summary information for a phenotype

The following works if you change the gnt prefix to terms. This is a bug.

        PREFIX gn: <http://genenetwork.org/id/>
        PREFIX gnc: <http://genenetwork.org/category/>
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX gnt: <http://genenetwork.org/term/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
        PREFIX fabio: <http://purl.org/spar/fabio/>
        PREFIX dct: <http://purl.org/dc/terms/>

        SELECT DISTINCT * WHERE {
            ?s rdf:type gnc:phenotype .
            ?inbredSet skos:altLabel "BXD" .
            ?s gnt:belongsToInbredSet ?inbredSet.
            ?s gnt:traitName "10001" .
            ?s ?p ?o .
            OPTIONAL {
                ?pub fabio:hasPubMedId ?pmid .
                ?s dct:isReferencedBy ?pmid .
                ?pub ?pubTerms ?pubResult .
            }
        }
  • get_pheno("BXD", "10646") - Get phenotype values for a classical trait.

Use lmdb

  • get_geno("BXD") - Get genotypes for a group.

Use lmdb

  • run_gemma("BXDPublish", "10015") - Perform a genome scan with gemma
  • run_rqtl("BXDPublish", "10015") - Perform a genome scan with R/qtl
  • run_correlation("HC_M2_0606_P", "BXDPublish", "1427571_at") - Finds traits that are correlated with a given trait.

Not in SPARQL