Case studies
The Hp1bp3 transcript
Investigate Hp1bp3, which has a cis-QTL in hippocampus and is associated with cognitive ageing.
Search for the dataset:
https://genenetwork.org/api/v2/Mus_musculus/BXD/datasets?search=hippocampus
[API v1]: # https://genenetwork.org/api/v_pre1/datasets/bxd
[
{
"AvgID": 1,
"CreateTime": "Mon, 24 Oct 2005 00:00:00 GMT",
"DataScale": "log2",
"FullName": "Hippocampus Consortium M430v2 (Oct05) MAS5",
"Id": 86,
"Long_Abbreviation": "Hippocampus_M430_V2_BXD_MAS5_Oct05",
"ProbeFreezeId": 24,
"ShortName": "Hippocampus M430v2 BXD 10/05 MAS5",
"Short_Abbreviation": "HC_M2_1005_M",
"confidentiality": 0,
"public": 0
},
{
"AvgID": 3,
"CreateTime": "Mon, 24 Oct 2005 00:00:00 GMT",
"DataScale": "log2",
"FullName": "Hippocampus Consortium M430v2 (Oct05) RMA",
"Id": 87,
"Long_Abbreviation": "Hippocampus_M430_V2_BXD_RMA_Oct05",
"ProbeFreezeId": 24,
"ShortName": "Hippocampus M430v2 BXD 10/05 RMA",
"Short_Abbreviation": "HC_M2_1005_R",
"confidentiality": 0,
"public": 0
},
{
"AvgID": 2,
"CreateTime": "Mon, 24 Oct 2005 00:00:00 GMT",
"DataScale": "log2",
"FullName": "Hippocampus Consortium M430v2 (Oct05) PDNN",
"Id": 88,
"Long_Abbreviation": "Hippocampus_M430_V2_BXD_PDNN_Oct05",
"ProbeFreezeId": 24,
"ShortName": "Hippocampus M430v2 BXD 10/05 PDNN",
"Short_Abbreviation": "HC_M2_1005_P",
"confidentiality": 0,
"public": 0
}
]
This should return a list of all hippocampal datasets containing the phrase 'hippocampus' (or its lemma).
The user can then look through the descriptions and decide which one they need.
In this case the appropriate key is HC_M2_0606_P
.
We could also just get a listing of all datasets and work through them locally (by eye or with a local grep).
https://genenetwork.org/api/v2/Mus_musculus/BXD/datasets
In all cases, giving the generic term (species
, populations
, datasets
, traits
) will return a listing of all descendent options.
Just using the instance keys as the endpoint (e.g. api/v2/Mus_musculus
, api/v2/Mus_musculus/BXD
, api/v2/Mus_musculus/BXD/HC_M2_0606_P
) will return metadata about the level (about the species 'mouse', the population 'BXD' or the dataset 'HC_M2_0606_P' respectively in the above examples).
To continue, we dig down and search the dataset for the desired gene name:
https://genenetwork.org/api/v2/Mus_musculus/BXD/HC_M2_0606_P/traits?search&symbol=Hp1bp3
[
{
"additive": -0.15845054446461,
"alias": "HP1BP74; HP1-BP74; Hp1bp74",
"chr": "4",
"description": "heterochromatin protein 1, binding protein 3",
"id": 78509,
"locus": "rsm10000002056",
"lrs": 57.6845496792109,
"mb": 138.242585,
"mean": 12.2393434343434,
"name": "1415751_at",
"p_value": 0.0,
"se": null,
"symbol": "Hp1bp3"
},
{
"additive": -0.489152777777777,
"alias": "HP1BP74; HP1-BP74; Hp1bp74",
"chr": "4",
"description": "heterochromatin protein 1, binding protein 3",
"id": 102578,
"locus": "rsm10000002058",
"lrs": 96.3121317863362,
"mb": 138.244118,
"mean": 8.88365656565657,
"name": "1439845_at",
"p_value": 0.0,
"se": null,
"symbol": "Hp1bp3"
},
{
"additive": -0.037382526029878,
"alias": "HP1BP74; HP1-BP74; Hp1bp74; 2310026L22Rik",
"chr": "4",
"description": "heterochromatin protein 1, binding protein 3",
"id": 110688,
"locus": "rs32937254",
"lrs": 13.2029671197265,
"mb": 138.21577,
"mean": 6.51316161616162,
"name": "1447955_at",
"p_value": 0.317,
"se": null,
"symbol": "Hp1bp3"
}
]
This gives us the three probesets associated with Hp1bp3 and some metadata (name, aliases, expression, precomputed QTL etc.).
We decide that 1439845_at
is the correct probeset.
Get more information about 1439845_at
including the metadata noted above, but also microarray platform, probe composition and mapping, chromosomal position, gene/transcript length, links to gene info (NCBI, Wikidata), homologous genes in other species, [what other datasets contain data for this gene] etc.:
https://genenetwork.org/api/v2/Mus_musculus/BXD/HC_M2_0606_P/1439845_at
[API v1]: # https://genenetwork.org/api/v_pre1/trait/HC_M2_0606_P/1439845_at
[
{
"additive": -0.489152777777777,
"alias": "HP1BP74; HP1-BP74; Hp1bp74",
"chr": "4",
"description": "heterochromatin protein 1, binding protein 3",
"id": 102578,
"locus": "rsm10000002058",
"lrs": 96.3121317863362,
"mb": 138.244118,
"mean": 8.88365656565657,
"name": "1439845_at",
"p_value": 0.0,
"se": null,
"symbol": "Hp1bp3",
"wikidata": "Q18251298",
"homologene", "7774",
}
]
Should include all of the data shown at https://genenetwork.org/show_trait?trait_id=1439845_at&dataset=HC_M2_0606_P
Get the expression data for this trait:
https://genenetwork.org/api/v2/Mus_musculus/BXD/HC_M2_0606_P/1439845_at/data
[
{
"data_id": 23426549,
"sample_name": "129S1/SvImJ",
"sample_name_2": "129S1/SvImJ",
"se": 0.219,
"value": 6.61
},
{
"data_id": 23426549,
"sample_name": "A/J",
"sample_name_2": "A/J",
"se": 0.158,
"value": 6.536
},
{
"data_id": 23426549,
"sample_name": "AKR/J",
"sample_name_2": "AKR/J",
"se": 0.076,
"value": 6.486
},
{
"data_id": 23426549,
"sample_name": "B6D2F1",
"sample_name_2": "B6D2F1",
"se": 0.09,
"value": 6.561
},
.
.
.
]
This is a data endpoint, so the returned JSON includes a vector of the transcript expression values for this probeset.
If we wanted to grab the whole microarray dataset, then we can just use the data keyword one level up. Here, a return type can also be specified
https://genenetwork.org/api/v2/Mus_musculus/BXD/HC_M2_0606_P/BXD/data.tsv
This returns a tab-delimited table of data (probesets in columns, strains/individuals in rows) for download.
Get the QTL vector:
https://genenetwork.org/api/v2/Mus_musculus/BXD/HC_M2_0606_P/1439845_at/qtl?method=GEMMA&genotype=mm10
[API v1]: # https://genenetwork.org/api/v_pre1/mapping?trait_id=1447955_at&db=HC_M2_0606_P&method=gemma&use_loco=FALSE&use_loco=0.01
[
[
{
"Mb": 3.00149,
"additive": -0.0017764785,
"chr": 1,
"lod_score": 0.06055383480931299,
"name": "rsm10000000001",
"p_value": 0.8698536
},
{
"Mb": 3.010274,
"additive": -0.0017764785,
"chr": 1,
"lod_score": 0.06055383480931299,
"name": "rs31443144",
"p_value": 0.8698536
},
{
"Mb": 3.492195,
"additive": -0.0017764785,
"chr": 1,
"lod_score": 0.06055383480931299,
"name": "rs6269442",
"p_value": 0.8698536
},
{
"Mb": 3.511204,
"additive": -0.0017764785,
"chr": 1,
"lod_score": 0.06055383480931299,
"name": "rs32285189",
"p_value": 0.8698536
},
{
"Mb": 3.659804,
"additive": -0.0017764785,
"chr": 1,
"lod_score": 0.06055383480931299,
"name": "rs258367496",
"p_value": 0.8698536
},
{
"Mb": 3.777023,
"additive": -0.0017764785,
"chr": 1,
"lod_score": 0.06055383480931299,
"name": "rs32430919",
"p_value": 0.8698536
},
.
.
.
]
This is also a data endpoint, so we get a vector of p-values together with a vector of chromosomal positions.
Correlate with all phenotypes:
https://genenetwork.org/api/v2/Mus_musculus/BXD/HC_M2_0606_P/1439845_at/correlations?method=spearmann&dataset=phenotypes&n_results=10
[API v1]: # https://genenetwork.org/api/v_pre1/correlation?trait_id=1447955_at&db=HC_M2_0606_P&target_db=BXDPublish&type=sample&method=spearman&return=10 [Error]: # This returns 500 results.
[
{
"#_strains": 7,
"p_value": 0.0025194724037946874,
"sample_r": 0.9285714285714288,
"trait": "12562"
},
{
"#_strains": 13,
"p_value": 2.4445741031329683e-05,
"sample_r": 0.9023392305243964,
"trait": "12889"
},
{
"#_strains": 7,
"p_value": 0.01369732661532562,
"sample_r": -0.8571428571428573,
"trait": "19087"
},
{
"#_strains": 13,
"p_value": 0.00039102596905431295,
"sample_r": 0.8342668763658431,
"trait": "20884"
},
{
"#_strains": 8,
"p_value": 0.01017554012345675,
"sample_r": -0.8333333333333335,
"trait": "10409"
},
{
"#_strains": 8,
"p_value": 0.01017554012345675,
"sample_r": -0.8333333333333335,
"trait": "10410"
},
{
"#_strains": 6,
"p_value": 0.04156268221574334,
"sample_r": 0.8285714285714287,
"trait": "20393"
},
{
"#_strains": 6,
"p_value": 0.04156268221574334,
"sample_r": -0.8285714285714287,
"trait": "20595"
},
{
"#_strains": 10,
"p_value": 0.0038149200825507135,
"sample_r": -0.8181818181818182,
"trait": "16177"
},
{
"#_strains": 15,
"p_value": 0.000219365827727102,
"sample_r": 0.8142857142857142,
"trait": "27198"
}
]
It is not necessary to specify the target at any level above dataset as correlations can only be performed within a population.
Correlate with a specific trait:
https://genenetwork.org/api/v2/Mus_musculus/BXD/HC_M2_0606_P/1439845_at/correlations?method=pearson&dataset=HC_M2_0606_P&traits=1415751_at,1447955_at
Here, we have correlated against the two other Hs1bp3 probesets, which are specified by a comma-delimited list of trait IDs.
Correlation across different datasets would be achieved by multiple API calls. Although there may be a way to line up a series of calls and have them run as a batch (I presume more complicated queries like this would be done via a POST interface though).
More advanced searches could allow restricting the search to certain fields:
https://genenetwork.org/api/v2/Mus_musculus/BXD/datasets?search&type=transcript&tag=hippocampus
I would support using tags to associate keywords with items at all levels.
Here, the search
parameter was left empty as we are looking for a phrase in a particular field.
If all parameters are empty, this should not fail but return the same as the datasets
query without the parameters (i.e. return a listing of all available datasets).