


# API Query Documentation #

This document describes fetching data from GeneNetwork using the REST API.

Karl Broman wrote the [GNapi](https://github.com/kbroman/GNapi/blob/master/README.md) R package, which provides convenient access to the GN API from R:

- `check_gn()` - Check connection.
- `list_species()` - List available species.
- `list_groups("drosophila")` - List available groups of datasets for a species (here, `"drosophila"`).
- `list_datasets("BXD")` - List available datasets for a given group (here, `"BXD"`).
- `info_dataset("CB_M_1004_P")` - Get meta information about a data set.
- `info_datasets("B6D2F2")` - Get meta information about all data sets for a group.
- `info_pheno("BXD", "10038")` - Get summary information for a phenotype.
- `get_pheno("BXD", "10646")` - Get phenotype values for a classical trait.
- `get_geno("BXD")` - Get genotypes for a group.
- `run_gemma("BXDPublish", "10015")` - Perform a genome scan with [gemma](https://github.com/genetics-statistics/GEMMA)
- `run_rqtl("BXDPublish", "10015")` - Perform a genome scan with [R/qtl](https://rqtl.org)
- `run_correlation("HC_M2_0606_P", "BXDPublish", "1427571_at")` - Finds traits that are correlated with a given trait.

For further examples, see the [online
vignette](https://kbroman.org/GNapi/GNapi.html), which is also
available from within R:

```r
vignette("GNapi")
```

---
# Fetching Dataset/Trait info/data #
---
## Fetch Species List ##

To get a list of species with data available in GN (and their associated names and ids):
```
curl https://genenetwork.org/api/v_pre1/species
[ { "FullName": "Mus musculus", "Id": 1, "Name": "mouse", "TaxonomyId": 10090 }, ... { "FullName": "Populus trichocarpa", "Id": 10, "Name": "poplar", "TaxonomyId": 3689 } ]
```
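
The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_url`, `fetch_json`, and `species_names` are illustrative and not part of the GN API, and the base URL is simply the one used in the curl examples.

```python
import json
from urllib.request import urlopen

# Base URL taken from the curl examples in this document.
BASE_URL = "https://genenetwork.org/api/v_pre1"


def build_url(*parts):
    """Join path components onto the API base URL (hypothetical helper)."""
    return "/".join([BASE_URL, *parts])


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body. This performs a network request."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def species_names(species_records):
    """Extract the short names (e.g. "mouse") from a species list response."""
    return [record["Name"] for record in species_records]
```

Calling `species_names(fetch_json(build_url("species")))` would then request the species list and return the values of the `Name` field, which are the names accepted by the species-specific queries.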

Or to get the info for a single species:
```
curl https://genenetwork.org/api/v_pre1/species/mouse
```
OR
```
curl https://genenetwork.org/api/v_pre1/species/mouse.json
```

*For all queries where the last field is a user-specified name/ID, a file format type can be appended. Currently there is only JSON (which is also the default if no format is provided), but other formats will be added later.*

## Groups

The data is organised by group (also a drop down menu on the GN search page). General information on groups can be found [here](http://gn1.genenetwork.org/mouseCross.html).


This query can optionally filter by species:

```
curl https://genenetwork.org/api/v_pre1/groups  # for all species
```
OR
```
curl https://genenetwork.org/api/v_pre1/groups/mouse  # for just mouse groups/RISets
[ { "DisplayName": "BXD", "FullName": "BXD RI Family", "GeneticType": "riset", "Id": 1, "MappingMethodId": "1", "Name": "BXD", "SpeciesId": 1, "public": 2 }, ... { "DisplayName": "AIL LGSM F34 and F39-43 (GBS)", "FullName": "AIL LGSM F34 and F39-43 (GBS)", "GeneticType": "intercross", "Id": 72, "MappingMethodId": "2", "Name": "AIL-LGSM-F34-F39-43-GBS", "SpeciesId": 1, "public": 2 } ]
```
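
Once the group list is decoded, it is an ordinary list of records that can be filtered or grouped. As a small sketch, the Python below groups the two records shown in the example response by their `GeneticType` field; the function name `names_by_genetic_type` is illustrative only.

```python
from collections import defaultdict

# The two records shown in the example response above.
groups = [
    {"DisplayName": "BXD", "FullName": "BXD RI Family", "GeneticType": "riset",
     "Id": 1, "MappingMethodId": "1", "Name": "BXD", "SpeciesId": 1, "public": 2},
    {"DisplayName": "AIL LGSM F34 and F39-43 (GBS)",
     "FullName": "AIL LGSM F34 and F39-43 (GBS)", "GeneticType": "intercross",
     "Id": 72, "MappingMethodId": "2", "Name": "AIL-LGSM-F34-F39-43-GBS",
     "SpeciesId": 1, "public": 2},
]


def names_by_genetic_type(group_records):
    """Collect group names under their GeneticType value."""
    by_type = defaultdict(list)
    for record in group_records:
        by_type[record["GeneticType"]].append(record["Name"])
    return dict(by_type)


print(names_by_genetic_type(groups))
# {'riset': ['BXD'], 'intercross': ['AIL-LGSM-F34-F39-43-GBS']}
```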

## Fetch Genotypes for Group/RISet ##
```
curl https://genenetwork.org/api/v_pre1/genotypes/bimbam/BXD
curl https://genenetwork.org/api/v_pre1/genotypes/BXD.bimbam
```
Returns a group's genotypes in one of several formats: bimbam, rqtl2, or geno (a format used by qtlreaper, which is just a CSV file consisting of marker positions and genotypes).

Rqtl2 genotype queries can also include the dataset name and will return a zip of the genotypes, phenotypes, and gene map (marker names/positions). For example:
```
curl https://genenetwork.org/api/v_pre1/genotypes/rqtl2/BXD/HC_M2_0606_P.zip
```
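
The URL patterns above can be assembled programmatically. The sketch below is one way to do it in Python; `genotype_url` is a hypothetical helper, and it assumes (from the description above) that a dataset name is only accepted together with the rqtl2 format.

```python
# Base URL taken from the curl examples in this document.
BASE_URL = "https://genenetwork.org/api/v_pre1"

# Formats named in the description above.
GENOTYPE_FORMATS = ("bimbam", "rqtl2", "geno")


def genotype_url(group, fmt, dataset=None):
    """Build a genotype download URL for a group (hypothetical helper)."""
    if fmt not in GENOTYPE_FORMATS:
        raise ValueError(f"unknown genotype format: {fmt!r}")
    if dataset is not None:
        # Assumption: only rqtl2 queries may include a dataset name.
        if fmt != "rqtl2":
            raise ValueError("a dataset name requires the rqtl2 format")
        return f"{BASE_URL}/genotypes/rqtl2/{group}/{dataset}.zip"
    return f"{BASE_URL}/genotypes/{fmt}/{group}"


print(genotype_url("BXD", "bimbam"))
print(genotype_url("BXD", "rqtl2", dataset="HC_M2_0606_P"))
```

The resulting URL can be passed to curl or to `urllib.request.urlretrieve(url, filename)` to save the file locally.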

## Fetch Datasets ##
```
curl https://genenetwork.org/api/v_pre1/datasets/bxd
```
OR
```
curl https://genenetwork.org/api/v_pre1/datasets/mouse/bxd
[ { "AvgID": 1, "CreateTime": "Fri, 01 Aug 2003 00:00:00 GMT", "DataScale": "log2", "FullName": "UTHSC/ETHZ/EPFL BXD Liver Polar Metabolites Extraction A, CD Cohorts (Mar 2017) log2", "Id": 1, "Long_Abbreviation": "BXDMicroArray_ProbeSet_August03", "ProbeFreezeId": 3, "ShortName": "Brain U74Av2 08/03 MAS5", "Short_Abbreviation": "Br_U_0803_M", "confidentiality": 0, "public": 0 }, ... { "AvgID": 3, "CreateTime": "Tue, 14 Aug 2018 00:00:00 GMT", "DataScale": "log2", "FullName": "EPFL/LISP BXD CD Liver Affy Mouse Gene 1.0 ST (Aug18) RMA", "Id": 859, "Long_Abbreviation": "EPFLMouseLiverCDRMAApr18", "ProbeFreezeId": 181, "ShortName": "EPFL/LISP BXD CD Liver Affy Mouse Gene 1.0 ST (Aug18) RMA", "Short_Abbreviation": "EPFLMouseLiverCDRMA0818", "confidentiality": 0, "public": 1 } ]
```
(I added the option to specify species just in case we end up with the same group name across multiple species at some point, though it's currently unnecessary)
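
The `Short_Abbreviation` field of each dataset record is the value used as the `db` parameter in the mapping and correlation queries. As a rough sketch, the Python below builds a lookup from that field to the dataset's `ShortName`, using the two records from the example response trimmed to the fields needed; `db_names` is an illustrative name.

```python
# Two records from the example response above, trimmed to the fields used here.
datasets = [
    {"Id": 1,
     "ShortName": "Brain U74Av2 08/03 MAS5",
     "Short_Abbreviation": "Br_U_0803_M"},
    {"Id": 859,
     "ShortName": "EPFL/LISP BXD CD Liver Affy Mouse Gene 1.0 ST (Aug18) RMA",
     "Short_Abbreviation": "EPFLMouseLiverCDRMA0818"},
]


def db_names(dataset_records):
    """Map each Short_Abbreviation (usable as `db`) to its ShortName."""
    return {r["Short_Abbreviation"]: r["ShortName"] for r in dataset_records}


for abbreviation, short_name in db_names(datasets).items():
    print(f"{abbreviation}\t{short_name}")
```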

## Fetch Individual Dataset Info ##
### For mRNA Assay/"ProbeSet" ###

```
curl https://genenetwork.org/api/v_pre1/dataset/HC_M2_0606_P
```
OR
```
curl https://genenetwork.org/api/v_pre1/dataset/bxd/HC_M2_0606_P
{ "confidential": 0, "data_scale": "log2", "dataset_type": "mRNA expression", "full_name": "Hippocampus Consortium M430v2 (Jun06) PDNN", "id": 112, "name": "HC_M2_0606_P", "public": 2, "short_name": "Hippocampus M430v2 BXD 06/06 PDNN", "tissue": "Hippocampus mRNA", "tissue_id": 9 }
```
(This also has the option to specify group/riset)

### For "Phenotypes" (basically non-mRNA Expression; stuff like weight, sex, etc) ###
For these traits, the query fetches publication info and takes the group and phenotype 'ID' as input. For example:
```
curl https://genenetwork.org/api/v_pre1/dataset/bxd/10001
{ "dataset_type": "phenotype", "description": "Central nervous system, morphology: Cerebellum weight, whole, bilateral in adults of both sexes [mg]", "id": 10001, "name": "CBLWT2", "pubmed_id": 11438585, "title": "Genetic control of the mouse cerebellum: identification of quantitative trait loci modulating size and architecture", "year": "2001" }
```

## Fetch Sample Data for Dataset ##
```
curl https://genenetwork.org/api/v_pre1/sample_data/HSNIH-PalmerPublish.csv
```

Returns a CSV file with sample/strain names as the columns and trait IDs as the rows.
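
A file in this orientation can be read with Python's `csv` module. The sketch below is illustrative only: the excerpt is invented (the header cell, sample names, and values are not real output), and it assumes that the first row holds the sample names, that each later row starts with a trait ID, and that any non-numeric cell means a missing value.

```python
import csv
import io

# Hypothetical excerpt: the header cell, sample names, and values are invented.
# Only the orientation (samples as columns, traits as rows) comes from the text.
SAMPLE_CSV = """id,BXD1,BXD2,BXD5
10001,61.4,49.0,x
10002,54.1,,50.1
"""


def to_float(cell):
    """Convert a cell to float; treat anything non-numeric as missing."""
    try:
        return float(cell)
    except ValueError:
        return None


def parse_sample_matrix(text):
    """Return {trait_id: {sample_name: value or None}} from the CSV text."""
    reader = csv.reader(io.StringIO(text))
    samples = next(reader)[1:]  # header row, minus the trait ID column
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = {s: to_float(c) for s, c in zip(samples, row[1:])}
    return matrix


matrix = parse_sample_matrix(SAMPLE_CSV)
print(matrix["10001"])
```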

## Phenotype matrix

```
curl https://genenetwork.org/api/v_pre1/sample_data/BXDPublish.csv > BXDPublish.csv
```

To get more information about the available datasets, query the API. For
example, for the BXD group:

```
curl https://genenetwork.org/api/v_pre1/datasets/mouse/bxd > bxd_datasets.json
```

## Fetch Sample Data for Single Trait ##
```
curl https://genenetwork.org/api/v_pre1/sample_data/HC_M2_0606_P/1436869_at
[ { "data_id": 23415463, "sample_name": "129S1/SvImJ", "sample_name_2": "129S1/SvImJ", "se": 0.123, "value": 8.201 }, { "data_id": 23415463, "sample_name": "A/J", "sample_name_2": "A/J", "se": 0.046, "value": 8.413 }, { "data_id": 23415463, "sample_name": "AKR/J", "sample_name_2": "AKR/J", "se": 0.134, "value": 8.856 }, ... ]
```
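
Each record in this response carries one sample's value and standard error. As a small sketch, the Python below turns the three records shown above into a mapping from sample name to value and picks the sample with the highest value; `values_by_sample` is an illustrative name, and skipping `None` values is an assumption about how missing data might appear.

```python
# The three records shown in the example response above.
records = [
    {"data_id": 23415463, "sample_name": "129S1/SvImJ",
     "sample_name_2": "129S1/SvImJ", "se": 0.123, "value": 8.201},
    {"data_id": 23415463, "sample_name": "A/J",
     "sample_name_2": "A/J", "se": 0.046, "value": 8.413},
    {"data_id": 23415463, "sample_name": "AKR/J",
     "sample_name_2": "AKR/J", "se": 0.134, "value": 8.856},
]


def values_by_sample(sample_records):
    """Map sample name to trait value, skipping records without a value."""
    return {
        r["sample_name"]: r["value"]
        for r in sample_records
        if r["value"] is not None  # assumption: missing values decode to None
    }


values = values_by_sample(records)
print(values)
print(max(values, key=values.get))  # AKR/J
```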

## Fetch Trait List for Dataset ##
```
curl https://genenetwork.org/api/v_pre1/traits/HXBBXHPublish.json
[ { "Additive": 0.0499967532467532, "Id": 10001, "LRS": 16.2831307029479, "Locus": "rs106114574", "PhenotypeId": 1449, "PublicationId": 319, "Sequence": 1 }, ... ]
```

Both JSON and CSV formats can be specified, with JSON as the default. There are also optional "ids_only" and "names_only" parameters that return only a list of trait IDs or names, respectively.

## Fetch Trait Info (Name, Description, Location, etc) ##
### For mRNA Expression/"ProbeSet" ###
```
curl https://genenetwork.org/api/v_pre1/trait/HC_M2_0606_P/1436869_at
{ "additive": -0.214087568058076, "alias": "HHG1; HLP3; HPE3; SMMCI; Dsh; Hhg1", "chr": "5", "description": "sonic hedgehog (hedgehog)", "id": 99602, "locus": "rs8253327", "lrs": 12.7711275309832, "mb": 28.457155, "mean": 9.27909090909091, "name": "1436869_at", "p_value": 0.306, "se": null, "symbol": "Shh" }
```

### For "Phenotypes" ###
For phenotypes, this just gets the max LRS, its location, and additive effect (as calculated by qtlreaper).

Since each group/riset only has one phenotype "dataset", this query takes either the group/riset name or the group/riset name + "Publish" (for example "BXDPublish", which is the dataset name in the DB) as input:
```
curl https://genenetwork.org/api/v_pre1/trait/BXD/10001
{ "additive": 2.39444435069444, "id": 4, "locus": "rs48756159", "lrs": 13.4974911471087 }
```

---

# Analyses #
---
## Mapping ##
Currently two mapping tools can be used: GEMMA and R/qtl. qtlreaper will be added later with Christian Fischer's Rust implementation (https://github.com/chfi/rust-qtlreaper).

The query for each method takes the parameters listed below (more will be added). As the example queries show, the `method` parameter selects the tool (`gemma` or `rqtl`).
### GEMMA ###
* trait_id (*required*) - ID for trait being mapped
* db (*required*) - DB name for trait above (Short_Abbreviation listed when you query for datasets)
* use_loco - Whether to use LOCO (leave one chromosome out) method (default = false)
* maf - minor allele frequency (default = 0.01)

Example query:
```
curl "https://genenetwork.org/api/v_pre1/mapping?trait_id=10015&db=BXDPublish&method=gemma&use_loco=true"
```
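
When the query is built in a script rather than typed into a shell, the query string can be assembled with `urllib.parse.urlencode`, which avoids shell quoting problems with `&`. The following is a sketch under the parameter list above; `gemma_mapping_url` is a hypothetical helper, and sending the defaults explicitly is a choice made here for clarity.

```python
from urllib.parse import urlencode

# Base URL taken from the curl examples in this document.
BASE_URL = "https://genenetwork.org/api/v_pre1"


def gemma_mapping_url(trait_id, db, use_loco=False, maf=0.01):
    """Build a GEMMA mapping query URL (hypothetical helper)."""
    params = {
        "trait_id": trait_id,
        "db": db,
        "method": "gemma",
        # The example query spells booleans as lowercase "true"/"false".
        "use_loco": "true" if use_loco else "false",
        "maf": maf,
    }
    return f"{BASE_URL}/mapping?{urlencode(params)}"


print(gemma_mapping_url("10015", "BXDPublish", use_loco=True))
# https://genenetwork.org/api/v_pre1/mapping?trait_id=10015&db=BXDPublish&method=gemma&use_loco=true&maf=0.01
```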

### R/qtl ###
(See the R/qtl guide for information on some of these options - http://www.rqtl.org/manual/qtl-manual.pdf)
* trait_id (*required*) - ID for trait being mapped
* db (*required*) - DB name for trait above (Short_Abbreviation listed when you query for datasets)
* rqtl_method - hk (default) | ehk | em | imp | mr | mr-imp | mr-argmax ; Corresponds to the "method" option for the R/qtl scanone function.
* rqtl_model - normal (default) | binary | 2-part | np ; corresponds to the "model" option for the R/qtl scanone function
* num_perm - number of permutations; 0 by default
* control_marker - Name of marker to use as control; this relies on the user knowing the name of the marker they want to use as a covariate
* interval_mapping - Whether to use interval mapping; "false" by default
* pair_scan - *NYI* (not yet implemented)

Example query:
```
curl "https://genenetwork.org/api/v_pre1/mapping?trait_id=1418701_at&db=HC_M2_0606_P&method=rqtl&num_perm=100"
```

Some combinations of methods/models may not make sense. Refer to the R/qtl manual for any questions on its use (specifically the scanone function in this case).
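
Because the `rqtl_method` and `rqtl_model` options only accept the values listed above, a client can check them before sending the request. The Python below is a sketch of that idea; `rqtl_mapping_url` is a hypothetical helper, and it only checks that each value is one of the listed options, not whether a given method/model combination is meaningful.

```python
from urllib.parse import urlencode

# Base URL taken from the curl examples in this document.
BASE_URL = "https://genenetwork.org/api/v_pre1"

# Option values copied from the parameter list above.
RQTL_METHODS = ("hk", "ehk", "em", "imp", "mr", "mr-imp", "mr-argmax")
RQTL_MODELS = ("normal", "binary", "2-part", "np")


def rqtl_mapping_url(trait_id, db, rqtl_method="hk", rqtl_model="normal",
                     num_perm=0):
    """Build an R/qtl mapping query URL after validating the options."""
    if rqtl_method not in RQTL_METHODS:
        raise ValueError(f"unknown rqtl_method: {rqtl_method!r}")
    if rqtl_model not in RQTL_MODELS:
        raise ValueError(f"unknown rqtl_model: {rqtl_model!r}")
    if num_perm < 0:
        raise ValueError("num_perm must not be negative")
    params = {
        "trait_id": trait_id,
        "db": db,
        "method": "rqtl",
        "rqtl_method": rqtl_method,
        "rqtl_model": rqtl_model,
        "num_perm": num_perm,
    }
    return f"{BASE_URL}/mapping?{urlencode(params)}"


print(rqtl_mapping_url("1418701_at", "HC_M2_0606_P", num_perm=100))
```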

## Calculate Correlation ##
Currently only Sample and Tissue correlations are implemented.

This query currently takes the following parameters (though more will be added):
* trait_id (*required*) - ID for trait used for correlation
* db (*required*) - DB name for the trait above (this is the Short_Abbreviation listed when you query for datasets)
* target_db (*required*) - Target DB name to be correlated against
* type - sample (default) | tissue
* method - pearson (default) | spearman
* return_count - Number of results to return (default = 500)

Example query:
```
curl "https://genenetwork.org/api/v_pre1/correlation?trait_id=1427571_at&db=HC_M2_0606_P&target_db=BXDPublish&type=sample&return_count=100"
[ { "#_strains": 6, "p_value": 0.004804664723032055, "sample_r": -0.942857142857143, "trait": 20511 }, { "#_strains": 6, "p_value": 0.004804664723032055, "sample_r": -0.942857142857143, "trait": 20724 }, { "#_strains": 12, "p_value": 1.8288943424888848e-05, "sample_r": -0.9233615170820528, "trait": 13536 }, { "#_strains": 7, "p_value": 0.006807187408935392, "sample_r": 0.8928571428571429, "trait": 10157 }, { "#_strains": 7, "p_value": 0.006807187408935392, "sample_r": -0.8928571428571429, "trait": 20392 }, ... ]
```
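
Correlations based on only a handful of strains can look strong by chance, so it can be useful to filter the results before ranking them. The sketch below applies this to the five records shown above; `strongest_correlations` is an illustrative name, and the threshold of 7 strains is an arbitrary value chosen for the example, not a recommendation from GN.

```python
# The five records shown in the example response above.
results = [
    {"#_strains": 6, "p_value": 0.004804664723032055,
     "sample_r": -0.942857142857143, "trait": 20511},
    {"#_strains": 6, "p_value": 0.004804664723032055,
     "sample_r": -0.942857142857143, "trait": 20724},
    {"#_strains": 12, "p_value": 1.8288943424888848e-05,
     "sample_r": -0.9233615170820528, "trait": 13536},
    {"#_strains": 7, "p_value": 0.006807187408935392,
     "sample_r": 0.8928571428571429, "trait": 10157},
    {"#_strains": 7, "p_value": 0.006807187408935392,
     "sample_r": -0.8928571428571429, "trait": 20392},
]


def strongest_correlations(records, min_strains=7):
    """Drop results with few strains, then rank by |sample_r| (descending)."""
    kept = [r for r in records if r["#_strains"] >= min_strains]
    # sorted() is stable, so ties keep their original order.
    return sorted(kept, key=lambda r: abs(r["sample_r"]), reverse=True)


for r in strongest_correlations(results):
    print(r["trait"], round(r["sample_r"], 3), r["#_strains"])
```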