From e1a4702576ee4958c94ceecc5223d9906b85ab16 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Wed, 28 Jun 2023 05:21:59 -0500
Subject: Documenting and testing existing API

---
 api/README.md                    |   1 +
 api/alternative-API-structure.md |  23 ++--
 api/questions-to-ask-GN.md       | 287 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 298 insertions(+), 13 deletions(-)
 create mode 100644 api/questions-to-ask-GN.md

(limited to 'api')

diff --git a/api/README.md b/api/README.md
index 6bdc974..42d549d 100644
--- a/api/README.md
+++ b/api/README.md
@@ -2,4 +2,5 @@

 This document is an index to the GeneNetwork API information.

+- [Questions API](./question-to-ask-GN.md)
 - [Upload API](./upload.md)
diff --git a/api/alternative-API-structure.md b/api/alternative-API-structure.md
index 483a0e4..df65b9b 100644
--- a/api/alternative-API-structure.md
+++ b/api/alternative-API-structure.md
@@ -23,11 +23,11 @@
 - There may be several alternative genotypes for a species (e.g. mm9, mm10 or GRCm39 for mouse).
 - These may also be served in different formats. The content and the format are conceptually different, so they should be handled as such.
-  - `http://genenetwork.org/api/v_pre1/genotypes/bimbam/BXD` Is wrong. 
+  - `http://genenetwork.org/api/v_pre1/genotypes/bimbam/BXD` Is wrong.
   - `http://genenetwork.org/api/v_pre1/genotypes/BXD.bimbam` Is better.
   - `http://genenetwork.org/api/v_pre1/mouse/BXD/genotypes/mm10.bimbam` [**! not a real URL**] Is ideal.
 - The genotypes are conceptually at the same level as datasets, and could be seen as a special type of dataset.
- 
+
 ### Datasets
 - Again, why are these not nested under 'population'?
@@ -48,7 +48,7 @@ Does this need to be called 'sample_data" rather than just 'data'? This ma (?) make it easier to handle cases where different subsets of strains are present in different datasets.
 - Keep in mind that 'data' is actually processed results and not raw data.
-There seems to be some interest in actually storing or access raw data (CEL/fasta/video) files. 
+There seems to be some interest in actually storing or access raw data (CEL/fasta/video) files.
 This would necessitate an even deeper level (e.g. 'data/raw') that should also be accessed through this conceptual hierarchy.

 ### Fetching
@@ -110,7 +110,7 @@ if(RCurl::url.exists("http://genenetwork.org/api/v_pre1/species/mouse")){
 do.call("rbind", species_info)
 ```

-### Fetch the population listing for a species 
+### Fetch the population listing for a species
 Proposed URL: "http://genenetwork.org/api/v_pre1/mouse/populations"

 The next level of grouping within each species is the population.
@@ -129,17 +129,17 @@ do.call("rbind", population)[1:20, ]

 **! It is not clear whether this level of grouping currently exists for all species.**

-### Fetch info for a population 
+### Fetch info for a population
 Proposed URL: "http://genenetwork.org/api/v_pre1/mouse/bxd"

 **! Is this currently possible?**

-### Fetch genotype listing 
+### Fetch genotype listing
 Proposed URL: "http://genenetwork.org/api/v_pre1/mouse/bxd/genotypes"

 **! Is this currently possible?**

-### Fetch genotypes for a population 
+### Fetch genotypes for a population
 Proposed URL: "http://genenetwork.org/api/v_pre1/mouse/bxd/mm10.geno"

 Is it necessary for an API to provide data in every format possible?
@@ -202,7 +202,7 @@ Conceptually, that means that dataset metadata should exist that describe the ph
 It is unclear what useful information this could contain though.
 **! Is this currently possible?**

-*The following code returns info for a single trait. This command should usefully return one such row of results for every trait in the phenotype database. This would enable local searching and selection of traits of interest.* 
+*The following code returns info for a single trait. This command should usefully return one such row of results for every trait in the phenotype database.
This would enable local searching and selection of traits of interest.*

 ```{r echo = FALSE}
 ```
@@ -213,7 +213,7 @@ Proposed URL: "http://genenetwork.org/api/v_pre1/mouse/bxd/phenotypes/traits"
 Because of the slightly different information attached to phenotypes, the info returned by this command would most usefully be metadata on all the individual traits and the studies they are associated with.
 **! Is this currently possible?**

-*The following code returns info for a single trait. This command should usefully return one such row of results for every trait in the phenotype database. This would enable local searching and selection of traits of interest.* 
+*The following code returns info for a single trait. This command should usefully return one such row of results for every trait in the phenotype database. This would enable local searching and selection of traits of interest.*

 ```{r echo = FALSE}
 if(RCurl::url.exists("http://genenetwork.org/api/v_pre1/dataset/bxd/10001")){
@@ -398,7 +398,4 @@ As examples:
 - "BXD96" in older publications is now present in GeneNetwork as "BXD48a". Data from this strain will be dropped if a re-analysis is attempted.

 ### Upload API
-The ability to upload temporary datasets programmatically and analyse them directly using the API would be a hugely useful workflow. 
-
-
-
+The ability to upload temporary datasets programmatically and analyse them directly using the API would be a hugely useful workflow.
diff --git a/api/questions-to-ask-GN.md b/api/questions-to-ask-GN.md
new file mode 100644
index 0000000..3ad6c09
--- /dev/null
+++ b/api/questions-to-ask-GN.md
@@ -0,0 +1,287 @@
+# Questions to ask GN
+
+We are asking GN users to list questions here that should return information through the GN APIs. We will add information on (proposed) API endpoints.
+
+The GN API is used by the gnapi R package
+
+https://github.com/kbroman/gnapi
+
+
+
+## Is it live?
+
+```
+curl https://genenetwork.org/api/v_pre1/
+{"hello":"world"}
+```
+
+## Return species
+
+* Current:
+
+```
+curl -s https://genenetwork.org/api/v_pre1/species|jq '.[0:2]'
+[
+  {
+    "FullName": "Mus musculus",
+    "Id": 1,
+    "Name": "mouse",
+    "TaxonomyId": 10090
+  },
+  {
+    "FullName": "Rattus norvegicus",
+    "Id": 2,
+    "Name": "rat",
+    "TaxonomyId": 10116
+  }
+]
+```
+
+## Return available groups/populations
+
+* Current: https://genenetwork.org/api/v_pre1/groups/mouse
+* Proposed: https://genenetwork.org/api/v_pre1/mouse/groups (?)
+
+```sh
+curl -s http://genenetwork.org/api/v_pre1/groups/mouse |jq '.[8:10]' -M
+```
+
+```js
+[
+  {
+    "DisplayName": "B6D2F2 OHSU Striatum",
+    "FullName": "B6D2F2 OHSU Striatum",
+    "GeneticType": "intercross",
+    "Id": 12,
+    "MappingMethodId": "1",
+    "Name": "BDF2-2005",
+    "SpeciesId": 1,
+    "public": 2
+  },
+  {
+    "DisplayName": "Mouse Diversity Panel",
+    "FullName": "Mouse Diversity Panel",
+    "GeneticType": "None",
+    "Id": 15,
+    "MappingMethodId": "1",
+    "Name": "MDP",
+    "SpeciesId": 1,
+    "public": 2
+  }
+]
+```
+
+## Return cross info
+
+There is a bug in this one. It is supposed to return something like
+
+```
+{"species_id":1,"species":"mouse","mapping_method_id":1,"group_id":1,"group":"BXD","genetic_type":"riset","chr_info":[["1",197195432],["2",181748087],["3",159599783],["4",155630120],["5",152537259],["6",149517037],["7",152524553],["8",131738871],["9",124076172],["10",129993255],["11",121843856],["12",121257530],["13",120284312],["14",125194864],["15",103494974],["16",98319150],["17",95272651],["18",90772031],["19",61342430],["X",166650296]]}
+```
+
+Errors
+
+```
+curl -s https://genenetwork.org/api/v_pre1/group/BXD
+File "/home/gn2/production/gene/wqflask/wqflask/api/router.py", line 157, in get_group_info
+    group = results.fetchone()
+AttributeError: 'tuple' object has no attribute 'fetchone'
+```
+
+## Return Genotypes
+
+Return all genotypes for a specific population.
+
+* Current https://genenetwork.org/api/v_pre1/genotypes/HSNIH-Palmer_true.geno
+
+without _true we get a different and incomplete file.
+
+The API code is at
+https://github.com/genenetwork/genenetwork2/blob/8bfb79da9b8dc0591532939dca97e0fa9c06c5d2/wqflask/wqflask/api/router.py#L803
+
+You can see it simply returns a file - so we have a geno file named HSNIH-Palmer_true.geno and that is what it returns.
+
+According to above code we can get .geno, .csv, .rqtl2, .bimbam, etc. as long as the file exists.
+
+Standardization of genotype data format would be helpful. Alternatively, a query that tells the user what genotype formats are available.
+
+* Proposed: return available files
+
+## Return trait metadata
+
+Return trait metadata such as probeset info or other "trait covariates" for the high-dimensional traits. There is also the information on how the data was collected or processed.
+
+* Current: through SPARQL
+* Proposed: https://genenetwork.org/api/v_pre1/mouse/bxd/HC_M2_0606_P/1436869_at
+
+## List datasets
+
+```
+curl -s https://genenetwork.org/api/v_pre1/datasets/BXD|jq ".[0:2]"
+[
+  {
+    "AvgID": 1,
+    "CreateTime": "Fri, 01 Aug 2003 00:00:00 GMT",
+    "DataScale": "log2",
+    "FullName": "Brain U74Av2 08/03 MAS5",
+    "Id": 1,
+    "Long_Abbreviation": "BXDMicroArray_ProbeSet_August03",
+    "ProbeFreezeId": 337,
+    "ShortName": "Brain U74Av2 08/03 MAS5",
+    "Short_Abbreviation": "Br_U_0803_M",
+    "confidentiality": 0,
+    "public": 0
+  },
+  {
+    "AvgID": 1,
+    "CreateTime": "Sun, 01 Jun 2003 00:00:00 GMT",
+    "DataScale": "log2",
+    "FullName": "UTHSC Brain mRNA U74Av2 (Jun03) MAS5",
+    "Id": 2,
+    "Long_Abbreviation": "BXDMicroArray_ProbeSet_June03",
+    "ProbeFreezeId": 10,
+    "ShortName": "Brain U74Av2 06/03 MAS5",
+    "Short_Abbreviation": "Br_U_0603_M",
+    "confidentiality": 0,
+    "public": 0
+  }
+]
+```
+
+## Get information on a dataset
+
+Use the dataset name
+
+```
+curl -s https://genenetwork.org/api/v_pre1/dataset/KIN_YSM_HIP_0711.json|jq
+{
+  "confidential": 0,
+
"data_scale": "log2", + "dataset_type": "mRNA expression", + "full_name": "Human Hippocampus Affy Hu-Exon 1.0 ST (Jul11) Quantile", + "id": 337, + "name": "KIN_YSM_HIP_0711", + "public": 1, + "short_name": "KIN/YSM Human HIP Affy Hu-Exon 1.0 ST (Jul11) Quantile", + "tissue": "Hippocampus mRNA", + "tissue_id": 9 +} +``` + +Use the ProbeFreezeId above (correct?) + +``` +curl -s https://genenetwork.org/api/v_pre1/dataset/10.json|jq +{ + "confidential": 0, + "data_scale": "log2", + "dataset_type": "mRNA expression", + "full_name": "Eye M430v2 No Mutant/Mutant (Aug12) RMA", + "id": 10, + "name": "gn10", + "public": 1, + "short_name": "Eye M430v2 No Mutant/Mutant (Aug12) RMA", + "tissue": "Eye mRNA", + "tissue_id": 10 +} +``` + +## Return sample data + +Return all traits in a dataset. + +``` + curl -s https://genenetwork.org/api/v_pre1/traits/HC_U_0304_R.json|jq ".[0:2]" +[ + { + "Additive": 0.0803547619047631, + "Aliases": "T3g; Ctg3; Ctg-3", + "Chr": "9", + "Description": "CD3d antigen, gamma polypeptide", + "Id": 1, + "LRS": 12.2805314427567, + "Locus": "rsm10000021399", + "Mb": 44.970689, + "Mean": 8.14033666666667, + "Name": "100001_at", + "P-Value": 0.118, + "SE": 0.023595817125580502, + "Symbol": "Cd3g" + }, + { + "Additive": 0.0317847222222219, + "Aliases": "Intin3; Itih-3; AW108094", + "Chr": "14", + "Description": "inter-alpha trypsin inhibitor, heavy chain 3", + "Id": 2, + "LRS": 8.37046436677732, + "Locus": "rsm10000013342", + "Mb": 30.908741, + "Mean": 7.82323333333333, + "Name": "100002_at", + "P-Value": 0.561, + "SE": 0.011720083297057399, + "Symbol": "Itih3" + } +] +``` + +Return trait by probe + +``` +curl -s https://genenetwork.org/api/v_pre1/trait//HC_U_0304_R/104617_at.json|jq +{ + "additive": -0.0515941964285714, + "alias": "AI182092; 0610005C13Rik; 0610005C13Rik-205", + "chr": "7", + "description": "RIKEN cDNA 0610005C13 protein (high kidney and liver expression)_", + "id": 3690, + "locus": "rsm10000026692", + "lrs": 11.3682286632142, + "mb": 
45.568173,
+  "mean": 8.165623333333329,
+  "name": "104617_at",
+  "p_value": 0.666,
+  "se": 0.0170213555407089,
+  "symbol": "0610005C13Rik"
+}
+```
+
+## Return QTL
+
+Return the QTL (one or more) for a trait.
+
+* Proposed: http://genenetwork.org/api/v_pre1/mouse/bxd/HC_M2_0606_P/1436869_at/qtl
+
+There is also the question of more complex queries, such as with covariates.
+
+## Mapping
+
+Return mapping results through the API.
+
+```
+curl -s "https://genenetwork.org/api/v_pre1/mapping?trait_id=10015&db=BXDPublish&method=rqtl&limit_to=10"|jq ".[0:3]"
+[
+  {
+    "Mb": 3.010274,
+    "cM": 3.010274,
+    "chr": 1,
+    "lod_score": 0.116927114593807,
+    "name": "rs31443144"
+  },
+  {
+    "Mb": 3.492195,
+    "cM": 3.492195,
+    "chr": 1,
+    "lod_score": 0.117404479202946,
+    "name": "rs6269442"
+  },
+  {
+    "Mb": 3.511204,
+    "cM": 3.511204,
+    "chr": 1,
+    "lod_score": 0.11742354952122,
+    "name": "rs32285189"
+  }
+]
+```
-- 
cgit v1.2.3
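
The mapping query documented in the patch above could be scripted roughly as follows. This is a minimal Python sketch using only the standard library, not part of the patch, GeneNetwork, or gnapi. The helper names `mapping_url`, `fetch_json`, and `peak_marker` are hypothetical; the base URL and query parameters are taken from the `curl` call in the "Mapping" section, and the sample records are copied from its output so the example runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as used by the curl examples in the patch.
BASE_URL = "https://genenetwork.org/api/v_pre1"


def mapping_url(trait_id, db, method="rqtl", limit_to=None):
    """Build a mapping query URL (hypothetical helper, parameters from the patch)."""
    params = {"trait_id": trait_id, "db": db, "method": method}
    if limit_to is not None:
        params["limit_to"] = limit_to
    return f"{BASE_URL}/mapping?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode its JSON body. Needs network access; not called below."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def peak_marker(results):
    """Return the marker record with the highest lod_score."""
    return max(results, key=lambda marker: marker["lod_score"])


# Offline sample: the three records shown in the "Mapping" section of the patch.
SAMPLE = [
    {"Mb": 3.010274, "cM": 3.010274, "chr": 1,
     "lod_score": 0.116927114593807, "name": "rs31443144"},
    {"Mb": 3.492195, "cM": 3.492195, "chr": 1,
     "lod_score": 0.117404479202946, "name": "rs6269442"},
    {"Mb": 3.511204, "cM": 3.511204, "chr": 1,
     "lod_score": 0.11742354952122, "name": "rs32285189"},
]

if __name__ == "__main__":
    print(mapping_url(10015, "BXDPublish", limit_to=10))
    print(peak_marker(SAMPLE)["name"])
```

Against the live service, `peak_marker(fetch_json(mapping_url(10015, "BXDPublish")))` would do the same on real results, assuming the endpoint still returns a list of records with a `lod_score` field as shown in the patch.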