From 1c3f21477f536d413e3e13bced6af1878deba0ee Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Wed, 27 Jan 2021 11:42:30 +0000 Subject: Uploader --- api/README.md | 5 ++++ api/upload.md | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 97 insertions(+) create mode 100644 api/README.md create mode 100644 api/upload.md diff --git a/api/README.md b/api/README.md new file mode 100644 index 0000000..6bdc974 --- /dev/null +++ b/api/README.md @@ -0,0 +1,5 @@ +# GENENETWORK API + +This document is an index to the GeneNetwork API information. + +- [Upload API](./upload.md) diff --git a/api/upload.md b/api/upload.md new file mode 100644 index 0000000..62b828c --- /dev/null +++ b/api/upload.md @@ -0,0 +1,92 @@ +# GeneNetwork upload API + +## API + +The REST API will accept gzipped tar ball which contains multiple +files: + +1. A metadata file (JSON) +2. A phenotype file + +On upload to the API the data gets unpacked into files in a temporary +directory. Next we compute an MD5SUM over the directory and it gets +renamed to the hash value. So, after unpacking, first we have + +``` +GENENETWORK_UPLOAD_DIR/tempdir/metadata.json + /phenotypes.tsv +``` + +The upload directory is set in the GN2 +[config](https://github.com/genenetwork/genenetwork2/blob/testing/etc/default_settings.py). After +computing the HASH we rename it to + +``` +GENENETWORK_UPLOAD_DIR/e524ee7ea9b1f452c58abe560960a60f/metadata.json + /phenotypes.tsv +``` + +On success the upload REST API returns this HASH to the invoker: + +``` +{ + "status": 0, + "token": "e524ee7ea9b1f452c58abe560960a60f" +} +``` + +On error the result should include the error output + +``` +{ + "status": 128 + "error": "gzip failed to unpack file" +} +``` + +### Metadata + +The metadata file is a simple JSON file containing + +```js +{ + "title": "This is my dataset for testing the REST API", + "description": "Longer description" + "date": "20210127", + "authors": [ + "R.W. Williams" + ], + "cross": "BXD" +} +``` + +### Phenotype file + +The phenotype file is a tab delimited 'spreadsheet' where the columns +contain phenotypes and the rows contain individuals. Example + +``` + pheno +BXD01 5.060 +BXD02 307.866 +BXD03 185.400 +BXD04 380.729 +BXD05 150.066 +BXD06 94.483 +BXD07 438.700 +BXD08 NA +BXD09 130.457 +BXD10 184.900 +BXD11 223.400 +BXD12 167.250 +BXD13 313.950 +BXD14 219.383 +BXD15 277.800 +BXD16 6.467 +BXD17 364.967 +BXD18 132.016 +BXD19 468.133 +BXD20 309.500 +``` + +Missing data are 'NA'. Multiple pheno columns are possible. -- cgit v1.2.3