api/upload.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92

# GeneNetwork upload API

## API

The REST API will accept gzipped tar ball which contains multiple
files:

1. A metadata file (JSON)
2. A phenotype file

On upload to the API the data gets unpacked into files in a temporary
directory. Next we compute an MD5SUM over the directory and it gets
renamed to the hash value. So, after unpacking, first we have

```
GENENETWORK_UPLOAD_DIR/tempdir/metadata.json
                              /phenotypes.tsv
```

The upload directory is set in the GN2
[config](https://github.com/genenetwork/genenetwork2/blob/testing/etc/default_settings.py). After
computing the HASH we rename it to

```
GENENETWORK_UPLOAD_DIR/e524ee7ea9b1f452c58abe560960a60f/metadata.json
                                                       /phenotypes.tsv
```

On success the upload REST API returns this HASH to the invoker:

```
{
  "status": 0,
  "token": "e524ee7ea9b1f452c58abe560960a60f"
}
```

On error the result should include the error output

```
{
  "status": 128
  "error": "gzip failed to unpack file"
}
```

### Metadata

The metadata file is a simple JSON file containing

```js
{
  "title": "This is my dataset for testing the REST API",
  "description": "Longer description"
  "date": "20210127",
  "authors": [
    "R.W. Williams"
  ],
  "cross": "BXD"
}
```

### Phenotype file

The phenotype file is a tab delimited 'spreadsheet' where the columns
contain phenotypes and the rows contain individuals. Example

```
      pheno
BXD01 5.060
BXD02 307.866
BXD03 185.400
BXD04 380.729
BXD05 150.066
BXD06 94.483
BXD07 438.700
BXD08 NA
BXD09 130.457
BXD10 184.900
BXD11 223.400
BXD12 167.250
BXD13 313.950
BXD14 219.383
BXD15 277.800
BXD16 6.467
BXD17 364.967
BXD18 132.016
BXD19 468.133
BXD20 309.500
```

Missing data are 'NA'. Multiple pheno columns are possible.