summaryrefslogtreecommitdiff
path: root/topics/gn-uploader/adding-species.gmi
blob: 8a57fa8987bb40150902d50c63e0f7a20ce749d2 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
## Tags

* status: open
* priority: medium
* type: documentation
* assigned: fredm, flisso, acenteno
* keywords: doc, documentation, platform, gn-uploader, uploader, adding data

## Description

We do not have a UI for adding new species. In this case, we need to add the data manually by running a query.

As an example, I document how I added the C. elegans species to the https://staging.genenetwork.org server:

We begin by looking at the schema for the Species table:

```
MariaDB [db_webqtl]> DESC Species;
+---------------+----------------------+------+-----+---------+----------------+
| Field         | Type                 | Null | Key | Default | Extra          |
+---------------+----------------------+------+-----+---------+----------------+
| Id            | smallint(5) unsigned | NO   | PRI | NULL    | auto_increment |
| SpeciesId     | int(5)               | YES  |     | NULL    |                |
| SpeciesName   | varchar(50)          | YES  |     | NULL    |                |
| Name          | char(30)             | NO   | MUL |         |                |
| MenuName      | char(50)             | YES  |     | NULL    |                |
| FullName      | char(100)            | NO   |     |         |                |
| Family        | varchar(50)          | YES  |     | NULL    |                |
| FamilyOrderId | smallint(6)          | YES  |     | NULL    |                |
| TaxonomyId    | int(11)              | YES  |     | NULL    |                |
| OrderId       | smallint(6)          | YES  |     | NULL    |                |
+---------------+----------------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)
```

From
=> https://gn1.genenetwork.org/webqtl/main.py?FormID=schemaShowPage#Species the schema docs
and looking at the data, we know the following about the fields:

* Id: Internal GeneNetwork species identifier
* SpeciesId: Same as 'Id'
* SpeciesName: A layman's name for the species, e.g. Mouse, Roundworm, Barley, etc.
* Name: Mostly the same as SpeciesName, but in all lowercase
* MenuName: A display name to show on the UI - computed for 'SpeciesName' and 'FullName'
* FullName: The species binomial name (scientific name) e.g. Mus musculus, Caenorhabditis elegans, etc.
* Family: A loose, largish grouping for the species, e.g. Vertebrates, Plants, etc.
* FamilyOrderId: ???
* TaxonomyId: The Taxonomy ID for the species on https://www.ncbi.nlm.nih.gov/
* OrderId: ???

### Preparing the Data

With this in mind, we can now get any missing data for the species.

To find the Taxonomy ID value:

=> https://www.ncbi.nlm.nih.gov/ Go to NCBI
* In the "All Databases" drop-down, select "Taxonomy"
* In the search box, enter the species' FullName, e.g. Caenorhabditis elegans
* Click "Search"
* If the species exists, a link with the species' binomial name shows up. Click it.
* Near the top of the page, you should see something like " Taxonomy ID: 6239 (for references in articles please use NCBI:txid6239)". The number after the "Taxonomy ID: " is what you want.

***TODO: Figure out what FamilyOrderId and OrderId are, and how to get the data. Update this page.***

### Adding the Species to the Database

We can now run the query to insert the data:

```
INSERT INTO Species(SpeciesName,Name,MenuName,FullName,Family,FamilyOrderId,TaxonomyId,OrderId)
VALUES ("Roundworm", "roundworm", "Roundworm (C. elegans)", "Caenorhabditis elegans", "Nematodes", NULL, "6239", NULL);
```

now update the "SpeciesId":

```
UPDATE Species SET SpeciesId=Id WHERE SpeciesName="Roundworm";
```