blob: de66143fd1cc5e56e4fb3be86d7d54e0ad67eea2 (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
|
# Add mouse data-set
## Tags
* assigned: bonfacem
* priority: medium
* status: stalled
### Description
Klaus' recently shared with us some mouse data. Here's a snip of how that looks like:
```
mouse_ID BW day strain sex inf_dose animal.no.
241 CC001_m_1 100 perc_d00 CC001 m 10 FFU 1
242 CC001_m_1 98.56 perc_d03 CC001 m 10 FFU 1
243 CC001_m_1 NA perc_d13 CC001 m 10 FFU 1
244 CC001_m_1 NA perc_d12 CC001 m 10 FFU 1
245 CC001_m_1 NA perc_d10 CC001 m 10 FFU 1
246 CC001_m_1 100.92 perc_d04 CC001 m 10 FFU 1
247 CC001_m_1 98.08 perc_d01 CC001 m 10 FFU 1
248 CC001_m_1 76.21 perc_d08 CC001 m 10 FFU 1
249 CC001_m_1 93.22 perc_d05 CC001 m 10 FFU 1
250 CC001_m_1 90.42 perc_d06 CC001 m 10 FFU 1
```
I've been working on adding the above to the GN2 database. The current challenge I have is that this data is Time Series---for the same strain, we have values indexed by time. Also, we tag data by "animal.no." and "sex". So for a male version of "CC001" with animal number 1, we have "CC001_m_1". This is a problem---storing TS data---that Rob/Suheeta have highlighted in the past. How do we go about doing this? Currently, in GN2 we store averages of the aforementioned data. This doesn't work out well for us: we don't have, AFAIU, a concept for "animal.no." I would suggest we use lmdb to store this data, and work out a way to integrate it with the rest of GN2 - so that we display this info on the main page.
Here's how to extract the data from the provided data-set:
Just extract the data for d1, d2, d3 separately and use each day as a separate data set.
```
> unique(dat2$day)
[1] d0 d1 d2 d3
Levels: d0 d1 d2 d3
> table(dat2$day)
d0 d1 d2 d3
44 44 44 44
dat10 <- subset(dat2,dat2$day=="d1")
dat10
> dat10
mouse_ID BW day
45 BXD 50_3 94.85000 d1
46 BXD 64_1 96.36000 d1
47 BXD 29_1 96.85000 d1
48 BXD 40_3 97.69000 d1
49 BXD 49_2 97.06000 d1
50 BXD 6_5 89.03000 d1
[...]
```
Some comments from Zach:
```
I think that Klaus is referring to what we store in GN as phenotype traits.
So you'd have a separate trait page for each time series "step".'
He's probably referring to these traits:
Day 1 - https://genenetwork.org/show_trait?trait_id=13005&dataset=BXDPublish
Day 2 - https://genenetwork.org/show_trait?trait_id=13006&dataset=BXDPublish
And continues from there - you can see them with the following search (with
a few other random traits mixed in; I first just searched for "Schughart"
in the global search) -
https://genenetwork.org/gsearch?type=phenotype&terms=H1N1
You're correct about there not being a (good) way to deal with something
like animal number currently. The way we deal with something like that is
to create a new group, with the "strain list" being a list of individuals.
```
Current script that enters data into gn2:
=> https://github.com/genenetwork/dump-genenetwork-database/blob/master/csv-dump.scm
Remaining tasks:
* [ ] Share latest changes.
* [ ] Test the script in a copy of the production database.
* [ ] Make this more generic
* [ ] Integrate with GN2
The last 2 tasks will/should probably be broken down into smaller issues.
|