# Add mouse data-set

## Tags

* assigned: bonfacem
* priority: medium
* status: stalled

## Description

Klaus recently shared some mouse data with us.  Here's a snippet of what it looks like:

```
    mouse_ID   BW      day       strain  sex  inf_dose  animal.no.
241 CC001_m_1  100     perc_d00  CC001   m    10 FFU    1
242 CC001_m_1  98.56   perc_d03  CC001   m    10 FFU    1
243 CC001_m_1  NA      perc_d13  CC001   m    10 FFU    1
244 CC001_m_1  NA      perc_d12  CC001   m    10 FFU    1
245 CC001_m_1  NA      perc_d10  CC001   m    10 FFU    1
246 CC001_m_1  100.92  perc_d04  CC001   m    10 FFU    1
247 CC001_m_1  98.08   perc_d01  CC001   m    10 FFU    1
248 CC001_m_1  76.21   perc_d08  CC001   m    10 FFU    1
249 CC001_m_1  93.22   perc_d05  CC001   m    10 FFU    1
250 CC001_m_1  90.42   perc_d06  CC001   m    10 FFU    1
```

I've been working on adding the above to the GN2 database.  The current challenge is that this is time series data: for the same strain, we have values indexed by time.  The data is also tagged by "animal.no." and "sex", so a male "CC001" with animal number 1 gets the ID "CC001_m_1".  Storing time series data is a problem that Rob and Suheeta have highlighted in the past.  How do we go about doing this?  Currently, GN2 stores averages of such data.  That doesn't work well here: AFAIU, we have no concept of "animal.no.".  I suggest we use lmdb to store this data and work out a way to integrate it with the rest of GN2, so that we can display this info on the main page.
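
To make the storage question concrete, here is a minimal sketch of one possible key layout for per-animal time series.  Everything in it is an assumption for discussion, not something GN2 does today: the key format, the helper names, the "f" sex code, and the choice to skip "NA" values.  A plain dict stands in for the key-value store; with lmdb the same byte keys and values would go through a write transaction instead.

```python
import re
import struct

# Assumed ID layout, based on the sample rows: <strain>_<sex>_<animal.no.>
# Only "m" appears in the sample; "f" is an assumption.
MOUSE_ID = re.compile(r"^(?P<strain>.+)_(?P<sex>[mf])_(?P<animal>\d+)$")
# Assumed day layout, based on the sample rows: perc_dNN
DAY = re.compile(r"^perc_d(?P<day>\d+)$")


def parse_row(mouse_id, bw, day):
    """Split one record into (strain, sex, animal_no, day_no, weight or None)."""
    id_match = MOUSE_ID.match(mouse_id)
    day_match = DAY.match(day)
    if id_match is None or day_match is None:
        raise ValueError(f"unexpected record: {mouse_id!r}, {day!r}")
    weight = None if bw == "NA" else float(bw)
    return (id_match["strain"], id_match["sex"], int(id_match["animal"]),
            int(day_match["day"]), weight)


def make_key(strain, sex, animal_no, day_no):
    """Hypothetical key format; zero-padding keeps byte order equal to day order."""
    return f"{strain}:{sex}:{animal_no:04d}:{day_no:03d}".encode()


def store(db, rows):
    """Write each non-missing measurement as an 8-byte little-endian double."""
    for mouse_id, bw, day in rows:
        strain, sex, animal_no, day_no, weight = parse_row(mouse_id, bw, day)
        if weight is None:
            continue  # assumption: missing ("NA") values are simply not stored
        db[make_key(strain, sex, animal_no, day_no)] = struct.pack("<d", weight)


def series(db, strain, sex, animal_no):
    """Return [(day_no, weight), ...] for one animal, ordered by day."""
    prefix = f"{strain}:{sex}:{animal_no:04d}:".encode()
    return [(int(key[len(prefix):].decode()), struct.unpack("<d", value)[0])
            for key, value in sorted(db.items()) if key.startswith(prefix)]
```

Because all keys for one animal share a prefix and sort by day, an ordered key-value store could read a whole series with a single range scan.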

Here's how to extract the data from the provided data set:

Extract the data for d1, d2 and d3 separately, and use each day as a separate data set.

```
> unique(dat2$day)
[1] d0 d1 d2 d3
Levels: d0 d1 d2 d3

> table(dat2$day)
d0 d1 d2 d3 
44 44 44 44

> dat10 <- subset(dat2, day == "d1")
> dat10
    mouse_ID        BW day
45  BXD 50_3  94.85000  d1
46  BXD 64_1  96.36000  d1
47  BXD 29_1  96.85000  d1
48  BXD 40_3  97.69000  d1
49  BXD 49_2  97.06000  d1
50   BXD 6_5  89.03000  d1
[...]
```
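
The same per-day split can be sketched in Python, which may be handier on the GN2 side.  This is only an illustration: the function name and the (mouse_ID, BW, day) tuple layout are assumptions, not part of any existing script.

```python
from collections import defaultdict


def split_by_day(records):
    """Group (mouse_id, bw, day) records into one {mouse_id: bw} mapping per day."""
    per_day = defaultdict(dict)
    for mouse_id, bw, day in records:
        per_day[day][mouse_id] = bw
    return dict(per_day)


# Two rows taken from the R output above.
records = [
    ("BXD 50_3", 94.85, "d1"),
    ("BXD 64_1", 96.36, "d1"),
]
datasets = split_by_day(records)
```

Each value of `datasets` then corresponds to one per-day data set, like `dat10` above.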

Some comments from Zach:

```
I think that Klaus is referring to what we store in GN as phenotype traits.
So you'd have a separate trait page for each time series "step".

He's probably referring to these traits:
Day 1 - https://genenetwork.org/show_trait?trait_id=13005&dataset=BXDPublish
Day 2 - https://genenetwork.org/show_trait?trait_id=13006&dataset=BXDPublish
And continues from there - you can see them with the following search (with
a few other random traits mixed in; I first just searched for "Schughart"
in the global search) -
https://genenetwork.org/gsearch?type=phenotype&terms=H1N1

You're correct about there not being a (good) way to deal with something
like animal number currently. The way we deal with something like that is
to create a new group, with the "strain list" being a list of individuals.
```
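
To illustrate the difference between the two approaches, here is a rough sketch that derives both a list of individuals (what a new group's "strain list" would hold) and per-strain averages (what we store today) from the same rows.  The function names and the (mouse_ID, strain, BW) tuple layout are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean


def individual_samples(rows):
    """Unique mouse IDs in first-seen order: one sample per individual."""
    return list(dict.fromkeys(mouse_id for mouse_id, _strain, _bw in rows))


def strain_means(rows):
    """Average BW per strain, ignoring missing values (None)."""
    by_strain = defaultdict(list)
    for _mouse_id, strain, bw in rows:
        if bw is not None:
            by_strain[strain].append(bw)
    return {strain: mean(values) for strain, values in by_strain.items()}
```

Averaging collapses every animal of a strain into one value, while the individuals list keeps each animal as its own sample.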

Current script that enters data into GN2:

=> https://github.com/genenetwork/dump-genenetwork-database/blob/master/csv-dump.scm

Remaining tasks:

* [ ] Share latest changes.
* [ ] Test the script in a copy of the production database.
* [ ] Make this more generic.
* [ ] Integrate with GN2.

The last two tasks should probably be broken down into smaller issues.