summaryrefslogtreecommitdiff
path: root/issues/fetch-sample-data.gmi
blob: 6b1621e717f36d72ddde7ca71db5fc0adb7efb01 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
#

### Tags

* assigned: bonifacem
* keywords: sample data, GN2, GN3
* status: closed
* priority: medium

#### Fetch all Sample Data

Currently we fetch all the sample data using this function:

```
def get_trait_csv_sample_data(conn: Any,
                              trait_name: int, phenotype_id: int):
    """Fetch a trait and return it as a csv string"""
    sql = ("SELECT DISTINCT Strain.Id, PublishData.Id, Strain.Name, "
           "PublishData.value, "
           "PublishSE.error, NStrain.count FROM "
           "(PublishData, Strain, PublishXRef, PublishFreeze) "
           "LEFT JOIN PublishSE ON "
           "(PublishSE.DataId = PublishData.Id AND "
           "PublishSE.StrainId = PublishData.StrainId) "
           "LEFT JOIN NStrain ON (NStrain.DataId = PublishData.Id AND "
           "NStrain.StrainId = PublishData.StrainId) WHERE "
           "PublishXRef.InbredSetId = PublishFreeze.InbredSetId AND "
           "PublishData.Id = PublishXRef.DataId AND "
           "PublishXRef.Id = %s AND PublishXRef.PhenotypeId = %s "
           "AND PublishData.StrainId = Strain.Id Order BY Strain.Name")
    csv_data = ["Strain Id,Strain Name,Value,SE,Count"]
    publishdata_id = ""
    with conn.cursor() as cursor:
        cursor.execute(sql, (trait_name, phenotype_id,))
        for record in cursor.fetchall():
            (strain_id, publishdata_id,
             strain_name, value, error, count) = record
            csv_data.append(
                ",".join([str(val) if val else "x"
                          for val in (strain_id, strain_name,
                                      value, error, count)]))
    return f"# Publish Data Id: {publishdata_id}\n\n" + "\n".join(csv_data)
```


Sometimes there are situations where we want to display sample data
that isn't part of the "main" sample list (which is the one in the
.geno file).  In this page[0][1] you see the samples split into
"Primary" and "Other." In the code(genenetwork2), it does a DB query
for all sample data and just puts all the sample data that isn't in
the "primary" list (as taken from the .geno file) into the Other
table.

It's ridiculous how we pull the sample list from the .geno file. In
the past this was done using qtlreaper's Python module to read in the
.geno file - the functions in
wqflask/maintenance/get_group_samplelists were written when we were
removing all use of the old qtlreaper.

Parents/F1s are also a situation where we need to come up with some
alternative way to store them, and ideally in a way that is more
flexible and allows for other situations (like 4-way crosses -
currently the code always just assumes 2 parents and 2 F1s). They're
currently hard-coded in webqtlUtil.py.

[0]

[1] * Tasks, Guix
:ARCHIVE: /home/bonface/Self/org/archive/2021_guix.org_archive::

#### Wed 25 May 2022 21:59:52 EAT

Closing this because this issue is stale. Also, the sample data is fetched just fine.