author BonfaceKilz 2021-08-09 10:15:06 +0300
committer BonfaceKilz 2021-08-09 10:15:06 +0300
commit 11b2e44d5d638427c51cdcaa423fcba7794721cf
tree 135151e4eefa1ecec6c13a5c2cdcb6f12a7d50bb
parent 0e18e0ca70e9ab6664d6e8378ad5c69b2f2e05bf
Create issue on fetching sample data from a given trait
-rw-r--r-- issues/fetch-sample-data.gmi 61
1 files changed, 61 insertions, 0 deletions
diff --git a/issues/fetch-sample-data.gmi b/issues/fetch-sample-data.gmi
new file mode 100644
index 0000000..c467925
--- /dev/null
+++ b/issues/fetch-sample-data.gmi
@@ -0,0 +1,61 @@
+# Fetch all Sample Data
+
+
+Currently we fetch all the sample data using this function:
+
+```
+def get_trait_csv_sample_data(conn: Any,
+                              trait_name: int, phenotype_id: int):
+    """Fetch a trait and return it as a csv string"""
+    sql = ("SELECT DISTINCT Strain.Id, PublishData.Id, Strain.Name, "
+           "PublishData.value, "
+           "PublishSE.error, NStrain.count FROM "
+           "(PublishData, Strain, PublishXRef, PublishFreeze) "
+           "LEFT JOIN PublishSE ON "
+           "(PublishSE.DataId = PublishData.Id AND "
+           "PublishSE.StrainId = PublishData.StrainId) "
+           "LEFT JOIN NStrain ON (NStrain.DataId = PublishData.Id AND "
+           "NStrain.StrainId = PublishData.StrainId) WHERE "
+           "PublishXRef.InbredSetId = PublishFreeze.InbredSetId AND "
+           "PublishData.Id = PublishXRef.DataId AND "
+           "PublishXRef.Id = %s AND PublishXRef.PhenotypeId = %s "
+           "AND PublishData.StrainId = Strain.Id Order BY Strain.Name")
+    csv_data = ["Strain Id,Strain Name,Value,SE,Count"]
+    publishdata_id = ""
+    with conn.cursor() as cursor:
+        cursor.execute(sql, (trait_name, phenotype_id,))
+        for record in cursor.fetchall():
+            (strain_id, publishdata_id,
+             strain_name, value, error, count) = record
+            csv_data.append(
+                ",".join([str(val) if val else "x"
+                          for val in (strain_id, strain_name,
+                                      value, error, count)]))
+    return f"# Publish Data Id: {publishdata_id}\n\n" + "\n".join(csv_data)
+```
+
+
+Sometimes we want to display sample data that isn't part
+of the "main" sample list (the one in the .geno file).  On
+this page[0][1] the samples are split into "Primary" and
+"Other".  The code (genenetwork2) runs a DB query for all
+sample data and puts every sample that isn't in the
+"primary" list (as taken from the .geno file) into the
+"Other" table.
+
+It's ridiculous how we pull the sample list from the .geno file. In
+the past this was done using qtlreaper's Python module to read in the
+.geno file - the functions in
+wqflask/maintenance/get_group_samplelists were written when we were
+removing all use of the old qtlreaper.
+
+Parents/F1s are another case where we need an alternative way to
+store them, ideally one that is more flexible and allows for
+other situations (like 4-way crosses; the code currently always
+assumes 2 parents and 2 F1s).  They're currently hard-coded in
+webqtlUtil.py.
+
+[0]
+
+[1] * Tasks, Guix
+:ARCHIVE: /home/bonface/Self/org/archive/2021_guix.org_archive::
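
The "Primary"/"Other" split described in the issue can be illustrated with a minimal sketch. This is not the genenetwork2 implementation: the function name `split_samples`, the `(strain_name, value)` row shape, and the example strain names are assumptions made for illustration only. It assumes the primary sample names have already been read from the .geno file by some other means.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical row shape: (strain_name, value). The real query returns more columns.
Row = Tuple[str, Optional[float]]


def split_samples(rows: Iterable[Row],
                  primary_names: Iterable[str]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (primary, other) by membership in the primary sample list."""
    primary_set = set(primary_names)  # set lookup keeps the split linear in len(rows)
    primary: List[Row] = []
    other: List[Row] = []
    for row in rows:
        # row[0] is the strain name; anything not in the primary list goes to "other"
        (primary if row[0] in primary_set else other).append(row)
    return primary, other


if __name__ == "__main__":
    # Example strain names are illustrative, not taken from a real .geno file.
    rows = [("BXD1", 10.5), ("C57BL/6J", 9.0), ("BXD2", None)]
    primary, other = split_samples(rows, ["BXD1", "BXD2"])
    print("Primary:", primary)
    print("Other:", other)
```

Both output lists keep the order of the input rows, so the `ORDER BY Strain.Name` ordering from the query would carry over to each table.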