diff options
author | BonfaceKilz | 2021-08-09 10:15:06 +0300 |
---|---|---|
committer | BonfaceKilz | 2021-08-09 10:15:06 +0300 |
commit | 11b2e44d5d638427c51cdcaa423fcba7794721cf (patch) | |
tree | 135151e4eefa1ecec6c13a5c2cdcb6f12a7d50bb /issues | |
parent | 0e18e0ca70e9ab6664d6e8378ad5c69b2f2e05bf (diff) | |
download | gn-gemtext-11b2e44d5d638427c51cdcaa423fcba7794721cf.tar.gz |
Create issue on fetching sample data from a given trait
Diffstat (limited to 'issues')
-rw-r--r-- | issues/fetch-sample-data.gmi | 61 |
1 files changed, 61 insertions, 0 deletions
diff --git a/issues/fetch-sample-data.gmi b/issues/fetch-sample-data.gmi new file mode 100644 index 0000000..c467925 --- /dev/null +++ b/issues/fetch-sample-data.gmi @@ -0,0 +1,61 @@ +# +#### Fetch all Sample Data + +Currently we fetch all the sample data using this function: + +``` +def get_trait_csv_sample_data(conn: Any, + trait_name: int, phenotype_id: int): + """Fetch a trait and return it as a csv string""" + sql = ("SELECT DISTINCT Strain.Id, PublishData.Id, Strain.Name, " + "PublishData.value, " + "PublishSE.error, NStrain.count FROM " + "(PublishData, Strain, PublishXRef, PublishFreeze) " + "LEFT JOIN PublishSE ON " + "(PublishSE.DataId = PublishData.Id AND " + "PublishSE.StrainId = PublishData.StrainId) " + "LEFT JOIN NStrain ON (NStrain.DataId = PublishData.Id AND " + "NStrain.StrainId = PublishData.StrainId) WHERE " + "PublishXRef.InbredSetId = PublishFreeze.InbredSetId AND " + "PublishData.Id = PublishXRef.DataId AND " + "PublishXRef.Id = %s AND PublishXRef.PhenotypeId = %s " + "AND PublishData.StrainId = Strain.Id Order BY Strain.Name") + csv_data = ["Strain Id,Strain Name,Value,SE,Count"] + publishdata_id = "" + with conn.cursor() as cursor: + cursor.execute(sql, (trait_name, phenotype_id,)) + for record in cursor.fetchall(): + (strain_id, publishdata_id, + strain_name, value, error, count) = record + csv_data.append( + ",".join([str(val) if val else "x" + for val in (strain_id, strain_name, + value, error, count)])) + return f"# Publish Data Id: {publishdata_id}\n\n" + "\n".join(csv_data) +``` + + +Sometimes there are situations where we want to display sample data +that isn't part of the "main" sample list (which is the one in the +.geno file). In this page[0][1] you see the samples split into +"Primary" and "Other." In the code(genenetwork2), it does a DB query +for all sample data and just puts all the sample data that isn't in +the "primary" list (as taken from the .geno file) into the Other +table. + +It's ridiculous how we pull the sample list from the .geno file. In +the past this was done using qtlreaper's Python module to read in the +.geno file - the functions in +wqflask/maintenance/get_group_samplelists were written when we were +removing all use of the old qtlreaper. + +Parents/F1s are also a situation where we need to come up with some +alternative way to store them, and ideally in a way that is more +flexible and allows for other situations (like 4-way crosses - +currently the code always just assumes 2 parents and 2 F1s). They're +currently hard-coded in webqtlUtil.py. + +[0] + +[1] * Tasks, Guix +:ARCHIVE: /home/bonface/Self/org/archive/2021_guix.org_archive:: |