diff options
author | Munyoki Kilyungi | 2022-12-15 19:42:13 +0300 |
---|---|---|
committer | Munyoki Kilyungi | 2022-12-15 19:42:13 +0300 |
commit | 1ee54541862151bbaa65552a5c3058a0ebff6de6 (patch) | |
tree | a29d833f38d6082694dffc68031321b320373ff7 /issues/dump-sample-data-to-lmdb.gmi | |
parent | 126b14a652d737737f7136cfe03acee60ad893a3 (diff) | |
download | gn-gemtext-1ee54541862151bbaa65552a5c3058a0ebff6de6.tar.gz |
Dump sample data to LMDB for GN2/3
* issues/dump-sample-data-to-lmdb.gmi: New issue.
Diffstat (limited to 'issues/dump-sample-data-to-lmdb.gmi')
-rw-r--r-- | issues/dump-sample-data-to-lmdb.gmi | 44 |
1 files changed, 44 insertions, 0 deletions
diff --git a/issues/dump-sample-data-to-lmdb.gmi b/issues/dump-sample-data-to-lmdb.gmi new file mode 100644 index 0000000..18ac808 --- /dev/null +++ b/issues/dump-sample-data-to-lmdb.gmi @@ -0,0 +1,44 @@ +# Dump sample data to LMDB + +* assigned: bonfacem +* priority: high +* status: in progress +* keywords: lmdb, rdf + +## Description + +For GeneNetwork2, a dataset is made up of multiple traits, each with its own sample data. The trait's name is a combination of the species name and the trait's ID (for genotypes/probesets this may not be the case), which is obtained from a SQL table. The objective of this task is to store each trait's sample data in LMDB, allowing it to be accessed quickly in GN2/3 via RDF, which will decouple the data from the python-base classes/objects it is associated with, significantly improving sample data access speed. + +## Tasks +Dump data and add relevant RDF Metadata for LMDB for: + +* [-] [un-]published phenotypes +* [ ] probesets +* [ ] genotypes +* [ ] GN2/3 Integration + +## General Notes + +To fetch all data, including case-attributes data, for published phenotypes in SQL (using BXD_10007 as an example), you would use the following: + +=> http://genenetwork.org/show_trait?trait_id=10007&dataset=BXDPublish + +``` +SELECT A.Name, A.Value, A.SE, A.Count, B.CaseAttributes FROM +(SELECT DISTINCT st.Name as 'Name', ifnull(pd.value, 'x') as 'Value', +ifnull(ps.error, 'x') as 'SE', ifnull(ns.count, 'x') as 'Count', ps.StrainId as 'StrainId' +FROM PublishFreeze pf JOIN PublishXRef px ON px.InbredSetId = pf.InbredSetId +JOIN PublishData pd ON pd.Id = px.DataId JOIN Strain st ON pd.StrainId = st.Id +LEFT JOIN PublishSE ps ON ps.DataId = pd.Id AND ps.StrainId = pd.StrainId +LEFT JOIN NStrain ns ON ns.DataId = pd.Id AND ns.StrainId = pd.StrainId +WHERE px.PhenotypeId = 35 ORDER BY st.Name) A +JOIN +(SELECT cxref.StrainId as StrainId, group_concat(ca.Name, '=', cxref.Value) as "CaseAttributes" +FROM CaseAttributeXRefNew cxref LEFT JOIN CaseAttribute ca +ON ca.Id = cxref.CaseAttributeId +GROUP BY InbredSetId, cxref.StrainId) B ON A.StrainId = B.StrainId; +``` + +See this answer for how a join was performed on 2 different queries: + +=> https://dba.stackexchange.com/questions/146509/joining-results-of-two-mysql-queries |