Update issue on dumping sample data to LMDB

Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
author: Munyoki Kilyungi 2023-05-03 19:42:57 +0300
committer: Munyoki Kilyungi 2023-05-03 19:42:57 +0300
commit: 0dc9584a9c464df7c150eefaf4c70fe4cf7b3db5 (patch)
tree: e5e223d6492a99a95b45e8f3afa2325d1b7a7b42 /issues
parent: 9e19f27929ed20f52394b65f21db0084b7ff8235 (diff)
download: gn-gemtext-0dc9584a9c464df7c150eefaf4c70fe4cf7b3db5.tar.gz
1 files changed, 20 insertions, 7 deletions
diff --git a/issues/dump-sample-data-to-lmdb.gmi b/issues/dump-sample-data-to-lmdb.gmi
index 18ac808..d87c3f3 100644
--- a/issues/dump-sample-data-to-lmdb.gmi
+++ b/issues/dump-sample-data-to-lmdb.gmi
@@ -2,13 +2,9 @@
 
 * assigned: bonfacem
 * priority: high
-* status: in progress
+* status: stalled
 * keywords: lmdb, rdf
 
-## Description
-
-For GeneNetwork2, a dataset is made up of multiple traits, each with its own sample data. The trait's name is a combination of the species name and the trait's ID (for genotypes/probesets this may not be the case), which is obtained from a SQL table. The objective of this task is to store each trait's sample data in LMDB, allowing it to be accessed quickly in GN2/3 via RDF, which will decouple the data from the python-base classes/objects it is associated with, significantly improving sample data access speed.
-
 ## Tasks
 Dump data and add relevant RDF Metadata for LMDB for:
 
@@ -16,8 +12,11 @@ Dump data and add relevant RDF Metadata for LMDB for:
 * [ ] probesets
 * [ ] genotypes
 * [ ] GN2/3 Integration
+* [ ] Have files and named files available through RDF
 
-## General Notes
+## Description
+
+For GeneNetwork2, a dataset is made up of multiple traits, each with its own sample data. The trait's name is a combination of the species name and the trait's ID (for genotypes/probesets this may not be the case), which is obtained from a SQL table. The objective of this task is to store each trait's sample data in LMDB, allowing it to be accessed quickly in GN2/3 via RDF, which will decouple the data from the python-base classes/objects it is associated with, significantly improving sample data access speed.
 
 To fetch all data, including case-attributes data, for published phenotypes in SQL (using BXD_10007 as an example), you would use the following:
 
@@ -41,4 +40,18 @@ GROUP BY InbredSetId, cxref.StrainId) B ON A.StrainId = B.StrainId;
 
 See this answer for how a join was performed on 2 different queries:
 
-=> https://dba.stackexchange.com/questions/146509/joining-results-of-two-mysql-queries
+=> https://dba.stackexchange.com/questions/146509/joining-results-of-two-mysql-queries Joining results of two mysql queries
+
+#### Correlations
+
+Correlations are slow.  As of Tuesday April 4, 2023 at 1:37pm:
+
+GN1 took *29 sec* (completed) vs GN2 *38 sec* (completed) for 1457545_at in the Hippocampus Consortium M430v2 (Jun06) PDNN
+
+=> http://genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112 Hippocampus Consortium M430v2 (Jun06) PDNN
+
+GN1 took *1.56 mins* (completed) vs GN2 *5.24 mins* (completed)  for 10528873 in the UTHSC BXD Aged Hippocampus Affy Mouse Gene 1.0 ST (Sep12) RMA Exon Level
+
+=> https://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&InfoPageName=UTHSC_BXD_H_0912 UTHSC BXD Aged Hippocampus Affy Mouse Gene 1.0 ST (Sep12) RMA Exon Level
+
+Research Question:  What effect would using LMDB have on correlations over text-file caching and sql fetches?
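
Note on the Description above: the issue does not specify how a trait's sample data would be laid out in LMDB. The following is only a minimal sketch of one possible key/value encoding, under stated assumptions. The key format (prefix plus trait ID, modelled on the BXD_10007 example), the function names, the binary layout, and the sample values are all hypothetical and are not taken from the GeneNetwork code. A plain dict stands in for the LMDB environment so the sketch runs with the standard library alone; with py-lmdb, the same bytes could presumably be written inside a write transaction.

```python
import struct


def trait_key(prefix: str, trait_id: int) -> bytes:
    """Build a byte key such as b'BXD_10007' (hypothetical key layout)."""
    return f"{prefix}_{trait_id}".encode("utf-8")


def pack_samples(samples):
    """Serialize (strain, value) pairs into one bytes value.

    Assumed layout per sample: 2-byte little-endian name length,
    UTF-8 strain name, 8-byte little-endian double.
    """
    out = bytearray()
    for strain, value in samples:
        name = strain.encode("utf-8")
        out += struct.pack("<H", len(name))
        out += name
        out += struct.pack("<d", value)
    return bytes(out)


def unpack_samples(blob: bytes):
    """Inverse of pack_samples: recover the list of (strain, value) pairs."""
    samples = []
    offset = 0
    while offset < len(blob):
        (name_len,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        strain = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        (value,) = struct.unpack_from("<d", blob, offset)
        offset += 8
        samples.append((strain, value))
    return samples


# A dict stands in for an LMDB database here (assumption for illustration).
store = {}
key = trait_key("BXD", 10007)
# Made-up sample values, not real phenotype data.
store[key] = pack_samples([("BXD1", 1.5), ("BXD2", 2.25)])
restored = unpack_samples(store[key])
```

Keeping one value per trait means a single lookup returns all of that trait's samples, which is the access pattern the Description is aiming for. Whether that beats text-file caching or SQL fetches for correlations is exactly the open research question in the issue.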