From 71a4fb1dacc5c68c42694907a218f6f763f29762 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Thu, 16 Mar 2023 12:08:59 +0100
Subject: Starting on gemma precompute

---
 issues/gemma/databases-getting-out-of-wack.gmi | 25 +++++++++++-------
 issues/gemma/lmm-precomputed-scores.gmi        | 36 ++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 10 deletions(-)
 create mode 100644 issues/gemma/lmm-precomputed-scores.gmi

diff --git a/issues/gemma/databases-getting-out-of-wack.gmi b/issues/gemma/databases-getting-out-of-wack.gmi
index a102846..45aa6c9 100644
--- a/issues/gemma/databases-getting-out-of-wack.gmi
+++ b/issues/gemma/databases-getting-out-of-wack.gmi
@@ -1,25 +1,33 @@
 # Databases Getting Out of Wack
 
+This issue refers to precomputed scores generated by the ancient reaper module that runs as a script:
+
+=> https://github.com/genenetwork/genenetwork2/blob/testing/scripts/maintenance/QTL_Reaper_v6.py
+
+We'll create a new issue:
+
+=> lmm-precomputed-scores.gmi
+
 ## Tags
 
-* assigned: Bonface, Zach, jgart
+* assigned: pjotrp
 * priority: high
-* type: bug
+* type: bug, enhancement
 * status: unclear
 * keywords: database, gemma, reaper
 
 ## Let's use Gemma instead of Reaper
 
 Zachary:
-> If we're using GEMMA, we'll need to recalculate all other trait Max LRS scores using 
-> GEMMA as well (so I think we should just do this with qtlreaper for now). Otherwise 
+> If we're using GEMMA, we'll need to recalculate all other trait Max LRS scores using
+> GEMMA as well (so I think we should just do this with qtlreaper for now). Otherwise
 > we'll just have a bunch of qtlreaper scores mixed with GEMMA scores without the user
-> having any way of knowing the difference. Also, storing the full results (what Rob 
-> calls the "full vector model") will require some sort of fundamental change to the 
+> having any way of knowing the difference. Also, storing the full results (what Rob
+> calls the "full vector model") will require some sort of fundamental change to the
 > way we store this data and should be postponed for later (since the high priority
 > immediate issue is to ensure that the stored Max LRS values aren't wrong)
 
-> As for Mean, that should be simple, since it's just taking the average of sample values 
+> As for Mean, that should be simple, since it's just taking the average of sample values
 > immediately after an update.
 
 > The main thing I'm uncertain how to do (though I know is possible since Bonface already
@@ -39,6 +47,3 @@ Rob:
 > BXD vector results and store that as a big juicy TRANSFORMATIVE blob of
 > data. A big paper in doing just that. Reaper is just wrong at this
 > point. We have LMM: We should use it.
-
-
-
diff --git a/issues/gemma/lmm-precomputed-scores.gmi b/issues/gemma/lmm-precomputed-scores.gmi
new file mode 100644
index 0000000..1a86e12
--- /dev/null
+++ b/issues/gemma/lmm-precomputed-scores.gmi
@@ -0,0 +1,36 @@
+# LMM precomputed scores
+
+Here we track the introduction of a new precomputation of scores for GN. Interestingly, this ties in with our Xapian search and fast querying of value ranges.
+
+## Tags
+
+* assigned: pjotrp
+* priority: high
+* type: bug, enhancement
+* status: unclear
+* keywords: database, gemma, reaper
+
+## Old storage
+
+The old reaper scores are stored in the ProbeSetXRef table:
+
+```
+MariaDB [db_webqtl]> select ProbeSetId,ProbeSetFreezeId,Locus,LRS,additive from ProbeSetXRef limit 10;
++------------+------------------+----------------+------------------+--------------------+
+| ProbeSetId | ProbeSetFreezeId | Locus          | LRS              | additive           |
++------------+------------------+----------------+------------------+--------------------+
+|          1 |                1 | rs13480619     |  12.590069931048 |        -0.28515625 |
+|          2 |                1 | rs29535974     | 10.5970737900941 | -0.116783333333333 |
+|          3 |                1 | rs49742109     |  6.0970532702754 |  0.112957489878542 |
+|          4 |                1 | rsm10000002321 | 11.7748675511731 | -0.157113725490196 |
+|          5 |                1 | rsm10000019445 | 10.9232633740162 |  0.114764705882353 |
+|          6 |                1 | rsm10000017255 | 8.45741703245224 | -0.200034412955466 |
+|          7 |                1 | rs4178505      | 7.46477918183565 |  0.104331983805668 |
+|          8 |                1 | rsm10000144086 | 12.1201771258006 | -0.134278431372548 |
+|          9 |                1 | rsm10000014965 | 11.8837168740735 |  0.341458333333334 |
+|         10 |                1 | rsm10000020208 | 10.2809848009836 | -0.173866666666667 |
++------------+------------------+----------------+------------------+--------------------+
+10 rows in set (0.000 sec)
+```
+
+This suggests that for every trait (ProbeSetId) in a dataset (ProbeSetFreezeId) only a single maximum score is stored, together with its locus (?)
-- 
cgit v1.2.3