-rw-r--r-- issues/gemma/databases-getting-out-of-wack.gmi 25
-rw-r--r-- issues/gemma/lmm-precomputed-scores.gmi 36
2 files changed, 51 insertions, 10 deletions
diff --git a/issues/gemma/databases-getting-out-of-wack.gmi b/issues/gemma/databases-getting-out-of-wack.gmi
index a102846..45aa6c9 100644
--- a/issues/gemma/databases-getting-out-of-wack.gmi
+++ b/issues/gemma/databases-getting-out-of-wack.gmi
@@ -1,25 +1,33 @@
# Databases Getting Out of Wack
+This issue refers to precomputed scores generated by the ancient reaper module that runs as a script:
+
+=> https://github.com/genenetwork/genenetwork2/blob/testing/scripts/maintenance/QTL_Reaper_v6.py
+
+We'll create a new issue:
+
+=> lmm-precomputed-scores.gmi
+
## Tags
-* assigned: Bonface, Zach, jgart
+* assigned: pjotrp
* priority: high
-* type: bug
+* type: bug, enhancement
* status: unclear
* keywords: database, gemma, reaper
## Let's use Gemma instead of Reaper
Zachary:
-> If we're using GEMMA, we'll need to recalculate all other trait Max LRS scores using
-> GEMMA as well (so I think we should just do this with qtlreaper for now). Otherwise
+> If we're using GEMMA, we'll need to recalculate all other trait Max LRS scores using
+> GEMMA as well (so I think we should just do this with qtlreaper for now). Otherwise
> we'll just have a bunch of qtlreaper scores mixed with GEMMA scores without the user
-> having any way of knowing the difference. Also, storing the full results (what Rob
-> calls the "full vector model") will require some sort of fundamental change to the
+> having any way of knowing the difference. Also, storing the full results (what Rob
+> calls the "full vector model") will require some sort of fundamental change to the
> way we store this data and should be postponed for later (since the high priority
> immediate issue is to ensure that the stored Max LRS values aren't wrong)
-> As for Mean, that should be simple, since it's just taking the average of sample values
+> As for Mean, that should be simple, since it's just taking the average of sample values
> immediately after an update.
> The main thing I'm uncertain how to do (though I know is possible since Bonface already
@@ -39,6 +47,3 @@ Rob:
> BXD vector results and store that as a big juicy TRANSFORMATIVE blob of
> data. A big paper in doing just that. Reaper is just wrong at this
> point. We have LMM: We should use it.
-
-
-
diff --git a/issues/gemma/lmm-precomputed-scores.gmi b/issues/gemma/lmm-precomputed-scores.gmi
new file mode 100644
index 0000000..1a86e12
--- /dev/null
+++ b/issues/gemma/lmm-precomputed-scores.gmi
@@ -0,0 +1,36 @@
+# LMM precomputed scores
+
+Here we track the introduction of a new precomputation of scores for GN. Interestingly, this ties in with our Xapian search and fast querying of value ranges.
+
+## Tags
+
+* assigned: pjotrp
+* priority: high
+* type: bug, enhancement
+* status: unclear
+* keywords: database, gemma, reaper
+
+## Old storage
+
+The old reaper scores are in the ProbeSetXRef table:
+
+```
+MariaDB [db_webqtl]> select ProbeSetId,ProbeSetFreezeId,Locus,LRS,additive from ProbeSetXRef limit 10;
++------------+------------------+----------------+------------------+--------------------+
+| ProbeSetId | ProbeSetFreezeId | Locus | LRS | additive |
++------------+------------------+----------------+------------------+--------------------+
+| 1 | 1 | rs13480619 | 12.590069931048 | -0.28515625 |
+| 2 | 1 | rs29535974 | 10.5970737900941 | -0.116783333333333 |
+| 3 | 1 | rs49742109 | 6.0970532702754 | 0.112957489878542 |
+| 4 | 1 | rsm10000002321 | 11.7748675511731 | -0.157113725490196 |
+| 5 | 1 | rsm10000019445 | 10.9232633740162 | 0.114764705882353 |
+| 6 | 1 | rsm10000017255 | 8.45741703245224 | -0.200034412955466 |
+| 7 | 1 | rs4178505 | 7.46477918183565 | 0.104331983805668 |
+| 8 | 1 | rsm10000144086 | 12.1201771258006 | -0.134278431372548 |
+| 9 | 1 | rsm10000014965 | 11.8837168740735 | 0.341458333333334 |
+| 10 | 1 | rsm10000020208 | 10.2809848009836 | -0.173866666666667 |
++------------+------------------+----------------+------------------+--------------------+
+10 rows in set (0.000 sec)
+```
+
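+This suggests that for every trait (ProbeSetId) in a dataset (ProbeSetFreezeId) only a single maximum LRS score is stored, together with its peak locus and additive effect (?)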
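
The query output above has one row per ProbeSetId/ProbeSetFreezeId pair. As a minimal sketch of what that storage model implies, the following reduces a full per-marker scan to its single peak and stores only that row. Everything here is an assumption for illustration: the in-memory SQLite table only mimics the five columns shown in the query, the primary key is guessed, `store_peak` is a hypothetical helper, and the marker names and scores are invented. It is not the code used by QTL_Reaper_v6.py.

```python
import sqlite3

# Assumption: a stand-in table with only the five columns shown in the query
# above. The real ProbeSetXRef table lives in MariaDB and its keys may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ProbeSetXRef (
           ProbeSetId INTEGER,
           ProbeSetFreezeId INTEGER,
           Locus TEXT,
           LRS REAL,
           additive REAL,
           PRIMARY KEY (ProbeSetFreezeId, ProbeSetId)
       )"""
)


def store_peak(conn, probeset_id, freeze_id, scan):
    """Keep only the highest-scoring marker of a scan (hypothetical helper).

    scan is a list of (locus, lrs, additive) tuples for one trait.
    All other markers in the scan are discarded.
    """
    locus, lrs, additive = max(scan, key=lambda row: row[1])
    conn.execute(
        "INSERT OR REPLACE INTO ProbeSetXRef VALUES (?, ?, ?, ?, ?)",
        (probeset_id, freeze_id, locus, lrs, additive),
    )
    return locus, lrs, additive


# Invented scan results for one trait; not real GeneNetwork data.
scan = [
    ("marker_a", 3.2, 0.05),
    ("marker_b", 12.5, -0.28),
    ("marker_c", 7.1, 0.11),
]
peak = store_peak(conn, probeset_id=1, freeze_id=1, scan=scan)
```

Storing the full vector of scores instead would need one row (or one blob) per trait covering all markers, which is the kind of storage change the quoted discussion postpones.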