From 71a4fb1dacc5c68c42694907a218f6f763f29762 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Thu, 16 Mar 2023 12:08:59 +0100
Subject: Starting on gemma precompute

---
 issues/gemma/databases-getting-out-of-wack.gmi | 25 +++++++++++-------
 issues/gemma/lmm-precomputed-scores.gmi        | 36 ++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 10 deletions(-)
 create mode 100644 issues/gemma/lmm-precomputed-scores.gmi

diff --git a/issues/gemma/databases-getting-out-of-wack.gmi b/issues/gemma/databases-getting-out-of-wack.gmi
index a102846..45aa6c9 100644
--- a/issues/gemma/databases-getting-out-of-wack.gmi
+++ b/issues/gemma/databases-getting-out-of-wack.gmi
@@ -1,25 +1,33 @@
 # Databases Getting Out of Wack
 
+This issue refers to precomputed scores generated by the ancient reaper module that runs as a script:
+
+=> https://github.com/genenetwork/genenetwork2/blob/testing/scripts/maintenance/QTL_Reaper_v6.py
+
+We'll create a new issue:
+
+=> lmm-precomputed-scores.gmi
+
 ## Tags
 
-* assigned: Bonface, Zach, jgart
+* assigned: pjotrp
 * priority: high
-* type: bug
+* type: bug, enhancement
 * status: unclear
 * keywords: database, gemma, reaper
 
 ## Let's use Gemma instead of Reaper
 
 Zachary:
-> If we're using GEMMA, we'll need to recalculate all other trait Max LRS scores using
-> GEMMA as well (so I think we should just do this with qtlreaper for now). Otherwise
+> If we're using GEMMA, we'll need to recalculate all other trait Max LRS scores using
+> GEMMA as well (so I think we should just do this with qtlreaper for now). Otherwise
 > we'll just have a bunch of qtlreaper scores mixed with GEMMA scores without the user
-> having any way of knowing the difference. Also, storing the full results (what Rob
+> calls the "full vector model") will require some sort of fundamental change to the
+> way we store this data and should be postponed for later (since the high priority
 > immediate issue is to ensure that the stored Max LRS values aren't wrong)
-> As for Mean, that should be simple, since it's just taking the average of sample values
+> As for Mean, that should be simple, since it's just taking the average of sample values
 > immediately after an update.
 > The main thing I'm uncertain how to do (though I know is possible since Bonface already
@@ -39,6 +47,3 @@ Rob:
 > BXD vector results and store that as a big juicy TRANSFORMATIVE blob of
 > data. A big paper in doing just that. Reaper is just wrong at this
 > point. We have LMM: We should use it.
-
-
-
diff --git a/issues/gemma/lmm-precomputed-scores.gmi b/issues/gemma/lmm-precomputed-scores.gmi
new file mode 100644
index 0000000..1a86e12
--- /dev/null
+++ b/issues/gemma/lmm-precomputed-scores.gmi
@@ -0,0 +1,36 @@
+# LMM precomputed scores
+
+Here we will track introducing a new precomputation of scores for GN. Interestingly, this ties in with our xapian search and fast querying of value ranges.
+
+# Tags
+
+* assigned: pjotrp
+* priority: high
+* type: bug, enhancement
+* status: unclear
+* keywords: database, gemma, reaper
+
+# Old storage
+
+The old reaper scores are in
+
+```
+MariaDB [db_webqtl]> select ProbeSetId,ProbeSetFreezeId,Locus,LRS,additive from ProbeSetXRef limit 10;
++------------+------------------+----------------+------------------+--------------------+
+| ProbeSetId | ProbeSetFreezeId | Locus          | LRS              | additive           |
++------------+------------------+----------------+------------------+--------------------+
+|          1 |                1 | rs13480619     |  12.590069931048 |        -0.28515625 |
+|          2 |                1 | rs29535974     | 10.5970737900941 | -0.116783333333333 |
+|          3 |                1 | rs49742109     |  6.0970532702754 |  0.112957489878542 |
+|          4 |                1 | rsm10000002321 | 11.7748675511731 | -0.157113725490196 |
+|          5 |                1 | rsm10000019445 | 10.9232633740162 |  0.114764705882353 |
+|          6 |                1 | rsm10000017255 | 8.45741703245224 | -0.200034412955466 |
+|          7 |                1 | rs4178505      | 7.46477918183565 |  0.104331983805668 |
+|          8 |                1 | rsm10000144086 | 12.1201771258006 | -0.134278431372548 |
+|          9 |                1 | rsm10000014965 | 11.8837168740735 |  0.341458333333334 |
+|         10 |                1 | rsm10000020208 | 10.2809848009836 | -0.173866666666667 |
++------------+------------------+----------------+------------------+--------------------+
+10 rows in set (0.000 sec)
+```
+
+This means for every dataset one single maximum score gets stored (?)
-- 
cgit v1.2.3
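In the ProbeSetXRef sample output above, each row (one ProbeSetId/ProbeSetFreezeId pair) holds a single Locus, LRS and additive value, so everything except the peak of a scan is discarded. A minimal Python sketch of that reduction, assuming a scan that yields a LOD score and an additive effect per marker. `MarkerScore`, `PeakRow` and `peak_row` are hypothetical names for illustration, not GeneNetwork or GEMMA code. The only fixed fact used is the standard scale conversion LRS = 2 ln(10) × LOD (about 4.605 × LOD).

```python
import math
from dataclasses import dataclass
from typing import Sequence

# Standard conversion between the two scales: LRS = 2 * ln(10) * LOD (about 4.605 * LOD).
LOD_TO_LRS = 2 * math.log(10)


@dataclass
class MarkerScore:
    """Hypothetical per-marker result of a genome scan (assumed input format)."""
    locus: str
    lod: float
    additive: float


@dataclass
class PeakRow:
    """Hypothetical record mirroring the Locus, LRS and additive columns of one ProbeSetXRef row."""
    locus: str
    lrs: float
    additive: float


def peak_row(scan: Sequence[MarkerScore]) -> PeakRow:
    """Keep only the marker with the highest LOD; all other markers are discarded."""
    if not scan:
        raise ValueError("cannot reduce an empty scan")
    # LRS is a monotonic function of LOD, so the max-LOD marker is also the max-LRS marker.
    best = max(scan, key=lambda marker: marker.lod)
    return PeakRow(locus=best.locus, lrs=best.lod * LOD_TO_LRS, additive=best.additive)


# Usage with made-up numbers (not GeneNetwork data):
scan = [
    MarkerScore("rsA", 1.0, 0.10),
    MarkerScore("rsB", 3.0, -0.20),
    MarkerScore("rsC", 2.0, 0.30),
]
print(peak_row(scan))
```

The single returned row is all the old schema can hold; keeping the whole `scan` list per trait is what the "full vector model" discussion would require, and that needs a different storage layout.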