From d9e47731f8a1616b06fdc1ef2dbd3cf50413706d Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sun, 10 Dec 2023 08:12:19 -0600
Subject: Precompute notes on NAs

---
 .../mariadb/precompute-mapping-input-data.gmi | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi
index 12c21da..89bd8be 100644
--- a/topics/systems/mariadb/precompute-mapping-input-data.gmi
+++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi
@@ -979,6 +979,28 @@ Genotype state lives in 4 places. Time to create a 5th one with lmdb ;). At leas
 
 Using this information we created our first phenotype file and GEMMA run!
 
+# Notes
+
+## NAs in GN
+
+A note from Zach:
+
+On Sat, Dec 09, 2023 at 06:09:56PM -0600, Zachary Sloan wrote:
+> (After typing the rest of this out, I realized that part of the
+> confusion might be about how locations are stored. We don't actually
+> database locations in the ProbeSetXRef table - we only database the
+> peak Locus marker name. This is then cross-referenced against the Geno
+> table, where the actual Location is stored. This is the main source of
+> the problem. So I think the best short-term solution might be to just
+> directly database the locations in the ProbeSetXRef table. Those
+> locations might become out of date, but as you mention they'd still
+> probably be in the same ballpark.)
+
+It is logical to store the location with the peak; if the location
+changes, we should recompute. This also suggests that we should track
+the version of the genotypes in that table.
+
+
 ## More complicated datasets
 
 A good dataset to take apart is
-- 
cgit v1.2.3
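A minimal sketch of the idea in the note, assuming heavily simplified stand-ins for the GN tables in a toy in-memory SQLite database (not the production MariaDB schema). Geno holds the marker location; ProbeSetXRef keeps the peak Locus marker name plus three hypothetical columns, LocusChr, LocusMb and GenoVersion. The location is copied when the peak is computed, so a later change in Geno does not silently move the stored peak, and any row whose GenoVersion differs from the current genotype release is flagged for recomputation. The column names, the helper functions and the version strings are all illustrative assumptions.

```python
import sqlite3

# Toy stand-ins for the GN tables (hypothetical, heavily simplified columns).
SCHEMA = """
CREATE TABLE Geno (
    Name TEXT PRIMARY KEY,  -- marker name
    Chr  TEXT,              -- chromosome
    Mb   REAL               -- position in megabases
);
CREATE TABLE ProbeSetXRef (
    Id          INTEGER PRIMARY KEY,
    Locus       TEXT,  -- peak marker name (what is stored now)
    LocusChr    TEXT,  -- hypothetical: peak chromosome copied at compute time
    LocusMb     REAL,  -- hypothetical: peak position copied at compute time
    GenoVersion TEXT   -- hypothetical: genotype release used for the mapping
);
"""


def connect():
    """Open an in-memory database with the toy schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def store_peak(con, xref_id, marker, geno_version):
    """Store a peak together with the marker location it had at compute time."""
    row = con.execute("SELECT Chr, Mb FROM Geno WHERE Name = ?", (marker,)).fetchone()
    if row is None:
        raise KeyError(f"unknown marker {marker!r}")
    con.execute(
        "INSERT OR REPLACE INTO ProbeSetXRef "
        "(Id, Locus, LocusChr, LocusMb, GenoVersion) VALUES (?, ?, ?, ?, ?)",
        (xref_id, marker, row[0], row[1], geno_version),
    )


def stale_peaks(con, current_version):
    """Return Ids of peaks mapped against another genotype version (recompute these)."""
    rows = con.execute(
        "SELECT Id FROM ProbeSetXRef WHERE GenoVersion IS NOT ? ORDER BY Id",
        (current_version,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    con = connect()
    con.execute("INSERT INTO Geno VALUES ('rs1', '1', 3.2)")
    store_peak(con, 1, "rs1", "v1")
    # A new genotype release moves the marker; the stored peak keeps its old location...
    con.execute("UPDATE Geno SET Mb = 3.5 WHERE Name = 'rs1'")
    print(con.execute("SELECT LocusMb FROM ProbeSetXRef WHERE Id = 1").fetchone())  # (3.2,)
    # ...and the version column tells us which peaks to recompute.
    print(stale_peaks(con, "v2"))  # [1]
```

Copying the location trades normalization for stability: the stored peak stays "in the same ballpark" even when Geno changes, while the version column makes the staleness explicit instead of leaving it hidden in a join.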