summaryrefslogtreecommitdiff
path: root/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi
diff options
context:
space:
mode:
Diffstat (limited to 'issues/gemma/gemma2-has-different-output-from-rqtl2.gmi')
-rw-r--r--issues/gemma/gemma2-has-different-output-from-rqtl2.gmi80
1 files changed, 80 insertions, 0 deletions
diff --git a/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi b/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi
new file mode 100644
index 0000000..a0b2c5c
--- /dev/null
+++ b/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi
@@ -0,0 +1,80 @@
+# GEMMA output differs from R/qtl2
+
+# Tags
+
+* assigned: pjotrp, davea
+* priority: high
+* type: bug, enhancement
+* status: closed
+* keywords: database, gemma, reaper, rqtl2
+
+# Description
+
+When running trait BXD_21526 results differ significantly.
+
+=> https://genenetwork.org/show_trait?trait_id=21526&dataset=BXDPublish
+=> https://genenetwork.org/show_trait?trait_id=21529&dataset=BXDPublish
+
+So I confirm I am getting the same results as Dave in GN for GEMMA (see Conclusion below).
+
+# Tasks
+
+## GeneNetwork
+
+I run GEMMA for precompute on the command line and that I confirmed to
+be the same as what we see in the browser. This suggests either data
+or method is different with Dave's approach.
+
+I confirmed that gemma in GN matches Dave's results. It is interesting
+to see that running without LOCO has some impact, but not as bad as
+the R/qtl2 difference. First we should check the genotype files to see
+if they match. I checked that the phenotypes match.
+
+Our inputs are different if I count genotypes (first yours, the other
+on production):
+
+```
+ 1 2184941 B
+ 2 2132744 D
+ 3 628980 H
+ 1 2195662 B
+ 2 2142959 D
+ 3 650168 H
+```
+
+The number of rows/markers is the same. So we probably added some
+genometypes, but if we miss one that would matter. Dave you can find
+the file in /home/wrk/BXD.geno on tux02 if you want to look.
+
+I notice that we don't use H in the R/qtl2 control file. That
+might make a difference though it probably won't explain what we see
+now. BTW I also correlated the LOD scores from GEMMA and R/qtl2 in
+the spreadsheet and at 0.7 that is too low. So it is probably not
+just a magnitude problem. The results differ a lot in your
+spreadsheet.
+
+Next step is that I need to run R/qtl2 using the script in your
+dropbox and see what Karl's code does. The exercise does not hurt
+because it will help us bring R/qtl2 to GN.
+
+## R/qtl2
+
+R/qtl2 is packaged in guix and can be run in a shell with
+
+```
+guix shell -C r r-qtl2
+> library(qtl2)
+> bxd <- read_cross2(file = "bxd_cancer_new_GN_July_2024.json")
+Warning messages:
+1: In recode_geno(sheet, genotypes) :
+ 630519 genotypes treated as missing: "H", "U"
+2: In matrix(as.numeric(unlist(pheno)), ncol = nc) :
+ NAs introduced by coercion
+3: In check_cross2(output) : Physical map out of order on chr 1, 2, 11, 19
+```
+
+The first warning matches above. If data is missing it may be filtered out. We'll have to check for that. The third warning I am not sure about. Probably a ranking of markers.
+
+# Conclusion
+
+It turned out that R/qtl was running HK - so it was a QTL mapping rather than an LMM.