diff options
Diffstat (limited to 'issues/gemma/gemma2-has-different-output-from-rqtl2.gmi')
-rw-r--r-- | issues/gemma/gemma2-has-different-output-from-rqtl2.gmi | 80 |
1 files changed, 80 insertions, 0 deletions
diff --git a/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi b/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi new file mode 100644 index 0000000..a0b2c5c --- /dev/null +++ b/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi @@ -0,0 +1,80 @@ +# GEMMA output differs from R/qtl2 + +# Tags + +* assigned: pjotrp, davea +* priority: high +* type: bug, enhancement +* status: closed +* keywords: database, gemma, reaper, rqtl2 + +# Description + +When running trait BXD_21526 results differ significantly. + +=> https://genenetwork.org/show_trait?trait_id=21526&dataset=BXDPublish +=> https://genenetwork.org/show_trait?trait_id=21529&dataset=BXDPublish + +So I confirm I am getting the same results as Dave in GN for GEMMA (see Conclusion below). + +# Tasks + +## GeneNetwork + +I run GEMMA for precompute on the command line and that I confirmed to +be the same as what we see in the browser. This suggests either data +or method is different with Dave's approach. + +I confirmed that gemma in GN matches Dave's results. It is interesting +to see that running without LOCO has some impact, but not as bad as +the R/qtl2 difference. First we should check the genotype files to see +if they match. I checked that the phenotypes match. + +Our inputs are different if I count genotypes (first yours, the other +on production): + +``` + 1 2184941 B + 2 2132744 D + 3 628980 H + 1 2195662 B + 2 2142959 D + 3 650168 H +``` + +The number of rows/markers is the same. So we probably added some +genometypes, but if we miss one that would matter. Dave you can find +the file in /home/wrk/BXD.geno on tux02 if you want to look. + +I notice that we don't use H in the R/qtl2 control file. That +might make a difference though it probably won't explain what we see +now. BTW I also correlated the LOD scores from GEMMA and R/qtl2 in +the spreadsheet and at 0.7 that is too low. So it is probably not +just a magnitude problem. The results differ a lot in your +spreadsheet. + +Next step is that I need to run R/qtl2 using the script in your +dropbox and see what Karl's code does. The exercise does not hurt +because it will help us bring R/qtl2 to GN. + +## R/qtl2 + +R/qtl2 is packaged in guix and can be run in a shell with + +``` +guix shell -C r r-qtl2 +> library(qtl2) +> bxd <- read_cross2(file = "bxd_cancer_new_GN_July_2024.json") +Warning messages: +1: In recode_geno(sheet, genotypes) : + 630519 genotypes treated as missing: "H", "U" +2: In matrix(as.numeric(unlist(pheno)), ncol = nc) : + NAs introduced by coercion +3: In check_cross2(output) : Physical map out of order on chr 1, 2, 11, 19 +``` + +The first warning matches above. If data is missing it may be filtered out. We'll have to check for that. The third warning I am not sure about. Probably a ranking of markers. + +# Conclusion + +It turned out that R/qtl was running HK - so it was a QTL mapping rather than an LMM. |