diff options
author | Pjotr Prins | 2024-09-21 08:08:35 +0200 |
---|---|---|
committer | Pjotr Prins | 2024-09-21 08:08:35 +0200 |
commit | 55b4b172e9d003a6a2f32d69c995e5c7c5ae7d23 (patch) | |
tree | 9920759b5a7319ef51a63cca078e3fae62cb21f8 | |
parent | fd81338405813419b004d242f0d83979fe9bcee7 (diff) | |
download | gn-gemtext-55b4b172e9d003a6a2f32d69c995e5c7c5ae7d23.tar.gz |
Permutations updated - we are at the last step where gemma fails
-rw-r--r-- | topics/lmms/gemma/permutations.gmi | 78 |
1 files changed, 68 insertions, 10 deletions
diff --git a/topics/lmms/gemma/permutations.gmi b/topics/lmms/gemma/permutations.gmi index d923bf8..a3968fe 100644 --- a/topics/lmms/gemma/permutations.gmi +++ b/topics/lmms/gemma/permutations.gmi @@ -502,27 +502,68 @@ First we run GEMMA just without LOCO using default settings that GN uses ./bin/rqtl2-pheno-to-gemma.py BXD_pheno_Dave.csv --json BXD.geno.json > BXD_pheno_matched.txt Wrote GEMMA pheno 237 from 237 with genometypes (rows) and 24 collections (cols)! gemma -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -gemma -lmm 9 -g BXD.geno.txt -p BXD_pheno_matched.txt -k output/result.cXX.txt -n +gemma -lmm 9 -g BXD.geno.txt -p BXD_pheno_matched.txt -k output/result.cXX.txt -n 5 ``` -If that works we can move to a full LOCO +So far the output is correct. + +``` +-Math.log10(7.532460e-05) +=> 4.123063165904243 +``` + +Try with gemma-wrapper + +``` +./bin/gemma-wrapper --json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +cp output/bab43175329bd14d485e582b7ad890cf0ec28915.cXX.txt /tmp +``` + +Works, but the following failed without the -n switch: + +``` +./bin/gemma-wrapper --debug --verbose --force --json --lmdb --input K.json -- -g BXD.geno.txt -a BXD.8_snps.txt -lmm 9 -p BXD_pheno_matched.txt -n 5 +``` + +and worked with. That is logical, if you see output like + +``` +19 rs30886715 46903165 0 X Y 0.536 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00 +19 rs6376540 46905638 0 X Y 0.536 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00 +19 rs50610897 47412184 0 X Y 0.538 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00 +``` + +It means the phenotype column that was parsed has empty values. In this case the BXD strain names. GEMMA should show a meaningful error. + +Now that works we can move to a full LOCO ``` ./bin/gemma-wrapper --loco --json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json -./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 -p BXD_pheno_matched.txt +./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 -p BXD_pheno_matched.txt -n 5 +``` + +and we get + +``` +18,69216071,rs3718618,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69825784,rs50446650,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,68189477,rs29539715,0.596,-189.7332,79.7479,100000.0,0.0,4.49 ``` +When we converted BXD.geno to its BIMBAM BXD.geno.txt we also got a BXD.geno.json file which contains a list of the individuals/genometypes that were used in the genotype file. + +Now we reduce the traits file to something GEMMA can use for permutations - adding the trait number and output BXD_pheno_Dave.csv.json + ```sh -# Convert the traits file to something GEMMA can use - adding the trait number and output BXD_pheno_Dave.csv.json ./bin/rqtl2-pheno-to-gemma.py BXD_pheno_Dave.csv --json BXD.geno.json -n 5 > BXD_pheno_matched-5.txt ``` -If you inspect the JSON file you should see +The matched file should be identical to the earlier BXD_pheno_matched.txt file. Meanwhile, if you inspect the JSON file you should see ``` jq < BXD_pheno_Dave.csv.json - "samples-column": 4, + "samples-column": 5, "trait": "21529", "samples-reduced": { "BXD1": 1919.450806, @@ -530,13 +571,15 @@ jq < BXD_pheno_Dave.csv.json "BXD102": 1745.671997, ``` +So far we are OK! + At this point we have a reduced sample set, a BIMBAM file and a phenotype file GEMMA can use! ``` -./bin/gemma-wrapper --loco --json --input BXD_pheno_Dave.csv.json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +./bin/gemma-wrapper --loco --json --input BXD_pheno_Dave.csv.json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -a BXD.8_snps.txt -n 5 > K.json ``` -Note that at this step we actually do not create a reduced GRM. That happens in the next mapping stage. +Note that at this step we actually create a full GRM. Reducing happens in the next mapping stage. ``` ./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 @@ -545,15 +588,29 @@ Note that at this step we actually do not create a reduced GRM. That happens in Note the use of '-n' switch. We should change that. ``` -./bin/gemma-values.py /tmp/test/8599834ee474b9da9ff39cc4954d662518a6b5c8.mdb +./bin/gemma-values.py /tmp/test/8599834ee474b9da9ff39cc4954d662518a6b5c8.mdb --sort ``` -Look for rs3718618 at 69216071 and I am currently getting the wrong result and it is not clear why that is: +Look for rs3718618 at 69216071 and I am currently getting the wrong result for trait 21529 and it is not clear why that is: ``` +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +16,88032783,?,0.538,-134.1339,75.7837,0.0,0.0009,3.02 +16,88038734,?,0.538,-134.1339,75.7837,0.0,0.0009,3.02 +(...) 18,69216071,?,0.462,10.8099,93.3936,0.0,0.8097,0.09 ``` +When running GEMMA on genenetwork.org we get a peak for LOCO at that position for rs3718618. I note that the non-LOCO version at 4.1 vs 4.5 for LOCO has a higher peak. We should compute the significance for both! + +Now, when I run the non-LOCO version by hand I get + +``` +-Math.log10(7.532460e-05) +=> 4.123063165904243 +``` + + ## Dealing with epoch @@ -571,6 +628,7 @@ We have two or more possible solutions to deal with hierarchy in the population. * [ ] Fix non-use of MAF in GN for non-LOCO * [ ] Fix running of -p switch when assoc cache exists (bug) +* [ ] Compare significance for non-LOCO Quantile-Based Permutation Thresholds for Quantitative Trait Loci Hotspots https://academic.oup.com/genetics/article/191/4/1355/5935078 |