author: Pjotr Prins, 2025-01-04 02:53:42 -0600
committer: Pjotr Prins, 2025-01-04 02:53:42 -0600
commit: e58c198e1e5fd8f94cdcaf3c621931ee3a21e2f4 (patch)
tree: a62fbb0bad7c1a4c38ee835f6d32318d652dda9f /doc
parent: 2701a364228d91ba2c52f511ad6619d070e2235b (diff)
Added checkpoints for file readers
Diffstat (limited to 'doc')
-rw-r--r-- doc/code/pangemma.md 13
1 files changed, 12 insertions, 1 deletions
diff --git a/doc/code/pangemma.md b/doc/code/pangemma.md
index 223e2ae..6640a80 100644
--- a/doc/code/pangemma.md
+++ b/doc/code/pangemma.md
@@ -204,7 +204,7 @@ multiple outputs - in that case we may add filenames. And exits with:
**** Checkpoint reached: read-geno-file (normal exit)
```
-# List check points
+## List check points
When you compile PanGEMMA with debug information
@@ -216,8 +216,19 @@ and run a computation with the '-debug' switch it should output the check-points
```
**** DEBUG: checkpoint read-geno-file passed with ./example/mouse_hs1940.geno.txt.gz in src/gemma_io.cpp at line 874 in ReadFile_geno
+**** DEBUG: checkpoint bimbam-kinship-matrix passed with kinship.txt in src/gemma_io.cpp at line 1598 in BimbamKin
```
+## Filtering steps
+
+Note that both PLINK and BIMBAM input files have their own kinship computation with some filtering(!).
+Similarly, read-geno-file also filters, for example on MAF, and the old GEMMA is known to read the genotype file multiple times for different purposes. With growing genotype files this is becoming highly inefficient.
+
+In my new propagator setup these filtering steps should go in their own functions or propagators.
+
+To refactor this at read-geno-file we can start by writing out the filtered genotype file at the checkpoint. That will be our baseline 'output'. Next we write an alternative path and make sure the outputs are the same! Sounds easy, no?
+
+
# Other
## Example
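
The refactoring approach described under "Filtering steps" (write the filtered genotypes at the checkpoint as a baseline, then check that an alternative path produces identical output) could be sketched roughly as follows. This is an illustration only, written in Python for brevity rather than in PanGEMMA's C++. The genotype rows, the function names (`baseline_read_and_filter`, `parse`, `filter_maf`, `write_and_digest`, `compare_paths`) and the MAF threshold are all hypothetical and are not taken from the PanGEMMA source.

```python
import hashlib
import os
import tempfile

# Hypothetical, simplified BIMBAM-style rows: marker id, two alleles, then
# one genotype dosage (0..2) per individual. Not real PanGEMMA data.
GENO_LINES = [
    "rs1,A,T,0,1,2,1",
    "rs2,G,C,0,0,0,0",  # monomorphic: MAF is 0, so it is filtered out
    "rs3,A,G,2,2,1,2",
]


def maf(dosages):
    """Minor allele frequency from dosages in the range 0..2."""
    freq = sum(dosages) / (2.0 * len(dosages))
    return min(freq, 1.0 - freq)


def baseline_read_and_filter(lines, min_maf):
    """Baseline path: parsing and filtering interleaved in one loop."""
    kept = []
    for line in lines:
        fields = line.split(",")
        dosages = [float(x) for x in fields[3:]]
        if maf(dosages) >= min_maf:
            kept.append(line)
    return kept


def parse(lines):
    """Alternative path, step 1: parse only, no filtering."""
    return [(line, [float(x) for x in line.split(",")[3:]]) for line in lines]


def filter_maf(parsed, min_maf):
    """Alternative path, step 2: filtering as its own function."""
    return [line for line, dosages in parsed if maf(dosages) >= min_maf]


def write_and_digest(path, lines):
    """Write the filtered genotypes and return a SHA-256 digest of the file."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def compare_paths(min_maf=0.01):
    """Return True when both paths write byte-identical output files."""
    with tempfile.TemporaryDirectory() as tmp:
        base = write_and_digest(
            os.path.join(tmp, "baseline.txt"),
            baseline_read_and_filter(GENO_LINES, min_maf),
        )
        alt = write_and_digest(
            os.path.join(tmp, "alternative.txt"),
            filter_maf(parse(GENO_LINES), min_maf),
        )
    return base == alt
```

Comparing digests of the written files, rather than in-memory structures, mirrors the idea of treating the file written at the checkpoint as the reference output. Any difference in ordering or formatting between the two paths then shows up as a mismatch.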