Added checkpoints for file readers

author: Pjotr Prins 2025-01-04 02:53:42 -0600
committer: Pjotr Prins 2025-01-04 02:53:42 -0600
commit: e58c198e1e5fd8f94cdcaf3c621931ee3a21e2f4 (patch)
tree: a62fbb0bad7c1a4c38ee835f6d32318d652dda9f /doc
parent: 2701a364228d91ba2c52f511ad6619d070e2235b (diff)
download: pangemma-e58c198e1e5fd8f94cdcaf3c621931ee3a21e2f4.tar.gz
1 files changed, 12 insertions, 1 deletions
diff --git a/doc/code/pangemma.md b/doc/code/pangemma.md
index 223e2ae..6640a80 100644
--- a/doc/code/pangemma.md
+++ b/doc/code/pangemma.md
@@ -204,7 +204,7 @@ multiple outputs - in that case we may add filenames. And exits with:
 **** Checkpoint reached: read-geno-file (normal exit)
 ```
 
-# List check points
+## List checkpoints
 
 When you compile PanGEMMA with debug information
 
@@ -216,8 +216,19 @@ and run a computation with the '-debug' switch it should output the check-points
 
 ```
 **** DEBUG: checkpoint read-geno-file passed with ./example/mouse_hs1940.geno.txt.gz in src/gemma_io.cpp at line 874 in ReadFile_geno
+**** DEBUG: checkpoint bimbam-kinship-matrix passed with kinship.txt in src/gemma_io.cpp at line 1598 in BimbamKin
 ```
 
+## Filtering steps
+
+Note that both PLINK and BIMBAM input files have their own kinship computation, which includes some filtering (!).
+Similarly, read-geno-file also filters, for example on MAF, and it is well known that the old GEMMA reads the genotype file multiple times for different purposes. As geno files grow, this becomes highly inefficient.
+
+In my new propagator setup, these filtering steps should go into their own functions or propagators.
+
+To refactor this at read-geno-file, we can start by writing out the filtered genotype file at the checkpoint. That will be our baseline 'output'. Next, we write an alternative path and make sure the outputs are identical. Sounds easy, no?
+
+
 # Other
 
 ## Example
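The checkpoint log lines and the refactoring plan in this change can be illustrated with a small C++ sketch. Everything in it is hypothetical: `CHECKPOINT_DEBUG`, `g_debug`, `SnpRow`, `maf`, `maf_filter`, `baseline_path` and `alternative_path` are illustrative names, not PanGEMMA's actual code. It assumes BIMBAM-style mean genotypes in the range 0..2 with no missing values, and it applies only a simplified MAF threshold (GEMMA's real reader also filters on missingness). The baseline path filters inside the row loop and writes the filtered genotypes at the checkpoint; the alternative path moves filtering into its own function, and the two outputs are expected to compare equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical runtime switch; in PanGEMMA this role is played by '-debug'.
static bool g_debug = false;

// Hypothetical checkpoint macro printing a line shaped like the DEBUG output:
// checkpoint name, input file, source file, line number and enclosing function.
#define CHECKPOINT_DEBUG(name, file)                                       \
  do {                                                                     \
    if (g_debug)                                                           \
      std::cerr << "**** DEBUG: checkpoint " << (name) << " passed with "  \
                << (file) << " in " << __FILE__ << " at line " << __LINE__ \
                << " in " << __func__ << '\n';                             \
  } while (0)

// One SNP: an identifier plus one mean genotype (0..2) per individual.
// Assumption: no missing values.
struct SnpRow {
  std::string id;
  std::vector<double> dosages;
};

// Simplified MAF from mean genotypes: p = sum / (2 * n), MAF = min(p, 1 - p).
double maf(const SnpRow& row) {
  if (row.dosages.empty()) return 0.0;
  double sum = 0.0;
  for (double d : row.dosages) sum += d;
  const double p = sum / (2.0 * row.dosages.size());
  return std::min(p, 1.0 - p);
}

// Write one SNP as "id d1 d2 ... dn".
void write_row(std::ostream& out, const SnpRow& row) {
  out << row.id;
  for (double d : row.dosages) out << ' ' << d;
  out << '\n';
}

// Baseline: filter inline in the row loop (mirroring the old reader) and
// write the filtered genotypes out at the checkpoint.
std::string baseline_path(const std::vector<SnpRow>& rows, double maf_level) {
  std::ostringstream out;
  for (const SnpRow& row : rows)
    if (maf(row) >= maf_level) write_row(out, row);
  CHECKPOINT_DEBUG("read-geno-file", "<in-memory rows>");
  return out.str();
}

// Alternative: the MAF filter is its own step that returns a keep-mask.
std::vector<bool> maf_filter(const std::vector<SnpRow>& rows, double maf_level) {
  std::vector<bool> keep;
  keep.reserve(rows.size());
  for (const SnpRow& row : rows) keep.push_back(maf(row) >= maf_level);
  return keep;
}

// Output stage of the alternative path, driven only by the keep-mask.
std::string alternative_path(const std::vector<SnpRow>& rows, double maf_level) {
  const std::vector<bool> keep = maf_filter(rows, maf_level);
  assert(keep.size() == rows.size());
  std::ostringstream out;
  for (std::size_t i = 0; i < rows.size(); ++i)
    if (keep[i]) write_row(out, rows[i]);
  return out.str();
}
```

With `g_debug = true`, calling `baseline_path` would print a `**** DEBUG: checkpoint read-geno-file passed with ...` line to stderr. Checking `baseline_path(rows, level) == alternative_path(rows, level)` is the kind of equality test the refactor relies on: once the filtered output matches, the separate `maf_filter` step can replace the inline filtering.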