author: Pjotr Prins, 2025-01-03 01:35:06 -0600
committer: Pjotr Prins, 2025-01-03 01:35:50 -0600
commit: 53530c209c17fe1f3b37513b53c2f0ef9a491dc3
tree: 9860e4954fad775277bf0dad10a88c8381e02331 /doc/code/pangemma.md
parent: 1db223ea71837a0cdfcb881411a5a4cd685a7dae
Adding support for checkpoints and relevant documentation
Diffstat (limited to 'doc/code/pangemma.md')
-rw-r--r-- doc/code/pangemma.md | 23
1 files changed, 23 insertions, 0 deletions
diff --git a/doc/code/pangemma.md b/doc/code/pangemma.md
index ac65f37..c359a56 100644
--- a/doc/code/pangemma.md
+++ b/doc/code/pangemma.md
@@ -106,6 +106,29 @@ Every propagator has state (too). I.e. it may be idle, computing and done.
The runner visits the list of propagators and checks whether the inputs are complete and whether they have changed. On change, computation has to happen, updating the output cell.
+## Setting check points in GEMMA
+
+GEMMA is quite stateful in its original design. We want to break the work up into chunks by setting 'check points'. For example, the actual kinship multiplication could start at a 'start-compute-kinship' check point and end at a 'compute-kinship' check point. To avoid dealing with state too much, we can simply let gemma run from the start of the program until 'compute-kinship' to get a kinship-propagator. Its output will be a kinship file. Similarly, we can run until 'filter-genotypes', which is a step earlier. The output of these propagators can be used by other pangemma propagators as input for comparison and continuation. The original GEMMA then only acts as a reference for alternative implementations of these chunks. Speed is not a concern, though down the line there may be opportunities to start computing after some of these check points (using intermediate output).
+
+So, let's start with a first check point implementation for 'read-bimbam-file'.
+
+## read-bimbam-file
+
+Reading the bimbam file happens in the `ReadFile_bim` function in `gemma_io.cpp`. Of course all it does is read a file, so its output is the same as its input. But for the sake of a simple pilot we'll add the check point at the end of the function, where it will exit GEMMA.
+We'll add a CLI switch `-checkpoint read-geno-file` which will force the exit.
+
+```C++
+checkpoint("read-geno-file",file_geno);
+```
+
+It passes in the output file (the input file in this case), which is used to feed the calling propagator. Some outputs may consist of multiple files; in that case we may add more file names. GEMMA then exits with:
+
+```
+**** Checkpoint reached: read-geno-file (normal exit)
+```
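
As an illustration only, a `checkpoint` function along these lines could look roughly like the sketch below. It is not the code from the commit: the global `requested_checkpoint` (standing in for the value of the `-checkpoint` switch) and the helpers `checkpoint_matches` and `checkpoint_message` are hypothetical names, and echoing the file name on exit is an assumption about how the calling propagator might pick it up.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical: holds the value given to the `-checkpoint` CLI switch.
// An empty string means no check point was requested.
static std::string requested_checkpoint;

// True when a check point was requested and its name equals this one.
bool checkpoint_matches(const std::string &requested, const std::string &name) {
  return !requested.empty() && requested == name;
}

// Builds the exit message in the format shown in the text.
std::string checkpoint_message(const std::string &name) {
  return "**** Checkpoint reached: " + name + " (normal exit)";
}

// Exits the program when `name` is the requested check point.
// Otherwise it returns and the program continues as usual.
void checkpoint(const std::string &name, const std::string &outfile) {
  if (!checkpoint_matches(requested_checkpoint, name)) return;
  std::cout << checkpoint_message(name) << std::endl;
  // Assumption: print the file name so the calling propagator can find it.
  std::cout << outfile << std::endl;
  std::exit(EXIT_SUCCESS);
}
```

With `requested_checkpoint` left empty every call to `checkpoint` is a no-op, so a run without the switch would behave as before. Keeping the name comparison and the message in separate small functions makes them easy to check without exiting the process.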
+
+# Other
+
## Example
I created a minimal example in Ruby with a simple round-robin scheduler: