# Precompute steps

At this stage, precompute fetches a trait from the DB and runs GEMMA. Next, it tarballs the resulting vector for later use. It also updates the database with the latest information.

To kick off compute on machines that have no DB access, I now realize we need a step-wise approach: shift files around without connecting to a DB, and update the DB later, whenever it is convenient. So we are going to make precompute a multi-step procedure.

We will track precompute steps here. We will have:

* [ ] steps g: genotype archives (first we only do BXD-latest, include BXD.json)
* [ ] steps k: kinship archives (first we only do BXD-latest)
* [ ] steps p: trait archives (first we do p1-p4)

Trait archives will have steps for:

* [ ] step p1: list-traits-to-compute
* [ ] step p2: trait-values-export: get trait values from mariadb
* [ ] step p3: gemma-lmm9-loco-output: compute the standard GEMMA lmm9 LOCO vector with gemma-wrapper
* [ ] step p4: gemma-to-lmdb: create a clean vector

The DB itself can be updated from these:

* [ ] step p5: updated-db-v1: update DB using single LOD score, number of samples and

Later:

* [ ] bulklmm: compute bulklmm vector

# Tags

* assigned: pjotrp
* type: precompute, gemma
* status: in progress
* priority: high
* keywords: ui, correlations

# Tasks

* [ ] Check Artyom's LMDB version for kinship and maybe add LOCO
* [ ] Create a JSON metadata controller for every compute, including the type of content
* [ ] Create genotype archive
* [ ] Create kinship archive
* [ ] Create trait archives
* [ ] Kick off lmm9 step
* [ ] Update DB step v1
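The split between DB-bound and DB-free steps that motivates the step-wise approach can be sketched in Python. This is a hypothetical illustration, not existing precompute code. The Step type, the PIPELINE list and the stages function are made up here. It assumes p1 (list-traits-to-compute) also queries the DB, since the note only says so explicitly for p2 and p5. Consecutive steps that need the same environment are grouped together, and each boundary between groups marks a point where files have to be shipped between a DB host and a compute node.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Step:
    name: str
    needs_db: bool  # True if the step must talk to MariaDB


# Hypothetical encoding of steps p1-p5; p1 needing the DB is an assumption.
PIPELINE = [
    Step("p1-list-traits-to-compute", needs_db=True),
    Step("p2-trait-values-export", needs_db=True),
    Step("p3-gemma-lmm9-loco-output", needs_db=False),
    Step("p4-gemma-to-lmdb", needs_db=False),
    Step("p5-updated-db-v1", needs_db=True),
]


def stages(steps):
    """Group consecutive steps that can run in the same environment.

    Returns a list of (needs_db, [step names]) tuples in pipeline order.
    groupby only merges *adjacent* steps with the same needs_db value,
    which is what we want: order matters for the file hand-offs.
    """
    return [
        (needs_db, [step.name for step in group])
        for needs_db, group in groupby(steps, key=lambda step: step.needs_db)
    ]


if __name__ == "__main__":
    for needs_db, names in stages(PIPELINE):
        where = "DB host" if needs_db else "compute node (no DB)"
        print(f"{where}: {', '.join(names)}")
```

Run as a script, this prints three stages: p1 and p2 on the DB host, p3 and p4 on a compute node, then p5 back on the DB host. Only the outputs of p2 and p4 have to travel between machines.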
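For the task of creating a JSON metadata controller for every compute, one possible shape is a small JSON control file written next to each step's outputs. It records the step name, the type of content, and SHA-256 checksums of the inputs and outputs, so archives can be moved between machines and checked before the DB is updated, all without a DB connection. This is a minimal Python sketch under assumptions. The layout (outputs and a <step>.json file in one directory), the field names, and the functions write_step_metadata and verify_step are hypothetical, not an existing precompute or gemma-wrapper interface.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large vectors need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_step_metadata(outdir: Path, step: str, content_type: str,
                        inputs: list[Path], outputs: list[Path]) -> Path:
    """Write <outdir>/<step>.json describing one step (hypothetical layout).

    Outputs are assumed to live in outdir; they are keyed by file name.
    """
    meta = {
        "step": step,
        "type": content_type,
        "inputs": {p.name: sha256sum(p) for p in inputs},
        "outputs": {p.name: sha256sum(p) for p in outputs},
    }
    path = outdir / f"{step}.json"
    path.write_text(json.dumps(meta, indent=2, sort_keys=True))
    return path


def verify_step(outdir: Path, step: str) -> bool:
    """Return True if every recorded output exists and is unchanged.

    Only outputs are checked here; inputs are recorded for provenance.
    """
    meta = json.loads((outdir / f"{step}.json").read_text())
    return all(
        (outdir / name).is_file() and sha256sum(outdir / name) == digest
        for name, digest in meta["outputs"].items()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        outdir = Path(tmp)
        values = outdir / "trait-values.tsv"  # made-up example output
        values.write_text("BXD1\t10.5\nBXD2\t11.2\n")
        write_step_metadata(outdir, "p2-trait-values-export",
                            "trait-values", [], [values])
        print(verify_step(outdir, "p2-trait-values-export"))  # True
        values.write_text("BXD1\t99.9\n")  # simulate a corrupted copy
        print(verify_step(outdir, "p2-trait-values-export"))  # False
```

Checksums rather than timestamps are used because copying files between machines usually changes modification times, while the content should stay identical.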