Diffstat (limited to 'topics/data')
-rw-r--r-- topics/data/precompute/steps.gmi | 44
1 files changed, 44 insertions, 0 deletions
diff --git a/topics/data/precompute/steps.gmi b/topics/data/precompute/steps.gmi
new file mode 100644
index 0000000..e72366e
--- /dev/null
+++ b/topics/data/precompute/steps.gmi
@@ -0,0 +1,44 @@
+# Precompute steps
+
+At this stage precompute fetches a trait from the DB and runs GEMMA. It then tarballs the resulting vector for later use and updates the database with the latest info.
+
+To kick off compute on machines that cannot access the DB, I now realize we need a step-wise approach. Basically you want to shift files around without connecting to a DB, and then update the DB whenever it is convenient. So we are going to make it a multi-step procedure.
+
+We will track precompute steps here. We will have:
+
+* [ ] steps g: genotype archives (first we only do BXD-latest, include BXD.json)
+* [ ] steps k: kinship archives (first we only do BXD-latest)
+* [ ] steps p: trait archives (first we do p1-4)
+
+Trait archives will have steps for:
+
+* [ ] step p1: list-traits-to-compute
+* [ ] step p2: trait-values-export: get trait values from MariaDB
+* [ ] step p3: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
+* [ ] step p4: gemma-to-lmdb: create a clean vector
+
+The DB itself can be updated from these:
+
+* [ ] step p5: updated-db-v1: update DB using single LOD score, number of samples and
+
+Later
+
+* [ ] bulklmm: Compute bulklmm vector
+
+# Tags
+
+* assigned: pjotrp
+* type: precompute, gemma
+* status: in progress
+* priority: high
+* keywords: ui, correlations
+
+# Tasks
+
+* [ ] Check Artyom's LMDB version for kinship and maybe add LOCO
+* [ ] Create JSON metadata controller for every compute incl. type of content
+* [ ] Create genotype archive
+* [ ] Create kinship archive
+* [ ] Create trait archives
+* [ ] Kick off lmm9 step
+* [ ] Update DB step v1
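
The step-wise approach described under "Precompute steps" could be sketched roughly as follows. This is a minimal illustration, not the actual precompute code: the file name `steps.json`, the functions `run_step` and `pending_steps`, and the idea of recording finished steps in a JSON file are all assumptions made for the example. Only the step names come from the p1-p4 list. The point it shows is that progress lives in files next to the data, so a machine without DB access can pick up where another left off and the DB update can happen later.

```python
import json
from pathlib import Path

# Step names taken from the p1-p4 list in the document.
STEPS = [
    "list-traits-to-compute",
    "trait-values-export",
    "gemma-lmm9-loco-output",
    "gemma-to-lmdb",
]

META_NAME = "steps.json"  # hypothetical metadata file name


def load_done(workdir: Path) -> list[str]:
    """Return the names of steps already recorded as done in workdir."""
    meta_file = workdir / META_NAME
    if not meta_file.exists():
        return []
    return json.loads(meta_file.read_text())["done"]


def run_step(workdir: Path, step: str, action) -> bool:
    """Run action(workdir) unless step is already recorded as done.

    Returns True if the step ran, False if it was skipped.
    The action receives only the working directory; steps such as
    p3 and p4 would touch nothing but the files in it.
    """
    done = load_done(workdir)
    if step in done:
        return False
    action(workdir)
    done.append(step)
    (workdir / META_NAME).write_text(json.dumps({"done": done}, indent=2))
    return True


def pending_steps(workdir: Path) -> list[str]:
    """Return the steps from STEPS that still have to run, in order."""
    done = load_done(workdir)
    return [s for s in STEPS if s not in done]
```

A caller would invoke `run_step(workdir, "trait-values-export", export_fn)` with its own `export_fn` (hypothetical) and could copy the whole directory, including `steps.json`, to another machine before running the GEMMA step there.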