From 99fb6ae36bf03fc06c84e4ae45e741c6f7c0902a Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Mon, 3 Jun 2024 09:25:13 +0200
Subject: precompute

---
 topics/data/precompute/steps.gmi                   | 32 ++++++++++++++++++++--
 topics/deploy/machines.gmi                         |  5 ++--
 .../mariadb/precompute-mapping-input-data.gmi      |  6 ++--
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/topics/data/precompute/steps.gmi b/topics/data/precompute/steps.gmi
index 33915cd..75e3bfd 100644
--- a/topics/data/precompute/steps.gmi
+++ b/topics/data/precompute/steps.gmi
@@ -8,12 +8,12 @@ We will track precompute steps here. We will have:
 
 * [X] steps g: genotype archives (first we only do BXD-latest, include BXD.json)
 * [X] steps k: kinship archives (first we only do BXD-latest)
-* [ ] steps p: trait archives (first we do p1-3)
+* [X] steps p: trait archives (first we do p1-3)
 
 Trait archives will have steps for
 
-* [+] step p1: list-traits-to-compute
-* [ ] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
+* [X] step p1: list-traits-to-compute
+* [+] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
 * [ ] step p3: gemma-to-lmdb: create a clean vector
 
 The DB itself can be updated from these
@@ -74,3 +74,29 @@ Note that it (potentially) includes the parents. Also the strain-id is a string
 from guile-hashing and guile-gcrypt modules.
 
 In the next step we have to check the normal distribution of the trait values and maybe winsorize outliers.
+Actually we should brute force the default first.
+It will be interesting to see the effect of handling outliers and normalisation of phenotypes.
+
+# Step p2: run GEMMA lmm9 + LOCO
+
+Last week I checked out Arun's CCWL. Anything more complicated than a few steps can go into CCWL. It will also handle cluster workloads using, for example, Toil. Initially we will use trait files + genotypes as inputs for gemma-wrapper.
+
+p1: Workflow 1:
+
+```
+batch traits from DB
+```
+
+p2+p3: Workflow 2:
+
+```
+batched traits, genotypes ->
+  gemma-wrapper ->
+    lmdb'ize
+```
+
+p4: Workflow 3:
+
+```
+Update DB
+```
diff --git a/topics/deploy/machines.gmi b/topics/deploy/machines.gmi
index 21ff2ea..d610c9f 100644
--- a/topics/deploy/machines.gmi
+++ b/topics/deploy/machines.gmi
@@ -1,12 +1,13 @@
 # Machines
 
 ```
+- [ ] bacchus             172.23.17.156   (00:11:32:ba:7f:17) -  1 Gbs
 - [X] lambda01            172.23.18.212   (7c:c2:55:11:9c:ac)
-- [ ] tux03i              172.23.17.181   (00:0a:f7:c1:00:8d) - 10 Gbs
+- [X] tux03i              172.23.17.181   (00:0a:f7:c1:00:8d) - 10 Gbs
   [X] tux03               128.169.5.101   (00:0a:f7:c1:00:8b) -  1 Gbs
 - [ ] tux04i              172.23.17.170   (14:23:f2:4f:e6:10)
 - [ ] tux04               128.169.5.119   (14:23:f2:4f:e6:11)
-- [ ] tux05               172.23.18.129   (14:23:f2:4f:35:00)
+- [X] tux05               172.23.18.129   (14:23:f2:4f:35:00)
 - [X] tux06               172.23.17.188   (14:23:f2:4e:29:10)
 - [X] tux07               172.23.17.191   (14:23:f2:4e:7d:60)
 - [X] tux08               172.23.17.186   (14:23:f2:4f:4e:b0)
diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi
index 2e73590..0c89fe5 100644
--- a/topics/systems/mariadb/precompute-mapping-input-data.gmi
+++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi
@@ -27,10 +27,10 @@ Next, for running the full batch:
 * [X] Store all GEMMA values efficiently
 * [ ] Include metadata record in lmdb and as JSON file
 * [ ] Include metadata record on compute status
-* [ ] Remove junk from tarball
-* [ ] List significant markers as metadata
+* [ ] Remove junk from tarball - use lmdb?
+* [ ] List significant markers as metadata in lmdb
 * [ ] Reread below info
-* [ ] Submit jobs to PBS
+* [ ] Submit jobs to PBS using CCWL
 * [ ] Report results to mariadb
 
 And after:
-- 
cgit v1.2.3