author: Pjotr Prins, 2024-06-03 09:25:13 +0200
committer: Pjotr Prins, 2024-06-03 09:25:16 +0200
commit: 99fb6ae36bf03fc06c84e4ae45e741c6f7c0902a
tree: c893e3b7ff1a61bbc29f1de6e17a99877854930f
parent: 46915b1fac722fd9e00341c834e12c29c9300a5f
precompute
-rw-r--r-- topics/data/precompute/steps.gmi | 32
-rw-r--r-- topics/deploy/machines.gmi | 5
-rw-r--r-- topics/systems/mariadb/precompute-mapping-input-data.gmi | 6
3 files changed, 35 insertions, 8 deletions
diff --git a/topics/data/precompute/steps.gmi b/topics/data/precompute/steps.gmi
index 33915cd..75e3bfd 100644
--- a/topics/data/precompute/steps.gmi
+++ b/topics/data/precompute/steps.gmi
@@ -8,12 +8,12 @@ We will track precompute steps here. We will have:
* [X] steps g: genotype archives (first we only do BXD-latest, include BXD.json)
* [X] steps k: kinship archives (first we only do BXD-latest)
-* [ ] steps p: trait archives (first we do p1-3)
+* [X] steps p: trait archives (first we do p1-3)
Trait archives will have steps for
-* [+] step p1: list-traits-to-compute
-* [ ] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
+* [X] step p1: list-traits-to-compute
+* [+] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
* [ ] step p3: gemma-to-lmdb: create a clean vector
The DB itself can be updated from these
@@ -74,3 +74,29 @@ Note that it (potentially) includes the parents. Also the strain-id is a string
from guile-hashing and guile-gcrypt modules.
In the next step we have to check the normal distribution of the trait values and maybe winsorize outliers.
+Actually we should brute force the default first.
+It will be interesting to see the effect of handling outliers and normalising phenotypes.
+
+# Step p2: run GEMMA lmm9 + LOCO
+
+Last week I checked out Arun's CCWL. Anything more complicated than a few steps can go into CCWL. It will also handle cluster workloads using, for example, Toil. We will use trait files + genotypes as inputs for gemma-wrapper initially.
+
+p1: Workflow 1:
+
+```
+batch traits from DB
+```
+
+p2+p3: Workflow 2:
+
+```
+batched traits, genotypes ->
+ gemma-wrapper ->
+ lmdb'ize
+```
+
+p4: Workflow 3:
+
+```
+Update DB
+```
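Editorial sketch, not part of the commit: the three workflows above can be read as a small pipeline. The Python below is a minimal illustration under stated assumptions. The names `batch_traits` and `run_workflow2`, and the `compute` and `store` callables, are hypothetical stand-ins; they are not the gemma-wrapper, lmdb, or CCWL interfaces, and the real steps would probably be expressed as CCWL workflows instead.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batch_traits(trait_ids: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Workflow 1 (p1): split trait ids into fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(trait_ids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_workflow2(
    batches: Iterable[List[str]],
    genotypes: str,
    compute: Callable[[List[str], str], object],
    store: Callable[[object], object],
) -> List[object]:
    """Workflow 2 (p2+p3): compute each batch, then store the result.

    `compute` is a placeholder for the gemma-wrapper run and `store`
    is a placeholder for the lmdb step; both are assumptions.
    """
    stored = []
    for batch in batches:
        result = compute(batch, genotypes)
        stored.append(store(result))
    return stored
```

Keeping the batching separate from the compute and store steps mirrors the split between Workflow 1 and Workflow 2 in the notes, so each stage could be swapped for a real job without touching the others.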
diff --git a/topics/deploy/machines.gmi b/topics/deploy/machines.gmi
index 21ff2ea..d610c9f 100644
--- a/topics/deploy/machines.gmi
+++ b/topics/deploy/machines.gmi
@@ -1,12 +1,13 @@
# Machines
```
+- [ ] bacchus 172.23.17.156 (00:11:32:ba:7f:17) - 1 Gbs
- [X] lambda01 172.23.18.212 (7c:c2:55:11:9c:ac)
-- [ ] tux03i 172.23.17.181 (00:0a:f7:c1:00:8d) - 10 Gbs
+- [X] tux03i 172.23.17.181 (00:0a:f7:c1:00:8d) - 10 Gbs
[X] tux03 128.169.5.101 (00:0a:f7:c1:00:8b) - 1 Gbs
- [ ] tux04i 172.23.17.170 (14:23:f2:4f:e6:10)
- [ ] tux04 128.169.5.119 (14:23:f2:4f:e6:11)
-- [ ] tux05 172.23.18.129 (14:23:f2:4f:35:00)
+- [X] tux05 172.23.18.129 (14:23:f2:4f:35:00)
- [X] tux06 172.23.17.188 (14:23:f2:4e:29:10)
- [X] tux07 172.23.17.191 (14:23:f2:4e:7d:60)
- [X] tux08 172.23.17.186 (14:23:f2:4f:4e:b0)
diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi
index 2e73590..0c89fe5 100644
--- a/topics/systems/mariadb/precompute-mapping-input-data.gmi
+++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi
@@ -27,10 +27,10 @@ Next, for running the full batch:
* [X] Store all GEMMA values efficiently
* [ ] Include metadata record in lmdb and as JSON file
* [ ] Include metadata record on compute status
-* [ ] Remove junk from tarball
-* [ ] List significant markers as metadata
+* [ ] Remove junk from tarball - use lmdb?
+* [ ] List significant markers as metadata in lmdb
* [ ] Reread below info
-* [ ] Submit jobs to PBS
+* [ ] Submit jobs to PBS using CCWL
* [ ] Report results to mariadb
And after: