From 99fb6ae36bf03fc06c84e4ae45e741c6f7c0902a Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Mon, 3 Jun 2024 09:25:13 +0200
Subject: precompute

---
 topics/data/precompute/steps.gmi                   | 32 ++++++++++++++++++++--
 topics/deploy/machines.gmi                         |  5 ++--
 .../mariadb/precompute-mapping-input-data.gmi      |  6 ++--
 3 files changed, 35 insertions(+), 8 deletions(-)
(limited to 'topics')

diff --git a/topics/data/precompute/steps.gmi b/topics/data/precompute/steps.gmi
index 33915cd..75e3bfd 100644
--- a/topics/data/precompute/steps.gmi
+++ b/topics/data/precompute/steps.gmi
@@ -8,12 +8,12 @@ We will track precompute steps here. We will have:
 
 * [X] steps g: genotype archives (first we only do BXD-latest, include BXD.json)
 * [X] steps k: kinship archives (first we only do BXD-latest)
-* [ ] steps p: trait archives (first we do p1-3)
+* [X] steps p: trait archives (first we do p1-3)
 
 Trait archives will have steps for
 
-* [+] step p1: list-traits-to-compute
-* [ ] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
+* [X] step p1: list-traits-to-compute
+* [+] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
 * [ ] step p3: gemma-to-lmdb: create a clean vector
 
 The DB itself can be updated from these
@@ -74,3 +74,29 @@ Note that it (potentially) includes the parents.
 Also the strain-id is a string from guile-hashing and guile-gcrypt modules.
 
 In the next step we have to check the normal distribution of the trait values and maybe winsorize outliers.
+Actually we should brute force the default first.
+It will be interesting to see the effect of handling outliers and normalisation of phenotypes.
+
+# Step p2: run GEMMA lmm9 + LOCO
+
+Last week I checked out Arun's CCWL. Anything more complicated than a few steps can go into CCWL. It will also handle cluster workloads using, for example, toil. We will use trait files + genotypes as inputs for gemma-wrapper initially.
+
+p1: Workflow 1:
+
+```
+batch traits from DB
+```
+
+p2+p3: Workflow 2:
+
+```
+batched traits, genotypes ->
+gemma-wrapper ->
+lmdb'ize
+```
+
+p4: Workflow 3:
+
+```
+Update DB
+```
diff --git a/topics/deploy/machines.gmi b/topics/deploy/machines.gmi
index 21ff2ea..d610c9f 100644
--- a/topics/deploy/machines.gmi
+++ b/topics/deploy/machines.gmi
@@ -1,12 +1,13 @@
 # Machines
 
 ```
+- [ ] bacchus 172.23.17.156 (00:11:32:ba:7f:17) - 1 Gbs
 - [X] lambda01 172.23.18.212 (7c:c2:55:11:9c:ac)
-- [ ] tux03i 172.23.17.181 (00:0a:f7:c1:00:8d) - 10 Gbs
+- [X] tux03i 172.23.17.181 (00:0a:f7:c1:00:8d) - 10 Gbs
 [X] tux03 128.169.5.101 (00:0a:f7:c1:00:8b) - 1 Gbs
 - [ ] tux04i 172.23.17.170 (14:23:f2:4f:e6:10)
 - [ ] tux04 128.169.5.119 (14:23:f2:4f:e6:11)
-- [ ] tux05 172.23.18.129 (14:23:f2:4f:35:00)
+- [X] tux05 172.23.18.129 (14:23:f2:4f:35:00)
 - [X] tux06 172.23.17.188 (14:23:f2:4e:29:10)
 - [X] tux07 172.23.17.191 (14:23:f2:4e:7d:60)
 - [X] tux08 172.23.17.186 (14:23:f2:4f:4e:b0)
diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi
index 2e73590..0c89fe5 100644
--- a/topics/systems/mariadb/precompute-mapping-input-data.gmi
+++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi
@@ -27,10 +27,10 @@ Next, for running the full batch:
 * [X] Store all GEMMA values efficiently
 * [ ] Include metadata record in lmdb and as JSON file
 * [ ] Include metadata record on compute status
-* [ ] Remove junk from tarball
-* [ ] List significant markers as metadata
+* [ ] Remove junk from tarball - use lmdb?
+* [ ] List significant markers as metadata in lmdb
 * [ ] Reread below info
-* [ ] Submit jobs to PBS
+* [ ] Submit jobs to PBS using CCWL
 * [ ] Report results to mariadb
 
 And after:
-- 
cgit v1.2.3
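
The step p1 notes say trait values should be checked and outliers may need to be winsorized before running GEMMA. The following is only an illustration of percentile-based winsorizing, not the precompute code. Several choices are assumptions: the use of Python, the helper names `percentile` and `winsorize`, and the 5th/95th percentile cutoffs.

```python
import math
from typing import List, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between the two closest ranks.

    Assumes sorted_vals is sorted ascending and q is in [0, 100].
    """
    if not sorted_vals:
        raise ValueError("cannot take a percentile of an empty sequence")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def winsorize(values: Sequence[float],
              lower_q: float = 5.0,
              upper_q: float = 95.0) -> List[float]:
    """Clamp trait values into the [lower_q, upper_q] percentile range.

    Hypothetical helper: the default cutoffs are an assumption, not taken
    from the precompute notes. The input order is preserved.
    """
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    # Values outside [low, high] are pulled to the nearest bound.
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    # Twenty ordinary trait values plus one extreme outlier.
    traits = [float(x) for x in range(1, 21)] + [1000.0]
    # 1.0 is raised to 2.0 and 1000.0 is lowered to 20.0.
    print(winsorize(traits))
```

This sketch clamps outliers instead of dropping them. The number of individuals therefore stays the same, so the trait file still lines up with the genotype and kinship inputs that gemma-wrapper receives.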