Diffstat (limited to 'topics/data/precompute')
-rw-r--r-- topics/data/precompute/steps.gmi 28
1 files changed, 21 insertions, 7 deletions
diff --git a/topics/data/precompute/steps.gmi b/topics/data/precompute/steps.gmi
index 75e3bfd..ac03d1a 100644
--- a/topics/data/precompute/steps.gmi
+++ b/topics/data/precompute/steps.gmi
@@ -2,7 +2,8 @@
 
 At this stage precompute fetches a trait from the DB and runs GEMMA. Next it tar balls up the vector for later use. It also updates the database with the latest info.
 
-To actually kick off compute on machines that do not access the DB I realize now we need a step-wise approach. Basically you want to shift files around without connecting to a DB. And then update the DB whenever it is convenient. So we are going to make it a multi-step procedure. I don't have to write all code because we have a working runner. I just need to chunk the work.
+To actually kick off compute on machines that do not access the DB I realize now we need a step-wise approach. Basically you want to shift files around without connecting to a DB. And then update the DB whenever it is convenient. So we are going to make it a multi-step procedure.
+We need to chunk the work.
 
 We will track precompute steps here. We will have:
 
@@ -13,8 +14,18 @@ We will track precompute steps here. We will have:
 Trait archives will have steps for
 
 * [X] step p1: list-traits-to-compute
-* [+] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
-* [ ] step p3: gemma-to-lmdb: create a clean vector
+* [X] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper
+* [X] step p3: gemma-to-lmdb: create a clean vector
+
+Start precompute
+
+* [ ] Fetch traits on tux04
+* [ ] Set up runner on tux04 and others
+* [ ] Run on Octopus
+
+Work on published data
+
+* [ ] Fetch traits
 
 The DB itself can be updated from these
 
@@ -22,8 +33,11 @@ The DB itself can be updated from these
 
 Later
 
+* [ ] Rqtl2: Compute Rqtl2 vector
 * [ ] bulklmm: Compute bulklmm vector
 
+Interestingly this work coincides with Arun's work on CWL. Rather than trying to write a workflow in bash, we'll use ccwl and accompanying tools to scale up the effort.
+
 # Tags
 
 * assigned: pjotrp
@@ -36,10 +50,10 @@ Later
 
 * [ ] Check Artyoms LMDB version for kinship and maybe add LOCO
 * [+] Create JSON metadata controller for every compute incl. type of content
-* [+] Create genotype archive
-* [+] Create kinship archive
+* [X] Create genotype archive
+* [X] Create kinship archive
 * [+] Create trait archives
-* [+] Kick off lmm9 step
+* [X] Kick off lmm9 step
 * [ ] Update DB step v1
 
 # Step p1: list traits to compute
@@ -62,7 +76,7 @@ At this point we can write
 {"2":9.40338,"3":10.196,"4":10.1093,"5":9.42362,"6":9.8285,"7":10.0808,"8":9.17844,"9":10.1527,"10":10.1167,"11":9.88551,"13":9.58127,"15":9.82312,"17":9.88005,"19":10.0761,"20":10.2739,"21":9.54171,"22":10.1056,"23":10.5702,"25":10.1433,"26":9.68685,"28":9.98464,"29":10.132,"30":9.96049,"31":10.2055,"35":10.1406,"36":9.94794,"37":9.96864,"39":9.31048}
 ```
 
-Note that it (potentially) includes the parents. Also the strain-id is a string and we may want to plug in the strain name. To allow for easy comparison downstream. Finally we may want to store a checksum of sorts. In Guile this can be achieved with:
+Note that it (potentially) includes the parents and that is corrected when generating the phenotype file for GEMMA. Also the strain-id is a string and we may want to plug in the strain name. To allow for easy comparison downstream. Finally we may want to store a checksum of sorts. In Guile this can be achieved with:
 
 ```scheme
 (use-modules  (rnrs bytevectors)