#+TITLE: Running the Common Workflow Language on GNU Guix

* Introduction

The Common Workflow Language (CWL) describes workflows in YAML, and a
CWL runner such as cwltool executes them. Two key ideas: unlike shell
scripts, CWL workflows can be analysed and reasoned about, and they
separate concerns into (1) tools/scripts, (2) data and (3) the
workflow itself, i.e. how the steps connect up.

CWL is also agnostic about how the underlying tools are found. Docker
links are often provided as hints, but with ~--no-container~ a tool
simply gets invoked from the PATH. This is great in the context of GNU
Guix environments!

* Install CWL using GNU Guix

If GNU Guix is not installed yet, install it first; see the README at
http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics

Recent versions of GNU Guix contain =cwltool=:

: guix pull
: ~/.config/guix/current/bin/guix package -A cwl
:   cwltool 3.0.20201121085451      out     gnu/packages/bioinformatics.scm:2627:2

Install with

: guix package -i cwltool

or into a dedicated profile (I tend to do that)

: guix package -i cwltool -p ~/opt/CWL

Source the profile to set the PATH and you should be able to run cwltool:

: . ~/opt/CWL/etc/profile
: cwltool
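
To check that the install actually works, a tiny tool plus job file is
enough. The following is a minimal sketch, not part of the original
setup: the file names =echo-tool.cwl= and =echo-job.yml= and the
temporary directory are my own assumptions, and the tool simply wraps
=echo=. It also shows the separation between the tool description and
the data (the job file).

#+begin_src sh
#!/bin/sh
# Smoke test: write a tiny CWL tool and job file, then run them with
# cwltool if it is on the PATH. File names are illustrative only.
workdir=$(mktemp -d)

# The tool: describes how to invoke `echo` with one string input.
cat > "$workdir/echo-tool.cwl" <<'EOF'
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []
EOF

# The job: the data that gets fed into the tool.
cat > "$workdir/echo-job.yml" <<'EOF'
message: Hello from Guix
EOF

if command -v cwltool >/dev/null 2>&1; then
    cwltool --no-container "$workdir/echo-tool.cwl" "$workdir/echo-job.yml"
else
    echo "cwltool is not on the PATH; source the profile first" >&2
fi
#+end_src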


* Set up a more advanced workflow

Let's run the workflow that was described in [[https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/][creating a reproducible
workflow with GNU Guix]]:

: git clone https://github.com/pjotrp/CWL-workflows

Build the contained trimmomatic (if you are unlucky this may take a
while)

: cd CWL-workflows
: env GUIX_PACKAGE_PATH=. guix build trimmomatic-jar

Now let's rerun the workflow as set up in the [[https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/][blog post]] above (I created a
local version to skip IPFS). Make sure your PATH points to all the
tools and run

: cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml

The first run gives an error: ERROR 'fastqc' not found. We need to
add the tool to the environment. For this I created a file =.guix-deploy=
in the root of the repo:

: cat .guix-deploy
: env GUIX_PACKAGE_PATH=.:~/iwrk/opensource/guix/guix-bioinformatics/  ~/.config/guix/current/bin/guix environment -C guix --ad-hoc cwltool trimmomatic-jar bwa fastqc go-ipfs curl --network

You can see it requires the guix-bioinformatics repository on the
GUIX_PACKAGE_PATH, so you may need to clone that repo first and adjust
the path. Next start the Guix container:

: . ./.guix-deploy
: cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml

Now the workflow should run fastqc. When it works it should say

: <lots of output>
: INFO Final process status is success
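
A missing tool such as fastqc only shows up once the workflow reaches
that step, so a quick check inside the container before starting can
save a run. A minimal sketch: =check_tools= is a hypothetical helper of
my own, not something cwltool or Guix provides, and the tool names in
the usage comment are taken from the =.guix-deploy= line above.

#+begin_src sh
#!/bin/sh
# check_tools NAME...: report which of the named commands are on the PATH.
# Hypothetical helper; returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the Guix container, before starting the workflow:
#   check_tools cwltool fastqc bwa
#+end_src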

The current workflow only works partly. It now complains with

: ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15.
: Error: Unable to access jarfile /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar

This is because I hard-coded two paths, which you need to point to your
own Guix profile first:

: Tools/trimmomaticPE.cwl:    valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
: Tools/trimmomaticPE.cwl:    valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'

In the container the Guix profile can be found with

: echo $GUIX_ENVIRONMENT

Plug it into the values above.  This is not typical and I should find a
proper way to do this. cwltool has a switch
=--preserve-environment ENVVAR=. After modifying the source by
splicing in the value of =GUIX_ENVIRONMENT= it worked.

#+begin_src diff
diff --git a/Tools/trimmomaticPE.cwl b/Tools/trimmomaticPE.cwl
index ed57eb5..aedd23a 100644
--- a/Tools/trimmomaticPE.cwl
+++ b/Tools/trimmomaticPE.cwl
@@ -55,7 +55,7 @@ outputs:

 arguments:
   - position: 1
-    valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
+    valueFrom: /gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/trimmomatic-0.38.jar
   - position: 2
     valueFrom: PE
   - position: 5
@@ -67,4 +67,4 @@ arguments:
   - position: 8
     valueFrom: $(inputs.fq2.basename).trim.2U.fastq
   - position: 9
-    valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'
+    valueFrom: 'ILLUMINACLIP:/gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'

#+end_src
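
The same edit can be scripted rather than done by hand. This is a
sketch under assumptions: =repoint_profile= is a hypothetical helper of
my own, and it assumes the hard-coded paths always have the form
=/gnu/store/<hash>-profile= as in the diff above. It is still a
workaround, not the proper fix.

#+begin_src sh
#!/bin/sh
# repoint_profile FILE PROFILE: replace every /gnu/store/<hash>-profile
# prefix in FILE with PROFILE. Hypothetical helper, not part of cwltool.
repoint_profile() {
    file=$1
    profile=$2
    if [ -z "$profile" ]; then
        echo "no profile given (is GUIX_ENVIRONMENT set?)" >&2
        return 1
    fi
    # Write to a temporary file first so a failed sed leaves FILE intact.
    sed "s|/gnu/store/[a-z0-9]*-profile|$profile|g" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage inside the Guix container:
#   repoint_profile Tools/trimmomaticPE.cwl "$GUIX_ENVIRONMENT"
#+end_src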

Try

: . ./.guix-deploy
: cwltool --no-container --preserve-environment GUIX_ENVIRONMENT Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml
: (output)
: INFO Final process status is success
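
With that much output the last line is easy to miss. One way to check
it, sketched here with a hypothetical helper =workflow_succeeded= and
an assumed log file name =run.log=, is to save the output and look for
the status line shown above.

#+begin_src sh
#!/bin/sh
# workflow_succeeded LOGFILE: succeed only if the saved cwltool output
# contains the final success line. Hypothetical helper.
workflow_succeeded() {
    grep -q 'Final process status is success' "$1"
}

# Usage, capturing both output streams in run.log (an assumed name):
#   cwltool --no-container --preserve-environment GUIX_ENVIRONMENT \
#       Workflows/test-workflow.cwl \
#       Jobs/local-small.ERR034597.test-workflow.yml > run.log 2>&1
#   workflow_succeeded run.log && echo "workflow OK"
#+end_src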