#+TITLE: Running the common workflow language on GNU Guix

* Introduction

The Common Workflow Language (CWL) describes workflows in YAML
documents that a runner such as cwltool can execute. Two key ideas
are that CWL workflows can be analysed and reasoned about (unlike
shell scripts) and that CWL separates concerns: (1) tools/scripts,
(2) data and (3) the workflow itself, i.e. how the steps connect up.

CWL is also agnostic about finding underlying tooling. Docker links
are often provided as hints, but with ~--no-container~ a tool just
gets invoked. This is great in the context of GNU Guix environments!

* Install CWL using GNU Guix

If GNU Guix is not installed yet, install it first and see the README at
http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics

Recent versions of GNU Guix contain =cwltool=:

: guix pull
: ~/.config/guix/current/bin/guix package -A cwl
:   cwltool 3.0.20201121085451      out     gnu/packages/bioinformatics.scm:2627:2

Install with

: guix package -i cwltool

or in a special profile (I tend to do that)

: guix package -i cwltool -p ~/opt/CWL

Set the PATH and you should be able to run cwltool

: . ~/opt/CWL/etc/profile
: cwltool


* Set up a more advanced workflow

Let's run the workflow that was described in [[https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/][creating a reproducible
workflow with GNU Guix]]:

: git clone https://github.com/pjotrp/CWL-workflows

Build the contained trimmomatic (if you are unlucky this may take a
while)

: cd CWL-workflows
: env GUIX_PACKAGE_PATH=. guix build trimmomatic-jar

Now let's rerun the workflow as set up in the [[https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/][blog post]] above (I
created a local version to skip IPFS). Make sure your PATH points to
all the tools and run

: cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml

The first run gives an error: ERROR 'fastqc' not found. We need to
add the tool to the environment. For this I created a file
=.guix-deploy= in the root of the repo:

: cat .guix-deploy
: env GUIX_PACKAGE_PATH=.:~/iwrk/opensource/guix/guix-bioinformatics/  ~/.config/guix/current/bin/guix environment -C guix --ad-hoc cwltool trimmomatic-jar bwa fastqc go-ipfs curl --network

You can see it requires the guix-bioinformatics repository, so you may
need to clone that repo first. Next start the Guix container:

: . ./.guix-deploy
: cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml

Now the workflow should run fastqc. When it works it should say

: <lots of output>
: INFO Final process status is success
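
If the run still stops with a "not found" error, it can help to check
which tools are missing from PATH before starting cwltool. The helper
below is a minimal sketch; the function name =missing_tools= is
hypothetical and the list of tools to check is up to you.

#+begin_src shell
# Print the name of every tool given as an argument that cannot be
# found on PATH. Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
#+end_src

Inside the container this could be called as

: missing_tools cwltool fastqc bwa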

At this point the workflow only partly works. It now complains with

: ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15.
: Error: Unable to access jarfile /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar

This is because I hard-coded two paths, which you first need to point
to your own Guix profile:

: Tools/trimmomaticPE.cwl:    valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
: Tools/trimmomaticPE.cwl:    valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'

In the container the Guix profile can be found with

: echo $GUIX_ENVIRONMENT

Plug it into the values above. This is not typical and I should find a
proper way to do this. After modifying the source by substituting in
the value of GUIX_ENVIRONMENT it worked:

#+begin_src diff
diff --git a/Tools/trimmomaticPE.cwl b/Tools/trimmomaticPE.cwl
index ed57eb5..aedd23a 100644
--- a/Tools/trimmomaticPE.cwl
+++ b/Tools/trimmomaticPE.cwl
@@ -55,7 +55,7 @@ outputs:

 arguments:
   - position: 1
-    valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
+    valueFrom: /gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/trimmomatic-0.38.jar
   - position: 2
     valueFrom: PE
   - position: 5
@@ -67,4 +67,4 @@ arguments:
   - position: 8
     valueFrom: $(inputs.fq2.basename).trim.2U.fastq
   - position: 9
-    valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'
+    valueFrom: 'ILLUMINACLIP:/gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'

#+end_src

Try

: . ./.guix-deploy
: cwltool --no-container --preserve-environment GUIX_ENVIRONMENT Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml
: (output)
: INFO Final process status is success


** GUIX_ENVIRONMENT

The question is how to deal with GUIX_ENVIRONMENT. cwltool has a
switch ~--preserve-environment ENVVAR~. The variable is then available
in the environment of the invoked tool, but it appears not to be
available to the CWL parser.

To automate this I think there are two options:

1. Add GUIX_ENVIRONMENT support to CWL
2. Generate/patch the above CWL script before running

The second one is easy if this is part of a Guix package, but
I think we need to add proper support in CWL.
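
As a rough sketch of the second option, the hard-coded profile prefix
can be rewritten with sed before cwltool is started. The function name
=patch_profile_path= is hypothetical, and the pattern assumes that the
profile path has the form =/gnu/store/<32-character hash>-profile=, as
in the paths shown earlier.

#+begin_src shell
# Write FILE to stdout with every hard-coded Guix profile prefix
# replaced by PROFILE. PROFILE defaults to $GUIX_ENVIRONMENT.
# Usage: patch_profile_path FILE [PROFILE]
patch_profile_path() {
    file=$1
    profile=${2:-${GUIX_ENVIRONMENT:-}}
    if [ -z "$profile" ]; then
        echo "patch_profile_path: no profile given and GUIX_ENVIRONMENT is unset" >&2
        return 1
    fi
    # Assumption: the profile path contains no '|' or '&' characters,
    # which would be special in the sed replacement.
    sed "s|/gnu/store/[0-9a-z]\{32\}-profile|${profile}|g" "$file"
}
#+end_src

Inside the container, where GUIX_ENVIRONMENT is set, the tool
description could then be regenerated with

: patch_profile_path Tools/trimmomaticPE.cwl > trimmomaticPE.cwl.new
: mv trimmomaticPE.cwl.new Tools/trimmomaticPE.cwl

This still edits the file in the working tree, so it is a workaround
rather than proper support in CWL.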