#+TITLE: Running the common workflow language on GNU Guix

* Introduction

The common workflow language (CWL) can run workflows defined in a YAML
definition. Some key concepts are that CWL workflows can be analysed
and reasoned on (unlike shell scripts) and CWL workflows are a
separation of concerns: (1) tools/scripts, (2) data and (3) the
workflow, i.e. how it connects up.

CWL is also agnostic about finding underlying tooling. Docker links
are often provided as hints, but with ~--no-container~ a tool just
gets invoked. This is great in the context of GNU Guix environments!

* Install CWL using GNU Guix

You may need to install GNU Guix and see the README on
http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics

Recent versions of GNU Guix contain =cwl-runner=:

: guix pull
: ~/.config/guix/current/bin/guix package -A cwl
:   cwltool 3.0.20201121085451      out     gnu/packages/bioinformatics.scm:2627:2

Install with

: guix package -i cwltool

or in a special profile (I tend to do that)

: guix package -i cwltool -p ~/opt/CWL

Set the PATH and you should be able to run cwltool

: . ~/opt/CWL/etc/profile
: cwltool


* Set up a more advanced workflow

Let's run the workflow that was described in [[https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/][creating a reproducible
workflow with GNU Guix]]:

: git clone https://github.com/pjotrp/CWL-workflows

Build the contained trimmomatic (if you are unlucky this may take a
while)

: cd CWL-workflows
: env GUIX_PACKAGE_PATH=. guix build trimmomatic-jar

Now let's rerun the workflow as set up in above [[https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/][BLOG]] (I created a
local version to skip IPFS). Make sure your PATH points to all the
tools and

: cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml

in the first run gives an error: ERROR 'fastqc' not found. We need to
add the tool to the environment. For this I created a file .guix-deploy
in the root of the repo:

: cat .guix-deploy
: env GUIX_PACKAGE_PATH=.:~/iwrk/opensource/guix/guix-bioinformatics/  ~/.config/guix/current/bin/guix environment -C guix --ad-hoc cwltool trimmomatic-jar bwa fastqc go-ipfs curl --network

You can see it requires the guix-bioinformatics, so you may need to clone
that repo first. Next start the Guix container:

: . ./guix-deploy
: cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml

Now the workflow should run fastq. When it works it should say

: <lots of output>
: INFO Final process status is success

The current workflow is only working partly. It now complains with

ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15.
Error: Unable to access jarfile /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar

This is because I hard coded two paths which you need to point to your Guix
profile first:

: Tools/trimmomaticPE.cwl:    valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
: Tools/trimmomaticPE.cwl:    valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'

In the container the Guix profile can be found with

: echo $GUIX_ENVIRONMENT

Plug it into above values.  This is not typical and I should find a
proper way to do this. After modifying the source by splitting in the
GUIX_ENVIROMENT it worked.

#+begin_src diff
diff --git a/Tools/trimmomaticPE.cwl b/Tools/trimmomaticPE.cwl
index ed57eb5..aedd23a 100644
--- a/Tools/trimmomaticPE.cwl
+++ b/Tools/trimmomaticPE.cwl
@@ -55,7 +55,7 @@ outputs:

 arguments:
   - position: 1
-    valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
+    valueFrom: /gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/trimmomatic-0.38.jar
   - position: 2
     valueFrom: PE
   - position: 5
@@ -67,4 +67,4 @@ arguments:
   - position: 8
     valueFrom: $(inputs.fq2.basename).trim.2U.fastq
   - position: 9
-    valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'
+    valueFrom: 'ILLUMINACLIP:/gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'

#+end_src

Try

: . ./guix-deploy
: cwltool --no-container --preserve-environment GUIX_ENVIRONMENT Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml
: (output)
: INFO Final process status is success


** GUIX_ENVIRONMENT

The question is how to deal with GUIX_ENVIRONMENT. cwltool has a
switch `--preserve-environment ENVVAR'. This value is then available
in the environment, but it is not available to the CWL parser, it
appears.

To automate this I think there are two options:

1. Add GUIX_ENVIRONMENT support to CWL
2. Generate/patch above CWL script before running

The second one is easy if this is part of a Guix package, but
I think we need to add proper support in CWL.