Introduction
The Common Workflow Language (CWL) lets you run workflows defined in YAML. Two key concepts: CWL workflows can be analysed and reasoned about (unlike shell scripts), and CWL workflows separate concerns: (1) tools/scripts, (2) data and (3) the workflow itself, i.e. how the steps connect up.
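To make the separation concrete, here is a minimal sketch that writes a tool description and a job (data) file as two separate files. The directory and the file names echo-tool.cwl and echo-job.yml are hypothetical and not part of the CWL-workflows repo; with a single tool there is no workflow layer to show.

```shell
# Hypothetical demo directory and file names, for illustration only.
demo_dir=$(mktemp -d)

# (1) The tool: a thin description of how to call echo.
cat > "$demo_dir/echo-tool.cwl" <<'EOF'
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []
EOF

# (2) The data: the job file supplies the value for the "message" input.
cat > "$demo_dir/echo-job.yml" <<'EOF'
message: Hello CWL
EOF

# Print the command instead of running it, so the sketch works without cwltool.
echo "Run with: cwltool $demo_dir/echo-tool.cwl $demo_dir/echo-job.yml"
```

Swapping the job file changes the data without touching the tool description.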
CWL is also agnostic about how the underlying tooling is found. Docker links are often provided as hints, but with --no-container a tool just gets invoked. This is great in the context of GNU Guix environments!
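Because --no-container simply invokes whatever is on the PATH, it can help to check up front that the tools are there. A minimal sketch, assuming a helper called missing_tools (a hypothetical name) and a tool list taken from later in this post:

```shell
# Print every tool from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names used further down in this post; adjust for your own workflow.
missing=$(missing_tools cwltool fastqc bwa)
if [ -n "$missing" ]; then
    # Unquoted on purpose: prints one line per missing tool.
    printf 'Not on PATH: %s\n' $missing
fi
```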
Install CWL using GNU Guix
You may need to install GNU Guix first; see the README at http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics
Recent versions of GNU Guix contain cwl-runner:
guix pull
~/.config/guix/current/bin/guix package -A cwl
cwltool 3.0.20201121085451 out gnu/packages/bioinformatics.scm:2627:2
Install with
guix package -i cwltool
or in a special profile (I tend to do that)
guix package -i cwltool -p ~/opt/CWL
Set the PATH and you should be able to run cwltool
. ~/opt/CWL/etc/profile
cwltool
Set up a more advanced workflow
Let's run the workflow that was described in creating a reproducible workflow with GNU Guix:
git clone https://github.com/pjotrp/CWL-workflows
Build the contained trimmomatic (if you are unlucky this may take a while)
cd CWL-workflows
env GUIX_PACKAGE_PATH=. guix build trimmomatic-jar
Now let's rerun the workflow as set up in the blog post above (I created a local version to skip IPFS). Make sure your PATH points to all the tools and run
cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml
The first run gives an error: ERROR 'fastqc' not found. We need to add the tool to the environment. For this I created a file .guix-deploy in the root of the repo:
cat .guix-deploy
env GUIX_PACKAGE_PATH=.:~/iwrk/opensource/guix/guix-bioinformatics/ ~/.config/guix/current/bin/guix environment -C guix --ad-hoc cwltool trimmomatic-jar bwa fastqc go-ipfs curl --network
You can see it requires the guix-bioinformatics repo, so you may need to clone that first. Next start the Guix container and run the workflow inside it:
. ./.guix-deploy
cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml
Now the workflow should run fastqc. When it works it should say
<lots of output>
INFO Final process status is success
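If you want to check for that final line from a script rather than by eye, something along these lines could work. It is only a sketch: run_workflow_checked is a hypothetical helper, and it looks for the status message quoted above in the combined stdout and stderr of whatever command it is given.

```shell
# Run a command, show its output, and succeed only if it exited with 0
# and printed the "Final process status is success" line.
run_workflow_checked() {
    log_file=$(mktemp) || return 1
    "$@" > "$log_file" 2>&1
    status=$?
    cat "$log_file"
    if [ "$status" -eq 0 ] && grep -q 'Final process status is success' "$log_file"; then
        rm -f "$log_file"
        return 0
    fi
    rm -f "$log_file"
    return 1
}

# Usage, inside the Guix container:
# run_workflow_checked cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml
```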
The current workflow only works partly. It now complains with
ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15.
Error: Unable to access jarfile /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
This is because I hard-coded two paths, which you first need to point to your own Guix profile:
Tools/trimmomaticPE.cwl: valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
Tools/trimmomaticPE.cwl: valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'
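A listing like the one above can be produced with a recursive grep. A minimal sketch, assuming store paths always look like /gnu/store/ followed by a 32-character lowercase alphanumeric hash (which matches the paths shown here); find_store_paths is a hypothetical helper name:

```shell
# Print file:line:match for every hard-coded store path below a directory.
find_store_paths() {
    grep -rn '/gnu/store/[0-9a-z]\{32\}-' "$1"
}

# Only scan when run from the root of the repo, where Tools/ exists.
if [ -d Tools ]; then
    find_store_paths Tools || echo "no hard-coded store paths found"
fi
```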
In the container the Guix profile can be found with
echo $GUIX_ENVIRONMENT
Plug it into the values above. This is not typical and I should find a proper way to do this. After modifying the source by splicing in the value of GUIX_ENVIRONMENT it worked:
diff --git a/Tools/trimmomaticPE.cwl b/Tools/trimmomaticPE.cwl
index ed57eb5..aedd23a 100644
--- a/Tools/trimmomaticPE.cwl
+++ b/Tools/trimmomaticPE.cwl
@@ -55,7 +55,7 @@ outputs:
 arguments:
   - position: 1
-    valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
+    valueFrom: /gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/trimmomatic-0.38.jar
   - position: 2
     valueFrom: PE
   - position: 5
@@ -67,4 +67,4 @@ arguments:
   - position: 8
     valueFrom: $(inputs.fq2.basename).trim.2U.fastq
   - position: 9
-    valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'
+    valueFrom: 'ILLUMINACLIP:/gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'
Try
. ./.guix-deploy
cwltool --no-container --preserve-environment GUIX_ENVIRONMENT Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml
(output)
INFO Final process status is success
GUIX_ENVIRONMENT
The question is how to deal with GUIX_ENVIRONMENT. cwltool has a switch --preserve-environment ENVVAR. The variable is then available in the environment, but it does not appear to be available to the CWL parser.
To automate this I think there are two options:
- Add GUIX_ENVIRONMENT support to CWL
- Generate/patch the CWL script above before running
The second one is easy if this is part of a Guix package, but I think we need to add proper support in CWL.
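The second option could be sketched with sed. This is an assumption-laden illustration, not a proper solution: patch_profile_paths is a hypothetical helper, it rewrites the file in place, and it assumes every hard-coded profile path has the form /gnu/store/<32-character hash>-profile, as in the diff above.

```shell
# Replace every hard-coded /gnu/store/<hash>-profile prefix in a CWL file
# with the given profile directory.
#   $1: CWL file to rewrite in place
#   $2: profile directory to splice in (e.g. the value of GUIX_ENVIRONMENT)
patch_profile_paths() {
    cwl_file=$1
    profile=$2
    tmp_file=$(mktemp) || return 1
    if sed "s|/gnu/store/[0-9a-z]\{32\}-profile|$profile|g" "$cwl_file" > "$tmp_file"; then
        # Copy the contents back so the original file keeps its permissions.
        cat "$tmp_file" > "$cwl_file"
    fi
    rm -f "$tmp_file"
}

# Only patch when inside a Guix environment and at the root of the repo.
if [ -n "${GUIX_ENVIRONMENT:-}" ] && [ -f Tools/trimmomaticPE.cwl ]; then
    patch_profile_paths Tools/trimmomaticPE.cwl "$GUIX_ENVIRONMENT"
fi
```

Running it a second time is harmless, because the spliced-in value is itself a profile path and gets replaced by the same value.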