#+TITLE: Running the common workflow language on GNU Guix

* Introduction

The common workflow language (CWL) describes workflows in YAML documents that a CWL runner executes. Two key ideas are that CWL workflows can be analysed and reasoned about (unlike shell scripts), and that they separate concerns into (1) tools/scripts, (2) data and (3) the workflow itself, i.e. how the steps connect.

CWL is also agnostic about how the underlying tools are found. Docker images are often provided as hints, but with ~--no-container~ a tool just gets invoked from the PATH. This is great in the context of GNU Guix environments!

* Install CWL using GNU Guix

You may need to install GNU Guix first; see the README at http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics

Recent versions of GNU Guix contain the =cwltool= package, the reference CWL runner:

: guix pull
: ~/.config/guix/current/bin/guix package -A cwl
: cwltool 3.0.20201121085451 out gnu/packages/bioinformatics.scm:2627:2

Install with

: guix package -i cwltool

or in a special profile (I tend to do that)

: guix package -i cwltool -p ~/opt/CWL

Set the PATH and you should be able to run cwltool

: . ~/opt/CWL/etc/profile
: cwltool

* Set up a more advanced workflow

Let's run the workflow that was described in [[https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/][creating a reproducible workflow with GNU Guix]]:

: git clone https://github.com/pjotrp/CWL-workflows

Build the contained trimmomatic (if you are unlucky this may take a while)

: cd CWL-workflows
: env GUIX_PACKAGE_PATH=. guix build trimmomatic-jar

Now let's rerun the workflow as set up in the [[https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/][blog post]] above (I created a local version to skip IPFS). Make sure your PATH points to all the tools and run

: cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml

The first run gives an error:

: ERROR 'fastqc' not found

We need to add the tool to the environment.
For this I created a file =.guix-deploy= in the root of the repo:

: cat .guix-deploy
: env GUIX_PACKAGE_PATH=.:~/iwrk/opensource/guix/guix-bioinformatics/ ~/.config/guix/current/bin/guix environment -C guix --ad-hoc cwltool trimmomatic-jar bwa fastqc go-ipfs curl --network

You can see it requires guix-bioinformatics, so you may need to clone that repo first. Next start the Guix container:

: . ./.guix-deploy
: cwltool --no-container Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml

Now the workflow should run fastqc. When it works it should say

: INFO Final process status is success

The current workflow only works partly. It now complains with

: ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15
: Error: Unable to access jarfile /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar

This is because I hard coded two paths, which you need to point to your own Guix profile first:

: Tools/trimmomaticPE.cwl: valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
: Tools/trimmomaticPE.cwl: valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'

In the container the Guix profile can be found with

: echo $GUIX_ENVIRONMENT

Plug it into the values above. This is not typical and I should find a proper way to do this. After modifying the source by substituting the value of GUIX_ENVIRONMENT it worked:
#+begin_src diff
diff --git a/Tools/trimmomaticPE.cwl b/Tools/trimmomaticPE.cwl
index ed57eb5..aedd23a 100644
--- a/Tools/trimmomaticPE.cwl
+++ b/Tools/trimmomaticPE.cwl
@@ -55,7 +55,7 @@ outputs:
 
 arguments:
   - position: 1
-    valueFrom: /gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/trimmomatic-0.38.jar
+    valueFrom: /gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/trimmomatic-0.38.jar
   - position: 2
     valueFrom: PE
   - position: 5
@@ -67,4 +67,4 @@ arguments:
   - position: 8
     valueFrom: $(inputs.fq2.basename).trim.2U.fastq
   - position: 9
-    valueFrom: 'ILLUMINACLIP:/gnu/store/v2jys382g6j5b7lsxzh8v4vfhd414nhz-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'
+    valueFrom: 'ILLUMINACLIP:/gnu/store/j1ljhxzaxmcqy8v6d4v1y37p48c68f5q-profile/lib/share/jar/adapters/TruSeq2-PE.fa:2:40:15'
#+end_src

Try

: . ./.guix-deploy
: cwltool --no-container --preserve-environment GUIX_ENVIRONMENT Workflows/test-workflow.cwl Jobs/local-small.ERR034597.test-workflow.yml
: (output)
: INFO Final process status is success

** GUIX_ENVIRONMENT

The question is how to deal with GUIX_ENVIRONMENT. cwltool has a switch ~--preserve-environment ENVVAR~. The variable is then available in the tool's environment but, it appears, not to the CWL parser.

To automate this I think there are two options:

1. Add GUIX_ENVIRONMENT support to CWL
2. Generate/patch the CWL script above before running

The second one is easy if this is part of a Guix package, but I think we need to add proper support in CWL.
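
The second option can be sketched in a few lines of Python. This is only an illustration, not part of the repository: the function names =patch_profile_paths= and =patch_file= are made up here, and the sketch assumes that every hard-coded profile path has the form =/gnu/store/<32-character hash>-profile=, as in the diff above.

#+begin_src python
import re
from pathlib import Path

# Assumption: a hard-coded Guix profile looks like
# /gnu/store/<32 lowercase letters or digits>-profile
STORE_PROFILE = re.compile(r"/gnu/store/[0-9a-z]{32}-profile")


def patch_profile_paths(cwl_text, profile):
    """Return cwl_text with every hard-coded store profile replaced by profile."""
    target = profile.rstrip("/")
    # A function replacement avoids backslash interpretation in the target path.
    return STORE_PROFILE.sub(lambda _match: target, cwl_text)


def patch_file(path, profile):
    """Rewrite the CWL file at path in place; return True if it changed."""
    cwl_file = Path(path)
    original = cwl_file.read_text()
    patched = patch_profile_paths(original, profile)
    if patched != original:
        cwl_file.write_text(patched)
    return patched != original
#+end_src

Inside the container one would call something like =patch_file("Tools/trimmomaticPE.cwl", os.environ["GUIX_ENVIRONMENT"])= (after =import os=) before invoking cwltool. The patch is idempotent, because the new profile path matches the same pattern, so running it twice leaves the file unchanged.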