Diffstat (limited to 'README.org')
| -rw-r--r-- | README.org | 220 |
1 files changed, 107 insertions, 113 deletions
diff --git a/README.org b/README.org
index be8db7c..0b6f8bc 100644
--- a/README.org
+++ b/README.org
@@ -1,136 +1,120 @@
 * guix-bioinformatics
 
-IMPORTANT: this repository has moved to https://git.genenetwork.org/guix-bioinformatics/!
-
-Bioinformatics packages for GNU Guix that are used in
-https://genenetwork.org/ and some other places. See [[https://gitlab.com/pjotrp/guix-notes/blob/master/HACKING.org][Guix notes]] for
-installing and hacking GNU Guix. Other channels of bioinformatics
-interest can be found at
-
-1. https://github.com/BIMSBbioinfo
-2. https://github.com/UMCUGenetics/guix-additions
-3. https://github.com/ekg/guix-genomics
-
-Also see [[http://git.genenetwork.org/pjotrp/guix-notes/src/branch/master/CHANNELS.org][Guix notes]] for a list of channels.
-
-To easily use the packages from this repo, simply add it to your
-`channels` list in ~/.config/guix/channels.scm as described
-[[https://guix.gnu.org/manual/en/html_node/Channels.html][here]]:
+IMPORTANT: this repository lives at https://git.genenetwork.org/guix-bioinformatics/!
+
+Over 300 older packages have been moved to https://git.genenetwork.org/guix-bioinformatics-past/. Check out the README to see what packages are there.
+
+Over 300 bioinformatics packages for Guix that are used in https://genenetwork.org/ and some other places.
+Mostly targetting genomics, pangenomics and genetics.
+
+** Pangenome tools (pangenomes meta-package)
+
+The =pangenomes= meta-package provides a comprehensive pangenomics toolkit:
+
+| Tool | Version | Description |
+|----------------+--------------+------------------------------------------------|
+| pggb | 0.7.4 | PanGenome Graph Builder pipeline |
+| wfmash | 0.14.0 | Whole-genome Fuzzy Mapping and Alignment |
+| seqwish | 0.7.11 | Sequence graph induction from alignments |
+| smoothxg | 0.8.2 | Graph normalization via partial order alignment |
+| odgi | 0.9.0 | Optimized Dynamic Genome/Graph Implementation |
+| vg | 1.72.0 | Variation graph toolkit |
+| impg | 0.4.1 | Implicit pangenome graph queries |
+| minimap2 | 2.28 | Fast pairwise aligner (from Guix upstream) |
+| bwa-mem2 | 2.3 | Burrows-Wheeler Aligner for short reads |
+| samtools | 1.19 | SAM/BAM/CRAM manipulation (from Guix upstream) |
+| htslib | 1.21 | HTSlib C library (from Guix upstream) |
+| bedtools | 2.31.1 | Genome interval tools (from Guix upstream) |
+| bcftools | 1.21 | VCF/BCF manipulation (from Guix upstream) |
+| vcflib | 1.0.15 | VCF manipulation library and tools |
+| vcfbub | 0.1.0 | VCF bubble popping |
+| bandage-ng | 2026.4.1 | Assembly graph visualizer (Qt6) |
+| gfalook | 0.1.0 | GFA visualization (odgi viz reimplementation) |
+| pafplot | 0.1.0 | PAF alignment dotplot renderer |
+| wally | 0.7.1 | Structural variant visualization |
+| agc | 2.1 | Assembled Genomes Compressor |
+| cigzip | 0.1.0 | CIGAR compression with tracepoints |
+| cosigt | 0.1.7 | Pangenome haplotype genotyping |
+| gfainject | 0.1.0 | BAM-to-GAF graph injection |
+| gafpack | 0.0.0 | GAF coverage vector extraction |
+| gfaffix | 0.2.1 | Walk-preserving graph simplification |
+| gfautil | 0.4.0 | GFA format utilities |
+| fastga-rs | 0.1.2 | Fast genome aligner (Rust) |
+| fastix | 0.1.0 | FASTA header prefix renaming (PanSN) |
+| kfilt | 0.1.1 | K-mer filtering |
+| meryl | 1.4.1 | K-mer counting and set operations |
+| miniprot | 0.18 | Protein-to-genome aligner |
+| pangene | 1.1 | Gene-level pangenome analysis |
+| rtg-tools | 3.13 | VCF evaluation (vcfeval) |
+
+** MEMPANG workshop (mempang-workshop meta-package)
+
+Extends =pangenomes= with R plotting, Python, and general utilities
+for the MEMPANG pangenome workshop tutorials:
+
+| Category | Packages |
+|----------------+------------------------------------------------------|
+| R packages | r-ggplot2, r-tidyverse, r-ape, r-ggtree, r-gggenes |
+| Python | python, python-igraph, python-pycairo |
+| Utilities | graphviz, gnuplot, parallel, pigz, wget, zstd, bc |
+| QC | multiqc, mummer |
+
+** GeneNetwork packages
+
+| Package | Version | Description |
+|----------------------+--------------+---------------------------------------|
+| genenetwork2 | 3.11 | GeneNetwork2 web application |
+| genenetwork3 | 0.1.0 | GeneNetwork3 REST API |
+| gn-auth | 1.0.1 | GN authentication service |
+| gn-guile | 4.0.0 | Guile utilities for GN |
+| gn-libs | 0.0.0 | Shared Python libraries |
+| gn-uploader | 0.1.1 | Data uploader |
+| gemma-wrapper | 0.99.6 | GEMMA CLI wrapper |
+| gemma-gn2 | 0.98.5 | GEMMA for GeneNetwork2 |
+| genecup | 1.8 | GeneCup literature mining |
+
+See Guix documentation and [[https://gitlab.com/pjotrp/guix-notes/blob/master/HACKING.org][Guix notes]] for installing and hacking Guix.
+
+See [[https://github.com/franzos/awesome-guix][awesome guix]] for a list of other channels.
+
+To easily use the packages from this repo, simply add it to your `channels` list in ~/.config/guix/channels.scm as described [[https://guix.gnu.org/manual/en/html_node/Channels.html][here]]:
 
 #+BEGIN_SRC scheme
+  ;; example channels.scm
   (list (channel
-         (name 'gn-bioinformatics)
+         (name 'guix-bioinformatics)
          (url "https://git.genenetwork.org/guix-bioinformatics")
-         (branch "master")))
+         (branch "main")))
 #+END_SRC
 
-and run /guix pull/ like normal to update your software. This is the
-recommended way to use the software from this repository and the code
-snippets in this README assume you have done so. In order to maintain
-stability, the guix-bioinformatics channel depends on a specific
-commit of upstream Guix. So, it is recommended to isolate use of the
-guix-bioinformatics channel in a separate /guix pull/ profile. That is described [[https://issues.genenetwork.org/topics/guix-profiles][here]].
-
-If you want to make changes to the packages in this repo you can set
-the GUIX_PACKAGE_PATH to point to the root of this directory
-before running Guix. E.g.
+and run /guix pull/ like normal to update your software. E.g.
 
-#+BEGIN_SRC bash
-  git clone https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git
-  git clone https://gitlab.inria.fr/guix-hpc/guix-past.git
-  export GUIX_PACKAGE_PATH=$PWD/guix-bioinformatics/:$PWD/guix-past/modules
-  guix package -A cwl
+#+BEGIN_SRC sh
+  guix pull --url=https://codeberg.org/guix/guix -p ~/opt/guix-bioinformatics --channels=channels.scm
 #+END_SRC
 
-or using a checked out Guix repo with
-
-: env GUIX_PACKAGE_PATH=$genenetwork/guix-bioinformatics/ ./pre-inst-env guix package -A cwl
-
-Some (or most) of these package definitions should make it upstream
-into the GNU Guix repository when tested and stable.
-
-* Slurm and munge
-
-Install slurm with
+The channel file actually accesses https://git.genenetwork.org/guix-bioinformatics/tree/.guix-channel which pulls other channels and fixates the hashes. The commit hash b0fa1dc can be found from the guix you want to run with /guix -V/, it speeds up installation and makes it reproducible. Note that the upstream channel may override that version.
 
-#+BEGIN_SRC bash
-  guix pull
-  guix package -i slurm-llnl
-
-  ~/.guix-profile/sbin/slurmd -C -D
-  ClusterName=(null) NodeName=selinunte CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=2 RealMemory=7890 TmpDisk=29909
-#+END_SRC
-
-
-* Common Workflow Language (CWL)
-
-/Note that CWL moved into Guix master!/
-
-Install the common workflow language tool cwltool from this repo with
-
-#+BEGIN_SRC bash
+The latest channel file that is used by our CI/CD you can find at https://ci.genenetwork.org/channels.scm.
 
-  guix pull
-  export PATH=$HOME/.config/guix/current/bin/guix:$PATH
-  ~/guix-bioinformatics$ env GUIX_PACKAGE_PATH=.:../guix-past/modules/ ~/.config/guix/current/bin/guix package -i cwl-runner -p ~/opt/CWL
+Channels are to maintain stability, the guix-bioinformatics channel depends on a specific commit of upstream Guix. So, it is recommended to isolate use of the guix-bioinformatics channel in a separate /guix pull/ profile, described [[https://issues.genenetwork.org/topics/guix-profiles][here]].
 
-The following package will be installed:
-  cwl-runner 1.0
+You can use the --tune=native switch to optimize performance when installing pangenome tools and gemma.
 
-The following derivations will be built:
-  /gnu/store/ld59374zr45rbqanh7ccfi2wa4d5x4yl-cwl-runner-1.0.drv
-  /gnu/store/86j15mxj5zp3k3sjimhqhb6zsj19azsf-python-schema-salad-7.0.20200811075006.drv
-  /gnu/store/0q2ls0is3253r4gx6hs7kmvlcz412lh1-schema-salad-7.0.20200811075006.tar.gz.drv
-  /gnu/store/myj1365ph687ynahjhg6zqslrmd6zpjq-cwltool-3.0.20201117141248.drv
-
-source ~/opt/CWL/etc/profile
-cwltool --version
-  /gnu/store/50mncjcgc8vmq5dfrh0pb82avbzy8c4r-cwltool-3.0.20201117141248/bin/.cwltool-real 3.0
-#+END_SRC
-
-To run CWL definitions you can install tools in a Guix environment (avoiding
-Docker). Say you need mafft in a workflow
-
-#+begin_src sh
-  ~/guix-bioinformatics$ env GUIX_PACKAGE_PATH=.:../guix-past/modules/ ~/.config/guix/current/bin/guix environment \
-  guix --ad-hoc cwl-runner mafft
-#+end_src
-
-in the new shell you should be able to find both CWL and MAFFT:
-
-#+begin_src sh
-ls $GUIX_ENVIRONMENT/bin/cwl*
-/gnu/store/bhfc5rk29s38w9kgcl4zmcdlh369y9f9-profile/bin/cwl-runner
-/gnu/store/bhfc5rk29s38w9kgcl4zmcdlh369y9f9-profile/bin/cwltool
-ls $GUIX_ENVIRONMENT/bin/mafft
-/gnu/store/bhfc5rk29s38w9kgcl4zmcdlh369y9f9-profile/bin/mafft
-#+end_src
-
-The paths can be loaded into the shell with
-
-: source $GUIX_ENVIRONMENT/etc/profile
+* Development tips
 
-* Module system
+** Modify the load path
 
-For those who think they need modules: install the module environment
-with
+If you want to make changes to the packages in this repo you can set the GUIX_PACKAGE_PATH (or use the -L switch) to point to the root of this directory before running Guix. E.g.
 
 #+BEGIN_SRC bash
-  guix pull
-  guix package -i environment-modules
-
-  modulecmd --version
-  VERSION=3.2.10
-  DATE=2012-12-21
+  git clone https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git
+  guix package -A cwl
 #+END_SRC
 
-Note that GNU Guix supercedes module functionality!
-
-* Development tips
-
 ** Override individual packages
 
-The cheerful way of overriding a version of a package:
+The cheap and cheerful way of overriding a version of a package:
 
 #+BEGIN_SRC scheme
 (use-modules (guix) (gnu packages emacs))
@@ -151,9 +135,19 @@ We run our own substitution server. Add the key to your machine as root with
 
 : guix archive --authorize < tux02-guix-substitutions-public-key.txt
 
-: guix build -L ~/guix-bioinformatics/ -L ~/guix-past/modules/ genenetwork2 --substitute-urls="https://ci.guix.gnu.org https://bordeaux.guix.gnu.org https://guix.genenetwork.org" --dry-run
+: guix build -L ~/guix-bioinformatics/ --substitute-urls="https://cuirass.genenetwork.org https://ci.guix.gnu.org https://bordeaux.guix.gnu.org https://guix.genenetwork.org" hello
+
+* Testing the build
+
+All important packages are listed in manifest.scm.example. Test with
+
+: guix build -L . -m manifest.scm.example --tune=native
+
+* An important note on AI
+
+The packages in guix-bioinformatics channel are generally written with the help of AI. Only the directory ./gnu/packages contains software that was crafted by hand without the help of AI.
+The packages in this directory align with Guix policy and may be upstreamed to guix trunk.
 
 * LICENSE
 
-These package descriptions (so-called Guix expressions) are
-distributed by the same license as GNU Guix, i.e. GPL3+
+These package descriptions (so-called Guix expressions) are distributed by the same license as Guix, i.e. GPL3+
