
guix-bioinformatics

IMPORTANT: this repository lives at https://git.genenetwork.org/guix-bioinformatics/!

Over 300 older packages have been moved to https://git.genenetwork.org/guix-bioinformatics-past/. Check out the README to see what packages are there.

Over 300 bioinformatics packages for Guix that are used in https://genenetwork.org/ and some other places, mostly targeting genomics, pangenomics and genetics.

Pangenome tools (pangenomes meta-package)

The pangenomes meta-package provides a comprehensive pangenomics toolkit:

Tool        Version   Description
pggb        0.7.4     PanGenome Graph Builder pipeline
wfmash      0.14.0    Whole-genome Fuzzy Mapping and Alignment
seqwish     0.7.11    Sequence graph induction from alignments
smoothxg    0.8.2     Graph normalization via partial order alignment
odgi        0.9.0     Optimized Dynamic Genome/Graph Implementation
vg          1.72.0    Variation graph toolkit
impg        0.4.1     Implicit pangenome graph queries
minimap2    2.28      Fast pairwise aligner (from Guix upstream)
bwa-mem2    2.3       Burrows-Wheeler Aligner for short reads
samtools    1.19      SAM/BAM/CRAM manipulation (from Guix upstream)
htslib      1.21      HTSlib C library (from Guix upstream)
bedtools    2.31.1    Genome interval tools (from Guix upstream)
bcftools    1.21      VCF/BCF manipulation (from Guix upstream)
vcflib      1.0.15    VCF manipulation library and tools
vcfbub      0.1.0     VCF bubble popping
bandage-ng  2026.4.1  Assembly graph visualizer (Qt6)
gfalook     0.1.0     GFA visualization (odgi viz reimplementation)
pafplot     0.1.0     PAF alignment dotplot renderer
wally       0.7.1     Structural variant visualization
agc         2.1       Assembled Genomes Compressor
cigzip      0.1.0     CIGAR compression with tracepoints
cosigt      0.1.7     Pangenome haplotype genotyping
gfainject   0.1.0     BAM-to-GAF graph injection
gafpack     0.0.0     GAF coverage vector extraction
gfaffix     0.2.1     Walk-preserving graph simplification
gfautil     0.4.0     GFA format utilities
fastga-rs   0.1.2     Fast genome aligner (Rust)
fastix      0.1.0     FASTA header prefix renaming (PanSN)
kfilt       0.1.1     K-mer filtering
meryl       1.4.1     K-mer counting and set operations
miniprot    0.18      Protein-to-genome aligner
pangene     1.1       Gene-level pangenome analysis
rtg-tools   3.13      VCF evaluation (vcfeval)

MEMPANG workshop (mempang-workshop meta-package)

Extends pangenomes with R plotting, Python, and general utilities for the MEMPANG pangenome workshop tutorials:

Category     Packages
R packages   r-ggplot2, r-tidyverse, r-ape, r-ggtree, r-gggenes
Python       python, python-igraph, python-pycairo
Utilities    graphviz, gnuplot, parallel, pigz, wget, zstd, bc
QC           multiqc, mummer

GeneNetwork packages

Package        Version  Description
genenetwork2   3.11     GeneNetwork2 web application
genenetwork3   0.1.0    GeneNetwork3 REST API
gn-auth        1.0.1    GN authentication service
gn-guile       4.0.0    Guile utilities for GN
gn-libs        0.0.0    Shared Python libraries
gn-uploader    0.1.1    Data uploader
gemma-wrapper  0.99.6   GEMMA CLI wrapper
gemma-gn2      0.98.5   GEMMA for GeneNetwork2
genecup        1.8      GeneCup literature mining

See Guix documentation and Guix notes for installing and hacking Guix.

See awesome guix for a list of other channels.

To easily use the packages from this repo, simply add it to your `channels` list in ~/.config/guix/channels.scm as described here:

;; example channels.scm
(list (channel
       (name 'guix-bioinformatics)
       (url "https://git.genenetwork.org/guix-bioinformatics")
       (branch "main")))

and run guix pull like normal to update your software. E.g.

guix pull --url=https://codeberg.org/guix/guix -p ~/opt/guix-bioinformatics --channels=channels.scm

The channel file actually resolves to https://git.genenetwork.org/guix-bioinformatics/tree/.guix-channel, which pulls in other channels and pins their commit hashes. A commit hash such as b0fa1dc can be read from the guix you want to run with guix -V; pinning it speeds up installation and makes the result reproducible. Note that the upstream channel may override that version.
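
One way to record the commits actually in use is to let guix describe write a pinned channel file. This is only a sketch: it assumes the guix on your PATH is the one you pulled with this channel, and the helper names and the file name channels-pinned.scm are placeholders, not part of this repository.

```shell
# Sketch: record the exact channel commits of the running guix so that a
# later pull can reproduce them. Helper and file names are hypothetical.
pin_channels() {
    out="${1:-channels-pinned.scm}"
    # 'guix describe --format=channels' prints a channel list with commit fields.
    guix describe --format=channels > "$out"
}

# Pull again from the pinned file into the separate profile used above.
pull_pinned() {
    guix pull -p "$HOME/opt/guix-bioinformatics" --channels="${1:-channels-pinned.scm}"
}
```

Running pin_channels once after a successful pull, and pull_pinned on another machine, should give both machines the same channel commits.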

You can find the latest channel file used by our CI/CD at https://ci.genenetwork.org/channels.scm.

To maintain stability, the guix-bioinformatics channel depends on a specific commit of upstream Guix. It is therefore recommended to isolate use of the guix-bioinformatics channel in a separate guix pull profile, described here.
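
As a rough sketch of what using such an isolated profile can look like, the helper below forwards commands to the guix binary inside the pull profile instead of the default one. The profile path matches the guix pull example above; the names GB_PROFILE and gb_guix are made up for illustration.

```shell
# Sketch: run the guix from the separate pull profile without touching
# the default one. GB_PROFILE and gb_guix are hypothetical names.
GB_PROFILE="${GB_PROFILE:-$HOME/opt/guix-bioinformatics}"

# Forward all arguments to the profile's own guix binary.
gb_guix() {
    "$GB_PROFILE/bin/guix" "$@"
}
```

For example, gb_guix describe lists the channels and commits behind that profile, and gb_guix package -A pggb searches its available packages.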

You can use the --tune=native switch to optimize performance when installing pangenome tools and gemma.

Development tips

Modify the load path

If you want to make changes to the packages in this repo, you can set GUIX_PACKAGE_PATH (or use the -L switch) to point to the root of this repository before running Guix. E.g.

git clone https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git
guix package -L ./guix-bioinformatics -A cwl

Override individual packages

The cheap and cheerful way of overriding a version of a package:

(use-modules (guix) (gnu packages emacs))

(package
  (inherit emacs)
  (name "emacs-snapshot")
  (source "/path/to/some-file-or-directory.tar.gz"))

and then run:

guix package --install-from-file=that-file.scm

Substitute server

We run our own substitute server. Add its public key to your machine as root, then list the server in --substitute-urls when building:

guix archive --authorize < tux02-guix-substitutions-public-key.txt
guix build -L ~/guix-bioinformatics/ --substitute-urls="https://cuirass.genenetwork.org https://ci.guix.gnu.org https://bordeaux.guix.gnu.org https://guix.genenetwork.org" hello
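
Before building, it can help to check whether the servers actually offer substitutes for a package. The sketch below uses guix weather for that; the variable and helper names are hypothetical, and the URL list is copied from the command above.

```shell
# Sketch: the substitute servers listed above, kept in one variable.
SUBSTITUTE_URLS="https://cuirass.genenetwork.org https://ci.guix.gnu.org https://bordeaux.guix.gnu.org https://guix.genenetwork.org"

# Hypothetical helper: report substitute availability for the given packages.
check_substitutes() {
    guix weather --substitute-urls="$SUBSTITUTE_URLS" "$@"
}
```

For example, check_substitutes hello reports, per server, how many of the requested store items are available as substitutes.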

Testing the build

All important packages are listed in manifest.scm.example. Test with

guix build -L . -m manifest.scm.example --tune=native
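
A possible two-pass variant of this test is sketched below, assuming it is run from the root of the checkout. The helper name check_manifest is made up; --dry-run and --keep-going are standard guix build options.

```shell
# Sketch: check a manifest in two passes from the root of the checkout.
# check_manifest is a hypothetical helper name.
check_manifest() {
    manifest="${1:-manifest.scm.example}"
    # Pass 1: list what would be built or downloaded, without building.
    guix build -L . -m "$manifest" --dry-run || return 1
    # Pass 2: build everything, continuing past failures so all are reported.
    guix build -L . -m "$manifest" --keep-going
}
```

The dry run catches load errors in the package definitions quickly, before any long builds start.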

An important note on AI

The packages in the guix-bioinformatics channel are generally written with the help of AI. Only the directory ./gnu/packages contains software that was crafted by hand, without the help of AI. The packages in that directory align with Guix policy and may be upstreamed to Guix trunk.

LICENSE

These package descriptions (so-called Guix expressions) are distributed under the same license as Guix, i.e. GPLv3+.