Diffstat (limited to 'README.org')
-rw-r--r-- README.org | 220
1 files changed, 107 insertions, 113 deletions
diff --git a/README.org b/README.org
index be8db7c..0b6f8bc 100644
--- a/README.org
+++ b/README.org
@@ -1,136 +1,120 @@
 * guix-bioinformatics
 
-IMPORTANT: this repository has moved to https://git.genenetwork.org/guix-bioinformatics/!
-
-Bioinformatics packages for GNU Guix that are used in
-https://genenetwork.org/ and some other places.  See [[https://gitlab.com/pjotrp/guix-notes/blob/master/HACKING.org][Guix notes]] for
-installing and hacking GNU Guix. Other channels of bioinformatics
-interest can be found at
-
-1. https://github.com/BIMSBbioinfo
-2. https://github.com/UMCUGenetics/guix-additions
-3. https://github.com/ekg/guix-genomics
-
-Also see [[http://git.genenetwork.org/pjotrp/guix-notes/src/branch/master/CHANNELS.org][Guix notes]] for a list of channels.
-
-To easily use the packages from this repo, simply add it to your
-`channels` list in ~/.config/guix/channels.scm as described
-[[https://guix.gnu.org/manual/en/html_node/Channels.html][here]]:
+IMPORTANT: this repository lives at https://git.genenetwork.org/guix-bioinformatics/!
+
+Over 300 older packages have been moved to https://git.genenetwork.org/guix-bioinformatics-past/. See the README there for the list of packages.
+
+This channel provides over 300 bioinformatics packages for Guix that are used in https://genenetwork.org/ and some other places,
+mostly targeting genomics, pangenomics and genetics.
+
+** Pangenome tools (pangenomes meta-package)
+
+The =pangenomes= meta-package provides a comprehensive pangenomics toolkit:
+
+| Tool           | Version      | Description                                     |
+|----------------+--------------+-------------------------------------------------|
+| pggb           | 0.7.4        | PanGenome Graph Builder pipeline                |
+| wfmash         | 0.14.0       | Whole-genome Fuzzy Mapping and Alignment        |
+| seqwish        | 0.7.11       | Sequence graph induction from alignments        |
+| smoothxg       | 0.8.2        | Graph normalization via partial order alignment |
+| odgi           | 0.9.0        | Optimized Dynamic Genome/Graph Implementation   |
+| vg             | 1.72.0       | Variation graph toolkit                         |
+| impg           | 0.4.1        | Implicit pangenome graph queries                |
+| minimap2       | 2.28         | Fast pairwise aligner (from Guix upstream)      |
+| bwa-mem2       | 2.3          | Burrows-Wheeler Aligner for short reads         |
+| samtools       | 1.19         | SAM/BAM/CRAM manipulation (from Guix upstream)  |
+| htslib         | 1.21         | HTSlib C library (from Guix upstream)           |
+| bedtools       | 2.31.1       | Genome interval tools (from Guix upstream)      |
+| bcftools       | 1.21         | VCF/BCF manipulation (from Guix upstream)       |
+| vcflib         | 1.0.15       | VCF manipulation library and tools              |
+| vcfbub         | 0.1.0        | VCF bubble popping                              |
+| bandage-ng     | 2026.4.1     | Assembly graph visualizer (Qt6)                 |
+| gfalook        | 0.1.0        | GFA visualization (odgi viz reimplementation)   |
+| pafplot        | 0.1.0        | PAF alignment dotplot renderer                  |
+| wally          | 0.7.1        | Structural variant visualization                |
+| agc            | 2.1          | Assembled Genomes Compressor                    |
+| cigzip         | 0.1.0        | CIGAR compression with tracepoints              |
+| cosigt         | 0.1.7        | Pangenome haplotype genotyping                  |
+| gfainject      | 0.1.0        | BAM-to-GAF graph injection                      |
+| gafpack        | 0.0.0        | GAF coverage vector extraction                  |
+| gfaffix        | 0.2.1        | Walk-preserving graph simplification            |
+| gfautil        | 0.4.0        | GFA format utilities                            |
+| fastga-rs      | 0.1.2        | Fast genome aligner (Rust)                      |
+| fastix         | 0.1.0        | FASTA header prefix renaming (PanSN)            |
+| kfilt          | 0.1.1        | K-mer filtering                                 |
+| meryl          | 1.4.1        | K-mer counting and set operations               |
+| miniprot       | 0.18         | Protein-to-genome aligner                       |
+| pangene        | 1.1          | Gene-level pangenome analysis                   |
+| rtg-tools      | 3.13         | VCF evaluation (vcfeval)                        |
+
+** MEMPANG workshop (mempang-workshop meta-package)
+
+Extends =pangenomes= with R plotting, Python, and general utilities
+for the MEMPANG pangenome workshop tutorials:
+
+| Category       | Packages                                             |
+|----------------+------------------------------------------------------|
+| R packages     | r-ggplot2, r-tidyverse, r-ape, r-ggtree, r-gggenes   |
+| Python         | python, python-igraph, python-pycairo                |
+| Utilities      | graphviz, gnuplot, parallel, pigz, wget, zstd, bc    |
+| QC             | multiqc, mummer                                      |
+
+** GeneNetwork packages
+
+| Package              | Version      | Description                           |
+|----------------------+--------------+---------------------------------------|
+| genenetwork2         | 3.11         | GeneNetwork2 web application          |
+| genenetwork3         | 0.1.0        | GeneNetwork3 REST API                 |
+| gn-auth              | 1.0.1        | GN authentication service             |
+| gn-guile             | 4.0.0        | Guile utilities for GN                |
+| gn-libs              | 0.0.0        | Shared Python libraries               |
+| gn-uploader          | 0.1.1        | Data uploader                         |
+| gemma-wrapper        | 0.99.6       | GEMMA CLI wrapper                     |
+| gemma-gn2            | 0.98.5       | GEMMA for GeneNetwork2                |
+| genecup              | 1.8          | GeneCup literature mining             |
+
+See the Guix documentation and [[https://gitlab.com/pjotrp/guix-notes/blob/master/HACKING.org][Guix notes]] for installing and hacking Guix.
+
+See [[https://github.com/franzos/awesome-guix][awesome guix]] for a list of other channels.
+
+To use the packages from this repo, add this channel to your =channels= list in =~/.config/guix/channels.scm= as described [[https://guix.gnu.org/manual/en/html_node/Channels.html][here]]:
 
 #+BEGIN_SRC scheme
+  ;; example channels.scm
   (list (channel
-         (name 'gn-bioinformatics)
+         (name 'guix-bioinformatics)
          (url "https://git.genenetwork.org/guix-bioinformatics")
-         (branch "master")))
+         (branch "main")))
 #+END_SRC
 
-and run /guix pull/ like normal to update your software. This is the
-recommended way to use the software from this repository and the code
-snippets in this README assume you have done so. In order to maintain
-stability, the guix-bioinformatics channel depends on a specific
-commit of upstream Guix. So, it is recommended to isolate use of the
-guix-bioinformatics channel in a separate /guix pull/ profile. That is described [[https://issues.genenetwork.org/topics/guix-profiles][here]].
-
-If you want to make changes to the packages in this repo you can set
-the GUIX_PACKAGE_PATH to point to the root of this directory
-before running Guix. E.g.
+and run /guix pull/ as usual to update your software. For example:
 
-#+BEGIN_SRC bash
-    git clone https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git
-    git clone https://gitlab.inria.fr/guix-hpc/guix-past.git
-    export GUIX_PACKAGE_PATH=$PWD/guix-bioinformatics/:$PWD/guix-past/modules
-    guix package -A cwl
+#+BEGIN_SRC sh
+  guix pull --url=https://codeberg.org/guix/guix -p ~/opt/guix-bioinformatics --channels=channels.scm
 #+END_SRC
 
-or using a checked out Guix repo with
-
-: env GUIX_PACKAGE_PATH=$genenetwork/guix-bioinformatics/ ./pre-inst-env guix package -A cwl
-
-Some (or most) of these package definitions should make it upstream
-into the GNU Guix repository when tested and stable.
-
-* Slurm and munge
-
-Install slurm with
+The channel file actually accesses https://git.genenetwork.org/guix-bioinformatics/tree/.guix-channel, which pulls in other channels and pins their commit hashes. A commit hash such as b0fa1dc can be found by running /guix -V/ with the guix you want to use; pinning it speeds up installation and makes it reproducible. Note that the upstream channel may override that version.
 
-#+BEGIN_SRC bash
-    guix pull
-    guix package -i slurm-llnl
-
-    ~/.guix-profile/sbin/slurmd -C -D
-      ClusterName=(null) NodeName=selinunte CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=2 RealMemory=7890 TmpDisk=29909
-#+END_SRC
-
-
-* Common Workflow Language (CWL)
-
-/Note that CWL moved into Guix master!/
-
-Install the common workflow language tool cwltool from this repo with
-
-#+BEGIN_SRC bash
+You can find the latest channel file used by our CI/CD at https://ci.genenetwork.org/channels.scm.
 
-    guix pull
-    export PATH=$HOME/.config/guix/current/bin/guix:$PATH
-    ~/guix-bioinformatics$ env GUIX_PACKAGE_PATH=.:../guix-past/modules/ ~/.config/guix/current/bin/guix package -i cwl-runner -p ~/opt/CWL
+To maintain stability, the guix-bioinformatics channel depends on a specific commit of upstream Guix. It is therefore recommended to isolate use of the guix-bioinformatics channel in a separate /guix pull/ profile, as described [[https://issues.genenetwork.org/topics/guix-profiles][here]].
 
-The following package will be installed:
-   cwl-runner 1.0
+You can use the --tune=native switch to optimize performance when installing pangenome tools and gemma.
 
-The following derivations will be built:
-   /gnu/store/ld59374zr45rbqanh7ccfi2wa4d5x4yl-cwl-runner-1.0.drv
-   /gnu/store/86j15mxj5zp3k3sjimhqhb6zsj19azsf-python-schema-salad-7.0.20200811075006.drv
-   /gnu/store/0q2ls0is3253r4gx6hs7kmvlcz412lh1-schema-salad-7.0.20200811075006.tar.gz.drv
-   /gnu/store/myj1365ph687ynahjhg6zqslrmd6zpjq-cwltool-3.0.20201117141248.drv
-
-source ~/opt/CWL/etc/profile
-cwltool --version
-  /gnu/store/50mncjcgc8vmq5dfrh0pb82avbzy8c4r-cwltool-3.0.20201117141248/bin/.cwltool-real 3.0
-#+END_SRC
-
-To run CWL definitions you can install tools in a Guix environment (avoiding
-Docker). Say you need mafft in a workflow
-
-#+begin_src sh
-    ~/guix-bioinformatics$ env GUIX_PACKAGE_PATH=.:../guix-past/modules/ ~/.config/guix/current/bin/guix environment \
-       guix --ad-hoc cwl-runner mafft
-#+end_src
-
-in the new shell you should be able to find both CWL and MAFFT:
-
-#+begin_src sh
-ls $GUIX_ENVIRONMENT/bin/cwl*
-/gnu/store/bhfc5rk29s38w9kgcl4zmcdlh369y9f9-profile/bin/cwl-runner
-/gnu/store/bhfc5rk29s38w9kgcl4zmcdlh369y9f9-profile/bin/cwltool
-ls $GUIX_ENVIRONMENT/bin/mafft
-/gnu/store/bhfc5rk29s38w9kgcl4zmcdlh369y9f9-profile/bin/mafft
-#+end_src
-
-The paths can be loaded into the shell with
-
-: source $GUIX_ENVIRONMENT/etc/profile
+* Development tips
 
-* Module system
+** Modify the load path
 
-For those who think they need modules: install the module environment
-with
+If you want to make changes to the packages in this repo, you can set GUIX_PACKAGE_PATH (or use the -L switch) to point to the root of this directory before running Guix. For example:
 
 #+BEGIN_SRC bash
-    guix pull
-    guix package -i environment-modules
-
-    modulecmd --version
-      VERSION=3.2.10
-      DATE=2012-12-21
+      git clone https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git
+      guix package -L ./guix-bioinformatics -A cwl
 #+END_SRC
 
-Note that GNU Guix supercedes module functionality!
-
-* Development tips
-
 ** Override individual packages
 
-The cheerful way of overriding a version of a package:
+The cheap and cheerful way of overriding a version of a package:
 
 #+BEGIN_SRC scheme
     (use-modules (guix) (gnu packages emacs))
@@ -151,9 +135,19 @@ We run our own substitution server. Add the key to your machine as
 root with
 
 : guix archive --authorize < tux02-guix-substitutions-public-key.txt
-: guix build -L ~/guix-bioinformatics/ -L ~/guix-past/modules/ genenetwork2 --substitute-urls="https://ci.guix.gnu.org https://bordeaux.guix.gnu.org https://guix.genenetwork.org" --dry-run
+: guix build -L ~/guix-bioinformatics/ --substitute-urls="https://cuirass.genenetwork.org https://ci.guix.gnu.org https://bordeaux.guix.gnu.org https://guix.genenetwork.org" hello
+
+* Testing the build
+
+All important packages are listed in manifest.scm.example. Test with
+
+: guix build -L . -m manifest.scm.example --tune=native
+
+* An important note on AI
+
+The packages in the guix-bioinformatics channel are generally written with the help of AI. Only the directory ./gnu/packages contains software that was crafted by hand, without the help of AI.
+The packages in this directory align with Guix policy and may be upstreamed to guix trunk.
 
 * LICENSE
 
-These package descriptions (so-called Guix expressions) are
-distributed by the same license as GNU Guix, i.e. GPL3+
+These package descriptions (so-called Guix expressions) are distributed under the same license as Guix, i.e. GPL3+.