# R

R is a statistics package often used by biologists. We run it on our Octopus HPC using Guix.

Often with HPC the underlying Linux distribution is out of date. This is why people choose userland package managers such as conda or brew.

Guix provides userland support for installing packages. If the 'store' is shared across the HPC, e.g. through NFS, software can be run using the powerful Guix software distribution with no additional cost.

The R language, for all its complexity and thousands of packages, is relatively easy to support in Guix and on HPC, partly thanks to the continuous integration run by the R project and CRAN.

For our purposes we had to support a package that is not in CRAN, but in one of the derived packaging systems for R. The MEDIPS package is part of Bioconductor and is installed with BiocManager, which pulls in dependencies and builds them from source.

## Test with guix container

The first step was to build the package in a Guix container (guix shell -C), because that prevents dependencies from the underlying HPC Linux distro (in our case Debian) from getting linked in. To fix the build and find the dependencies, start from:

```
mkdir -p $HOME/.Rlibs && guix shell -C -N -F --share=$HOME/.Rlibs libpng pkg-config openblas gsl grep bzip2 libxml2 xz gfortran-toolchain r-curl zlib gcc-toolchain@10 sed gawk make r r-preprocesscore curl r-tidyverse openssl nss-certs linux-libre-headers bash which coreutils -- env R_LIBS_SITE=$HOME/.Rlibs:$R_LIBS_SITE R_LIBS_USER=$HOME/.Rlibs R -e '
.libPaths()
Sys.getenv("R_LIBS_USER")
r = getOption("repos")
r["CRAN"] = "http://cran.us.r-project.org"
options(repos = r)
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MEDIPS", force=TRUE); library("MEDIPS"); sessionInfo(); BiocManager::valid(); warnings()'
```

That looks complicated, but it is the nicest way to fix errors. What does it all mean?

```
guix shell -C -N -F ...
```

guix is the command that installs packages. Note that it is tightly coupled to the package tree: if you upgrade guix you get newer packages(!). We typically handle guix through a profile with

```
guix pull -p ~/opt/guix
~/opt/guix/bin/guix --version
```

So use `~/opt/guix/bin/guix` if you want up-to-date packages. A 'guix pull' takes some time; on our systems we typically run it every 4 months or so.

The -C means it is a proper container, i.e. only Guix dependencies are visible inside the container. This is incredibly useful for debugging the dependency graph. The -N allows network access so that R can fetch sources. The -F emulates the FHS (Filesystem Hierarchy Standard) layout with /usr/bin and /bin, because some packages ask for /usr/bin/env, for example.

R is a bit funny about local builds: you can supply a directory in $HOME and pass it in with R_LIBS_USER=$HOME/.Rlibs, but R does not create that directory. So we create it ourselves and pass it into the container with --share.
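
The long invocation above is easier to read and maintain when it is split up. The following is a minimal sketch, assuming a POSIX shell; the names `R_PACKAGES`, `R_USER_LIB`, `DRY_RUN` and `r_container` are made up for illustration and are not part of Guix or R. With `DRY_RUN=1` the command is only printed, so it can be inspected first.

```shell
# Package list copied from the command above, kept in one place.
R_PACKAGES="libpng pkg-config openblas gsl grep bzip2 libxml2 xz \
gfortran-toolchain r-curl zlib gcc-toolchain@10 sed gawk make r \
r-preprocesscore curl r-tidyverse openssl nss-certs \
linux-libre-headers bash which coreutils"

# Hypothetical variable: the user library that is shared into the container.
R_USER_LIB="${R_USER_LIB:-$HOME/.Rlibs}"

# r_container ARGS...: start R inside the Guix container, passing ARGS to R.
# With DRY_RUN=1 the command is printed (without shell quoting) instead of run.
r_container() {
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
    # R does not create the library directory itself.
    mkdir -p "$R_USER_LIB" || return 1
    # $R_PACKAGES is deliberately unquoted so that it splits into words.
    $run guix shell -C -N -F --share="$R_USER_LIB" $R_PACKAGES \
        -- env R_LIBS_SITE="$R_USER_LIB:${R_LIBS_SITE:-}" \
               R_LIBS_USER="$R_USER_LIB" R "$@"
}
```

For example, `r_container -e 'library("MEDIPS")'` would run the same container as above and only try to load the package.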

To have R build stuff it needs a bunch of dependencies. One thing to note is that using the default gcc-toolchain may cause an error similar to

```
Error in dyn.load(libLFile) :
    unable to load shared object '/tmp/RtmpKqzbYg/file3245e787c.so':
    /gnu/store/vqhamsanmlm8v6f90a635zc6gmhwlphp-gfortran-10.3.0-lib/lib/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found (required by /tmp/RtmpKqzbYg/file3245e787c.so)
```

as described in, for example

=> https://issues.guix.gnu.org/60200

The reason is that the gfortran-toolchain is actually built with the older gcc (even though gfortran itself is at 11.0). That is why we drop the overall toolchain to gcc-toolchain@10.
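
To see which GLIBCXX versions a given libstdc++.so.6 actually provides, the version tags can be pulled out of the library file. This is a minimal sketch, assuming GNU grep and GNU sort; the function names `glibcxx_versions` and `provides_glibcxx` are made up here.

```shell
# glibcxx_versions FILE: list the GLIBCXX_x.y.z version tags found in FILE.
glibcxx_versions() {
    # -a treats the binary as text, -o prints only the matching part.
    LC_ALL=C grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u -V
}

# provides_glibcxx FILE VERSION: succeed if FILE provides GLIBCXX_VERSION.
provides_glibcxx() {
    glibcxx_versions "$1" | grep -q -x -F "GLIBCXX_$2"
}
```

With the path from the error message above, `provides_glibcxx /gnu/store/vqhamsanmlm8v6f90a635zc6gmhwlphp-gfortran-10.3.0-lib/lib/libstdc++.so.6 3.4.29` is expected to fail, which matches the 'not found' in the error.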

Note that issues.guix.gnu.org is worth searching when encountering problems.

## Run without Guix container


Once the build works inside a container, we can move out and run the tool in a non-container shell:

```
mkdir -p $HOME/.Rlibs && guix shell --share=$HOME/.Rlibs libpng pkg-config openblas gsl grep bzip2 libxml2 xz gfortran-toolchain r-curl zlib gcc-toolchain@10 sed gawk make r r-preprocesscore curl r-tidyverse openssl nss-certs linux-libre-headers bash which coreutils -- env R_LIBS_SITE=$HOME/.Rlibs:$R_LIBS_SITE R_LIBS_USER=$HOME/.Rlibs R
```

Now R is fully functional, but this is not what we want our users to type.
One option is to use `guix shell` with a manifest file that lists the dependencies above. But now that it works, why not create a profile instead:

```
mkdir -p $HOME/opt
guix install libpng pkg-config openblas gsl grep bzip2 libxml2 xz gfortran-toolchain r-curl zlib gcc-toolchain@10 sed gawk make r r-preprocesscore curl r-tidyverse openssl nss-certs linux-libre-headers bash which coreutils -p $HOME/opt/R
```

Now we can set up the environment (note that the profile file `$HOME/opt/R/etc/profile` sets a lot of variables which should be visible to R)

```
. $HOME/opt/R/etc/profile
export R_LIBS_SITE=$HOME/.Rlibs:$R_LIBS_SITE
export R_LIBS_USER=$HOME/.Rlibs
set    # print all shell variables to verify the settings
```

and test R by building MEDIPS

```
which R
  /gnu/store/plmrv9fm578kza4cf042ny7jyzw81znl-profile/bin/R
R
  BiocManager::install("MEDIPS",force=TRUE)
  library("MEDIPS");
  sessionInfo() ;
```

or some other package, such as

```
install.packages("qtl")
```
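
The manifest option mentioned earlier in this section could look roughly like this. It is a sketch rather than something we run in production: the function name `write_r_manifest` and the file name in the usage note are made up, and the package list is simply copied from the commands above. `specifications->manifest` is the Guix procedure that turns package specifications into a manifest.

```shell
# write_r_manifest FILE: write a Guix manifest for the R setup to FILE.
write_r_manifest() {
    cat > "$1" <<'EOF'
;; Packages needed to run R and build MEDIPS, copied from the command line.
(specifications->manifest
 '("libpng" "pkg-config" "openblas" "gsl" "grep" "bzip2" "libxml2" "xz"
   "gfortran-toolchain" "r-curl" "zlib" "gcc-toolchain@10" "sed" "gawk"
   "make" "r" "r-preprocesscore" "curl" "r-tidyverse" "openssl"
   "nss-certs" "linux-libre-headers" "bash" "which" "coreutils"))
EOF
}
```

After `write_r_manifest $HOME/r-manifest.scm`, the long package list can be replaced by `guix shell -m $HOME/r-manifest.scm`, and `guix package -m $HOME/r-manifest.scm -p $HOME/opt/R` should build the profile from the same file.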

## Run on Slurm

The final step is to make sure this loads in the user's shell environment and also works on the cluster nodes, so that all the user has to do is type 'R'. Try to get a shell on a node with

```
srun -N 1 --mem=32G --pty /bin/bash
```

In the shell you can run R and check all environment settings. As I added them to the '~/.bashrc' file, they should work in bash.
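
The lines added to '~/.bashrc' are not shown here. A sketch of what they might look like, assuming the profile lives in `$HOME/opt/R` as in the previous section; the function name `load_guix_r` is hypothetical.

```shell
# load_guix_r [PROFILE]: load the Guix R profile and set the R library paths.
load_guix_r() {
    # etc/profile generated by Guix honours GUIX_PROFILE when it is set.
    GUIX_PROFILE="${1:-$HOME/opt/R}"
    if [ ! -f "$GUIX_PROFILE/etc/profile" ]; then
        echo "load_guix_r: no profile at $GUIX_PROFILE" >&2
        return 1
    fi
    . "$GUIX_PROFILE/etc/profile"
    # Same settings as in the previous section.
    export R_LIBS_SITE="$HOME/.Rlibs:${R_LIBS_SITE:-}"
    export R_LIBS_USER="$HOME/.Rlibs"
    # R does not create the user library directory itself.
    mkdir -p "$R_LIBS_USER"
}
```

Calling `load_guix_r` at the end of '~/.bashrc' would then set things up for interactive shells and for jobs that start bash.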

Finally, set up a Slurm script

```
#!/bin/bash
#SBATCH -t 1:30:00
#SBATCH -N 1
#SBATCH --mem=32G

# --- Display environment
env
set
R -e 'library("MEDIPS")'
```
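
A batch job that silently picks up a different R is hard to debug, so the `which R` check from earlier can be turned into a guard. This is a sketch under the assumption that GNU `readlink -f` is available on the nodes; the function names `is_store_path` and `r_from_guix` are made up.

```shell
# is_store_path PATH: succeed if PATH lies under /gnu/store.
is_store_path() {
    case "$1" in
        /gnu/store/*) return 0 ;;
        *) return 1 ;;
    esac
}

# r_from_guix: succeed if the R found on PATH resolves into the Guix store.
r_from_guix() {
    r_bin=$(command -v R) || return 1
    # Profiles are symlinks into the store, so resolve them first.
    is_store_path "$(readlink -f "$r_bin")"
}
```

In the script above, a line such as `r_from_guix || { echo "R is not the Guix R" >&2; exit 1; }` before the `R -e` call would stop the job early.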

As a final note - apart from SLURM - I tested all of this on my workstation first. Because Guix is reproducible, once it works, it is easy to repeat on a remote server.

For more information see

=> ../octopus/slurm-user-guide