#+TITLE: Guix Profiles for controlled Development, Testing, Staging and Production

* Table of Contents :TOC:
- [[#introduction][Introduction]]
- [[#what-is-a-profile][What is a profile?]]
- [[#development-testing-staging-production][Development, testing, staging, production!]]
- [[#software-optimization][Software optimization]]
- [[#containers][Containers]]
  - [[#running-in-a-guix-container][Running in a Guix container]]
  - [[#development-in-a-guix-container][Development in a Guix container]]
  - [[#creating-a-docker-container][Creating a Docker container]]
- [[#finally][Finally]]

* Introduction

In this document we describe how we use Guix profiles to deploy a
complicated web service (https://genenetwork.org). The idea is that a
profile describes a snapshot of the service with all its
dependencies. This allows us to create byte-identical profiles over
time that are shared not only between machines (important for
deployment) but also between developers. As a bonus we get completely
reproducible deployment over time (we can still build full
installations that were deployed 5 years ago). People often ask: why
not use Docker? The answer is that Docker is a partial
solution. Docker images are not easily reproducible over time. The
other problem with Docker is that it is a container infrastructure,
which is quite expensive to run (in both time and complexity). Guix
profiles run on bare metal, though you can opt to use Guix containers
and even build Docker containers (see below). In other words: more
options, lighter, faster, and we still have the option to orchestrate
Docker containers.

* What is a profile?

A profile is a tree of symlinks. If we install a piece of software, say
sambamba:

#+BEGIN_SRC sh
tux01:~$ ~/opt/guix/bin/guix package -i sambamba -p ~/opt/sambamba
The following package will be installed:
sambamba 0.7.1
88.6 MB will be downloaded:
/gnu/store/gxsafkxack6czm4yps3cwgp474s69vz5-htslib-for-sambamba-1.3.1-1.2f3c3ea7b
/gnu/store/0cn1sd3g67nscyfn4ax71hi8pr46dlha-libconfig-1.7.2
/gnu/store/z2gsnhlym1wiz9iwxar51wii9dvajssp-llvm-3.8.1
/gnu/store/4sslg1vd2vbbanj4rcs1fhf4q5fjyp8w-ldc-0.17.4
/gnu/store/6cq4l5ngihqjvd3ifjlpfcx6nx52591m-llvm-6.0.1
/gnu/store/9mmsilz9avdl49i6a6nj5mzfyim8ihv2-tzdata-2019c
/gnu/store/snqakx625fgdshkpdw6dsxsv1iribjmk-ldc-1.10.0
/gnu/store/5gyxpx946k1ka9i4pm2kzc088x5hvkx0-sambamba-0.7.1
#+END_SRC

Guix installs sambamba with its dependencies in the profile ~/opt/sambamba.
Let's see what tree says:

#+BEGIN_SRC
tux01:~$ tree ~/opt/sambamba
/home/pjotr/opt/sambamba
├── bin -> /gnu/store/j2ds5b6cm0lf9k5fjnljsdb7scinaaj4-sambamba-0.7.1/bin
├── etc
│   └── profile
├── manifest
└── share
    ├── doc -> /gnu/store/j2ds5b6cm0lf9k5fjnljsdb7scinaaj4-sambamba-0.7.1/share/doc
    ├── info -> /gnu/store/55a8ddzijg3ibwsai6djz4bds10w2981-info-dir/share/info
    └── man -> /gnu/store/ifdmgg74yhkqlynmiq5198sbc453n729-manual-database/share/man
#+END_SRC

You can see the profile consists of symlinks pointing into /gnu/store.
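
The symlink idea can be tried out without Guix at all. The following is a minimal sketch, assuming a mock store and profile in a scratch directory; the names ~abc123-hello-1.0~ and ~profile_targets~ are made up for illustration and are not part of Guix.

#+BEGIN_SRC sh
# Mock store and profile in a scratch directory (not the real /gnu/store).
work=$(mktemp -d)
store="$work/store"
profile="$work/profile"
mkdir -p "$store/abc123-hello-1.0/bin" "$profile"

# A trivial "package": one executable inside the mock store.
printf '#!/bin/sh\necho hello from the store\n' > "$store/abc123-hello-1.0/bin/hello"
chmod +x "$store/abc123-hello-1.0/bin/hello"

# The "profile" is nothing more than a symlink to the store item.
ln -s "$store/abc123-hello-1.0/bin" "$profile/bin"

# Print "name -> target" for every symlink directly inside a profile.
profile_targets() {
  for entry in "$1"/*; do
    if [ -L "$entry" ]; then
      printf '%s -> %s\n' "$(basename "$entry")" "$(readlink "$entry")"
    fi
  done
}

profile_targets "$profile"
"$profile/bin/hello"
#+END_SRC

Running the program through the profile path works because the kernel follows the symlink into the store, which is the same mechanism a real profile relies on.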

The sambamba binary has built-in links:

#+BEGIN_SRC sh
tux01:~$ ldd ~/opt/sambamba/bin/sambamba
linux-vdso.so.1 (0x00007ffd765e9000)
libz.so.1 => /gnu/store/qx7p7hiq90mi7r78hcr9cyskccy2j4bg-zlib-1.2.11/lib/libz.so.1 (0x00007fbf2fc59000)
libhts.so.1 => /gnu/store/gxsafkxack6czm4yps3cwgp474s69vz5-htslib-for-sambamba-1.3.1-1.2f3c3ea7b/lib/libhts.so.1 (0x00007fbf2fbd0000)
liblz4.so.1 => /gnu/store/bp07jwrrhayg7i2xhgn6jxhrb8ha96x9-lz4-1.9.2/lib/liblz4.so.1 (0x00007fbf2fb95000)
libpthread.so.0 => /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0 (0x00007fbf2fb72000)
libm.so.6 => /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libm.so.6 (0x00007fbf2f919000)
librt.so.1 => /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/librt.so.1 (0x00007fbf2fb68000)
libdl.so.2 => /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libdl.so.2 (0x00007fbf2fb63000)
libgcc_s.so.1 => /gnu/store/2plcy91lypnbbysb18ymnhaw3zwk8pg1-gcc-7.4.0-lib/lib/libgcc_s.so.1 (0x00007fbf2fb4a000)
libc.so.6 => /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libc.so.6 (0x00007fbf2f75f000)
/gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fbf2fa59000)
#+END_SRC

and you can see *all* dependencies are contained in
/gnu/store. This amazing facility means that Guix packages are
independent of the underlying distribution (in this case Debian). Also
note that libz was already in the store, so it was not downloaded again.

To run sambamba we can now do

#+BEGIN_SRC sh
tux01:~$ ~/opt/sambamba/bin/sambamba
sambamba 0.7.1
by Artem Tarasov and Pjotr Prins (C) 2012-2019
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
#+END_SRC

Not all software is self-contained. For example, Python needs to find its modules.
For this Guix provides a profile file which contains the necessary shell settings.
With sambamba it is just a path:

#+BEGIN_SRC sh
tux01:~$ cat ~/opt/sambamba/etc/profile
# Source this file to define all the relevant environment variables in Bash
# for this profile. You may want to define the 'GUIX_PROFILE' environment
# variable to point to the "visible" name of the profile, like this:
#
# GUIX_PROFILE=/path/to/profile ; \
# source /path/to/profile/etc/profile
#
# When GUIX_PROFILE is undefined, the various environment variables refer
# to this specific profile generation.
export PATH="${GUIX_PROFILE:-/gnu/store/7bdvafgqpm3d8l4k677d3k063qg07miv-profile}/bin${PATH:+:}$PATH"
#+END_SRC

So, sourcing this file brings sambamba into the environment:

#+BEGIN_SRC sh
tux01:~$ source ~/opt/sambamba/etc/profile
tux01:~$ sambamba
sambamba 0.7.1
by Artem Tarasov and Pjotr Prins (C) 2012-2019
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
#+END_SRC
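
The ~export PATH~ line in the profile file uses two shell parameter expansions that are easy to misread. The sketch below reproduces them against a mock profile in a scratch directory; the fallback path ~/mock/store/default-profile~ is a placeholder, not a real store item.

#+BEGIN_SRC sh
# Mock profile in a scratch directory; the fallback path is a placeholder.
work=$(mktemp -d)
mkdir -p "$work/profile/bin" "$work/profile/etc"
cat > "$work/profile/etc/profile" <<'EOF'
export PATH="${GUIX_PROFILE:-/mock/store/default-profile}/bin${PATH:+:}$PATH"
EOF

# Case 1: GUIX_PROFILE is set, so the "visible" profile name is used.
with_profile=$(
  GUIX_PROFILE="$work/profile"
  . "$work/profile/etc/profile"
  printf '%s\n' "${PATH%%:*}"
)

# Case 2: GUIX_PROFILE unset and PATH empty: the fallback is used and
# ${PATH:+:} expands to nothing, so no stray ':' is appended.
without_profile=$(
  unset GUIX_PROFILE
  PATH=
  . "$work/profile/etc/profile"
  printf '%s\n' "$PATH"
)

echo "with GUIX_PROFILE:    $with_profile"
echo "without GUIX_PROFILE: $without_profile"
#+END_SRC

Both cases run in a subshell, so the calling shell's own PATH is left untouched.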

Profiles also let you run specific versions. Say you want to
test an older gcc; you could do

#+BEGIN_SRC sh
tux01:~$ ~/opt/guix/bin/guix package -i gcc-toolchain@6.5.0 -p ~/opt/gcc-6
tux01:~$ source ~/opt/gcc-6/etc/profile
tux01:~$ gcc --version
gcc (GCC) 6.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#+END_SRC

and it becomes trivial to juggle dependencies. Note, by the way, that we are
installing software as a normal user here! There is no need for a system
administrator or root-level access, because Guix delegates the work to a
build daemon that can only access /gnu/store.

* Development, testing, staging, production!

Essentially these are all profiles! Now the question is how
to deal with versions of profiles. For this we use git.

One profile consists of a combination of (1) a version of core GNU
Guix and (2) a version of our special packages. The source code of the
GNU Guix [[https://guix.gnu.org/packages/][package tree]] lives in git at [[https://savannah.gnu.org/git/?group=guix][gnu.org]]. Our package source tree
can be found on our own [[http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics][git service]]. The latter package tree can be
combined with Guix in two ways: by using Guix [[https://guix.gnu.org/manual/en/html_node/Channels.html][channels]] or by pulling modules in
using the special ~GUIX_PACKAGE_PATH~ environment variable. We are going
to use the latter here.

To get a fully reproducible Guix, it can be built from a commit hash
that comes from the git tree. This is what happens:

A developer comes in and says: I developed a new function and it is
ready for testing. I used GNU Guix at commit
~8a7784381ac19d0756dc862bf3d8e082406bd958~ and ~guix-bioinformatics~ at
~b0c38d151324e37448ade758cc48d02d89f94b60~.

To update GNU Guix to that commit we can do

#+BEGIN_SRC sh
tux01:~$ ~/opt/guix/bin/guix pull --commit=8a7784381ac19d0756dc862bf3d8e082406bd958
#+END_SRC
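
Reproducibility depends on the developer handing over a full commit id rather than a branch name or an abbreviated hash, both of which can resolve differently later. A small guard along these lines could catch that before pulling; ~is_full_commit~ is a hypothetical helper, and the sketch only prints the command instead of running it.

#+BEGIN_SRC sh
# Return success only for a full 40-character lowercase hex commit id.
is_full_commit() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

guix_commit=8a7784381ac19d0756dc862bf3d8e082406bd958

if is_full_commit "$guix_commit"; then
  echo "ok: would run guix pull --commit=$guix_commit"
else
  echo "refusing to pull: '$guix_commit' is not a full commit id" >&2
fi
#+END_SRC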

The commands below invoke the freshly pulled Guix as ~/.config/guix/current/bin/guix.
Next, check out the guix-bioinformatics repo:

#+BEGIN_SRC sh
tux01:~$ git clone http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git
tux01:~$ cd guix-bioinformatics
tux01:~$ git checkout -b b0c38d151324e37448ade758cc48d02d89f94b60 b0c38d151324e37448ade758cc48d02d89f94b60
#+END_SRC

Next we install our software from these two repos into a new profile.
First we list the available packages:

#+BEGIN_SRC sh
cd
env GUIX_PACKAGE_PATH=~/guix-bioinformatics/ ~/.config/guix/current/bin/guix package -A genenetwork
guix package: warning: failed to load '(gn services genenetwork)':
no code for module (past packages python)
#+END_SRC

Oh wait, we also use the Guix past [[https://gitlab.inria.fr/guix-hpc/guix-past][channel]] for older packages (such as
Python 2.4). We need to add that too:

#+BEGIN_SRC sh
tux01:~$ git clone https://gitlab.inria.fr/guix-hpc/guix-past.git
tux01:~$ env GUIX_PACKAGE_PATH=~/guix-bioinformatics:~/guix-past/modules/ ~/.config/guix/current/bin/guix package -A genenetwork
genenetwork1 0.0.0-2.acf65ac out /home/pjotr/guix-bioinformatics/gn/packages/genenetwork.scm:759:4
genenetwork2 2.11-guix-1538ffd out /home/pjotr/guix-bioinformatics/gn/packages/genenetwork.scm:287:2
genenetwork2-database-small 1.0 out /home/pjotr/guix-bioinformatics/gn/packages/genenetwork.scm:569:4
genenetwork2-files-small 1.0 out /home/pjotr/guix-bioinformatics/gn/packages/genenetwork.scm:530:4
genenetwork3 2.10rc5-5bff4f4 out /home/pjotr/guix-bioinformatics/gn/packages/genenetwork.scm:626:4
python3-genenetwork2 3.11-guix-84cbf35 out /home/pjotr/guix-bioinformatics/gn/packages/genenetwork.scm:450:4
#+END_SRC
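
~GUIX_PACKAGE_PATH~ is a colon-separated list of directories, so adding a further package tree means appending another entry. A minimal sketch of building that list and exporting it once, so the ~env~ prefix does not have to be repeated on every command; ~join_package_path~ is a hypothetical helper and the directories are the ones cloned above into the home directory.

#+BEGIN_SRC sh
# Join directories into a colon-separated search path.
join_package_path() {
  result=
  for dir in "$@"; do
    result="${result:+$result:}$dir"
  done
  printf '%s\n' "$result"
}

GUIX_PACKAGE_PATH=$(join_package_path "$HOME/guix-bioinformatics" "$HOME/guix-past/modules")
export GUIX_PACKAGE_PATH
echo "$GUIX_PACKAGE_PATH"
#+END_SRC

Exporting the variable affects every later guix call in that shell, which is convenient for a session but less explicit than the per-command ~env~ form used in this document.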

That is starting to look good. Let's do the actual installation, starting with a dry run:

#+BEGIN_SRC sh
tux01:~$ env GUIX_PACKAGE_PATH=~/guix-bioinformatics:~/guix-past/modules/ ~/.config/guix/current/bin/guix package -i genenetwork2 -p ~/opt/genenetwork2-test --dry-run
The following package would be installed:
genenetwork2 2.11-guix-1538ffd
The following derivations would be built:
/gnu/store/ks7q232cgz2pp38yss54008py7s9brwb-genenetwork2-2.11-guix-1538ffd.drv
/gnu/store/65fg7a5csgwsh2qb77brkr1fwzxf1z59-js-smart-time-ago-0.1.5-1.055c385.drv
/gnu/store/6kd6zqqcr338clsgllvif60cng2h9cyb-javascript-smart-time-ago-0.1.5-1.055c385-checkout.drv
/gnu/store/l6w0wn31xv8bjxa4rzqf4hyrcfgkcmyx-module-import.drv
/gnu/store/npjdpnlpw35h4wah6ck1in3pqhhzc1d4-module-import-compiled.drv
/gnu/store/7cya0g156j784jf2gf0fi6xyzm7gfnxj-js-md5-0.7.3.drv
/gnu/store/9q9n0gsppv27v0bji2zw11q80id50k6a-javascript-md5-0.7.3-checkout.drv
/gnu/store/h1m63df02wc6myvcwyvkbna2z33ms2l1-js-jstat-1.9.1.drv
/gnu/store/7har7wm18gwdknqw19i8snyvg843g10p-javascript-jstat-1.9.1-checkout.drv
/gnu/store/m5y01bni5nakvw265p5wqymvy4nnsa97-python-twint-2.1.20.drv
/gnu/store/qdjnz8ncjzyq9l1h8qnd79jj6ww717sg-rust-qtlreaper-0.1.4.drv
/gnu/store/qhd629gkj6yq53gcnnd2v118glakl27y-js-parsley-2.9.1.drv
/gnu/store/f5fjawq4xmwacpj7a8dpkldh46h8a35j-javascript-parsley-2.9.1-checkout.drv
/gnu/store/qigqv9jwnzw929zrwwajc59a0mvmnpxw-js-underscore-1.9.1.drv
/gnu/store/yar112d76r52zzi35xsrbq1nx5la2wh9-javascript-underscore-1.9.1-checkout.drv
/gnu/store/r0pcgxgy19jmp0ll8cm1nca5zx4rm2rp-python2-flask-sqlalchemy-2.4.4.drv
#+END_SRC

That looks good. Note we can add our own substitute server, where many packages
have already been built by other users:

#+BEGIN_SRC sh
tux01:~$ env GUIX_PACKAGE_PATH=~/guix-bioinformatics:~/guix-past/modules/ ~/.config/guix/current/bin/guix package -i genenetwork2 -p ~/opt/genenetwork2-test --dry-run --substitute-urls="http://guix.genenetwork.org https://berlin.guixsd.org https://ci.guix.gnu.org https://mirror.hydra.gnu.org"
The following package would be installed:
genenetwork2 2.11-guix-1538ffd
substitute: updating substitutes from 'http://guix.genenetwork.org'... 100.0%
substitute: updating substitutes from 'https://berlin.guixsd.org'... 100.0%
17 items would be downloaded
#+END_SRC

Now no more builds! After removing the ~--dry-run~ switch it should just install, and
we can run

#+BEGIN_SRC
tux01:~$ ~/opt/genenetwork2-test/bin/genenetwork2
#+END_SRC

This starts the web server. Note this profile is pretty massive,
with loads of tools pulled in! Because Guix knows the full
dependency graph we can visualize it with

#+BEGIN_SRC sh
tux01:~$ env GUIX_PACKAGE_PATH=~/guix-bioinformatics:~/guix-past/modules/ ~/.config/guix/current/bin/guix graph genenetwork2 |dot -Tpdf > genenetwork2-references.pdf
#+END_SRC

For the full graph see [[./images/genenetwork2-references.pdf]]. It is
huge! And looking at it, one can question why some of the dependencies
are there in the first place.

Back to profiles. On a common server we install the profiles in
/usr/local/guix-profiles, so it may look like

#+BEGIN_EXAMPLE
tux01:~$ ls /usr/local/guix-profiles/ -1 --color=never|sort
gn2-latest
gn2-stable
gn-latest-20181014
gn-latest-20181119
gn-latest-20190905
gn-latest-20200428
gn-latest-20200513
gn-latest-20200725
gn-latest-20200811
#+END_EXAMPLE

which shows we don't update the full graph that often. In recent months
we see more updates because of a Python2 -> Python3 migration. Even
today we can easily roll back to a profile from 2018 without any
software installation.
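
One way to picture such a rollback is a stable name that points at one of the dated snapshots. The sketch below assumes that ~gn2-stable~ is a symlink to a dated profile, which the listing does not confirm; it works on a mock directory in a scratch location, and ~newest_profile~ and ~activate~ are hypothetical helpers.

#+BEGIN_SRC sh
# Mock profile directory in a scratch location; names follow the listing above.
profiles=$(mktemp -d)
for name in gn-latest-20181014 gn-latest-20200725 gn-latest-20200811; do
  mkdir "$profiles/$name"
done

# Newest dated profile: the YYYYMMDD suffix sorts chronologically.
newest_profile() {
  ls "$1" | grep '^gn-latest-[0-9]\{8\}$' | sort | tail -n 1
}

# Point a stable name at one dated profile (relative symlink, replaced in place).
activate() {
  ln -sfn "$2" "$1/gn2-stable"
}

activate "$profiles" "$(newest_profile "$profiles")"
readlink "$profiles/gn2-stable"

# Rolling back is just re-pointing the name at an older snapshot.
activate "$profiles" gn-latest-20181014
readlink "$profiles/gn2-stable"
#+END_SRC

Because the dated directories are never modified, switching the link back and forth requires no build or download.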

We use a calendar date scheme, but you might as well name the profiles

#+BEGIN_EXAMPLE
gn-development
gn-testing
gn-staging
gn-production
#+END_EXAMPLE

and refine it further.

The important take-home message is that the combination of hash values
the developer handed us has /carved our deployment in stone/! Note
that these versions often go hand in hand, so it is good practice to
store that information somewhere.
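
As a minimal sketch of storing that information, the two commit ids could be written to a small key=value file kept next to the profile. The file format and the helpers ~record_pins~ and ~lookup_pin~ are assumptions made for illustration, not something Guix provides.

#+BEGIN_SRC sh
# Write the two commit ids that define a deployment to a small key=value file.
record_pins() {
  # $1 = output file, $2 = GNU Guix commit, $3 = guix-bioinformatics commit
  printf 'guix=%s\nguix-bioinformatics=%s\n' "$2" "$3" > "$1"
}

# Read one commit id back by key.
lookup_pin() {
  # $1 = pin file, $2 = key
  sed -n "s/^$2=//p" "$1"
}

pins=$(mktemp)
record_pins "$pins" \
  8a7784381ac19d0756dc862bf3d8e082406bd958 \
  b0c38d151324e37448ade758cc48d02d89f94b60

# Print (not run) the commands that would recreate the same pair later.
echo "guix pull --commit=$(lookup_pin "$pins" guix)"
echo "git checkout $(lookup_pin "$pins" guix-bioinformatics)"
#+END_SRC

Keeping the file under version control alongside the deployment notes gives a history of which pair of commits produced which profile.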

* Software optimization

There exists an idea that GNU Guix only allows for generic
builds. This is not true. Guix provides channels that allow for
specific builds. Just as Guix can go back to older software (such
as provided by [[https://gitlab.inria.fr/guix-hpc/guix-past][Guix past]]), it can also go forward by providing
different flavours of optimization. The openblas we use for gemma in
GeneNetwork is hand-optimized, see [[http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics/src/branch/master/gn/packages/gemma.scm][here]].

* Containers

** Running in a Guix container

Because GNU Guix has full control of the dependency graph, one can
run the above installation in a container where no other software
is visible, i.e., in complete isolation. Starting the container
takes only 10 seconds

#+BEGIN_SRC sh
tux01:~$ env GUIX_PACKAGE_PATH=~/guix-bioinformatics:~/guix-past/modules/ ~/.config/guix/current/bin/guix environment -C genenetwork2
#+END_SRC

and gives a full environment to explore dependencies in a different
way:

#+BEGIN_SRC sh
pjotr@tux01 ~ [env]$ gemma
GEMMA 0.98.2 (2020-05-28) by Xiang Zhou and team (C) 2012-2020
#+END_SRC

We run websites this way in containers to enhance security. We also
use containers for development:

** Development in a Guix container

When starting a container the current directory is automatically
mounted, so you can compile and test software using the tools in the
container. We use it, for example, for sambamba and gemma
development. To develop GEMMA, fetch the git repo and

#+BEGIN_SRC sh
guix environment -C guix --ad-hoc gcc-toolchain gdb gsl openblas zlib bash ld-wrapper perl vim which
make
make check
#+END_SRC

will create the full build environment. To test against an older gcc we
can simply do

#+BEGIN_SRC sh
guix environment -C guix --ad-hoc gcc-toolchain@6.3.0 gdb gsl openblas zlib bash ld-wrapper perl vim which
make
make check
#+END_SRC
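
The two invocations above differ only in the compiler package, so the list of tools can be kept in one place and the compiler varied. The sketch below only prints the resulting command lines rather than starting containers; ~container_command~ is a hypothetical helper.

#+BEGIN_SRC sh
# Tools shared by every build environment (taken from the commands above).
tools="gdb gsl openblas zlib bash ld-wrapper perl vim which"

# Print the command that would enter a container with the given compiler.
container_command() {
  printf 'guix environment -C guix --ad-hoc %s %s\n' "$1" "$tools"
}

# One line per compiler; nothing is executed here.
for toolchain in gcc-toolchain gcc-toolchain@6.3.0; do
  container_command "$toolchain"
done
#+END_SRC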

Or for any other dependency. E.g., for openblas we even create our own
optimized versions that are deployed in the GeneNetwork stack.

It is the cat's whiskers because no dependencies can bleed in from the
surrounding Linux distribution. /Full control over reproducible software
deployment from software cradle to software grave/.

** Creating a Docker container

Creating a Docker container is just as trivial.

#+BEGIN_SRC sh
time env GUIX_PACKAGE_PATH=~/guix-bioinformatics:~/guix-past/modules/ ~/.config/guix/current/bin/guix pack -f docker genenetwork2
#+END_SRC

This takes a full 12 seconds to generate a 966 MB ~tar.gz~ Docker image!
Try and beat that.

For more information see [[./CONTAINERS.org]].

* Finally

Guix is great for controlled software deployment in development
environments. It is beyond the scope of this document, but GNU Guix
also allows for defining full (Cloud) operating systems as
deterministic software definitions. At UTHSC we are building an HPC
this way.