||11 months ago|
|README.org||11 months ago|
Installing GNU Guix on the Octopus HPC Cluster
Having run GNU Guix on our machines for some years with great results we decided to build a new HPC cluster consisting of Dell PowerEdge R6515 machines with AMD EPYC 7402P 24-Core Processor on board. Two machines (the head nodes) have 1TB RAM and the other 9 machines have 128GB RAM on board.
Because we own the cluster with admin rights we can deploy software as we see fit. The nodes can boot from an on demand image. We'll allow for Docker, Singularity and even VMs to run. Researchers will use our cluster and we want to give them freedom.
To bootstrap the installation we use remote terminal access (serial over LAN by iDRAC) and boot into a system using a Debian10 live USB stick. Next we start the network and install Debian10 on the hard disks with debootstrap as a base install. We need to get up to speed quickly and this is the fastest route. In time we may have GNU Guix images which may allow us to skip a Debian install. The design of the system will allow for foreign Docker images and VMs which means the underlying system can be very lean. A suitable target for GNU Guix.
But first bootstrapping:
Getting the machines up and running
We use remote out-of-band access using a serial interface over LAN connecting with ssh. E.g.
First thing is to check serial settings
racadm>>get iDRAC.Serial [Key=iDRAC.Embedded.1#Serial.1] BaudRate=57600 Command= Enable=Enabled HistorySize=8192 IdleTimeout=300 NoAuth=Disabled set IDRAC.serial.BaudRate 115200
You could set user and password, but leave that for now. Connect to serial interface (com2 even though we are using ttyS0 throughout)
Probably a blank, to leave serial type control-backslash or
racadm>>serveraction powercycle connect com2
ESC+! and ENTER a few times and wait. What we want to do is select
a USB boot drive.
ESC+! equals F11 which selects the bootmanager. It
will pop up in 30s or 2min (on a large RAM machine).
Until we have PXE network and images we are going to manage installs via USB mounts. On the boot menu make a note of the service tag:
Service Tag: C5R6R53 PowerEdge R6515
One-shot BIOS Boot Menu. Choose
USB 2: U3 Cruzer Micro
which has a Debian10 live rescue system with serial access
enabled. You don't see the select menu, but you need to press [ENTER]
a few times. Login and you see we also have network and you can log in
At this stage we can use debootstrap to start installing the machine.
Partition the first drive. You can model it on Octopus01, but essentially you are free to do what you want ;). EFI 500Mb, a small SWAP space 8GB, and a Linux partition 16GB is about minimal. The rest of these drives should be part of the distributed store.
debootstrap --include=openssh-server buster $dest http://ftp.nl.debian.org/debian/
set dest to ext4 partition:
Make sure the partition is bootable (with fdisk), and
env LANG=C.UTF-8 chroot $dest /bin/bash apt-get install makedev mount none /proc -t proc cd /dev MAKEDEV generic
(may take a while)
ls -l /dev/tty* # should exist now
Set locales to include en_US.UTF-8
apt-get install locales dpkg-reconfigure locales # mount -t proc proc /proc apt-get install vim openssh-server less passwd # set root password, don't forget!
/dev/sda2 / ext4 errors=remount-ro 0 1 proc /proc proc defaults 0 0 /dev/sda2 none swap sw 0 0 # /dev/sda3 /export ext4 errors=remount-ro 0 1
Install kernel and headers (missing in target normally!)
apt-cache search linux-image apt-get install linux-image-amd64 linux-source apt-get install firmware-linux-free grub2
/etc/default/grub to give serial access and symlink and enable
email@example.com -> /lib/systemd/system/getty@.service
apt-get install vim git-core ssh tmux
Run git in /etc
cd /etc git init git add . git commit -a -m init chmod 0700 .git/
Check the OS
cat /etc/os-release PRETTY_NAME="Debian GNU/Linux 10 (buster)"
Run grub in the chroot boot partition. Or install grub or grub-pc
apt-get install grub2
Make a note of the existing grub menu entries
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS1,115200n8" GRUB_CMDLINE_LINUX="" GRUB_TERMINAL=serial GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"
and enable serial in systemd
systemctl enable serial-getty@ttyS0.service systemctl start serial-getty@ttyS0.service
Install grub from chroot with
fdisk -l # mount /dev/?? /mnt/tmp mount -t proc none $dest/proc mount -o bind /dev $dest/dev mount -t sysfs sys $dest/sys LANG=C.UTF-8 chroot $dest /bin/bash # update-grub2 <- may not be right /usr/sbin/grub-install --recheck --no-floppy /dev/sda /usr/sbin/grub-install --recheck --no-floppy /dev/sdb update-grub2 # Check grub menu, it may not have been set! sync & reboot
Setup /etc/network/interfaces to include lo and eno
# The loopback network interface auto lo iface lo inet loopback # The primary network interface allow-hotplug eno1 iface eno1 inet dhcp
Boot into partition.