Diffstat (limited to 'topics/octopus')
 -rw-r--r--  topics/octopus/lizardfs/lizard-maintenance.gmi (renamed from topics/octopus/lizardfs/README.gmi) | 100
 -rw-r--r--  topics/octopus/maintenance.gmi                                                                   |  12
 -rw-r--r--  topics/octopus/moosefs/moosefs-maintenance.gmi                                                   | 252
 -rw-r--r--  topics/octopus/octopussy-needs-love.gmi                                                          | 266
 4 files changed, 623 insertions, 7 deletions
diff --git a/topics/octopus/lizardfs/README.gmi b/topics/octopus/lizardfs/lizard-maintenance.gmi
index 7c91136..a34ef3e 100644
--- a/topics/octopus/lizardfs/README.gmi
+++ b/topics/octopus/lizardfs/lizard-maintenance.gmi
@@ -1,4 +1,4 @@
-# Information about lizardfs, and some usage suggestions
+# Lizard maintenance
 
 On the octopus cluster the lizardfs head node is on octopus01, with disks being added mainly from the other nodes. SSDs are added to the lizardfs-chunkserver.service systemd service and HDDs are added to the lizardfs-chunkserver-hdd.service. The storage pool is available on all nodes at /lizardfs, with the default storage option of "slow", which corresponds to two copies of the data, both on HDDs.
 
@@ -73,6 +73,17 @@ Chunks deletion state:
         2ssd    7984    -       -       -       -       -       -       -       -       -       -
 ```
 
+This table shows whether slow and fast are still replicating data (chunks in column 0 are OK). This looks good for fast:
+
+```
+Chunks replication state:
+        Goal    0       1       2       3       4       5       6       7       8       9       10+
+        slow    -       137461  448977  -       -       -       -       -       -       -       -
+        fast    6133152 -       5       -       -       -       -       -       -       -       -
+```
+
 To query how the individual disks are filling up and if there are any errors:
 
 List all disks
@@ -83,6 +94,42 @@ lizardfs-admin list-disks octopus01 9421 | less
 
 Other commands can be found with `man lizardfs-admin`.
 
+## Info
+
+```
+lizardfs-admin info octopus01 9421
+LizardFS v3.12.0
+Memory usage:   2.5GiB
+
+Total space:    250TiB
+Available space:        10TiB
+Trash space:    510GiB
+Trash files:    188
+Reserved space: 21GiB
+Reserved files: 18
+FS objects:     7369883
+Directories:    378782
+Files:  6858803
+Chunks: 9100088
+Chunk copies:   20017964
+Regular copies (deprecated):    20017964
+```
+
+```
+lizardfs-admin chunks-health  octopus01 9421
+Chunks availability state:
+        Goal    Safe    Unsafe  Lost
+        slow    1323220 1       -
+        fast    6398524 -       5
+
+Chunks replication state:
+        Goal    0       1       2       3       4       5       6       7       8       9       10+
+        slow    -       218663  1104558 -       -       -       -       -       -       -       -
+        fast    6398524 -       5       -       -       -       -       -       -       -       -
+
+Chunks deletion state:
+        Goal    0       1       2       3       4       5       6       7       8       9       10+
+        slow    -       104855  554911  203583  76228   39425   19348   8659    3276    20077   292859
+        fast    6380439 18060   30      -       -       -       -       -       -       -       -
+```
 
 ## Deleted files
 
@@ -188,3 +235,54 @@ KeyringMode=inherit
 [Install]
 WantedBy=multi-user.target
 ```
+
+## To deplete and remove a drive in LizardFS
+
+**1. Mark the chunkserver (or specific disk) for removal**
+
+Edit the chunkserver's disk configuration file (typically `/etc/lizardfs/mfshdd.cfg`) and prefix the drive path with an asterisk:
+
+```
+*/mnt/disk_to_remove
+```
+
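+If you prefer to do this from the command line, a minimal sketch (the path and config file location are the example ones from above):
+
+```bash
+# prefix the drive path with '*' to mark it for removal, then verify
+sed -i 's|^/mnt/disk_to_remove$|*/mnt/disk_to_remove|' /etc/lizardfs/mfshdd.cfg
+grep disk_to_remove /etc/lizardfs/mfshdd.cfg
+```
+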
+**2. Restart the chunkserver process on the node**
+
+```bash
+systemctl stop lizardfs-chunkserver
+systemctl start lizardfs-chunkserver
+```
+
+**3. Monitor the evacuation progress**
+
+The master will begin migrating chunks off the marked drive. You can monitor progress with:
+
+```bash
+lizardfs-admin list-disks octopus01 9421
+lizardfs-admin list-disks octopus01 9421|grep 172.23.19.59 -A 7
+172.23.19.59:9422:/mnt/sdc/lizardfs_vol/
+        to delete: yes
+        damaged: no
+        scanning: no
+        last error: no errors
+        total space: 3.6TiB
+        used space: 3.4TiB
+        chunks: 277k
+```
+
+Look for the disk marked "to delete: yes"; its chunk count should decrease over time as data is replicated elsewhere.
+
+You can also check the CGI web interface if you have it running; it shows disk status and chunk counts.
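+
+To keep an eye on it without retyping the command, something like this works (a sketch; the interval is arbitrary and the address is the one used above):
+
+```bash
+# re-check the marked disk every 10 minutes until its chunk count reaches 0
+watch -n 600 'lizardfs-admin list-disks octopus01 9421 | grep 172.23.19.59 -A 7'
+```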
+
+**4. Remove the drive once empty**
+
+Once all chunks have been evacuated (the disk shows 0 chunks or is marked as empty), you can safely:
+
+1. Remove the line from `mfshdd.cfg` entirely
+2. Reload or restart the chunkserver so the change takes effect
+3. Physically remove or repurpose the drive
+
+**Important notes:**
+- Ensure you have enough free space on other disks to absorb the migrating chunks
+- The evacuation time depends on the amount of data and network/disk speed
+- Don't forcibly remove a drive before evacuation completes, or you risk data loss if replication goals aren't met
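+
+As a quick check of the free-space point above, the admin tool used earlier reports the totals (a sketch):
+
+```bash
+# total, available, trash and reserved space as seen by the master
+lizardfs-admin info octopus01 9421 | grep -i space
+```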
diff --git a/topics/octopus/maintenance.gmi b/topics/octopus/maintenance.gmi
index 65ea52e..00cc575 100644
--- a/topics/octopus/maintenance.gmi
+++ b/topics/octopus/maintenance.gmi
@@ -11,7 +11,7 @@ octopus02
 - Devices: 2 3.7T SSDs + 2 894.3G SSDs + 2 4.6T HDDs
 - **Status: Slurm not OK, LizardFS not OK**
 - Notes:
-  - `octopus02 mfsmount[31909]: can't resolve master hostname and/or portname (octopus01:9421)`, 
+  - `octopus02 mfsmount[31909]: can't resolve master hostname and/or portname (octopus01:9421)`,
   - **I don't see 2 drives that are physically mounted**
 
 octopus03
@@ -21,7 +21,7 @@ octopus03
 
 octopus04
 - Devices: 4 7.3 T SSDs (Neil) + 1 4.6T HDD + 1 3.7T SSD + 2 894.3G SSDs
-- Status: Slurm NO, LizardFS OK (we don't share the HDD) 
+- Status: Slurm NO, LizardFS OK (we don't share the HDD)
 - Notes: no
 
 octopus05
@@ -31,7 +31,7 @@ octopus05
 
 octopus06
 - Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs
-- Status: Slurm OK, LizardFS OK (we don't share the HDD) 
+- Status: Slurm OK, LizardFS OK (we don't share the HDD)
 - Notes: no
 
 octopus07
@@ -41,17 +41,17 @@ octopus07
 
 octopus08
 - Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs
-- Status: Slurm OK, LizardFS OK (we don't share the HDD) 
+- Status: Slurm OK, LizardFS OK (we don't share the HDD)
 - Notes: no
 
 octopus09
 - Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs
-- Status: Slurm OK, LizardFS OK (we don't share the HDD) 
+- Status: Slurm OK, LizardFS OK (we don't share the HDD)
 - Notes: no
 
 octopus10
 - Devices: 1 7.3 T SSDs (Neil) + 4 3.7T SSDs + 2 894.3G SSDs
-- Status: Slurm OK, LizardFS OK (we don't share the HDD) 
+- Status: Slurm OK, LizardFS OK (we don't share the HDD)
 - Notes: **I don't see 1 device that is physically mounted**
 
 octopus11
diff --git a/topics/octopus/moosefs/moosefs-maintenance.gmi b/topics/octopus/moosefs/moosefs-maintenance.gmi
new file mode 100644
index 0000000..1032cde
--- /dev/null
+++ b/topics/octopus/moosefs/moosefs-maintenance.gmi
@@ -0,0 +1,252 @@
+# Moosefs
+
+We use moosefs as a distributed network storage system with redundancy. The setup uses SSDs for fast access and spinning storage for redundancy/backups (the spinning disks are themselves in a RAID5 configuration). In addition we'll experiment with non-redundant fast storage using the fastest drives and network connections.
+
+# Configuration
+
+## Ports
+
+We should use different ports than lizard. Lizard uses 9419-9424 by default, so let's use
+ports from 9519 upwards:
+
+* 9519 for moose meta logger
+* 9520 for chunk server connections
+* 9521 for mount connections
+* 9522 for slow HDD chunks (HDD)
+* 9523 for replicating SSD chunks (SSD)
+* 9524 for fast non-redundant SSD chunks (FAST)
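+
+A quick, hedged way to check that the daemons actually listen on these ports once they are up (assumes ss from iproute2 is installed):
+
+```
+ss -tlnp | grep -E ':95(19|2[0-4])'
+```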
+
+## Topology
+
+Moosefs uses topology to decide where to fetch data from. We can put the slow spinning HDD drives in a 'distant' location, so that data is fetched from them last.
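+
+A hedged sketch of what that could look like in /etc/mfs/mfstopology.cfg (the file referenced by TOPOLOGY_FILENAME in mfsmaster.cfg, if memory serves; the subnet-to-rack mapping below is only illustrative):
+
+```
+# subnet          rack id
+172.23.22.0/24    1
+# remote rack with the slow spinning HDD chunkservers
+172.23.17.0/24    9
+```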
+
+## Disks
+
+Some disks are slower than others. To test we can do:
+
+```
+root@octopus03:/export# dd if=/dev/zero of=test1.img bs=1G count=1
+1+0 records in
+1+0 records out
+1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.20529 s, 487 MB/s
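+# drop the page cache so the following read is served from disk, not RAM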
+/sbin/sysctl -w vm.drop_caches=3
+root@octopus03:/export#  dd if=test1.img of=/dev/null bs=1G count=1
+1+0 records in
+1+0 records out
+1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.649035 s, 1.7 GB/s
+rm test1.img
+```
+
+Above is on a RAID5 setup. Typical values are:
+
+```
+                       Write         Read
+Octopus Dell NVME      1.2 GB/s      2.0 GB/s
+Octopus03 RAID5        487 MB/s      1.7 GB/s
+Octopus01 RAID5        127 MB/s      163 MB/s
+Samsung SSD 870        408 MB/s      565 MB/s
+```
+
+The fast pool mounted on the clients shows up in df as:
+
+```
+mfs#octopus03:9521   3.7T  4.0G  3.7T   1% /moosefs-fast
+```
+
+## Command line
+
+```
+. /usr/local/guix-profiles/moosefs/etc/profile
+mfscli -H octopus03 -P 9521 -SCS
+```
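+
+Other sections of the CGI data can be dumped the same way; the selector flags below are from memory, so double-check with `mfscli -h`:
+
+```
+mfscli -H octopus03 -P 9521 -SIN    # general master info (assumed flag)
+mfscli -H octopus03 -P 9521 -SHD    # per-disk status on the chunkservers (assumed flag)
+```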
+
+## Config
+
+```
+root@octopus03:/etc/mfs# diff example/mfsexports.cfg.sample mfsexports.cfg
+2c2,4
+< *                     /       rw,alldirs,admin,maproot=0:0
+---
+> 172.23.21.0/24                       /       rw,alldirs,maproot=0,ignoregid
+> 172.23.22.0/24                       /       rw,alldirs,maproot=0,ignoregid
+> 172.23.17.0/24                       /       rw,alldirs,maproot=0,ignoregid
+```
+
+```
+root@octopus03:/etc/mfs# diff example/mfsmaster.cfg.sample mfsmaster.cfg
+4a5,10
+> ## Only one metadata server in LizardFS shall have 'master' personality.
+> PERSONALITY = master
+>
+> ## Password for administrative connections and commands.
+> ADMIN_PASSWORD = nolizard
+>
+6c12
+< # WORKING_USER = nobody
+---
+> WORKING_USER = mfs
+9c15
+< # WORKING_GROUP =
+---
+> WORKING_GROUP = mfs
+27c33
+< # DATA_PATH = /gnu/store/yg0xb1g9mls04h4085kmfbbg8z36a7c2-moosefs-4.58.3/var/mfs
+---
+> DATA_PATH = /export/var/lib/mfs
+34c40
+< # EXPORTS_FILENAME = /gnu/store/yg0xb1g9mls04h4085kmfbbg8z36a7c2-moosefs-4.58.3/etc/mfs/mfsexports.cfg
+---
+> EXPORTS_FILENAME = /etc/mfs/mfsexports.cfg
+87c93
+< # MATOML_LISTEN_PORT = 9419
+---
+> MATOML_LISTEN_PORT = 9519
+103c109
+< # MATOCS_LISTEN_PORT = 9420
+---
+> MATOCS_LISTEN_PORT = 9520
+219c225
+< # MATOCL_LISTEN_PORT = 9421
+---
+> MATOCL_LISTEN_PORT = 9521
+```
+
+```
+root@octopus03:/etc/mfs# cat mfsgoals.cfg
+# safe - 2 copies, 1 on slow disk, 1 on fast disk
+11 slow: HDD SSD
+
+# Fast storage - 1 copy on fast disks, no redundancy
+12 fast: FAST
+```
+
+```
++++ b/mfs/mfschunkserver-fast.cfg
+ # user to run daemon as (default is nobody)
+-# WORKING_USER = nobody
++WORKING_USER = mfs
+
+ # group to run daemon as (optional - if empty then default user group will be used)
+-# WORKING_GROUP =
++WORKING_GROUP = mfs
+
+ # name of process to place in syslog messages (default is mfschunkserver)
+ # SYSLOG_IDENT = mfschunkserver
+@@ -28,6 +28,7 @@
+
+ # where to store daemon lock file (default is /gnu/store/yg0xb1g9mls04h4085kmfbbg8z36a7c2-moosefs-4.58.3/var/mfs)
+ # DATA_PATH = /gnu/store/yg0xb1g9mls04h4085kmfbbg8z36a7c2-moosefs-4.58.3/var/mfs
++DATA_PATH=/var/lib/mfs
+
+ # when set to one chunkserver will not abort start even when incorrect entries are found in 'mfshdd.cfg' file
+ # ALLOW_STARTING_WITH_INVALID_DISKS = 0
+@@ -41,6 +42,7 @@
+
+ # alternate location/name of mfshdd.cfg file (default is /gnu/store/yg0xb1g9mls04h4085kmfbbg8z36a7c2-moosefs-4.58.3/etc/mfs/mfshdd.cfg); this file will be re-read on each process reload, regardless if the path was changed
+ # HDD_CONF_FILENAME = /gnu/store/yg0xb1g9mls04h4085kmfbbg8z36a7c2-moosefs-4.58.3/etc/mfs/mfshdd.cfg
++HDD_CONF_FILENAME = /etc/mfs/mfsdisk-fast.cfg
+
+ # speed of background chunk tests in MB/s per disk (formally entry defined in mfshdd.cfg). Value can be given as a decimal number (default is 1.0)
+ # deprecates: HDD_TEST_FREQ (if HDD_TEST_SPEED is not defined, but there is redefined HDD_TEST_FREQ, then HDD_TEST_SPEED = 10 / HDD_TEST_FREQ)
+@@ -109,10 +111,10 @@
+ # BIND_HOST = *
+
+ # MooseFS master host, IP is allowed only in single-master installations (default is mfsmaster)
+-# MASTER_HOST = mfsmaster
++MASTER_HOST = octopus03
+
+ # MooseFS master command port (default is 9420)
+-# MASTER_PORT = 9420
++MASTER_PORT = 9520
+
+ # timeout in seconds for master connections. Value >0 forces given timeout, but when value is 0 then CS asks master for timeout (default is 0 - ask master)
+ # MASTER_TIMEOUT = 0
+@@ -134,5 +136,5 @@
+ # CSSERV_LISTEN_HOST = *
+
+ # port to listen for client (mount) connections (default is 9422)
+-# CSSERV_LISTEN_PORT = 9422
++CSSERV_LISTEN_PORT = 9524
+```
+
+```
++++ b/mfs/mfsmount.cfg
+mfsmaster=octopus03,nosuid,nodev,noatime,nosuid,mfscachemode=AUTO,mfstimeout=30,mfswritecachesize=2048,mfsreadaheadsize=2048,mfsport=9521
+/moosefs-fast
+```
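+
+After starting the mount (see the systemd units below), a quick sanity check, using the paths configured above:
+
+```
+df -h /moosefs-fast     # should show mfs#octopus03:9521 as the filesystem
+mount | grep moosefs-fast
+```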
+
+## systemd
+
+
+```
+root@octopus03:/etc# cat systemd/system/moosefs-master.service
+[Unit]
+Description=MooseFS master server daemon
+Documentation=man:mfsmaster
+After=network.target
+Wants=network-online.target
+
+[Service]
+Type=forking
+TimeoutSec=0
+ExecStart=/usr/local/guix-profiles/moosefs/sbin/mfsmaster -d start -c /etc/mfs/mfsmaster.cfg -x
+ExecStop=/usr/local/guix-profiles/moosefs/sbin/mfsmaster -c /etc/mfs/mfsmaster.cfg stop
+ExecStop=/usr/local/guix-profiles/moosefs/sbin/mfsmaster -c /etc/mfs/mfsmaster.cfg reload
+ExecReload=/bin/kill -HUP $MAINPID
+User=mfs
+Group=mfs
+Restart=on-failure
+RestartSec=60
+OOMScoreAdjust=-999
+
+[Install]
+WantedBy=multi-user.target
+```
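+
+With the unit file in place the master can be enabled and watched like any other service (a sketch):
+
+```
+systemctl daemon-reload
+systemctl enable --now moosefs-master
+journalctl -u moosefs-master -f
+```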
+
+```
+ cat systemd/system/moosefs-mount.service
+[Unit]
+Description=Moosefs mounts
+After=syslog.target network.target
+
+[Service]
+Type=forking
+TimeoutSec=600
+ExecStart=/usr/local/guix-profiles/moosefs/bin/mfsmount -c /etc/mfs/mfsmount.cfg
+ExecStop=/usr/bin/umount /moosefs-fast
+
+[Install]
+WantedBy=multi-user.target
+root@octopus04:/etc# cat systemd/system/moosefs-chunkserver-fast.service
+[Unit]
+Description=MooseFS Chunkserver (Fast)
+After=network.target
+
+[Service]
+Type=simple
+ExecStart=/usr/local/guix-profiles/moosefs/sbin/mfschunkserver -f -c /etc/mfs/mfschunkserver-fast.cfg
+User=mfs
+Group=mfs
+Restart=on-failure
+RestartSec=5
+LimitNOFILE=65535
+
+[Install]
+WantedBy=multi-user.target
+```
diff --git a/topics/octopus/octopussy-needs-love.gmi b/topics/octopus/octopussy-needs-love.gmi
new file mode 100644
index 0000000..8c6315d
--- /dev/null
+++ b/topics/octopus/octopussy-needs-love.gmi
@@ -0,0 +1,266 @@
+# Octopussy needs love
+
+At UTHSC, Memphis, TN, around October 2020 Efraim and I installed Octopus on Debian+Guix, with lizard as the distributed network storage system and slurm for job control. Around October 2023 we added 5 Genoa machines (tux05-09), doubling the cluster in size. See
+
+=> https://genenetwork.org/gn-docs/facilities
+
+Octopus made a lot of work possible that we can't really do on larger HPCs, and it led to a number of high-impact studies and publications, particularly on pangenomics.
+
+In the coming period we want to replace lizard with moosefs. Lizard is no longer maintained and, as it was a fork of Moose, it is only logical to go forward with that one. We also looked at Ceph, but apparently Ceph is not great for systems that carry no redundancy. So far lizard has been using redundancy, but we figure we can do without it if the occasional (cheap) SSD goes bad.
+
+We also need to look at upgrading some of the Dell BIOSes - particularly tux05-09 - as they can occasionally be problematic with non-OEM SSDs.
+
+On the worker nodes it may be wise to upgrade Debian, followed by an upgrade of the head nodes and other supporting machines. Even though we rely on Guix for the latest and greatest, there may be good upgrades in the underlying Linux kernel and drivers.
+
+Our Slurm/PBS setup is up to date because we run it completely on Guix and Arun supports the latest and greatest.
+
+Another thing we ought to fix is to introduce centralized user management. So far we have had few users and just got by, but sometimes it bites us that users have different UIDs on different nodes.
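+
+A quick, hedged way to see the UID drift (the user name is a placeholder):
+
+```
+for h in octopus01 octopus02 octopus03 octopus04; do
+  printf '%s: ' "$h"; ssh "$h" id -u someuser
+done
+```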
+
+## Architecture overview
+
+* O1 is the old head node hosting lizardfs - will move to a compute
+* O2 is the old backup hosting the lizardfs shadow - will move to compute
+* O3 is the new head node hosting moosefs
+* O4 is the backup head node hosting moosefs shadow - will act as a compute node too
+
+All the other nodes are for compute. O1 and O4 will be the last nodes to remain on older Debian. They will handle the last bits of lizard.
+
+# Tasks
+
+* [X] Create moosefs package
+* [X] Install moosefs
+* [X] Upgrade bios (all tuxes)
+* [ ] Migrate lizardfs nodes to moosefs (one at a time)
+* [ ] Add server monitoring with sheepdog
+* [ ] Upgrade Debian
+* [ ] Maybe, just maybe, boot the nodes from a central server
+* [ ] Introduce centralized user management
+
+# Progress
+
+## Lizardfs and Moosefs
+
+Our Lizard documentation lives at
+
+=> lizardfs/lizard-maintenance.gmi
+
+Efraim wrote a lizardfs package for Guix at the time in guix-bioinformatics, but we ended up deploying with Debian. Going back now, the package does not look too taxing (I think we dropped it because the Guix system configuration did not play well).
+
+=> https://git.genenetwork.org/guix-bioinformatics/tree/gn/packages/file-systems.scm
+
+Looking at the Debian package
+
+=> https://salsa.debian.org/debian/moosefs
+
+It carries no special patches, but there are a few nice hints in *.README.debian. I think it is worth trying to write a Guix package so we can easily upgrade (even on an aging Debian). Future proofing is key.
+
+The following built moosefs in a guix shell:
+
+```
+guix shell -C -D -F coreutils make autoconf automake fuse libpcap zlib pkg-config python libtool gcc-toolchain
+autoreconf -f -i
+make
+```
+
+Next I created a guix package that installs with:
+
+```
+guix build -L ~/guix-bioinformatics -L ~/guix-past/modules moosefs
+```
+
+See
+
+=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=236903baaab0f84f012a55700c1917265a2b701c
+
+Next stop: testing and deploying!
+
+## Choosing a head node
+
+Currently octopus01 is the head node. It probably is a good idea to change that, so we can safely upgrade the new server. The first choice would be octopus02 (o2). We can mirror the moose daemons on octopus01 (o1) later. Let's see what that looks like.
+
+A quick assessment of o1 shows that it has 14T of storage that takes care of /home and /gnu, but only 1.2T is used.
+
+o2 also has quite a few disks (up 1417 days!), but a bunch of the SSDs appear to error out. E.g.
+
+```
+Sep 04 07:44:56 octopus02 mfschunkserver[22766]: can't create lock file /mnt/sdd1/lizardfs_vol/.lock, marking hdd as damaged: Input/output error
+UUID=277c05de-64f5-48a8-8614-8027a53be212 /mnt/sdd1 xfs rw,exec,nodev,noatime,nodiratime,largeio,inode64 0 1
+```
+
+Lizard also complains that 4 SSDs have been wiped out.
+We'll need to reboot the server to see what storage may still work. The slurm connection appears to be misconfigured:
+
+```
+[2025-12-20T09:36:27.846] error: service_connection: slurm_receive_msg: Insane message length
+[2025-12-20T09:36:28.415] error: unpackstr_xmalloc: Buffer to be unpacked is too large (1700881509 > 1073741824)
+[2025-12-20T09:36:28.415] error: unpacking header
+[2025-12-20T09:36:28.415] error: destroy_forward: no init
+[2025-12-20T09:36:28.415] error: slurm_receive_msg_and_forward: [[nessus6.uthsc.edu]:35553] failed: Message receive failure
+```
+
+It looks like Andrea is the only one using the machine right now, though some others are logged in. Before rebooting I'll block users, ask Andrea to move off, and deplete slurm and lizard. But o2 is a large-RAM machine, so we should not use it as a head node.
+
+Let's take a look at o3. This one has less RAM. Flavia is running some tools, but I don't think the machine is really used right now. Slurm is running, but shows similar configuration issues to o2. Let's take a look at slurm:
+
+=> ../systems/hpc/octopus-maintenance
+=> ../hpc/octopus/slurm-user-guide
+
+Alright, I depleted and removed slurm from o3. I think it would be wise to also deplete the lizard drives on that machine.
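+
+For the record, depleting a node in slurm is roughly (a sketch; the reason string is arbitrary):
+
+```
+scontrol update NodeName=octopus03 State=DRAIN Reason="migrate to moosefs"
+squeue -w octopus03        # wait until no jobs are left on the node
+systemctl stop slurmd      # then stop the daemon on octopus03
+```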
+
+The big users on lizard are:
+
+```
+1.6T    dashbrook
+1.8T    pangenomes
+2.1T    erikg
+3.4T    aruni
+3.4T    junh
+8.4T    hchen
+9.2T    salehi
+13T     guarracino
+16T     flaviav
+```
+
+It seems we can clean some of that up! We have some backup storage that we can use. Alternatively, data can move to ISAAC.
+
+We'll slowly start depleting the lizard. See also
+
+=> lizardfs/lizard-maintenance.gmi
+
+O3 has 4 lizard drives. We'll start by depleting one.
+
+
+# O2
+
+```
+172.23.22.159:9422:/mnt/sde1/lizardfs_vol/
+        to delete: no
+        damaged: yes
+        scanning: no
+        last error: no errors
+        total space: 0B
+        used space: 0B
+        chunks: 0
+172.23.22.159:9422:/mnt/sdd1/lizardfs_vol/
+        to delete: no
+        damaged: yes
+        scanning: no
+        last error: no errors
+        total space: 0B
+        used space: 0B
+        chunks: 0
+172.23.22.159:9422:/mnt/sdc1/lizardfs_vol/
+        to delete: no
+        damaged: yes
+        scanning: no
+        last error: no errors
+        total space: 0B
+        used space: 0B
+        chunks: 0
+```
+
+Stopped the chunk server.
+sde remounted after xfs_repair. The others were not visible, so I rebooted. The following storage should add to the total again:
+
+```
+/dev/sdc1            4.6T  3.9T  725G  85% /mnt/sdc1
+/dev/sdd1            4.6T  4.2T  428G  91% /mnt/sdd1
+/dev/sdf1            4.6T  4.2T  358G  93% /mnt/sdf1
+/dev/sde             3.7T  3.7T  4.0G 100% /mnt/sde
+/dev/sdg1            3.7T  3.7T  3.9G 100% /mnt/sdg1
+```
+
+After adding this storage and people removing material it starts to look better:
+
+```
+mfs#octopus01:9421   171T   83T   89T  49% /lizardfs
+```
+
+# O3
+
+I have marked the disks (4x4T) on o3 for deletion - that will subtract 7T. This is in preparation for upgrading Linux and migrating those disks to moosefs. Continue below.
+
+# T5
+
+T5 requires a new BIOS - it has the same one as the unreliable T4. I also need to see if there are any disks in the BIOS that we don't see right now. T5 has two small fast SSDs and one larger one (3.5T).
+
+I managed to install the new BIOS, but I had trouble getting into Linux because of some network/driver issues; ipmi was the suspect. I finally managed rescue mode by adding 'systemd.unit=emergency.target' to the grub line - 'single' is no longer enough (grrr). One to keep in mind.
+
+I had to disable the ipmi modules. See my idrac.org.
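+
+Disabling the modules roughly means blacklisting them so they do not load at boot (a sketch; these are the usual ipmi module names):
+
+```
+cat > /etc/modprobe.d/blacklist-ipmi.conf <<'EOF'
+blacklist ipmi_si
+blacklist ipmi_devintf
+blacklist ipmi_msghandler
+EOF
+update-initramfs -u
+```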
+
+# T6
+
+Tux06 (T6) contains two unused drives that appear to have contained XFS; xfs_repair did not really help...
+The BIOS on T6 is newer than on T4+T5. That probably explains why the higher T numbers have no disk issues, while T4+T5 had problems with non-OEM drives! Anyway, while I was at it, I updated the BIOS on all of them.
+
+T6 has 4 SSDs, two of them 3.5T and both unused. The lizard chunk server is failing, so we might as well disable it.
+
+I am using T6 to test network boots because it is not serving lizard.
+
+# T7
+
+On T7 root was full (!?). The culprit was Andrea with /tmp/sweepga_genomes_111850/.
+T7 has 3x3.5T drives, one of which is unused.
+
+# T8
+
+T8 has 3x3.5T, all used. After the BIOS upgrade the EFI partition did not boot. After a few reboots it did get into grub, and I made a copy of the EFI partition on sdd (just in case).
+
+# T9
+
+T9 has 1x3.5T, which is in use. I had to reduce HDD_LEAVE_SPACE_DEFAULT to give the chunkserver some air.
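+
+For reference, that setting lives in the chunkserver config (the path is an assumption based on the lizard setup above; pick a value that still leaves the disk some headroom):
+
+```
+grep HDD_LEAVE_SPACE /etc/lizardfs/mfschunkserver.cfg
+# e.g. HDD_LEAVE_SPACE_DEFAULT = 1GiB
+systemctl restart lizardfs-chunkserver
+```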
+
+# O3 + O4
+
+Back to O3, our future head node. Lizard has mostly been depleted, though every drive has a few chunks left. I just pulled down the chunkserver and lizard appears to be fine (no errors). Good!
+
+Next install Linux. I have two routes, one is using debootstrap, the other is via PXE. I want to try the latter.
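+
+The debootstrap route would look roughly like this (suite and mirror are assumptions; the target path is the one used below):
+
+```
+debootstrap --arch=amd64 stable /export/nfs/nodes/debian14 http://deb.debian.org/debian
+```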
+
+So far, I have managed to boot into iPXE on Octopus.
+The Linux kernel loads over http, but it does not show output. Likely I need to:
+
+* [X] Build ipxe with serial support
+* [X] Test the installer with serial support
+* [X] Add NFS support
+* [X] debootstrap install of new Debian on /export/nfs/nodes/debian14
+* [X] Make available through NFS and boot through IPXE
+
+I managed to boot T6 over the network.
+Essentially we have the latest stable Debian running on T6, completely over NFS!
+In the next steps I need to figure out how to:
+
+* [X] Mount NFS with root access
+* [ ] Every PXE node needs its own hard disk configuration
+* [ ] Mount NFS from octopus01
+* [ ] Start slurm
+
+We can have this as a test node pretty soon.
+But first we have to start moosefs and migrate data.
+
+I am doing some small tests and will put (old) T6 back on slurm again.
+
+To get every node booted with its own version of fstab, and with state logging on a local disk, we need to pull a trick with the initrd.
+
+Basically the NFS-boot initrd needs to contain a script that applies per-node changes. The node hostname and primary partition can be passed on from iPXE via kernel parameters such as myhost=client01 localdisk=/dev/sda1 - that is the differentiator. The script in /etc/nodes/initramfs-tools/update-node-etc will remount /tmp and /var onto $localdisk and copy /etc there too. Next it will symlink a few files, such as /etc/hostname and /etc/fstab, to adjust for local settings.
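+
+A minimal sketch of what such a script could do, assuming the kernel parameters above (the real update-node-etc and the per-node directory layout will differ):
+
+```
+#!/bin/sh
+# read the per-node parameters passed on the kernel command line by ipxe
+for arg in $(cat /proc/cmdline); do
+  case "$arg" in
+    myhost=*)    myhost="${arg#myhost=}" ;;
+    localdisk=*) localdisk="${arg#localdisk=}" ;;
+  esac
+done
+# put the writable state on the local disk
+mkdir -p /mnt/local
+mount "$localdisk" /mnt/local
+for d in tmp var; do
+  mkdir -p "/mnt/local/$d"
+  mount --bind "/mnt/local/$d" "/$d"
+done
+# copy /etc onto the local disk and point a few files at per-node versions
+cp -a /etc /mnt/local/etc
+ln -sf "/etc/nodes/$myhost/hostname" /etc/hostname
+ln -sf "/etc/nodes/$myhost/fstab" /etc/fstab
+```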
+
+This way we will deploy all nodes centrally. One aspect is that we don't need dynamic user management as it is centrally orchestrated! The user files can be copied from the head node when they change.
+
+O4 is going to be the backup head node. It will act as a compute node too, until we need it as the head node. O4 is currently not on the slurm queue.
+
+* [X] Update guix on O1
+* [X] Install guix moosefs
+* [X] Start moosefs master on O3
+* [X] Start moosefs metalogger on O4
+* [ ] Check moosefs logging facilities
+* [ ] See if we can mark drives so it is easier to track them
+* [ ] Test broken (?) /dev/sdf on octopus03
+
+We can start the moose master on O3. We should use different ports than lizard; lizard uses 9419-9424 by default, so let's use
+ports from 9519 upwards. See
+
+=> moosefs/moosefs-maintenance.gmi
+
+# P2
+
+Penguin2 has 80T of spinning disk storage. We are going to use that for redundancy. Basically these disks get a moosefs goal of HDD 'slow' and we'll configure them in a remote rack, so chunks get fetched from local chunk servers first. This will gain us 40T of immediate storage. Adding more spinning disks will free up SSDs further.
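+
+A hypothetical sketch of the HDD chunk server config on penguin2, mirroring mfschunkserver-fast.cfg above (the LABELS option and the file names are assumptions to verify):
+
+```
+# /etc/mfs/mfschunkserver-hdd.cfg
+WORKING_USER = mfs
+WORKING_GROUP = mfs
+DATA_PATH = /var/lib/mfs
+HDD_CONF_FILENAME = /etc/mfs/mfsdisk-hdd.cfg
+MASTER_HOST = octopus03
+MASTER_PORT = 9520
+CSSERV_LISTEN_PORT = 9522
+LABELS = HDD
+```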
+
+* [X] P2 Update Guix
+* [X] Install moosefs
+* [ ] Create HDD chunk server