# Octopussy needs love

At UTHSC, Memphis, TN, around October 2020 Efraim and I installed Octopus on Debian+Guix, with lizard as the distributed network storage system and slurm for job control. Around October 2023 we added five Genoa machines (tux05-09), doubling the cluster in size. See

=> https://genenetwork.org/gn-docs/facilities

Octopus has made possible a lot of work that we can't really do on larger HPC systems, and it has led to a number of high-impact studies and publications, particularly on pangenomics.

In the coming period we want to replace lizard with moosefs. Lizard is no longer maintained and, as it was a fork of MooseFS, it is only logical to go forward with that one. We also looked at Ceph, but apparently Ceph is not great for systems that carry no redundancy. So far lizard has been using redundancy, but we figure we can do without it if the occasional (cheap) SSD goes bad.

We also need to look at upgrading the BIOS on some of the Dell machines - particularly tux05-09 - as they can be occasionally problematic with non-OEM SSDs.

On the worker nodes it may be wise to upgrade Debian, followed by an upgrade of the head nodes and other supporting machines. Even though we rely on Guix for the latest and greatest, there may be worthwhile upgrades in the underlying Linux kernel and drivers.

Our Slurm setup is already up to date, because we run it completely from Guix and Arun keeps it on the latest and greatest.

Another thing we ought to fix is to introduce centralized user management. So far we have had few users and just got by, but it sometimes bites us that users have different UIDs on different nodes.
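A quick way to spot those mismatches is to compare what each node reports for the same account. A minimal sketch (node and user names here are just examples, not our actual configuration):

```
# Compare the UID a given account resolves to on each node; any mismatch
# causes ownership trouble on the shared lizardfs/moosefs volumes.
for node in octopus01 octopus02 octopus03 tux05; do
  printf '%s: ' "$node"
  ssh "$node" id -u alice
done
```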


# Tasks

* [X] Create moosefs package
* [ ] Install moosefs
* [ ] Upgrade BIOS (tuxes)
* [ ] Migrate lizardfs nodes to moosefs (one at a time)
* [ ] Add server monitoring with sheepdog
* [ ] Upgrade Debian
* [ ] Maybe, just maybe, boot the nodes from a central server
* [ ] Introduce centralized user management

# Progress

## Lizardfs and Moosefs

Our Lizard documentation lives at

=> lizardfs/README

Efraim wrote a lizardfs package for Guix at the time in guix-bioinformatics, but we ended up deploying with Debian. Going back now, the package does not look too taxing (I think we dropped it because the Guix system configuration did not play well).

=> https://git.genenetwork.org/guix-bioinformatics/tree/gn/packages/file-systems.scm

Looking at the Debian package

=> https://salsa.debian.org/debian/moosefs

It carries no special patches, but there are a few nice hints in the *.README.Debian files. I think it is worth trying to write a Guix package so we can easily upgrade (even on an aging Debian). Future-proofing is key.

The following built moosefs in a guix shell:

```
guix shell -C -D -F coreutils make autoconf automake fuse libpcap zlib pkg-config python libtool gcc-toolchain
autoreconf -f -i
make
```

Next I created a Guix package that builds with:

```
guix build -L ~/guix-bioinformatics -L ~/guix-past/modules moosefs
```

See

=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=236903baaab0f84f012a55700c1917265a2b701c

Next stop: testing and deploying!

## Choosing a head node

Currently octopus01 is the head node. It is probably a good idea to change that, so we can safely upgrade that server. The first choice would be octopus02 (o2). We can mirror the moose daemons back onto octopus01 (o1) later. Let's see what that looks like.

A quick assessment of o1 shows 14T of storage that takes care of /home and /gnu, of which only 1.2T is used.

o2 also has quite a few disks (up 1417 days!), but a bunch of the SSDs appear to error out. E.g.

```
Sep 04 07:44:56 octopus02 mfschunkserver[22766]: can't create lock file /mnt/sdd1/lizardfs_vol/.lock, marking hdd as damaged: Input/output error
UUID=277c05de-64f5-48a8-8614-8027a53be212 /mnt/sdd1 xfs rw,exec,nodev,noatime,nodiratime,largeio,inode64 0 1
```
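Before deciding whether those SSDs are really dead, or just carrying a broken filesystem, it is probably worth checking SMART health. A sketch, assuming smartmontools is available and that the device names match the mounts above:

```
# Overall SMART verdict plus wear/reallocation counters for each suspect SSD
for dev in /dev/sdc /dev/sdd /dev/sde; do
  echo "== $dev =="
  smartctl -H "$dev"
  smartctl -A "$dev" | grep -iE 'reallocated|wear|media' || true
done
```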

Lizard also complains that 4 SSDs have been wiped out.
We'll need to reboot the server to see what storage may still work. The slurm connection also appears to be misconfigured:

```
[2025-12-20T09:36:27.846] error: service_connection: slurm_receive_msg: Insane message length
[2025-12-20T09:36:28.415] error: unpackstr_xmalloc: Buffer to be unpacked is too large (1700881509 > 1073741824)
[2025-12-20T09:36:28.415] error: unpacking header
[2025-12-20T09:36:28.415] error: destroy_forward: no init
[2025-12-20T09:36:28.415] error: slurm_receive_msg_and_forward: [[nessus6.uthsc.edu]:35553] failed: Message receive failure
```

It looks like Andrea is the only one using the machine right now, though some others are logged in. Before rebooting I'll block users, ask Andrea to move off, and deplete slurm and lizard. But o2 is a large-RAM machine, so we should not use it as a head node.
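Draining the node in slurm before the reboot could look roughly like this (a sketch; the exact node name in slurm.conf is an assumption, and scontrol needs to reach the controller):

```
# Stop new jobs from landing on octopus02 while letting running jobs finish
scontrol update nodename=octopus02 state=drain reason="head-node migration"

# Confirm the node is draining and wait for its job list to empty
sinfo -N -n octopus02
squeue -w octopus02
```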

Let's take a look at o3. This one has less RAM. Flavia is running some tools, but I don't think the machine is really used right now. Slurm is running, but shows configuration issues similar to o2. Let's take a look at slurm:

=> ../systems/hpc/octopus-maintenance
=> ../hpc/octopus/slurm-user-guide

Alright, I depleted and removed slurm from o3. I think it would be wise to also deplete the lizard drives on that machine.

The big users on lizard are:

```
1.6T    dashbrook
1.8T    pangenomes
2.1T    erikg
3.4T    aruni
3.4T    junh
8.4T    hchen
9.2T    salehi
13T     guarracino
16T     flaviav
```

It seems we can clean some of that up! We have some backup storage that we can use. Alternatively, data can move to ISAAC.
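For the record, a tally like the one above can be regenerated with something along these lines (the exact directory layout under the lizardfs mount is an assumption):

```
# Per-directory usage on the lizardfs mount, sorted smallest to largest
du -sh /lizardfs/* 2>/dev/null | sort -h
```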

We'll slowly start depleting the lizard. See also

=> lizardfs/README

o3 has 4 lizard drives. We'll start by depleting one.
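Marking a drive for removal, rather than pulling it, lets the chunkserver replicate its chunks elsewhere first. A minimal sketch of that procedure, assuming a stock Debian LizardFS layout (config path, service name and mount point are assumptions):

```
# On octopus03: prefix the drive to deplete with '*' in mfshdd.cfg so the
# chunkserver migrates its chunks to the remaining disks/servers.
sed -i 's|^/mnt/sdc1/lizardfs_vol|*/mnt/sdc1/lizardfs_vol|' /etc/lizardfs/mfshdd.cfg
systemctl reload lizardfs-chunkserver

# Watch the chunk count on that disk drain to zero (e.g. in the lizardfs
# CGI web interface, default port 9425) before unmounting the drive.
```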

## O2

```
172.23.22.159:9422:/mnt/sde1/lizardfs_vol/
        to delete: no
        damaged: yes
        scanning: no
        last error: no errors
        total space: 0B
        used space: 0B
        chunks: 0
172.23.22.159:9422:/mnt/sdd1/lizardfs_vol/
        to delete: no
        damaged: yes
        scanning: no
        last error: no errors
        total space: 0B
        used space: 0B
        chunks: 0
172.23.22.159:9422:/mnt/sdc1/lizardfs_vol/
        to delete: no
        damaged: yes
        scanning: no
        last error: no errors
        total space: 0B
        used space: 0B
        chunks: 0
```

Stopped the chunk server. sde remounted after xfs_repair (a sketch of those steps follows the listing below). The others were not visible, so I rebooted. The following storage should add to the total again:

```
/dev/sdc1            4.6T  3.9T  725G  85% /mnt/sdc1
/dev/sdd1            4.6T  4.2T  428G  91% /mnt/sdd1
/dev/sdf1            4.6T  4.2T  358G  93% /mnt/sdf1
/dev/sde             3.7T  3.7T  4.0G 100% /mnt/sde
/dev/sdg1            3.7T  3.7T  3.9G 100% /mnt/sdg1
```
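For reference, the repair and remount of sde looked roughly like this (a sketch; device and mount point are taken from the listing above, and a real run should reuse the XFS options from /etc/fstab):

```
# Run only with the chunkserver stopped and the filesystem unmounted
xfs_repair /dev/sde
mount -t xfs /dev/sde /mnt/sde
```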

After adding this storage back and people removing material, it starts to look better:

```
mfs#octopus01:9421   171T   83T   89T  49% /lizardfs
```