| author | Pjotr Prins | 2025-12-31 12:09:53 +0100 |
|---|---|---|
| committer | Pjotr Prins | 2026-01-05 11:12:11 +0100 |
| commit | 34f1d7c24d2122bbfbccdf7d0a42b9de0594afed | |
| tree | 8553942703a780a347c5be8ab39e78ee559c377e | |
| parent | d23c612809b1d516d8af1d9c5810be969bfa6e91 | |
| download | gn-gemtext-34f1d7c24d2122bbfbccdf7d0a42b9de0594afed.tar.gz | |
Octopus
| -rw-r--r-- | topics/octopus/lizardfs/lizard-maintenance.gmi | 12 |
| -rw-r--r-- | topics/octopus/octopussy-needs-love.gmi | 43 |
2 files changed, 50 insertions, 5 deletions
diff --git a/topics/octopus/lizardfs/lizard-maintenance.gmi b/topics/octopus/lizardfs/lizard-maintenance.gmi
index 69bd125..ef04b69 100644
--- a/topics/octopus/lizardfs/lizard-maintenance.gmi
+++ b/topics/octopus/lizardfs/lizard-maintenance.gmi
@@ -73,6 +73,15 @@ Chunks deletion state:
  2ssd     7984      -        -        -        -        -        -        -        -        -        -
 ```
+This table essentially says that slow and fast are replicating data (chunks in column 0 are OK!). This looks good for fast:
+
+```
+Chunks replication state:
+    Goal    0        1       2       3   4   5   6   7   8   9   10+
+    slow    -        137461  448977  -   -   -   -   -   -   -   -
+    fast    6133152  -       5       -   -   -   -   -   -   -   -
+```
+
 To query how the individual disks are filling up and if there are any errors:
 
 List all disks
@@ -88,7 +97,8 @@ Other commands can be found with `man lizardfs-admin`.
 ```
 lizardfs-admin info octopus01 9421
 LizardFS v3.12.0
-Memory usage: 2.5GiB
+Memory usage: 2.5GiB23
+
 Total space: 250TiB
 Available space: 10TiB
 Trash space: 510GiB
 Trash files: 188
diff --git a/topics/octopus/octopussy-needs-love.gmi b/topics/octopus/octopussy-needs-love.gmi
index 035f402..3261f7c 100644
--- a/topics/octopus/octopussy-needs-love.gmi
+++ b/topics/octopus/octopussy-needs-love.gmi
@@ -16,6 +16,14 @@ Our Slurm PBS we are up-to-date because we run that completely on Guix and Arun
 Another thing we ought to fix is introduce centralized user management. So far we have had few users and just got by. But sometimes it bites us that users have different UIDs on the nodes.
 
+## Architecture overview
+
+* O1 is the old head node hosting lizardfs - will move to a compute node
+* O2 is the old backup hosting the lizardfs shadow - will also move to compute
+* O3 is the new head node hosting moosefs
+* O4 is the backup head node hosting the moosefs shadow - will act as a compute node too
+
+All the other nodes are for compute. O1 and O4 will be the last nodes to remain on older Debian. They will handle the last bits of lizard.
 
 # Tasks
 
@@ -121,7 +129,8 @@ We'll slowly start depleting the lizard. See also
 
 => lizardfs/README
 
-o3 has 4 lizard drives. We'll start by depleting one.
+O3 has 4 lizard drives. We'll start by depleting one.
+
 
 # O2
 
@@ -188,6 +197,8 @@ The BIOS on T6 is newer than on T4+T5. That probably explains why the higher T n
 T6 has 4 SSDs, 2x 3.5T. Both unused. The lizard chunk server is failing, so might as well disable it.
 
+I am using T6 to test network boots because it is not serving lizard.
+
 # T7
 
 On T7 root was full(!?). Culprit was Andrea with /tmp/sweepga_genomes_111850/.
@@ -210,7 +221,31 @@ Next install Linux. I have two routes, one is using debootstrap, the other is vi
 
 So far, I managed to boot into ipxe on Octopus. The linux kernel loads over http, but it does not show output. Likely I need to:
 
-* [ ] Build ipxe with serial support
-* [ ] Test the installer with serial support
+* [X] Build ipxe with serial support
+* [X] Test the installer with serial support
+* [X] Add NFS support
+* [X] debootstrap install of new Debian on /export/nfs/nodes/debian14
+* [X] Make available through NFS and boot through IPXE
+
+I managed to boot T6 over the network.
+Essentially we have the latest stable Debian running on T6, completely over NFS!
+In the next steps I need to figure out:
+
+* [ ] Mount NFS with root access
+* [ ] Every PXE node needs its own hard disk configuration
+* [ ] Mount NFS from octopus01
+* [ ] Start slurm
+
+We can have this as a test node pretty soon.
+But first we have to start moosefs and migrate data.
+
+I am doing some small tests and will put (old) T6 back on slurm again.
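+
+A rough sketch of the NFS-root plumbing behind the checklist above (the Debian suite, the export options, and the use of octopus01 as both HTTP and NFS server are assumptions for illustration, not the exact setup):
+
+```
+# on the NFS server: debootstrap the new Debian tree into the export
+debootstrap --arch=amd64 stable /export/nfs/nodes/debian14 http://deb.debian.org/debian
+# export it with root access so the PXE node can write as root
+echo '/export/nfs/nodes/debian14 *(rw,no_root_squash,no_subtree_check)' >> /etc/exports
+exportfs -ra
+```
+
+and something along these lines in the ipxe script (console=ttyS0 for the serial output mentioned above; kernel and initrd paths assumed):
+
+```
+kernel http://octopus01/vmlinuz root=/dev/nfs nfsroot=octopus01:/export/nfs/nodes/debian14 ip=dhcp console=ttyS0,115200 rw
+initrd http://octopus01/initrd.img
+boot
+```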
+
+# O4
+
+O4 is going to be the backup head node. It will act as a compute node too, until we need it as the head node. O4 is currently not on the slurm queue.
 
-This is best done using linux VMs locally.
+* [X] Update guix on O1
+* [ ] Install guix moosefs
+* [ ] Start moosefs master on O3
+* [ ] Start moosefs shadow on O4
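
For reference, the replication table and the `lizardfs-admin info` output in the lizard-maintenance.gmi hunk above come from the admin tool talking to the master. A minimal sketch of the commands involved, assuming the stock LizardFS 3.12 subcommands (double-check against `man lizardfs-admin`):

```
# availability / replication / deletion tables per goal
lizardfs-admin chunks-health octopus01 9421
# per-disk usage and error counters on every chunkserver ("list all disks")
lizardfs-admin list-disks octopus01 9421
# master summary: memory usage, total/available space, trash
lizardfs-admin info octopus01 9421
```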
