| author | Pjotr Prins | 2025-12-23 09:07:23 +0100 |
|---|---|---|
| committer | Pjotr Prins | 2026-01-05 11:12:11 +0100 |
| commit | 9aef970b739a1983215987a5d4956e689706e94c | |
| tree | 5a9ea3e80fb083eb857214d56be9afb326d640c0 | /topics/octopus |
| parent | b1b2d68813935a747d08029e182671441a03a8a4 | |
octopus
Diffstat (limited to 'topics/octopus')
| -rw-r--r-- | topics/octopus/lizardfs/README.gmi | 86 |
| -rw-r--r-- | topics/octopus/octopussy-needs-love.gmi | 71 |
2 files changed, 156 insertions, 1 deletion
diff --git a/topics/octopus/lizardfs/README.gmi b/topics/octopus/lizardfs/README.gmi
index 7c91136..b52c03e 100644
--- a/topics/octopus/lizardfs/README.gmi
+++ b/topics/octopus/lizardfs/README.gmi
@@ -83,6 +83,41 @@ lizardfs-admin list-disks octopus01 9421 | less

Other commands can be found with `man lizardfs-admin`.

## Info

```
lizardfs-admin info octopus01 9421
LizardFS v3.12.0
Memory usage: 2.5GiB
Total space: 250TiB
Available space: 10TiB
Trash space: 510GiB
Trash files: 188
Reserved space: 21GiB
Reserved files: 18
FS objects: 7369883
Directories: 378782
Files: 6858803
Chunks: 9100088
Chunk copies: 20017964
Regular copies (deprecated): 20017964
```

```
lizardfs-admin chunks-health octopus01 9421
Chunks availability state:
        Goal       Safe  Unsafe  Lost
        slow    1323220       1     -
        fast    6398524       -     5

Chunks replication state:
        Goal          0       1        2       3      4      5      6     7     8      9     10+
        slow          -  218663  1104558       -      -      -      -     -     -      -       -
        fast    6398524       -        5       -      -      -      -     -     -      -       -

Chunks deletion state:
        Goal          0       1        2       3      4      5      6     7     8      9     10+
        slow          -  104855   554911  203583  76228  39425  19348  8659  3276  20077  292859
        fast    6380439   18060       30       -      -      -      -     -     -      -       -
```

## Deleted files

@@ -188,3 +223,54 @@ KeyringMode=inherit

[Install]
WantedBy=multi-user.target

# To deplete and remove a drive in LizardFS

**1. Mark the chunkserver (or specific disk) for removal**

Edit the chunkserver's disk configuration file (typically `/etc/lizardfs/mfshdd.cfg`) and prefix the drive path with an asterisk:

```
*/mnt/disk_to_remove
```

**2. Restart the chunkserver process on the node**

```bash
systemctl stop lizardfs-chunkserver
systemctl start lizardfs-chunkserver
```

**3. Monitor the evacuation progress**

The master will begin migrating chunks off the marked drive. You can monitor progress with:

```bash
lizardfs-admin list-disks octopus01 9421
lizardfs-admin list-disks octopus01 9421 | grep 172.23.19.59 -A 7
172.23.19.59:9422:/mnt/sdc/lizardfs_vol/
        to delete: yes
        damaged: no
        scanning: no
        last error: no errors
        total space: 3.6TiB
        used space: 3.4TiB
        chunks: 277k
```

Look for the disk marked "to delete: yes"; its chunk count should decrease over time as data is replicated elsewhere.

You can also check the CGI web interface if you have it running; it shows disk status and chunk counts.
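Rather than re-running that command by hand, a small loop can log the drain over time. A minimal sketch, assuming the master at octopus01:9421 and the example disk above; substitute the IP and path of the disk you are actually depleting:

```bash
# Log the chunk count of the evacuating disk every 10 minutes.
# Host, port, IP and disk path are the example values from above.
while true; do
    date
    lizardfs-admin list-disks octopus01 9421 \
      | grep -A 7 '172.23.19.59:9422:/mnt/sdc/lizardfs_vol/' \
      | grep 'chunks:'
    sleep 600
done
```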
**4. Remove the drive once empty**

Once all chunks have been evacuated (the disk shows 0 chunks or is marked as empty), you can safely:

1. Remove the line from `mfshdd.cfg` entirely
2. Reload the chunkserver configuration (or restart the process as above)
3. Physically remove or repurpose the drive

**Important notes:**
- Ensure you have enough free space on other disks to absorb the migrating chunks
- The evacuation time depends on the amount of data and network/disk speed
- Don't forcibly remove a drive before evacuation completes, or you risk data loss if replication goals aren't met

diff --git a/topics/octopus/octopussy-needs-love.gmi b/topics/octopus/octopussy-needs-love.gmi
index fc8e285..f3e64de 100644
--- a/topics/octopus/octopussy-needs-love.gmi
+++ b/topics/octopus/octopussy-needs-love.gmi
@@ -79,7 +79,8 @@ Sep 04 07:44:56 octopus02 mfschunkserver[22766]: can't create lock file /mnt/sdd

UUID=277c05de-64f5-48a8-8614-8027a53be212 /mnt/sdd1 xfs rw,exec,nodev,noatime,nodiratime,largeio,inode64 0 1

-we'll need to reboot the server to see what storage still may work. The slurm connection appears to be misconfigured:
+Lizard also complains 4 SSDs have been wiped out.
+We'll need to reboot the server to see what storage still may work.
+The slurm connection appears to be misconfigured:

```
[2025-12-20T09:36:27.846] error: service_connection: slurm_receive_msg: Insane message length
```

@@ -94,3 +95,71 @@ Let's take a look at o3. This one has less RAM. Flavia is running some tools, bu

=> ../hpc/octopus/slurm-user-guide

Alright, I depleted and removed slurm from o3. I think it would be wise to also deplete the lizard drives on that machine.

The big users on lizard are:

```
1.6T  dashbrook
1.8T  pangenomes
2.1T  erikg
3.4T  aruni
3.4T  junh
8.4T  hchen
9.2T  salehi
 13T  guarracino
 16T  flaviav
```

It seems we can clean some of that up! We have some backup storage that we can use. Alternatively, move data to ISAAC.

We'll slowly start depleting the lizard. See also:

=> lizardfs/README

o3 has 4 lizard drives. We'll start by depleting one.

# O2

```
172.23.22.159:9422:/mnt/sde1/lizardfs_vol/
        to delete: no
        damaged: yes
        scanning: no
        last error: no errors
        total space: 0B
        used space: 0B
        chunks: 0
172.23.22.159:9422:/mnt/sdd1/lizardfs_vol/
        to delete: no
        damaged: yes
        scanning: no
        last error: no errors
        total space: 0B
        used space: 0B
        chunks: 0
172.23.22.159:9422:/mnt/sdc1/lizardfs_vol/
        to delete: no
        damaged: yes
        scanning: no
        last error: no errors
        total space: 0B
        used space: 0B
        chunks: 0
```

Stopped the chunkserver. sde was remounted after an xfs_repair; the others were not visible, so I rebooted. The following storage should add to the total again:

```
/dev/sdc1  4.6T  3.9T  725G   85%  /mnt/sdc1
/dev/sdd1  4.6T  4.2T  428G   91%  /mnt/sdd1
/dev/sdf1  4.6T  4.2T  358G   93%  /mnt/sdf1
/dev/sde   3.7T  3.7T  4.0G  100%  /mnt/sde
/dev/sdg1  3.7T  3.7T  3.9G  100%  /mnt/sdg1
```

After adding this storage back, and with people removing material, it starts to look better:

```
mfs#octopus01:9421  171T  83T  89T  49%  /lizardfs
```
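For the record, the per-disk recovery on o2 looked roughly like the following. A sketch, assuming fstab entries exist for these mount points and using sde as the example device; xfs_repair must only run on unmounted filesystems:

```bash
systemctl stop lizardfs-chunkserver   # take the chunkserver offline first
umount /mnt/sde                       # ensure the filesystem is not mounted
xfs_repair /dev/sde                   # repair the damaged XFS filesystem
mount /mnt/sde                        # remount via the existing fstab entry
systemctl start lizardfs-chunkserver  # disks are rescanned on startup
```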
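A usage list like the one near the top of this note can be regenerated with du; a sketch, assuming the user directories sit directly under /lizardfs (adjust the path to the real layout):

```bash
cd /lizardfs
du -sh * | sort -h    # per-directory usage, largest last
```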
