From f854d917fecdbd9d85c50dc92bc2331babe2646d Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Tue, 23 Dec 2025 09:07:51 +0100 Subject: rename --- topics/octopus/lizardfs/README.gmi | 276 ------------------------- topics/octopus/lizardfs/lizard-maintenance.gmi | 276 +++++++++++++++++++++++++ 2 files changed, 276 insertions(+), 276 deletions(-) delete mode 100644 topics/octopus/lizardfs/README.gmi create mode 100644 topics/octopus/lizardfs/lizard-maintenance.gmi diff --git a/topics/octopus/lizardfs/README.gmi b/topics/octopus/lizardfs/README.gmi deleted file mode 100644 index b52c03e..0000000 --- a/topics/octopus/lizardfs/README.gmi +++ /dev/null @@ -1,276 +0,0 @@ -# Information about lizardfs, and some usage suggestions - -On the octopus cluster the lizardfs head node is on octopus01, with disks being added mainly from the other nodes. SSDs are added to the lizardfs-chunkserver.service systemd service and SDDs added to the lizardfs-chunkserver-hdd.service. The storage pool is available on all nodes at /lizardfs, with the default storage option of "slow", which corresponds to two copies of the data, both on SDDs. - -## Interacting with lizardfs - -It is possible to query the server for all the available goals: - -``` -$ lizardfs-admin list-goals octopus01 9421 - -Goal definitions: -Id Name Definition -1 1_copy 1_copy: $std _ -2 2_copy 2_copy: $std {_ _} -... -19 slow slow: $std {HDD HDD} -20 fast fast: $std {SSD SSD} -21 2ssd 2ssd: $std {SSD SSD} -... -``` - -To change the replication level: - -``` -$ lizardfs setgoal slow /lizardfs/efraimf -r - -/lizardfs/efraimf/: - inodes with goal changed: 2 - inodes with goal not changed: 0 - inodes with permission denied: 0 -``` - -And to see the replication level: - -``` -$ lizardfs getgoal /lizardfs/efraimf/ - -/lizardfs/efraimf/: slow - -$ lizardfs getgoal /lizardfs/efraimf/ -r - -/lizardfs/efraimf/: - files with goal slow : 1 - directories with goal slow : 1 -``` - -## Checking the health of the pool - -There are a couple of commands which can be used to check on the health of the pool. They all take the syntax of `lizardfs-admin `. - -To find out the overall health of the data on the pool: - -``` -$ lizardfs-admin chunks-health octopus01 9421 - -Chunks availability state: - Goal Safe Unsafe Lost - slow 202726 26005 2073 - fast 43397 1085 - - 2ssd 7984 - - - -Chunks replication state: - Goal 0 1 2 3 4 5 6 7 8 9 10+ - slow 95 1870 228839 - - - - - - - - - fast 17253 2317 24912 - - - - - - - - - 2ssd 7984 - - - - - - - - - - - -Chunks deletion state: - Goal 0 1 2 3 4 5 6 7 8 9 10+ - slow 68 15 2081 27598 201022 20 - - - - - - fast 12603 720 1880 5377 23902 - - - - - - - 2ssd 7984 - - - - - - - - - - -``` - -To query how the individual disks are filling up and if there are any errors: - -List all disks - -``` -lizardfs-admin list-disks octopus01 9421 | less -``` - -Other commands can be found with `man lizardfs-admin`. 
- -## Info - -``` -lizardfs-admin info octopus01 9421 -LizardFS v3.12.0 -Memory usage: 2.5GiB -Total space: 250TiB Available space: 10TiB -Trash space: 510GiB -Trash files: 188 -Reserved space: 21GiB Reserved files: 18 -FS objects: 7369883 -Directories: 378782 -Files: 6858803 -Chunks: 9100088 -Chunk copies: 20017964 -Regular copies (deprecated): 20017964 -``` - -``` -lizardfs-admin chunks-health octopus01 9421 -Chunks availability state: - Goal Safe Unsafe Lost - slow 1323220 1 - - fast 6398524 - 5 - -Chunks replication state: - Goal 0 1 2 3 4 5 6 7 8 9 10+ - slow - 218663 1104558 - - - - - - - - - fast 6398524 - 5 - - - - - - - - - -Chunks deletion state: - Goal 0 1 2 3 4 5 6 7 8 9 10+ - slow - 104855 554911 203583 76228 39425 19348 8659 3276 20077 292859 - fast 6380439 18060 30 - - - - - - - - -``` - -## Deleted files - -Lizardfs also keeps deleted files, by default for 30 days in `/mnt/lizardfs-meta/trash`. If you need to recover deleted files (or delete them permanently) then the metadata directory can be mounted with: - -``` -$ mfsmount /path/to/unused/mount -o mfsmeta -``` - -For more information see the lizardfs documentation online -=> https://lizardfs-docs.readthedocs.io/en/latest/adminguide/advanced_configuration.html#trash-directory lizardfs documentation for the trash directory - -## Start lizardfs-mount (lizardfs reader daemon) after a system reboot - -``` -sudo bash -systemctl daemon-reload -systemctl restart lizardfs-mount -systemctl status lizardfs-mount -``` - -## Gotchas - -It should be noted that any goal using erasure_coding is incredibly slow to write to, and defining goals like this should be avoided. Although it does decrease the amount of space each file takes up in the pool, the trade-off when it is mistakenly used for data or folders which will be written to outweighs the benefits. - -"speeding up" replication or resilvering of the data can be done in /etc/lizardfs/mfsmaster.cfg. Uncomment the following lines to increase their effect 10-fold from their defaults: - -``` -# CHUNKS_SOFT_DEL_LIMIT = 100 -# CHUNKS_HARD_DEL_LIMIT = 250 -# CHUNKS_WRITE_REP_LIMIT = 20 -# CHUNKS_READ_REP_LIMIT = 100 -``` - -followed by either restarting the lizardfs-master.service or by running (probably as root on octopus01): - -``` -lizardfs-admin reload-config octopus01 9421 -``` - -It has not yet been tested to see how much this affects reading and writing to the HDDs or SSDs while this change is in effect. - -# Adding a node to the pool - -We can add a mount point using mfsmount using systemd - -``` -[Unit] -Description=LizardFS mounts -After=syslog.target network.target - -[Service] -Type=forking -TimeoutSec=600 -ExecStart=/usr/local/guix-profiles/octo/bin/mfsmount -c /etc/lizardfs/mfsmount.cfg -ExecStop=/usr/bin/umount /lizardfs - -[Install] -WantedBy=multi-user.target -``` - -note it runs as the root user. - -It is a good idea to also run a chunk server on the node, so it effectively can cache information locally. 
For this we create a lizard account:
-
-```
-addgroup -gid 600 lizardfs
-adduser -uid 600 -gid 600 lizardfs
-```
-
-In password file
-
-```
-lizardfs:x:600:600:Lizard,,,:/var/lib/lizardfs:/bin/sbin/nologin
-```
-
-Now we can run
-
-```
-/usr/local/guix-profiles/octo/sbin/mfschunkserver -c /etc/lizardfs/mfschunkserver_hdd.cfg -d start
-```
-
-and set up systemd with something like
-
-```
-[Unit]
-Description=LizardFS chunkserver daemon
-Documentation=man:mfschunkserver
-After=local-fs.target network.target lizardfs-master.service
-Wants=local-fs.target network-online.target
-
-[Service]
-Type=notify
-ExecStart=/usr/local/guix-profiles/octo/sbin/mfschunkserver -c /etc/lizardfs/mfschunkserver_hdd.cfg -d start
-ExecReload=/bin/kill -HUP $MAINPID
-Restart=on-abort
-OOMScoreAdjust=-999
-IOAccounting=true
-IOWeight=250
-StartupIOWeight=100
-KeyringMode=inherit
-
-[Install]
-WantedBy=multi-user.target
-```
-
-# To deplete and remove a drive in LizardFS
-
-**1. Mark the chunkserver (or specific disk) for removal**
-
-Edit the chunkserver's disk configuration file (typically `/etc/lizardfs/mfshdd.cfg`) and prefix the drive path with an asterisk:
-
-```
-*/mnt/disk_to_remove
-```
-
-Restart the chunkserver process on the node
-
-```bash
-systemctl stop lizardfs-chunkserver
-systemctl start lizardfs-chunkserver
-```
-
-**3. Monitor the evacuation progress**
-
-The master will begin migrating chunks off the marked drive. You can monitor progress with:
-
-```bash
-lizardfs-admin list-disks octopus01 9421
-lizardfs-admin list-disks octopus01 9421|grep 172.23.19.59 -A 7
-172.23.19.59:9422:/mnt/sdc/lizardfs_vol/
- to delete: yes
- damaged: no
- scanning: no
- last error: no errors
- total space: 3.6TiB
- used space: 3.4TiB
- chunks: 277k
-```
-
-Look for the disk showing evacuation status. The "to delete" chunks count should decrease over time as data is replicated elsewhere.
-
-You can also check the CGI web interface if you have it running—it shows disk status and chunk counts.
-
-**4. Remove the drive once empty**
-
-Once all chunks have been evacuated (the disk shows 0 chunks or is marked as empty), you can safely:
-
-1. Remove the line from `mfshdd.cfg` entirely
-2. Reload the configuration again
-3. Physically remove or repurpose the drive
-
-**Important notes:**
-- Ensure you have enough free space on other disks to absorb the migrating chunks
-- The evacuation time depends on the amount of data and network/disk speed
-- Don't forcibly remove a drive before evacuation completes, or you risk data loss if replication goals aren't met
diff --git a/topics/octopus/lizardfs/lizard-maintenance.gmi b/topics/octopus/lizardfs/lizard-maintenance.gmi
new file mode 100644
index 0000000..b52c03e
--- /dev/null
+++ b/topics/octopus/lizardfs/lizard-maintenance.gmi
@@ -0,0 +1,276 @@
+# Information about lizardfs, and some usage suggestions
+
+On the octopus cluster the lizardfs head node is on octopus01, with disks being added mainly from the other nodes. SSDs are added to the lizardfs-chunkserver.service systemd service and HDDs are added to the lizardfs-chunkserver-hdd.service. The storage pool is available on all nodes at /lizardfs, with the default storage option of "slow", which corresponds to two copies of the data, both on HDDs.
+
+## Interacting with lizardfs
+
+It is possible to query the server for all the available goals:
+
+```
+$ lizardfs-admin list-goals octopus01 9421
+
+Goal definitions:
+Id Name Definition
+1 1_copy 1_copy: $std _
+2 2_copy 2_copy: $std {_ _}
+...
+19 slow slow: $std {HDD HDD} +20 fast fast: $std {SSD SSD} +21 2ssd 2ssd: $std {SSD SSD} +... +``` + +To change the replication level: + +``` +$ lizardfs setgoal slow /lizardfs/efraimf -r + +/lizardfs/efraimf/: + inodes with goal changed: 2 + inodes with goal not changed: 0 + inodes with permission denied: 0 +``` + +And to see the replication level: + +``` +$ lizardfs getgoal /lizardfs/efraimf/ + +/lizardfs/efraimf/: slow + +$ lizardfs getgoal /lizardfs/efraimf/ -r + +/lizardfs/efraimf/: + files with goal slow : 1 + directories with goal slow : 1 +``` + +## Checking the health of the pool + +There are a couple of commands which can be used to check on the health of the pool. They all take the syntax of `lizardfs-admin `. + +To find out the overall health of the data on the pool: + +``` +$ lizardfs-admin chunks-health octopus01 9421 + +Chunks availability state: + Goal Safe Unsafe Lost + slow 202726 26005 2073 + fast 43397 1085 - + 2ssd 7984 - - + +Chunks replication state: + Goal 0 1 2 3 4 5 6 7 8 9 10+ + slow 95 1870 228839 - - - - - - - - + fast 17253 2317 24912 - - - - - - - - + 2ssd 7984 - - - - - - - - - - + +Chunks deletion state: + Goal 0 1 2 3 4 5 6 7 8 9 10+ + slow 68 15 2081 27598 201022 20 - - - - - + fast 12603 720 1880 5377 23902 - - - - - - + 2ssd 7984 - - - - - - - - - - +``` + +To query how the individual disks are filling up and if there are any errors: + +List all disks + +``` +lizardfs-admin list-disks octopus01 9421 | less +``` + +Other commands can be found with `man lizardfs-admin`. + +## Info + +``` +lizardfs-admin info octopus01 9421 +LizardFS v3.12.0 +Memory usage: 2.5GiB +Total space: 250TiB Available space: 10TiB +Trash space: 510GiB +Trash files: 188 +Reserved space: 21GiB Reserved files: 18 +FS objects: 7369883 +Directories: 378782 +Files: 6858803 +Chunks: 9100088 +Chunk copies: 20017964 +Regular copies (deprecated): 20017964 +``` + +``` +lizardfs-admin chunks-health octopus01 9421 +Chunks availability state: + Goal Safe Unsafe Lost + slow 1323220 1 - + fast 6398524 - 5 + +Chunks replication state: + Goal 0 1 2 3 4 5 6 7 8 9 10+ + slow - 218663 1104558 - - - - - - - - + fast 6398524 - 5 - - - - - - - - + +Chunks deletion state: + Goal 0 1 2 3 4 5 6 7 8 9 10+ + slow - 104855 554911 203583 76228 39425 19348 8659 3276 20077 292859 + fast 6380439 18060 30 - - - - - - - - +``` + +## Deleted files + +Lizardfs also keeps deleted files, by default for 30 days in `/mnt/lizardfs-meta/trash`. If you need to recover deleted files (or delete them permanently) then the metadata directory can be mounted with: + +``` +$ mfsmount /path/to/unused/mount -o mfsmeta +``` + +For more information see the lizardfs documentation online +=> https://lizardfs-docs.readthedocs.io/en/latest/adminguide/advanced_configuration.html#trash-directory lizardfs documentation for the trash directory + +## Start lizardfs-mount (lizardfs reader daemon) after a system reboot + +``` +sudo bash +systemctl daemon-reload +systemctl restart lizardfs-mount +systemctl status lizardfs-mount +``` + +## Gotchas + +It should be noted that any goal using erasure_coding is incredibly slow to write to, and defining goals like this should be avoided. Although it does decrease the amount of space each file takes up in the pool, the trade-off when it is mistakenly used for data or folders which will be written to outweighs the benefits. + +"speeding up" replication or resilvering of the data can be done in /etc/lizardfs/mfsmaster.cfg. 
Uncomment the following lines to increase their effect 10-fold from their defaults:
+
+```
+# CHUNKS_SOFT_DEL_LIMIT = 100
+# CHUNKS_HARD_DEL_LIMIT = 250
+# CHUNKS_WRITE_REP_LIMIT = 20
+# CHUNKS_READ_REP_LIMIT = 100
+```
+
+followed by either restarting the lizardfs-master.service or running (probably as root on octopus01):
+
+```
+lizardfs-admin reload-config octopus01 9421
+```
+
+It has not yet been tested how much this change affects reading from and writing to the HDDs or SSDs while it is in effect.
+
+# Adding a node to the pool
+
+We can add a mount point with mfsmount using a systemd unit:
+
+```
+[Unit]
+Description=LizardFS mounts
+After=syslog.target network.target
+
+[Service]
+Type=forking
+TimeoutSec=600
+ExecStart=/usr/local/guix-profiles/octo/bin/mfsmount -c /etc/lizardfs/mfsmount.cfg
+ExecStop=/usr/bin/umount /lizardfs
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Note that it runs as the root user.
+
+It is a good idea to also run a chunk server on the node, so it can effectively cache information locally. For this we create a lizardfs account:
+
+```
+addgroup -gid 600 lizardfs
+adduser -uid 600 -gid 600 lizardfs
+```
+
+In the password file the entry looks like:
+
+```
+lizardfs:x:600:600:Lizard,,,:/var/lib/lizardfs:/usr/sbin/nologin
+```
+
+Now we can run
+
+```
+/usr/local/guix-profiles/octo/sbin/mfschunkserver -c /etc/lizardfs/mfschunkserver_hdd.cfg -d start
+```
+
+and set up systemd with something like
+
+```
+[Unit]
+Description=LizardFS chunkserver daemon
+Documentation=man:mfschunkserver
+After=local-fs.target network.target lizardfs-master.service
+Wants=local-fs.target network-online.target
+
+[Service]
+Type=notify
+ExecStart=/usr/local/guix-profiles/octo/sbin/mfschunkserver -c /etc/lizardfs/mfschunkserver_hdd.cfg -d start
+ExecReload=/bin/kill -HUP $MAINPID
+Restart=on-abort
+OOMScoreAdjust=-999
+IOAccounting=true
+IOWeight=250
+StartupIOWeight=100
+KeyringMode=inherit
+
+[Install]
+WantedBy=multi-user.target
+```
+
+# To deplete and remove a drive in LizardFS
+
+**1. Mark the chunkserver (or specific disk) for removal**
+
+Edit the chunkserver's disk configuration file (typically `/etc/lizardfs/mfshdd.cfg`) and prefix the drive path with an asterisk:
+
+```
+*/mnt/disk_to_remove
+```
+
+**2. Restart the chunkserver process on the node**
+
+```bash
+systemctl stop lizardfs-chunkserver
+systemctl start lizardfs-chunkserver
+```
+
+**3. Monitor the evacuation progress**
+
+The master will begin migrating chunks off the marked drive. You can monitor progress with:
+
+```bash
+lizardfs-admin list-disks octopus01 9421
+lizardfs-admin list-disks octopus01 9421|grep 172.23.19.59 -A 7
+172.23.19.59:9422:/mnt/sdc/lizardfs_vol/
+ to delete: yes
+ damaged: no
+ scanning: no
+ last error: no errors
+ total space: 3.6TiB
+ used space: 3.4TiB
+ chunks: 277k
+```
+
+Look for the disk marked "to delete: yes"; its chunk count should decrease over time as data is replicated elsewhere.
+
+You can also check the CGI web interface, if you have it running; it shows disk status and chunk counts.
+
+**4. Remove the drive once empty**
+
+Once all chunks have been evacuated (the disk shows 0 chunks or is marked as empty), you can safely:
+
+1. Remove the line from `mfshdd.cfg` entirely
+2. Reload the configuration again
+3. Physically remove or repurpose the drive
+
+**Important notes:**
+- Ensure you have enough free space on other disks to absorb the migrating chunks
+- The evacuation time depends on the amount of data and network/disk speed
+- Don't forcibly remove a drive before evacuation completes, or you risk data loss if replication goals aren't met
--
cgit 1.4.1
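
For step 3 of the drive removal it can be handy to poll the master in a loop rather than re-running list-disks by hand. Below is a minimal watch-loop sketch, assuming the octopus01 9421 master used throughout this page and GNU grep on the node; the script name, the chunkserver IP argument and the poll interval are placeholders, not something that already exists on the cluster.

```bash
#!/usr/bin/env bash
# watch-evacuation.sh (hypothetical helper): print the evacuation status of
# all disks on one chunkserver every INTERVAL seconds, using only the
# lizardfs-admin list-disks query shown in step 3.
set -u

MASTER_HOST=octopus01   # master host/port as used elsewhere on this page
MASTER_PORT=9421
CHUNKSERVER_IP="${1:?usage: $0 <chunkserver-ip> [interval-seconds]}"
INTERVAL="${2:-300}"    # arbitrary default; pick whatever poll rate you like

while true; do
    date
    # Keep the 8-line block for this chunkserver and print only the fields
    # that matter for the evacuation: "to delete", "used space" and "chunks".
    lizardfs-admin list-disks "$MASTER_HOST" "$MASTER_PORT" \
        | grep -A 7 "$CHUNKSERVER_IP" \
        | grep -E 'to delete|used space|chunks' \
        || echo "no disks listed for $CHUNKSERVER_IP"
    sleep "$INTERVAL"
done
```

Called as e.g. `bash watch-evacuation.sh 172.23.19.59 600`, it prints a timestamped snapshot every ten minutes; once the drained disk reports 0 chunks it is safe to continue with step 4.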