# Lizard maintenance
On the octopus cluster the lizardfs head node runs on octopus01, with disks contributed mainly by the other nodes. SSDs are served by the lizardfs-chunkserver.service systemd service and HDDs by the lizardfs-chunkserver-hdd.service. The storage pool is available on all nodes at /lizardfs, with the default storage goal of "slow", which corresponds to two copies of the data, both on HDDs.
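As a quick sanity check (a sketch, using only the paths and service names mentioned above), you can confirm on a node that the pool is mounted and that both chunkserver units are running:
```
# Check the pool mount and the two chunkserver services named above
df -h /lizardfs
systemctl status lizardfs-chunkserver.service lizardfs-chunkserver-hdd.service
```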
## Interacting with lizardfs
It is possible to query the server for all the available goals:
```
$ lizardfs-admin list-goals octopus01 9421
Goal definitions:
Id Name Definition
1 1_copy 1_copy: $std _
2 2_copy 2_copy: $std {_ _}
...
19 slow slow: $std {HDD HDD}
20 fast fast: $std {SSD SSD}
21 2ssd 2ssd: $std {SSD SSD}
...
```
To change the replication level:
```
$ lizardfs setgoal slow /lizardfs/efraimf -r
/lizardfs/efraimf/:
inodes with goal changed: 2
inodes with goal not changed: 0
inodes with permission denied: 0
```
And to see the replication level:
```
$ lizardfs getgoal /lizardfs/efraimf/
/lizardfs/efraimf/: slow
$ lizardfs getgoal /lizardfs/efraimf/ -r
/lizardfs/efraimf/:
files with goal slow : 1
directories with goal slow : 1
```
## Checking the health of the pool
There are several commands which can be used to check on the health of the pool. They all follow the syntax `lizardfs-admin <command> <head-node> <port>`.
To find out the overall health of the data on the pool:
```
$ lizardfs-admin chunks-health octopus01 9421
Chunks availability state:
Goal Safe Unsafe Lost
slow 202726 26005 2073
fast 43397 1085 -
2ssd 7984 - -
Chunks replication state:
Goal 0 1 2 3 4 5 6 7 8 9 10+
slow 95 1870 228839 - - - - - - - -
fast 17253 2317 24912 - - - - - - - -
2ssd 7984 - - - - - - - - - -
Chunks deletion state:
Goal 0 1 2 3 4 5 6 7 8 9 10+
slow 68 15 2081 27598 201022 20 - - - - -
fast 12603 720 1880 5377 23902 - - - - - -
2ssd 7984 - - - - - - - - - -
```
This table essentially says that slow and fast are replicating data (chunks in column 0 of the replication state need no further copies, which is what you want). This looks good for fast:
```
Chunks replication state:
Goal 0 1 2 3 4 5 6 7 8 9 10+
slow - 137461 448977 - - - - - - - -
fast 6133152 - 5 - - - - - - - -
```
To query how the individual disks are filling up and whether there are any errors, list all disks:
```
lizardfs-admin list-disks octopus01 9421 | less
```
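To show only disks that are reporting problems, one approach (a sketch, relying on the "last error" field as it appears in the list-disks output shown further down) is:
```
# Print only the "last error" lines that are not "no errors"
lizardfs-admin list-disks octopus01 9421 | grep 'last error' | grep -v 'no errors'
```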
Other commands can be found with `man lizardfs-admin`.
## Info
```
lizardfs-admin info octopus01 9421
LizardFS v3.12.0
Memory usage: 2.5GiB
Total space: 250TiB Available space: 10TiB
Trash space: 510GiB
Trash files: 188
Reserved space: 21GiB Reserved files: 18
FS objects: 7369883
Directories: 378782
Files: 6858803
Chunks: 9100088
Chunk copies: 20017964
Regular copies (deprecated): 20017964
```
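To pull just the space figures out of the info output above (a sketch based on its format):
```
lizardfs-admin info octopus01 9421 | grep 'Available space'
```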
```
lizardfs-admin chunks-health octopus01 9421
Chunks availability state:
Goal Safe Unsafe Lost
slow 1323220 1 -
fast 6398524 - 5
Chunks replication state:
Goal 0 1 2 3 4 5 6 7 8 9 10+
slow - 218663 1104558 - - - - - - - -
fast 6398524 - 5 - - - - - - - -
Chunks deletion state:
Goal 0 1 2 3 4 5 6 7 8 9 10+
slow - 104855 554911 203583 76228 39425 19348 8659 3276 20077 292859
fast 6380439 18060 30 - - - - - - - -
```
## Deleted files
Lizardfs also keeps deleted files, by default for 30 days in `/mnt/lizardfs-meta/trash`. If you need to recover deleted files (or delete them permanently) then the metadata directory can be mounted with:
```
$ mfsmount /path/to/unused/mount -o mfsmeta
```
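To recover a file, the usual workflow is to move its trash entry into the `undel` subdirectory of the trash, or to `rm` it to delete it permanently. A sketch (the entry name below is hypothetical; entries are named after the inode and original path):
```
# Sketch, assuming the meta mount is at /mnt/lizardfs-meta as above;
# the entry name is hypothetical
cd /mnt/lizardfs-meta/trash
mv '00000123|lizardfs|efraimf|somefile' undel/   # restore
# rm '00000123|lizardfs|efraimf|somefile'        # delete permanently
```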
For more information see the lizardfs documentation online
=> https://lizardfs-docs.readthedocs.io/en/latest/adminguide/advanced_configuration.html#trash-directory lizardfs documentation for the trash directory
## Start lizardfs-mount (the LizardFS client mount daemon) after a system reboot
```
sudo bash
systemctl daemon-reload
systemctl restart lizardfs-mount
systemctl status lizardfs-mount
```
## Gotchas
Any goal using erasure_coding is extremely slow to write to, and defining such goals should be avoided. Although erasure coding does reduce the amount of space each file takes up in the pool, the write penalty when it is mistakenly applied to data or directories that will still be written to outweighs the savings.
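To check whether anything under the pool has accidentally been given an erasure-coding goal, one option (a sketch; it assumes erasure-coding goal names contain "ec", and walking the whole tree can take a while) is:
```
# Summarise goals under the pool and look for erasure-coding entries
lizardfs getgoal -r /lizardfs | grep -i ec
```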
"speeding up" replication or resilvering of the data can be done in /etc/lizardfs/mfsmaster.cfg. Uncomment the following lines to increase their effect 10-fold from their defaults:
```
# CHUNKS_SOFT_DEL_LIMIT = 100
# CHUNKS_HARD_DEL_LIMIT = 250
# CHUNKS_WRITE_REP_LIMIT = 20
# CHUNKS_READ_REP_LIMIT = 100
```
followed by either restarting the lizardfs-master.service or by running (probably as root on octopus01):
```
lizardfs-admin reload-config octopus01 9421
```
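One way to see whether the change is having an effect is to poll the health command from above (a sketch):
```
# Re-run chunks-health every minute and watch the replication columns shrink
watch -n 60 'lizardfs-admin chunks-health octopus01 9421'
```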
It has not yet been tested to see how much this affects reading and writing to the HDDs or SSDs while this change is in effect.
# Adding a node to the pool
We can add a mount point with mfsmount via a systemd unit:
```
[Unit]
Description=LizardFS mounts
After=syslog.target network.target
[Service]
Type=forking
TimeoutSec=600
ExecStart=/usr/local/guix-profiles/octo/bin/mfsmount -c /etc/lizardfs/mfsmount.cfg
ExecStop=/usr/bin/umount /lizardfs
[Install]
WantedBy=multi-user.target
```
Note that the mount runs as the root user.
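To install and enable the unit (a sketch, assuming it is saved as lizardfs-mount.service, the unit name used elsewhere in this document):
```
cp lizardfs-mount.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now lizardfs-mount
```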
It is also a good idea to run a chunkserver on the node, so it can effectively cache data locally. For this we create a lizardfs account:
```
addgroup --gid 600 lizardfs
adduser --uid 600 --gid 600 lizardfs
```
The resulting entry in the password file (/etc/passwd) looks like:
```
lizardfs:x:600:600:Lizard,,,:/var/lib/lizardfs:/usr/sbin/nologin
```
Now we can run
```
/usr/local/guix-profiles/octo/sbin/mfschunkserver -c /etc/lizardfs/mfschunkserver_hdd.cfg -d start
```
and set up systemd with something like
```
[Unit]
Description=LizardFS chunkserver daemon
Documentation=man:mfschunkserver
After=local-fs.target network.target lizardfs-master.service
Wants=local-fs.target network-online.target
[Service]
Type=notify
ExecStart=/usr/local/guix-profiles/octo/sbin/mfschunkserver -c /etc/lizardfs/mfschunkserver_hdd.cfg -d start
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-abort
OOMScoreAdjust=-999
IOAccounting=true
IOWeight=250
StartupIOWeight=100
KeyringMode=inherit
[Install]
WantedBy=multi-user.target
```
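The chunkserver configuration file referenced above, /etc/lizardfs/mfschunkserver_hdd.cfg, is not reproduced here. A minimal sketch of what it might contain (the option names are standard mfschunkserver.cfg settings; the values and the mfshdd_hdd.cfg path are assumptions for this cluster):
```
# Sketch of /etc/lizardfs/mfschunkserver_hdd.cfg (values are assumptions)
MASTER_HOST = octopus01
MASTER_PORT = 9420
LABEL = HDD
# file listing the disks served by this chunkserver, one path per line
HDD_CONF_FILENAME = /etc/lizardfs/mfshdd_hdd.cfg
```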
# To deplete and remove a drive in LizardFS
**1. Mark the chunkserver (or specific disk) for removal**
Edit the chunkserver's disk configuration file (typically `/etc/lizardfs/mfshdd.cfg`) and prefix the drive path with an asterisk:
```
*/mnt/disk_to_remove
```
**2. Restart the chunkserver process on the node**
```bash
systemctl stop lizardfs-chunkserver
systemctl start lizardfs-chunkserver
```
**3. Monitor the evacuation progress**
The master will begin migrating chunks off the marked drive. You can monitor progress with:
```bash
lizardfs-admin list-disks octopus01 9421
lizardfs-admin list-disks octopus01 9421 | grep -A 7 172.23.19.59
172.23.19.59:9422:/mnt/sdc/lizardfs_vol/
to delete: yes
damaged: no
scanning: no
last error: no errors
total space: 3.6TiB
used space: 3.4TiB
chunks: 277k
```
Look for the disk marked "to delete: yes". Its chunk count should decrease over time as the data is replicated elsewhere.
You can also check the CGI web interface if you have it running—it shows disk status and chunk counts.
**4. Remove the drive once empty**
Once all chunks have been evacuated (the disk shows 0 chunks or is marked as empty), you can safely:
1. Remove the line from `mfshdd.cfg` entirely
2. Reload or restart the chunkserver so it re-reads the disk list (see the sketch after this list)
3. Physically remove or repurpose the drive
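A sketch of those final steps on the node that held the drive (the reload relies on the ExecReload defined in the unit shown earlier; if your unit lacks it, restart the service as in step 2):
```
# After removing the */mnt/disk_to_remove line from mfshdd.cfg
systemctl reload lizardfs-chunkserver    # or: systemctl restart lizardfs-chunkserver
umount /mnt/disk_to_remove               # path from the example above
```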
**Important notes:**
- Ensure you have enough free space on other disks to absorb the migrating chunks
- The evacuation time depends on the amount of data and network/disk speed
- Don't forcibly remove a drive before evacuation completes, or you risk data loss if replication goals aren't met