author | Pjotr Prins | 2023-02-13 15:29:47 -0600 |
---|---|---|
committer | Pjotr Prins | 2023-02-13 15:29:53 -0600 |
commit | 78ea484a951894924fde1f37359c12b4f4179416 (patch) | |
tree | cf5800afaaf530cabb5f8c0cbcc0c4a406b9cd8b | |
parent | 3a23d9cf6b922a6bcb49ff1c904623a28ee817f8 (diff) | |
Work on octopus
-rw-r--r-- | tasks/andreag.gmi | 22 |
-rw-r--r-- | topics/systems/hpc/octopus-maintenance.gmi | 36 |
2 files changed, 58 insertions, 0 deletions
diff --git a/tasks/andreag.gmi b/tasks/andreag.gmi
new file mode 100644
index 0000000..6132b56
--- /dev/null
+++ b/tasks/andreag.gmi
@@ -0,0 +1,22 @@
+# Andrea tasks
+
+## Tags
+
+* kanban: andreag
+* assigned: andreag
+* status: in progress
+
+## Notes
+
+=> https://issues.genenetwork.org
+
+## Tasks
+
+### Meta-tasks
+
+
+* [ ] Pjotr should give root access on all nodes
+* [ ] Move /gnu to new partition on Oct01 and update nfs /etc/exports
+* [ ] /dev/sdc1 is giving errors on Oct03 (XFS)
+  - Disk /dev/sdc: 3.7 TiB, Disk model: Samsung SSD 870
+* [ ] visit all lizardfs drives, remove USB (see /etc/lizardfs; fdisk -l)
diff --git a/topics/systems/hpc/octopus-maintenance.gmi b/topics/systems/hpc/octopus-maintenance.gmi
new file mode 100644
index 0000000..6f44433
--- /dev/null
+++ b/topics/systems/hpc/octopus-maintenance.gmi
@@ -0,0 +1,36 @@
+# Octopus Maintenance
+
+## Slurm
+
+Status of slurm
+
+```
+sinfo
+sinfo -R
+squeue
+```
+
+we have draining nodes, but no jobs running on them
+
+Reviving draining node (as root)
+
+```
+scontrol
+  update NodeName=octopus05 State=DOWN Reason="undraining"
+  update NodeName=octopus05 State=RESUME
+  show node octopus05
+```
+
+Kill time can lead to drain state
+
+```
+scontrol show config | grep kill
+UnkillableStepProgram = (null)
+UnkillableStepTimeout = 60 sec
+```
+
+check valid configuration with `slurmd -C` and update nodes with
+
+```
+scontrol reconfigure
+```
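
The interactive `scontrol` session in octopus-maintenance.gmi can also be written as one-shot commands. The following is a minimal sketch, not part of the commit: the function name `undrain_cmds` is hypothetical, and it only prints the three commands for a given node so they can be reviewed before anything is run against Slurm.

```shell
#!/bin/sh
# Sketch (assumption, not from the commit): print the non-interactive
# scontrol commands that mirror the "Reviving draining node" steps.
# Dry run only; nothing here talks to Slurm.
undrain_cmds() {
    node="$1"   # node name, e.g. octopus05
    printf 'scontrol update NodeName=%s State=DOWN Reason="undraining"\n' "$node"
    printf 'scontrol update NodeName=%s State=RESUME\n' "$node"
    printf 'scontrol show node %s\n' "$node"
}

undrain_cmds octopus05
```

After checking with `squeue -w octopus05` that no jobs are running on the node, the printed commands could be run as root, for example by piping the output to `sh`.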