author: Pjotr Prins, 2023-02-13 15:29:47 -0600
committer: Pjotr Prins, 2023-02-13 15:29:53 -0600
commit: 78ea484a951894924fde1f37359c12b4f4179416
tree: cf5800afaaf530cabb5f8c0cbcc0c4a406b9cd8b (topics/systems/hpc)
parent: 3a23d9cf6b922a6bcb49ff1c904623a28ee817f8
Work on octopus
Diffstat (limited to 'topics/systems/hpc')
-rw-r--r-- topics/systems/hpc/octopus-maintenance.gmi | 36
1 files changed, 36 insertions, 0 deletions
diff --git a/topics/systems/hpc/octopus-maintenance.gmi b/topics/systems/hpc/octopus-maintenance.gmi
new file mode 100644
index 0000000..6f44433
--- /dev/null
+++ b/topics/systems/hpc/octopus-maintenance.gmi
@@ -0,0 +1,36 @@
+# Octopus Maintenance
+
+## Slurm
+
+Check the status of Slurm:
+
+```
+sinfo
+sinfo -R
+squeue
+```
+
+We have draining nodes with no jobs running on them.
+
+Revive a draining node (as root):
+
+```
+scontrol
+ update NodeName=octopus05 State=DOWN Reason="undraining"
+ update NodeName=octopus05 State=RESUME
+ show node octopus05
+```
+
+A job step that cannot be killed within UnkillableStepTimeout can put the node into a drain state:
+
+```
+scontrol show config | grep kill
+UnkillableStepProgram = (null)
+UnkillableStepTimeout = 60 sec
+```
+
+Check the hardware configuration a node reports with `slurmd -C`, compare it with slurm.conf, and push configuration changes to the nodes with
+
+```
+scontrol reconfigure
+```
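
The revive step in the note could be scripted for several nodes at once. The following is a minimal sketch, not part of the commit. The function name `resume_commands` is hypothetical, and it assumes input lines of the form `<node> <state>`, as printed by `sinfo -h -N -o '%N %T'`. It only prints commands for nodes whose state is exactly `drained` (drain requested and no jobs left), and it emits only the `State=RESUME` step, not the `State=DOWN` step used above.

```shell
#!/bin/sh
# Hypothetical helper: read "<node> <state>" lines on stdin and print one
# "scontrol update ... State=RESUME" command per fully drained node.
# Nodes still "draining" (jobs running) or flagged, e.g. "drained*"
# (not responding), are skipped on purpose.
resume_commands() {
    awk '$2 == "drained" { print "scontrol update NodeName=" $1 " State=RESUME" }' |
        sort -u   # sinfo -N repeats a node once per partition
}

# Dry run against a live cluster (requires Slurm):
#   sinfo -h -N -o '%N %T' | resume_commands
# After reviewing the output, run it as root:
#   sinfo -h -N -o '%N %T' | resume_commands | sh
```

Printing the commands instead of running them keeps the step reviewable, which matters because a node may have been drained for a reason that `sinfo -R` would still show.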