blob: 87495188e9b56853facdbfd019905daff6e7e385 (
about) (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
|
# Machine room tasks
# Tags
* assigned: pjotrp
* priority: medium
* type: system administration
* keywords: system administration, octopus, gateway, tux02, tux01, tux03
# Tasks
## GN
* [ ] Trait vectors for Johannes
* [ ] !!Organize pluto, update Julia and add apps to GN menu Jupyter notebooks
* [ ] !!Xusheng jumpshiny services
* [ ] Slurm+ravanan on production for GEMMA speedup
* [ ] Embed R/qtl2 (Alex)
* [ ] Hoot in GN2 (Andrew)
* [ ] tux02 certbot failing (manual now)
* [ ] penguin2 has 32TB of space we can use on NFS/backups
## Octopus:
* [X] Fix Tux05 badblocks on /dev/sdb2 1050624 47925247 46874624 22.4G Linux filesystem
- see add-boot-partition
* [X] Copy linux partition on tux04, tux05, tux02
* [ ] !!Ceph on Tuxes
* [ ] Centralized user management system
* [ ] Monitor nodes
* [ ] Check machines so they talk with each other over fiber
## Backups & storage:
* [ ] Create and check backups of tux04 etc etc.
* [ ] set up zero to backup tux02 and report to redis
* [ ] reintroduce borg-borg on zero
* [+] run sheepdog as root: redis password error; introduce SHEEPDOG_CONF
* [ ] tux01 has unused 4TB spinning disk
* [ ] tux02 has unused 2x4TB spinning disks and 2TB nvme /dev/nvme0n1 on adapter
https://www.cyberciti.biz/faq/upgrade-update-samsung-ssd-firmware/
apt-get install fwupd fwupdate
fwupdmgr get-devices
fwupdmgr update
The previously problematic Samsung 980 Pro was basically using the 3B2QGXA7, and now Samsung has introduced a new 5B2QGXA7 firmware to fix the problem. The problem mainly affects the 2TB version of the 980 Pro
Security:
* [ ] Limit idrac access
## Spice
* [ ] Add firewall test to sheepdog
## Done
* [X] describe machines with Rick Stripes
* [X] get bacchus back on line
* [X] fix www.genenetwork.org and gn2.genenetwork.org https
* [-] get data from summer211.uthsc.edu (access machine room)
* [X] VPN access and FoUT
* [X] lambda: get fiber working
* [X] lambda: add to Octopus HPC
* [X] lambda: racked up and runs
* [X] lambda: add network (Roger)
link/ether 7c:c2:55:11:9c:ac brd ff:ff:ff:ff:ff:ff
inet 172.23.18.212/21 brd 172.23.23.255 scope global dynamic eno1
* [X] lambda: get service tag Tamara (with Erik?)
* [X] lambda: install ubuntu (with Erik)
* [X] Order storage and caddies (w. Tamara)
* [X] Spice: Firewall out of band
* [X] Spice: Add storage
* [X] Tux01 and Tux02 disk space issues
* [X] Reinstate backup drops on tux02, rabbit, &space and &epysode; reduce incoming IP
* [X] Pluto tool with Zach & Efraim
* [X] Order drives and caddies tux01 & tux02 (with @haoc)
* [X] Introduce &disk space and mdstat monitor
* [X] Machine room HDDs
* [X] decommission/surplus out-racked machines (whith @arthurc)
+ see also ../issues/systems/decommission-machines.gmi
* [X] Install tux04-tux09
* [X] tux04 and tux05 give errors
* [X] use fiber optics for subnet Octopus and Tuxes
* [X] Octopus11 has no fiber
* [X] tux06 has temp fiber
* [X] tux07 has no fiber
* [X] tux08 has no fiber
* [X] tux09 has no fiber
### Lambda
* [X] remote access? (with Erik)
* [X] get BMC password
* [X] space server out-of-band access
### Spice
* [X] Run GN off balg01
* [X] Convert balg02 to Guix server
|