# Machine room tasks
# Tags
* assigned: pjotrp, fredm
* priority: medium
* type: system administration
* keywords: system administration, octopus, gateway, tuxes
# Tasks
## Short term
* [ ] Tux02 and tux03 need cables to wire up 2nd PSUs
* [ ] Octopi have 1 PSU only
* [ ] Order drives. We have some caddies.
* - [ ] add drives on 'tux03', then we'll need an additional 5 DXD9H caddies and 4 M.2-to-SATA adapters
* - [ ] ample room on tux01 - can take old SSDs
* - [ ] room on tux05-9 - each can hold 8 nvme - we have 20 slots
* - [ ] octopus slots are full. There are some 10+ spinning disks and 2x smaller 1TB nvmes we can replace.
* [ ] repurpose samsung drives with heatsink (@fredm)
* [X] map out physical network layout in MR (@fredm)
* [+] get disks from chenbro and rehost (@fredm)
* [ ] work on speeding up moosefs (@pjotrp)
* - [ ] test network interfaces and switches (@pjotrp)
* [ ] add storage server with spinning disks
* - [X] put order in
* - [+] install machine
* [+] migrate GN production to tux04 (@fredm)
* - [+] first remove Samsung PCIe (@fredm)
* - [X] add RAID (@pjotrp)
* [ ] remove P2 data flavia, hao, davida to make space for moose
## GN
* [ ] secure RDF guile server(s)
* [X] penguin2 has 90TB of space we can use on NFS/backups - they went on moosefs
* [ ] Replace reaper with GEMMA
* [X] Transfer nervenet.org to dnsimple
* [X] Trait vectors for Johannes
* [X] grub on tux04
* [X] nft on tux04
* [X] !!Xusheng jumpshiny services
* [ ] Fix apps and create system containers for herd services - see issues/systems/apps
* [ ] Slurm+ravanan on production for GEMMA speedup
* [ ] Embed R/qtl2 (Alex)
* [ ] Hoot in GN2 (Andrew)
* [ ] tux02 certbot failing (manual now)
## Octopus:
* [X] Fix Tux05 badblocks on /dev/sdb2 1050624 47925247 46874624 22.4G Linux filesystem
- see add-boot-partition
* [X] Copy linux partition on tux04, tux05, tux02 and test reboot
* [X] moosefs on Tuxes
* [X] Boot and install over network
* [ ] Add synology to moosefs
* [ ] Centralized user management system (no longer needed with distributed nodes?)
* [ ] Monitor nodes
* [ ] Check machines so they talk with each other over fiber
## Backups & storage:
* [ ] Create and check backups of tux04 etc.
* [ ] set up zero to backup tux02 and report to redis
* [ ] reintroduce borg-borg on zero
* [+] run sheepdog as root: redis password error; introduce SHEEPDOG_CONF
* [ ] tux01 has unused 4TB spinning disk
* [ ] tux02 has unused 2x4TB spinning disks and 2TB nvme /dev/nvme0n1 on adapter
https://www.cyberciti.biz/faq/upgrade-update-samsung-ssd-firmware/
apt-get install fwupd fwupdate
fwupdmgr get-devices
fwupdmgr update
The problematic Samsung 980 Pro was running firmware 3B2QGXA7. Samsung has since released firmware 5B2QGXA7 to fix the problem, which mainly affects the 2TB version of the 980 Pro.
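To see whether a drive still needs the update, the reported firmware revision can be compared against the two revisions named above. A minimal sketch: the function name firmware_status is hypothetical, and the commented usage assumes smartmontools is installed and that the drive is /dev/nvme0n1 as on tux02.

```shell
#!/bin/sh
# Classify a Samsung 980 Pro firmware revision string.
# Only the two revisions mentioned in the note above are known here;
# anything else is reported as "unknown" rather than guessed.
firmware_status() {
    case "$1" in
        3B2QGXA7) echo "affected" ;;
        5B2QGXA7) echo "fixed" ;;
        *)        echo "unknown" ;;
    esac
}

# Usage (assumption: smartctl is available and the drive is /dev/nvme0n1):
#   rev=$(smartctl -i /dev/nvme0n1 | awk -F': *' '/Firmware Version/ {print $2}')
#   firmware_status "$rev"
```

Run it before and after fwupdmgr update to confirm the revision actually changed.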
## Security:
* [ ] Limit idrac access
## Spice
* [ ] Add 2nd boot partition on balg01
* [ ] Add firewall test to sheepdog
## Maybe
* [-] !!Organize pluto, update Julia and add apps to GN menu Jupyter notebooks
* [-] get data from summer211 (access machine room)
## Done
* [X] add SSD to tux03 (@fredm)
* [X] describe machines with Rick Stripes
* [X] get bacchus back on line
* [X] fix www.genenetwork.org and gn2.genenetwork.org https
* [X] VPN access and FoUT
* [X] lambda: get fiber working
* [X] lambda: add to Octopus HPC
* [X] lambda: racked up and runs
* [X] lambda: add network (Roger)
link/ether 7c:c2:55:11:9c:ac brd ff:ff:ff:ff:ff:ff
inet 172.23.18.212/21 brd 172.23.23.255 scope global dynamic eno1
* [X] lambda: get service tag Tamara (with Erik?)
* [X] lambda: install ubuntu (with Erik)
* [X] Order storage and caddies (w. Tamara)
* [X] Spice: Firewall out of band
* [X] Spice: Add storage
* [X] Tux01 and Tux02 disk space issues
* [X] Reinstate backup drops on tux02, rabbit, &space and &epysode; reduce incoming IP
* [X] Pluto tool with Zach & Efraim
* [X] Order drives and caddies tux01 & tux02 (with @haoc)
* [X] Introduce &disk space and mdstat monitor
* [X] Machine room HDDs
* [X] decommission/surplus out-racked machines (with @arthurc)
+ see also ../issues/systems/decommission-machines.gmi
* [X] Install tux04-tux09
* [X] tux04 and tux05 give errors
* [X] use fiber optics for subnet Octopus and Tuxes
* [X] Octopus11 has no fiber
* [X] tux06 has temp fiber
* [X] tux07 has no fiber
* [X] tux08 has no fiber
* [X] tux09 has no fiber
### Lambda
* [X] remote access? (with Erik)
* [X] get BMC password
* [X] space server out-of-band access
### Spice
* [X] Run GN off balg01
* [X] Convert balg02 to Guix server