# Machine room tasks # Tags * assigned: pjotrp, fredm * priority: medium * type: system administration * keywords: system administration, octopus, gateway, tuxes # Tasks ## Short term * [-] add SSD to tux03 (@fredm) * [ ] repurpose samsung drives with heatsink (@fredm) * [X] map out physical network layout in MR (@fredm) * [+] get disks from chenbro and rehost (@fredm) * [ ] work on speeding up moosefs (@pjotrp) * - [ ] test network interfaces and switches (@pjotrp) * [ ] add storage server with spinning disks * - [+] put order in * - [ ] install machine * [ ] migrate GN production to tux04 (@fredm) * - [ ] first remove Samsung PCIe (@fredm) * - [ ] add RAID (@pjotrp) * [ ] remove P2 data flavia, hao, davida to make space for moose ## GN * [X] penguin2 has 90TB of space we can use on NFS/backups - they went on moosefs * [ ] Replace reaper with GEMMA * [X] Transfer nervenet.org to dnsimple * [X] Trait vectors for Johannes * [X] grub on tux04 * [X] nft on tux04 * [X] !!Xusheng jumpshiny services * [ ] Fix apps and create system containers for herd services - see issues/systems/apps * [ ] Slurm+ravanan on production for GEMMA speedup * [ ] Embed R/qtl2 (Alex) * [ ] Hoot in GN2 (Andrew) * [ ] tux02 certbot failing (manual now) ## Octopus: * [X] Fix Tux05 badblocks on /dev/sdb2 1050624 47925247 46874624 22.4G Linux filesystem - see add-boot-partition * [X] Copy linux partition on tux04, tux05, tux02 and test reboot * [X] moosefs on Tuxes * [X] Boot and install over network * [ ] Add synology to moosefs * [ ] Centralized user management system (no longer needed with distributed nodes?) * [ ] Monitor nodes * [ ] Check machines so they talk with each other over fiber ## Backups & storage: * [ ] Create and check backups of tux04 etc etc. * [ ] set up zero to backup tux02 and report to redis * [ ] reintroduce borg-borg on zero * [+] run sheepdog as root: redis password error; introduce SHEEPDOG_CONF * [ ] tux01 has unused 4TB spinning disk * [ ] tux02 has unused 2x4TB spinning disks and 2TB nvme /dev/nvme0n1 on adapter https://www.cyberciti.biz/faq/upgrade-update-samsung-ssd-firmware/ apt-get install fwupd fwupdate fwupdmgr get-devices fwupdmgr update The previously problematic Samsung 980 Pro was basically using the 3B2QGXA7, and now Samsung has introduced a new 5B2QGXA7 firmware to fix the problem. The problem mainly affects the 2TB version of the 980 Pro Security: * [ ] Limit idrac access ## Spice * [ ] Add 2nd boot partition on balg01 * [ ] Add firewall test to sheepdog ## Maybe * [-] !!Organize pluto, update Julia and add apps to GN menu Jupyter notebooks * [-] get data from summer211 (access machine room) ## Done * [X] describe machines with Rick Stripes * [X] get bacchus back on line * [X] fix www.genenetwork.org and gn2.genenetwork.org https * [X] VPN access and FoUT * [X] lambda: get fiber working * [X] lambda: add to Octopus HPC * [X] lambda: racked up and runs * [X] lambda: add network (Roger) link/ether 7c:c2:55:11:9c:ac brd ff:ff:ff:ff:ff:ff inet 172.23.18.212/21 brd 172.23.23.255 scope global dynamic eno1 * [X] lambda: get service tag Tamara (with Erik?) * [X] lambda: install ubuntu (with Erik) * [X] Order storage and caddies (w. Tamara) * [X] Spice: Firewall out of band * [X] Spice: Add storage * [X] Tux01 and Tux02 disk space issues * [X] Reinstate backup drops on tux02, rabbit, &space and &epysode; reduce incoming IP * [X] Pluto tool with Zach & Efraim * [X] Order drives and caddies tux01 & tux02 (with @haoc) * [X] Introduce &disk space and mdstat monitor * [X] Machine room HDDs * [X] decommission/surplus out-racked machines (whith @arthurc) + see also ../issues/systems/decommission-machines.gmi * [X] Install tux04-tux09 * [X] tux04 and tux05 give errors * [X] use fiber optics for subnet Octopus and Tuxes * [X] Octopus11 has no fiber * [X] tux06 has temp fiber * [X] tux07 has no fiber * [X] tux08 has no fiber * [X] tux09 has no fiber ### Lambda * [X] remote access? (with Erik) * [X] get BMC password * [X] space server out-of-band access ### Spice * [X] Run GN off balg01 * [X] Convert balg02 to Guix server