summaryrefslogtreecommitdiff
path: root/topics/systems/reboot-tux01-tux02.gmi
diff options
context:
space:
mode:
authorArun Isaac2022-07-01 18:44:01 +0530
committerArun Isaac2022-07-01 18:44:01 +0530
commit2f84a722e5944bd4458abbd45fd637a9286bab04 (patch)
tree33e02614a9eb5e457c6162e8b2f67278ba07ddfe /topics/systems/reboot-tux01-tux02.gmi
parentc049be0d57f87151ad8db733150d9a49fb30ea31 (diff)
downloadgn-gemtext-2f84a722e5944bd4458abbd45fd637a9286bab04.tar.gz
Move issues lost inside the topics directory.
Diffstat (limited to 'topics/systems/reboot-tux01-tux02.gmi')
-rw-r--r--topics/systems/reboot-tux01-tux02.gmi54
1 files changed, 0 insertions, 54 deletions
diff --git a/topics/systems/reboot-tux01-tux02.gmi b/topics/systems/reboot-tux01-tux02.gmi
deleted file mode 100644
index 3186a0d..0000000
--- a/topics/systems/reboot-tux01-tux02.gmi
+++ /dev/null
@@ -1,54 +0,0 @@
-# Rebooting the GN production machine(s)
-
-I needed to add the hard disks in the BIOS to make them visible - one of the annoying aspects of these Dell machines. First on Tux02 I cheched the borg backups to see if we have a recent copy of MariaDB, GN2 etc. The DB is from 2 days ago and the genotypes of GN2 are a week old (because of a permission problem). I'll add a copy by hand of both - an opportunity to test the new 10Gbs router.
-
-Something funny is going on. When eno4 goes down the external webserver interface is not working. It appears, somehow, that 128.169.4.67 is covering for 128.169.5.59. I need to check that!
-
-# Tasks
-
-Before rebooting
-
-On Tux02:
-
-* [X] Check backups of DB and services
-* [X] Copy trees between machines
-
-On Tux01:
-
-* [ ] Network confused. See above.
-
-On both:
-
-* [X] Check network interface definitions (what happens on reboot)
-* [X] Check IPMI access - should get serial login
-
-
-# Info
-
-## Routing
-
-On tux02 eno2d1 is the 10Gbs network interface. Unfortunately I can't get it to connect at 10Gbs with Tux01 because the latter is using that port for the outside world.
-
-Playing with 10Gbs on Tux01 sent the hardware in a tail spin, what to think of this solution on
-
-```
-bnxt_en 0000:01:00.1 (unnamed net_device) (uninitialized): Error (timeout: 500015) msg {0x0 0x0} len:0
-
-Solution was to power down the server(s) and *remove* power cords for 5 minutes.
-```
-
-=> https://www.dell.com/community/PowerEdge-Hardware-General/Critical-network-bnxt-en-module-crashes-on-14G-servers/td-p/6031769
-
-The Linux kernel shows some fixes that are not on Tux01 yet
-
-=> https://lkml.org/lkml/2021/2/17/970
-
-In our case a simple reboot worked, fortunately.
-
-## Tags
-
-* assigned: pjotrp
-* status: unclear
-* priority: medium
-* type: system administration
-* keywords: systems, tux01, tux02, production