From 2f84a722e5944bd4458abbd45fd637a9286bab04 Mon Sep 17 00:00:00 2001 From: Arun Isaac Date: Fri, 1 Jul 2022 18:44:01 +0530 Subject: Move issues lost inside the topics directory. --- topics/systems/reboot-tux01-tux02.gmi | 54 ----------------------------------- 1 file changed, 54 deletions(-) delete mode 100644 topics/systems/reboot-tux01-tux02.gmi (limited to 'topics/systems/reboot-tux01-tux02.gmi') diff --git a/topics/systems/reboot-tux01-tux02.gmi b/topics/systems/reboot-tux01-tux02.gmi deleted file mode 100644 index 3186a0d..0000000 --- a/topics/systems/reboot-tux01-tux02.gmi +++ /dev/null @@ -1,54 +0,0 @@ -# Rebooting the GN production machine(s) - -I needed to add the hard disks in the BIOS to make them visible - one of the annoying aspects of these Dell machines. First on Tux02 I cheched the borg backups to see if we have a recent copy of MariaDB, GN2 etc. The DB is from 2 days ago and the genotypes of GN2 are a week old (because of a permission problem). I'll add a copy by hand of both - an opportunity to test the new 10Gbs router. - -Something funny is going on. When eno4 goes down the external webserver interface is not working. It appears, somehow, that 128.169.4.67 is covering for 128.169.5.59. I need to check that! - -# Tasks - -Before rebooting - -On Tux02: - -* [X] Check backups of DB and services -* [X] Copy trees between machines - -On Tux01: - -* [ ] Network confused. See above. - -On both: - -* [X] Check network interface definitions (what happens on reboot) -* [X] Check IPMI access - should get serial login - - -# Info - -## Routing - -On tux02 eno2d1 is the 10Gbs network interface. Unfortunately I can't get it to connect at 10Gbs with Tux01 because the latter is using that port for the outside world. - -Playing with 10Gbs on Tux01 sent the hardware in a tail spin, what to think of this solution on - -``` -bnxt_en 0000:01:00.1 (unnamed net_device) (uninitialized): Error (timeout: 500015) msg {0x0 0x0} len:0 - -Solution was to power down the server(s) and *remove* power cords for 5 minutes. -``` - -=> https://www.dell.com/community/PowerEdge-Hardware-General/Critical-network-bnxt-en-module-crashes-on-14G-servers/td-p/6031769 - -The Linux kernel shows some fixes that are not on Tux01 yet - -=> https://lkml.org/lkml/2021/2/17/970 - -In our case a simple reboot worked, fortunately. - -## Tags - -* assigned: pjotrp -* status: unclear -* priority: medium -* type: system administration -* keywords: systems, tux01, tux02, production -- cgit v1.2.3