summaryrefslogtreecommitdiff
path: root/topics/systems/reboot-tux01-tux02.gmi
blob: 3186a0dd2be5ff49d41a2503eb69f6fcfb784bc4 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
# Rebooting the GN production machine(s)

I needed to add the hard disks in the BIOS to make them visible - one of the annoying aspects of these Dell machines. First on Tux02 I cheched the borg backups to see if we have a recent copy of MariaDB, GN2 etc. The DB is from 2 days ago and the genotypes of GN2 are a week old (because of a permission problem). I'll add a copy by hand of both - an opportunity to test the new 10Gbs router.

Something funny is going on. When eno4 goes down the external webserver interface is not working. It appears, somehow, that 128.169.4.67 is covering for 128.169.5.59. I need to check that!

# Tasks

Before rebooting

On Tux02:

* [X] Check backups of DB and services
* [X] Copy trees between machines

On Tux01:

* [ ] Network confused. See above.

On both:

* [X] Check network interface definitions (what happens on reboot)
* [X] Check IPMI access - should get serial login


# Info

## Routing

On tux02 eno2d1 is the 10Gbs network interface. Unfortunately I can't get it to connect at 10Gbs with Tux01 because the latter is using that port for the outside world.

Playing with 10Gbs on Tux01 sent the hardware in a tail spin, what to think of this solution on

```
bnxt_en 0000:01:00.1 (unnamed net_device) (uninitialized): Error (timeout: 500015) msg {0x0 0x0} len:0

Solution was to power down the server(s) and *remove* power cords for 5 minutes.
```

=> https://www.dell.com/community/PowerEdge-Hardware-General/Critical-network-bnxt-en-module-crashes-on-14G-servers/td-p/6031769

The Linux kernel shows some fixes that are not on Tux01 yet

=> https://lkml.org/lkml/2021/2/17/970

In our case a simple reboot worked, fortunately.

## Tags

* assigned: pjotrp
* status: unclear
* priority: medium
* type: system administration
* keywords: systems, tux01, tux02, production