# Fallbacks and backups
As a hurricane is barreling towards our machine room in Memphis, we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon, both to S3 and to a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which pays for itself (or would have, if it hadn't been down for months, but that is a different story).
## Tags
* enhancement
* deploy
* assigned: pjotrp
## Tasks
* [ ] Get backups running again on fallback
## Backup and restore
We are using borg for backing up data. Borg is excellent at deduplication and compression of data and is pretty fast too. Incremental copies work with rsync, so that is fast as well. Restoring the full MariaDB database from a local borg repo takes under 20 minutes:
```
wrk@epysode:/export/restore_tux01$ time borg extract -v /export2/backup/tux01/borg-tux01::BORG-TUX01-MARIADB-20210829-04:20-Sun
real 17m32.498s
user 8m49.877s
sys 4m25.934s
```
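The archive names used here appear to follow a PREFIX-YYYYMMDD-HH:MM-Day pattern. As a minimal sketch, assuming that pattern holds and that date supports the -d option (GNU coreutils), a name for a given timestamp could be built like this. The helper archive_name is hypothetical and not part of borg:

```shell
#!/bin/sh
# Sketch only: archive_name is a hypothetical helper, not part of borg.
# It builds a name in the PREFIX-YYYYMMDD-HH:MM-Day pattern seen above.
# Assumes a date(1) that supports -d (GNU coreutils).
archive_name() {
    prefix=$1
    when=${2:-now}
    # LC_ALL=C keeps the weekday abbreviation in English (Sun, Mon, ...).
    stamp=$(LC_ALL=C date -d "$when" '+%Y%m%d-%H:%M-%a') || return 1
    printf '%s-%s\n' "$prefix" "$stamp"
}

# Example: the name of the MariaDB archive restored above.
archive_name BORG-TUX01-MARIADB '2021-08-29 04:20'
```

The resulting name can be appended to the repo path after a double colon. Before a real restore it may be worth running borg list on the repo to confirm the archive exists, and borg extract with --dry-run --list to see what would be written.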
This contrasts sharply with restoring 300GB from Amazon S3.
Next, restore the GN2 home directory:
```
root@epysode:/# borg extract export2/backup/tux01/borg-genenetwork::TUX01_BORG_GN2_HOME-20210830-04:00-Mon
```
## Get backups running on fallback
Recently epysode was reinstated after a hardware failure, and I took the opportunity to reinstall the machine. The backups are described in the repo (genenetwork org members have access):
=> https://github.com/genenetwork/gn-services/blob/master/services/backups.org
As epysode was one of the main sheepdog messaging servers, I need to reinstate:
* [X] scripts for sheepdog
* [X] enable trim
* [X] reinstate monitoring web services
* [t] reinstate daily backup from penguin2
* [X] CRON
* [X] make sure messaging works through redis
* [X] fix and propagate GN1 backup
* [ ] fix and propagate IPFS and gitea backups
* [ ] add GN1 backup
* [ ] add IPFS backup
* [ ] other backups
* [ ] email on fail
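For the CRON and "email on fail" items, one possible shape is a small wrapper that runs a job and only notifies when it exits non-zero. This is a sketch under assumptions, not the sheepdog mechanism: run_or_report, notify_stderr and NOTIFY_CMD are hypothetical names, and the default notifier only writes to stderr.

```shell
#!/bin/sh
# Sketch only: run_or_report, notify_stderr and NOTIFY_CMD are hypothetical names.

# Default notifier: copy the message from stdin to stderr.
notify_stderr() { cat >&2; }

# Run a command under a job label. On a non-zero exit status, pipe a
# one-line message into the notifier and return the same status.
run_or_report() {
    job=$1
    shift
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        # NOTIFY_CMD is split on whitespace, so it may carry arguments.
        printf 'backup job %s failed with exit status %s\n' "$job" "$status" |
            ${NOTIFY_CMD:-notify_stderr}
    fi
    return "$status"
}

# Usage from a cron script might look like (paths are placeholders):
#   NOTIFY_CMD="tee -a /tmp/backup-failures.log"
#   run_or_report tux01-mariadb /path/to/backup-script.sh
```

Setting NOTIFY_CMD to a command that reads the message on stdin, such as a local mail client, would turn this into an email on failure; which mailer is available on each host is left open here.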
Tux01 is backed up now. We need to make sure it propagates to:
* [ ] P2
* [ ] epysode
* [ ] rabbit
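One rough way to check propagation on each target is to test whether the copied backup directory has received any recent writes. The sketch below assumes the propagated copy is a directory whose files are rewritten when new data arrives, and a find that supports -mmin (GNU and BSD find do). The helper has_recent_file and the 1500 minute threshold are hypothetical:

```shell
#!/bin/sh
# Sketch only: has_recent_file is a hypothetical helper.
# Succeeds if DIR holds at least one regular file modified within the
# last MAX_MIN minutes; returns 2 if DIR is not a directory.
# Assumes find(1) supports -mmin (GNU and BSD find do; POSIX does not require it).
has_recent_file() {
    dir=$1
    max_min=$2
    [ -d "$dir" ] || return 2
    find "$dir" -type f -mmin "-$max_min" | grep -q .
}

# Usage on a target host might look like (threshold is a placeholder):
#   has_recent_file /export2/backup/tux01/borg-tux01 1500 || echo "stale backup"
```

This only shows that something was written recently, not that the archive is complete or restorable, so it complements rather than replaces a test restore.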