gn-gemtext - GeneNetwork gemtext knowledge base and issue tracker (mirror right now)

diff options


context:
space:
mode:

Diffstat (limited to 'issues/systems/fallbacks-and-backups.gmi')

-rw-r--r--

issues/systems/fallbacks-and-backups.gmi

1 files changed, 69 insertions, 0 deletions

diff --git a/issues/systems/fallbacks-and-backups.gmi b/issues/systems/fallbacks-and-backups.gmi
new file mode 100644
index 0000000..1a22db9
--- /dev/null
+++ b/issues/systems/fallbacks-and-backups.gmi

@@ -0,0 +1,69 @@

+# Fallbacks and backups

+As a hurricane is barreling towards our machine room in Memphis we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon - both S3 and a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which earns itself (if it hadn't been down for months, but that is a different story).

+## Tags

+* type: enhancement

+* assigned: pjotrp

+* keywords: systems, fallback, backup, deploy

+* status: in progress

+* priority: critical

+## Tasks

+* [.] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services

+* [X] /etc /home/shepherd backups for Octopus

+* [X] /etc /home/shepherd backups for P2

+* [X] Get backups running again on fallback

+* [ ] fix redis queue for P2 - needs to be on rabbit

+* [ ] fix bacchus large backups

+* [ ] backup octopus01:/lizardfs/backup-pangenome on bacchus

+## Backup and restore

+We are using borg for backing up data. Borg is excellent at deduplication and compression of data and is pretty fast too. Incremental copies work with rsync - so that is fast. To restore the full MariaDB database from a local borg repo takes a few minutes:

+```

+wrk@epysode:/export/restore_tux01$ time borg extract -v /export2/backup/tux01/borg-tux01::BORG-TUX01-MARIADB-20210829-04:20-Sun

+real 17m32.498s

+user 8m49.877s

+sys 4m25.934s

+```

+This all contrasts heavily with restoring 300GB from Amazon S3.

+Next restore the GN2 home dir

+```

+root@epysode:/# borg extract export2/backup/tux01/borg-genenetwork::TUX01_BORG_GN2_HOME-20210830-04:00-Mon

+```

+## Get backups running on fallback

+Recently epysode was reinstated after hardware failure. I took the opportunity to reinstall the machine. The backups are described in the repo (genenetwork org members have access)

+=> https://github.com/genenetwork/gn-services/blob/master/services/backups.org

+As epysode was one of the main sheepdog messaging servers I need to reinstate:

+* [X] scripts for sheepdog

+* [X] enable trim

+* [X] reinstate monitoring web services

+* [X] reinstate daily backup from penguin2

+* [X] CRON

+* [X] make sure messaging works through redis

+* [X] fix and propagate GN1 backup

+* [X] fix and propagate IPFS and gitea backups

+* [X] add GN1 backup

+* [X] add IPFS backup

+* [X] other backups

+* [ ] email on fail

+Tux01 is backed up now. Need to make sure it propagates to

+* [X] P2

+* [X] epysode

+* [X] rabbit

+* [X] Tux02

+* [ ] bacchus