author    Arun Isaac  2022-07-01 18:44:01 +0530
committer Arun Isaac  2022-07-01 18:44:01 +0530
commit    2f84a722e5944bd4458abbd45fd637a9286bab04 (patch)
tree      33e02614a9eb5e457c6162e8b2f67278ba07ddfe /issues/systems/fallbacks-and-backups.gmi
parent    c049be0d57f87151ad8db733150d9a49fb30ea31 (diff)
Move issues lost inside the topics directory.
Diffstat (limited to 'issues/systems/fallbacks-and-backups.gmi')
 -rw-r--r--  issues/systems/fallbacks-and-backups.gmi | 69
 1 file changed, 69 insertions(+), 0 deletions(-)
diff --git a/issues/systems/fallbacks-and-backups.gmi b/issues/systems/fallbacks-and-backups.gmi
new file mode 100644
index 0000000..1a22db9
--- /dev/null
+++ b/issues/systems/fallbacks-and-backups.gmi
@@ -0,0 +1,69 @@

# Fallbacks and backups

As a hurricane is barreling towards our machine room in Memphis, we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon - both S3 and a running virtual machine. The latter was expensive, so I replaced it with a bare metal server that pays for itself (or would, if it hadn't been down for months, but that is a different story).

## Tags

* type: enhancement
* assigned: pjotrp
* keywords: systems, fallback, backup, deploy
* status: in progress
* priority: critical

## Tasks

* [.] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services
* [X] /etc /home/shepherd backups for Octopus
* [X] /etc /home/shepherd backups for P2
* [X] Get backups running again on fallback
* [ ] fix redis queue for P2 - needs to be on rabbit
* [ ] fix bacchus large backups
* [ ] backup octopus01:/lizardfs/backup-pangenome on bacchus

## Backup and restore

We are using borg for backing up data. Borg is excellent at deduplication and compression of data, and it is pretty fast too. Incremental copies of the borg repositories are shipped with rsync - so that part is fast as well (illustrative command sketches are collected in the appendix at the end of this issue). Restoring the full MariaDB database from a local borg repo takes under twenty minutes:

```
wrk@epysode:/export/restore_tux01$ time borg extract -v /export2/backup/tux01/borg-tux01::BORG-TUX01-MARIADB-20210829-04:20-Sun
real 17m32.498s
user 8m49.877s
sys 4m25.934s
```

This contrasts sharply with restoring 300GB from Amazon S3.

Next, restore the GN2 home dir:

```
root@epysode:/# borg extract /export2/backup/tux01/borg-genenetwork::TUX01_BORG_GN2_HOME-20210830-04:00-Mon
```

## Get backups running on fallback

Recently epysode was reinstated after a hardware failure, and I took the opportunity to reinstall the machine. The backups are described in this repo (genenetwork org members have access):

=> https://github.com/genenetwork/gn-services/blob/master/services/backups.org

As epysode was one of the main sheepdog messaging servers, I need to reinstate:

* [X] scripts for sheepdog
* [X] enable trim
* [X] reinstate monitoring web services
* [X] reinstate daily backup from penguin2
* [X] CRON
* [X] make sure messaging works through redis
* [X] fix and propagate GN1 backup
* [X] fix and propagate IPFS and gitea backups
* [X] add GN1 backup
* [X] add IPFS backup
* [X] other backups
* [ ] email on fail

Tux01 is backed up now. Need to make sure it propagates to:

* [X] P2
* [X] epysode
* [X] rabbit
* [X] Tux02
* [ ] bacchus
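
## Appendix: command sketches

The scripts that actually run these backups live in the gn-services repo linked above; the blocks below only illustrate the moving parts. First, the backup side. This is a minimal sketch assuming the repo path and archive naming visible in the restore examples; the source directory, compression choice and flags are my assumptions, not the production settings.

```
# Illustrative only: create a nightly MariaDB archive in the borg repo
# used in the restore examples above. /var/lib/mysql is an assumed
# source path; the compression setting is an assumption too.
borg create --stats --compression zstd \
    "/export2/backup/tux01/borg-tux01::BORG-TUX01-MARIADB-{now:%Y%m%d-%H:%M-%a}" \
    /var/lib/mysql

# Verify the new archive landed in the repo.
borg list /export2/backup/tux01/borg-tux01
```

In practice a live database needs to be quiesced or dumped before archiving; the sketch skips that step.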
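The Tux01 propagation checklist above can be read as an rsync fan-out. A sketch, assuming the repo is pushed from Tux01; the host aliases are taken from the checklist, but the destination paths are made up:

```
# Hypothetical fan-out: ship the borg repo incrementally to each
# fallback host over ssh. --delete keeps the copies exact mirrors.
for host in penguin2 epysode rabbit tux02 bacchus; do
    rsync -av --delete /export2/backup/tux01/borg-tux01/ \
        "$host:/export/backup/tux01/borg-tux01/"
done
```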
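The CRON item then reduces to one nightly entry per machine. The archive timestamps above suggest the jobs fire around 04:00; the script name and log path here are hypothetical:

```
# Illustrative crontab entry, not the production one: run the nightly
# backup at 04:00 and keep a log for the "email on fail" follow-up.
0 4 * * * /usr/local/bin/backup-tux01.sh >> /var/log/backup-tux01.log 2>&1
```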
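Finally, a smoke test for the sheepdog messaging path through redis. The host name comes from the tasks above ("needs to be on rabbit"); the queue key is invented for illustration and the real sheepdog key will differ:

```
# Check that redis on rabbit answers, then peek at a (hypothetical)
# sheepdog event list.
redis-cli -h rabbit ping
redis-cli -h rabbit lrange sheepdog:events 0 5
```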