From 3f02dcfa33c6c50a094a5194c0dc4dd9b8eaa594 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Mon, 24 Jun 2024 08:16:24 -0500
Subject: Revisit sheepdog and backups

---
 issues/systems/fallbacks-and-backups.gmi | 46 +++++++++++++++++---------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/issues/systems/fallbacks-and-backups.gmi b/issues/systems/fallbacks-and-backups.gmi
index 9b890c7..34cecd2 100644
--- a/issues/systems/fallbacks-and-backups.gmi
+++ b/issues/systems/fallbacks-and-backups.gmi
@@ -1,6 +1,10 @@
 # Fallbacks and backups
 
-As a hurricane is barreling towards our machine room in Memphis we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon - both S3 and a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which earns itself (if it hadn't been down for months, but that is a different story).
+A revisit to previous work on backups etc. The sheepdog hosts are no longer responding and we should really run sheepdog on a machine that is not physically with the other machines. In time sheepdog should also move away from redis and run in a system container, but that is for later. I did most of the work late 2021 when I wrote:
+
+> As a hurricane is barreling towards our machine room in Memphis we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon - both S3 and a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which earns itself (if it hadn't been down for months, but that is a different story).
+
+As we are introducing an external sheepdog server we may give it a DNS entry as sheepdog.genenetwork.org.
 
 See also
 
@@ -16,13 +20,14 @@ See also
 
 ## Tasks
 
-* [.] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services
-* [X] /etc /home/shepherd backups for Octopus
-* [X] /etc /home/shepherd backups for P2
-* [X] Get backups running again on fallback
-* [ ] fix redis queue for P2 - needs to be on rabbit
+* [X] fix redis queue and sheepdog server
+* [ ] check backups on tux01
+* [ ] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services
+* [ ] /etc /home/shepherd backups for Octopus
+* [ ] /etc /home/shepherd /home/git CI-CD GN-QA backups on Tux02
+* [ ] Get backups running again on fallback
 * [ ] fix bacchus large backups
-* [ ] backup octopus01:/lizardfs/backup-pangenome on bacchus
+* [ ] mount bacchus on HPC
 
 ## Backup and restore
 
@@ -52,22 +57,21 @@ Recently epysode was reinstated after hardware failure. I took the opportunity t
 As epysode was one of the main sheepdog messaging servers I need to reinstate:
 
 * [X] scripts for sheepdog
-* [X] enable trim
-* [X] reinstate monitoring web services
-* [X] reinstate daily backup from penguin2
-* [X] CRON
-* [X] make sure messaging works through redis
-* [X] fix and propagate GN1 backup
-* [X] fix and propagate IPFS and gitea backups
-* [X] add GN1 backup
-* [X] add IPFS backup
-* [X] other backups
+* [ ] Check tunnel on tux01 is reinstated
+* [ ] enable trim
+* [ ] reinstate monitoring web services
+* [ ] reinstate daily backups
+* [ ] CRON
+* [ ] make sure messaging works through redis
+* [ ] fix and propagate GN1 backup
+* [ ] fix and propagate fileserver and git backups
+* [ ] add GN1 backup
+* [ ] other backups
 * [ ] email on fail
 
 Tux01 is backed up now. Need to make sure it propagates to
 
-* [X] P2
-* [X] epysode
-* [X] rabbit
-* [X] Tux02
+* [ ] rabbit
+* [ ] Tux02
+* [ ] balg01
 * [ ] bacchus
-- 
cgit v1.2.3