# Orchestration and fallbacks

After the Penguin2 crash in Aug. 2022 it has become increasingly clear how hard it is to deploy GeneNetwork. GNU Guix helps a great deal with dependencies, but it does not handle orchestration between machines/services well. We also need to look to the future.

What is GN today in terms of services?

* [X] Main GN2 server (Python, 20+ processes, 3+ instances: depends on all below)
* [X] Matching GN3 server and REST endpoint (Python: fewer dependencies)
* [X] MariaDB
* [X] redis
* [X] virtuoso (@aruni)
* [X] GN-proxy (Racket, authentication handler: redis, MariaDB)
* [X] Alias proxy (Racket, gene aliases wikidata)
* [X] opar server
* [+] Jupyter, R-shiny and Julia notebooks, nb-hub server
* [X] BNW server (@efraimf)
* [+] UCSC browser (@efraimf)
* [X] GN1 instances (older Python, 12 instances in principle, 2 running today)
* [ ] Access to HPC for GEMMA (coming)
* [+] Backup services (sheepdog, rsync, borg)
* [+] Monitoring services (incl. systemd, gunicorn, shepherd, sheepdog)
* [ ] mail server
* [X] https certificates
* [X] http(s) proxy (nginx)
* [X] CI/CD services (with GitHub webhooks)
* [+] git server (gitea or cgit)
* [X] file server (formerly IPFS)
* [ ] SPARQL endpoint

Somewhat decoupled services:

* [X] genecup
* [X] R/shiny power service Dave
* [ ] biohackrxiv
* [ ] hegp
* [ ] covid19
* [ ] guix publish server (runs on penguin2, needs tux02 @efraimf)

I am still missing a few! All run by a man and his diligent dog.

For the future the orchestration needs to be more robust and resilient. This means:

* A fallback for every service on a separate machine
* Improved privacy protection for (future) human data
* Separate servers serving different data sources
* Partial synchronization between data sources

The only way we *can* scale is by adding machines, but the system is not yet ready for that. Getting rid of monolithic primary databases in favor of files also helps synchronization.
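
The requirement of "a fallback for every service on a separate machine" can be checked mechanically once the service list is written down as data. The following is only a minimal sketch of that idea, not existing GeneNetwork tooling: the `Service` record, the `missing_fallback` helper, and the host names `host-a` and `host-b` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    """One entry in a (hypothetical) service inventory."""
    name: str
    primary: str                    # host running the service today
    fallback: Optional[str] = None  # host that can take over, if any


def missing_fallback(services):
    """Return names of services that lack a fallback on a separate machine."""
    return [
        s.name
        for s in services
        if s.fallback is None or s.fallback == s.primary
    ]


# Hypothetical inventory: the host assignments are invented for illustration.
inventory = [
    Service("gn2", primary="host-a", fallback="host-b"),
    # A fallback on the same machine does not survive a machine crash.
    Service("mariadb", primary="host-a", fallback="host-a"),
    Service("virtuoso", primary="host-a"),
]

print(missing_fallback(inventory))  # ['mariadb', 'virtuoso']
```

A check like this could run in CI so that adding a service without a fallback host is flagged early. How services are actually started on the fallback machine is a separate question that this sketch does not address.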