From bf597c0aa403a92f453c8791178f052459e692bc Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sun, 9 Oct 2022 09:38:06 -0500
Subject: Expanded on orchestration/services

---
 topics/systems/gn-services.gmi   |  4 +++
 topics/systems/orchestration.gmi | 57 ++++++++++++++++++++++++----------------
 2 files changed, 38 insertions(+), 23 deletions(-)

diff --git a/topics/systems/gn-services.gmi b/topics/systems/gn-services.gmi
index 6f9f7fd..ba96c55 100644
--- a/topics/systems/gn-services.gmi
+++ b/topics/systems/gn-services.gmi
@@ -23,3 +23,7 @@ curl http://localhost:8000/gene/aliases/BRCA2
 3. genenetwork3 (python3)
 
 And then there are mariadb and redis.
+
+## See also
+
+=> orchestration.gmi
\ No newline at end of file
diff --git a/topics/systems/orchestration.gmi b/topics/systems/orchestration.gmi
index 5e0a298..bee60c8 100644
--- a/topics/systems/orchestration.gmi
+++ b/topics/systems/orchestration.gmi
@@ -1,35 +1,46 @@
-* Orchestration and fallbacks
+# Orchestration and fallbacks
 
 After the Penguin2 crash in Aug. 2022 it has become increasingly clear how hard it is to deploy GeneNetwork. GNU Guix helps a great deal with dependencies, but it does not handle orchestration between machines/services well. Also we need to look at the future.
 
 What is GN today in terms of services
 
- 1. Main GN2 server (Python, 20+ processes, 3+ instances: depends on all below)
- 2. Matching GN3 server and REST endpoint (Python: less dependencies)
- 3. Mariadb
- 4. redis
- 5. virtuoso
- 6. GN-proxy (Racket, authentication handler: redis, mariadb)
- 7. Alias proxy (Racket, gene aliases wikidata)
- 8. Jupyter R and Julia notebooks
- 9. BNW server (Octave)
-10. UCSC browser
-11. GN1 instances (older python, 12 instances in principle, 2 running today)
-12. Access to HPC for GEMMA (coming)
-13. Backup services (sheepdog, rsync, borg)
-14. monitoring services (incl. systemd, gunicorn, shepherd, sheepdog)
-15. mail server
-16. https certificates
-17. http(s) proxy (nginx)
-18. CI/CD server (with github webhooks)
+* [X] Main GN2 server (Python, 20+ processes, 3+ instances: depends on all below)
+Matching GN3 server and REST endpoint (Python: less dependencies)
+Mariadb
+* [X] redis
+* [ ] virtuoso
+* [X] GN-proxy (Racket, authentication handler: redis, mariadb)
+* [X] Alias proxy (Racket, gene aliases wikidata)
+* [X] opar server
+* [ ] Jupyter, R-shiny and Julia notebooks, nb-hub server
+* [ ] BNW server (Octave)
+* [ ] UCSC browser
+* [X] GN1 instances (older python, 12 instances in principle, 2 running today)
+* [ ] Access to HPC for GEMMA (coming)
+* [ ] Backup services (sheepdog, rsync, borg)
+* [ ] monitoring services (incl. systemd, gunicorn, shepherd, sheepdog)
+* [ ] mail server
+* [+] https certificates
+* [X] http(s) proxy (nginx)
+* [X] CI/CD services (with github webhooks)
+* [+] git server (gitea or cgit)
+* [X] file server (formerly IPFS)
+
+Somewhat decoupled services:
+
+* [+] genecup
+* [ ] R/shiny power service Dave
+* [ ] biohackrxiv
+* [ ] covid19
+* [ ] guix publish server
 
 I am still missing a few! All run by a man and his diligent dog.
 
 For the future the orchestration needs to be more robust and resilient. This means:
 
- 1. A fallback for every service on a separate machine
- 2. Improved privacy protection for (future) human data
- 3. Separate servers serving different data sources
- 4. Partial synchronization between data sources
+* A fallback for every service on a separate machine
+* Improved privacy protection for (future) human data
+* Separate servers serving different data sources
+* Partial synchronization between data sources
 
 The only way we *can* scale is by adding machines. But the system is not yet ready for that. Also getting rid of monolithic primary databases in favor of files helps synchronization.