diff options
Diffstat (limited to 'issues/CI-CD/cd-is-slow.gmi')
| -rw-r--r-- | issues/CI-CD/cd-is-slow.gmi | 172 |
1 files changed, 172 insertions, 0 deletions
diff --git a/issues/CI-CD/cd-is-slow.gmi b/issues/CI-CD/cd-is-slow.gmi new file mode 100644 index 0000000..d888e82 --- /dev/null +++ b/issues/CI-CD/cd-is-slow.gmi @@ -0,0 +1,172 @@ +# CD is slow + +The pages are slow and some are broken. + +# Tags + +* type: bug +* status: open +* priority: high +* assigned: pjotrp +* interested: pjotrp, bonfacem +* keywords: deployment, server + +# Measuring + +bonfacekilz: +I'm currently instrumenting the requests. See what hogs up time. Loading the landing page takes up 32 seconds! + +Something's off. From outside the container: + +``` +123bonfacem@tux02 ~ $ guix shell python-wrapper python-requests -- python time.py +Status: 200 +Time taken: 32.989222288131714 seconds +``` + +From inside the container: + +``` +12025-07-18 14:46:36 INFO:gn2.wqflask:Landing page rendered in 8.12 seconds +``` + +And I see: + +## CD + +``` +> curl -w @- -o /dev/null -s https://cd.genenetwork.org <<EOF +\n +DNS lookup: %{time_namelookup}s\n +Connect time: %{time_connect}s\n +TLS handshake: %{time_appconnect}s\n +Pre-transfer: %{time_pretransfer}s\n +Start transfer: %{time_starttransfer}s\n +Total time: %{time_total}s\n +EOF + +DNS lookup: 8.117543s +Connect time: 8.117757s +TLS handshake: 8.197767s +Pre-transfer: 8.197861s +Start transfer: 33.096467s +Total time: 33.096601s +``` + +## Production +``` +> curl -w @- -o /dev/null -s https://genenetwork.org <<EOF +\n +DNS lookup: %{time_namelookup}s\n +Connect time: %{time_connect}s\n +TLS handshake: %{time_appconnect}s\n +Pre-transfer: %{time_pretransfer}s\n +Start transfer: %{time_starttransfer}s\n +Total time: %{time_total}s\n +EOF + +DNS lookup: 8.075794s +Connect time: 8.076402s +TLS handshake: 8.147322s +Pre-transfer: 8.147370s +Start transfer: 8.797107s +Total time: 8.797299s +``` + +## On tux02 (outside CD container) + +``` +> curl -w @- -o /dev/null -s http://localhost:9092 <<EOF +\n +DNS lookup: %{time_namelookup}s\n +Connect time: %{time_connect}s\n +TLS handshake: %{time_appconnect}s\n +Pre-transfer: %{time_pretransfer}s\n +Start transfer: %{time_starttransfer}s\n +Total time: %{time_total}s\n +EOF + +DNS lookup: 0.000068s +Connect time: 0.000543s +TLS handshake: 0.000000s +Pre-transfer: 0.000606s +Start transfer: 24.851069s +Total time: 24.851166s +``` + +This does not look like an nginx problem (at least on tux02 itself). Also the nginx configuration was not really changed. +The mysql configuration ditto. I can still test both, but it looks like the problem is inside the system container. + +The container logs are at + +``` +root@tux02:/export2/guix-containers/genenetwork-development/var/log/cd# tail -100 genenetwork2.log +``` + +Some interesting errors there that need resolving, such as + +``` +tail gn-guile.log +2025-07-20 04:49:49 X.509 certificate of 'sparql.genenetwork.org' could not be verified: +2025-07-20 04:49:49 signer-not-found invalid +``` + +Let's try to get inside the container with nsenter: + +``` +ps xau|grep genenetwork-development-container +root 115940 0.0 0.0 163692 26296 ? Ssl Jul18 0:00 /gnu/store/ylwk2vn18dkzkj0nxq2h4vjzhz17bm7c-guile-3.0.9/bin/guile --no-auto-compile /usr/local/bin/genenetwork-development-container +pgrep -P 115940 +115961 +``` + +Use this PID and a recent nsenter: + +``` +/gnu/store/w7a3frdmffpw3hvxpvvxwxgzfhyqdm6n-profile/bin/nsenter -m -p -t 115961 /run/current-system/profile/bin/bash -login +``` + +System tools are in '/run/current-system/profile/bin/' + +We can pick up curl (I note the system container has full access to the /gnu/store on the host: + +``` +root@tux02 /# /gnu/store/vdaspmq10c3zmqhp38lfqy812w6r4xg3-curl-8.6.0/bin/curl -w @- -o /dev/null -s http://localhost:9092 <<EOF +\n +DNS lookup: %{time_namelookup}s\n +Connect time: %{time_connect}s\n +TLS handshake: %{time_appconnect}s\n +Pre-transfer: %{time_pretransfer}s\n +Start transfer: %{time_starttransfer}s\n +Total time: %{time_total}s\n +EOF + +DNS lookup: 0.000064s +Connect time: 0.000478s +TLS handshake: 0.000000s +Pre-transfer: 0.000551s +Start transfer: 24.792926s +Total time: 24.793015s +``` + +That rules out container and nginx streaming issues. + +So the problem is with GN and its DBs. The gn-machines is used from /home/aruni and it checkout is March. Has CD been slow since then? I don't think so. Also the changes to the actual scripts are even older. Also the guix-bioinformatics repo shows no changes. Remaining culprits I suspect are: + +* [*] MySQL +* [ ] Interaction gn-auth with gn2 +* [ ] Interaction gnqa with gn2 + +Running a standard test on mysql shows it is fine: + +``` +time mysql -u webqtlout -pwebqtlout db_webqtl < $rundir/../shared/sql/test02.sql +Name FullName Name Symbol CAST(ProbeSet."description" AS BINARY) CAST(ProbeSet."Probe_Target_Description" AS BINARY) Chr Mb Mean LRS Locus pValue additive geno_chr geno_mb +HC_M2_0606_P Hippocampus Consortium M430v2 (Jun06) PDNN 1457545_at 9530036O11Rik long non-coding RNA, expressed sequence tag (EST) AK035474 with high bladder expression antisense EST 14 Kb upstream of Shh 5 28.480441 6.7419292929293 15.2845189682605 rsm10000001525 0.055 0.0434848484848485 3 9.671673 +HC_M2_0606_P Hippocampus Consortium M430v2 (Jun06) PDNN 1427571_at Shh sonic hedgehog (hedgehog) last exon 5 28.457886 6.50113131313131 9.58158655605723 rs8253327 0.697 0.0494097096188748 1 191.908118 +HC_M2_0606_P Hippocampus Consortium M430v2 (Jun06) PDNN 1436869_at Shh sonic hedgehog (hedgehog) mid distal 3' UTR 5 28.457155 9.279090909090911 12.7711275309832 rs8253327 0.306 -0.214087568058076 1 191.908118 + +real 0m0.010s +user 0m0.004s +sys 0m0.000s +``` |
