Diffstat (limited to 'issues')
144 files changed, 6220 insertions, 153 deletions
diff --git a/issues/CI-CD/cd-is-slow.gmi b/issues/CI-CD/cd-is-slow.gmi new file mode 100644 index 0000000..9b0e1ee --- /dev/null +++ b/issues/CI-CD/cd-is-slow.gmi @@ -0,0 +1,276 @@ +# CD is slow + +The pages are slow and some are broken. + +We found out that there are quite a few network calls using DNS - and DNS was slow. The configured DNS server was not responding. Using Google's DNS made things go fast again. We will probably introduce dnsmasq in the container to make things even faster. + +# Tags + +* type: bug +* status: in progress +* priority: high +* assigned: pjotrp +* interested: pjotrp, bonfacem +* keywords: deployment, server + +# Tasks + +* [ ] Use dnsmasq caching - it is a guix system service +* [ ] Run fewer gunicorn processes on CD (2 should do) +* [ ] Increase debugging output for GN2 +* [ ] Fix GN3 hook for github (it is not working) +* [X] gn-guile lacks certificates it can use for sparql + +# Measuring + +bonfacekilz: +I'm currently instrumenting the requests. See what hogs up time. Loading the landing page takes up 32 seconds! + +Something's off. 
From outside the container: + +``` +bonfacem@tux02 ~ $ guix shell python-wrapper python-requests -- python time.py +Status: 200 +Time taken: 32.989222288131714 seconds +``` + +From inside the container: + +``` +2025-07-18 14:46:36 INFO:gn2.wqflask:Landing page rendered in 8.12 seconds +``` + +And I see: + +## CD + +``` +> curl -w @- -o /dev/null -s https://cd.genenetwork.org <<EOF +\n +DNS lookup: %{time_namelookup}s\n +Connect time: %{time_connect}s\n +TLS handshake: %{time_appconnect}s\n +Pre-transfer: %{time_pretransfer}s\n +Start transfer: %{time_starttransfer}s\n +Total time: %{time_total}s\n +EOF + +DNS lookup: 8.117543s +Connect time: 8.117757s +TLS handshake: 8.197767s +Pre-transfer: 8.197861s +Start transfer: 33.096467s +Total time: 33.096601s +``` + +## Production +``` +> curl -w @- -o /dev/null -s https://genenetwork.org <<EOF +\n +DNS lookup: %{time_namelookup}s\n +Connect time: %{time_connect}s\n +TLS handshake: %{time_appconnect}s\n +Pre-transfer: %{time_pretransfer}s\n +Start transfer: %{time_starttransfer}s\n +Total time: %{time_total}s\n +EOF + +DNS lookup: 8.075794s +Connect time: 8.076402s +TLS handshake: 8.147322s +Pre-transfer: 8.147370s +Start transfer: 8.797107s +Total time: 8.797299s +``` + +## On tux02 (outside CD container) + +``` +> curl -w @- -o /dev/null -s http://localhost:9092 <<EOF +\n +DNS lookup: %{time_namelookup}s\n +Connect time: %{time_connect}s\n +TLS handshake: %{time_appconnect}s\n +Pre-transfer: %{time_pretransfer}s\n +Start transfer: %{time_starttransfer}s\n +Total time: %{time_total}s\n +EOF + +DNS lookup: 0.000068s +Connect time: 0.000543s +TLS handshake: 0.000000s +Pre-transfer: 0.000606s +Start transfer: 24.851069s +Total time: 24.851166s +``` + +This does not look like an nginx problem (at least on tux02 itself). Also the nginx configuration was not really changed. +The mysql configuration ditto. I can still test both, but it looks like the problem is inside the system container. 
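The curl runs above split the request into phases. To time just the name lookup from Python, something like the following might help. This is only a sketch: `time_lookup` and `summarise` are hypothetical helpers, not part of GN2, and the only library call used is the standard `socket.getaddrinfo`.

```python
import socket
import time


def time_lookup(hostname, port=443, resolver=socket.getaddrinfo):
    """Time a single name lookup and return the elapsed seconds.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    that the function can be exercised without touching the network.
    """
    start = time.perf_counter()
    resolver(hostname, port)
    return time.perf_counter() - start


def summarise(timings):
    """Return (count, total, mean) for a list of lookup times in seconds."""
    total = sum(timings)
    mean = total / len(timings) if timings else 0.0
    return len(timings), total, mean


# Example use (hostname taken from the curl runs above):
#   timings = [time_lookup("cd.genenetwork.org") for _ in range(3)]
#   print(summarise(timings))
```

If the lookups alone take seconds, the time is going into DNS and not into nginx or the application.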
+ +The container logs are at + +``` +root@tux02:/export2/guix-containers/genenetwork-development/var/log/cd# tail -100 genenetwork2.log +``` + +Some interesting errors there that need resolving, such as + +## gn-guile error + +``` +tail gn-guile.log +2025-07-20 04:49:49 X.509 certificate of 'sparql.genenetwork.org' could not be verified: +2025-07-20 04:49:49 signer-not-found invalid +``` + +Guile is not finding the certificates for our virtuoso server. It does work with curl, try + +``` +curl -G https://query.wikidata.org/sparql -H "Accept: application/json; charset=utf-8" --data-urlencode query="SELECT DISTINCT * where { + wd:Q158695 wdt:P225 ?o . +} limit 5" +{ + "head" : { + "vars" : [ "o" ] }, "results" : { "bindings" : [ { "o" : { + "type" : "literal", + "value" : "Arabidopsis thaliana" + } + } ] + } +``` + +Also inside the container: + +``` +curl http://localhost:8091/gene/aliases/Shh +``` + +renders the same error! X.509 certificate of 'query.wikidata.org' could not be verified. so it is a gn-guile issue. + +## GN2 error reporting + +Also there are too many gunicorn processes - and strikingly - no debug output. Also I see a missing robots.txt file (even though LLMs hardly honour them). + +Let's try to get inside the container with nsenter: + +``` +ps xau|grep genenetwork-development-container +root 115940 0.0 0.0 163692 26296 ? 
Ssl Jul18 0:00 /gnu/store/ylwk2vn18dkzkj0nxq2h4vjzhz17bm7c-guile-3.0.9/bin/guile --no-auto-compile /usr/local/bin/genenetwork-development-container +pgrep -P 115940 +115961 +``` + +Use this child PID and a recent nsenter: + +``` +/gnu/store/w7a3frdmffpw3hvxpvvxwxgzfhyqdm6n-profile/bin/nsenter -m -p -t 115961 /run/current-system/profile/bin/bash -login +``` + +System tools are in '/run/current-system/profile/bin/' + +Make it a one-liner with + +``` +/gnu/store/w7a3frdmffpw3hvxpvvxwxgzfhyqdm6n-profile/bin/nsenter -m -p -t $(pgrep -P `ps xau|grep genenetwork-development-container|awk '{print $2}'|sort -r|head -1`) /run/current-system/profile/bin/bash -login +``` + +Once inside we can pick up curl (I note the system container has full access to the /gnu/store on the host): + +``` +root@tux02 /# /gnu/store/vdaspmq10c3zmqhp38lfqy812w6r4xg3-curl-8.6.0/bin/curl -w @- -o /dev/null -s http://localhost:9092 <<EOF +\n +DNS lookup: %{time_namelookup}s\n +Connect time: %{time_connect}s\n +TLS handshake: %{time_appconnect}s\n +Pre-transfer: %{time_pretransfer}s\n +Start transfer: %{time_starttransfer}s\n +Total time: %{time_total}s\n +EOF + +DNS lookup: 0.000064s +Connect time: 0.000478s +TLS handshake: 0.000000s +Pre-transfer: 0.000551s +Start transfer: 24.792926s +Total time: 24.793015s +``` + +That rules out container and nginx streaming issues. + +So the problem is with GN and its DBs. The gn-machines repo is used from /home/aruni and its checkout is from March. Has CD been slow since then? I don't think so. Also the changes to the actual scripts are even older. Also the guix-bioinformatics repo shows no changes. 
Remaining culprits I suspect are: + +* [*] MySQL +* [ ] Interaction gn-auth with gn2 +* [ ] Interaction gnqa with gn2 + +Running a standard test on mysql shows it is fine: + +``` +time mysql -u webqtlout -pwebqtlout db_webqtl < $rundir/../shared/sql/test02.sql +Name FullName Name Symbol CAST(ProbeSet."description" AS BINARY) CAST(ProbeSet."Probe_Target_Description" AS BINARY) Chr Mb Mean LRS Locus pValue additive geno_chr geno_mb +HC_M2_0606_P Hippocampus Consortium M430v2 (Jun06) PDNN 1457545_at 9530036O11Rik long non-coding RNA, expressed sequence tag (EST) AK035474 with high bladder expression antisense EST 14 Kb upstream of Shh 5 28.480441 6.7419292929293 15.2845189682605 rsm10000001525 0.055 0.0434848484848485 3 9.671673 +HC_M2_0606_P Hippocampus Consortium M430v2 (Jun06) PDNN 1427571_at Shh sonic hedgehog (hedgehog) last exon 5 28.457886 6.50113131313131 9.58158655605723 rs8253327 0.697 0.0494097096188748 1 191.908118 +HC_M2_0606_P Hippocampus Consortium M430v2 (Jun06) PDNN 1436869_at Shh sonic hedgehog (hedgehog) mid distal 3' UTR 5 28.457155 9.279090909090911 12.7711275309832 rs8253327 0.306 -0.214087568058076 1 191.908118 + +real 0m0.010s +user 0m0.004s +sys 0m0.000s +``` + +# Profiling CD + +Ran a profiler against a traits page. See the following: + +=> /issues/CI-CD/profiling-flask + +## Results/Interpretation + +* By fixing gn-guile and gene-alias resolution, times dropped by ~10s. However, the page takes 37.9s to run. + +* Resolving a DNS name takes around 4.585s. We make 7 such lookups, totalling 32.09s. Typically, a traits page should take 8.79s. Subtracting the DNS time leaves (- 37.9 32.09) = 5.8s, so the DNS lookups explain the slowness: + +``` + ncall tottime percall cumtime percall filename:lineno(function) +---------------------------------------------------------------------------- + 7 0.00002618 3.741e-05 32.09 4.585 socket.py:938(getaddrinfo) +``` + +* The above is consistent with all the analysis I've done across all the profile dumps. 
+ +* Testing my theory out: + +``` +@app.route("/test-network") +def test_network(): + start = time.time() + http_url = urljoin( + current_app.config["GN_SERVER_URL"], + "version" + ) + result = requests.get(http_url) + duration = time.time() - start + app.logger.error(f"{http_url}: {duration:.4f}s") + + start = time.time() + local_url = "http://localhost:9093/api/version" + result = requests.get(local_url) + duration = time.time() - start + app.logger.error(f"{local_url}: {duration:.4f}s") + return result.json() +``` + +* Results: + +``` +2025-07-24 10:20:43 [2025-07-24 10:20:43 +0000] [101] [ERROR] https://cd.genenetwork.org/api3/version: 8.1647s +2025-07-24 10:20:43 ERROR:gn2.wqflask:https://cd.genenetwork.org/api3/version: 8.1647s +2025-07-24 10:20:43 [2025-07-24 10:20:43 +0000] [101] [ERROR] result: 1.0 +2025-07-24 10:20:43 ERROR:gn2.wqflask:result: 1.0 +2025-07-24 10:20:43 [2025-07-24 10:20:43 +0000] [101] [ERROR] http://localhost:9093/api/version: 0.0088s +2025-07-24 10:20:43 ERROR:gn2.wqflask:http://localhost:9093/api/version: 0.0088s +2025-07-24 10:20:43 [2025-07-24 10:20:43 +0000] [101] [ERROR] result: 1.0 +``` + +## Possible Mitigations + +* Switch over gn-auth.genenetwork.org to localhost. diff --git a/issues/CI-CD/configurations.gmi b/issues/CI-CD/configurations.gmi index 54cea47..acd2512 100644 --- a/issues/CI-CD/configurations.gmi +++ b/issues/CI-CD/configurations.gmi @@ -4,7 +4,7 @@ * assigned: aruni, fredm * priority: normal -* status: open +* status: closed, completed * keywords: CI, CD, configuration, config * type: bug @@ -38,3 +38,7 @@ and at least one of the values other than "localhost" is used to determine the c The secrets (e.g. SECRET_KEY, OAUTH_CLIENT_ID, OAUTH_CLIENT_SECRET, etc) can be encrypted and stored in some secrets management system (e.g. Pass [https://www.passwordstore.org/] etc.) 
setup in each relevant host: better yet, have all configurations (secret or otherwise) encrypted and stored in such a secrets management system and fetch them from there. This reduces the mental overhead of dealing with multiple places to fetch the configs. From these, the CI/CD system can them build and intern the configurations into the store with guix functions like "plain-file", "local-file", etc. + +## Notes + +This idea was mostly rejected — it seems — in favour of using external settings files that are shared with the running container and separate build scripts for the different environments. This mostly covers all the bases necessary to get the settings correct. diff --git a/issues/CI-CD/development-container-checklist.gmi b/issues/CI-CD/development-container-checklist.gmi new file mode 100644 index 0000000..7cf4687 --- /dev/null +++ b/issues/CI-CD/development-container-checklist.gmi @@ -0,0 +1,101 @@ +# Deploying GeneNetwork CD + +## Prerequisites + +Ensure you have `fzf' installed and Guix is set up with your preferred channel configuration. + + +## Step 1: Pull the Latest Profiles + +``` +guix pull -C channels.scm -p ~/.guix-extra-profiles/gn-machines --allow-downgrades +guix pull -C channels.scm -p ~/.guix-extra-profiles/gn-machines-shepherd-upgrade --allow-downgrades +``` + + +## Step 2: Source the Correct Profile + +``` +. ,choose-profile +``` + + +### Contents of `,choose-profile' + +This script lets you interactively select a profile using `fzf': + +``` +#!/bin/env sh + +export GUIX_PROFILE="$(guix package --list-profiles | fzf --multi)" +. 
"$GUIX_PROFILE/etc/profile" + +hash guix + +echo "Currently using: $GUIX_PROFILE" +``` + + +## Step 3: Verify the Profile + +``` +guix describe +``` + + +## Step 4: Pull the Latest Code + +``` +cd gn-machines +git pull +``` + + +## Step 5: Run the Deployment Script + +``` +./genenetwork-development-deploy.sh +``` + + +## Step 6: Restart the Development Container + +``` +sudo systemctl restart genenetwork-development-container +``` + + +## Step 7: Verify Changes + +Manually confirm that the intended changes were applied correctly. + + +# Accessing the Development Container on tux02 + +To enter the running container shell, ensure you're using the *parent* PID of the `shepherd' process. + + +## Step 1: Identify the Correct PID + +Use this command to locate the correct container parent process: + +``` +ps -u root -f --forest | grep -A4 '/usr/local/bin/genenetwork-development-container' | grep shepherd +``` + + +## Step 2: Enter the Container + +Replace `46804' with your actual parent PID: + +``` +sudo /home/bonfacem/.config/guix/current/bin/guix container exec 46804 \ + /gnu/store/m6c5hgqg569mbcjjbp8l8m7q82ascpdl-bash-5.1.16/bin/bash \ + --init-file /home/bonfacem/.guix-profile/etc/profile --login +``` + + +## Notes + +* Ensure the PID is the container’s *shepherd parent*, not a child process. +* Always double-check your environment and profiles before deploying. 
diff --git a/issues/CI-CD/failing-services-startup.gmi b/issues/CI-CD/failing-services-startup.gmi new file mode 100644 index 0000000..751e61c --- /dev/null +++ b/issues/CI-CD/failing-services-startup.gmi @@ -0,0 +1,236 @@ +# Failing Services' Startup + +## Tags + +* type: bug +* status: closed, completed +* priority: high +* assigned: fredm, bonfacem +* interested: pjotrp, bonfacem, aruni +* keywords: deployment, CI, CD + +## Description + +Upgrading guix to `34453b97005ff86355399df89c8827c57839d9c7` for CI/CD fails with: + +``` +2025-08-20 16:05:20 Backtrace: +2025-08-20 16:05:20 6 (primitive-load "/gnu/store/xbxd2zihw9dssrhips925gri0yn?") +2025-08-20 16:05:20 In ice-9/eval.scm: +2025-08-20 16:05:20 191:35 5 (_ _) +2025-08-20 16:05:20 In gnu/build/linux-container.scm: +2025-08-20 16:05:20 368:8 4 (call-with-temporary-directory #<procedure 7f014aa3a3f0?>) +2025-08-20 16:05:20 476:16 3 (_ "/tmp/guix-directory.VWRNbv") +2025-08-20 16:05:20 62:6 2 (call-with-clean-exit #<procedure 7f014aa1de80 at gnu/b?>) +2025-08-20 16:05:20 321:20 1 (_) +2025-08-20 16:05:20 In guix/build/syscalls.scm: +2025-08-20 16:05:20 1231:10 0 (_ 268566528) +2025-08-20 16:05:20 +2025-08-20 16:05:20 guix/build/syscalls.scm:1231:10: In procedure unshare: 268566528: Invalid argument +2025-08-20 16:05:20 Backtrace: +2025-08-20 16:05:20 4 (primitive-load "/gnu/store/xbxd2zihw9dssrhips925gri0yn?") +2025-08-20 16:05:20 In ice-9/eval.scm: +2025-08-20 16:05:20 191:35 3 (_ #f) +2025-08-20 16:05:20 In gnu/build/linux-container.scm: +2025-08-20 16:05:20 368:8 2 (call-with-temporary-directory #<procedure 7f014aa3a3f0?>) +2025-08-20 16:05:20 485:7 1 (_ "/tmp/guix-directory.VWRNbv") +2025-08-20 16:05:20 In unknown file: +2025-08-20 16:05:20 0 (waitpid #f #<undefined>) +2025-08-20 16:05:20 +2025-08-20 16:05:20 ERROR: In procedure waitpid: +2025-08-20 16:05:20 Wrong type (expecting exact integer): #f +``` + +Failing services: + +* genenetwork3: consistently +* genenetwork2: consistently +* gn-auth: 
intermittently + +## Troubleshooting Notes + +### Unable to run genenetwork2 in a shell container with the "-C" flag + +With the following channels: + +``` +$ guix describe +Generation 3 Aug 28 2025 03:56:44 (current) + gn-bioinformatics cffafde + repository URL: file:///home/bonfacem/guix-bioinformatics/ + branch: master + commit: cffafde125f3e711418d3ebb62eacd48a3efa8cf + guix-forge 3c8dc85 + repository URL: https://git.genenetwork.org/guix-forge/ + branch: main + commit: 3c8dc85a584c98bc90088ec1c85933d4d10e7383 + guix-past b14d7f9 + repository URL: https://codeberg.org/guix-science/guix-past + branch: master + commit: b14d7f997ae8eec788a7c16a7252460cba3aaef8 + guix 34453b9 + repository URL: https://codeberg.org/guix/guix + branch: master + commit: 34453b97005ff86355399df89c8827c57839d9c7 +``` + +Running: + +``` +$ guix shell -C genenetwork2 +``` + +Produces: + +``` +guix shell: error: unshare: 268566528: Invalid argument +Backtrace: + 16 (primitive-load "/export3/local/home/bonfacem/.guix-ext…") +In guix/ui.scm: + 2399:7 15 (run-guix . _) + 2362:10 14 (run-guix-command _ . _) +In ice-9/boot-9.scm: + 1752:10 13 (with-exception-handler _ _ #:unwind? _ # _) +In guix/status.scm: + 842:4 12 (call-with-status-report _ _) +In guix/store.scm: + 703:3 11 (_) +In ice-9/boot-9.scm: + 1752:10 10 (with-exception-handler _ _ #:unwind? _ # _) +In guix/store.scm: + 690:37 9 (thunk) + 1331:8 8 (call-with-build-handler _ _) + 1331:8 7 (call-with-build-handler #<procedure 7fc86bb50de0 at g…> …) +In guix/scripts/environment.scm: + 1205:11 6 (proc _) +In guix/store.scm: + 2212:25 5 (run-with-store #<store-connection 256.100 7fc87a46d820> …) +In guix/scripts/environment.scm: + 911:8 4 (_ _) +In gnu/build/linux-container.scm: + 485:7 3 (call-with-container _ _ #:namespaces _ #:host-uids _ # …) +In unknown file: + 2 (waitpid #f #<undefined>) +In ice-9/boot-9.scm: + 1685:16 1 (raise-exception _ #:continuable? _) + 1685:16 0 (raise-exception _ #:continuable? 
_) + +ice-9/boot-9.scm:1685:16: In procedure raise-exception: +Wrong type (expecting exact integer): #f +``` + +This is fixed by increasing the value of respawn-delay (default is 0.5s) to 5s. + + +### Unable to write to a temporary directory and issues with running git inside the g-exp + +Stack trace: +``` +2025-09-03 12:23:32 In ice-9/eval.scm: +2025-09-03 12:23:32 191:35 3 (_ #f) +2025-09-03 12:23:32 In gnu/build/linux-container.scm: +2025-09-03 12:23:32 368:8 2 (call-with-temporary-directory #<procedure 7f012241d3f0?>) +2025-09-03 12:23:32 485:7 1 (_ "/tmp/guix-directory.Bl6jtx") +2025-09-03 12:23:32 In unknown file: +2025-09-03 12:23:32 0 (waitpid #f #<undefined>) +2025-09-03 12:23:32 + +``` + +Cryptic message. Running the g-exps as a program shows: + +``` +Receiving objects: 100% (698/698), 16.18 MiB | 30.29 MiB/s, done. +Resolving deltas: 100% (49/49), done. +================================================== +error: cannot run less: No such file or directory +fatal: unable to execute pager 'less' +Backtrace: + 5 (primitive-load "/gnu/store/c9bvy90s5mglp6xdfkc1s4qkzj8?") +In ice-9/eval.scm: + 619:8 4 (_ #f) +In ice-9/boot-9.scm: + 142:2 3 (dynamic-wind #<procedure 7fa954b25880 at ice-9/eval.s?> ?) + 142:2 2 (dynamic-wind #<procedure 7fa94b7970c0 at ice-9/eval.s?> ?) +In ice-9/eval.scm: + 619:8 1 (_ #(#(#<directory (guile-user) 7fa954b03c80>))) +In guix/build/utils.scm: + 822:6 0 (invoke "git" "log" "--max-count" "1") + +guix/build/utils.scm:822:6: In procedure invoke: +ERROR: + 1. 
&invoke-error: + program: "git" + arguments: ("log" "--max-count" "1") + exit-status: 128 + term-signal: #f + stop-signal: #f +``` + +Fixed by adding "less" to the with-packages form and setting: + +``` +(setenv "TERM" "xterm-256color") + +``` + +### gn-auth: sqlite3.OperationalError: unable to open database file + +Despite having all file perms correctly set with 0644, we see: + +``` +Traceback (most recent call last): + File "/gnu/store/ag1m9bv22iwm3sq87xly35y138l6kzd7-profile/lib/python3.11/site-packages/flask/app.py", line 917, in full_dispatch_request + rv = self.dispatch_request() + ^^^^^^^^^^^^^^^^^^^^^^^ + File "/gnu/store/ag1m9bv22iwm3sq87xly35y138l6kzd7-profile/lib/python3.11/site-packages/flask/app.py", line 902, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/export/data/repositories/gn-auth/gn_auth/auth/authentication/oauth2/views.py", line 102, in authorise + return with_db_connection(__authorise__) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/export/data/repositories/gn-auth/gn_auth/auth/db/sqlite3.py", line 63, in with_db_connection + return func(conn) + ^^^^^^^^^^ + File "/export/data/repositories/gn-auth/gn_auth/auth/authentication/oauth2/views.py", line 90, in __authorise__ + return server.create_authorization_response(request=request, grant_user=user) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/gnu/store/ag1m9bv22iwm3sq87xly35y138l6kzd7-profile/lib/python3.11/site-packages/authlib/oauth2/rfc6749/authorization_server.py", line 297, in create_authorization_response + args = grant.create_authorization_response(redirect_uri, grant_user) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/export/data/repositories/gn-auth/gn_auth/auth/authentication/oauth2/grants/authorisation_code_grant.py", line 31, in create_authorization_response + 
response = super().create_authorization_response( + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/gnu/store/ag1m9bv22iwm3sq87xly35y138l6kzd7-profile/lib/python3.11/site-packages/authlib/oauth2/rfc6749/grants/authorization_code.py", line 158, in create_authorization_response + self.save_authorization_code(code, self.request) + File "/export/data/repositories/gn-auth/gn_auth/auth/authentication/oauth2/grants/authorisation_code_grant.py", line 45, in save_authorization_code + return __save_authorization_code__( + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/export/data/repositories/gn-auth/gn_auth/auth/authentication/oauth2/grants/authorisation_code_grant.py", line 106, in __save_authorization_code__ + return with_db_connection(lambda conn: save_authorisation_code(conn, code)) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/export/data/repositories/gn-auth/gn_auth/auth/db/sqlite3.py", line 63, in with_db_connection + return func(conn) + ^^^^^^^^^^ + File "/export/data/repositories/gn-auth/gn_auth/auth/authentication/oauth2/grants/authorisation_code_grant.py", line 106, in <lambda> + return with_db_connection(lambda conn: save_authorisation_code(conn, code)) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/export/data/repositories/gn-auth/gn_auth/auth/authentication/oauth2/models/authorization_code.py", line 92, in save_authorisation_code + cursor.execute( +sqlite3.OperationalError: unable to open database file +``` + +Fixed above by correctly mapping: + +``` +- (source auth-db-path) ++ (source (dirname auth-db-path)) +``` + +in the relevant g-exp, and making sure that the parent directory is set to #o775 (rwx for both user/group). 
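For context, SQLite creates its journal files next to the database file, so the parent directory has to be visible and writable, not just the file itself. A minimal Python sketch of a check follows. It assumes nothing about gn-auth beyond its use of `sqlite3`; `can_write_db` is a hypothetical helper, and a missing directory is used here only as a stand-in for a directory that is not mapped into the container.

```python
import os
import sqlite3
import tempfile


def can_write_db(db_path):
    """Try a small write to db_path; return (ok, error_message)."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            # A write forces SQLite to open the file and create its journal.
            conn.execute("CREATE TABLE IF NOT EXISTS probe (id INTEGER)")
            conn.commit()
        finally:
            conn.close()
        return True, ""
    except sqlite3.OperationalError as exc:
        return False, str(exc)


def demo():
    """Compare a database in an existing directory with one in a missing one."""
    with tempfile.TemporaryDirectory() as tmp:
        good = can_write_db(os.path.join(tmp, "auth.db"))
        bad = can_write_db(os.path.join(tmp, "not-mapped", "auth.db"))
    return good, bad
```

Calling `demo()` returns one result per path, which makes it easy to compare a working location with a suspect one from inside the container.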
+ +## Also See + +=> https://issues.guix.gnu.org/78356 Broken system and home containers +=> https://codeberg.org/guix/guix/src/commit/34453b97005ff86355399df89c8827c57839d9c7/guix/build/syscalls.scm#L1218-L1233 How "unshare" is defined +=> https://codeberg.org/guix/guix/src/commit/34453b97005ff86355399df89c8827c57839d9c7/gnu/build/linux-container.scm#L321 Where `unshare` is called diff --git a/issues/CI-CD/profiling-flask.gmi b/issues/CI-CD/profiling-flask.gmi new file mode 100644 index 0000000..2d0c539 --- /dev/null +++ b/issues/CI-CD/profiling-flask.gmi @@ -0,0 +1,33 @@ +# Profiling GN + +Use this simple structure: + +``` +from werkzeug.middleware.profiler import ProfilerMiddleware + + +app = Flask(__name__) +app.config["PROFILE"] = True +app.wsgi_app = ProfilerMiddleware( + app.wsgi_app, + restrictions=[40, "main"], + profile_dir="profiler_dump", + filename_format="{time:.0f}-{method}-{path}-{elapsed:.0f}ms.prof", +) +``` + + +You can use gprof2dot to visualise the profile + +``` +guix shell gprof2dot -- gprof2dot -f pstats 1753202013-GET-show_trait-37931ms.prof > 1753202013-GET-show_trait-37931ms.prof.dot +guix shell xdot -- xdot 1753202013-GET-show_trait-37931ms.prof.dot +``` + +Or snakeviz to visualize it: + + +``` +scp genenetwork:/home/bonfacem/profiling/1753202013-GET-show_trait-37931ms.prof /tmp/test +snakeviz 1753202013-GET-show_trait-37931ms.prof +``` diff --git a/issues/CI-CD/troubleshooting-within-the-development-container.gmi b/issues/CI-CD/troubleshooting-within-the-development-container.gmi new file mode 100644 index 0000000..3aa8c3b --- /dev/null +++ b/issues/CI-CD/troubleshooting-within-the-development-container.gmi @@ -0,0 +1,46 @@ +# Troubleshooting inside the GN dev container +* type: systems, debugging, container +* keywords: container, troubleshooting, logs, webhooks + +You need to find the development container so that you can begin troubleshooting: + +``` +ps -u root -f --forest | grep -A4 
'/usr/local/bin/genenetwork-development-container' | grep shepherd +``` + +Example output: + +``` +root 16182 16162 0 03:57 ? 00:00:04 \_ /gnu/store/n87px1cazqkav83npg80ccp1n777j08s-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/b4n5ax7l1ccia7sr123fqcjqi4vy03pv-shepherd-1.0.2/bin/shepherd --config /gnu/store/5ahb3745wlpa5mjsbk8j6frn78khvzzw-shepherd.conf +``` + +Get into the container: + +``` +# Use the correct pid and guix/bash path. + +sudo /home/bonfacem/.config/guix/current/bin/guix container exec 16182 /gnu/store/m6c5hgqg569mbcjjbp8l8m7q82ascpdl-bash-5.1.16/bin/bash --init-file /home/bonfacem/.guix-profile/etc/profile --login +``` + +All the gn-related logs can be found in "/var/log/cd": + +``` +genenetwork2.log +genenetwork3.log +gn-auth.log +gn-guile.log +``` + +All the nginx logs are in "/var/log/nginx". + +Sometimes, it's useful to trigger webhooks while troubleshooting. Here are all the relevant webhooks: + +``` +/gn-guile +/genenetwork2 +/genenetwork3 +/gn-libs +/gn-auth +``` + +Inside the container, we have "coreutils-minimal" and "curl" that you can use to troubleshoot. 
diff --git a/issues/acme-error.gmi b/issues/acme-error.gmi new file mode 100644 index 0000000..b31d04b --- /dev/null +++ b/issues/acme-error.gmi @@ -0,0 +1,106 @@ +# uACME Error: "urn:ietf:params:acme:error:unauthorized" + +## Tags + +* status: closed, completed +* priority: high +* type: bug +* assigned: fredm +* keywords: uacme, certificates, "urn:ietf:params:acme:error:unauthorized" + +## Description + +Sometimes, when we attempt to request TLS certificates from Let's Encrypt using uacme, we run into an error of the following form: + +``` +uacme: polling challenge status at https://acme-v02.api.letsencrypt.org/acme/chall/2399017717/599167439271/jFB2Pg +uacme: challenge https://acme-v02.api.letsencrypt.org/acme/chall/2399017717/599167439271/jFB2Pg failed with status invalid +uacme: the server reported the following error: +{ + "type": "urn:ietf:params:acme:error:unauthorized", + "detail": "128.xxx.xxx.xxx: Invalid response from http://sparql.genenetwork.org/.well-known/acme-challenge/N-P-mhiK04c-Iophbem4iFYsaB +yeaxeSyXHSijx3e6k: 404", + "status": 403 +} +uacme: running /gnu/store/zwqavgjqyk0f0krv8ndwhv3767f6cnx1-uacme-hook failed http-01 sparql.genenetwork.org N-P-mhiK04c-Iophbem4iFYsaBy +eaxeSyXHSijx3e6k N-P-mhiK04c-Iophbem4iFYsaByeaxeSyXHSijx3e6k.9dRdXFhCbqeDGWYndRd_hTh920rplmy-ef-_aLgjJJE +uacme: failed to authorize order at https://acme-v02.api.letsencrypt.org/acme/order/2399017717/438986245271 + +``` + +From the above error, we note that the request for the "/.well-known/..." path fails with a 404 code. Why? + +Let's try figuring it out; connect to the running container: + +``` +$ sudo guix container exec 89086 /run/current-system/profile/bin/bash --login +root@sparql /# cd /var/run/acme/acme-challenge/ +root@sparql /var/run/acme/acme-challenge# while true; do ls; sleep 0.5; clear; done +``` + +In a separate terminal, connect to the same container and run `/usr/bin/acme renew`. 
+ +The loop we created to list what files are created in the challenge directory outputs the file + +``` +root@sparql /var/run/acme/acme-challenge# while true; do ls; sleep 0.5; clear; done +Rm7qCec3naVvqPldGSGI9W4i9AceW0X3MUNSAbC7SVE +Rm7qCec3naVvqPldGSGI9W4i9AceW0X3MUNSAbC7SVE +⋮ +``` + +but we are still getting the same error: + +``` +uacme: challenge https://acme-v02.api.letsencrypt.org/acme/chall/2399017717/599184604221/7mTNdA failed with status invalid +uacme: the server reported the following error: +{ + "type": "urn:ietf:params:acme:error:unauthorized", + "detail": "128.169.5.101: Invalid response from http://sparql.genenetwork.org/.well-known/acme-challenge/Rm7qCec3naVvqPldGSGI9W4i9AceW0X3MUNSAbC7SVE: 404", + "status": 403 +} +uacme: running /gnu/store/zwqavgjqyk0f0krv8ndwhv3767f6cnx1-uacme-hook failed http-01 sparql.genenetwork.org Rm7qCec3naVvqPldGSGI9W4i9AceW0X3MUNSAbC7SVE Rm7qCec3naVvqPldGSGI9W4i9AceW0X3MUNSAbC7SVE.9dRdXFhCbqeDGWYndRd_hTh920rplmy-ef-_aLgjJJE +uacme: failed to authorize order at https://acme-v02.api.letsencrypt.org/acme/order/2399017717/438997397751 +``` + +meaning that somehow, nginx is not able to serve up this file. + +## Discovered Cause: 2025-10-20 + +There are 2 layers of nginx: the host nginx and the internal/container nginx. + +The host nginx was proxying directly to the virtuoso http server rather than proxying to the internal/container nginx. This led to the failure because the internal/container nginx handles the TLS/SSL certificates for the site. The host nginx should have offloaded the handling of the TLS/SSL certificates to the internal/container nginx, but since it was not going through the internal nginx, that led to the failure. 
+ +A simile of the error condition and the solution are in the sections below: + +### Error Condition: Wrong proxying + +In host's "nginx.conf": +``` +⋮ + proxy_pass http://localhost:<virtuoso-http-server-port>; +⋮ +``` + +In internal/container "nginx.conf": +``` +⋮ + proxy_pass http://localhost:<virtuoso-http-server-port>; +⋮ +``` + +### Solution/Fix + +In host's "nginx.conf": +``` +⋮ + proxy_pass http://localhost:<container-nginx-http-port>; +⋮ +``` + +In internal/container "nginx.conf": +``` +⋮ + proxy_pass http://localhost:<virtuoso-http-server-port>; +⋮ +``` diff --git a/issues/add-documentation-and-data-retrieval-for-AI-repo.gmi b/issues/add-documentation-and-data-retrieval-for-AI-repo.gmi index 11f8f30..a96c18d 100644 --- a/issues/add-documentation-and-data-retrieval-for-AI-repo.gmi +++ b/issues/add-documentation-and-data-retrieval-for-AI-repo.gmi @@ -6,7 +6,6 @@ * priority: high * type: ui * keywords: phenotypes -* status: stalled ## Description @@ -15,3 +14,4 @@ * Share alternate way of getting sparql json-ld data from public endpoint outside isql. * Share json-ld gotchas. +* closed diff --git a/issues/add-genotype-files-to-rdf.gmi b/issues/add-genotype-files-to-rdf.gmi index 85ac39c..856c070 100644 --- a/issues/add-genotype-files-to-rdf.gmi +++ b/issues/add-genotype-files-to-rdf.gmi @@ -3,7 +3,7 @@ ## Tags * assigned: bonfacem * type: bug -* status: open, in progress +* status: stalled In Penguin2, genotype files are located in: /export/data/genenetwork/genotype_files/genotype. 
Each genotype file has an identifier to a dataset it refers to: diff --git a/issues/add-unique-identifiers-for-case-attributes.gmi b/issues/add-unique-identifiers-for-case-attributes.gmi new file mode 100644 index 0000000..0c3123d --- /dev/null +++ b/issues/add-unique-identifiers-for-case-attributes.gmi @@ -0,0 +1,11 @@ +# Add Case Attributes to RDF + +## Tags + +* assigned: bonfacem +* priority: high +* status: open + +## Description + +Add case attributes and their metadata into RDF. diff --git a/issues/assorted-ui-issues.gmi b/issues/assorted-ui-issues.gmi new file mode 100644 index 0000000..5fbacea --- /dev/null +++ b/issues/assorted-ui-issues.gmi @@ -0,0 +1,36 @@ +# Various UI issues raised by Rob (8/19/2024) + +# Tags + +* assigned: zsloan +* keywords: user-interface +* priority: medium +* open + +## Tasks + +* [X] Fix collection encoding issue + +* [X] Don't import empty collections (like the Default Collection) + +* [X] Update/Creation dates aren't listed for collections + +* [X] Remove in-between ticks for Effect Size Plot (from mapping page) so it's just -1/0/1 + +* [X] Also make Effect Size Plot narrower + +* [X] Prevent X/Y-axis summary text from extending beyond the graph width + +* [X] Longer tick markers as well + +* [X] Remove triangle for phenotype mapping + +* [X] Remove ProbeSetPosition from mapping for traits with no position + +* [X] Make Haplotype legend image thicker + change text to Haplotypes (Mat, Pat, Het, Unknown) + +* [X] Change "Sequence Site" in legend to "Gene Location" + +* [X] When adding genotype marker as covariate (for scatter-plot, maybe also mapping), change description to Position instead of "undefined" + +* [ ] Check Add Covariation colorbox popup on Apple laptop (it shows up weird for Rob, but normal for me) diff --git a/issues/auth/masquarade-as-bug.gmi b/issues/auth/masquarade-as-bug.gmi index 12c2c5f..36fe34a 100644 --- a/issues/auth/masquarade-as-bug.gmi +++ b/issues/auth/masquarade-as-bug.gmi @@ -2,6 +2,7 @@ * 
assigned: fredm * tags: critical +* status: closed, completed Right now you can't masquerade as another user. Here's the trace: diff --git a/issues/auth/reset-password-feature.gmi b/issues/auth/reset-password-feature.gmi index 8eaaa6a..299f915 100644 --- a/issues/auth/reset-password-feature.gmi +++ b/issues/auth/reset-password-feature.gmi @@ -1,6 +1,16 @@ # Reset/Forgot Password Feature for GN2 +# Tags + * assigned: fredm -* tags: critical +* priority: critical +* status: closed +* keywords: gn-auth, auth, reset password +* type: feature-request + +## Description Should a user forget his/her password, there's no clear way to reset the password. + +This issue is +=> https://git.genenetwork.org/gn-auth/tree/gn_auth/auth/authorisation/users/views.py?id=e829074e99fd5bec033765d18d5efa55e1edce44#n454 implemented with the latest code. diff --git a/issues/cleanup-base-file-gn2.gmi b/issues/cleanup-base-file-gn2.gmi new file mode 100644 index 0000000..8a05323 --- /dev/null +++ b/issues/cleanup-base-file-gn2.gmi @@ -0,0 +1,30 @@ +# Cleanup GN2 Base HTML File + +## Tags + +* Assigned: alexm +* Keywords: base, HTML, JavaScript, cleanup +* type: Refactoring +* Status: closed, completed, done + +## Description + +The base file should contain no custom JavaScript since it is inherited by almost all files in GN2. It should only include what is necessary. As a result, we need to move the global search from the base file to the index page, which renders the GN2 home. + +## Tasks + +* [x] Remove global search code from the base file and move it to the index page +* [x] Fix formatting and linting issues in the base file (e.g., tags) +* [x] Inherit from index page for all gn2 templates + + +## Notes + +See the PR that seeks to fix this: +=> https://github.com/genenetwork/genenetwork2/pull/877 + +## Notes 26/09/2024 + +It was agreed that global search should be a feature for all pages. +As such, all files need to inherit from the template which +defines the global search.
\ No newline at end of file diff --git a/issues/correlation-timing-out.gmi b/issues/correlation-timing-out.gmi index 419524d..bed8692 100644 --- a/issues/correlation-timing-out.gmi +++ b/issues/correlation-timing-out.gmi @@ -5,7 +5,7 @@ * assigned: fredm, zsloan, alexm * type: bug * priority: high -* status: ongoing +* status: closed, completed * keywords: correlations ## Description @@ -17,3 +17,7 @@ Do correlations against the same dataset This might be the same issue as the one in => /issues/correlation-missing-file correlation-missing-file.gmi but I'm not sure. + +## Close as completed + +This is fixed. diff --git a/issues/create-custom-rif-xapian-index.gmi b/issues/create-custom-rif-xapian-index.gmi new file mode 100644 index 0000000..a0b9039 --- /dev/null +++ b/issues/create-custom-rif-xapian-index.gmi @@ -0,0 +1,16 @@ +# Create Custom RIF XAPIAN Index + +## Tags + +* assigned: bonfacem +* priority: medium +* status: in-progress +* deadline: 2024-10-23 Wed + +## Description + +Given the GN Wiki search page: + +=> https://cd.genenetwork.org/genewiki GeneWiki Entries Search + +We only search by symbol. Add custom XAPIAN index to perform more powerful search. diff --git a/issues/edit-rif-metadata.gmi b/issues/edit-rif-metadata.gmi new file mode 100644 index 0000000..546dc80 --- /dev/null +++ b/issues/edit-rif-metadata.gmi @@ -0,0 +1,121 @@ +# Edit RIF Metadata in GN2 + +## Tags + +* assigned: bonfacem, jnduli +* priority: high +* status: closed + +## Tasks + +### Viewing +* [X] API: Get WIKI/RIF by symbol from rdf. + +> GET /wiki/<symbol> + +``` +[{ + "symbol": "XXXX", + "reason": "XXXX", + "species": "XXXX", + "pubmed_ids": ["XXXX", "XXXX"], // empty array when non-existent + "web_url": "XXXX" // Optional + "comment": "XXXX", + "email": "XXXX", + "categories": ["XXXX", "XXXX"], // Enumeration + "version": "XXXX", + "initial": "XXXX", // Optional user or project code or your initials. 
+}] +``` + +* [X] UI: Modify traits page to have "GN2 (GeneWiki)" +* [X] UI: Integrate with API + +### Editing + +* [X] API: Edit comment by id in mysql/rdf: modifies GeneRIF and GeneRIFXRef tables. +* [X] API: Modify edit comments by id to include RDF changes. + +> POST /wiki/<comment-id>/edit + +``` +{ + "symbol": "XXXX", + "reason": "XXXX", + "species": "XXXX", + "pubmed_ids": ["XXXX", "XXXX"], // Optional + "web_url": "XXXX" // Optional + "comment": "XXXX", + "email": "XXXX", + "categories": ["XXXX", "XXXX"], // Enumeration + "initial": "XXXX", // Optional user or project code or your initials. +} +``` +* [X] UI: Add buttons that edit various relevant sections. +* [X] UI: Edit page needs to fetch categories from GeneCategory table. When comment write fails, alert with error. When comment write success, update the comment on the page, and alert with success. +* [X] API: Modify edit comments by id to include RDF changes. +* [X] GN auth integration + +### History + +* [X] API: End-point to fetch all the historical data +* [X] UI: Page that contains history for how comments changes. + +> GET /wiki/<comment-id>/history + +``` +[{ + "symbol": "XXXX", + "reason": "XXXX", + "species": "XXXX", + "pubmed_ids": ["XXXX", "XXXX"], // Optional + "web_url": "XXXX" // Optional + "comment": "XXXX", + "email": "XXXX", + "categories": ["XXXX", "XXXX"], // Enumeration + "version": "XXXX", + "initial": "XXXX", // Optional user or project code or your initials. +}] +``` + +### Misc ToDos: + +* [X] Review performance of query used in 72d9a24e8e65 [Genenetwork3] + +### Ops + +* [X] RDF synchronization with SQL (gn-machines). +* [X] Update RDF in tux02. +* [X] UI: Add "edit" button after testing. + +### Resolution + +Genenetwork2: +=> https://github.com/genenetwork/genenetwork2/pull/858 UI/fetch rif using recent apis #858 +=> https://github.com/genenetwork/genenetwork2/pull/864 Add comment history page. 
#864 +=> https://github.com/genenetwork/genenetwork2/pull/865 Add support for auth in Rif Edit #865 +=> https://github.com/genenetwork/genenetwork2/pull/866 Add a page for searching GeneWiki by symbol. #866 +=> https://github.com/genenetwork/genenetwork2/pull/881 Add display page for NCBI RIF metadata. #881 +=> https://github.com/genenetwork/genenetwork2/pull/881 Add display page for NCBI RIF metadata. #881 +=> https://github.com/genenetwork/genenetwork2/pull/882 GN editting UI improvements #882 + + +GeneNetwork3: +=> https://github.com/genenetwork/genenetwork3/pull/180 Update script that updates Generif_BASIC table #180 +=> https://github.com/genenetwork/genenetwork3/pull/181 Add case insensitive prefixes for rif wiki #181 +=> https://github.com/genenetwork/genenetwork3/pull/184 Api/get wiki from rdf #184 +=> https://github.com/genenetwork/genenetwork3/pull/185 feat: add api calls to get categories and last comment #185 +=> https://github.com/genenetwork/genenetwork3/pull/186 Api/fetch the latest wiki by versionid #186 +=> https://github.com/genenetwork/genenetwork3/pull/187 Api/get end point to fetch all historical data #187 +=> https://github.com/genenetwork/genenetwork3/pull/189 Add auth to edit RIF api call #189 +=> https://github.com/genenetwork/genenetwork3/pull/190 Api/update rif queries #190 +=> https://github.com/genenetwork/genenetwork3/pull/193 Api/edit rif endpoint #193 +=> https://github.com/genenetwork/genenetwork3/pull/194 Fix C0411/C0412 pylint errors in gn3.api.metadata.api.wiki. #194 +=> https://github.com/genenetwork/genenetwork3/pull/195 Add rif tests #195 +=> https://github.com/genenetwork/genenetwork3/pull/196 Handle missing GN3_SECRETS for CI testing. #196 +=> https://github.com/genenetwork/genenetwork3/pull/197 Rif edit atomicity #197 +=> https://github.com/genenetwork/genenetwork3/pull/198 Run tests against Virtuoso that is spun locally. #198 +=> https://github.com/genenetwork/genenetwork3/pull/199 Add rdf-tests after the check phase. 
#199 +=> https://github.com/genenetwork/genenetwork3/pull/200 Api/ncbi metadata #200 + +* closed diff --git a/issues/editing-dataset-metadata.gmi b/issues/editing-dataset-metadata.gmi index 17d1693..70876e0 100644 --- a/issues/editing-dataset-metadata.gmi +++ b/issues/editing-dataset-metadata.gmi @@ -5,7 +5,7 @@ * assigned: bonfacem * priority: high * type: editing -* status: in-progress +* status: stalled * keywords: metadata editing ## Description diff --git a/issues/error-handling-external-errors.gmi b/issues/error-handling-external-errors.gmi index d1707de..640e1d1 100644 --- a/issues/error-handling-external-errors.gmi +++ b/issues/error-handling-external-errors.gmi @@ -3,7 +3,7 @@ ## Tags * assigned: fredm -* status: open +* status: closed * type: bug * priority: high * keywords: error handling diff --git a/issues/fix-global-search-ui.gmi b/issues/fix-global-search-ui.gmi new file mode 100644 index 0000000..2979d99 --- /dev/null +++ b/issues/fix-global-search-ui.gmi @@ -0,0 +1,24 @@ +# Fix Broken Global Search UI + +## Tags + +* Assigned: alexm, zsloan +* Priority: high +* status: in progress +* Keyword : search, UI, bug, Refactor +* Type: UI, bug + +## Description + +The Global search UI layout is broken on certain browser versions. +This issue was reported to occur for **Firefox Version 128.3.1** ESR Version. +The root cause of the problem is unclear, +but after reviewing the global search UI code, +the following changes need to be implemented (see tasks below): + + + +## Tasks + +* [ ] Remove custom layout CSS and replace it with the Bootstrap layout for better uniformity and easier debugging. +* [ ] Modify the navbar to extend across the full width of the page on medium and small devices. 
diff --git a/issues/fix-pairscan-mapping.gmi b/issues/fix-pairscan-mapping.gmi new file mode 100644 index 0000000..1b48fee --- /dev/null +++ b/issues/fix-pairscan-mapping.gmi @@ -0,0 +1,28 @@ +# Fix Pairscan Mapping + +## Tags + +* assigned: alexm, +* priority: medium, +* type: bug +* keywords: pairscan, debug, fix, mapping + +## Description +Pairscan mapping is currently not working: + +Error: + +``` +GeneNetwork 3.12-rc1 https://genenetwork.org/run_mapping ( 1:01PM UTC Jan 13, 2025) +Traceback (most recent call last): + File "/gnu/store/cxawl32jm0fgavc9ahcr3g0j66zdan30-profile/lib/python3.10/site-packages/flask/app.py", line 1523, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/cxawl32jm0fgavc9ahcr3g0j66zdan30-profile/lib/python3.10/site-packages/flask/app.py", line 1509, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) + File "/gnu/store/cxawl32jm0fgavc9ahcr3g0j66zdan30-profile/lib/python3.10/site-packages/gn2/wqflask/views.py", line 1035, in mapping_results_page + template_vars = run_mapping.RunMapping(start_vars, + File "/gnu/store/cxawl32jm0fgavc9ahcr3g0j66zdan30-profile/lib/python3.10/site-packages/gn2/wqflask/marker_regression/run_mapping.py", line 312, in __init__ + self.geno_db_exists = geno_db_exists(self.dataset, results[0]['name']) + KeyError: 'name' + +``` \ No newline at end of file diff --git a/issues/fix-rqtl-rm-bug.gmi b/issues/fix-rqtl-rm-bug.gmi new file mode 100644 index 0000000..de71487 --- /dev/null +++ b/issues/fix-rqtl-rm-bug.gmi @@ -0,0 +1,95 @@ +# Investigate and Fix `rm` Command in `rqtl` Logs + +## Tags + +* assigned: alex, bonfacem +* type: Bug +* status: in progress +* keywords: external, qtl, rqtl, bug, logs + +## Description + +For QTL analysis, we invoke the `rqtl` script as an external process through Python's `subprocess` module. 
+For reference, see the `rqtl_wrapper.R` script: +=> https://github.com/genenetwork/genenetwork3/blob/main/scripts/rqtl_wrapper.R + +The issue is that, upon analyzing the logs for `rqtl`, we see that an `rm` command is unexpectedly invoked: + +``` +sh: line 1: rm: command not found +``` + +This command cannot be traced to its origin, and it does not appear to be part of the expected behavior. + +The issue is currently observed only in the CD environment. The only way I have attempted to reproduce this locally is by invoking the command in a shell environment with string injection, which is not the case for GeneNetwork3, where all strings are parsed and passed as a list argument. + +Here’s an example of the above attempt: + +```python +def run_process(cmd, output_file, run_id): + """Function to execute an external process and capture the stdout in a file. + + Args: + cmd: The command to execute, provided as a list of arguments. + output_file: Absolute file path to write the stdout. + run_id: Unique ID to identify the process. + + Returns: + A dictionary with the results, indicating success or failure. + """ + cmd.append(" && rm") # Injecting potentially problematic command + cmd = " ".join(cmd) # The command is passed as a string + + try: + # Phase: Execute the command in a shell environment + with subprocess.Popen( + cmd, + shell=True, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT, + ) as process: + # Process output handling goes here +``` + +The error generated at the end of the `rqtl` run, if `rm` does not exist inside the container, is: + +``` +sh: line 1: rm: command not found +``` + +The actual code for GeneNetwork3 is: + +```python +def run_process(cmd, output_file, run_id): + """Function to execute an external process and capture the stdout in a file. + + Args: + cmd: The command to execute, provided as a list of arguments. + output_file: Absolute file path to write the stdout. + run_id: Unique ID to identify the process.
+ + Returns: + A dictionary with the results, indicating success or failure. + """ + try: + # Phase: Execute the command in a shell environment + with subprocess.Popen( + cmd, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT, + ) as process: + # Process output handling goes here +``` + +## Investigated and Excluded Possibilities + +* [x] The `rm` command is not explicitly invoked within the `rqtl` script. +* [x] The `rqtl` command is passed as a list of parsed arguments (i.e., no direct string injection). +* [x] The subprocess is not invoked within a shell environment, which would otherwise result in string injection. +* [x] We simulated invoking a system command within the `rqtl` script, but the error does not match the observed issue. + +## TODO + +* [ ] Test in a similar environment to the CD environment to replicate the issue. + +* [ ] Investigate the internals of the QTL library for any unintended `rm` invocation. diff --git a/issues/fix-spam-entries-in-gn-auth-production.gmi b/issues/fix-spam-entries-in-gn-auth-production.gmi index db88eec..5ef7a42 100644 --- a/issues/fix-spam-entries-in-gn-auth-production.gmi +++ b/issues/fix-spam-entries-in-gn-auth-production.gmi @@ -2,6 +2,7 @@ # Tags +* status: closed, completed * assigned: fredm * keywords: auth @@ -13,4 +14,8 @@ We have spam entries in gn-auth in production in the groups table: b59229de-2fce-4a3d-82f1-d9eeee9b7009|Business For Sale Adelaide|{"group_description": "Welcome to Business2Sell, the ultimate online platform for those seeking affordable business opportunities in Adelaide. As a trusted first-party provider, we offer the ideal marketplace for buying or selling businesses across the country. Whether you're an aspiring entrepreneur looking for your next venture or a business owner ready to sell, Business2Sell provides the perfect platform for you. Our user-friendly interface and extensive listings make it effortless to discover a wide range of businesses, all within your budget. 
Join our vibrant community of buyers and sellers today, and let us help you achieve your business goals in Adelaide with ease and confidence.\r\nhttps://www.business2sell.com.au/businesses/sa/adelaide"} ``` +## Close as completed +We added email verification when registering, which should help reduce the success of these automated bots. + +We also added tooling to help with users and groups management, which is helping clean up these spam data. diff --git a/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi b/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi new file mode 100644 index 0000000..a0b2c5c --- /dev/null +++ b/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi @@ -0,0 +1,80 @@ +# GEMMA output differs from R/qtl2 + +# Tags + +* assigned: pjotrp, davea +* priority: high +* type: bug, enhancement +* status: closed +* keywords: database, gemma, reaper, rqtl2 + +# Description + +When running trait BXD_21526 results differ significantly. + +=> https://genenetwork.org/show_trait?trait_id=21526&dataset=BXDPublish +=> https://genenetwork.org/show_trait?trait_id=21529&dataset=BXDPublish + +So I confirm I am getting the same results as Dave in GN for GEMMA (see Conclusion below). + +# Tasks + +## GeneNetwork + +I run GEMMA for precompute on the command line and that I confirmed to +be the same as what we see in the browser. This suggests either data +or method is different with Dave's approach. + +I confirmed that gemma in GN matches Dave's results. It is interesting +to see that running without LOCO has some impact, but not as bad as +the R/qtl2 difference. First we should check the genotype files to see +if they match. I checked that the phenotypes match. + +Our inputs are different if I count genotypes (first yours, the other +on production): + +``` + 1 2184941 B + 2 2132744 D + 3 628980 H + 1 2195662 B + 2 2142959 D + 3 650168 H +``` + +The number of rows/markers is the same. 
So we probably added some +genotypes, but if we miss one that would matter. Dave, you can find +the file in /home/wrk/BXD.geno on tux02 if you want to look. + +I notice that we don't use H in the R/qtl2 control file. That +might make a difference though it probably won't explain what we see +now. BTW I also correlated the LOD scores from GEMMA and R/qtl2 in +the spreadsheet and, at 0.7, the correlation is too low. So it is probably not +just a magnitude problem. The results differ a lot in your +spreadsheet. + +Next step is that I need to run R/qtl2 using the script in your +dropbox and see what Karl's code does. The exercise does not hurt +because it will help us bring R/qtl2 to GN. + +## R/qtl2 + +R/qtl2 is packaged in guix and can be run in a shell with + +``` +guix shell -C r r-qtl2 +> library(qtl2) +> bxd <- read_cross2(file = "bxd_cancer_new_GN_July_2024.json") +Warning messages: +1: In recode_geno(sheet, genotypes) : + 630519 genotypes treated as missing: "H", "U" +2: In matrix(as.numeric(unlist(pheno)), ncol = nc) : + NAs introduced by coercion +3: In check_cross2(output) : Physical map out of order on chr 1, 2, 11, 19 +``` + +The first warning matches the genotype counts above. If data is missing it may be filtered out. We'll have to check for that. The third warning I am not sure about. Probably a ranking of markers. + +# Conclusion + +It turned out that R/qtl was running HK (Haley-Knott regression) - so it was a QTL mapping rather than an LMM.
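
The per-code genotype tally used in the GEMMA comparison above can be reproduced with a few lines of Python. This is only a sketch: it assumes a tab-separated `.geno` file in which `#` lines are comments, `@` lines are metadata, the first remaining row is the column header, and the first four columns (Chr, Locus, cM, Mb) precede the per-strain genotype codes. The name `count_genotypes` and the `n_meta_columns` parameter are made up for illustration and are not part of GeneNetwork.

```python
from collections import Counter


def count_genotypes(lines, n_meta_columns=4):
    """Tally genotype codes (e.g. B, D, H, U) over all markers.

    `lines` is any iterable of text lines from a .geno file.
    `n_meta_columns` is an assumption about the layout: the number of
    leading columns (Chr, Locus, cM, Mb) that are not genotype codes.
    """
    counts = Counter()
    header_seen = False
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines, '#' comments and '@' metadata lines.
        if not line or line.startswith(("#", "@")):
            continue
        if not header_seen:
            # The first remaining row is taken to be the column header.
            header_seen = True
            continue
        fields = line.split("\t")
        counts.update(fields[n_meta_columns:])
    return counts
```

Running it over both genotype files (for instance by passing an open file handle for each) and comparing the two counters would show which codes account for the difference.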
diff --git a/issues/genenetwork/cannot-connect-to-mariadb.gmi b/issues/genenetwork/cannot-connect-to-mariadb.gmi new file mode 100644 index 0000000..3dfe1bc --- /dev/null +++ b/issues/genenetwork/cannot-connect-to-mariadb.gmi @@ -0,0 +1,121 @@ +# Cannot Connect to MariaDB + + +## Description + +GeneNetwork3 is failing to connect to mariadb with the error: + +``` +⋮ +2024-11-05 14:49:00 Traceback (most recent call last): +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/flask/app.py", line 1523, in full_dispatch_request +2024-11-05 14:49:00 rv = self.dispatch_request() +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/flask/app.py", line 1509, in dispatch_request +2024-11-05 14:49:00 return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/gn3/api/menu.py", line 13, in generate_json +2024-11-05 14:49:00 with database_connection(current_app.config["SQL_URI"], logger=current_app.logger) as conn: +2024-11-05 14:49:00 File "/gnu/store/lzw93sik90d780n09svjx5la1bb8g3df-python-3.10.7/lib/python3.10/contextlib.py", line 135, in __enter__ +2024-11-05 14:49:00 return next(self.gen) +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/gn3/db_utils.py", line 34, in database_connection +2024-11-05 14:49:00 connection = mdb.connect(db=db_name, +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/MySQLdb/__init__.py", line 121, in Connect +2024-11-05 14:49:00 return Connection(*args, **kwargs) +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/MySQLdb/connections.py", line 195, in __init__ +2024-11-05 14:49:00 super().__init__(*args, **kwargs2) +2024-11-05 14:49:00 
MySQLdb.OperationalError: (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)") +``` + +We have previously defined the default socket file[^1][^2] as "/run/mysqld/mysqld.sock". + +## Troubleshooting Logs + +### 2024-11-05 + +I attempted to just bind `/run/mysqld/mysqld.sock` to `/tmp/mysql.sock` by adding the following mapping in GN3's `gunicorn-app` definition: + +``` +(file-system-mapping + (source "/run/mysqld/mysqld.sock") + (target "/tmp/mysql.sock") + (writable? #t)) +``` + +but that does not fix things. + +I had tried to change the mysql URI to use IP addresses, i.e. + +``` +SQL_URI="mysql://webqtlout:webqtlout@128.169.5.119:3306/db_webqtl" +``` + +but that simply changes the error from the above to the one below: + +``` +2024-11-05 15:27:12 MySQLdb.OperationalError: (2002, "Can't connect to MySQL server on '128.169.5.119' (115)") +``` + +I tried with both `127.0.0.1` and `128.169.5.119`. + +My hail-mary was to attempt to expose the `my.cnf` file generated by the `mysql-service-type` definition to the "pola-wrapper", but that is proving tricky, seeing as the file is generated elsewhere[^4] and we do not have a way of figuring out the actual final path of the file. + +I tried: + +``` +(file-system-mapping + (source (mixed-text-file "my.cnf" + (string-append "[client]\n" + "socket=/run/mysqld/mysqld.sock"))) + (target "/etc/mysql/my.cnf")) +``` + +but that did not work either. + +### 2024-11-07 + +Start digging into how GNU Guix services are defined[^5] to try and understand why the file mapping attempt did not work. + +=> http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/system/file-systems.scm?id=2394a7f5fbf60dd6adc0a870366adb57166b6d8b#n575 +Looking at the code linked above specifically at lines 575 to 588, and 166, it seems, to me, that the mappings attempt should have worked. 
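
The traceback shows the client falling back to `/tmp/mysql.sock`, while the server socket is defined as `/run/mysqld/mysqld.sock`. One way to rule out the client-side default, independent of the container mappings, would be to name the socket explicitly when connecting. The sketch below only builds the keyword arguments; `connection_kwargs` and the `unix_socket` query parameter on the URI are hypothetical and are not how `gn3.db_utils` currently works.

```python
from urllib.parse import parse_qs, urlparse


def connection_kwargs(sql_uri, default_socket="/run/mysqld/mysqld.sock"):
    """Turn a mysql:// URI into keyword arguments for MySQLdb.connect.

    Hypothetical helper: for a localhost URI the socket path is made
    explicit (optionally overridden with ?unix_socket=...), so the
    client library's built-in default is never consulted.
    """
    parsed = urlparse(sql_uri)
    query = parse_qs(parsed.query)
    kwargs = {
        "db": parsed.path.lstrip("/"),
        "user": parsed.username,
        "passwd": parsed.password,
    }
    host = parsed.hostname or "localhost"
    if host == "localhost":
        # Local connections go through the unix socket.
        kwargs["unix_socket"] = query.get("unix_socket", [default_socket])[0]
    else:
        # Anything else is treated as a TCP connection.
        kwargs["host"] = host
        kwargs["port"] = parsed.port or 3306
    return kwargs
```

The resulting dictionary could then be passed on as `MySQLdb.connect(**kwargs)`; `unix_socket` is a keyword that mysqlclient accepts.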
+ +Try it again, taking care to verify that the paths are correct, with: + +``` +(file-system-mapping + (source (mixed-text-file "my.cnf" + (string-append "[client-server]\n" + "socket=/run/mysqld/mysqld.sock"))) + (target "/etc/my.cnf")) +``` + +Try rebuilding on tux04: started getting `Segmentation fault` errors out of the blue for many guix commands 🤦🏿. +Try building container on local dev machine: this took a long time - quit and continue later. + +### 2024-11-08 + +After guix broke, causing the `Segmentation fault` errors above, I did some troubleshooting and was able to finally fix that by pinning guix to version b0b988c41c9e0e591274495a1b2d6f27fcdae15a as shown in the troubleshooting transcript[^6]. + +Now the fixes I did to make python requests work with the newer guix (defined in guix-bioinformatics[^7]) seem to be leading to failures in the older guix version. + +Let me attempt rebasing to reorder the commits, to make the python requests commit come last, to more easily do a `git reset` before rebuilding the container — not successful. +=> https://git.genenetwork.org/gn-machines/commit/?h=production-container&id=610049b2bfa32cae5d3f992b95aac711290efa2a Manually "undo" the changes in a new commit, + +then rebuild the container. This exposes a bug in gn-auth. + +=> https://git.genenetwork.org/gn-auth/commit/?id=4c21d0e43cf0de1084d0e0a243e441c6e72236eb Fix that. + +and update the `public-jwks-uri` value for the client in the admin dashboard, and voila!!! Now the system works. + +Attempt pulling guix "2394a7f5fbf60dd6adc0a870366adb57166b6d8b" into a profile locally: went through without a hitch + +Upgrade guix daemon, and restart it. Delete profile and run `guix gc`, then try pulling guix "2394a7f5fbf60dd6adc0a870366adb57166b6d8b" again. It also went through without a problem. 
This eliminates the daemon being the culprit: Running `sudo -i guix pull --list-generations` on both tux04 and my local dev machine gives both daemon commits as `2a6d96425eea57dc6dd48a2bec16743046e32e06`. + + +### Footnotes + +=> https://git.genenetwork.org/gn-machines/tree/production.scm?id=46a1c4c8d01198799e6ac3b99998dca40d2c7094#n47 [^1] Lines 47 to 49 of production.scm +=> https://guix.gnu.org/manual/en/html_node/Database-Services.html#index-mysql_002dconfiguration [^2] Guix's mysql-service-type configurations +=> https://mariadb.com/kb/en/server-system-variables/#socket [^3] MariaDB configuration variables: socket +=> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/databases.scm?id=4c56d0cccdc44e12484b26332715f54768738c5f#n576 [^4] Guix: mysql-service-type configuration code +=> https://guix.gnu.org/manual/en/html_node/Defining-Services.html [^5] Guix documentation: Defining Services +=> https://github.com/genenetwork/gn-gemtext-threads/blob/d785b06643b5e5a2470fd0da075dcf77bda82d16/miscellaneous/broken-guix-on-tux04-20241108.org [^6] Broken guix on tux04: Troubleshooting transcript +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=eb7beb340a9731775e8ad177e47b70dba2f2a84f [^7] guix-bioinformatics: Upgrade guix channel to 2394a7f diff --git a/issues/genenetwork/containerising-production-issues.gmi b/issues/genenetwork/containerising-production-issues.gmi new file mode 100644 index 0000000..ed5702a --- /dev/null +++ b/issues/genenetwork/containerising-production-issues.gmi @@ -0,0 +1,33 @@ +# Containerising Production: Issues + +## Tags + +* type: bug +* assigned: fredm +* priority: critical +* status: closed, completed +* keywords: production, container, tux04 +* interested: alexk, aruni, bonfacem, fredm, pjotrp, soloshelby, zsloan, jnduli + +## Description + +We have recently got production into a container and deployed it: It has come up, however, that there are services that are useful to get a full-featured GeneNetwork system 
running that are not part of the container. + +This is, therefore, a meta-issue, tracking all issues that relate to the deployment of the disparate services that make up GeneNetwork. + +## Documentation + +=> https://issues.genenetwork.org/topics/genenetwork/genenetwork-services + +The link above documents the various services that make up the GeneNetwork service. + +## Issues + +* [x] Move user directories to a large partition +=> ./handle-tmp-dirs-in-container [x] Link TMPDIR in container to a directory on a large partition +=> ./markdown-editing-service-not-deployed [ ] Define and deploy Markdown Editing service +=> ./umhet3-samples-timing-slow [ ] Figure out and fix UM-HET3 Samples mappings on Tux04 +=> ./setup-mailing-on-tux04 [x] Setting up email service on Tux04 +=> ./virtuoso-shutdown-clears-data [x] Virtuoso seems to lose data on restart +=> ./python-requests-error-in-container [x] Fix python's requests library certificates error +=> ./cannot-connect-to-mariadb [ ] GN3 cannot connect to mariadb server diff --git a/issues/genenetwork/guix-bioinformatics-remove-guix-rust-past-crates-channel.gmi b/issues/genenetwork/guix-bioinformatics-remove-guix-rust-past-crates-channel.gmi new file mode 100644 index 0000000..b804e10 --- /dev/null +++ b/issues/genenetwork/guix-bioinformatics-remove-guix-rust-past-crates-channel.gmi @@ -0,0 +1,23 @@ +# guix-bioinformatics: Remove `guix-rust-past-crates` channel + +## Tags + +* assigned: alexm, bonfacem +* interested: fredm +* priority: normal +* status: open +* type: bug +* keywords: guix-bioinformatics, guix-rust-past-crates, guix, rust, crates + +## Description + +GNU Guix recently changed[1] the way it handles packaging of rust packages. + +The old rust packages got moved to the "guix-rust-past-crates" to help avoid huge breakages for systems depending on the older packaging system. 
"guix-bioinformatics" used a number of rust packages, defined in the old form, and we needed a quick fix, thus the introduction of the "guix-rust-past-crates" channel as a dependency. + +We need to move away from depending on this channel, by updating all the rust crates we use to the new packaging model. + + +## Footnotes + +=> https://guix.gnu.org/en/blog/2025/a-new-rust-packaging-model/ [1] diff --git a/issues/genenetwork/handle-tmp-dirs-in-container.gmi b/issues/genenetwork/handle-tmp-dirs-in-container.gmi new file mode 100644 index 0000000..5f6eb92 --- /dev/null +++ b/issues/genenetwork/handle-tmp-dirs-in-container.gmi @@ -0,0 +1,22 @@ +# Handle Temporary Directories in the Container + +## Tags + +* type: feature +* assigned: fredm +* priority: critical +* status: closed, completed +* keywords: production, container, tux04 +* interested: alexk, aruni, bonfacem, pjotrp, zsloan + +## Description + +The container's temporary directories should be in a large partition on the host to avoid a scenario where the writes fill up one of the smaller drives. + +Currently, we use the `/tmp` directory by default, but we should look into transitioning away from that — `/tmp` is world readable and world writable and therefore needs careful consideration to keep safe. + +Thankfully, we are running our systems within a container, and can bind the container's `/tmp` directory to a non-world-accessible directory, keeping things at least contained. 
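
On the application side, a service can avoid writing directly into a shared temporary directory by creating its own private scratch directory under whatever `TMPDIR` points at. A minimal sketch, assuming the container exports `TMPDIR` (falling back to `/tmp`); `make_scratch_dir` and the `gn-` prefix are illustrative names only:

```python
import os
import tempfile


def make_scratch_dir(prefix="gn-"):
    """Create a private scratch directory under $TMPDIR (or /tmp).

    tempfile.mkdtemp creates the directory so that it is readable,
    writable and searchable only by the creating user, which keeps the
    contents private even when the parent directory is world-writable.
    """
    base = os.environ.get("TMPDIR", "/tmp")
    return tempfile.mkdtemp(prefix=prefix, dir=base)
```

The caller is responsible for removing the directory when done, for example with `shutil.rmtree`.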
+ +### Fixes + +=> https://git.genenetwork.org/gn-machines/commit/?id=7306f1127df9d4193adfbfa51295615f13d32b55 diff --git a/issues/genenetwork/markdown-editing-service-not-deployed.gmi b/issues/genenetwork/markdown-editing-service-not-deployed.gmi new file mode 100644 index 0000000..9d72e4e --- /dev/null +++ b/issues/genenetwork/markdown-editing-service-not-deployed.gmi @@ -0,0 +1,39 @@ +# Markdown Editing Service: Not Deployed + +## Tags + +* type: bug +* status: closed, completed, fixed +* assigned: fredm +* priority: critical +* keywords: production, container, tux04 +* interested: alexk, aruni, bonfacem, fredm, pjotrp, zsloan + +## Description + +The Markdown Editing service is not working on production. + +* Link: https://genenetwork.org/facilities/ +* Repository: https://git.genenetwork.org/gn-guile + +Currently, the code is being run directly on the host, rather than inside the container. + +Some important things to note: + +* The service requires access to a checkout of https://github.com/genenetwork/gn-docs +* Currently, the service is hard-coded to use a specific port: we should probably fix that. + +## Reopened: 2024-11-01 + +While the service was deployed, the edit functionality is not working right, specifically, pushing the edits upstream to the remote seems to fail. + +If you do an edit and refresh the page, it will show up in the system, but it will not proceed to be pushed up to the remote. + +Set `CGIT_REPO_PATH="https://git.genenetwork.org/gn-guile"` which seems to allow the commit to work, but we do not actually get the changes pushed to the remote in any useful sense. + +It seems to me, that we need to configure the environment in such a way that it will be able to push the changes to remote. + + +## Close as Completed + +The markdown editing service is deployed and configured correctly. 
diff --git a/issues/genenetwork/python-requests-error-in-container.gmi b/issues/genenetwork/python-requests-error-in-container.gmi new file mode 100644 index 0000000..0289762 --- /dev/null +++ b/issues/genenetwork/python-requests-error-in-container.gmi @@ -0,0 +1,174 @@ +# Python Requests Error in Container + +## Tags + +* type: bug +* assigned: fredm +* priority: critical +* status: closed, completed, fixed +* interested: alexk, aruni, bonfacem, pjotrp, zsloan +* keywords: production, container, tux04, python, requests + +## Description + +Building the container with the +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=eb7beb340a9731775e8ad177e47b70dba2f2a84f upgraded guix definition +leads to python's requests library failing. + +``` +2024-10-30 16:04:13 OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt +``` + +If you login to the container itself, however, you find that the file `/etc/ssl/certs/ca-certificates.crt` actually exists and has content. + +Possible fixes suggested are to set up correct envvars for the requests library, such as `REQUESTS_CA_BUNDLE` + +See +=> https://requests.readthedocs.io/en/latest/user/advanced/#ssl-cert-verification + +### Troubleshooting Logs + +Try reproducing the issue locally: + +``` +$ guix --version +hint: Consider installing the `glibc-locales' package and defining `GUIX_LOCPATH', along these lines: + + guix install glibc-locales + export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale" + +See the "Application Setup" section in the manual, for more info. + +guix (GNU Guix) 2394a7f5fbf60dd6adc0a870366adb57166b6d8b +Copyright (C) 2024 the Guix authors +License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. 
+$ +$ guix shell --container --network python python-requests coreutils +[env]$ ls "${GUIX_ENVIRONMENT}/etc" +ld.so.cache profile +``` + +We see from the above that there are no certificates in the environment with just python and python-requests. + +Okay. Now let's write a simple python script to test things out with: + +``` +import requests + +resp = requests.get("https://github.com") +print(resp) +``` + +and run it! + +``` +$ guix shell --container --network python python-requests coreutils -- python3 test.py +Traceback (most recent call last): + File "/tmp/test.py", line 1, in <module> + import requests + File "/gnu/store/b6ny4p29f32rrnnvgx7zz1nhsms2zmqk-profile/lib/python3.10/site-packages/requests/__init__.py", line 164, in <module> + from .api import delete, get, head, options, patch, post, put, request + File "/gnu/store/b6ny4p29f32rrnnvgx7zz1nhsms2zmqk-profile/lib/python3.10/site-packages/requests/api.py", line 11, in <module> + from . import sessions + File "/gnu/store/b6ny4p29f32rrnnvgx7zz1nhsms2zmqk-profile/lib/python3.10/site-packages/requests/sessions.py", line 15, in <module> + from .adapters import HTTPAdapter + File "/gnu/store/b6ny4p29f32rrnnvgx7zz1nhsms2zmqk-profile/lib/python3.10/site-packages/requests/adapters.py", line 81, in <module> + _preloaded_ssl_context.load_verify_locations( +FileNotFoundError: [Errno 2] No such file or directory +``` + +Uhmm, what is this new error? + +Add `nss-certs` and try again. 
+ +``` +$ guix shell --container --network python python-requests nss-certs coreutils +[env]$ ls ${GUIX_ENVIRONMENT}/etc/ssl/ +certs +[env]$ python3 test.py +Traceback (most recent call last): + File "/tmp/test.py", line 1, in <module> + import requests + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/__init__.py", line 164, in <module> + from .api import delete, get, head, options, patch, post, put, request + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/api.py", line 11, in <module> + from . import sessions + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/sessions.py", line 15, in <module> + from .adapters import HTTPAdapter + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/adapters.py", line 81, in <module> + _preloaded_ssl_context.load_verify_locations( +FileNotFoundError: [Errno 2] No such file or directory +[env]$ +[env]$ export REQUESTS_CA_BUNDLE="${GUIX_ENVIRONMENT}/etc/ssl/certs/ca-certificates.crt" +[env]$ $ python3 test.py +Traceback (most recent call last): + File "/tmp/test.py", line 1, in <module> + import requests + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/__init__.py", line 164, in <module> + from .api import delete, get, head, options, patch, post, put, request + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/api.py", line 11, in <module> + from . 
import sessions + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/sessions.py", line 15, in <module> + from .adapters import HTTPAdapter + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/adapters.py", line 81, in <module> + _preloaded_ssl_context.load_verify_locations( +FileNotFoundError: [Errno 2] No such file or directory +``` + +Welp! Looks like this error is a whole different thing. + +Let us try with the genenetwork2 package. + +``` +$ guix shell --container --network genenetwork2 coreutils +[env]$ ls "${GUIX_ENVIRONMENT}/etc" +bash_completion.d jupyter ld.so.cache profile +``` + +This does not seem to have the certificates in place either, so let's add nss-certs + +``` +$ guix shell --container --network genenetwork2 coreutils nss-certs +[env]$ ls "${GUIX_ENVIRONMENT}/etc" +bash_completion.d jupyter ld.so.cache profile ssl +[env]$ python3 test.py +Traceback (most recent call last): + File "/tmp/test.py", line 3, in <module> + resp = requests.get("https://github.com") + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/api.py", line 73, in get + return request("get", url, params=params, **kwargs) + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/api.py", line 59, in request + return session.request(method=method, url=url, **kwargs) + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/sessions.py", line 587, in request + resp = self.send(prep, **send_kwargs) + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/sessions.py", line 701, in send + r = adapter.send(request, **kwargs) + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/adapters.py", line 460, in send + self.cert_verify(conn, request.url, verify, cert) + File 
"/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/adapters.py", line 263, in cert_verify + raise OSError( +OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt +``` + +We get the expected certificates error! This is good. Now define the envvar and try again. + +``` +[env]$ export REQUESTS_CA_BUNDLE="${GUIX_ENVIRONMENT}/etc/ssl/certs/ca-certificates.crt" +[env]$ python3 test.py +<Response [200]> +``` + +Success!!! + +Adding nss-certs and setting the `REQUESTS_CA_BUNDLE` fixes things. We'll need to do the same for the container, for both the genenetwork2 and genenetwork3 packages (and any other packages that use requests library). + +### Fixes + +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=fec68c4ca87eeca4eb9e69e71fc27e0eae4dd728 +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=c3bb784c8c70857904ef97ecd7d36ec98772413d +The two commits above add nss-certs package to all the flask apps, which make use of the python-requests library, which requires a valid CA certificates bundle in each application's environment. + +=> https://git.genenetwork.org/gn-machines/commit/?h=production-container&id=04506c4496e5ca8b3bc38e28ed70945a145fb036 +The commit above defines the "REQUESTS_CA_BUNDLE" environment variable for all the flask applications that make use of python's requests library. diff --git a/issues/genenetwork/setup-mailing-on-tux04.gmi b/issues/genenetwork/setup-mailing-on-tux04.gmi new file mode 100644 index 0000000..45605d9 --- /dev/null +++ b/issues/genenetwork/setup-mailing-on-tux04.gmi @@ -0,0 +1,16 @@ +# Setup Mailing on Tux04 + +## Tags + +* type: bug +* status: closed +* assigned: fredm +* priority: critical +* interested: pjotrp, zsloan +* keywords: production, container, tux04 + +## Description + +We use emails to verify user accounts and allow changing of user passwords. 
We therefore need to setup a way to send emails from the system. + +I updated the configurations to use UTHSC's mail server diff --git a/issues/genenetwork/umhet3-samples-timing-slow.gmi b/issues/genenetwork/umhet3-samples-timing-slow.gmi new file mode 100644 index 0000000..a3a33a7 --- /dev/null +++ b/issues/genenetwork/umhet3-samples-timing-slow.gmi @@ -0,0 +1,72 @@ +# UM-HET3 Timing: Slow + +## Tags + +* type: bug +* status: open +* assigned: fredm +* priority: critical +* interested: fredm, pjotrp, zsloan +* keywords: production, container, tux04, UM-HET3 + +## Description + +In email from @robw: + +``` +> > Not sure why. Am I testing the wrong way? +> > Are we using memory and RAM in the same way on the two machines? +> > Here are data on the loading time improvement for Tux2: +> > I tested this using a "worst case" trait that we know when—the 25,000 +> > UM-HET3 samples: +> > [1]https://genenetwork.org/show_trait?trait_id=10004&dataset=HET3-ITPPu +> > blish +> > Tux02: 15.6, 15.6, 15.3 sec +> > Fallback: 37.8, 38.7, 38.5 sec +> > Here are data on Gemma speed/latency performance: +> > Also tested "worst case" performance using three large BXD data sets +> > tested in this order: +> > [2]https://genenetwork.org/show_trait?trait_id=10004&dataset=BXD-Longev +> > ityPublish +> > [3]https://genenetwork.org/show_trait?trait_id=10003&dataset=BXD-Longev +> > ityPublish +> > [4]https://genenetwork.org/show_trait?trait_id=10002&dataset=BXD-Longev +> > ityPublish +> > Tux02: 107.2, 329.9 (ouch), 360.0 sec (double ouch) for 1004, 1003, and +> > 1002 respectively. On recompute (from cache) 19.9, 19.9 and 20.0—still +> > too slow. +> > Fallback: 154.1, 115.9 for the first two traits (trait 10002 already in +> > the cache) +> > On recompute (from cache) 59.6, 59.0 and 59.7. Too slow from cache. +> > PROBLEM 2: Tux02 is unable to map UM-HET3. I still get an nginx 413 +> > error: Entity Too Large. +> +> Yeah, Fred should fix that one. 
It is an nginx setting - we run 2x +> nginx. It was reported earlier. +> +> > I need this to work asap. Now mapping our amazing UM-HET3 data. I can +> > use Fallback, but it is painfully slow and takes about 214 sec. I hope +> > Tux02 gets that down to a still intolerable slow 86 sec. +> > Can we please fix and confirm by testing. The Trait is above for your +> > testing pleasure. +> > Even 86 secs is really too slow and should motivate us (or users like +> > me) to think about how we are using all of those 24 ultra-fast cores on +> > the AMD 9274F. Why not put them all to use for us and users. It is not +> > good enough just to have "it work". It has to work in about 5–10 +> > seconds. +> > Here are my questions for you guys: Are we able to use all 24 cores +> > for any one user? How does each user interact with the CPU? Can we +> > handle a class of 24 students with 24 cores, or is it "complicated"? +> > PROBLEM 3: Zach, Fred. Are we computing render time or transport +> > latency correctly? Ideally the printout at the bottom of mapping pages +> > would be true latency as experienced by the user. As far as I can tell +> > with a stop watch our estimates of time are incorrect by as much as 3 +> > secs. And note that the link +> > to [5]http://joss.theoj.org/papers/10.21105/joss.00025 is not working +> > correctly in the footer (see image below). Oddly enough it works fine +> > on Tux02 +> +> Fred, take a note. +``` + +Figure out what this is about and fix it. 
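The load times quoted in the email can be re-measured in a repeatable way with a small script. The following is only a minimal sketch: `time_calls` and `summarise` are hypothetical helper names, not part of GeneNetwork, and the numbers cover only the call you pass in (server time plus transport), not rendering in the browser.

```python
import statistics
import time


def time_calls(fetch, runs=3):
    """Call `fetch` `runs` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()  # hypothetical callable, e.g. one that downloads a trait page
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Return (fastest, median, slowest) of the measured durations."""
    return min(durations), statistics.median(durations), max(durations)
```

For a real measurement, `fetch` could be something like `lambda: urllib.request.urlopen(url).read()`, with `url` set to the trait page under test. Running the same script against both hosts keeps the comparison consistent.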
diff --git a/issues/genenetwork/virtuoso-shutdown-clears-data.gmi b/issues/genenetwork/virtuoso-shutdown-clears-data.gmi new file mode 100644 index 0000000..2e01238 --- /dev/null +++ b/issues/genenetwork/virtuoso-shutdown-clears-data.gmi @@ -0,0 +1,98 @@ +# Virtuoso: Shutdown Clears Data + +## Tags + +* type: bug +* assigned: fredm +* priority: critical +* status: closed, completed +* interested: bonfacem, pjotrp, zsloan +* keywords: production, container, tux04, virtuoso + +## Description + +It seems that virtuoso has the bad habit of clearing data whenever it is stopped/restarted. + +This issue will track the work necessary to get the service behaving correctly. + +According to the documentation on +=> https://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader the bulk loading process + +``` +The bulk loader also disables checkpointing and the scheduler, which also need to be re-enabled post bulk load +``` + +That needs to be handled. + +### Notes + +After having a look at +=> https://docs.openlinksw.com/virtuoso/ch-server/#databaseadmsrv the configuration documentation +it occurs to me that the reason virtuoso supposedly clears the data is that the `DatabaseFile` value is not set, so it defaults to a new database file every time the server is restarted (See also the `Striping` setting). + +### Troubleshooting + +Reproduce locally: + +We begin by getting a look at the settings for the remote virtuoso +``` +$ ssh tux04 +fredm@tux04:~$ cat /gnu/store/bg6i4x96nm32gjp4qhphqmxqc5vggk3h-virtuoso.ini +[Parameters] +ServerPort = localhost:8981 +DirsAllowed = /var/lib/data +NumberOfBuffers = 4000000 +MaxDirtyBuffers = 3000000 +[HTTPServer] +ServerPort = localhost:8982 +``` + +Copy these into a file locally, and adjust the `NumberOfBuffers` and `MaxDirtyBuffers` for smaller local dev environment. Also update `DirsAllowed`. 
+ +We end up with our local configuration in `~/tmp/virtuoso/etc/virtuoso.ini` with the content: + +``` +[Parameters] +ServerPort = localhost:8981 +DirsAllowed = /var/lib/data +NumberOfBuffers = 10000 +MaxDirtyBuffers = 6000 +[HTTPServer] +ServerPort = localhost:8982 +``` + +Run virtuoso! +``` +$ cd ~/tmp/virtuoso/var/lib/virtuoso/ +$ ls +$ ~/opt/virtuoso/bin/virtuoso-t +foreground +configfile ~/tmp/virtuoso/etc/virtuoso.ini +``` + +Here we start by changing into the `~/tmp/virtuoso/var/lib/virtuoso/` directory, which is where virtuoso will put its state. Now, in a different terminal, list the files created in the state directory: + +``` +$ ls ~/tmp/virtuoso/var/lib/virtuoso +virtuoso.db virtuoso.lck virtuoso.log virtuoso.pxa virtuoso.tdb virtuoso.trx +``` + +That creates the database file (and other files) with the documented default values, i.e. `virtuoso.*`. + +We cannot quite reproduce the issue locally, since every restart will produce exactly the same file names locally. + +Checking the state directory for virtuoso on tux04, however: + +``` +fredm@tux04:~$ sudo ls -al /export2/guix-containers/genenetwork/var/lib/virtuoso/ | grep '\.db$' +-rw-r--r-- 1 986 980 3787456512 Oct 28 14:16 js1b7qjpimdhfj870kg5b2dml640hryx-virtuoso.db +-rw-r--r-- 1 986 980 4152360960 Oct 28 17:11 rf8v0c6m6kn5yhf00zlrklhp5lmgpr4x-virtuoso.db +``` + +We see that there are multiple db files, each created when virtuoso was restarted. There is an extra (possibly) random string prepended to the `virtuoso.db` part. This happens for our service if we do not actually provide the `DatabaseFile` configuration.
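A quick way to catch a configuration that is missing the `DatabaseFile` setting is to parse the ini file before deploying it. The sketch below assumes that Virtuoso reads the database file name from the `[Database]` section of `virtuoso.ini`; the function name `missing_database_keys` is made up for illustration and is not part of any existing tooling.

```python
import configparser

# Assumption: `DatabaseFile` belongs in the [Database] section of virtuoso.ini.
REQUIRED_DATABASE_KEYS = ("DatabaseFile",)


def missing_database_keys(ini_text, required=REQUIRED_DATABASE_KEYS):
    """Return the required [Database] keys that `ini_text` does not define."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(ini_text)
    if not parser.has_section("Database"):
        return list(required)
    return [key for key in required if not parser.has_option("Database", key)]
```

Run against the configuration shown above (only `[Parameters]` and `[HTTPServer]`), this would report `DatabaseFile` as missing, which matches the diagnosis that the service falls back to a differently named database file on each restart.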
+ + +## Fixes + +=> https://github.com/genenetwork/gn-gemtext-threads/commit/8211c1e49498ba2f3b578ed5b11b15c52299aa08 Document how to restart checkpointing and the scheduler after bulk loading +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=2dc335ca84ea7f26c6977e6b432f3420b113f0aa Add configs for scheduler and checkpointing +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=7d793603189f9d41c8ee87f8bb4c876440a1fce2 Set up virtuoso database configurations +=> https://git.genenetwork.org/gn-machines/commit/?id=46a1c4c8d01198799e6ac3b99998dca40d2c7094 Explicitly name virtuoso database files. diff --git a/issues/genenetwork2-account-registration-error.gmi b/issues/genenetwork2-account-registration-error.gmi index d617f93..14b6322 100644 --- a/issues/genenetwork2-account-registration-error.gmi +++ b/issues/genenetwork2-account-registration-error.gmi @@ -5,7 +5,7 @@ * type: bug * priority: critical * assigned: zachs, zsloan, fredm -* status: open +* status: closed, completed * keywords: genenetwork2, account management, user, registration ## Description diff --git a/issues/genenetwork2-cd-sometimes-fails-to-restart.gmi b/issues/genenetwork2-cd-sometimes-fails-to-restart.gmi index d2d2013..603de59 100644 --- a/issues/genenetwork2-cd-sometimes-fails-to-restart.gmi +++ b/issues/genenetwork2-cd-sometimes-fails-to-restart.gmi @@ -10,4 +10,7 @@ A reminder that CD logs are publicly accessible on tux02. => /topics/cd-logs ## Resolution + This issue has been re-opened. Originally, we believed that the restart failures were due to occasional breakage in GN code, and were not a problem with the CI/CD system itself. This will need further investigation to figure out what the root cause is. 
+ +* closed diff --git a/issues/genenetwork2/broken-collections-features.gmi b/issues/genenetwork2/broken-collections-features.gmi new file mode 100644 index 0000000..4239929 --- /dev/null +++ b/issues/genenetwork2/broken-collections-features.gmi @@ -0,0 +1,44 @@ +# Broken Collections Features + +## Tags + +* type: bug +* status: open +* priority: high +* assigned: zachs, fredm +* keywords: gn2, genenetwork2, genenetwork 2, collections + +## Descriptions + +There are some features in the search results page, and/or the collections page that are broken. These are: + +* "CTL" feature +* "MultiMap" feature +* "Partial Correlations" feature +* "Generate Heatmap" feature + +### Reproduce Issue + +* Go to https://genenetwork.org +* Select "Mouse (Mus musculus, mm10)" for "Species" +* Select "BXD Family" for "Group" +* Select "Traits and Cofactors" for "Type" +* Select "BXD Published Phenotypes" for "Dataset" +* Type "locomotion" in the "Get Any" field (without the quotes) +* Click "Search" +* In the results page, select the traits with the following "Record" values: "BXD_10050", "BXD_10051", "BXD_10088", "BXD_10091", "BXD_10092", "BXD_10455", "BXD_10569", "BXD_10570", "BXD_11316", "BXD_11317" +* Click the "Add" button and add them to a new collection +* In the resulting collections page, click the button for any of the failing features listed above + +### Failure modes + +* The "CTL" and "WGCNA" features have a failure mode that might have been caused by recent changes making use of AJAX calls, rather than submitting the form manually. +* The "MultiMap" and "Generate Heatmap" features raise exceptions that need to be investigated and resolved +* The "Partial Correlations" feature seems to run forever + +## Break-out Issues + +We break out the issues above into separate pages to track the progress of the fixes for each feature separately.
+ +=> /issues/genenetwork3/ctl-maps-error +=> /issues/genenetwork3/generate-heatmaps-failing diff --git a/issues/genenetwork2/fix-display-for-time-consumed-for-correlations.gmi b/issues/genenetwork2/fix-display-for-time-consumed-for-correlations.gmi new file mode 100644 index 0000000..0c8e9c8 --- /dev/null +++ b/issues/genenetwork2/fix-display-for-time-consumed-for-correlations.gmi @@ -0,0 +1,15 @@ +# Fix Display for the Time Consumed for Correlations + +## Tags + +* type: bug +* status: closed, completed +* priority: low +* assigned: @alexm, @bonz +* keywords: gn2, genenetwork2, genenetwork 2, gn3, genenetwork3 genenetwork 3, correlations, time display + +## Description + +The breakdown of the time consumed for the correlations computations, displayed at the bottom of the page, is not representative of reality. The time that GeneNetwork3 (or background process) takes for the computations is not actually represented in the breakdown, leading to wildly inaccurate displays of total time. + +This will need to be fixed. diff --git a/issues/genenetwork/genenetwork2_configurations.gmi b/issues/genenetwork2/genenetwork2_configurations.gmi index 7d08db0..4ba0a89 100644 --- a/issues/genenetwork/genenetwork2_configurations.gmi +++ b/issues/genenetwork2/genenetwork2_configurations.gmi @@ -4,7 +4,7 @@ * assigned: fredm * priority: normal -* status: open +* status: closed, obsoleted * keywords: configuration, config, gn2, genenetwork, genenetwork2 * type: bug @@ -72,3 +72,10 @@ For `wqflask/run_gunicorn.py`, the route can remain as is, since this is an entr ### Non-Executable Configuration Files Eschew executable formats (*.py) for configuration files and prefer non-executable formats e.g. *.cfg, *.json, *.conf etc + + +## Closed as obsoleted + +I am closing this issue as obsoleted, since a lot of things have changed since this issue was set up. The `bin/genenetwork2` script no longer exists and most of the paths mentioned have changed. 
+ +The configuration issue(s) mentioned above still abound, but the changes will have to be incremental to avoid breaking the system. diff --git a/issues/genenetwork2/haley-knott-regression-mapping-error.gmi b/issues/genenetwork2/haley-knott-regression-mapping-error.gmi new file mode 100644 index 0000000..25bb221 --- /dev/null +++ b/issues/genenetwork2/haley-knott-regression-mapping-error.gmi @@ -0,0 +1,80 @@ +# Haley-Knott Regression Mapping Error + +## Tags + +* type: bug +* status: closed, completed +* priority: high +* assigned: fredm +* keywords: gn2, genenetwork2, genenetwork 2, mapping, haley-knott + +## Description + +To run the mapping: + +* Do a search +* Click on any trait in the results +* On the trait page, expand the "Mapping Tools" section +* Select the "Haley-Knott Regression" option under "Mapping Tools" +* Click "Compute" + +On running the mapping as above, we got the following error: + +``` + GeneNetwork 2.11-rc2 https://gn2-fred.genenetwork.org/run_mapping ( 6:14AM UTC Sep 11, 2024) +Traceback (most recent call last): + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/flask/app.py", line 1523, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/flask/app.py", line 1509, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/gn2/wqflask/views.py", line 1004, in mapping_results_page + gn1_template_vars = display_mapping_results.DisplayMappingResults( + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/gn2/wqflask/marker_regression/display_mapping_results.py", line 651, in __init__ + self.perm_filename = self.drawPermutationHistogram() + File 
"/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/gn2/wqflask/marker_regression/display_mapping_results.py", line 3056, in drawPermutationHistogram + Plot.plotBar(myCanvas, perm_output, XLabel=self.LRS_LOD, + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/gn2/utility/Plot.py", line 184, in plotBar + scaleFont = ImageFont.truetype(font=COUR_FILE, size=11) + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/PIL/ImageFont.py", line 959, in truetype + return freetype(font) + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/PIL/ImageFont.py", line 956, in freetype + return FreeTypeFont(font, size, index, encoding, layout_engine) + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/PIL/ImageFont.py", line 247, in __init__ + self.font = core.getfont( +OSError: cannot open resource +``` + +### Hypothesis + +My hypothesis is that the use of relative paths[fn:1] is the cause of the failure. + +When running the application with the working directory being the root of the GeneNetwork2 repository, use of the relative paths works well. Unfortunately, that assumption breaks quickly if the application is ever run outside of the root of the GN2 repo. + +Verification: + +*Question*: Does the application run on root of GN2 repository/package? + +* Log out the path of the font file and use the results to answer the question +* https://github.com/genenetwork/genenetwork2/commit/ca8018a61f2e014b4aee4da2cbd00d7b591b2f6a +* https://github.com/genenetwork/genenetwork2/commit/01d56903ba01a91841d199fe393f9b307a7596a2 + +*Answer*: No! 
The application does not run with the working directory on the root of the GN2 repository/package, as evidenced by this snippet from the logs: + +``` +2024-09-11 07:41:13 [2024-09-11 07:41:13 +0000] [494] [DEBUG] POST /run_mapping +2024-09-11 07:41:18 [2024-09-11 07:41:18 +0000] [494] [DEBUG] Font file path: /gn2/wqflask/static/fonts/courbd.ttf +2024-09-11 07:41:18 DEBUG:gn2.wqflask:Font file path: /gn2/wqflask/static/fonts/courbd.ttf +2024-09-11 07:41:18 [2024-09-11 07:41:18 +0000] [494] [ERROR] https://gn2-fred.genenetwork.org/run_mapping ( 7:41AM UTC Sep 11, 2024) +2024-09-11 07:41:18 Traceback (most recent call last): +``` + +We see from this that the application seems to be running with the working directory being "/" rather than the root for the application's package files. + +### Fixes + +* https://github.com/genenetwork/genenetwork2/commit/d001c1e7cae8f69435545b8715038b1d0fc1ee62 +* https://git.genenetwork.org/guix-bioinformatics/commit/?id=7a1bf5bc1c3de67f01eabd23e1ddc0150f81b22b + +# Footnotes + +[fn:1] https://github.com/genenetwork/genenetwork2/blob/50fc0b4bc4106164745afc7e1099bb150f6e635f/gn2/utility/Plot.py#L44-L46 diff --git a/issues/genenetwork2/handle-oauth-errors-better.gmi b/issues/genenetwork2/handle-oauth-errors-better.gmi new file mode 100644 index 0000000..77ad7ad --- /dev/null +++ b/issues/genenetwork2/handle-oauth-errors-better.gmi @@ -0,0 +1,21 @@ +# Handle OAuth Errors Better + +## Tags + +* type: bug +* status: closed, completed +* priority: high +* assigned: fredm +* interested: zachs, robw +* keywords: gn2, genenetwork2, ui, user interface, oauth, oauth errors + +## Description + +When a session expires, for whatever reason, a notification is displayed to the user as shown in the image below: +=> ./session_expiry_oauth_error.png + +The message is a little jarring to the end user. Make it gentler, and probably more informative, so the user is not as surprised. + +## Close as complete + +This should be fixed at this point. 
Closing this as complete. diff --git a/issues/genenetwork2/mapping-error.gmi b/issues/genenetwork2/mapping-error.gmi new file mode 100644 index 0000000..7e7d0a7 --- /dev/null +++ b/issues/genenetwork2/mapping-error.gmi @@ -0,0 +1,66 @@ +# Mapping Error + +## Tags + +* type: bug +* status: closed +* priority: medium +* assigned: zachs, fredm, flisso +* keywords: gn2, genenetwork2, genenetwork 2, mapping + +## Reproduction + +* Go to https://staging.genenetwork.org/ +* For 'Species' select "Arabidopsis (Arabidopsis thaliana, araTha1)" +* For 'Group' select "BayXSha(RIL by sib-mating)" +* For 'Type' select "arabidopsis seeds" +* For 'Dataset' select "Arabidopsis BayXShaXRIL_expr_reg _ATH1" +* Leave 'Get Any' blank +* Enter "*" for "Combined" +* Click "Search" +* On the search results page, click on "AT1G01010" +* Expand the "Mapping Tools" section +* For 'Chromosome' select "All" +* For 'Minor Allele ≥' enter "0.05" +* For 'Use LOCO' select "Yes" +* Ignore covariates +* Click "Compute" + +### Expected + +The system would compute the maps and display the mapping diagram(s) and data. + +### Actual + +The computation fails with: + +``` + GeneNetwork 2.11-rc2 https://staging.genenetwork.org/loading ( 6:50PM UTC Jul 03, 2024) +Traceback (most recent call last): + File "/gnu/store/jsvqai0gz6fn40k7kx3r12yq4hzfini6-profile/lib/python3.10/site-packages/flask/app.py", line 1523, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/jsvqai0gz6fn40k7kx3r12yq4hzfini6-profile/lib/python3.10/site-packages/flask/app.py", line 1509, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) + File "/gnu/store/jsvqai0gz6fn40k7kx3r12yq4hzfini6-profile/lib/python3.10/site-packages/gn2/wqflask/views.py", line 812, in loading_page + for sample in samples: +TypeError: 'NoneType' object is not iterable +``` + +### Updates + +This is likely just because the genotype file doesn't exist in the necessary format (BIMBAM). 
We probably need to convert the R/qtl2 genotypes to BIMBAM. + +## Stalled + +This is currently stalled, until we can upload genotypes via the uploader. + + +## Notes + +### 2025-12-31 + +I am closing this issue as WONTFIX because of the following reasons: + +- Better fix is to prevent mapping in the first place, if no genotypes exist for the given trait(s) +- Issue relies on non-implemented feature (Genotypes upload) to fix it +- Issue does not exist on production diff --git a/issues/genenetwork2/mechanical-rob-add-partial-correlations-tests.gmi b/issues/genenetwork2/mechanical-rob-add-partial-correlations-tests.gmi new file mode 100644 index 0000000..e38f653 --- /dev/null +++ b/issues/genenetwork2/mechanical-rob-add-partial-correlations-tests.gmi @@ -0,0 +1,22 @@ +# mechanical-rob: Add Partial Correlations Tests + +## Tags + +* assigned: fredm +* priority: medium +* status: open +* keywords: genenetwork2, gn2, mechanical-rob, partial correlations, tests, regression +* type: enhancement + +## Description + +Add regression tests to verify that the partial correlations feature still works +as expected. 
+ +### TODOS + +- [-] Tests for "entry-point" page +- [x] Tests for partial correlation using Pearson's R against select traits +- [ ] Tests for partial correlation using Spearman's Rho against select traits +- [ ] Tests for partial correlation using Pearson's R against an entire dataset +- [ ] Tests for partial correlation using Spearman's Rho against an entire dataset diff --git a/issues/genenetwork2/refresh-token-failure.gmi b/issues/genenetwork2/refresh-token-failure.gmi new file mode 100644 index 0000000..c488820 --- /dev/null +++ b/issues/genenetwork2/refresh-token-failure.gmi @@ -0,0 +1,111 @@ +# Refresh Token Failure + +## Tags + +* status: closed, obsoleted +* priority: high +* type: bug +* assigned: fredm, zsloan, zachs +* keywords: gn2, genenetwork2 + +## Description + +* Go to https://genenetwork.org +* Click "Sign in" and sign in to the application +* Wait 15 minutes +* Close the entire browser +* Open the browser and go to https://genenetwork.org +* Observe the "ERROR" message at the "Collections" link's badge + +The expectation is that the Collections badge would show the number of collections the user has, rather than the error message.
+ +The logs fail with an 'invalid_client' error: + +``` +2025-01-08 20:48:56 raise self.oauth_error_class( +2025-01-08 20:48:56 authlib.integrations.base_client.errors.OAuthError: invalid_client: +2025-01-08 20:48:56 ERROR:gn2.wqflask:Error loading number of collections +2025-01-08 20:48:56 Traceback (most recent call last): +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/gn2/wqflask/__init__.py", +line 55, in numcoll +2025-01-08 20:48:56 return num_collections() +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/gn2/wqflask/oauth2/collect +ions.py", line 13, in num_collections +2025-01-08 20:48:56 all_collections = all_collections + oauth2_get( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/gn2/wqflask/oauth2/client. +py", line 168, in oauth2_get +2025-01-08 20:48:56 resp = oauth2_client().get( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/sessions.py", lin +e 600, in get +2025-01-08 20:48:56 return self.request("GET", url, **kwargs) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/integrations/reque +sts_client/oauth2_session.py", line 109, in request +2025-01-08 20:48:56 return super(OAuth2Session, self).request( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/sessions.py", lin +e 573, in request +2025-01-08 20:48:56 prep = self.prepare_request(req) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/sessions.py", lin +e 484, in prepare_request +2025-01-08 20:48:56 p.prepare( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/models.py", line +372, in 
prepare +2025-01-08 20:48:56 self.prepare_auth(auth, url) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/models.py", line +603, in prepare_auth +2025-01-08 20:48:56 r = auth(self) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/integrations/reque +sts_client/oauth2_session.py", line 24, in __call__ +2025-01-08 20:48:56 self.ensure_active_token() +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/integrations/reque +sts_client/oauth2_session.py", line 20, in ensure_active_token +2025-01-08 20:48:56 if self.client and not self.client.ensure_active_token(self.token): +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/oauth2/client.py", + line 262, in ensure_active_token +2025-01-08 20:48:56 self.refresh_token(url, refresh_token=refresh_token) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/oauth2/client.py", + line 252, in refresh_token +2025-01-08 20:48:56 return self._refresh_token( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/oauth2/client.py", + line 373, in _refresh_token +2025-01-08 20:48:56 token = self.parse_response_token(resp) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/oauth2/client.py", + line 340, in parse_response_token +2025-01-08 20:48:56 raise self.oauth_error_class( +2025-01-08 20:48:56 authlib.integrations.base_client.errors.OAuthError: invalid_client: +``` + + +### Troubleshooting + +The following commits were done as part of the troubleshooting: + +=> https://github.com/genenetwork/genenetwork2/commit/55da5809d851a3c8bfa13637947b019a2c02cc93 +=> 
https://git.genenetwork.org/guix-bioinformatics/commit/?id=d1cada0f0933732eb68b7786fb04ea541d8c51c9 +=> https://github.com/genenetwork/genenetwork2/commit/93dd7f7583af4e0bdd3c7b9c88d375fdc4b40039 +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=5fe04ca1545f740cbb91474576891c7fd1dff13a +=> https://github.com/genenetwork/genenetwork2/commit/2031da216f3b62c23dca64eb6d1c533c07dc81f1 +=> https://github.com/genenetwork/genenetwork2/commit/125c436f5310b194c10385ce9d81135518ac0adf +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=758e6f0fbf6af4af5b94b9aa5a9264c31f050153 +=> https://github.com/genenetwork/genenetwork2/commit/8bf483a3ab23ebf25d73380e78271c368ff06b2d +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=f1ee97a17e670b12112d48bea8969e2ee162f808 +=> https://github.com/genenetwork/genenetwork2/commit/de01f83090184fc56dce2f9887d2dc910edc60fe +=> https://github.com/genenetwork/genenetwork2/commit/91017b97ee346e73bed9b77e3f3f72daa4acbacd +=> https://github.com/genenetwork/genenetwork2/commit/7e6bfe48167c70d26e27b043eb567608bc1fda84 +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=1f71a1e78af87266e7a4170ace8860111a1569d6 +=> https://github.com/genenetwork/genenetwork2/commit/9bdc8ca0b17739c1df9dc504f8cd978296b987dd +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=02a9a99e7e3c308157f7d740a244876ab4196337 +=> https://github.com/genenetwork/genenetwork2/commit/236a48835dc6557ba0ece6aef6014f496ddb163e +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=f928be361d2e331d72448416300c331e47341807 +=> https://github.com/genenetwork/genenetwork2/commit/5fb56c51ad4eaff13a7e24b6022dffb7d82aa41d +=> https://github.com/genenetwork/genenetwork2/commit/c6c9ef71718d650f9c19ae459d6d4e25e72de00a +=> https://github.com/genenetwork/genenetwork2/commit/dc606f39fb4aad74004959a6a15e481fa74d52ff +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=4ab597b734968916af5bae6332756af8168783b3 +=> 
https://github.com/genenetwork/genenetwork2/commit/854639bd46293b6791c629591fd934d1f34038ac +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=7e0083555150d151e566cebed4bd82d69e347eb6 +=> https://github.com/genenetwork/genenetwork2/commit/c4508901027a2d3ea98e1e9b3f8767a455cad02f +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=955e4ce9370be9811262d7c73fa5398385cc04d8 + + +# Closed as Obsolete + +We no longer rely on refresh tokens. This issue is no longer present. diff --git a/issues/genenetwork2/remove-bin-genenetwork2-script.gmi b/issues/genenetwork2/remove-bin-genenetwork2-script.gmi new file mode 100644 index 0000000..da11be7 --- /dev/null +++ b/issues/genenetwork2/remove-bin-genenetwork2-script.gmi @@ -0,0 +1,114 @@ +# Remove `bin/genenetwork2` Script + +## Tags + +* type: improvement +* status: closed, completed +* priority: medium +* assigned: fredm, bonfacem, alexm, zachs +* interested: pjotrp, aruni +* keywords: gn2, bin/genenetwork2, startup script + +## Description + +The `bin/genenetwork2` script was used for a really long time to launch Genenetwork2, and has served that purpose with honour and dedication. We applaud that. + +It is, however, time to retire the script, since at this point in time, it serves more to obfuscate the startup than as a helpful tool. + +On production, we have all but abandoned the use of the script, and we need to do the same for CI/CD, and eventually, development. + +This issue tracks the process, and the problems that come up during the move to retire the script.
+ +### Process + +* [x] Identify how to run unit tests without the script +* [x] Document how to run unit tests without the script +* [x] Identify how to run mechanical-rob tests without the script +* [x] Document how to run mechanical-rob tests without the script +* [x] Update CI/CD definitions to get rid of the references to the script +* [x] Delete the script from the repository + +### Setup + +First, we need to set up the following mandatory environment variables: + +* GN2_PROFILE +* GN2_SETTINGS +* JS_GUIX_PATH +* GEMMA_COMMAND +* PLINK_COMMAND +* GEMMA_WRAPPER_COMMAND +* REQUESTS_CA_BUNDLE + +Within a guix shell, you could do that with something like: + +``` +export GN2_PROFILE="${GUIX_ENVIRONMENT}" +export GN2_SETTINGS="/home/frederick/genenetwork/gn2_settings.conf" +export JS_GUIX_PATH="${GN2_PROFILE}/share/genenetwork2/javascript" +export GEMMA_COMMAND="${GN2_PROFILE}/bin/gemma" +export PLINK_COMMAND="${GN2_PROFILE}/bin/plink2" +export GEMMA_WRAPPER_COMMAND="${GN2_PROFILE}/bin/gemma-wrapper" +export REQUESTS_CA_BUNDLE="${GUIX_ENVIRONMENT}/etc/ssl/certs/ca-certificates.crt" +``` + +Note that you can define all the variables derived from "GN2_PROFILE" in your settings file, if such a settings file is computed. + +### Running Unit Tests + +To run unit tests, run pytest at the root of the repository. + +``` +$ cd /path/to/genenetwork2 +$ pytest +``` + +### Running "mechanical-rob" Tests + +At the root of the repository, run something like: + +``` +python test/requests/test-website.py --all http://localhost:5033 +``` + +Change the port, as appropriate.
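Before running either test suite, it can help to confirm that the mandatory variables from the "Setup" section are present. The following is only a minimal sketch: the helper name `missing_envvars` is hypothetical and is not part of genenetwork2.

```python
import os

# Names copied from the "Setup" section of this issue.
MANDATORY_ENVVARS = (
    "GN2_PROFILE",
    "GN2_SETTINGS",
    "JS_GUIX_PATH",
    "GEMMA_COMMAND",
    "PLINK_COMMAND",
    "GEMMA_WRAPPER_COMMAND",
    "REQUESTS_CA_BUNDLE",
)


def missing_envvars(environ=os.environ, names=MANDATORY_ENVVARS):
    """Return the names that are unset or empty (hypothetical helper)."""
    return [name for name in names if not environ.get(name)]


# Report, rather than abort, so the snippet is safe to run anywhere.
missing = missing_envvars()
print("missing:", ", ".join(missing) if missing else "none")
```

An empty value is treated the same as an unset one here, on the assumption that an empty path is never a valid setting.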
+ + +### Launching Application + +In addition to the minimum set of envvars defined in the "Setup" section above, we need the following variables defined to get the application to launch: + +* FLASK_APP + +In a guix shell, you could do: + +``` +export FLASK_APP="gn2.wsgi" +``` + +Now you can launch the application with flask with something like: + +``` +flask run --port=5033 --with-threads +``` + +or with green unicorn with something like: + +``` +gunicorn --reload \ + --workers 3 \ + --timeout 1200 \ + --log-level="debug" \ + --keep-alive 6000 \ + --max-requests 10 \ + --bind="127.0.0.1:5033" \ + --max-requests-jitter 5 \ + gn2.wsgi:application +``` + +You can change the gunicorn setting to fit your scenario. + + +## Close as completed + +The script has been deleted. diff --git a/issues/genenetwork2/session_expiry_oauth_error.png b/issues/genenetwork2/session_expiry_oauth_error.png new file mode 100644 index 0000000..34e2dda --- /dev/null +++ b/issues/genenetwork2/session_expiry_oauth_error.png Binary files differdiff --git a/issues/genenetwork3/01828928-26e6-4cad-bbc8-59fd7a7977de.json.zip b/issues/genenetwork3/01828928-26e6-4cad-bbc8-59fd7a7977de.json.zip new file mode 100644 index 0000000..7681b88 --- /dev/null +++ b/issues/genenetwork3/01828928-26e6-4cad-bbc8-59fd7a7977de.json.zip Binary files differdiff --git a/issues/genenetwork3/broken-aliases.gmi b/issues/genenetwork3/broken-aliases.gmi new file mode 100644 index 0000000..2bfbdae --- /dev/null +++ b/issues/genenetwork3/broken-aliases.gmi @@ -0,0 +1,188 @@ +# Broken Aliases + +## Tags + +* type: bug +* status: open +* priority: high +* assigned: pjotrp +* interested: pjotrp +* keywords: aliases, aliases server + +## Tasks + +* [X] Rewrite server in gn-guile +* [X] Fix menu search +* [X] Fix global search aliases +* [ ] Deploy and test aliases in GN2 + +## Repository + +=> https://github.com/genenetwork/gn3 + +moved to + +gn-guile repo. 
+ +## Bug Report + +### Actual + +* Go to https://genenetwork.org/gn3/gene/aliases2/Shh,Brca2 +* Note that an exception is raised, with a "404 Not Found" message + +### Expected + +* We expected a list of aliases to be returned for the given symbols as is done in https://fallback.genenetwork.org/gn3/gene/aliases2/Shh,Brca2 + +## Resolution + +Actually the server is up, but it is not part of the main deployment because it is written in Racket - and we don't have much support in Guix. I wrote the code the days after my bike accident: + +=> https://github.com/genenetwork/gn3/blob/master/gn3/web/wikidata.rkt + +and it is probably easiest to move it to gn-guile. Guile is another Scheme after all ;). Only fitting I spent days in hospital only recently (for a different reason). gn-guile already has its own web server and provides a REST API for our markdown editor, for example. On tux04 it responds with + +``` +curl http://127.0.0.1:8091/version +"4.0.0" +``` + +What we want is to add the aliases server that should respond to + +``` +curl http://localhost:8000/gene/aliases/Shh # direct on tux01 +["9530036O11Rik","Dsh","Hhg1","Hx","Hxl3","M100081","ShhNC","ShhNC"] +curl https://genenetwork.org/gn3/gene/aliases2/Shh,Brca2 +[["Shh",["9530036O11Rik","Dsh","Hhg1","Hx","Hxl3","M100081","ShhNC","ShhNC"]],["Brca2",["Fancd1","RAB163"]]] +``` + +Note this is used by search functionality in GN, as well as the gene aliases list on the mapping page. In principle we cache it for the duration of the running server so as not to overload wikidata. No one uses aliases2, that I can tell, so we only implement the first 'aliases'. + +Note the wikidata interface has been stable all this time. That is good. + +Turns out we already use wikidata in the gn-guile implementation for fetching the wikidata id for a species (as part of metadata retrieval). I wrote that about two years ago as part of the REST API expansion. 
+ +Unfortunately + +``` +(sparql-scm (wd-sparql-endpoint-url) (wikidata-gene-alias "Q24420953")) +``` + +throws a 403 forbidden error. + +This however works: + +``` +scheme@(gn db sparql) [15]> (sparql-wd-species-info "Q83310") +;;; ("https://query.wikidata.org/sparql?query=%0ASELECT%20DISTINCT%20%3Ftaxon%20%3Fncbi%20%3Fdescr%20where%20%7B%0A%20%20%20%20wd%3AQ83310%20wdt%3AP225%20%3Ftaxon%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP685%20%3Fncbi%20%3B%0A%20%20%20%20%20%20schema%3Adescription%20%3Fdescr%20.%0A%20%20%20%20%3Fspecies%20wdt%3AP685%20%3Fncbi%20.%0A%20%20%20%20FILTER%20%28lang%28%3Fdescr%29%3D%27en%27%29%0A%7D%20limit%205%0A%0A") +$11 = "?taxon\t?ncbi\t?descr\n\"Mus musculus\"\t\"10090\"\t\"species of mammal\"@en\n" +``` + +(if you can see the mouse ;). + +Ah, this works + +``` +scheme@(gn db sparql) [17]> (sparql-tsv (wd-sparql-endpoint-url) (wikidata-query-geneids "Shh" )) +;;; ("https://query.wikidata.org/sparql?query=SELECT%20DISTINCT%20%3Fwikidata_id%0A%20%20%20%20%20%20%20%20%20%20%20%20WHERE%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3Fwikidata_id%20wdt%3AP31%20wd%3AQ7187%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP703%20%3Fspecies%20.%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20VALUES%20%28%3Fspecies%29%20%7B%20%28wd%3AQ15978631%20%29%20%28%20wd%3AQ83310%20%29%20%28%20wd%3AQ184224%20%29%20%7D%20.%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3Fwikidata_id%20rdfs%3Alabel%20%22Shh%22%40en%20.%0A%20%20%20%20%20%20%20%20%7D%0A") +$12 = "?wikidata_id\n<http://www.wikidata.org/entity/Q14860079>\n<http://www.wikidata.org/entity/Q24420953>\n" +``` + +But this does not + +``` +scheme@(gn db sparql) [17]> (sparql-scm (wd-sparql-endpoint-url) (wikidata-query-geneids "Shh" )) +ice-9/boot-9.scm:1685:16: In procedure raise-exception: +In procedure utf8->string: Wrong type argument in position 1 (expecting bytevector): "<html>\r\n<head><title>403 
Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx/1.18.0</center>\r\n</body>\r\n</html>\r\n" +``` + +Going via tsv does work + +``` +scheme@(gn db sparql) [18]> (tsv->scm (sparql-tsv (wd-sparql-endpoint-url) (wikidata-query-geneids "Shh" ))) + +;;; ("https://query.wikidata.org/sparql?query=SELECT%20DISTINCT%20%3Fwikidata_id%0A%20%20%20%20%20%20%20%20%20%20%20%20WHERE%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3Fwikidata_id%20wdt%3AP31%20wd%3AQ7187%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP703%20%3Fspecies%20.%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20VALUES%20%28%3Fspecies%29%20%7B%20%28wd%3AQ15978631%20%29%20%28%20wd%3AQ83310%20%29%20%28%20wd%3AQ184224%20%29%20%7D%20.%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3Fwikidata_id%20rdfs%3Alabel%20%22Shh%22%40en%20.%0A%20%20%20%20%20%20%20%20%7D%0A") +$13 = ("?wikidata_id") +$14 = (("<http://www.wikidata.org/entity/Q14860079>") ("<http://www.wikidata.org/entity/Q24420953>")) +``` + +that is nice enough. + +We now got a working alias server that is part of gn-guile. E.g. + +``` +curl http://127.0.0.1:8091/gene/aliases/Brca2 +["breast cancer 2","breast cancer 2, early onset","Fancd1","RAB163","BRCA2, DNA repair associated"] +``` + +it is part of gn-guile. gn-guile also has the 'commit/' handler by Alex, documented as +'curl -X POST http://127.0.0.1:8091/commit' in git-markdown-editor.md. Let's see how that is wired up. The web interface is at, for example, +https://genenetwork.org/editor/edit?file-path=general/help/facilities.md. 
Part of gn2's + +``` +gn2/wqflask/views.py +398:@app.route("/editor/edit", methods=["GET"]) +408:@app.route("/editor/settings", methods=["GET"]) +414:@app.route("/editor/commit", methods=["GET", "POST"]) +``` + +which has the code + +``` +@app.route("/editor/edit", methods=["GET"]) +@require_oauth2 +def edit_gn_doc_file(): + file_path = urllib.parse.urlencode( + {"file_path": request.args.get("file-path", "")}) + response = requests.get(f"http://localhost:8091/edit?{file_path}") + response.raise_for_status() + return render_template("gn_editor.html", **response.json()) +``` + +Running over localhost. This is unfortunately hard coded, and we should change that! In guix system +configuration it is already a variable as 'genenetwork-configuration-gn-guile-port 8091'. gn-guile should also be visible from outside, so that is a separate configuration. + +Also I note that the mapping page does three requests to wikidata (for mouse, rat and human). That could really be one. + +# Search + +Aliases are also used in search. You can tell when GN search renders too few results that aliases are not used. When aliases work we expect to list '2310010I16Rik' with + +=> https://genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=sh*&search_terms_and=&FormID=searchResult + +Sheepdog tests for that and it has been failing for a while. + +Global search finds way more results, but also lacks that alias! Meanwhile GN1 does find that alias for record 1431728_at. GN2 finds it with hippocampus mRNA + +=> https://genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=1431728_at%0D%0A&search_terms_and=&accession_id=None&FormID=searchResult + +in standard search. +But neither 1431728_at or '2310010I16Rik' has a hit in *global* search and the result for Ssh should include the record in both search systems. 
+ +# Deploy + +We introduced a new environment variable that does not show up on CD, part of the mapping page: + +=> + +In the logs on /export2: + +``` +root@tux02:/export2/guix-containers/genenetwork-development/var/log/cd# tail -100 genenetwork2.log +2025-07-20 04:19:43 File "/genenetwork2/gn2/base/trait.py", line 157, in wikidata_alias_fmt +2025-07-20 04:19:43 GN_GUILE_SERVER_URL + "gene/aliases/" + self.symbol.upper()) +2025-07-20 04:19:43 NameError: name 'GN_GUILE_SERVER_URL' is not defined +``` + +One thing I ran into is http://genenetwork.org/gn3-proxy/ - what is that for? + +## Deploy Updates: 2025-08-15 +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=269f99f1e1f0c253ecdd99f04bc7c6697012b0aa Update commit of gn-guile used on production + +This does not fix the issue on https://gn2-fred.genenetwork.org/show_trait?trait_id=1427571_at&dataset=HC_M2_0606_P, instead we get + +``` +fredm@tux04:~$ curl http://localhost:8091/gene/aliases/Brca2 +Resource not found: /gene/aliases/Brca2 +``` diff --git a/issues/genenetwork3/check-for-mandatory-settings.gmi b/issues/genenetwork3/check-for-mandatory-settings.gmi new file mode 100644 index 0000000..16a2f8a --- /dev/null +++ b/issues/genenetwork3/check-for-mandatory-settings.gmi @@ -0,0 +1,40 @@ +# Check for Mandatory Settings + +## Tags + +* status: open +* priority: high +* type: bug, improvement +* interested: fredm, bonz +* assigned: jnduli, rookie101 +* keywords: GN3, gn3, genenetwork3, settings, config, configs, configurations + +## Explanation + +Giving defaults to some important settings leads to situations where the correct configuration is not set up correctly leading at best to failure, and at worst, to subtle failures that can be difficult to debug: e.g. When a default URI to a server points to an active domain, just not the correct one. + +We want to make such (arguably, sensitive) configurations explicit, and avoid giving them defaults. 
We want to check that they are set up before allowing the application to run, and fail loudly and obnoxiously if they are not provided. + +Examples of configuration variables that should be checked for: + +* All external URIs (external to app/repo under consideration) +* All secrets (secret keys, salts, tokens, etc) + +We should also eliminate from the defaults: + +* Computed values +* Calls to get values from ENVVARs (`os.environ.get(…)` calls) + +### Note on ENVVARs + +The environment variables should be used for overriding values under specific conditions, therefore, it should both be explicit and the last thing loaded to ensure they actually override settings. + +=> https://git.genenetwork.org/gn-auth/tree/gn_auth/__init__.py?id=3a276642bea934f0a7ef8f581d8639e617357a2a#n70 See this example for a possible way of allowing ENVVARs to override settings. + +The example above could be improved by maybe checking for environment variables starting with a specific value, e.g. the envvar `GNAUTH_SECRET_KEY` would override the `SECRET_KEY` configuration. This allows us to override settings without having to change the code. 
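The two ideas above (envvars loaded last under a fixed prefix, and a loud failure when a mandatory setting is absent) can be sketched together as follows. This is an illustration under stated assumptions, not the gn-auth or GN3 implementation: the function names, the `MissingSettingsError` class and the `AUTH_SERVER_URL` setting are hypothetical, and only the `GNAUTH_SECRET_KEY` to `SECRET_KEY` mapping comes from the text.

```python
ENVVAR_PREFIX = "GNAUTH_"  # prefix taken from the example in the text


class MissingSettingsError(RuntimeError):
    """Raised when mandatory settings are absent (hypothetical name)."""


def override_from_environ(settings, environ, prefix=ENVVAR_PREFIX):
    """Return a copy of settings with prefixed envvars applied last."""
    merged = dict(settings)
    for name, value in environ.items():
        if name.startswith(prefix) and len(name) > len(prefix):
            # GNAUTH_SECRET_KEY overrides SECRET_KEY, and so on.
            merged[name[len(prefix):]] = value
    return merged


def check_mandatory(settings, mandatory):
    """Fail loudly if any mandatory setting is missing or empty."""
    missing = sorted(key for key in mandatory if not settings.get(key))
    if missing:
        raise MissingSettingsError(
            "Missing mandatory settings: " + ", ".join(missing))
    return settings


# Illustrative values only; AUTH_SERVER_URL is an assumed setting name.
file_settings = {"AUTH_SERVER_URL": "https://auth.example.org/"}
environ = {"GNAUTH_SECRET_KEY": "not-a-real-secret", "HOME": "/home/user"}
settings = check_mandatory(
    override_from_environ(file_settings, environ),
    mandatory=("AUTH_SERVER_URL", "SECRET_KEY"))
print(sorted(settings))
```

Passing the environment in as a plain mapping, instead of reading `os.environ` inside the functions, keeps the override step easy to test and makes the load order explicit.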
+ +## Tasks + +* [ ] Explicitly check configs for ALL external URIs +* [ ] Explicitly check configs for ALL secrets +* [ ] Explicitly load ENVVARs last to override settings diff --git a/issues/genenetwork3/ctl-maps-error.gmi b/issues/genenetwork3/ctl-maps-error.gmi new file mode 100644 index 0000000..6726357 --- /dev/null +++ b/issues/genenetwork3/ctl-maps-error.gmi @@ -0,0 +1,46 @@ +# CTL Maps Error + +## Tags + +* type: bug +* status: open +* priority: high +* assigned: alexm, zachs, fredm +* keywords: CTL, CTL Maps, gn3, genetwork3, genenetwork 3 + +## Description + +Trying to run the CTL Maps feature in the collections page as described in +=> /issues/genenetwork2/broken-collections-feature + +We get an error in the results page of the form: + +``` +{'error': '{\'code\': 1, \'output\': \'Loading required package: MASS\\nLoading required package: parallel\\nLoading required package: qtl\\nThere were 13 warnings (use warnings() to see them)\\nError in xspline(x, y, shape = 0, lwd = lwd, border = col, lty = lty, : \\n invalid value specified for graphical parameter "lwd"\\nCalls: ctl.lineplot -> draw.spline -> xspline\\nExecution halted\\n\'}'} +``` + +on the CLI the same error is rendered: +``` +Loading required package: MASS +Loading required package: parallel +Loading required package: qtl +There were 13 warnings (use warnings() to see them) +Error in xspline(x, y, shape = 0, lwd = lwd, border = col, lty = lty, : + invalid value specified for graphical parameter "lwd" +Calls: ctl.lineplot -> draw.spline -> xspline +Execution halted +``` + +On my local development machine, the command run was +``` +Rscript /home/frederick/genenetwork/genenetwork3/scripts/ctl_analysis.R /tmp/01828928-26e6-4cad-bbc8-59fd7a7977de.json +``` + +Here is a zipped version of the json file (follow the link and click download): +=> https://github.com/genenetwork/gn-gemtext-threads/blob/main/issues/genenetwork3/01828928-26e6-4cad-bbc8-59fd7a7977de.json.zip + +Troubleshooting a while, I 
suspect +=> https://github.com/genenetwork/genenetwork3/blob/27d9c9d6ef7f37066fc63af3d6585bf18aeec925/scripts/ctl_analysis.R#L79-L80 this is the offending code. + +=> https://cran.r-project.org/web/packages/ctl/ctl.pdf The manual for the ctl library +indicates that our call above might be okay, which might mean something changed in the dependencies that the ctl library used. diff --git a/issues/genenetwork/genenetwork3_configuration.gmi b/issues/genenetwork3/genenetwork3_configuration.gmi index fcab572..cdd7c15 100644 --- a/issues/genenetwork/genenetwork3_configuration.gmi +++ b/issues/genenetwork3/genenetwork3_configuration.gmi @@ -1,10 +1,10 @@ -# Genenetwork2 Configurations +# Genenetwork3 Configurations ## Tags * assigned: fredm * priority: normal -* status: open +* status: closed, completed * keywords: configuration, config, gn2, genenetwork, genenetwork2 * type: bug @@ -13,3 +13,7 @@ The configuration file should only ever contain settings, and no code. Remove all code from the default settings file. Eschew executable formats (*.py) for configuration files and prefer non-executable formats e.g. 
*.cfg, *.json, *.conf etc + +## Closed as Completed + +See commit https://github.com/genenetwork/genenetwork3/commit/977efbb54da284fb3e8476f200206d00cb8e64cd diff --git a/issues/genenetwork3/generate-heatmaps-failing.gmi b/issues/genenetwork3/generate-heatmaps-failing.gmi new file mode 100644 index 0000000..522dc27 --- /dev/null +++ b/issues/genenetwork3/generate-heatmaps-failing.gmi @@ -0,0 +1,64 @@ +# Generate Heatmaps Failing + +## Tags + +* type: bug +* status: open +* priority: medium +* assigned: fredm, zachs, zsloan +* keywords: genenetwork3, gn3, GN3, heatmaps + +## Reproduce + +* Go to https://genenetwork.org/ +* Under "Select and Search" menu, enter "synap*" for the "Get Any" field +* Click "Search" +* In search results page, select first 10 traits +* Click "Add" +* Under "Create a new collection" enter the name "newcoll" and click "Create collection" +* In the collections page that shows up, click "Select All" once +* Ensure all the traits are selected +* Click "Generate Heatmap" and wait +* Note how system fails silently with no heatmap presented + +### Notes + +On https://gn2-fred.genenetwork.org the heatmaps fails with a note ("ERROR: undefined"). In the logs, I see "Module 'scipy' has no attribute 'array'" which seems to be due to a change in numpy. +=> https://github.com/MaartenGr/BERTopic/issues/1791 +=> https://github.com/scipy/scipy/issues/19972 + +This issue should not be present with python-plotly@5.20.0 but since guix-bioinformatics pins the guix version to `b0b988c41c9e0e591274495a1b2d6f27fcdae15a`, we are not able to pull in newer versions of packages from guix. 
+ + +### Update 2025-04-08T10:59CDT + +Got the following error when I ran the background command manually: + +``` +$ export RUST_BACKTRACE=full +$ /gnu/store/dp4zq4xiap6rp7h6vslwl1n52bd8gnwm-profile/bin/qtlreaper --geno /home/frederick/genotype_files/genotype/genotype/BXD.geno --n_permutations 1000 --traits /tmp/traits_test_file_n2E7V06Cx7.txt --main_output /tmp/qtlreaper/main_output_NGVW4sfYha.txt --permu_output /tmp/qtlreaper/permu_output_MJnzLbrsrC.txt +thread 'main' panicked at src/regression.rs:216:25: +index out of bounds: the len is 20 but the index is 20 +stack backtrace: + 0: 0x61399d77d46d - <unknown> + 1: 0x61399d7b5e13 - <unknown> + 2: 0x61399d78b649 - <unknown> + 3: 0x61399d78f26f - <unknown> + 4: 0x61399d78ee98 - <unknown> + 5: 0x61399d78f815 - <unknown> + 6: 0x61399d77d859 - <unknown> + 7: 0x61399d77d679 - <unknown> + 8: 0x61399d78f3f4 - <unknown> + 9: 0x61399d6f4063 - <unknown> + 10: 0x61399d6f41f7 - <unknown> + 11: 0x61399d708f18 - <unknown> + 12: 0x61399d6f6e4e - <unknown> + 13: 0x61399d6f9e93 - <unknown> + 14: 0x61399d6f9e89 - <unknown> + 15: 0x61399d78e505 - <unknown> + 16: 0x61399d6f8d55 - <unknown> + 17: 0x75ee2b945bf7 - __libc_start_call_main + 18: 0x75ee2b945cac - __libc_start_main@GLIBC_2.2.5 + 19: 0x61399d6f4861 - <unknown> + 20: 0x0 - <unknown> +``` diff --git a/issues/genenetwork3/rqtl2-mapping-error.gmi b/issues/genenetwork3/rqtl2-mapping-error.gmi new file mode 100644 index 0000000..b43d66f --- /dev/null +++ b/issues/genenetwork3/rqtl2-mapping-error.gmi @@ -0,0 +1,46 @@ +# R/qtl2 Maps Error + +## Tags + +* type: bug +* status: closed, completed +* priority: high +* assigned: alexm, zachs, fredm +* keywords: R/qtl2, R/qtl2 Maps, gn3, genetwork3, genenetwork 3 + +## Reproduce + +* Go to https://genenetwork.org/ +* In the "Get Any" field, enter "synap*" and press the "Enter" key +* In the search results, click on the "1435464_at" trait +* Expand the "Mapping Tools" accordion section +* Select the "R/qtl2" option +* Click "Compute" +* In 
the "Computing the Maps" page that results, click on "Display System Log" + +### Observed + +A traceback is observed, with an error of the following form: + +``` +⋮ +FileNotFoundError: [Errno 2] No such file or directory: '/opt/gn/tmp/gn3-tmpdir/JL9PvKm3OyKk.txt' +``` + +### Expected + +The mapping runs successfully and the results are presented in the form of a mapping chart/graph and a table of values. + +### Debug Notes + +The directory "/opt/gn/tmp/gn3-tmpdir/" exists, and is actually used by other mappings (i.e. the "R/qtl" and "Pair Scan" mappings) successfully. + +This might imply a code issue: Perhaps +* a path is hardcoded, or +* the wrong path value is passed + +The same error occurs on https://cd.genenetwork.org but does not seem to prevent CD from running the mapping to completion. Maybe something is missing on production - what, though? + +## Closed as Completed + +This seems fixed now. diff --git a/issues/genetics/speeding-up-gemma.gmi b/issues/genetics/speeding-up-gemma.gmi new file mode 100644 index 0000000..91bab17 --- /dev/null +++ b/issues/genetics/speeding-up-gemma.gmi @@ -0,0 +1,492 @@ +# Speeding up GEMMA + +GEMMA is slow, but usually fast enough. Earlier I wrote gemma-wrapper to speed things up. In genenetwork.org, by using gemma-wrapper with LOCO, most traits are mapped in a few seconds on a large server (30 individuals x 200K markers). By expanding markers to over 1 million, however, runtimes degrade to 6 minutes. Increasing the number of individuals to 1000 may slow mapping down to hour(s). As we are running 'precompute' on 13K traits - and soon maybe millions - it would be beneficial to reduce runtimes again. + +One thing to look at is Sen's bulklmm. It can do phenotypes in parallel, provided there is no missing data. This is perfect for permutations which we'll also do. For multiple phenotypes it is a bit tricky however, because you'll have to mix and match experiments to show the same individuals (read samples).
+ +So the approach is to first analyze steps in GEMMA and see where it is particularly inefficient. Maybe we can do something about that. I note I started the pangemma effort (and mgamma effort before). The idea is to use a propagator network for incremental improvements and also to introduce a new build system and testing framework. In parallel we'll try to scale out on HPC using Arun's ravanan software. + +There is no such thing as a free lunch. So, let's dive in. + +# Description + +# Tags + +* assigned: pjotrp +* type: feature +* priority: high + +# Tasks + +* [X] Try gzipped version +* [X] Run without debug +* [ ] Use lmdb for genotypes +* - [X] convert genotypes to lmdb +* - [X] replace GEMMA ReadGenotypes +* - [X] replace reading genotypes in AnalyzeBimbam +* - [+] Apply similar SNP filtering as the original +* - [X] Add SNP info to Geno file +* - [X] Try different geno encodings +* - [+] Fix support for NAs - also in compute +* [X] Use lmdb for SNPs (probably part of Geno file) +* [X] Match output +* [ ] Write lmdb for output with filter +* [X] Optimize openblas for target architecture +* [ ] Use profiler +* [ ] Hash genotypes? Try buf.hash or xxhash +* [ ] Skip highly correlated markers with backtracking +* [ ] Perhaps try a faster malloc library for GEMMA +* [ ] Fix sqrt(NaN) when running big file example with -debug +* [ ] Fix/check assumption that geno is between 0 and 2 +* [ ] Try 64-bit integer index for lmdb +* [ ] Other improvements...
+ +# Summary + +Convert a geno file to mdb with + +``` +./bin/anno2mdb.rb mouse_hs1940.anno.txt +./bin/geno2mdb.rb mouse_hs1940.geno.txt --anno mouse_hs1940.anno.txt.mdb --eval Gf # convert to floating point +real 0m14.042s +user 0m12.639s +sys 0m0.402s +``` + +``` +../bin/anno2mdb.rb snps-matched.txt +../bin/geno2mdb.rb pangenome-13M-genotypes.txt --geno-json bxd_inds.list.json --anno snps-matched.txt.mdb --eval Gf +../bin/geno2mdb.rb pangenome-13M-genotypes.txt --geno-json bxd_inds.list.json --anno snps-matched.txt.mdb --eval Gb +``` + +even with floats a 30G pangenome genotype file got reduced to 12G. A quick full run of the mdb version takes 6 minutes. That is a massive 3x speedup. It also used less RAM (because it is one process instead of 20) and had a 40x core usage, much of it in the Linux kernel: + +``` +/bin/time -v ./build/bin/Release/gemma -k tmp/93f6b39ec06c09fb9ba9ca628b5fb990921b6c60.11.cXX.txt.cXX.txt -p tmp/pheno.json.txt -g pangenome-13M-genotypes.txt.mdb -lmm 9 -maf 0.1 -n 2 -debug +LD_LIBRARY_PATH=$GUIX_ENVIRONMENT/lib /bin/time -v ./build/bin/Release/gemma -k tmp/93f6b39ec06c09fb9ba9ca628b5fb990921b6c60.3.cXX.txt.cXX.txt -p tmp/pheno.json.txt -g tmp/pangenome-13M-genotypes.txt.mdb -lmm 9 -maf 0.1 -n 2 -no-check +real 5m47.587s +user 39m33.796s +sys 211m1.143s + +Command being timed: "./build/bin/Release/gemma -k tmp/93f6b39ec06c09fb9ba9ca628b5fb990921b6c60.3.cXX.txt.cXX.txt -p tmp/pheno.json.txt -g tmp/pangenome-13M-genotypes.txt.mdb -lmm 9 -maf 0.1 -n 2 -no-check" + User time (seconds): 2169.77 + System time (seconds): 11919.04 + Percent of CPU this job got: 3919% + Elapsed (wall clock) time (h:mm:ss or m:ss): 5:59.48 + Maximum resident set size (kbytes): 13377040 +``` + +as we only read the genotype file once it shows how much is IO bound! Moving to lmdb was the right choice to speed up pangemma. 
+ +Old gemma does: + +``` + Command being timed: "/bin/gemma -k 93f6b39ec06c09fb9ba9ca628b5fb990921b6c60.11.cXX.txt.cXX.txt -p pheno.json.txt -g pangenome-13M-genotypes.txt.gz -a snps-matched.txt -lmm 9 -maf 0.1 -n 2 -no-check" + User time (seconds): 2017.25 + System time (seconds): 62.21 + Percent of CPU this job got: 240% + Elapsed (wall clock) time (h:mm:ss or m:ss): 14:24.17 + Maximum resident set size (kbytes): 9736884 +``` + +So we are at 3x speed. + +With Gb byte encoding the file got further reduced from 13Gb to 4Gb. + +What is more exciting is that LOCO now runs in 30s - compared to gemma's earlier 6 minutes, so that is at 10x speed, using about 1/3 of RAM. Note the CPU usage: + +``` + Command being timed: "./build/bin/Release/gemma -k tmp/93f6b39ec06c09fb9ba9ca628b5fb990921b6c60.3.cXX.txt.cXX.txt -p tmp/pheno.json.txt -g tmp/pangenome-13M-genotypes.txt-Gb.mdb -loco 2 -lmm 9 -maf 0.1 -n 2 -no-check" User time (seconds): 177.81 + System time (seconds): 934.92 + Percent of CPU this job got: 3391% + Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.80 + Maximum resident set size (kbytes): 4326308 +``` + +it looks like disk IO is no longer the bottleneck. The Gb version is much smaller than Gf, but runtime is only slightly better. So it is time for the profiler to find how we can make use of the other cores! But, for now, I am going to focus on getting the pipeline set up with ravanan. 
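The speedups quoted in this summary can be recomputed from the wall-clock figures above. The sketch below uses only numbers that appear in this section; the "6:00" entry is the approximate "6 minutes" mentioned in the text, not a measured value, and the helper name `to_seconds` is made up for the example.

```python
def to_seconds(clock):
    """Convert a '/bin/time -v' wall-clock string (h:mm:ss or m:ss) to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


old_full = to_seconds("14:24.17")        # old gemma, gzipped text genotypes
mdb_full = to_seconds("5:59.48")         # pangemma, Gf lmdb genotypes
approx_old_loco = to_seconds("6:00")     # "6 minutes" quoted in the text
mdb_loco = to_seconds("0:32.80")         # pangemma, Gb lmdb genotypes, -loco 2

print(f"full run speedup: {old_full / mdb_full:.1f}x")
print(f"LOCO run speedup: {approx_old_loco / mdb_loco:.1f}x")
```

By wall clock alone the full run comes out at roughly 2.4x and the LOCO run at roughly 11x, so the "3x" and "10x" figures in the text are best read as rounded estimates.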
+ +# Analysis + +As a test case we'll take one of the runs: + +``` +time -v /bin/gemma -loco 11 -k /export2/data/wrk/services/gemma-wrapper/tmp/tmp/panlmm/93f6b39ec06c09fb9ba9ca628b5fb990921b6c60.11.cXX.txt.cXX.txt -o 680029457111fdd460990f95853131c87ea20c57.11.assoc.txt -p pheno.json.txt -g pangenome-13M-genotypes.txt -a snps-matched.txt -lmm 9 -maf 0.1 -n 2 -outdir /export2/data/wrk/services/gemma-wrapper/tmp/tmp/panlmm/d20251111-588798-f81icw +``` + +which I simplify to + +``` +/bin/time -v /bin/gemma -loco 11 -k 93f6b39ec06c09fb9ba9ca628b5fb990921b6c60.11.cXX.txt.cXX.txt -p pheno.json.txt -g pangenome-13M-genotypes.txt -a snps-matched.txt -lmm 9 -maf 0.1 -n 2 -debug +Reading Files ... +number of total individuals = 143 +number of analyzed individuals = 20 +number of total SNPs/var = 13209385 +number of SNPS for K = 12376792 +number of SNPS for GWAS = 832593 +number of analyzed SNPs = 13111938 +``` + +The timer says: + +``` +User time (seconds): 365.33 +System time (seconds): 16.59 +Percent of CPU this job got: 128% +Elapsed (wall clock) time (h:mm:ss or m:ss): 4:57.01 +Average shared text size (kbytes): 0 +Average unshared data size (kbytes): 0 +Average stack size (kbytes): 0 +Average total size (kbytes): 0 +Maximum resident set size (kbytes): 11073412 +Average resident set size (kbytes): 0 +Major (requiring I/O) page faults: 0 +Minor (reclaiming a frame) page faults: 5756557 +Voluntary context switches: 1365 +Involuntary context switches: 478 +Swaps: 0 +File system inputs: 0 +File system outputs: 143704 +Socket messages sent: 0 +Socket messages received: 0 +Signals delivered: 0 +Page size (bytes): 4096 +Exit status: 0 +``` + +The unzipped genotype file is 30G. Let's try running the gzipped version (which will be beneficial on a compute cluster anyhow), which comes in at 9.2G. We know that Gemma is not the most efficient when it comes to IO. So testing is crucial.
+Critically the run gets slower: + +``` +Percent of CPU this job got: 118% +Elapsed (wall clock) time (h:mm:ss or m:ss): 7:43.56 +``` + +The problem is that unzip runs on a single thread in GEMMA, so it is actually slower than the gigantic raw text file. + +## Running without debug + +Without the debug switch gemma runs at the same speed with 128% CPU. That won't help much. + +## Optimizing GEMMA+OpenBLAS+GSL + +Compiling with optimization can be low hanging fruit - despite the fact that we seem to be IO bound at 128% CPU. Still, aggressive compiler optimizations may make a difference. The current build reads: + +``` +GEMMA Version = 0.98.6 (2022-08-05) +Build profile = /gnu/store/8rvid272yb53bgascf5c468z0jhsyflj-profile +GCC version = 14.3.0 +GSL Version = 2.8 +OpenBlas = OpenBLAS 0.3.30 - OpenBLAS 0.3.30 DYNAMIC_ARCH NO_AFFINITY Cooperlake MAX_THREADS=128 +arch = Cooperlake +threads = 96 +parallel type = threaded +``` + +This uses the gemma-gn2 package in + +=> https://git.genenetwork.org/guix-bioinformatics/tree/gn/packages/gemma.scm#n27 + +which is currently not built with arch optimizations (even though Cooperlake suggests differently). Another potential optimization is to use a fast malloc library. We do, however, already compile with a recent gcc, thanks to Guix. No need to improve on that. + +## Introduce lmdb for genotypes + +Rather than focussing on gzip, another potential improvement is to use lmdb with mmap. We are not going to upgrade the original gemma code (which is in maintenance mode). We are going to upgrade the new pangemma project instead: + +=> https://git.genenetwork.org/pangemma/ + +Reason being that this is our experimental project. + +So I just managed to build pangemma/gemma in Guix. Next step is to introduce lmdb genotypes. Genotypes come essentially as a matrix of markers x individuals. In the case of GN geno files and BIMBAM files they are simply stored as tab delimited values and/or probabilities. 
This happens in + +``` +src/param.cpp +1261:void PARAM::ReadGenotypes(gsl_matrix *UtX, gsl_matrix *K, const bool calc_K) { +1280:void PARAM::ReadGenotypes(vector<vector<unsigned char>> &Xt, gsl_matrix *K, +``` + +calling into + +``` +gemma_io.cpp +644:bool ReadFile_geno(const string &file_geno, const set<string> &setSnps, +1752:bool ReadFile_geno(const string file_geno, vector<int> &indicator_idv, +1857:bool ReadFile_geno(const string &file_geno, vector<int> &indicator_idv, +``` + +which are called from gemma.cpp. Also lmm.cpp reads the geno file in the AnalyzeBimbam function (see file_geno): + +``` +src/lmm.cpp +61: file_geno = cPar.file_geno; +1664: debug_msg(file_geno); +1665: auto infilen = file_geno.c_str(); +2291: cout << "error reading genotype file:" << file_geno << endl; +``` + +Note that SNPs are also read from a file (see file_snps). We already have an lmdb version for that! + +So, reading genotypes happens in multiple places. In fact, it is read 1x for computing K and 2x for GWA. And it is worse than this because LOCO runs GWA 20x rereading the same files. Reading it once using lmdb should speed things up. + +We'll start with the 30G 143samples.percentile.bimbam.bimbam-reduced2 file. The conversion into lmdb only has to be done once. We want to track both column and row names in the same lmdb and we will use a meta JSON record for that. On the command line we'll state whether the genotypes are stored as char or int. Floats will be packed into either of those. We'll experiment a bit to see what the default should be. A genotype is usually a number/character or a probability. In the latter case we don't have to have high precision and can choose to store an index into a range of values. We can also opt for Float16 or something more ad hoc because we don't have to store the exponent. + +But let's start with a standard float here, to keep things simple. 
To write the first version of code I'll use a byte conversion: + +``` +./bin/geno2mdb.rb BXD.geno.bimbam --eval '{"0"=>0,"1"=>1,"2"=>2,"NA"=>-1}' --pack 'C*' --geno-json BXD.geno.json +``` + +The lmdb file contains a metadata record that looks like: + +``` +{ + "type": "gemma-geno", + "version": 1, + "eval": "G0-2", + "key-format": "string", + "rec-format": "C*", + "geno": { + "type": "gn-geno-to-gemma", + "genofile": "BXD.geno", + "samples": [ + "BXD1", + "BXD2", + "BXD5", +etc. +``` + +i.e. it is a self-contained, efficient, genotype format. There is also another trick: we can use Plink-style compression with + +``` +./bin/geno2mdb.rb BXD.geno.bimbam --eval '{"0"=>0,"1"=>1,"2"=>2,"NA"=>4}' --geno-json BXD.geno.json --gpack 'l.each_slice(4).map { |slice| slice.map.with_index.sum {|val,i| val << (i*2) } }.pack("C*")' +``` + +reducing the original uncompressed BIMBAM from 9.9Mb to 2.7Mb. This is still a lot larger than the gzip compressed BIMBAM, but as I pointed out earlier the uncompressed version is faster by a wide margin. Compressing the lmdb file gets it in range of the compressed BIMBAM btw. So that is always an option. + +Next we create a floating point version. That reduces the file to 30% with + +``` +geno2mdb.rb fp.bimbam --geval 'g.to_f' --pack 'F*' --geno-json bxd_inds.list.json +``` + +and compressing the probabilities into a byte reduces the file to 10%: + +``` +geno2mdb.rb fp.bimbam --geval '(g.to_f*255.0).to_i' --pack 'C*' --geno-json bxd_inds.list.json +``` + +And now the compressed version is also 4x smaller. We'll have to run gemma at scale to see what the impact is, but an uncompressed 10x reduction should have an impact on the IO bottleneck. Note how easy it is to try these things with my little Ruby script. + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/geno2mdb.rb + +## Use lmdb genotypes from pangemma + +Rather than writing new code in C++ I proceeded to embed guile in pangemma. 
If it turns out to be a performance problem we can always fall back to C. Here we show a simple test written in guile that gets called from main.cpp: + +=> https://git.genenetwork.org/pangemma/commit/?id=5b6b5e2ad97b4733125c0845cfae007e8094a687 + +## Some analysis of GEMMA + +GEMMA::BatchRun reads files and executes (b gemma.cpp:1657) +cPar.ReadFiles() + ReadFile_anno + ReadFile_pheno + ReadFile_geno (gemma_io.cpp:652) - first read to fetch SNPs info, num (ns_test) and total SNPs (ns_total). + - it also does some checks + Note: These can all be handled by the lmdb files. So it saves one run. + +Summary of Mutated Outputs: +* indicator_snp: Binary indicators for which SNPs passed filtering +* snpInfo: Complete metadata for all SNPs in the file +* ns_test: Count of SNPs passing filters +checkpoint("read-geno-file",file_geno); + +Next start LMM9 gemma.cpp:2571 + ReadFile_kin + EigenDecomp_Zeroed + 2713 CalcUtX(U, W, UtW); + 2714 CalcUtX(U, Y, UtY); + CalcLambda + CalcLmmVgVeBeta + CalcPve + cPar.PrintSummary() + debug_msg("fit LMM (one phenotype)"); + cLmm.AnalyzeBimbam lmm.cpp:1665 and + LMM::Analyze lmm.cpp:1704 + + +Based on LLM code analysis, here's what gets mutated in the 'LMM' and 'PARAM' classes: + +### By 'ReadFile_geno': +This is a **standalone function** (not a member of LMM), but it mutates LMM members when passed as parameters: + +1. **'indicator_snp'** - cleared and populated with 0/1 filter flags +2. **'snpInfo'** - cleared and populated with SNP metadata +3. **'ns_test'** - set to count of SNPs that passed all filters + +### By 'LMM::AnalyzeBimbam': +(which calls 'LMM::Analyze') + +**Directly mutated in 'LMM::Analyze':** + +1. **'sumStat'** - PRIMARY OUTPUT + - Cleared at start (implied) + - Populated with one SUMSTAT entry per analyzed SNP + - Contains: beta, se, lambda_remle, lambda_mle, p_wald, p_lrt, p_score, logl_H1 + +2. **'time_UtX'** - timing accumulator + - '+= time_spent_on_matrix_multiplication' + +3. 
**'time_opt'** - timing accumulator + - '+= time_spent_on_optimization' + +**Read but NOT mutated:** +- 'indicator_snp' - read to determine which SNPs to process +- 'indicator_idv' - read to determine which individuals to include +- 'ni_total', 'ni_test' - used for loop bounds and assertions +- 'n_cvt' - number of covariates, used in calculations +- 'l_mle_null', 'l_min', 'l_max', 'n_region', 'logl_mle_H0' - analysis parameters +- 'a_mode' - determines which statistical tests to run +- 'd_pace' - controls progress bar display + +### Summary Table: + +| Member Variable | Mutated By | Purpose | +|----------------|------------|---------| +| 'indicator_snp' | 'ReadFile_geno' | Which SNPs passed filters | +| 'snpInfo' | 'ReadFile_geno' | SNP metadata (chr, pos, alleles, etc.) | +| 'ns_test' | 'ReadFile_geno' | Count of SNPs to analyze | +| 'sumStat' | 'Analyze' | **Main output**: Statistical results per SNP | +| 'time_UtX' | 'Analyze' | Performance profiling | +| 'time_opt' | 'Analyze' | Performance profiling | + +The key output is **'sumStat'** which contains all the association test results. + +PARAM variables directly mutated by these functions: + + indicator_snp (by ReadFile_geno) + snpInfo (by ReadFile_geno) + ns_test (by ReadFile_geno) + +LMM variables mutated: + + indicator_snp (by ReadFile_geno if passed LMM's copy) + snpInfo (by ReadFile_geno if passed LMM's copy) + ns_test (by ReadFile_geno if passed LMM's copy) + sumStat (by Analyze - this is LMM-only, not in PARAM) + time_UtX, time_opt (by Analyze) + +The actual analysis results (sumStat) exist only in LMM, not in PARAM. + +## Coding for lmdb support + +From above it should be clear that, if we have the genotypes and snp annotations in lmdb, we can skip reading the genotype file the first time. We can also rewrite the 'analyze' functions to fetch this information on the fly. + +Note that OpenBLAS will have to run single threaded when introducing SNP-based threads. 
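To make the SNP-based threading idea concrete, here is a minimal Python sketch of the pattern: SNPs are split into batches, each worker builds its own result list, and the lists are merged in order at the end, so nothing shared is written concurrently. This illustrates the structure only, not pangemma code. `analyze_batch` is a hypothetical stand-in for the per-SNP LMM test, and the `OPENBLAS_NUM_THREADS` setting assumes OpenBLAS reads it when the library is loaded.

```python
import os

# Assumption: pin OpenBLAS to one thread before any BLAS-backed library loads,
# so SNP-level workers do not compete with BLAS-level threads.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")

from concurrent.futures import ThreadPoolExecutor


def analyze_batch(batch):
    """Hypothetical per-batch analysis: returns one (snp_id, statistic) per SNP."""
    return [(snp_id, sum(genotypes) / len(genotypes)) for snp_id, genotypes in batch]


def analyze_in_batches(snps, batch_size, workers):
    """Process SNP batches in parallel and merge the results in input order."""
    batches = [snps[i:i + batch_size] for i in range(0, len(snps), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields per-batch results in batch order; every worker fills
        # its own list, so the merged output needs no locking.
        per_batch = pool.map(analyze_batch, batches)
        return [result for batch_result in per_batch for result in batch_result]


if __name__ == "__main__":
    demo = [(f"rs{i}", [i, i + 1, i + 2]) for i in range(10)]
    print(analyze_in_batches(demo, batch_size=3, workers=4))
```

In Python the interpreter lock limits real parallelism for pure-Python work, so treat this as a sketch of how results can be collected without collisions rather than a performance recipe.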
+ +## Fine grained multithreading + +From above it can be concluded that we can batch process SNPs in parallel. The only output is sumStat and that is written at once at the end. So, if we can collect the sumStat data without collision it should just work. + +Interestingly both Guile and C++ have recently introduced fibers. Boost.Fiber looks pretty clean: + +``` +#include <boost/fiber/all.hpp> +#include <vector> +#include <iostream> + +namespace fibers = boost::fibers; + +// Worker fiber +void compute_worker(int start, int end, + fibers::buffered_channel<int>& channel) { + for (int i = start; i < end; ++i) { + channel.push(i * i); + } +} + +void parallel_compute_fibers() { + fibers::buffered_channel<int> channel(100); + + // Spawn fibers + fibers::fiber f1([&]() { + compute_worker(0, 100, channel); + channel.close(); // Signal completion + }); + + fibers::fiber f2([&]() { + compute_worker(100, 200, channel); + }); + + // Collect results + std::vector<int> results; + int value; + while (fibers::channel_op_status::success == channel.pop(value)) { + results.push_back(value); + } + + f1.join(); + f2.join(); + + std::cout << "Total results: " << results.size() << std::endl; +} +``` + +Compare that with guile: + +``` +(use-modules (fibers) + (fibers channels)) + +;; Worker that streams individual results +(define (compute-worker-streaming start end result-channel) + (let loop ((i start)) + (when (< i end) + (put-message result-channel (* i i)) + (loop (+ i 1)))) + ;; Send completion signal + (put-message result-channel 'done)) + +;; Collector fiber +(define (result-collector result-channel num-workers) + (let loop ((results '()) + (done-count 0)) + (if (= done-count num-workers) + (reverse results) + (let ((msg (get-message result-channel))) + (if (eq? 
msg 'done) + (loop results (+ done-count 1)) + (loop (cons msg results) done-count)))))) + +(define (parallel-compute-streaming) + (run-fibers + (lambda () + (let ((result-channel (make-channel))) + + ;; Spawn workers + (spawn-fiber + (lambda () (compute-worker-streaming 0 100 result-channel))) + (spawn-fiber + (lambda () (compute-worker-streaming 100 200 result-channel))) + + ;; Collect results + (result-collector result-channel 2))))) +``` + +Boost.Fiber is a relatively mature library now, with about 8+ years of development and real-world usage. +Interestingly Boost.Fiber has work stealing built in. We can look at that later: + +=> https://www.boost.org/doc/libs/1_66_0/libs/fiber/doc/html/fiber/worker.html + +What about LOCO? Actually we can use the same fiber strategy for each chromosome as a per CHR process. We can set the number of threads differently based on chromosome SNP num, so all chromosomes take (about) the same time. Later, we can bring LOCO into one process with the advantage that the genotype data is only read once. In both cases the kinship matrices are in RAM anyway. + +# Reducing the size of the genotype file + +The first version of lmdb genotypes used simple floats. That reduced the pangenome text version from 30Gb to 12Gb with about a 3x speedup of gemma. Next I tried byte representation of the genotypes. + +# Optimizing SNP handling + +GEMMA originally used a separate SNP annotation file which proves inefficient. Now that we transform the geno information to lmdb, we might as well include chr+pos. We'll make the key out of that and add a table with marker annotation. + +# Optimizing the index + +I opted for using a CHR+POS index (byte+long value). There are a few things to consider. There may be duplicates and there may be missing values. Also LMDB likes an integer index. The built-in dupsort does not work, so we need to create a unique pos for every variant. I'll do that by adding the line number. 
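As an illustration of such a key, here is a minimal Python sketch. It assumes one possible reading of "adding the line number", namely appending it as a third field after the chromosome byte and the long position. The field widths, the big-endian layout and the function names are all assumptions, not the layout geno2mdb.rb actually writes.

```python
import struct

# Assumed layout: chromosome (1 byte), position (8 bytes), line number (4 bytes),
# all unsigned and big-endian so that byte-wise key order equals numeric order.
KEY_FORMAT = ">BQI"


def make_key(chrom, pos, line_no):
    """Pack (chr, pos, line number) into a fixed-width 13-byte key."""
    return struct.pack(KEY_FORMAT, chrom, pos, line_no)


def parse_key(key):
    """Unpack a key back into its (chr, pos, line number) tuple."""
    return struct.unpack(KEY_FORMAT, key)


# Two variants share chr 1, pos 300; the line number keeps their keys distinct.
variants = [(2, 500, 3), (1, 70000, 2), (1, 300, 0), (1, 300, 1)]
keys = sorted(make_key(*v) for v in variants)
print([parse_key(k) for k in keys])
# prints [(1, 300, 0), (1, 300, 1), (1, 70000, 2), (2, 500, 3)]
```

Big-endian packing is chosen here because a store that compares keys as plain byte strings then iterates in chromosome and position order without a custom comparator.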
diff --git a/issues/genotype_search_bug.gmi b/issues/genotype_search_bug.gmi new file mode 100644 index 0000000..0f05f4e --- /dev/null +++ b/issues/genotype_search_bug.gmi @@ -0,0 +1,13 @@ +# The * Search for Genotypes Not Working + +## Tags + +* type: bug +* priority: medium +* status: closed +* assigned: zsloan +* keywords: bug, search + +## Description + +Currently * searches for genotypes return no results, even when data exists. diff --git a/issues/global-search-results.gmi b/issues/global-search-results.gmi deleted file mode 100644 index 9cd773a..0000000 --- a/issues/global-search-results.gmi +++ /dev/null @@ -1,32 +0,0 @@ -# Global search does not return results - -## Tags - -* priority: critical -* type: bug -* assigned: zsloan, pjotrp -* status: unclear -* keywords: global search, from github - -## Description - -=> https://github.com/genenetwork/genenetwork2/issues/629 From GitHub - -> Try a search for Brca2 -> -> I am trying to add an example to this storyboard: -> -> => https://github.com/genenetwork/gn-docs/blob/master/story-boards/starting-from-known-gene/starting-from-known-gene.md#use-the-search-page -> -> -> Interestingly luna does no better: -> -> => http://luna.genenetwork.org/gsearch?type=gene&terms=brca2 - -@pjotr @zsloan, it seems to me this might be fixed, but please have a look and fix it in case it is not - -## Resolution - -With the new xapian search, this issue is no more. - -* closed diff --git a/issues/global-search-unhandled-error.gmi b/issues/global-search-unhandled-error.gmi index b2f6ba8..7626280 100644 --- a/issues/global-search-unhandled-error.gmi +++ b/issues/global-search-unhandled-error.gmi @@ -5,7 +5,7 @@ * assigned: aruni, fredm * priority: high * type: bug -* status: open +* status: closed * keywords: global search, gn2, genenetwork2 ## Description @@ -15,3 +15,7 @@ assume the request will always be successful. 
This is not always the case, as ca => https://test3.genenetwork.org/gsearch?type=gene&terms=Priscilla here (as of 2024-03-04T11:25+03:00UTC). Possible errors should be checked for and handled before attempting to read and/or process expected data. + +## Closing Comments + +This issue is closed as obsoleted. The issue is really old (>=7 months). Closing it for now. To be reopened if the issue happens again. diff --git a/issues/gn-auth/email_verification.gmi b/issues/gn-auth/email_verification.gmi index 8147bb5..07e2b04 100644 --- a/issues/gn-auth/email_verification.gmi +++ b/issues/gn-auth/email_verification.gmi @@ -2,7 +2,7 @@ ## Tags -* status: open +* status: closed, completed * priority: medium * type: enhancement * assigned: fredm, zsloan @@ -12,8 +12,10 @@ When setting up e-mail verification, the following configurations should be set for gn-auth: -SMTP_HOST = "smtp.uthsc.edu" +SMTP_HOST = "smtp.uthsc" SMTP_PORT = 25 (not 587, which is what we first tried) SMTP_TIMEOUT = 200 # seconds Not sure about username/password yet. We tried UNKNOWN/UNKNOWN and my own (Zach's) username/password + +Note that this host is only visible on the internal network of UTHSC. It won't work for tux02. 
diff --git a/issues/gn-auth/example-privileges-script.gmi b/issues/gn-auth/example-privileges-script.gmi new file mode 100644 index 0000000..afda1a1 --- /dev/null +++ b/issues/gn-auth/example-privileges-script.gmi @@ -0,0 +1,36 @@ +# Example Python script for setting privileges for user/group + +## Description + +This is just an example of a python script for setting user/group privileges, for potential future reference + +Before running this script, stop the crontab job that automatically sets unlinked resource privileges + +```python +import uuid +import sqlite3 + +group_id = '0510dc91-0eb6-4d9d-97e5-405acc84ba2b' +resource_id = 'e5cc773d-ca28-44e2-b2a7-1c2901794238' + +publishxrefs = ('10955','10957','10960','10961','10964','10966','10969','10970','10973','10975','10978','10979','10982','10984','10987','10988','12486','12487','12489','12490','12491','12492','12493','12494','12495','12496','12497','12498','12499','12500','12501','12502','12503','12504','12505','12506','12507','12508','12509','12510','12511','12512','12513','12514','12515','12516','12517','12518','12519','12520','12521','12522','12523','12524','12525','12526','12527','12528','12529','12530','12531','12532','12533','12534','12535','12536','12537','12538','12539','12540','12541','12542','12543','12544','12545','12546','12547','12548','12549','12550','12551','12566','12567','12568','12569','12574','12575','12576','12577','12578','12579','12580','12621','12735','12737','12741','12742','12743','12744','12745','12780','12781','12782','12783','12784','12785','12786','12787','12788','12789','12790','12791','12792','12793','12794','12795','12796','12797','12798','12799','12800','12801','12803','12804','12805','12806','12807','12808','12809','12810','12812','12813','12816','12817','12961','12962','12963','12964','12965','12966','12967','12970','13029','14803','14804','14805','14806','15572','15573','16197','16375','17329','17330','17331','17332','17333','17334','17335','17336','17337','17338','17339','17340'
,'17341','17342') + +# I generated these separatedly with uuid.uuid4(); I probably could have just done this in the script itself, but wanted to make sure they stayed the same +data_link_ids = ('3041366d-1ffd-45fb-9617-043772b285c8', 'da41fc30-3cd6-4b41-83b5-8fedc4ccd65f', '364a4010-e3fe-470f-a8c9-2a9fd359a4e3', '4e878c0a-cc92-4b21-8152-310266291967', 'ab50a999-e9bb-4bb6-91c0-9828b804156e', 'd50d30e9-15f9-4578-8b48-2bcb0d7a8afb', 'd42d2ef5-278f-4b5e-ae57-10f49f48c2e9', '78c022d7-390b-4688-96c6-c1afadd45877', '17fca9ae-8e71-4c55-b035-15d04f96d936', '4f9893de-fccf-4d6a-845d-df2f83e4d06c', '8a660b03-786a-4143-9fb3-9d00e888f3a2', '3965417a-e47a-47c8-81f6-991eef8c4152', 'e27707f7-5832-4e3f-9391-849e964bbaf6', 'bf9f6ff0-a131-46ef-8a2e-c37d8b66f992', '1ee744c4-95e1-4a66-958c-e785dc937563', '0fa79294-bbdc-4701-861d-9bb91ea72588', '38665214-7cdd-4b01-81dc-d1b78e63a0b0', '82a237df-96ce-404e-b052-8dbe45e793ee', 'ec4c1848-d326-462b-9c0d-f5e5c76e92f6', '46bee64b-8ce7-4910-80ec-211063725b1a', '7f489875-38b6-4cff-a05e-f11a7957b9b8', 'f39744a1-d673-406f-a2f1-c45082bb1975', '5f53a9e9-e40c-4a01-bf9d-430d7c2fd5ef', '1f0a4f2d-cd1c-41e5-a185-2ea2b2b05cd3', 'e282651c-7dc3-40e9-bb52-14e73c3a4ef7', '3c492e6d-e807-427b-acca-44afa4862894', '38e0df6c-3f44-4acb-9965-f0d3f0278150', '35e5ae63-3a32-49ac-93ed-b39d02ab5f5c', '0e6bfa4a-4fee-4b54-80c6-209f9b0ecd00', 'eb85e71a-8b4b-4f3f-9168-59b4ebc090a1', '3eb0325c-4dce-481e-bce7-46c37031da76', '7bc5ce49-4150-4d87-bfbf-d3a1cd20ad67', '03c0cba7-8712-4a27-9b79-e38818805b1f', '07d787ec-e0f9-4b7c-b368-d1f56ce030dc', '51d9e601-31c7-4643-b896-79d90bdc4105', '3cee3754-2822-4f0a-87ad-96bdfe2f0232', 'a7e9eb54-63bd-4ca9-a1f8-1aeac02a76db', '3ff132e5-7fb6-4763-943e-1efbe5f8000e', 'c685f0c9-084d-44d2-882e-ce66cdccef6d', 'ea062e07-1f59-4312-bfd9-6560e652c878', '75d33621-b5a4-447d-a094-7480d1d57a47', 'bb3dbd16-0c73-47d8-8e21-f095d3398b61', '0211177b-a92c-4215-a622-0cba5e8e2866', 'e2139b64-e74a-4263-9785-314e73b102df', '0426f12b-c223-487b-8ab7-baea5995c480', 
'4a467a72-174c-4ec7-9557-859656ad2c71', '38ab978e-e78f-4c0a-8af3-449b636fe5e6', 'a45c8d42-14d3-464d-8395-8a574148da78', 'e4171cc1-4a03-4311-a287-cee1b8084227', '75d70308-6f1a-49e4-9199-97ec8f60778e', 'efb5c834-b88a-4ee9-b09d-91913fddb546', '23866a00-a729-4ba9-af22-ee83ec164d34', '3feb1154-0613-464b-b758-aad308550a74', '7019d0f1-a590-46ce-a30e-4c21541b6ea8', '6e803182-71d2-4427-a5df-ad84651e5d11', 'fe1bf3f6-818b-4fae-9880-8ae2c1bdcff6', '66d480f7-da41-49ed-a222-8724b493313a', 'c908d2a3-8378-4574-83be-3bf8bdeff5fb', '96b36360-7258-43ab-bdda-23e93f15b0ac', 'daf90aca-6ee6-4c3c-9a60-1e7ae2e29cd2', '43800347-1fe1-40f7-9013-408f0b0740e9', 'e9350a78-a62f-4a08-8881-e6e51450d120', 'bda9a217-d605-4a18-9c3f-5139679ae413', 'cbd8f79a-4992-43c9-8391-994e221b73e1', 'c6b64d90-63ff-482d-b205-f58f3cf656df', '3ecbf267-3655-42a6-a8f9-2751439efb27', '808ae753-a255-43a6-96d4-0ed02b14aefe', '1a5424df-49b3-4274-8281-a1eed838ffda', '89e6d278-e643-43a2-8a61-746cbf446109', 'b4940ece-80a0-4382-ba57-eaad1d35e83e', 'f46cd643-fccb-4037-b642-9a4a329e84e2', '497a235c-4253-4e94-a69c-4b2f200976dd', '02aa8e3a-f9ac-459b-8e35-7081f2849f48', 'da5018e2-38af-415a-ad43-8caf8d82290d', '574ee482-f534-475e-9e7a-0a14e05f4495', 'b90b3a02-fa8d-4393-9dbb-087224a80b40', 'd68370ec-f569-42f3-9c07-a3118aa73ad5', '4b6b099b-3a7c-46c2-a2fc-92c01463b698', 'c9f5608f-3301-4835-b6dc-b1891fe81c36', 'eead972c-0fc4-4c5e-b1ad-63db4d1e9409', 'd8b295eb-6d07-4abe-8b8a-8cfef066a32e', 'a89f3944-be64-42d0-aa66-d2501021760d', '02f42124-bc38-4a14-9400-bbc8e8bf41b7', 'abbcb901-da42-4ef1-bc2c-55b95d584461', 'e28b0cef-eddb-41f2-9479-722365c0b2e0', '9135c304-1dd3-4eb5-82d4-91a86e39068a', '0bbd5f1d-eef3-4c35-84ab-484165a4240d', '08ad9a25-b20d-4ad8-a5e0-a886edc4a7aa', '7e05bdf8-51f5-49dc-9ff6-fbbc6aa20c9f', 'c82d4943-dc6f-4ec8-b76f-1309290183fe', '6a8d76bc-156b-4925-823c-b4585a847efc', '2604e9a8-a4ee-49be-a754-126b1705516e', '8c32b69b-e796-418d-b254-104a179a84ba', '532dca31-c38e-4b77-a84c-563407e9ae00', '954cacda-179e-42a9-8c1f-987e6fae1079', 
'bcfced8a-bd50-48e6-9edb-4776a1e95bf5', '66308324-1747-46df-8ddf-41e5bff1cd1a', 'f797e23c-7cb6-4869-97f5-3a79b685c6a3', '0869bb57-0133-4e57-9655-2b6eb1906f5e', 'fc0dddfa-e683-4a8d-9f57-82fb368f8a84', '35b7ffc1-6782-4c85-9bf8-d51629cab2d0', '232850b6-5a53-45e0-8668-7773b9cb39c2', 'af20291c-2be6-40e1-9576-b78df5d56774', 'f52f5c1a-1f8a-4b8a-8e00-fc2bdc6edc5b', '90819230-f372-4e48-96fc-6fb97199fa07', 'b31aefbf-fb67-49dc-b357-f8f0cd76cea9', '5d695f24-674a-4dc5-9e02-7817b77ab06b', '064d5972-f636-4771-95fe-3f6260fd550f', 'c2254f71-98dc-4303-bc26-9b9640582be1', '6eac9495-a366-4e65-90d2-d63472937925', '119398e3-b8cc-4ae5-addb-ec13db9834fa', '6cce7b35-fe2a-4348-9e42-5179ea9f42f1', '65940929-c9fc-47e9-b1cf-c9c9688f7871', '73ffdb1a-f70d-4e8e-88b7-0e22cfd1916e', 'c1b25581-7d28-4535-bcdc-44dc3bc7e438', '6e03a5f7-f200-439a-a465-97056d3c9f71', '4d270b71-2e06-4cfb-a60d-258ccbc7860a', '8b82e29f-a901-454f-a9ad-2f96be9d6c44', '7d699b76-f554-44db-9c68-6ff985cd6388', '3417b2dc-a88a-4cb6-a446-9e90063731f9', '18760f59-4b50-48d5-9814-8117490ab972', '4aaebf37-9529-4365-bdb8-dd53b0ac2499', '95ecdf43-12a5-4b3c-993a-ff03b58cee93', '2b5dd4e6-2310-417e-82bb-b16e96c7346b', '92ee883a-646d-44dd-b2c6-1bffb7b0d2cb', '979038e4-9392-4836-ad04-f125cf19eafa', '1220629d-000f-4508-8a41-3706eebeb812', '42abca44-8eb3-4aa7-adae-16afc211dff4', '82fe9559-718e-4424-9465-033204e1ec03', '8353fe08-e6c8-4f87-b0d8-412ab4a41d19', '1c6bebcf-c125-42a3-9d5b-4fae3113b62b', 'ba54b2ba-fee3-4f1d-a903-18edc7c694bd', '0ea0d40d-3204-4b9b-bae2-54355dce2b5c', '5ee4857c-00b4-46d6-880c-44dbae021b45', '2caa4c03-78ce-456d-8e20-edb531bdd45a', 'e2536a5e-357d-4f6d-a764-ac85a40a2f3f', 'e6341996-80bb-42f9-8842-92062680e957', '3612e03e-430d-4da3-ac87-93a310a3d780', '88c600d2-cefd-4a99-a904-bf2260554ac6', 'f1a6af16-2525-4650-b729-cbec60ad276c', '4b854252-9e87-4d7c-99d9-84ae9297d26e', 'be580989-3ccd-48bd-8c85-a750a800afbd', '5fd675fe-e765-4bf0-8e0f-8f81107a0bb8', 'cf852032-6399-4bf8-a8e7-474c84030430', 'eef27f8a-32d2-4add-a018-ff2d34208a11', 
'3aca3b1d-4589-4b4c-90de-588fd43fe835', 'd6187213-5a39-4089-ac50-eb144be2a3a5', '5bf60cda-b6b9-4992-91ac-c022e523202a', '4c4395ca-2f2e-4a85-93df-37d2c7f3d1d6', 'b8f9d837-2bd6-447c-9ad8-f581f84f36c1', '029a88bb-3850-4e85-87ab-8ecb3ad59538', '39ead890-0e1a-43df-9bbc-459a3ea0a016', '4b559ad2-c4d8-4763-bc08-90cb63fc79d0', '8361884a-248b-4dac-a9f9-d56f31ab477e', 'd79e2e00-9ea6-4d43-addc-3b1955bc7e5f', '4c0a35ac-c549-4c1a-9fc8-a2e93ba1c632', '50f558d0-c7b1-4204-8ebb-5855e7588998', 'be061746-1b34-4c04-a752-ab5c8d78fdef', 'f8edfb50-c572-4025-87c6-b34e88d8fb90', '0a799ff1-df2c-4c85-9b7e-4fe4885ab5cd', 'db373aa1-8ab9-4257-8d48-11dc92448344', '1e2b9de8-74a4-446a-970e-b47c662760b2', 'ac09ffdf-9cb5-49be-8f52-b681598453f6', 'ae4a55af-a1bb-4698-b2e7-ffbed8760635', '7989ff1f-a9da-439a-bb8b-14482b15dd2e') + +# delete_query deletes from the AutoAdminGroup +delete_query = 'delete from linked_phenotype_data where group_id="5ea09f67-5426-4b66-9ea2-12bdd78350e8" and SpeciesId="1" and InbredSetId="1" and PublishFreezeId="1" and PublishXRefId=?' 
+resource_query = "insert into phenotype_resources values ('e5cc773d-ca28-44e2-b2a7-1c2901794238', ?)" +link_query = 'insert into linked_phenotype_data (data_link_id, group_id, SpeciesId, InbredSetId, PublishFreezeId, dataset_name, dataset_fullname, dataset_shortname, PublishXRefId) values (?,?,?,?,?,?,?,?,?)' + +db_path = '/home/gn2/auth.db' +conn = sqlite3.connect(db_path) +cursor = conn.cursor() + +the_data = tuple((dlid, group_id, 1, 1, 1, 'BXDPublish', 'BXD Phenotypes', 'BXD Publish', pxrid) for (dlid, pxrid) in zip(data_link_ids, publishxrefs)) + +cursor.executemany(delete_query, tuple((item,) for item in publishxrefs)) +cursor.executemany(link_query, the_data) +cursor.executemany(resource_query, tuple((item,) for item in data_link_ids)) +conn.commit() +``` diff --git a/issues/gn-auth/feature-request-create-test-accounts.gmi b/issues/gn-auth/feature-request-create-test-accounts.gmi new file mode 100644 index 0000000..9e8aa45 --- /dev/null +++ b/issues/gn-auth/feature-request-create-test-accounts.gmi @@ -0,0 +1,51 @@ +# Feature Request: Create Test Accounts + +## Tags + +* assigned: fredm, alex +* status: open +* type: feature request, feature-request +* priority: medium +* keywords: gn-auth, auth, test accounts + +## Description + +From the requests on Matrix: + +@alexm +``` +fredmanglis +: Can we create a generic, verified email for CD to make it easier for people to test our services that requires login? +``` + +and from @pjotrp + +``` +yes, please. Let it expire after a few weeks, or something, if possible. So we can hand out test accounts. 
+``` + +We, thus, want to have a feature that allows the system administrator, or some other user with the appropriate privileges, to create a bunch of test accounts that have the following properties: + +* The accounts are pre-verified +* The accounts are temporary and are deleted after a set amount of time + +This feature will need a corresponding UI, say on GN2, to enable the users with the appropriate privileges to create the accounts easily. + +### Implementation Considerations + +Only system-admin level users will be able to create the test accounts. + +We'll probably need to track the plain-text passwords for these accounts. + +Information to collect might include: +* Start of test period (automatic on test account creation: mandatory) +* End of test period (Entered at creation time: mandatory) +* A pattern of sorts to follow when creating the accounts - this brings up the question, is there a specific domain (e.g. …@uthsc.edu, …@genenetwork.org etc.) that these test accounts should use? +* Extra details on event/conference necessitating creation of the test account(s) (optional) + + +Interactions with the rest of the system that we need to consider and handle are: +* Assign public-read for all public data: mostly easy. +* Forgot Password: If such users request a password change, what happens? Password changes require emails to be sent out with a time-sensitive token. The emails in the test accounts are not meant to be actual existing emails and thus cannot reliably receive such emails. This needs to be considered. Probably just prevent users from changing their passwords. +* What group to assign to these test accounts? I'm thinking probably a new group that is also temporary - deleted when users are deleted. +* What happens to any data uploaded by these accounts? They should probably not upload data meant to be permanent. All their data might need to be deleted along with the temporary accounts. 
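The properties listed above could be modelled roughly as follows. This is a minimal sketch and not gn-auth code: the `TemporaryAccount` record, the function names and the `example.org` email pattern are placeholders chosen for illustration, and storage, privilege checks and the actual deletion are left out.

```python
import secrets
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TemporaryAccount:
    user_id: uuid.UUID
    email: str
    password: str        # plain text, kept only so it can be handed out
    verified: bool       # test accounts are created pre-verified
    starts_at: datetime  # set automatically at creation
    expires_at: datetime # entered by the administrator
    event: str = ""      # optional note on the event/conference


def create_test_accounts(count, expires_at, pattern="test-{n:03d}@example.org",
                         event="", now=None):
    """Create `count` pre-verified accounts that all expire at `expires_at`."""
    now = now or datetime.now(timezone.utc)
    if expires_at <= now:
        raise ValueError("end of test period must be in the future")
    return [
        TemporaryAccount(uuid.uuid4(), pattern.format(n=n),
                         secrets.token_urlsafe(12), True, now, expires_at, event)
        for n in range(1, count + 1)
    ]


def expired_accounts(accounts, now=None):
    """Select the accounts whose test period has ended and can be deleted."""
    now = now or datetime.now(timezone.utc)
    return [account for account in accounts if account.expires_at <= now]
```

A periodic job could call something like `expired_accounts` and remove the returned accounts together with their temporary group and any uploaded data.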
diff --git a/issues/gn-auth/fix-refresh-token.gmi b/issues/gn-auth/fix-refresh-token.gmi new file mode 100644 index 0000000..222b731 --- /dev/null +++ b/issues/gn-auth/fix-refresh-token.gmi @@ -0,0 +1,58 @@ +# Fix Refresh Token + +## Tags + +* status: closed, obsolete +* priority: high +* assigned: fredm +* type: feature-request, bug +* keywords: gn-auth, token, refresh token, jwt + +## Description + +The way we currently provide the refresh token is wrong, and complicated, and +leads to subtle bugs in the clients. + +The refresh tokens should be sent back together with the access token in the +same response with the following important considerations: + +* The access token is sent back as the body of the response +* The refresh token is sent back as an httpOnly cookie +* The refresh token should be opaque to the client - if it is a JWT, encrypt it + +### Server-Side Changes + +The following changes will be necessary at the generation of the access token: + +* Generate the refresh token (possibly in the `create_token_response()` function in `gn_auth.auth.authentication.oauth2.grants.JWTBearerGrant`). Put the user ID and expiration in the refresh token. Expiration can be provided as part of initial request. +* Encrypt the refresh token (maybe use the auth-server's public key for this) +* Save refresh token to DB with link to access token ID perhaps? +* Attach the token to the response as an httpOnly cookie + +When refreshing the access token, we'll need to: + +* Fetch the refresh token from the cookies +* Decrypt it +* Compare the user ID in the refresh token with that in the access token provided +* Verify refresh token has not expired +* Check that the refresh token is not revoked (revocation will happen when the user logs out, or on manual sys-admin revocation) +* Generate new access token +* Do we attach the same refresh token or generate a new one? 
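The checks in the refresh list above can be sketched as a single validation function. This is a minimal illustration under stated assumptions: the token has already been fetched from the cookie and decrypted, and `RefreshToken`, `RefreshError` and `check_refresh` are hypothetical names, not part of gn-auth.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RefreshToken:
    token_id: str
    user_id: str
    expires_at: datetime


class RefreshError(Exception):
    """Raised when a refresh request must be rejected."""


def check_refresh(refresh, access_user_id, revoked_ids, now=None):
    """Apply the user, expiry and revocation checks to a decrypted refresh token."""
    now = now or datetime.now(timezone.utc)
    if refresh.user_id != access_user_id:
        raise RefreshError("refresh token belongs to a different user")
    if refresh.expires_at <= now:
        raise RefreshError("refresh token has expired")
    if refresh.token_id in revoked_ids:
        raise RefreshError("refresh token was revoked")
    return True  # the caller may now generate a new access token
```

Keeping the checks in one function that either returns or raises makes it harder for a caller to skip one of them by accident.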
+ +#### Gotchas + +Since there are multiple workers, you could get a flurry of refresh requests using the same refresh token. We might need to handle that — maybe save the refresh request to DB with the ID of the access token used and the new access token, and simply return the same new access token generated by the first successful refresh worker. + +This actually kills 2 birds with the one stone: +* The refresh completes successfully if the refresh token is not expired and the access token is valid +* In case the access token and refresh token are somehow compromised, the system returns the same, possibly expired access token, rendering the compromise moot. + +### Client-Side Changes + +* Get the refresh token from the cookies rather than from the body +* Maybe: make refreshing the access token unaware of threads/workers + + +## Close as Obsolete + +We no longer do refresh tokens at all, they were a pain to look into, so I simply removed them from the system. diff --git a/issues/gn-auth/implement-redirect-on-login.gmi b/issues/gn-auth/implement-redirect-on-login.gmi new file mode 100644 index 0000000..342b2e6 --- /dev/null +++ b/issues/gn-auth/implement-redirect-on-login.gmi @@ -0,0 +1,22 @@ +# Redirect Users to the Correct URL on Login for GN2 + +## Tags + +* assigned: alexm +* priority: medium +* status: in progress +* keywords: gn-auth, auth, redirect, login, completed, closed, done +* type: feature-request + +## Description + +The goal is to redirect users to the login page for services that require authentication, and then return them to the page they were trying to access before logging in, rather than sending them to the homepage. Additionally, display the message "You are required to log in" on the current page instead of on the homepage. + +## Tasks + +* [x] Redirect users to the login page if they are not logged in. +* [x] Implement a redirect to the correct resource after users log in. 
+ +## Notes +See this PR for commits that fixes this: +=> https://github.com/genenetwork/genenetwork2/pull/875 diff --git a/issues/gn-auth/implement-refresh-token.gmi b/issues/gn-auth/implement-refresh-token.gmi index 6b697eb..0dc63f3 100644 --- a/issues/gn-auth/implement-refresh-token.gmi +++ b/issues/gn-auth/implement-refresh-token.gmi @@ -2,7 +2,7 @@ ## Tags -* status: open +* status: closed, completed, fixed * priority: high * assigned: fredm, bonfacem * type: feature-request, bug diff --git a/issues/gn-auth/new-privilegs-samples-ordering.gmi b/issues/gn-auth/new-privilegs-samples-ordering.gmi new file mode 100644 index 0000000..be9cfe9 --- /dev/null +++ b/issues/gn-auth/new-privilegs-samples-ordering.gmi @@ -0,0 +1,32 @@ +# New Privileges: Samples Ordering + +## Tags + +* status: open +* assigned: fredm +* interested: @zachs, @jnduli, @flisso +* priority: medium +* type: feature-request, feature request +* keywords: gn-auth, auth, privileges, samples, ordering + +## Description + +From the email thread: + +``` +Regarding the order of samples, it can basically be whatever we decide it is. It just needs to stay consistent (like if there are multiple genotype files). It only really affects how it's displayed, and any other genotype files we use for mapping needs to share the same order. +``` + +Since this has nothing to do with the data analysis, this could be considered a system-level privilege. I propose + +``` +system:species:samples:ordering +``` + +or something similar. + +This can be added into some sort of generic GN2 curator role (as opposed to a data curator role). + +This allows us to have users that are "data curators" that we can offload some of the data curation work to (e.g. @flisso, @suheeta etc.). + +We would then, restrict the UI and display "curation" to users like @acenteno, @robw and @zachs. This second set of users would thus have both the "data curation" roles, and still have the "UI curation" roles. 
diff --git a/issues/gn-auth/pass-on-unknown-get-parameters.gmi b/issues/gn-auth/pass-on-unknown-get-parameters.gmi new file mode 100644 index 0000000..a349800 --- /dev/null +++ b/issues/gn-auth/pass-on-unknown-get-parameters.gmi @@ -0,0 +1,17 @@ +# Pass on Unknown GET Parameters + +## Tags + +* status: open +* priority: medium +* type: feature-request, enhancement +* assigned: fredm, zsloan +* keywords: gn-auth, authorisation + +## Description + +A developer or user might need to access some feature hidden behind a flag (a so-called "feature flag"). Some of these flags are set using GET parameters that are known to the application and to the developer/user. + +If the user provides these GET parameters before login and then goes through the login process, these parameters, being unknown to the authentication/authorisation server, are dropped silently, and the user has to set them up again manually. This, while not a big deal, is annoying and wastes a few seconds each time. + +This feature request proposes to pass any unknown GET parameters untouched through the authentication/authorisation server and back to the authenticating client during the login process, to mitigate this small annoyance. 
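One possible shape for the pass-through described above is sketched below. It is a minimal sketch and not the gn-auth implementation: the `KNOWN_PARAMS` set is assumed to hold the usual OAuth2 authorisation-request parameters, and the helper names `unknown_params` and `append_params` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the parameters the auth server itself consumes during login.
KNOWN_PARAMS = frozenset(
    {"response_type", "client_id", "redirect_uri", "state", "scope"})


def unknown_params(query_string, known=KNOWN_PARAMS):
    """Return the (name, value) pairs the auth server does not recognise."""
    return [(name, value)
            for name, value in parse_qsl(query_string, keep_blank_values=True)
            if name not in known]


def append_params(redirect_uri, extra):
    """Append `extra` pairs to the query string of `redirect_uri`."""
    parts = urlsplit(redirect_uri)
    merged = parse_qsl(parts.query, keep_blank_values=True) + list(extra)
    return urlunsplit(parts._replace(query=urlencode(merged)))
```

The auth server would collect the unknown pairs when the login starts, keep them for the duration of the flow, and append them to the redirect back to the client once the login completes.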
diff --git a/issues/gn-auth/problems-with-roles.gmi b/issues/gn-auth/problems-with-roles.gmi index 46f3c52..2778b61 100644 --- a/issues/gn-auth/problems-with-roles.gmi +++ b/issues/gn-auth/problems-with-roles.gmi @@ -3,9 +3,9 @@ ## Tags * type: bug -* status: open * priority: critical * assigned: fredm, zachs +* status: closed, completed, fixed * keywords: gn-auth, authorisation, authorization, roles, privileges ## Description @@ -29,8 +29,8 @@ The implementation should instead, tie the roles to the specific resource, rathe * [x] migration: Add `resource:role:[create|delete|edit]-role` privileges to `resource-owner` role * [x] migration: Create new `resource_roles` db table linking each resource to roles that can act on it, and the user that created the role * [x] migration: Drop table `group_roles` deleting all data in the table: data here could already have privilege escalation in place -* [ ] Create a new "Roles" section on the "Resource-View" page, or a separate "Resource-Roles" page to handle the management of that resource's roles -* [ ] Ensure user can only assign roles they have created - maybe? +* [x] Create a new "Roles" section on the "Resource-View" page, or a separate "Resource-Roles" page to handle the management of that resource's roles +* [x] Ensure user can only assign roles they have created - maybe? 
### Fixes @@ -39,3 +39,4 @@ The implementation should instead, tie the roles to the specific resource, rathe => https://git.genenetwork.org/gn-auth/commit/?h=handle-role-privilege-escalation&id=5d34332f356164ce539044f538ed74b983fcc706 => https://git.genenetwork.org/gn-auth/commit/?h=handle-role-privilege-escalation&id=f691603a8e7a1700783b2be6f855f30d30f645f1 => https://git.genenetwork.org/gn-auth/commit/?h=handle-role-privilege-escalation&id=2363842cc81132a2592d5cda98e6ebf1305e8482 +=> https://github.com/genenetwork/genenetwork2/commit/a7a8754a57594e5705fea8e5bbea391a09e8f64c diff --git a/issues/gn-auth/registration.gmi b/issues/gn-auth/registration.gmi index 6558a6d..61ea94a 100644 --- a/issues/gn-auth/registration.gmi +++ b/issues/gn-auth/registration.gmi @@ -2,8 +2,11 @@ # Tags +* type: bug * assigned: fredm * priority: critical +* status: closed, completed, fixed +* keywords: gn-auth, auth, authorisation, authentication, registration # Issues diff --git a/issues/gn-auth/resources-duplicates-in-resources-list.gmi b/issues/gn-auth/resources-duplicates-in-resources-list.gmi new file mode 100644 index 0000000..379c1eb --- /dev/null +++ b/issues/gn-auth/resources-duplicates-in-resources-list.gmi @@ -0,0 +1,29 @@ +# Resources: Duplicates in Resources List + +## Tags + +* type: bug +* status: closed +* priority: medium +* assigned: fredm, zachs, zsloan +* keywords: gn-auth, auth, authorisation, resources + +## Reproduce + +* Go to https://genenetwork.org/ +* Sign in to the system +* Click on "Profile" at the top to go to your profile page +* Click on "Resources" on your profile page to see the resources you have access to + +## Expected + +Each resource appears on the list only one time + +## Actual + +Some resources appear more than once on the list + + +## Fix + +=> https://git.genenetwork.org/gn-auth/commit/?id=00f863b3dcb76f5fdca8e139e903e2f7edb861fc diff --git a/issues/gn-auth/rework-view-resource-page.gmi b/issues/gn-auth/rework-view-resource-page.gmi new file 
mode 100644 index 0000000..2d6e145 --- /dev/null +++ b/issues/gn-auth/rework-view-resource-page.gmi @@ -0,0 +1,22 @@ +# Rework "View-Resource" Page + +## Tags + +* status: closed, completed +* priority: medium +* type: enhancement +* assigned: fredm, zsloan +* keywords: gn-auth, resource, resources, view resource + +## Description + +The view resource page ('/oauth2/resource/<uuid>/view') was built with only Genotype, Phenotype, and mRNA resources in mind. + +We have since moved on, and added more types of resources (group, system, inbredset-group, etc). This leads to the page breaking for these other types of resources. + +We need to update the UI and route to ensure the page renders correctly for each type, or at the very least, redirects to the correct page (e.g. in the case of groups, which have a separate "view group" page). + + +## Close as complete + +This is fixed now. diff --git a/issues/send-out-confirmation-emails-on-registration.gmi b/issues/gn-auth/send-out-confirmation-emails-on-registration.gmi index c85e26b..e32c7c0 100644 --- a/issues/send-out-confirmation-emails-on-registration.gmi +++ b/issues/gn-auth/send-out-confirmation-emails-on-registration.gmi @@ -2,11 +2,11 @@ ## Tags -* status: open +* status: closed, completed * assigned: fredm * priority: medium -* keywords: email, user registration * type: feature request, feature-request +* keywords: gn-auth, email, user registration, email confirmation ## Description diff --git a/issues/gn-auth/test1-deployment-cant-find-templates.gmi b/issues/gn-auth/test1-deployment-cant-find-templates.gmi index bd2f57e..ca3bfad 100644 --- a/issues/gn-auth/test1-deployment-cant-find-templates.gmi +++ b/issues/gn-auth/test1-deployment-cant-find-templates.gmi @@ -4,7 +4,7 @@ * assigned: fredm, aruni * priority: critical -* status: open +* status: closed, completed, fixed * type: bug * keywords: gn-auth, deployment, test1 diff --git a/issues/gn-guile/Configurations.gmi b/issues/gn-guile/Configurations.gmi new file 
mode 100644 index 0000000..f1ae06e --- /dev/null +++ b/issues/gn-guile/Configurations.gmi @@ -0,0 +1,60 @@ +# gn-guile Configurations + +## Tags + +* type: bug +* assigned: +* priority: high +* status: open +* keywords: gn-guile, markdown editing +* interested: alexk, bonfacem, fredm, pjotrp + +## Description + +=> https://git.genenetwork.org/gn-guile/ The gn-guile service +is used to enable markdown editing in GeneNetwork. + +There are configurations that are needed to get the system to work as expected: + +* CURRENT_REPO_PATH: The local path to the cloned repository +* CGIT_REPO_PATH: path to the bare repo (according to docs [gn-guile-docs]) + +With these settings, we should be able to make edits. These edits, however, do not get pushed upstream. + +Looking at the code +=> https://git.genenetwork.org/gn-guile/tree/web/webserver.scm?id=4623225b0adb0846a4c2e879a33b31884d2e5f05#n212 +we see both the settings above being used, and we can further have a look at +=> https://git.genenetwork.org/gn-guile/tree/web/view/markdown.scm?id=4623225b0adb0846a4c2e879a33b31884d2e5f05#n78 the definition of git-invoke. + +With the above, we could, hypothetically, run a command like: + +``` +git -C ${CURRENT_REPO_PATH} push ${REMOTE_REPO_URI} master +``` + +where REMOTE_REPO_URI can be something like "appuser@git.genenetwork.org:/home/git/public/gn-guile" + +That means we change the (git-invoke …) call seen previously to something like: + +``` +(git-invoke +current-repo-path+ "push" +remote-repo-url+ "master") +``` + +and make sure that the "+remote-repo-url+" value is something along the lines of the URI above. + +### Gotchas + +We need to fetch and rebase with every push, to avoid conflicts. 
That means we'll need a sequence such as the following: + +``` +(git-invoke +current-repo-path+ "fetch" +remote-repo-url+ "master") +(git-invoke +current-repo-path+ "rebase" "origin/master") +(git-invoke +current-repo-path+ "push" +remote-repo-url+ "master") +``` + +The tests above work with a normal user. We'll be running this code within a container, so we do need to expose a specific private ssh key for the user to use to push to remote. This also means that the corresponding public key should be registered with the repository server. + +## References + +* [gn-guile-docs] https://git.genenetwork.org/gn-guile/tree/doc/git-markdown-editor.md?id=4623225b0adb0846a4c2e879a33b31884d2e5f05 + diff --git a/issues/gn-guile/activations-on-production-not-running-as-expected.gmi b/issues/gn-guile/activations-on-production-not-running-as-expected.gmi new file mode 100644 index 0000000..be9cc00 --- /dev/null +++ b/issues/gn-guile/activations-on-production-not-running-as-expected.gmi @@ -0,0 +1,57 @@ +# gn-guile: Activations on Production not Running as Expected + +## Tags + +* status: closed, completed, fixed +* priority: high +* type: bug +* assigned: bonfacem, fredm, aruni +* keywords: gn-guile, deployment, activation-service-type + +## Description + +With the recent changes to guix's `least-authority-wrapper` we can no longer write to the root filesystem ("/"). That is not much of a problem. + +So I tried adding `#:directory (dirname gn-doc-git-checkout)` to the `make-forkexec-constructor` for the `gn-guile-shepherd-service` and that actually changes the working directory of the process, as I would expect. + +In `genenetwork-activation` I add: + +``` + ;; setup correct ownership for gn-docs + (for-each (lambda (file) + (chown file + (passwd:uid (getpw "genenetwork")) + (passwd:gid (getpw "genenetwork")))) + (find-files #$(dirname gn-doc-git-checkout) + #:directories? 
#t)) +``` + +which, ideally, should change ownership of the parent directory of the bare git checkout for "gn-docs" when we build/start the container. This does not happen — the directory is still owned by root. + +My thinking goes, the "genenetwork" user[1] is not yet created at the point when the activation[2] is run, leading to the service failing to start. + +The reason I think this, is because, when I do: + +``` +fredm@tux04:/...$ sudo guix container exec <container-pid> /run/current-system/profile/bin/bash --login +root@genenetwork-gn2-fred /# chown -R genenetwork:genenetwork /var/lib/genenetwork/ +root@genenetwork-gn2-fred /# chown -R genenetwork:genenetwork /var/lib/genenetwork/ +``` + +The bound directory's permissions change, and we can now enable and start the service: + +``` +root@genenetwork-gn2-fred /# herd enable gn-guile +root@genenetwork-gn2-fred /# herd start gn-guile +``` + +which starts the service as expected. We can also simply restart the entire container at this point, and it works too. + +## Footnotes + +=> https://git.genenetwork.org/gn-machines/tree/genenetwork/services/genenetwork.scm?id=e425671e69a321a032134fafee974442e8c1ce6f#n167 [1] "genenetwork" user declaration +=> https://git.genenetwork.org/gn-machines/tree/genenetwork/services/genenetwork.scm?id=e425671e69a321a032134fafee974442e8c1ce6f#n680 [2] Activation of services (see also the account-service-type being extended with the "genenetwork" user). + +## Close as Fixed + +This issue is fixed, with newer Guix and changes that @bonz did to the gn-machines repo. 
diff --git a/issues/gn-guile/rendering-images-within-markdown-documents.gmi b/issues/gn-guile/rendering-images-within-markdown-documents.gmi new file mode 100644 index 0000000..fe3ed39 --- /dev/null +++ b/issues/gn-guile/rendering-images-within-markdown-documents.gmi @@ -0,0 +1,22 @@ +# Rendering Images Linked in Markdown Documents + +## Tags + +* status: open +* priority: high +* type: bug +* assigned: alexm, bonfacem, fredm +* keywords: gn-guile, images, markdown + +## Description + +Rendering images linked within markdown documents does not work as expected: we cannot render images if they have a relative path. +As an example, see the commit below: +=> https://github.com/genenetwork/gn-docs/commit/783e7d20368e370fb497974f843f985b51606d00 + +In that commit, we are forced to use the full github URI to get the images to load correctly when rendered via gn-guile. This has two unfortunate consequences: + +* It makes editing more difficult, since the user has to remember to find and use the full github URL for their images. +* It ties the data and code to github + +This needs to be fixed, such that any and all paths relative to the markdown file are resolved at render time automatically. 
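A rough sketch of what "resolved at render time" could mean is given below. gn-guile itself is written in Guile, so this Python version only illustrates the path arithmetic; the function names, the image regular expression and the `/media/` prefix used in the usage note are all assumptions, not existing gn-guile code.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Matches simple markdown images of the form ![alt](target).
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def is_external(target):
    """True for full URLs and root-anchored paths, which are left alone."""
    return bool(urlsplit(target).scheme) or target.startswith("/")


def resolve_image_path(markdown_path, target):
    """Resolve `target` relative to the directory of `markdown_path`."""
    if is_external(target):
        return target
    base = posixpath.dirname(markdown_path)
    return posixpath.normpath(posixpath.join(base, target))


def rewrite_images(markdown_path, text, prefix):
    """Rewrite relative image targets in `text` to `prefix` + repo path."""
    def _replace(match):
        alt, target = match.group(1), match.group(2)
        if is_external(target):
            return match.group(0)
        resolved = resolve_image_path(markdown_path, target)
        return f"![{alt}]({prefix}{resolved})"
    return IMAGE_RE.sub(_replace, text)
```

For instance, `rewrite_images("docs/a.md", text, "/media/")` would point relative images at a hypothetical `/media/` route serving files from the repository checkout. A real implementation would also need to reject resolved paths that escape the repository root (those starting with `..`).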
diff --git a/issues/gn-guile/rework-hard-dependence-on-github.gmi b/issues/gn-guile/rework-hard-dependence-on-github.gmi new file mode 100644 index 0000000..751e9fe --- /dev/null +++ b/issues/gn-guile/rework-hard-dependence-on-github.gmi @@ -0,0 +1,21 @@ +# Rework Hard Dependence on Github + +## Tags + +* status: open +* priority: medium +* type: bug +* assigned: alexm +* assigned: bonfacem +* assigned: fredm +* keywords: gn-guile, github + +## Description + +Currently, we have a hard-dependence on Github for our source repository — you can see this in lines 31, 41, 55 and 59 of the code linked below: + +=> https://git.genenetwork.org/gn-guile/tree/web/view/markdown.scm?id=0ebf6926db0c69e4c444a6f95907e0971ae9bf40 + +The most likely reason is that the "edit online" functionality might not exist in a lot of other popular source forges. + +This is rendered moot, however, since we do provide a means to edit the data on Genenetwork itself. We might as well get rid of this option, and only allow the "edit online" feature on Genenetwork and stop relying on its presence in the forges we use. diff --git a/issues/gn-libs/jobs-allow-job-cascades.gmi b/issues/gn-libs/jobs-allow-job-cascades.gmi new file mode 100644 index 0000000..f659f32 --- /dev/null +++ b/issues/gn-libs/jobs-allow-job-cascades.gmi @@ -0,0 +1,26 @@ +# Jobs: Allow Job Cascades + +## Tags + +* status: open +* priority: medium +* type: enhancement +* assigned: fredm, zsloan +* keywords: gn-libs, genenetwork, async jobs, asynchronous jobs, background jobs + +## Description + +Some jobs could require more than a single command/script to be run to complete. + +Rather than refactoring/rewriting the entire "async jobs" feature, I propose adding a way to note who started a job, i.e. +* the user, OR +* another job + +This could be tracked in an extra field in the database, say "started_by" which can have values of the form +* "user:<user-id>" +* "job:<job-id>" +where the parts in the angle bracket (i.e. 
"<user-id>" and "<job-id>") are replaced by actual ids. + +## Related Issues + +=> /issues/gn-libs/jobs-track-who-jobs-belong-to diff --git a/issues/gn-libs/jobs-track-who-jobs-belong-to.gmi b/issues/gn-libs/jobs-track-who-jobs-belong-to.gmi new file mode 100644 index 0000000..00eaf21 --- /dev/null +++ b/issues/gn-libs/jobs-track-who-jobs-belong-to.gmi @@ -0,0 +1,23 @@ +# Jobs: Track Who Jobs Belong To + +## Tags + +* status: open +* priority: medium +* type: enhancement +* assigned: fredm, zsloan +* keywords: gn-libs, genenetwork, async jobs, asynchronous jobs, background jobs + +## Description + +Some features in Genenetwork require long-running processes to be triggered and run in the background. We have a way to trigger such background processes, but there is no way of tracking who started what job, and therefore, no real way for a user to list only their jobs. + +This issue will track the introduction of such tracking. This will enable the building new job-related functionality such as a user being able to: +* list their past, unexpired jobs +* delete past jobs +* possibly rerun jobs that failed but are recoverable +* see currently running jobs, and their status + +## Related Issues + +=> /issues/gn-libs/jobs-allow-job-cascades diff --git a/issues/gn-uploader/AuthorisationError-gn-uploader.gmi b/issues/gn-uploader/AuthorisationError-gn-uploader.gmi new file mode 100644 index 0000000..262ad19 --- /dev/null +++ b/issues/gn-uploader/AuthorisationError-gn-uploader.gmi @@ -0,0 +1,70 @@ +# AuthorisationError in gn uploader + +## Tags +* assigned: fredm +* status: closed, obsoleted +* priority: critical +* type: error +* key words: authorisation, permission + +## Description + +Trying to create population for Kilifish dataset in the gn-uploader webpage, +then encountered the following error: +```sh +Traceback (most recent call last): + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/flask/app.py", line 917, in 
full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/flask/app.py", line 902, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/authorisation.py", line 23, in __is_session_valid__ + return session.user_token().either( + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/pymonad/either.py", line 89, in either + return right_function(self.value) + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/authorisation.py", line 25, in <lambda> + lambda token: function(*args, **kwargs)) + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/population/views.py", line 185, in create_population + ).either( + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/pymonad/either.py", line 91, in either + return left_function(self.monoid[0]) + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/monadic_requests.py", line 99, in __fail__ + raise Exception(_data) +Exception: {'error': 'AuthorisationError', 'error-trace': 'Traceback (most recent call last): + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/flask/app.py", line 917, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/flask/app.py", line 902, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/authlib/integrations/flask_oauth2/resource_protector.py", line 110, in decorated + 
return f(*args, **kwargs) + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/views.py", line 95, in create_population_resource + ).then( + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/pymonad/monad.py", line 152, in then + result = self.map(function) + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/pymonad/either.py", line 106, in map + return self.__class__(function(self.value), (None, True)) + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/views.py", line 98, in <lambda> + "resource": create_resource( + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/models.py", line 25, in create_resource + return _create_resource(cursor, + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/checks.py", line 56, in __authoriser__ + raise AuthorisationError(error_description) +gn_auth.auth.errors.AuthorisationError: Insufficient privileges to create a resource +', 'error_description': 'Insufficient privileges to create a resource'} + +``` +The error above resulted from the attempt to upload the following information on the gn-uploader-`create population section` +Input details are as follows: +Full Name: Kilifish F2 Intercross Lines +Name: KF2_Lines +Population code: KF2 +Description: Kilifish second generation population +Family: Crosses, AIL, HS +Mapping Methods: GEMMA, QTLReaper, R/qtl +Genetic type: intercross + +And when pressed the `Create Population` icon, it led to the error above. 
+ +## Closed as Obsolete + +* The service this was happening on (https://staging-uploader.genenenetwork.org) is no longer running +* Most of the authorisation issues are resolved in newer code diff --git a/issues/gn-uploader/check-genotypes-in-database-too.gmi b/issues/gn-uploader/check-genotypes-in-database-too.gmi new file mode 100644 index 0000000..4e034b7 --- /dev/null +++ b/issues/gn-uploader/check-genotypes-in-database-too.gmi @@ -0,0 +1,22 @@ +# Check Genotypes in the Database for R/qtl2 Uploads + +## Tags + +* type: bug +* assigned: fredm +* priority: high +* status: closed, completed, fixed +* keywords: gn-uploader, uploader, upload, genotypes, geno + +## Description + +Currently, the uploader expects that a R/qtl2 bundle be self-contained, i.e. it contains all the genotypes and other data that fully describe the data in that bundle. + +This is unnecessary, in a lot of situations, seeing as Genenetwork might already have the appropriate genotypes already in its database. + +This issue tracks the implementation for the check of the genotypes against both the genotypes provided in the bundle, and those already in the database. + +### Updates + +Fixed in +=> https://git.genenetwork.org/gn-uploader/commit/?id=0e74a1589db9f367cdbc3dce232b1b6168e3aca1 this commit diff --git a/issues/export-uploaded-data-to-RDF-store.gmi b/issues/gn-uploader/export-uploaded-data-to-RDF-store.gmi index c39edec..3ef05cd 100644 --- a/issues/export-uploaded-data-to-RDF-store.gmi +++ b/issues/gn-uploader/export-uploaded-data-to-RDF-store.gmi @@ -6,7 +6,7 @@ * priority: medium * type: feature-request * status: open -* keywords: API, data upload +* keywords: API, data upload, gn-uploader ## Description @@ -73,10 +73,16 @@ The metadata is useful for searching for the data. The "metadata->rdf" project[4 * [ ] How do we handle this? 
+## Related Issues and Topics + +=> https://issues.genenetwork.org/topics/next-gen-databases/design-doc +=> https://issues.genenetwork.org/topics/lmms/rqtl2/using-rqtl2-lmdb-adapter +=> https://issues.genenetwork.org/issues/dump-sample-data-to-lmdb +=> https://issues.genenetwork.org/topics/database/genotype-database ## Footnotes -=> https://gitlab.com/fredmanglis/gnqc_py 1: QC/Data upload project repository +=> https://git.genenetwork.org/gn-uploader/ 1: QC/Data upload project (gn-uploader) repository => https://github.com/genenetwork/genenetwork3/pull/130 2: Munyoki's Pull request => https://github.com/BonfaceKilz/gn-dataset-dump 3: Dataset -> LMDB export repository -=> https://github.com/genenetwork/dump-genenetwork-database 4: Metadata -> RDF export repository +=> https://git.genenetwork.org/gn-transform-databases/ 4: Metadata -> RDF export repository diff --git a/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi b/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi index d2c33e8..5a5cdfa 100644 --- a/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi +++ b/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi @@ -3,7 +3,7 @@ ## Tags * assigned: fredm, aruni -* status: open +* status: closed, completed * priority: high * type: bug * keywords: guix, gn-uploader diff --git a/issues/gn-uploader/guix-build-gn-uploader-error.gmi b/issues/gn-uploader/guix-build-gn-uploader-error.gmi index 44a5c4b..aeb6308 100644 --- a/issues/gn-uploader/guix-build-gn-uploader-error.gmi +++ b/issues/gn-uploader/guix-build-gn-uploader-error.gmi @@ -86,7 +86,7 @@ Filesystem Size Used Avail Use% Mounted on so we know that's not a problem. -A similar thing had shown up on space.uthsc.edu. +A similar thing had shown up on our space server. 
### More Troubleshooting Efforts diff --git a/issues/gn-uploader/handling-tissues-in-uploader.gmi b/issues/gn-uploader/handling-tissues-in-uploader.gmi index 826af15..0c43040 100644 --- a/issues/gn-uploader/handling-tissues-in-uploader.gmi +++ b/issues/gn-uploader/handling-tissues-in-uploader.gmi @@ -2,11 +2,11 @@ ## Tags -* status: open +* status: closed, wontfix * priority: high * assigned: fredm * type: feature-request -* keywords: gn-uploader, tissues +* keywords: gn-uploader, tissues, archived ## Description @@ -112,3 +112,9 @@ ALTER TABLE Tissue MODIFY Id INT(5) UNIQUE NOT NULL; * [1] https://gn1.genenetwork.org/webqtl/main.py?FormID=schemaShowPage#ProbeFreeze * [2] https://gn1.genenetwork.org/webqtl/main.py?FormID=schemaShowPage#Tissue + +## Closed as WONTFIX + +I am closing this issue because it was created (2024-03-28) while I had a fundamental misunderstanding of the way data is laid out in the database. + +The information on the schema/layout of the tables is still useful, but chances are, we'll look at the tables themselves anyway should we need to figure out the schema. diff --git a/issues/gn-uploader/link-authentication-authorisation.gmi b/issues/gn-uploader/link-authentication-authorisation.gmi new file mode 100644 index 0000000..b64f887 --- /dev/null +++ b/issues/gn-uploader/link-authentication-authorisation.gmi @@ -0,0 +1,21 @@ +# Link Authentication/Authorisation + +## Tags + +* status: closed, completed +* assigned: fredm +* priority: critical +* type: feature request, feature-request +* keywords: gn-uploader, gn-auth, authorisation, authentication, uploader, upload + +## Description + +The last chain in the link to the uploads is the authentication/authorisation. Once the user uploads their data, they need access to it. The auth system, by default, will deny anyone/everyone access to any data that is not linked to a resource and which no user has any roles allowing them access to the data. 
+ +We currently assign such data to the user manually, but that is not a sustainable way of working, especially as the uploader is exposed to more and more users. + +### Close as Completed + +The current iteration of the uploader does actually take into account the user that is uploading the data, granting them ownership of the uploaded data. By default, the data is not public, and is only accessible to the user who uploaded it. + +The user who uploads the data (and therefore owns it) can later grant access to other users of the system. diff --git a/issues/quality-control/move-uploader-to-tux02.gmi b/issues/gn-uploader/move-uploader-to-tux02.gmi index 4459433..20c5b24 100644 --- a/issues/quality-control/move-uploader-to-tux02.gmi +++ b/issues/gn-uploader/move-uploader-to-tux02.gmi @@ -5,7 +5,7 @@ * type: migration * assigned: fredm * priority: high -* status: open +* status: closed, completed, fixed * keywords: gn-uploader, guix, container, deploy ## Databases @@ -17,13 +17,13 @@ This implies separate configurations, and separate startup. Some of the things to do to enable this, then, are: -- [x] Provide separate configs and run db server on separate port +* [x] Provide separate configs and run db server on separate port - Configs put in /etc/mysql3307 - Selected port 3307 - datadir in /var/lib/mysql3307 -> /export5 -- [x] Provide separate data directory for the content +* [x] Provide separate data directory for the content - extract backup -- [x] Maybe suffix the files with the port number, e.g. +* [x] Maybe suffix the files with the port number, e.g. 
``` datadir = /var/lib/mysql3307 socket = /var/run/mysqld/mysqld3307.sock diff --git a/issues/gn-uploader/probeset-not-applicable-to-all-data.gmi b/issues/gn-uploader/probeset-not-applicable-to-all-data.gmi index 1841d36..af3b274 100644 --- a/issues/gn-uploader/probeset-not-applicable-to-all-data.gmi +++ b/issues/gn-uploader/probeset-not-applicable-to-all-data.gmi @@ -4,7 +4,7 @@ * type: bug * assigned: fredm -* status: open +* status: closed * priority: high * keywords: gn-uploader, uploader, ProbeSet @@ -20,3 +20,10 @@ applicable to our data, I don't think. ``` It seems like some of the data does not require a ProbeSet, and in that case, it should be possible to add it without one. + + +## Notes + +This "bug" is obsoleted by the fact that the implementation leading to it was entirely wrong. + +The feature that was leading to this bug no longer exists, and will have to be re-implemented from scratch with the involvement of @acenteno. diff --git a/issues/gn-uploader/provide-page-for-uploaded-data.gmi b/issues/gn-uploader/provide-page-for-uploaded-data.gmi new file mode 100644 index 0000000..5ab7f80 --- /dev/null +++ b/issues/gn-uploader/provide-page-for-uploaded-data.gmi @@ -0,0 +1,27 @@ +# Provide Page/Link for/to Uploaded Data + +## Tags + +* status: closed, completed +* assigned: fredm +* priority: medium +* type: feature, feature request, feature-request +* keywords: gn-uploader, uploader, data dashboard + +## Description + +Once a user has uploaded their data, provide them with a landing page/dashboard for the data they have uploaded, with details on what that data is. + +* Should we provide a means to edit the data here (mostly to add metadata and the like)? +* Maybe the page should actually be shown on GN2? + +## Blockers + +Depends on + +=> /issues/gn-uploader/link-authentication-authorisation + + +## Close as complete + +The current uploader directs the user to a view of the data they uploaded on GN2. This is complete. 
diff --git a/issues/gn-uploader/replace-redis-with-sqlite3.gmi b/issues/gn-uploader/replace-redis-with-sqlite3.gmi new file mode 100644 index 0000000..d3f94f0 --- /dev/null +++ b/issues/gn-uploader/replace-redis-with-sqlite3.gmi @@ -0,0 +1,29 @@ +# Replace Redis with SQL + +## Tags + +* status: open +* priority: low +* assigned: fredm +* type: feature, feature-request, feature request +* keywords: gn-uploader, uploader, redis, sqlite, sqlite3 + +## Description + +We currently (as of 2024-06-27) use Redis for tracking any asynchronous jobs (e.g. QC on uploaded files). + +A lot of what we use redis for, we can do in one of the many SQL databases (we'll probably use SQLite3 anyway), which are more standardised, and easier to migrate data from and to. It has the added advantage that we can open multiple connections to the database, enabling the different processes to update the status and metadata of the same job consistently. + +Changes done here can then be migrated to the other systems, i.e. GN2, GN3, and gn-auth, as necessary. + +### 2025-12-31: Progress Update + +Initial basic implementation can be found in: + +=> https://git.genenetwork.org/gn-libs/tree/gn_libs/jobs +=> https://git.genenetwork.org/gn-uploader/commit/?id=774a0af9db439f50421a47249c57e5a0a6932301 +=> https://git.genenetwork.org/gn-uploader/commit/?id=589ab74731aed62b1e1b3901d25a95fc73614f57 + +and others. + +More work needs to be done to clean-up some minor annoyances. 
diff --git a/issues/gn-uploader/resume-upload.gmi b/issues/gn-uploader/resume-upload.gmi new file mode 100644 index 0000000..0f9ba30 --- /dev/null +++ b/issues/gn-uploader/resume-upload.gmi @@ -0,0 +1,41 @@ +# gn-uploader: Resume Upload + +## Tags + +* status: closed, completed, fixed +* priority: medium +* assigned: fredm, flisso +* type: feature request, feature-request +* keywords: gn-uploader, uploader, upload, resume upload + +## Description + +If a user is uploading a particularly large file, we might need to provide a way for the user to resume their upload of the file. + +Maybe this can wait until we have +=> /issues/gn-uploader/link-authentication-authorisation linked authentication/authorisation to gn-uploader. +In this way, each upload can be linked to a specific user. + +### TODOs + +* [x] Build UI to allow uploads +* [x] Build back-end to handle uploads +* [x] Handle upload failures/errors +* [x] Deploy to staging + +### Updates + +=> https://git.genenetwork.org/gn-uploader/commit/?id=9a8dddab072748a70d43416ac8e6db69ad6fb0cb +=> https://git.genenetwork.org/gn-uploader/commit/?id=df9da3d5b5e4382976ede1b54eb1aeb04c4c45e5 +=> https://git.genenetwork.org/gn-uploader/commit/?id=47c2ea64682064d7cb609e5459d7bd2e49efa17e +=> https://git.genenetwork.org/gn-uploader/commit/?id=a68fe177ae41f2e58a64b3f8dcf3f825d004eeca + +### Possible Resources + +=> https://javascript.info/resume-upload +=> https://github.com/23/resumable.js/ +=> https://www.dropzone.dev/ +=> https://stackoverflow.com/questions/69339582/what-hash-python-3-hashlib-yields-a-portable-hash-of-file-contents + + +This is mostly fixed. Any arising bugs can be tracked in separate issues. 
diff --git a/issues/gn-uploader/speed-up-rqtl2-qc.gmi b/issues/gn-uploader/speed-up-rqtl2-qc.gmi new file mode 100644 index 0000000..43e6d49 --- /dev/null +++ b/issues/gn-uploader/speed-up-rqtl2-qc.gmi @@ -0,0 +1,30 @@ +# Speed Up QC on R/qtl2 Bundles + +## Tags + +## Description + +The default format for the CSV files in a R/qtl2 bundle is: + +``` +matrix of individuals × (markers/phenotypes/covariates/phenotype covariates/etc.) +``` + +(A) (f/F)ile(s) in the R/qtl2 bundle could however +=> https://kbroman.org/qtl2/assets/vignettes/input_files.html#csv-files be transposed, +which means the system needs to "un-transpose" the file(s) before processing. + +Currently, the system does this by reading all the files of a particular type, and then "un-transposing" the entire thing. This leads to a very slow system. + +This issue proposes to do the quality control/assurance processing on each file in isolation, where possible - this will allow parallelisation/multiprocessing of the QC checks. + +The main considerations that need to be handled are as follows: + +* Do QC on (founder) genotype files (when present) before any of the other files +* Genetic and physical maps (if present) can have QC run on them after the genotype files +* Do QC on phenotype files (when present) after genotype files but before any other files +* Covariate and phenotype covariate files come after the phenotype files +* Cross information files … ? +* Sex information files … ? 
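The ordering considerations listed above could be sketched as staged, per-file QC. This is a rough sketch and not the uploader's code: the stage tuple `QC_STAGES`, the file-type keys, and the placeholder check in `check_file` are assumptions made for illustration, and the cross and sex information files are left out because their position is still an open question in the list.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

# Hypothetical stage order derived from the considerations listed above.
QC_STAGES = (
    ("founder_geno", "geno"),
    ("pheno",),
    ("gmap", "pmap"),
    ("covar", "phenocovar"),
)


def untranspose(rows):
    """Flip one transposed table back; short rows are padded with ''."""
    return [list(column) for column in zip_longest(*rows, fillvalue="")]


def check_file(file_type, rows, transposed=False):
    """Placeholder QC: report indices of rows that contain an empty cell."""
    table = untranspose(rows) if transposed else rows
    bad = [index for index, row in enumerate(table)
           if any(cell == "" for cell in row)]
    return (file_type, bad)


def run_qc(bundle, max_workers=4):
    """bundle maps a file type to a list of (rows, transposed) pairs, one per file."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in QC_STAGES:
            jobs = [pool.submit(check_file, file_type, rows, transposed)
                    for file_type in stage
                    for rows, transposed in bundle.get(file_type, [])]
            # Wait for the whole stage to finish before starting the next one.
            results.extend(job.result() for job in jobs)
    return results
```

Each file is un-transposed and checked on its own, so files within a stage can run concurrently. Threads are used here only to keep the sketch short; swapping in `ProcessPoolExecutor` would be the multiprocessing variant the issue mentions.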
+ +We should probably detail the type of QC checks done for each type of file diff --git a/issues/gn-uploader/uploading-samples.gmi b/issues/gn-uploader/uploading-samples.gmi new file mode 100644 index 0000000..11842b9 --- /dev/null +++ b/issues/gn-uploader/uploading-samples.gmi @@ -0,0 +1,51 @@ +# Uploading Samples + +## Tags + +* status: open +* assigned: fredm +* interested: acenteno, zachs, flisso +* priority: high +* type: feature-request +* keywords: gn-uploader, uploader, samples, strains + +## Description + +This will track the various notes regarding the upload of samples onto GeneNetwork. + +### Sample Lists + +From the email thread(s) with @zachs, @flisso and @acenteno + +``` +When there's a new set of individuals, it generally needs to be added as a new group. In the absence of genotype data, a "dummy" .geno file currently needs to be generated* in order to define the sample list (if you look at the list of .geno files in genotype_files/genotype you'll find some really small files that just have either a single marker or a bunch of fake markers calls "Marker1, Marker2, etc" - these are solely just used to get the samplelist from the columns). So in theory such a file could be generated as a part of the upload process in the absence of genotypes +``` + +We note, however, that the as @zachs mentions + +``` +This is really goofy and should probably change. I've brought up the idea of just replacing these with JSON files containing group metadata (including samplelist), but we've never actually gone through with making any change to this. I already did something sorta similar to this with the existing JSON files (in genotype_files/genotype), but those are currently only used in situations where there are either multiple genotype files, or a genotype file only contains a subset of samples/strains from a group (so the JSON file tells mapping to only use those samples/strains). 
+``` + +We need to explore whether such a change might need updates to the GN2/GN3 code to ensure code that depends on these dummy files can also use the new format JSON files too. + +Regarding the order of the samples, from the email thread: + +``` +Regarding the order of samples, it can basically be whatever we decide it is. It just needs to stay consistent (like if there are multiple genotype files). It only really affects how it's displayed, and any other genotype files we use for mapping needs to share the same order. +``` + +The ordering of the samples has no bearing on the analysis of the data, i.e. it does not affect the results of computations. + + +### Curation + +``` +But any time new samples are involved, there probably needs to be some explicit confirmation by a curator like Rob (since we want to avoid a situation where a sample/strain just has a typo or somethin and we treat it like a new sample/strain). +``` + +also + +``` +When there's a mix of existing individuals, I think it's usually the case that it's the same group (that is being expanded with new individuals), but anything that involves adding new samples should probably involve some sort of direct/explicit confirmation from a curator like Rob or something. +``` diff --git a/issues/gn-volt-genofiles-parsing-integration.gmi b/issues/gn-volt-genofiles-parsing-integration.gmi index 8d3d149..e1b0162 100644 --- a/issues/gn-volt-genofiles-parsing-integration.gmi +++ b/issues/gn-volt-genofiles-parsing-integration.gmi @@ -5,7 +5,7 @@ * assigned: alexm, * type: improvement * priority: high -* status: in progress +* status: stalled, closed. 
## Notes diff --git a/issues/gnqa/GNQA-for-evaluation.gmi b/issues/gnqa/GNQA-for-evaluation.gmi index 9f4a861..0b2e352 100644 --- a/issues/gnqa/GNQA-for-evaluation.gmi +++ b/issues/gnqa/GNQA-for-evaluation.gmi @@ -5,7 +5,7 @@ * Assigned: alexm, shelbys * Keywords: UI, GNQA, evaluation * Type: immediate -* Status: In Progress +* Status: completed ## Description @@ -13,5 +13,5 @@ We need to publish a paper on GeneNetwork Question & Answering system. To that e ## Tasks -* [ ] Add a thumbs up and down for rating the answer to a question -* [ ] Ensure to log the questions, respones, and ratings of each questions +* [X] Add a thumbs up and down for rating the answer to a question +* [X] Ensure to log the questions, respones, and ratings of each questions diff --git a/issues/gnqna/rating-system-has-no-indication-for-login-requirement.gmi b/issues/gnqa/Login_no-indicator-for-req.gmi index 7ed713a..7ed713a 100644 --- a/issues/gnqna/rating-system-has-no-indication-for-login-requirement.gmi +++ b/issues/gnqa/Login_no-indicator-for-req.gmi diff --git a/issues/fetch-pubmed-references-to-gnqa.gmi b/issues/gnqa/fetch-pubmed-references-to-gnqa.gmi index 63351d1..43c45cf 100644 --- a/issues/fetch-pubmed-references-to-gnqa.gmi +++ b/issues/gnqa/fetch-pubmed-references-to-gnqa.gmi @@ -5,7 +5,7 @@ * assigned: alexm * keywords: llm, pubmed, api, references * type: enhancements -* status: in progress +* status: completed, closed ## Description @@ -18,13 +18,13 @@ The task is to integrate PubMed references into the GNQA system by querying the * [x] Query the API with the publication titles. -* [] Display the PubMed information as reference information on the GN2 user interface. +* [x] Display the PubMed information as reference information on the GN2 user interface. -* [] dump the results to a DB e.g sqlite,lmdb +* [x] dump the results to a DB e.g sqlite,lmdb * [x] If references are not found, perform a lossy search or list the closest three papers. 
-* [] reimplement the reference ui to render the references as modal objects +* [x] reimplement the reference ui to render the references as modal objects For lossy search, see: diff --git a/issues/gn_llm_db_cache_integration.gmi b/issues/gnqa/gn_llm_db_cache_integration.gmi index 86f7c80..86f7c80 100644 --- a/issues/gn_llm_db_cache_integration.gmi +++ b/issues/gnqa/gn_llm_db_cache_integration.gmi diff --git a/issues/gnqa/gn_llm_integration_using_cached_searches.gmi b/issues/gnqa/gn_llm_integration_using_cached_searches.gmi new file mode 100644 index 0000000..e20b5a3 --- /dev/null +++ b/issues/gnqa/gn_llm_integration_using_cached_searches.gmi @@ -0,0 +1,43 @@ +# GN2 Integration with LLM search using cached results + +## Tags + +* assigned: jnduli, alexm, bmunyoki +* keywords: llm, genenetwork2 +* type: enhancement +* status: open + +## Description + +We'd like to include LLM searches integrated into our GN searches, when someone attempts a Xapian search e.g. when I search for `wiki:rif group:mouse nicotine`, we'd do a corresponding search for `rif mouse nicotine` on LLMs, and show the results on the main page. + +Another example: + +xapian search: rif:glioma species:human group:gtex_v8 +llm search: glioma human gtex_v8 + + +This can be phased out into + +* [ ] 1. UI integration, where we modify the search page to include a dummy content box +* [ ] 2. LLM search integration, where we perform a search and modify UI to show the results. This can either be async (i.e. the search results page waits for the LLM search results) or sync (i.e. we load the search results page after we've got the LLM results) +* [x] 2.1 create a copy branch for the gnqa-api branch +* [x] 2.2 create a PR containing all the branches +* [ ] 2.3 how much would it take to get the qa_*** branch merged into main?? +* [ ] 3. Cache design and integration: we already have some + +cache using redis (gn search history), so we may use this for the moment. 
+ + +Let's use flag: `LLM_SEARCH_ENABLED` to enable/disable this feature during development to make sure we don't release this before it's ready. + + +## Notes + +The branch for merging to gn2: + +https://github.com/genenetwork/genenetwork2/pull/863 + +The branch for merging to gn3: + +https://github.com/genenetwork/genenetwork3/pull/188 \ No newline at end of file diff --git a/issues/gnqa/gnqa_integration_to_global_search_Design.gmi b/issues/gnqa/gnqa_integration_to_global_search_Design.gmi new file mode 100644 index 0000000..0d5afd0 --- /dev/null +++ b/issues/gnqa/gnqa_integration_to_global_search_Design.gmi @@ -0,0 +1,74 @@ +# GNQA Integration to Global Search Design Proposal + +## Tags +* assigned: jnduli, alexm +* keywords: llm, genenetwork2 +* type: feature +* status: complete, closed, done + +## Description +This document outlines the design proposal for integrating GNQA into the Global Search feature. + +## High-Level Design + +### UI Design +When the GN2 Global Search page loads: +1. A request is initiated via HTMX to the GNQA search page with the search query. +2. Based on the results, a page or subsection is rendered, displaying the query and the answer, and providing links to references. + +For more details on the UI design, refer to the pull request: +=> https://github.com/genenetwork/genenetwork2/pull/862 + +### Backend Design +The API handles requests to the Fahamu API and manages result caching. Once a request to the Fahamu API is successful, the results are cached using SQLite for future queries. Additionally, a separate API is provided to query cached results. + +## Deep Dive + +### Caching Implementation +For caching, we will use SQLite3 since it is already implemented for search history. Based on our study, this approach will require minimal space: + +*Statistical Estimation:* +We calculated that this caching solution would require approximately 79MB annually for an estimated 20 users, each querying the system 5 times a day. 
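The estimate above can be unpacked with some quick arithmetic. The sketch below only works backwards from the figures quoted in the proposal (79MB per year, 20 users, 5 queries a day); the 365-day year and the decimal reading of "MB" are assumptions, so the per-request size it prints is an implied value and not a measured one.

```python
USERS = 20
REQUESTS_PER_USER_PER_DAY = 5
DAYS_PER_YEAR = 365        # assumption: usage on every day of the year
ANNUAL_BYTES = 79 * 10**6  # assumption: "79MB" read as decimal megabytes

requests_per_year = USERS * REQUESTS_PER_USER_PER_DAY * DAYS_PER_YEAR
implied_bytes_per_request = ANNUAL_BYTES / requests_per_year

print(requests_per_year)                 # 36500
print(round(implied_bytes_per_request))  # 2164
```

Under those assumptions the proposal corresponds to roughly 2 KB cached per request.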
+ +Why use an average request size per user, and how was it determined? +The average request size is an upper-bound estimate of the size of the documents returned from the Fahamu API. + +Why assume 20 users making 5 requests per day? + +We assume 20 users making 5 requests per day as an estimate of typical usage of GN2 services. +### Error Handling +* Handle cases where users are not logged in, as GNQA requires authentication. +* Handle scenarios where there is no response from Fahamu. +* Handle general errors. + +### Passing Questions to Fahamu +We can choose to either pass the entire query from the user to Fahamu or parse the query to search for keywords. + +### Generating Possible Questions +It is possible to generate potential questions based on the user's search and send those to Fahamu. Fahamu would then return possible related queries. + +## Related Issues +=> https://issues.genenetwork.org/issues/gn_llm_integration_using_cached_searches + +## Tasks + +* [x] Initiate a background task from HTMX to Fahamu once the search page loads. +* [x] Query Fahamu for data. +* [x] Cache results from Fahamu. +* [x] Render the UI page with the query and answer. +* [x] For "See more," render the entire GNQA page with the query, answer, references, and PubMed data. +* [x] Implement parsing for Xapian queries to normal queries. +* [x] Implement error handling. +* [x] reimplement how gnqa uses GN-AUTH in gn3. +* [x] Query Fahamu to generate possible questions based on certain keywords. + + +## Notes +From the latest Fahamu API docs, they have implemented a way to include subquestions by setting `amplify=True` for the POST request. We also have our own implementation for parsing text to extract questions. 
+ +## PRs Merged Related to This + +=> https://github.com/genenetwork/genenetwork2/pull/868 +=> https://github.com/genenetwork/genenetwork2/pull/862 +=> https://github.com/genenetwork/genenetwork2/pull/867 +=> https://github.com/genenetwork/genenetwork3/pull/191 \ No newline at end of file diff --git a/issues/implement-auth-to-gn-llm.gmi b/issues/gnqa/implement-auth-to-gn-llm.gmi index 496a7cb..2a5456b 100644 --- a/issues/implement-auth-to-gn-llm.gmi +++ b/issues/gnqa/implement-auth-to-gn-llm.gmi @@ -6,7 +6,7 @@ * keywords: llm, auth * type: feature * priority: high -* status: done, completed +* status: done, completed, closed ## Description diff --git a/issues/gnqa/implement-no-login-requirement-for-gnqa.gmi b/issues/gnqa/implement-no-login-requirement-for-gnqa.gmi new file mode 100644 index 0000000..5b0a1ff --- /dev/null +++ b/issues/gnqa/implement-no-login-requirement-for-gnqa.gmi @@ -0,0 +1,20 @@ +# Implement No-Login Requirement for GNQA + +## Tags + +* type: feature +* status: completed, closed +* priority: medium +* assigned: alexm, +* keywords: gnqa, user experience, authentication, login, llm + +## Description +This feature will allow usage of LLM/GNQA features without requiring user authentication, while implementing measures to filter out bots + + +## Tasks + +* [x] If logged in: perform AI search with zero penalty +* [x] Add caching lifetime to save on token usage +* [x] Routes: check for referrer headers — if the previous search was not from the homepage, perform AI search +* [x] If global search returns more than *n* results (*n = number*), perform an AI search diff --git a/issues/implement-reference-rating-gn-llm.gmi b/issues/gnqa/implement-reference-rating-gn-llm.gmi index f646a6f..f646a6f 100644 --- a/issues/implement-reference-rating-gn-llm.gmi +++ b/issues/gnqa/implement-reference-rating-gn-llm.gmi diff --git a/issues/integrate_gn_llm_search.gmi b/issues/gnqa/integrate_gn_llm_search.gmi index 5dfd9da..5dfd9da 100644 --- 
a/issues/integrate_gn_llm_search.gmi +++ b/issues/gnqa/integrate_gn_llm_search.gmi diff --git a/issues/merge-gnqa-to-production.gmi b/issues/gnqa/merge-gnqa-to-production.gmi index 3d34bb1..6e5f119 100644 --- a/issues/merge-gnqa-to-production.gmi +++ b/issues/gnqa/merge-gnqa-to-production.gmi @@ -4,6 +4,7 @@ * assigned: alexm, * keywords: production, GNQA, integration +* status: closed, completed ## Description @@ -12,5 +13,5 @@ be pushed to production. We need to allow only logged-in users to access the ser ## Tasks -* [] Integrate GN-auth for the service -* [] Push production to the current commit \ No newline at end of file +* [x] Integrate GN-auth for the service +* [x] Push production to the current commit \ No newline at end of file diff --git a/issues/refactor-gn-llm-code.gmi b/issues/gnqa/refactor-gn-llm-code.gmi index 6e33737..64c43c4 100644 --- a/issues/refactor-gn-llm-code.gmi +++ b/issues/gnqa/refactor-gn-llm-code.gmi @@ -5,7 +5,7 @@ * assigned:alexm,shelby * keywords:refactoring,llm,tests * type: enchancements -* status: in progress +* status: completed, closed ## Description diff --git a/issues/gnqna/query-bug-DatabaseError.gmi b/issues/gnqna/query-bug-DatabaseError.gmi new file mode 100644 index 0000000..b8c1cfc --- /dev/null +++ b/issues/gnqna/query-bug-DatabaseError.gmi @@ -0,0 +1,37 @@ +# Query Bug: DatabaseError + +## Tags + +* assigned: fredm, bonfacem +* priority: high +* status: open +* type: bug +* keywords: gnqna + +## Descriptions + +* Go to https://genenetwork.org/gnqna +* Type in a query +* Press "Enter" +* Observe the error "DatabaseError" with a status code of 500. + +Expected: Query returns a result. + + +## Troubleshooting: 2025-10-27 + +* GNQNA's deployment is not part of the gn-machine's definitions! 
+ +## Troubleshooting: 2025-12-31 + +If a user **IS NOT** logged in, the system responds with: + +``` +Search_Query: +Status_Code: 500 +Error/Reason: Login/Verification required to make this request +``` + +On the other hand, if a user is logged in, a query returns a result. + +We, therefore, probably need to notify the user that they need to be logged in to use this service. diff --git a/issues/guix-bioinformatics/guix-updates.gmi b/issues/guix-bioinformatics/guix-updates.gmi new file mode 100644 index 0000000..9c65fb9 --- /dev/null +++ b/issues/guix-bioinformatics/guix-updates.gmi @@ -0,0 +1,18 @@ +# Planned Guix Updates + +## Tags + +* status: open +* priority: medium +* type: enhancement +* assigned: fredm, bonfacem +* keywords: guix-bioinformatics, guix +* interested: pjotrp, aruni + +## Description + +The following outlines issues around the next upgrade: + +* Update pinned guix commit to the latest and see whether inferior profiles for the laminar user are properly created. +* Rust packages (new package build system) we need to think about. + diff --git a/issues/guix-bioinformatics/pin-channels-commits.gmi b/issues/guix-bioinformatics/pin-channels-commits.gmi new file mode 100644 index 0000000..216dd24 --- /dev/null +++ b/issues/guix-bioinformatics/pin-channels-commits.gmi @@ -0,0 +1,39 @@ +# Pin Channel Commits; Decouple from Guix + +## Tags + +* status: closed +* priority: medium +* type: enhancement +* assigned: fredm, bonfacem, aruni +* keywords: guix-bioinformatics, guix +* interested: pjotrp, aruni + +## Description + +Changes in upstream Guix often lead to deployment issues, due to breakages caused by changes in how GNU Guix does things. This interrupts our day-to-day operations, leading us to scramble to fix the breakages and make the builds sane again. + +In order to avoid these breakages in the future, we'll need to actually pin the commit(s) for all the channels we depend on, to avoid surprises down the line. 
+ +### Channel Dependencies + +We depend on the following channels in guix-bioinformatics: + +* guix: Mainline Guix channel +* guix-past: Channel for old packages, no longer maintained on guix mainline +* guix-rust-past-crates: Channel for rust packages using the old packaging form +* guix-forge: Manages building containers and whatnot. The dependence is implicit here, but it is one of the main causes of breakages + +### Tasks + +* [x] Pin guix channel +* [x] Pin guix-past +* [x] Pin guix-rust-past-crates channel +* [x] Pin guix-forge channel +* [ ] Move packages from (gn packages bioinformatics) to upstream (gnu packages bioinformatics) + +### Solution + +To allow guix-bioinformatics to continue improving, while preventing random breakages, we stopped depending on guix-bioinformatics directly, rather, we changed our main channel to gn-machines, and there, we pinned the version of guix-bioinformatics we depend on. + +This allows us to continue updating our packages while keeping the channel dependencies relatively stable. diff --git a/issues/guix-ci-tests.gmi b/issues/guix-ci-tests.gmi new file mode 100644 index 0000000..ce56705 --- /dev/null +++ b/issues/guix-ci-tests.gmi @@ -0,0 +1,47 @@ +# Guix CI failure: guix-past build breaks due to missing (libchop) + +# Tags + +* assigned: bonfacem +* type: bug, infrastructure +* priority: high + +# Notes + +After fixing a permissions issue in the Laminar CI environment (/var/guix/profiles/per-user/laminar): + +``` +[laminar] Executing cfg/jobs/gn-libs.run Backtrace: 9 (primitive-load "/var/lib/laminar/cfg/jobs/gn-libs.run") In ice-9/boot-9.scm: 152:2 8 (with-fluid* _ _ _) In ice-9/eval.scm: 202:51 7 (_ #(#(#<directory (guile-user) 7fce0bc71c80> #<pro?> ?))) 293:34 6 (_ #(#(#<directory (guile-user) 7fce0bc71c80> #<pro?> ?))) In guix/inferior.scm: 1006:4 5 (inferior-for-channels _ #:cache-directory _ #:ttl _) In ice-9/boot-9.scm: 1752:10 4 (with-exception-handler _ _ #:unwind? 
_ # _) In guix/store.scm: 690:37 3 (thunk) 1331:8 2 (call-with-build-handler #<procedure 7fce00e9f0c0 at g?> ?) In guix/inferior.scm: 951:2 1 (cached-channel-instance #<store-connection 256.100 7f?> ?) In ice-9/boot-9.scm: 1685:16 0 (raise-exception _ #:continuable? _) ice-9/boot-9.scm:1685:16: In procedure raise-exception: In procedure mkdir: Permission denied: "/var/guix/profiles/per-user/laminar" +``` + +... by (inside the container) running: + +``` +mkdir -p /var/guix/profiles/per-user/laminar +chown -R laminar:laminar /var/guix/profiles/per-user/laminar +``` + +... the CI progressed further but now fails when attempting to build guix-past. The failure is caused by an unbound variable error for the module (libchop), indicating a mismatch or missing dependency in the pinned Guix channels. + +Error Log: + +``` +(exception unbound-variable (value #f) + (value "Unbound variable: ~S") + (value (libchop)) (value #f)) + +builder for /gnu/store/gx57wj08yv0x0g1r8rbnwcp2fc58lqvx-guix-past.drv +failed to produce output path +/gnu/store/n3q0sgqwm9mwvna5215npwmdfigfyr9f-guix-past + +cannot build derivation +/gnu/store/3fwagz1p9vv3h020lwb2ab52f6wj6z1g-profile.drv: +1 dependencies couldn't be built +``` + +# Resolution + +* Inside genenetwork-development.scm, manually create `/var/guix/profiles/per-user/laminar` if it doesn't exist. +* Update the relevant .guix-channel file to match channels in guix-bioinformatics. + +* closed diff --git a/issues/implement-gn-markdown-editor.gmi b/issues/implement-gn-markdown-editor.gmi index 7d7d08f..a0d386b 100644 --- a/issues/implement-gn-markdown-editor.gmi +++ b/issues/implement-gn-markdown-editor.gmi @@ -13,7 +13,7 @@ Example of similar implementation * assigned: alexm * type: enhancement -* status: IN PROGRESS +* status: done, completed. 
* keywords: markdown,editor @@ -23,7 +23,7 @@ Example of similar implementation * [x] add live preview for page markdown on edit -* [] authentication(WIP) +* [x] authentication * [x] commit changes to github repo diff --git a/issues/implement_xapian_to_text_transformer.gmi b/issues/implement_xapian_to_text_transformer.gmi new file mode 100644 index 0000000..192491a --- /dev/null +++ b/issues/implement_xapian_to_text_transformer.gmi @@ -0,0 +1,15 @@ +# Xapian to Text Transformer + +## Tags +* assigned: alexm, jnduli +* keywords: llm, genenetwork2, xapian, transform +* type: feature +* status: closed, completed + +## Description: + +Given a Xapian search query, e.g., "CYTOCHROME AND P450" or "CYTOCHROME NEAR P450," we need to convert the text to a format with no Xapian keywords. In this case, the transformed text would be "CYTOCHROME P450." + + +This issue is a part of the main issue below. +=> https://issues.genenetwork.org/issues/gn_llm_integration_using_cached_searches diff --git a/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi b/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi new file mode 100644 index 0000000..98b46b6 --- /dev/null +++ b/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi @@ -0,0 +1,135 @@ +# Inspect Discrepancies Between Xapian and SQL Search. + +* assigned: bonfacem, rookie101 + +## Description + +When doing a Xapian search, we miss some data that is available from the SQL Search. 
The searches we tested: + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Dglioma&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=glioma (31 results) + +=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Aglioma species:mouse group:bxd dataset:hc_m2_0606_p wiki:glioma (26 results) + +We miss the following entries from the Xapian search: + +``` +15 1423803_s_at Gltscr2 glioma tumor suppressor candidate region gene 2 +16 1451121_a_at Gltscr2 glioma tumor suppressor candidate region 2; exons 8 and 9 +17 1452409_at Gltscr2 glioma tumor suppressor candidate region gene 2 +25 1416556_at Sas sarcoma amplified sequence +26 1430029_a_at Sas sarcoma amplified sequence +``` + +We want to figure out why there is a discrepancy between the 2 searches above. + +## Resolution + +Use "quest" to search for one of the symbols that don't appear in the Xapian search to get the exact document id: + +``` +quest --msize=2 -s en --boolean-prefix="iden:Qgene:" "iden:"1423803_s_at:hc_m2_0606_p"" \ +--db=/export/data/genenetwork-xapian/ + +Parsed Query: Query(0 * Qgene:1423803_s_at:hc_m2_0606_p) +Exactly 1 matches +MSet: +9665867: [0] +{ + "name": "1423803_s_at", + "symbol": "Gltscr2", + "description": "glioma tumor suppressor candidate region gene 2", + "chr": "1", + "mb": 4.687986, + "dataset": "HC_M2_0606_P", + "dataset_fullname": "Hippocampus Consortium M430v2 (Jun06) PDNN", + "species": "mouse", + "group": "BXD", + "tissue": "Hippocampus mRNA", + "mean": 11.749030303030299, + "lrs": 11.3847971289981, + "additive": -0.0650828877005346, + "geno_chr": "5", + "geno_mb": 137.010795 +} +``` + +From the retrieved document-id, use "xapian-delve" to inspect the terms inside the index: + +``` +xapian-delve -r 9665867 -d /export/data/genenetwork-xapian/ + +Data for record #9665867: +{ + 
"name": "1423803_s_at", + "symbol": "Gltscr2", + "description": "glioma tumor suppressor candidate region gene 2", + "chr": "1", + "mb": 4.687986, + "dataset": "HC_M2_0606_P", + "dataset_fullname": "Hippocampus Consortium M430v2 (Jun06) PDNN", + "species": "mouse", + "group": "BXD", + "tissue": "Hippocampus mRNA", + "mean": 11.749030303030299, + "lrs": 11.3847971289981, + "additive": -0.0650828877005346, + "geno_chr": "5", + "geno_mb": 137.010795 +} +Term List for record #9665867: 1423803_s_at 2 5330430h08rik +9430097c02rik Qgene:1423803_s_at:hc_m2_0606_p +XC1 XDShc_m2_0606_p XGbxd XIhippocampus XImrna XPC5 +XSmouse XTgene XYgltscr2 ZXDShc_m2_0606_p ZXGbxd +ZXIhippocampus ZXImrna ZXSmous ZXYgltscr2 Zbc017637 +Zbxd Zcandid Zgene Zglioma Zgltscr2 Zhc_m2_0606_p +Zhippocampus Zmous Zmrna Zregion Zsuppressor Ztumor +bc017637 bxd candidate gene glioma gltscr2 +hc_m2_0606_p hippocampus mouse mrna +region suppressor tumor +``` + +We have no wiki (XWK) entries from the above. When transforming to TTL files from SQL, we have symbols that exist in the GeneRIF table that do not exist in the GeneRIF_BASIC table: + +``` +SELECT COUNT(symbol) FROM GeneRIF WHERE +symbol NOT IN (SELECT symbol FROM GeneRIF_BASIC) +GROUP BY BINARY symbol; +``` + +Consequently, this means that after transforming to TTL files, we have some missing RDF entries that map a symbol (subject) to it's real name (object). When building the RDF cache, we thereby have some missing RIF/WIKI entries, and some entries are not indexed. This patch fixes the aforementioned error with missing symbols: + +=> https://git.genenetwork.org/gn-transform-databases/commit/?id=d95501bd2bd41ef8cf3584118382e83cbbbe0c87 [gn-transform-databases] Add missing RIF symbols. 
+ +Now these 2 queries return the same exact results: + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Dglioma&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=glioma (31 results) + +=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Aglioma species:mouse group:bxd dataset:hc_m2_0606_p wiki:glioma (31 results) + +However, Xapian search is case insensitive while the SQL search is case sensitive: + +=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Acancer species:mouse group:bxd dataset:hc_m2_0606_p wiki:cancer (72 results) + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Dcancer&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=cancer (70 results) + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3DCancer&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=Cancer (Note the change in the case "Cancer": 13 results) + +Another reason for discrepancies between search results, E.g. + +=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Adiabetes species:mouse group:bxd dataset:hc_m2_0606_p wiki:diabetes (59 results) + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Ddiabetes&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=diabetes (52 results) + +is that Xapian performs stemming on the search terms. 
For example, in the above wiki search for "diabetes", Xapian will stem "diabetes" to "diabet" thereby matching "diabetic", "diabetes", or any other word variation of "diabetes." + +## Ordering of Results + +The ordering in the Xapian search and SQL search is different. By default, SQL orders by Symbol where we have: + +``` +[...] ORDER BY ProbeSet.symbol ASC +``` + +However, Xapian orders search results by decreasing relevance score. This is configurable. + +* closed diff --git a/issues/inspect-discrepancies-between-xapian-and-sql-search2.gmi b/issues/inspect-discrepancies-between-xapian-and-sql-search2.gmi new file mode 100644 index 0000000..451d5c3 --- /dev/null +++ b/issues/inspect-discrepancies-between-xapian-and-sql-search2.gmi @@ -0,0 +1,11 @@ +# Inspect Discrepancies Between Xapian and SQL Search. + +* assigned: bonfacem, rookie101 + +## Description + +When we type BXD_21526 in xapian search we should find + +=> https://genenetwork.org/search?species=mouse&group=BXD&type=Phenotypes&dataset=BXDPublish&search_terms_or=BXD_21526&search_terms_and=&accession_id=None&FormID=searchResult + +This is not the case right now. diff --git a/issues/integrate-markdown-editor-to-gn2.gmi b/issues/integrate-markdown-editor-to-gn2.gmi index 98c170b..5904eac 100644 --- a/issues/integrate-markdown-editor-to-gn2.gmi +++ b/issues/integrate-markdown-editor-to-gn2.gmi @@ -1,3 +1,4 @@ + # GN Markdown Editor Integration ## Tags @@ -5,26 +6,168 @@ * assigned: alexm * status: in progress * priority: high +* tags: markdown, integration, guile ## Notes -This is a to-do list to integrate the GN Markdown editor into GN2. + +This is a to-do list to integrate the GN Markdown editor into GN2. 
To see the implementation, see: -=> https://github.com/Alexanderlacuna/geditor +=> https://git.genenetwork.org/gn-guile/ ## Tasks -* [ ] Implement APIs to fetch file for edit -* [ ] Add verification for the repository -* [ ] Implement API to edit and commit changes -* [ ] Replace JS with HTMX -* [ ] Support external links and image rendering -* [ ] Package dependencies -* [ ] Handle errors +* [x] Implement APIs to fetch files for editing +* [x] Add verification for the repository +* [x] Implement API to edit and commit changes +* [x] Replace JS with HTMX +* [x] Support external links and image rendering +* [x] Package dependencies +* [x] Show diff for files +* [x] Handle errors * [ ] Review by users -* [ ] Integrate auth to the system. +* [x] Integrate authentication into the system + + +## API Documentation + +These API endpoints are implemented in Guile. See the repo: + +=> https://git.genenetwork.org/gn-guile/ + +The main endpoints are: `/edit` and `/commit` + +### Edit (GET) + +This is a `GET` request to retrieve file content. Make sure you pass a valid `file_path` as a query parameter (the path should be relative to the repository). 
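As a rough sketch of how a client might build the request URL for this endpoint: the helper name `build_edit_url` is hypothetical, the host and port are simply the ones used in the curl examples in this document, and the client-side rejection of absolute paths and `..` is an assumption about what "relative to the repository" should allow, not documented server behaviour.

```python
from pathlib import PurePosixPath
from urllib.parse import urlencode

# Assumption: host and port as used in the curl examples of this document.
EDIT_ENDPOINT = "http://localhost:8091/edit"


def build_edit_url(file_path: str, endpoint: str = EDIT_ENDPOINT) -> str:
    """Return the GET URL that asks the editor API for `file_path`."""
    path = PurePosixPath(file_path)
    # The path has to be relative to the repository, so refuse absolute
    # paths and anything that tries to climb out of it with "..".
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(
            f"file_path must be relative to the repository: {file_path!r}"
        )
    # urlencode takes care of escaping characters that are unsafe in a URL.
    return f"{endpoint}?{urlencode({'file_path': file_path})}"


print(build_edit_url("test.md"))
```

The resulting URL can be handed to any HTTP client; checking the path before sending the request just avoids a round trip for input that is obviously wrong.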
+ +**Edit Request Example:** + +```bash + +curl -G -d "file_path=test.md" localhost:8091/edit +``` + +In case of a successful response, the expected result is: + + +```json +{ +"path": "<file_path>", +"content": "Test for new user\n test 2 for line\n test 3 for new line\n ## real markdown two test\n", +"hash": "<commit_sha>" +} +``` + +In case of an error, the expected response is: + +```json +{ +"error": "<error_type>", +"msg": "<error_reason>" +} +``` + +### Commit (POST) + +**Endpoint:** + +``` +localhost:8091/commit +``` + + +```bash + +curl -X POST http://127.0.0.1:8091/commit \ +-H 'Content-Type: application/json' \ +-d '{ +"content": "make test commit", +"filename": "test.md", +"email": "test@gmail.com", +"username": "test", +"commit_message": "init commit", +"prev_commit": "7cbfc40d98b49a64e98e7cd562f373053d0325bd" +}' + +``` -Related issues: +It expects the following data in JSON format: + +* `content` (the data you want to commit to the file, *valid markdown*) +* `prev_commit` (required for integrity) +* `filename` (file path to the file you are modifying) +* `username` (identifier for the user, in our case from auth) +* `email` (identifier email from the user, in our case from auth) +* `commit_message` + +If the request succeeds, the response should be: + +```json +{ +"status": "201", +"message": "Committed file successfully", +"content": "Test for new user\n test 2 for line\n test 3 for new line\n ## real markdown two test\n", +"commit_sha": "47df3b7f13a935d50cc8b40e98ca9e513cba104c", +"commit_message": "commit by genetics" +} +``` + +If there are no changes to the file: + +```json +{ +"status": "200", +"message": "Nothing to commit, working tree clean", +"commit_sha": "ecd96f27c45301279150fbda411544687db1aa45" +} +``` + +If the request fails, the expected results are: + +```json +{ +"error": "<error_type>", +"msg": "Commits do not match. 
Please pull in the latest changes for the current commit *ecd96f27c45301279150fbda411544687db1aa45* and previous commits." +} +``` + +## Related Issues => https://issues.genenetwork.org/issues/implement-gn-markdown-editor-in-guile -=> https://issues.genenetwork.org/issues/implement-gn-markdown-editor \ No newline at end of file +=> https://issues.genenetwork.org/issues/implement-gn-markdown-editor + +## Notes on Gn-Editor UI + +Here is the link to the PR for integrating the GN-Editor, including screenshots: + +=> https://github.com/genenetwork/genenetwork2/pull/854 + +Genenetwork2 consumes the endpoint for the GN-Editor. Authentication is required to prevent access by malicious users and bots. + +The main endpoint to fetch and edit a file is: + +``` +genenetwork.org/editor/edit?file-path=<relative file path> +``` + +This loads the editor with the content for editing. + +### Modifying Editor Settings + +You can modify editor settings, such as font size and keyboard bindings. To do this, navigate to: + +``` +genenetwork.org/editor/settings +``` + +Be sure to save your changes for them to take effect. + +### Showing Diff for Editor + +The editor also provides a diff functionality to show you the changes made to the file. Use the "Diff" button in the navigation to view these changes. + +### Committing Changes + +To commit your changes, use the "Commit" button. A commit message is required in the text area for the commit to be processed. + diff --git a/issues/mgamma/mgamma-design.gmi b/issues/mgamma/mgamma-design.gmi index 23e02d5..ed4c061 100644 --- a/issues/mgamma/mgamma-design.gmi +++ b/issues/mgamma/mgamma-design.gmi @@ -7,3 +7,31 @@ We have a lot of experience running and hacking the GEMMA tool in GeneNetwork.or GEMMA proves to give great GWA results and has a decent speed for a single threaded implementation - even though the matrix calls to openblas use multiple threads. The source code base of GEMMA, however, proves hard to build on. 
This is why we are creating a next generation tool that has a focus on *performance and hackability*. After several attempts using R, D, Julia, python, Ruby we have in 2023 settled on Guile+C+Zig. Guile provides a REPL and great hackability. C+Zig we'll use for performance. The other languages are all great, but we think we can work faster in this setup. + +Well, it is the end of 2024 and we have ditched that effort. Who said life was easy! The guile interface proved problematic - and Zig went out of favour because of its bootstrap story which prevents it becoming part of Guix, Debian etc. Also I discovered new tensor MPUs support f64 - so we may want to support vector and matrix computations on these cores. + +To write a gemma replacement I now favour chunking up existing gemma and making sure its components can talk with alternative implementations. We may use a propagator network approach. It is critical to keep the data in RAM, so it may need some message passing interface with memory that can be shared. The chunking into CELLs (read propagator network PN) is a requirement because we kept tripping over state in GEMMA. So a PN should make sure we can run two implementations of the same CELL and compare outcomes for testing. Also it will allow us to test AVX, tensor and (say) MKL or CUDA implementations down the line. Also it should allow us to start using new functionality on GN faster. It would also be fun to have an implementation run on the RISC-V manycore. + +So, what do we want out of our languages: + +* Nice matrix interface (Julia) +* Support for AVX (Julia) +* Possibility to drop to low level C programming (Julia+prescheme+C?) +* High level -- PN -- glue (Julia+Guile?) + +Julia looks like a great candidate, even though it has notable downsides including the big 'server' blob deployment and the garbage collector (the latter also being a strength, mind). 
Alternatives could be Rust and Prescheme which have no such concerns, but lack the nice matrix notation. + +The approach will be to start with Julia and reimplement GEMMA functions so they can be called from Julia and/or guile. + +Oh, I just found out that Julia, like zig, is no longer up-to-date on Debian. And the Guix version is 2 years old. That is really bad. If these languages don't get supported on major distros it is a dead end! + +=> https://mastodon.social/@pjotrprins/113379842047170785 + +What to do now? + +* Nice matrix interface (?) +* Support for AVX (?) +* Possibility to drop to low level C programming (?+prescheme+C?) +* High level -- PN -- glue (?+Guile?) + +Current candidates for ? are Nim and Rust. Neither has a really nice matrix interface - though Nim's is probably what I prefer and it is close to python. Chicken may work too if I get fed up with the two languages mentioned. diff --git a/issues/mgamma/mgamma-lmm.gmi b/issues/mgamma/mgamma-lmm.gmi new file mode 100644 index 0000000..61481c2 --- /dev/null +++ b/issues/mgamma/mgamma-lmm.gmi @@ -0,0 +1,17 @@ +# MGAMMA LMM + +MGamma does GWAS, which means it has to do Linear Mixed Models, both univariate and multivariate. + +# Tags + +* assigned: pjotrp, artyom +* type: feature +* priority: high + +# Tasks + +* [X] Kinship matrix computation. +* [X] Univariate LMM. +* [ ] Multivariate LMM. +* [X] Export data from GEMMA. +* [ ] Compare and ensure data match between MGamma and GEMMA. 
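For the last open task, here is a minimal sketch of how outputs from two implementations could be compared. The function name `mismatches`, the tolerance and the sample values are all made up for illustration; they are not taken from MGamma or GEMMA.

```python
import math


def mismatches(reference, candidate, rel_tol=1e-6):
    """Return the indices where two equally long result vectors disagree.

    Values are compared with a relative tolerance, because two
    floating-point implementations rarely agree bit for bit.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    return [
        i
        for i, (ref, new) in enumerate(zip(reference, candidate))
        if not math.isclose(ref, new, rel_tol=rel_tol)
    ]


# Hypothetical numbers, only here to show the call.
reference_values = [0.012, 0.5, 3.2e-8]
candidate_values = [0.012, 0.5000001, 3.9e-8]
print(mismatches(reference_values, candidate_values))
```

Returning the offending indices rather than a plain yes/no makes it easier to trace a difference back to a particular marker or trait.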
\ No newline at end of file diff --git a/issues/move-racket-gn-rest-api-to-guile.gmi b/issues/move-racket-gn-rest-api-to-guile.gmi index 185e7de..659c586 100644 --- a/issues/move-racket-gn-rest-api-to-guile.gmi +++ b/issues/move-racket-gn-rest-api-to-guile.gmi @@ -6,7 +6,7 @@ * priority: medium * type: API, metadata * keywords: API -* status: open +* status: stalled ## Description diff --git a/issues/move-search-to-xapian.gmi b/issues/move-search-to-xapian.gmi index 57612e7..d98be9b 100644 --- a/issues/move-search-to-xapian.gmi +++ b/issues/move-search-to-xapian.gmi @@ -18,3 +18,5 @@ As a work around---to make search work with Python3.10, an inefficient hack was => https://github.com/genenetwork/genenetwork2/pull/805/commits/9a6ddf9f1560b3bc1611f50bf2b94f0dc44652a2 Replace escape with conn.escape_string To get rid of this inheritance, I propose rewriting the search functionality in a more straightforward and functional manner. In doing so, we can also transition to Xapian search, a faster and more efficient search system. + +* closed diff --git a/issues/old_session_bug.gmi b/issues/old_session_bug.gmi index 649ea46..925b9f6 100644 --- a/issues/old_session_bug.gmi +++ b/issues/old_session_bug.gmi @@ -2,7 +2,7 @@ ## Tags -* status: open +* status: closed * priority: medium * type: bug * assigned: zsloan, fredm diff --git a/issues/prevent-weak-passwords.gmi b/issues/prevent-weak-passwords.gmi index 8e8ca2f..957a170 100644 --- a/issues/prevent-weak-passwords.gmi +++ b/issues/prevent-weak-passwords.gmi @@ -19,3 +19,11 @@ There was a request made to prevent weak passwords. Use existing libraries to check and prevent weak passwords. + +## Notes + +### 2025-12-31: Look Into Libraries + +=> https://pypi.org/project/password-strength/ password-strength + +The library above seems promising. Unfortunately, we'd have to write a guix definition for it. 
diff --git a/issues/production-container-mechanical-rob-failure.gmi b/issues/production-container-mechanical-rob-failure.gmi new file mode 100644 index 0000000..ae6bae8 --- /dev/null +++ b/issues/production-container-mechanical-rob-failure.gmi @@ -0,0 +1,224 @@ +# Production Container: `mechanical-rob` Failure + +## Tags + +* status: closed, completed, fixed +* priority: high +* type: bug +* assigned: fredm +* keywords: genenetwork, production, mechanical-rob + +## Description + +After deploying the latest commits to https://gn2-fred.genenetwork.org on 2025-02-19UTC-0600, with the following commits: + +* genenetwork2: 2a3df8cfba6b29dddbe40910c69283a1afbc8e51 +* genenetwork3: 99fd5070a84f37f91993f329f9cc8dd82a4b9339 +* gn-auth: 073395ff331042a5c686a46fa124f9cc6e10dd2f +* gn-libs: 72a95f8ffa5401649f70978e863dd3f21900a611 + +I had the (not so) bright idea to run the `mechanical-rob` tests against it before pushing it to production, proper. Here's where I ran into problems: some of the `mechanical-rob` tests failed, specifically, the correlation tests. + +Meanwhile, a run of the same tests against https://cd.genenetwork.org with the same commits was successful: + +=> https://ci.genenetwork.org/jobs/genenetwork2-mechanical-rob/1531 See this. + +This points to a possible problem with the setup of the production container, that leads to failures where none should be. This needs investigation and fixing. + +### Update 2025-02-20 + +The MariaDB server is crashing. 
To reproduce: + +* Go to https://gn2-fred.genenetwork.org/show_trait?trait_id=1435464_at&dataset=HC_M2_0606_P +* Click on "Calculate Correlations" to expand +* Click "Compute" + +Observe that after a little while, the system fails with the following errors: + +* `MySQLdb.OperationalError: (2013, 'Lost connection to MySQL server during query')` +* `MySQLdb.OperationalError: (2006, 'MySQL server has gone away')` + +I attempted updating the configuration for MariaDB, setting the `max_allowed_packet` to 16M and then 64M, but that did not resolve the problem. + +The log files indicate the following: + +``` +2025-02-20 7:46:07 0 [Note] Recovering after a crash using /var/lib/mysql/gn0-binary-log +2025-02-20 7:46:07 0 [Note] Starting crash recovery... +2025-02-20 7:46:07 0 [Note] Crash recovery finished. +2025-02-20 7:46:07 0 [Note] Server socket created on IP: '0.0.0.0'. +2025-02-20 7:46:07 0 [Warning] 'user' entry 'webqtlout@tux01' ignored in --skip-name-resolve mode. +2025-02-20 7:46:07 0 [Warning] 'db' entry 'db_webqtl webqtlout@tux01' ignored in --skip-name-resolve mode. +2025-02-20 7:46:07 0 [Note] Reading of all Master_info entries succeeded +2025-02-20 7:46:07 0 [Note] Added new Master_info '' to hash table +2025-02-20 7:46:07 0 [Note] /usr/sbin/mariadbd: ready for connections. +Version: '10.5.23-MariaDB-0+deb11u1-log' socket: '/run/mysqld/mysqld.sock' port: 3306 Debian 11 +2025-02-20 7:46:07 4 [Warning] Access denied for user 'root'@'localhost' (using password: NO) +2025-02-20 7:46:07 5 [Warning] Access denied for user 'root'@'localhost' (using password: NO) +2025-02-20 7:46:07 0 [Note] InnoDB: Buffer pool(s) load completed at 250220 7:46:07 +250220 7:50:12 [ERROR] mysqld got signal 11 ; +Sorry, we probably made a mistake, and this is a bug. + +Your assistance in bug reporting will enable us to fix this for the next release. 
+To report this bug, see https://mariadb.com/kb/en/reporting-bugs + +We will try our best to scrape up some info that will hopefully help +diagnose the problem, but since we have already crashed, +something is definitely wrong and this may fail. + +Server version: 10.5.23-MariaDB-0+deb11u1-log source revision: 6cfd2ba397b0ca689d8ff1bdb9fc4a4dc516a5eb +key_buffer_size=10485760 +read_buffer_size=131072 +max_used_connections=1 +max_threads=2050 +thread_count=1 +It is possible that mysqld could use up to +key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4523497 K bytes of memory +Hope that's ok; if not, decrease some variables in the equation. + +Thread pointer: 0x7f599c000c58 +Attempting backtrace. You can use the following information to find out +where mysqld died. If you see no messages after this, something went +terribly wrong... +stack_bottom = 0x7f6150282d78 thread_stack 0x49000 +/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55f43330c14e] +/usr/sbin/mariadbd(handle_fatal_signal+0x475)[0x55f432e013b5] +sigaction.c:0(__restore_rt)[0x7f615a1cb140] +/usr/sbin/mariadbd(+0xcbffbe)[0x55f43314efbe] +/usr/sbin/mariadbd(+0xd730ec)[0x55f4332020ec] +/usr/sbin/mariadbd(+0xd1b36b)[0x55f4331aa36b] +/usr/sbin/mariadbd(+0xd1cd8e)[0x55f4331abd8e] +/usr/sbin/mariadbd(+0xc596f3)[0x55f4330e86f3] +/usr/sbin/mariadbd(_ZN7handler18ha_index_next_sameEPhPKhj+0x2a5)[0x55f432e092b5] +/usr/sbin/mariadbd(+0x7b54d1)[0x55f432c444d1] +/usr/sbin/mariadbd(_Z10sub_selectP4JOINP13st_join_tableb+0x1f8)[0x55f432c37da8] +/usr/sbin/mariadbd(_ZN10JOIN_CACHE24generate_full_extensionsEPh+0x134)[0x55f432d24224] +/usr/sbin/mariadbd(_ZN10JOIN_CACHE21join_matching_recordsEb+0x206)[0x55f432d245d6] +/usr/sbin/mariadbd(_ZN10JOIN_CACHE12join_recordsEb+0x1cf)[0x55f432d23eff] +/usr/sbin/mariadbd(_Z16sub_select_cacheP4JOINP13st_join_tableb+0x8a)[0x55f432c382fa] +/usr/sbin/mariadbd(_ZN4JOIN10exec_innerEv+0xd16)[0x55f432c63826] +/usr/sbin/mariadbd(_ZN4JOIN4execEv+0x35)[0x55f432c63cc5] 
+/usr/sbin/mariadbd(_Z12mysql_selectP3THDP10TABLE_LISTR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x106)[0x55f432c61c26] +/usr/sbin/mariadbd(_Z13handle_selectP3THDP3LEXP13select_resultm+0x138)[0x55f432c62698] +/usr/sbin/mariadbd(+0x762121)[0x55f432bf1121] +/usr/sbin/mariadbd(_Z21mysql_execute_commandP3THD+0x3d6c)[0x55f432bfdd1c] +/usr/sbin/mariadbd(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x20b)[0x55f432bff17b] +/usr/sbin/mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0xdb5)[0x55f432c00f55] +/usr/sbin/mariadbd(_Z10do_commandP3THD+0x120)[0x55f432c02da0] +/usr/sbin/mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x2f2)[0x55f432cf8b32] +/usr/sbin/mariadbd(handle_one_connection+0x5d)[0x55f432cf8dad] +/usr/sbin/mariadbd(+0xbb4ceb)[0x55f433043ceb] +nptl/pthread_create.c:478(start_thread)[0x7f615a1bfea7] +x86_64/clone.S:97(__GI___clone)[0x7f6159dc6acf] + +Trying to get some variables. +Some pointers may be invalid and cause the dump to abort. 
+Query (0x7f599c012c50): SELECT ProbeSet.Name,ProbeSet.Chr,ProbeSet.Mb, + ProbeSet.Symbol,ProbeSetXRef.mean, + CONCAT_WS('; ', ProbeSet.description, ProbeSet.Probe_Target_Description) AS description, + ProbeSetXRef.additive,ProbeSetXRef.LRS,Geno.Chr, Geno.Mb + FROM ProbeSet INNER JOIN ProbeSetXRef + ON ProbeSet.Id=ProbeSetXRef.ProbeSetId + INNER JOIN Geno + ON ProbeSetXRef.Locus = Geno.Name + INNER JOIN Species + ON Geno.SpeciesId = Species.Id + WHERE ProbeSet.Name in ('1447591_x_at', '1422809_at', '1428917_at', '1438096_a_at', '1416474_at', '1453271_at', '1441725_at', '1452952_at', '1456774_at', '1438413_at', '1431110_at', '1453723_x_at', '1424124_at', '1448706_at', '1448762_at', '1428332_at', '1438389_x_at', '1455508_at', '1455805_x_at', '1433276_at', '1454989_at', '1427467_a_at', '1447448_s_at', '1438695_at', '1456795_at', '1454874_at', '1455189_at', '1448631_a_at', '1422697_s_at', '1423717_at', '1439484_at', '1419123_a_at', '1435286_at', '1439886_at', '1436348_at', '1437475_at', '1447667_x_at', '1421046_a_at', '1448296_x_at', '1460577_at', 'AFFX-GapdhMur/M32599_M_at', '1424393_s_at', '1426190_at', '1434749_at', '1455706_at', '1448584_at', '1434093_at', '1434461_at', '1419401_at', '1433957_at', '1419453_at', '1416500_at', '1439436_x_at', '1451413_at', '1455696_a_at', '1457190_at', '1455521_at', '1434842_s_at', '1442525_at', '1452331_s_at', '1428862_at', '1436463_at', '1438535_at', 'AFFX-GapdhMur/M32599_3_at', '1424012_at', '1440027_at', '1435846_x_at', '1443282_at', '1435567_at', '1450112_a_at', '1428251_at', '1429063_s_at', '1433781_a_at', '1436698_x_at', '1436175_at', '1435668_at', '1424683_at', '1442743_at', '1416944_a_at', '1437511_x_at', '1451254_at', '1423083_at', '1440158_x_at', '1424324_at', '1426382_at', '1420142_s_at', '1434553_at', '1428772_at', '1424094_at', '1435900_at', '1455322_at', '1453283_at', '1428551_at', '1453078_at', '1444602_at', '1443836_x_at', '1435590_at', '1434283_at', '1435240_at', '1434659_at', '1427032_at', '1455278_at', 
'1448104_at', '1421247_at', 'AFFX-MURINE_b1_at', '1460216_at', '1433969_at', '1419171_at', '1456699_s_at', '1456901_at', '1442139_at', '1421849_at', '1419824_a_at', '1460588_at', '1420131_s_at', '1446138_at', '1435829_at', '1434462_at', '1435059_at', '1415949_at', '1460624_at', '1426707_at', '1417250_at', '1434956_at', '1438018_at', '1454846_at', '1435298_at', '1442077_at', '1424074_at', '1428883_at', '1454149_a_at', '1423925_at', '1457060_at', '1433821_at', '1447923_at', '1460670_at', '1434468_at', '1454980_at', '1426913_at', '1456741_s_at', '1449278_at', '1443534_at', '1417941_at', '1433167_at', '1434401_at', '1456516_x_at', '1451360_at', 'AFFX-GapdhMur/M32599_5_at', '1417827_at', '1434161_at', '1448979_at', '1435797_at', '1419807_at', '1418330_at', '1426304_x_at', '1425492_at', '1437873_at', '1435734_x_at', '1420622_a_at', '1456019_at', '1449200_at', '1455314_at', '1428419_at', '1426349_s_at', '1426743_at', '1436073_at', '1452306_at', '1436735_at', '1439529_at', '1459347_at', '1429642_at', '1438930_s_at', '1437380_x_at', '1459861_s_at', '1424243_at', '1430503_at', '1434474_at', '1417962_s_at', '1440187_at', '1446809_at', '1436234_at', '1415906_at', 'AFFX-MURINE_B2_at', '1434836_at', '1426002_a_at', '1448111_at', '1452882_at', '1436597_at', '1455915_at', '1421846_at', '1428693_at', '1422624_at', '1423755_at', '1460367_at', '1433746_at', '1454872_at', '1429194_at', '1424652_at', '1440795_x_at', '1458690_at', '1434355_at', '1456324_at', '1457867_at', '1429698_at', '1423104_at', '1437585_x_at', '1437739_a_at', '1445605_s_at', '1436313_at', '1449738_s_at', '1437525_a_at', '1454937_at', '1429043_at', '1440091_at', '1422820_at', '1437456_x_at', '1427322_at', '1446649_at', '1433568_at', '1441114_at', '1456541_x_at', '1426985_s_at', '1454764_s_at', '1424071_s_at', '1429251_at', '1429155_at', '1433946_at', '1448771_a_at', '1458664_at', '1438320_s_at', '1449616_s_at', '1435445_at', '1433872_at', '1429273_at', '1420880_a_at', '1448645_at', '1449646_s_at', '1428341_at', 
'1431299_a_at', '1433427_at', '1418530_at', '1436247_at', '1454350_at', '1455860_at', '1417145_at', '1454952_s_at', '1435977_at', '1434807_s_at', '1428715_at', '1418117_at', '1447947_at', '1431781_at', '1428915_at', '1427197_at', '1427208_at', '1455460_at', '1423899_at', '1441944_s_at', '1455429_at', '1452266_at', '1454409_at', '1426384_a_at', '1428725_at', '1419181_at', '1454862_at', '1452907_at', '1433794_at', '1435492_at', '1424839_a_at', '1416214_at', '1449312_at', '1436678_at', '1426253_at', '1438859_x_at', '1448189_a_at', '1442557_at', '1446174_at', '1459718_x_at', '1437613_s_at', '1456509_at', '1455267_at', '1440480_at', '1417296_at', '1460050_x_at', '1433585_at', '1436771_x_at', '1424294_at', '1448648_at', '1417753_at', '1436139_at', '1425642_at', '1418553_at', '1415747_s_at', '1445984_at', '1440024_at', '1448720_at', '1429459_at', '1451459_at', '1428853_at', '1433856_at', '1426248_at', '1417765_a_at', '1439459_x_at', '1447023_at', '1426088_at', '1440825_s_at', '1417390_at', '1444744_at', '1435618_at', '1424635_at', '1443727_x_at', '1421096_at', '1427410_at', '1416860_s_at', '1442773_at', '1442030_at', '1452281_at', '1434774_at', '1416891_at', '1447915_x_at', '1429129_at', '1418850_at', '1416308_at', '1422858_at', '1447679_s_at', '1440903_at', '1417321_at', '1452342_at', '1453510_s_at', '1454923_at', '1454611_a_at', '1457532_at', '1438440_at', '1434232_a_at', '1455878_at', '1455571_x_at', '1436401_at', '1453289_at', '1457365_at', '1436708_x_at', '1434494_at', '1419588_at', '1433679_at', '1455159_at', '1428982_at', '1446510_at', '1434131_at', '1418066_at', '1435346_at', '1449415_at', '1455384_x_at', '1418817_at', '1442073_at', '1457265_at', '1447361_at', '1418039_at', '1428467_at', '1452224_at', '1417538_at', '1434529_x_at', '1442149_at', '1437379_x_at', '1416473_a_at', '1432750_at', '1428389_s_at', '1433823_at', '1451889_at', '1438178_x_at', '1441807_s_at', '1416799_at', '1420623_x_at', '1453245_at', '1434037_s_at', '1443012_at', '1443172_at', '1455321_at', 
'1438396_at', '1440823_x_at', '1436278_at', '1457543_at', '1452908_at', '1417483_at', '1418397_at', '1446589_at', '1450966_at', '1447877_x_at', '1446524_at', '1438592_at', '1455589_at', '1428629_at', '1429585_s_at', '1440020_at', '1417365_a_at', '1426442_at', '1427151_at', '1437377_a_at', '1433995_s_at', '1435464_at', '1417007_a_at', '1429690_at', '1427999_at', '1426819_at', '1454905_at', '1439516_at', '1434509_at', '1428707_at', '1416793_at', '1440822_x_at', '1437327_x_at', '1428682_at', '1435004_at', '1434238_at', '1417581_at', '1434699_at', '1455597_at', '1458613_at', '1456485_at', '1435122_x_at', '1452864_at', '1453122_at', '1435254_at', '1451221_at', '1460168_at', '1455336_at', '1427965_at', '1432576_at', '1455425_at', '1428762_at', '1455459_at', '1419317_x_at', '1434691_at', '1437950_at', '1426401_at', '1457261_at', '1433824_x_at', '1435235_at', '1437343_x_at', '1439964_at', '1444280_at', '1455434_a_at', '1424431_at', '1421519_a_at', '1428412_at', '1434010_at', '1419976_s_at', '1418887_a_at', '1428498_at', '1446883_at', '1435675_at', '1422599_s_at', '1457410_at', '1444437_at', '1421050_at', '1437885_at', '1459754_x_at', '1423807_a_at', '1435490_at', '1426760_at', '1449459_s_at', '1432098_a_at', '1437067_at', '1435574_at', '1433999_at', '1431289_at', '1428919_at', '1425678_a_at', '1434924_at', '1421640_a_at', '1440191_s_at', '1460082_at', '1449913_at', '1439830_at', '1425020_at', '1443790_x_at', '1436931_at', '1454214_a_at', '1455854_a_at', '1437061_at', '1436125_at', '1426385_x_at', '1431893_a_at', '1417140_a_at', '1435333_at', '1427907_at', '1434446_at', '1417594_at', '1426518_at', '1437345_a_at', '1420091_s_at', '1450058_at', '1435161_at', '1430348_at', '1455778_at', '1422653_at', '1447942_x_at', '1434843_at', '1454956_at', '1454998_at', '1427384_at', '1439828_at') AND + Species.Name = 'mouse' AND + ProbeSetXRef.ProbeSetFreezeId IN ( + SELECT ProbeSetFreeze.Id + FROM ProbeSetFreeze WHERE ProbeSetFreeze.Name = 'HC_M2_0606_P') + +Connection ID (thread ID): 41 
+Status: NOT_KILLED + +Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off + +The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains +information that should help you find out what is causing the crash. +Writing a core file... 
+Working directory at /export/mysql/var/lib/mysql +Resource Limits: +Limit Soft Limit Hard Limit Units +Max cpu time unlimited unlimited seconds +Max file size unlimited unlimited bytes +Max data size unlimited unlimited bytes +Max stack size 8388608 unlimited bytes +Max core file size 0 unlimited bytes +Max resident set unlimited unlimited bytes +Max processes 3094157 3094157 processes +Max open files 64000 64000 files +Max locked memory 65536 65536 bytes +Max address space unlimited unlimited bytes +Max file locks unlimited unlimited locks +Max pending signals 3094157 3094157 signals +Max msgqueue size 819200 819200 bytes +Max nice priority 0 0 +Max realtime priority 0 0 +Max realtime timeout unlimited unlimited us +Core pattern: core + +Kernel version: Linux version 5.10.0-22-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.178-3 (2023-04-22) + +2025-02-20 7:50:17 0 [Note] Starting MariaDB 10.5.23-MariaDB-0+deb11u1-log source revision 6cfd2ba397b0ca689d8ff1bdb9fc4a4dc516a5eb as process 3086167 +2025-02-20 7:50:17 0 [Note] InnoDB: !!! innodb_force_recovery is set to 1 !!! +2025-02-20 7:50:17 0 [Note] InnoDB: Uses event mutexes +2025-02-20 7:50:17 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 +2025-02-20 7:50:17 0 [Note] InnoDB: Number of pools: 1 +2025-02-20 7:50:17 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions +2025-02-20 7:50:17 0 [Note] InnoDB: Using Linux native AIO +2025-02-20 7:50:17 0 [Note] InnoDB: Initializing buffer pool, total size = 17179869184, chunk size = 134217728 +2025-02-20 7:50:17 0 [Note] InnoDB: Completed initialization of buffer pool +2025-02-20 7:50:17 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1537379110991,1537379110991 +2025-02-20 7:50:17 0 [Note] InnoDB: Last binlog file '/var/lib/mysql/gn0-binary-log.000134', position 82843148 +2025-02-20 7:50:17 0 [Note] InnoDB: 128 rollback segments are active. 
+2025-02-20 7:50:17 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" +2025-02-20 7:50:17 0 [Note] InnoDB: Creating shared tablespace for temporary tables +2025-02-20 7:50:17 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... +2025-02-20 7:50:17 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. +2025-02-20 7:50:17 0 [Note] InnoDB: 10.5.23 started; log sequence number 1537379111003; transaction id 3459549902 +2025-02-20 7:50:17 0 [Note] Plugin 'FEEDBACK' is disabled. +2025-02-20 7:50:17 0 [Note] InnoDB: Loading buffer pool(s) from /export/mysql/var/lib/mysql/ib_buffer_pool +2025-02-20 7:50:17 0 [Note] Loaded 'locales.so' with offset 0x7f9551bc0000 +2025-02-20 7:50:17 0 [Note] Recovering after a crash using /var/lib/mysql/gn0-binary-log +2025-02-20 7:50:17 0 [Note] Starting crash recovery... +2025-02-20 7:50:17 0 [Note] Crash recovery finished. +2025-02-20 7:50:17 0 [Note] Server socket created on IP: '0.0.0.0'. +2025-02-20 7:50:17 0 [Warning] 'user' entry 'webqtlout@tux01' ignored in --skip-name-resolve mode. +2025-02-20 7:50:17 0 [Warning] 'db' entry 'db_webqtl webqtlout@tux01' ignored in --skip-name-resolve mode. +2025-02-20 7:50:17 0 [Note] Reading of all Master_info entries succeeded +2025-02-20 7:50:17 0 [Note] Added new Master_info '' to hash table +2025-02-20 7:50:17 0 [Note] /usr/sbin/mariadbd: ready for connections. 
+Version: '10.5.23-MariaDB-0+deb11u1-log' socket: '/run/mysqld/mysqld.sock' port: 3306 Debian 11 +2025-02-20 7:50:17 4 [Warning] Access denied for user 'root'@'localhost' (using password: NO) +2025-02-20 7:50:17 5 [Warning] Access denied for user 'root'@'localhost' (using password: NO) +2025-02-20 7:50:17 0 [Note] InnoDB: Buffer pool(s) load completed at 250220 7:50:17 +``` + +A possible issue is the use of the environment variable SQL_URI at this point: + +=> https://github.com/genenetwork/genenetwork2/blob/testing/gn2/wqflask/correlation/rust_correlation.py#L34 + +which is requested + +=> https://github.com/genenetwork/genenetwork2/blob/testing/gn2/wqflask/correlation/rust_correlation.py#L7 from here. + +I tried setting an environment variable "SQL_URI" with the same value as the config and rebuilt the container. That did not fix the problem. + +Running the query directly in the default mysql client also fails with: + +``` +ERROR 2013 (HY000): Lost connection to MySQL server during query +``` + +Huh, so this was not a code problem. + +Configured database to allow upgrade of tables if necessary and restarted mariadbd. + +The problem still persists. + +Note Pjotr: this is likely a mariadb bug with 10.5.23, the most recent mariadbd we use (both tux01 and tux02 are older). The dump shows it balks on creating a new thread: pthread_create.c:478. Looks similar to https://jira.mariadb.org/browse/MDEV-32262 + +10.5, 10.6, 10.11 are affected. so running correlations on production crashes mysqld? I am not trying for obvious reasons ;) the threading issues of mariadb look scary - I wonder how deep it goes. + +We'll test for a different version of mariadb combining a Debian update because Debian on tux04 is broken. 
diff --git a/issues/provide-link-to-register-user-in-sign-in-page.gmi b/issues/provide-link-to-register-user-in-sign-in-page.gmi index 24d7c21..b9e6a4d 100644 --- a/issues/provide-link-to-register-user-in-sign-in-page.gmi +++ b/issues/provide-link-to-register-user-in-sign-in-page.gmi @@ -3,7 +3,7 @@ ## Tags * type: bug -* status: open +* status: closed * assigned: fredm * priority: medium * keywords: register user, gn-auth, genenetwork @@ -16,3 +16,8 @@ Provide a link allowing a user to register with the system on the sign-in page. We are now using OAuth2 to enable sign-in, which means that the user is redirected from the service they were in to the authorisation service to sign-in. The service should retain a note of the service which the user came from, and redirect back to it on successful registration. + + +### Close as Completed + +@zachs seems to have fixed this. diff --git a/issues/quality-control/fix-flash-messages.gmi b/issues/quality-control/fix-flash-messages.gmi index da54c52..e65c0f6 100644 --- a/issues/quality-control/fix-flash-messages.gmi +++ b/issues/quality-control/fix-flash-messages.gmi @@ -5,7 +5,7 @@ * assigned: fredm * priority: low * type: bug -* status: open +* status: closed, completed, fixed * keywords: flask, flash ## Description diff --git a/issues/quality-control/qc-r-qtl2-bundles.gmi b/issues/quality-control/qc-r-qtl2-bundles.gmi index 9cc1452..6560594 100644 --- a/issues/quality-control/qc-r-qtl2-bundles.gmi +++ b/issues/quality-control/qc-r-qtl2-bundles.gmi @@ -3,7 +3,7 @@ ## Tags * assigned: fredm, acenteno -* status: open +* status: closed, completed * type: feature request * priority: medium * keywords: quality control, QC, R/qtl2 bundle diff --git a/issues/quality-control/r-qtl2-features.gmi b/issues/quality-control/r-qtl2-features.gmi index eac53c4..bcc5d71 100644 --- a/issues/quality-control/r-qtl2-features.gmi +++ b/issues/quality-control/r-qtl2-features.gmi @@ -3,7 +3,7 @@ ## Tags * type: listing -* status: open +* status: 
closed, completed * assigned: fredm * priority: high * keywords: listing, bug, feature @@ -12,5 +12,9 @@ This is a listing of non-critical features and bugs that do not currently have a dedicated issue, and need to be handled some time in the future. -* [feature] "Undo Transpose": Files marked as '*_transposed: true' will have the transposition undone to ease processing down the line. +* Closed, completed: [feature] "Undo Transpose": Files marked as '*_transposed: true' will have the transposition undone to ease processing down the line. * … + +### Close as completed + +Actually open dedicated issues for bugs and features rather than collecting them here. diff --git a/issues/rdf/automate-rdf-generation-and-ingress.gmi b/issues/rdf/automate-rdf-generation-and-ingress.gmi new file mode 100644 index 0000000..ef4ba9f --- /dev/null +++ b/issues/rdf/automate-rdf-generation-and-ingress.gmi @@ -0,0 +1,37 @@ +# Update RDF Generation and Ingress to Virtuoso + +## Tags + +* assigned: bonfacem +* priority: high +* tags: in-progress +* deadline: 2024-10-23 Wed + +We need to update Virtuoso in production. At the moment this is done manually. 
For the current set-up, we need to update the recent modified RIF+WIKI models: + + +``` +# Generate the RDF triples +time guix shell guile-dbi guile-hashing -m manifest.scm -- ./pre-inst-env ./examples/generif.scm --settings conf.scm --output /home/bonfacem/ttl-files/generif-metadata-new.ttl --documentation ./docs/generif-metadata.md + +# Make sure they are valid +guix shell -m manifest.scm -- rapper --input turtle --count /home/bonfacem/ttl-files/generif-metadata-new.ttl + +# Copy the files over to the exposed virtuoso path +cp /home/bonfacem/ttl-files/generif-metadata-new.ttl </some/dir/> + +# Get into Virtuoso (with a password) +guix shell virtuoso-ose -- isql <port-number> + +# Load the files to be loaded +# Assuming that '/var/lib/data' is where the files are +ld_dir('/var/lib/data', 'generif-metadata-new.ttl', 'http://genenetwork.org'); + +# Load the files +rdf_loader_run(); +CHECKPOINT; +``` + +Above steps should be automated and tested in CD before roll-out in production. Key considerations: + +- Pick latest important changes from git, so that we can pick what files to run instead of generating all the ttl files all the time. diff --git a/issues/rdf/hash-rdf-graph.gmi b/issues/rdf/hash-rdf-graph.gmi index c896218..2863108 100644 --- a/issues/rdf/hash-rdf-graph.gmi +++ b/issues/rdf/hash-rdf-graph.gmi @@ -5,3 +5,12 @@ ## Description Building the index is an expesive operation. Hash the graph and store the metadata in xapian, and similarly in the RDF store. The mcron-job should check whether this has changed, and if there's any difference, go ahead and re-build the index. + +Resolution: + +=> https://github.com/genenetwork/genenetwork3/pull/171 Improve Sharing Memory Across Processes. +=> https://github.com/genenetwork/genenetwork3/pull/172 Check whether table names were stored in xapian. +=> https://github.com/genenetwork/genenetwork3/pull/174 Wikidata index. 
+=> https://github.com/genenetwork/genenetwork3/pull/175 Refactor how the generif md5 sum is calculated and stored in XAPIAN. + +* closed diff --git a/issues/redesign-global-search-design.gmi b/issues/redesign-global-search-design.gmi new file mode 100644 index 0000000..df63791 --- /dev/null +++ b/issues/redesign-global-search-design.gmi @@ -0,0 +1,23 @@ +# Redesign Global Search Design + +## Tags +* assigned: alexm, zac +* keywords: global search, design, HTML +* type: enhancement +* status: closed, completed, done + +## Description +Rob suggested we model the global search on the NCBI PubMed interface. We should remove the `?` button, which seems to be confusing for users, and have a better user guide. + +## Tasks + +* [x] Redesign the global search to fit the NCBI PubMed model. +* [x] Replace the "?" button that acts as a user guide + +## Related issues: + +=> https://issues.genenetwork.org/issues/cleanup-base-file-gn2 + +## Notes +PR that seeks to address this issue: +=> https://github.com/genenetwork/genenetwork2/pull/880 \ No newline at end of file diff --git a/issues/remove-custom-bootstrap-css.gmi b/issues/remove-custom-bootstrap-css.gmi index 7fa6f24..14c1c35 100644 --- a/issues/remove-custom-bootstrap-css.gmi +++ b/issues/remove-custom-bootstrap-css.gmi @@ -1,7 +1,7 @@ # Remove overrides to bootstrap classes in bootstrap-custom.css * assigned: zachs, bonfacem, alexm - +* status: stalled We have a "bootstrap-custom.css" in GeneNetwork. 
Consider this snippet: diff --git a/issues/remove-references-to-old-gn-auth-code.gmi b/issues/remove-references-to-old-gn-auth-code.gmi index 1a03c25..8c110aa 100644 --- a/issues/remove-references-to-old-gn-auth-code.gmi +++ b/issues/remove-references-to-old-gn-auth-code.gmi @@ -4,7 +4,7 @@ * assigned: bonfacem * keywords: auth -* status: open +* status: stalled ## Description diff --git a/issues/replace-neo4j-with-virtuoso.gmi b/issues/replace-neo4j-with-virtuoso.gmi new file mode 100644 index 0000000..450fb70 --- /dev/null +++ b/issues/replace-neo4j-with-virtuoso.gmi @@ -0,0 +1,8 @@ +# Replace Neo4J with Virtuoso + +## Tags + +* assigned: bonfacem, soloshelby +* deadline: 2024-10-25 Fri + +Currently, the RAG ingests TTL files into Neo4J. Replace this with Virtuoso. diff --git a/issues/reset-password-on-container-rebuild.gmi b/issues/reset-password-on-container-rebuild.gmi index b0e4dbb..6c0ad1e 100644 --- a/issues/reset-password-on-container-rebuild.gmi +++ b/issues/reset-password-on-container-rebuild.gmi @@ -2,5 +2,6 @@ ## Tags * assigned: bonfacem +* status: stalled Whenever the virtuoso container is rebuilt, we manually have to reset the password. We should fix this by modifying the virtuoso service so that things are set automatically. diff --git a/issues/search-for-brca.gmi b/issues/search-for-brca.gmi index c42c745..05c6fd0 100644 --- a/issues/search-for-brca.gmi +++ b/issues/search-for-brca.gmi @@ -1,10 +1,31 @@ -# Search for brca +# Search Improvements: capital insensitive search for RIF+WIKI; Examples -* assigned: arun +## Tags -Search for brca does not return results for brca1 and brca2. It should. -=> https://cd.genenetwork.org/gsearch?type=gene&terms=brca +* assigned: bonfacem, rookie101 +* priority: high +* type: ops +* keywords: virtuoso -The xapian stemmer does not stem brca1 to brca. That's why when one searches for brca, results for brca1 are not returned. 
+## Description + +RIF search is finally working on production: + +> rif:Brca2 and group:BXD + +and case-insensitive search too for the BXD. See: + +=> https://github.com/genenetwork/genenetwork3/commit/4b2e9f3fb3383421d7a55df5399aab71e0cc3b4f Stem group field regardless of case. +=> https://github.com/genenetwork/genenetwork3/commit/a37622b466f9f045db06a6f07e88fcf81b176f91 Stem all the time. + +## Questions: + +* How do we search genewiki data? + +* rif:Brca2 should also be RIF:Brca2 (prefer the latter if we have to +choose as that is what people will try) + +* Can we continue giving examples at + +=> https://genenetwork.org/search-syntax search syntax -Perhaps we should write a custom stemmer that stems brca1 to brca. But, at the same time, we should be wary of stemming terms like p450 to p. Pjotr suggests the heuristic that we look for at least 2 or 3 alphabetic characters at the beginning. Another approach is to hard-code a list of candidates to look for. diff --git a/issues/set-up-gn-guile-in-tux02.gmi b/issues/set-up-gn-guile-in-tux02.gmi new file mode 100644 index 0000000..29eca68 --- /dev/null +++ b/issues/set-up-gn-guile-in-tux02.gmi @@ -0,0 +1,15 @@ +# Set Up gn-guile in tux02 + +## Tags + +* assigned: bonfacem +* priority: high +* status: in-progress +* deadline: 2024-10-23 Wed + +## Tasks + +* [-] Create gn-guile container. +* [X] Merge gn2 UI PR. +=> https://github.com/genenetwork/genenetwork2/pull/854 Feature/gn editor UI +* [-] Test out auth editing in CD. diff --git a/issues/set-up-virtuoso-on-production.gmi b/issues/set-up-virtuoso-on-production.gmi index 88c04f7..614565a 100644 --- a/issues/set-up-virtuoso-on-production.gmi +++ b/issues/set-up-virtuoso-on-production.gmi @@ -1,8 +1,8 @@ -# Set-up Virtuoso on Production +# Set-up Virtuoso+Xapian on Production ## Tags -* assigned: bonfacem +* assigned: bonfacem, zachs, fredm * priority: high * type: ops * keywords: virtuoso @@ -11,5 +11,121 @@ We already have virtuoso set-up in tux02.
Right now, to be able to interact with RDF, we need to have virtuoso set-up. This issue will unblock: +* Global Search in Production + => https://github.com/genenetwork/genenetwork3/pull/137 Update RDF endpoints + => https://github.com/genenetwork/genenetwork2/pull/808 UI/RDF frontend + + +## HOWTO: Updating Virtuoso in Production (Tux01) + + +Note where the virtuoso data directory is mapped from the "production.sh" script as you will use this in the subsequent steps: + +> --share=/export2/guix-containers/genenetwork/var/lib/virtuoso=/var/lib/virtuoso + +### Generating the TTL Files + +=> https://git.genenetwork.org/gn-transform-databases/tree/generate-ttl-files.scm Run "generate-ttl-files" to generate the TTL files: + +``` +time guix shell guile-dbi -m manifest.scm -- \ +./generate-ttl-files.scm --settings conn-dev.scm --output \ +/export2/guix-containers/genenetwork-development/var/lib/virtuoso \ +--documentation /tmp/doc-directory +``` + +* [Recommended] Alternatively, copy over the TTL files (in Tux01) to the correct shared directory in the container: + +``` +cp /home/bonfacem/ttl-files/*ttl /export2/guix-containers/genenetwork/var/lib/virtuoso/ +``` + +### Loading the TTL Files + +* Make sure that the virtuoso service type has the "dirs-allowed" variable set correctly: + +``` +(service virtuoso-service-type + (virtuoso-configuration + (server-port 7892) + (http-server-port 7893) + (dirs-allowed "/var/lib/virtuoso"))) +``` + +* Get into isql: + +``` +guix shell virtuoso-ose -- isql 7892 +``` +* Make sure that no pre-existing TTL files exist in "DB.DBA.LOAD_LIST": + +``` +SQL> select * from DB.DBA.LOAD_LIST; +SQL> delete from DB.DBA.load_list; +``` +* Delete the genenetwork graph: + +``` +SQL> DELETE FROM rdf_quad WHERE g = iri_to_id('http://genenetwork.org'); +``` + +* Load all the TTL files (this takes some time): + +``` +SQL> ld_dir('/var/lib/virtuoso', '*.ttl', 'http://genenetwork.org'); +SQL> rdf_loader_run(); +SQL> CHECKPOINT; +SQL>
checkpoint_interval(60); +SQL> scheduler_interval(10); +``` +* Verify you have some RDF data by running: + +``` +SQL> SPARQL +PREFIX gn: <http://genenetwork.org/id/> +PREFIX gnc: <http://genenetwork.org/category/> +PREFIX owl: <http://www.w3.org/2002/07/owl#> +PREFIX gnt: <http://genenetwork.org/term/> +PREFIX skos: <http://www.w3.org/2004/02/skos/core#> +PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> +PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> +PREFIX taxon: <http://purl.uniprot.org/taxonomy/> + +SELECT * WHERE { + ?s skos:member gn:Mus_musculus . + ?s ?p ?o . +}; +``` + +* Update GN3 Configurations to point to the correct Virtuoso instance: + +> SPARQL_ENDPOINT="http://localhost:7893/sparql" + +## HOWTO: Generating the Xapian Index + +* Make sure you are using the correct guix profile or that you have the "PYTHONPATH" pointing to the GN3 repository. + +* Generate the Xapian Index using "genenetwork3/scripts/create-xapian-index" against the correct output directory (The build takes around 71 minutes on an SSD Drive): + +``` +time python index-genenetwork create-xapian-index \ +/export/data/genenetwork-xapian/ \ +mysql://<user>:<password>@localhost/db_webqtl \ +http://localhost:7893/sparql +``` +* After the build, you can verify that the index works by: + +``` +guix shell xapian -- xapian-delve /export/data/genenetwork-xapian/ +``` +* Update GN3 configuration files to point to the right Xapian path: + +> XAPIAN_DB_PATH="/export/data/genenetwork-xapian/" + +## Resolution + +@fredm updated virtuoso; and @zachs updated the xapian index in production. + +* closed diff --git a/issues/systems/apps.gmi b/issues/systems/apps.gmi new file mode 100644 index 0000000..e374250 --- /dev/null +++ b/issues/systems/apps.gmi @@ -0,0 +1,225 @@ +# Apps + +GeneNetwork.org retains a number of apps. Currently they are managed by shepherd as `guix shell` services, but we should really move them to system containers. 
+ +# Tags + +* assigned: pjotrp +* type: enhancement +* status: in progress +* priority: medium +* keywords: system, sheepdog, shepherd + +# Tasks + +* [ ] Get services running +* [ ] Move guix shell into containers +* [ ] Make sure the container starts up on reboot and/or migrate to a new host + +# List of apps + +Current apps managed by shepherd/systemd on tux02/balg01 are + +=> https://genecup.org/ +* [+] genecup [shell] (hao) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - add link in GN2 +* [X] - add banner for GeneNetwork +* [ ] - create system container +* [X] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> https://bnw.genenetwork.org/ +* [+] bnw [container] (yan cui and rob) +* [X] - fire up service +* [X] - add sheepdog monitor +* [X] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - update system container +* [X] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> http://hrdp.genenetwork.org +* [+] hrdp-project (hao?) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - https +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> https://pluto.genenetwork.org/ +* [+] pluto (saunak) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> https://power.genenetwork.org/ +* [+] power app (dave) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [X] - create guix root +* [ ] - make sure it works on reboot (systemd) +* [ ] root? 
+=> http://longevity-explorer.genenetwork.org/ +* [+] Longevity explorer [container balg01] (dave) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - https +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> http://jumpshiny.genenetwork.org/ +* [+] jumpshiny app (xusheng) +* [+] - fire up service (still some dependencies) +* [X] - add sheepdog monitor +* [ ] - https +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> https://hegp.genenetwork.org/ +* [+] hegp (pjotr) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [X] - make sure it works on reboot (systemd) + +* [-] singlecell (siamak) +* [-] rn6app (hao - remove) +* [-] genome-browser (defunct) + +To fix them we need to validate the sheepdog monitor and make sure they are working in either shepherd (+), or as a system container (X). + +Sheepdog monitor is at + +=> http://sheepdog.genenetwork.org/sheepdog/status.html + +# Info + +## BNW + +The app is already a Guix system container! 
To make it part of the startup I had to move it away from shepherd (which runs in userland) and: + +``` +/home/shepherd/guix-profiles/bnw/bin/guix system container /home/shepherd/guix-bioinformatics/gn/services/bnw-container.scm --share=/home/shepherd/logs/bnw-server=/var/log --network +ln -s /gnu/store/0hnfb9ynnxsig3yyprwxmg5h6c9g8mry-run-container /usr/local/bin/bnw-app-container +``` + +systemd service: + +``` +root@tux02:/etc/systemd/system# cat bnw-app-container.service +[Unit] +Description = Run genenetwork BNW app container +[Service] +ExecStart = /usr/local/bin/bnw-app-container +[Install] +WantedBy = multi-user.target +``` + +We need to make sure the garbage collector does not destroy the container, so add the --root switch + +``` +/home/shepherd/guix-profiles/bnw/bin/guix system container /home/shepherd/guix-bioinformatics/gn/services/bnw-container.scm --share=/home/shepherd/logs/bnw-server=/var/log --network --root=/usr/local/bin/bnw-app-container +``` + +Check with + +``` +root@tux02:/home/shepherd# /home/shepherd/guix-profiles/bnw/bin/guix gc --list-roots |grep bnw + /usr/local/bin/bnw-app-container +``` + +## R/shiny apps + +The R/shiny apps were showing a tarball mismatch: + +``` +building /gnu/store/rjnw7k56z955v4bl07flm9pjwxx5vs0r-r-minimal-4.0.2.drv... +downloading from http://cran.r-project.org/src/contrib/Archive/KernSmooth/KernSmooth_2.23-17.tar.gz ... +- 'configure' phasesha256 hash mismatch for /gnu/store/n05zjfhxl0iqx1jbw8i6vv1174zkj7ja-KernSmooth_2.23-17.tar.gz: + expected hash: 11g6b0q67vasxag6v9m4px33qqxpmnx47c73yv1dninv2pz76g9b + actual hash: 1ciaycyp79l5aj78gpmwsyx164zi5jc60mh84vxxzq4j7vlcdb5p + hash mismatch for store item '/gnu/store/n05zjfhxl0iqx1jbw8i6vv1174zkj7ja-KernSmooth_2.23-17.tar.gz' +``` + +Guix checks the hash, and it is not great that CRAN allows changing tarballs with the same version number!! Luckily building with a more recent version of Guix just worked (TM).
Now we create a root too: + +``` +/home/wrk/opt/guix-pull/bin/guix pull -p ~/guix-profiles/guix-for-r-shiny +``` + +Note I did not have to pull in guix-bioinformatics channel + +## Singlecell + +Singlecell is an R/shiny app. It starts with an error after above upgrade: + +``` +no slot of name "counts" for this object of class +``` + +and the code needs to be updated: + +=> https://github.com/satijalab/seurat/issues/8804 + +The 4 year old code lives at + +=> https://github.com/genenetwork/singleCellRshiny + +and it looks like lines like these need to be updated: + +=> https://github.com/genenetwork/singleCellRshiny/blob/6b2a344dd0d02f65228ad8c350bac0ced5850d05/app.R#L167 + +Let me ask the author Siamak Yousefi. I think we'll drop it. + +## longevity + +Package definition is at + +=> https://git.genenetwork.org/guix-bioinformatics/tree/gn/packages/mouse-longevity.scm + +Container is at + +=> https://git.genenetwork.org/gn-machines/tree/gn/services/mouse-longevity.scm + +gaeta:~/iwrk/deploy/gn-machines$ guix system container -L . -L ~/guix-bioinformatics --verbosity=3 test-r-container.scm -L ~/iwrk/deploy/guix-forge/guix +forge/nginx.scm:145:40: error: acme-service-type: unbound variable +hint: Did you forget `(use-modules (forge acme))'? + + +## jumpshiny + +Jumpshiny is hosted on balg01. Scripts are in tux02 git. + +=> git.genenetwork.org:/home/git/shared/source/jumpshiny + +``` +root@balg01:/home/j*/gn-machines# . /usr/local/guix-profiles/guix-pull/etc/profile +guix system container --network -L . 
-L ../guix-forge/guix/ -L ../guix-bioinformatics/ -L ../guix-past/modules/ --substitute-urls='https://ci.guix.gnu.org https://bordeaux.guix.gnu.org https://cuirass.genenetwork.org' test-r-container.scm -L ../guix-forge/guix/gnu/store/xyks73sf6pk78rvrwf45ik181v0zw8rx-run-container +/gnu/store/6y65x5jk3lxy4yckssnl32yayjx9nwl5-run-container +``` + +Currently: + +Jumpshiny: as aijun, cd services/jumpshiny and ./.guix-run + + +## JUMPsem_web + +Another shiny app to run on balg01. + +Jumpshiny: as aijun, cd services/jumpsem and ./.guix-run diff --git a/issues/systems/fallbacks-and-backups.gmi b/issues/systems/fallbacks-and-backups.gmi index 9b890c7..53bd8fa 100644 --- a/issues/systems/fallbacks-and-backups.gmi +++ b/issues/systems/fallbacks-and-backups.gmi @@ -1,6 +1,12 @@ # Fallbacks and backups -As a hurricane is barreling towards our machine room in Memphis we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon - both S3 and a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which earns itself (if it hadn't been down for months, but that is a different story). +A revisit to previous work on backups etc. The sheepdog hosts are no longer responding and we should really run sheepdog on a machine that is not physically with the other machines. In time sheepdog should also move away from redis and run in a system container, but that is for later. I did most of the work late 2021 when I wrote: + +> As a hurricane is barreling towards our machine room in Memphis we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon - both S3 and a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which earns itself (if it hadn't been down for months, but that is a different story). + +As we are introducing an external sheepdog server we may give it a DNS entry as sheepdog.genenetwork.org. 
+ +=> http://sheepdog.genenetwork.org/sheepdog/index.html See also @@ -16,13 +22,15 @@ See also ## Tasks -* [.] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services -* [X] /etc /home/shepherd backups for Octopus -* [X] /etc /home/shepherd backups for P2 -* [X] Get backups running again on fallback -* [ ] fix redis queue for P2 - needs to be on rabbit +* [X] fix redis queue and sheepdog server +* [X] check backups on tux01 +* [ ] drop tux02 backups off-site +* [ ] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services +* [ ] /etc /home/shepherd backups for Octopus +* [ ] /etc /home/shepherd /home/git CI-CD GN-QA backups on Tux02 +* [ ] Get backups running again on fallback * [ ] fix bacchus large backups -* [ ] backup octopus01:/lizardfs/backup-pangenome on bacchus +* [ ] mount bacchus on HPC ## Backup and restore @@ -52,22 +60,21 @@ Recently epysode was reinstated after hardware failure. I took the opportunity t As epysode was one of the main sheepdog messaging servers I need to reinstate: * [X] scripts for sheepdog -* [X] enable trim -* [X] reinstate monitoring web services -* [X] reinstate daily backup from penguin2 -* [X] CRON -* [X] make sure messaging works through redis -* [X] fix and propagate GN1 backup -* [X] fix and propagate IPFS and gitea backups -* [X] add GN1 backup -* [X] add IPFS backup -* [X] other backups +* [ ] Check tunnel on tux01 is reinstated +* [ ] enable trim +* [ ] reinstate monitoring web services +* [ ] reinstate daily backups +* [ ] CRON +* [ ] make sure messaging works through redis +* [ ] fix and propagate GN1 backup +* [ ] fix and propagate fileserver and git backups +* [ ] add GN1 backup +* [ ] other backups * [ ] email on fail Tux01 is backed up now. 
Need to make sure it propagates to -* [X] P2 -* [X] epysode -* [X] rabbit -* [X] Tux02 +* [ ] rabbit +* [ ] Tux02 +* [ ] balg01 * [ ] bacchus diff --git a/issues/systems/machine-room.gmi b/issues/systems/machine-room.gmi deleted file mode 100644 index 28d9921..0000000 --- a/issues/systems/machine-room.gmi +++ /dev/null @@ -1,19 +0,0 @@ -# Machine room - -## Tags - -* assign: pjotrp, dana -* type: system administration -* priority: high -* keywords: systems -* status: unclear - -## Tasks - -* [X] Make tux02e visible from outside -* [ ] Network switch 10Gbs - add hosts -* [ ] Add disks to tux01 and tux02 - need to reboot -* [ ] Set up E-mail relay for tux01 and tux02 smtp.uthsc.edu, port 25 - -=> tux02-production.gmi setup new production machine -=> decommission-machines.gmi Decommission machines diff --git a/issues/systems/octopus.gmi b/issues/systems/octopus.gmi index c510fd9..3a6d317 100644 --- a/issues/systems/octopus.gmi +++ b/issues/systems/octopus.gmi @@ -1,6 +1,9 @@ # Octopus sysmaintenance -Reopened tasks because of new sheepdog layout and add new machines to Octopus and get fiber optic network going with @andreag. See also +Reopened tasks because of new sheepdog layout and add new machines to Octopus and get fiber optic network going with @andreag. +IT recently upgraded the network switch, so we should have great interconnect between all nodes. We also need to work on user management and network storage. 
+ +See also => ../../topics/systemtopics/systems/hpcs/hpc/octopus-maintenance @@ -14,7 +17,7 @@ Reopened tasks because of new sheepdog layout and add new machines to Octopus an # Tasks -* [ ] add lizardfs to nodes +* [X] add lizardfs to nodes * [ ] add PBS to nodes * [ ] use fiber optic network * [ ] install sheepdog @@ -36,6 +39,17 @@ default via 172.23.16.1 dev ens1f0np0 # Current topology +vim /etc/ssh/sshd_config +systemctl reload ssh + +The routing should be as on octopus01 + +``` +default via 172.23.16.1 dev eno1 +172.23.16.0/21 dev ens1f0np0 proto kernel scope link src 172.23.18.221 +172.23.16.0/21 dev eno1 proto kernel scope link src 172.23.18.188 +``` + ``` ip a ip route @@ -44,3 +58,9 @@ ip route - Octopus01 uses eno1 172.23.18.188/21 gateway 172.23.16.1 (eno1: Link is up at 1000 Mbps) - Octopus02 uses eno1 172.23.17.63/21 gateway 172.23.16.1 (eno1: Link is up at 1000 Mbps) 172.23.x.x + +# Work + +* After the switch upgrade penguin2 NFS is not visible for octopus01. I disabled the mount in fstab +* On octopus01 disabled unattended upgrade script - we don't want kernel updates on this machine(!) +* Updated IP addresses in sshd_config diff --git a/issues/systems/octoraid-storage.gmi b/issues/systems/octoraid-storage.gmi new file mode 100644 index 0000000..97e0e55 --- /dev/null +++ b/issues/systems/octoraid-storage.gmi @@ -0,0 +1,18 @@ +# OctoRAID + +We are building machines that can handle cheap drives. + +# octoraid01 + +This is a jetson with 4 22TB seagate-ironwolf-pro-st22000nt001-22tb-enterprise-nas-hard-drives-7200-rpm. + +Unfortunately the stock kernel has no RAID support, so we simply mount the 4 drives (hosted on a USB-SATA bridge). + +Stress testing: + +``` +cd /export/nfs/lair01 +stress -v -d 1 +``` + +Running on multiple disks the jetson is holding up well!
diff --git a/issues/systems/penguin2-raid5.gmi b/issues/systems/penguin2-raid5.gmi new file mode 100644 index 0000000..f03075d --- /dev/null +++ b/issues/systems/penguin2-raid5.gmi @@ -0,0 +1,61 @@ +# Penguin2 RAID 5 + +# Tags + +* assigned: @fredm, @pjotrp +* status: in progress + +# Description + +The current RAID contains 3 disks: + +``` +root@penguin2:~# cat /proc/mdstat +md0 : active raid5 sdb1[1] sda1[0] sdg1[4] +/dev/md0 33T 27T 4.2T 87% /export +``` + +using /dev/sda,sdb,sdg + +The current root and swap is on + +``` +# root +/dev/sdd1 393G 121G 252G 33% / +# swap +/dev/sdd5 partition 976M 76.5M -2 +``` + +We can therefore add four new disks in slots /dev/sdc,sde,sdf,sdh + +penguin2 has no out-of-band and no serial connector right now. That means any work needs to be done on the terminal. + +Boot loader menu: + +``` +menuentry 'Debian GNU/Linux' --class debian --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-7ff268df-cb90-4cbc-9d76-7fd6677b4964' { + load_video + insmod gzio + if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi + insmod part_msdos + insmod ext2 + set root='hd2,msdos1' + if [ x$feature_platform_search_hint = xy ]; then + search --no-floppy --fs-uuid --set=root --hint-bios=hd2,msdos1 --hint-efi=hd2,msdos1 --hint-baremetal=ahci2,msdos1 7ff268df-cb90-4cbc-9d76-7fd6677b4964 + else + search --no-floppy --fs-uuid --set=root 7ff268df-cb90-4cbc-9d76-7fd6677b4964 + fi + echo 'Loading Linux 5.10.0-18-amd64 ...' + linux /boot/vmlinuz-5.10.0-18-amd64 root=UUID=7ff268df-cb90-4cbc-9d76-7fd6677b4964 ro quiet + echo 'Loading initial ramdisk ...' + initrd /boot/initrd.img-5.10.0-18-amd64 +} +``` + +Added to sdd MBR + +``` +root@penguin2:~# grub-install /dev/sdd +Installing for i386-pc platform. +Installation finished. No error reported. 
+``` diff --git a/issues/systems/t02-crash.gmi b/issues/systems/t02-crash.gmi new file mode 100644 index 0000000..bf0c5d5 --- /dev/null +++ b/issues/systems/t02-crash.gmi @@ -0,0 +1,47 @@ +## Postmortem tux02 crash + +I'll take a look at tux02 - it rebooted last night and I need to start some services. It rebooted at CDT Aug 07 19:29:14 tux02 kernel: Linux version ... We have two out of memory messages before that: + +``` +Aug 7 18:45:27 tux02 kernel: [13521994.665636] Out of memory: Kill process 30165 (guix) score 759 or sacrifice child +Aug 7 18:45:27 tux02 kernel: [13521994.758974] Killed process 30165 (guix) total-vm:498873224kB, anon-rss:223599272kB, file-rss:4kB, shmem-rss:0kB +``` + +My mosh clapped out before that + +``` +wrk pts/96 mosh [128868] Thu Aug 7 18:53 - down (00:00) +``` + +Someone killed the development container before that + +``` +Aug 7 18:06:32 tux02 systemd[1]: genenetwork-development-container.service: Killing process 86832 (20qjyhd7n9n62fa) with signal SIGKILL. +``` + +and + +``` +Aug 7 13:28:26 tux02 kernel: [13502972.611421] oom_reaper: reaped process 25224 (guix), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB +Aug 7 18:16:00 tux02 kernel: [13520227.160945] oom_reaper: reaped process 128091 (guix), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB +``` + +Guix builds running out of RAM... My conclusion is that someone has been doing some heavy lifting. Probably Fred. I'll ask him to use a different machine that is not shared by many people. First I need to bring up some processes. The shepherd had not started, so: + +``` +systemctl status user-shepherd.service +``` + +most services started now. I need to check in half an hour. + +BNW is the one that does not start up automatically. + +``` +su shepherd +herd status +herd stop bnw +herd status bnw +tail -f /home/shepherd/logs/bnw.log +``` + +Shows a process is blocking the port. Kill as root, after making sure herd status shows it as stopped. 
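To gauge how much memory the killed guix processes were holding, the OOM-killer lines quoted in this postmortem can be parsed with a few lines of Python. This is only a sketch: the name `killed_processes` is made up for illustration, and the regular expression assumes the kernel log format exactly as quoted above.

```python
import re

# Matches kernel lines such as:
#   Killed process 30165 (guix) total-vm:498873224kB, anon-rss:223599272kB, ...
KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<vm>\d+)kB, anon-rss:(?P<rss>\d+)kB"
)


def killed_processes(lines):
    """Yield (pid, name, anon_rss_gib) for every OOM kill found in lines."""
    for line in lines:
        m = KILLED.search(line)
        if m:
            # The kernel reports kB; 1024**2 kB make one GiB.
            yield int(m["pid"]), m["name"], int(m["rss"]) / 1024**2


# Sample line copied from the log excerpt in this issue.
log = [
    "Aug  7 18:45:27 tux02 kernel: [13521994.758974] Killed process 30165 (guix) "
    "total-vm:498873224kB, anon-rss:223599272kB, file-rss:4kB, shmem-rss:0kB",
]

for pid, name, gib in killed_processes(log):
    print(f"{name} (pid {pid}) held {gib:.1f} GiB resident")
```

For the line above this reports roughly 213 GiB resident for a single guix process, which fits the conclusion that a heavy build exhausted RAM.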
diff --git a/issues/systems/tux02-production.gmi b/issues/systems/tux02-production.gmi index 7de911f..d811c5e 100644 --- a/issues/systems/tux02-production.gmi +++ b/issues/systems/tux02-production.gmi @@ -14,9 +14,9 @@ We are going to move production to tux02 - tux01 will be the staging machine. Th * [X] update guix guix-1.3.0-9.f743f20 * [X] set up nginx (Debian) -* [X] test ipmi console (172.23.30.40) +* [X] test ipmi console * [X] test ports (nginx) -* [?] set up network for external tux02e.uthsc.edu (128.169.4.52) +* [?] set up network for external tux02 * [X] set up deployment evironment * [X] sheepdog copy database backup from tux01 on a daily basis using ibackup user * [X] same for GN2 production environment diff --git a/issues/systems/tux04-disk-issues.gmi b/issues/systems/tux04-disk-issues.gmi index cea5a59..3df0a03 100644 --- a/issues/systems/tux04-disk-issues.gmi +++ b/issues/systems/tux04-disk-issues.gmi @@ -1,4 +1,4 @@ -# Tux04 disk issues +# Tux04/Tux05 disk issues We are facing some disk issues with Tux04: @@ -6,6 +6,10 @@ We are facing some disk issues with Tux04: May 02 20:57:42 tux04 kernel: Buffer I/O error on device sdf1, logical block 859240457 ``` +and the same happened to tux05 (same batch). Basically the controllers report no issues. Just to be sure we added +a copy of the boot partition. + +=> topics/system/linux/add-boot-partition # Tags @@ -52,6 +56,8 @@ Download megacli from => https://hwraid.le-vert.net/wiki/DebianPackages ``` +apt-get update +apt-get install megacli megacli -LDInfo -L5 -a0 ``` @@ -95,3 +101,323 @@ and nothing ;). Megacli is actually the tool to use ``` megacli -AdpAllInfo -aAll ``` + +# Database + +During a backup the DB shows this error: + +``` +2025-03-02 06:28:33 Database page corruption detected at page 1079428, retrying...\n[01] 2025-03-02 06:29:33 Database page corruption detected at page 1103108, retrying... +``` + + +Interestingly the DB recovered on a second backup. 
+ +The database is hosted on a solid /dev/sde Dell Ent NVMe FI. The log says + +``` +kernel: I/O error, dev sde, sector 2136655448 op 0x0:(READ) flags 0x80700 phys_seg 40 prio class 2 +``` + +Suggests: + +=> https://stackoverflow.com/questions/50312219/blk-update-request-i-o-error-dev-sda-sector-xxxxxxxxxxx + +> The errors that you see are interface errors, they are not coming from the disk itself but rather from the connection to it. It can be the cable or any of the ports in the connection. +> Since the CRC errors on the drive do not increase I can only assume that the problem is on the receive side of the machine you use. You should check the cable and try a different SATA port on the server. + +and someone wrote + +> analyzed that most of the reasons are caused by intensive reading and writing. This is a CDN cache node. Type reading NVME temperature is relatively high, if it continues, it will start to throttle and then slowly collapse. + +and the temperature on that drive has been 70 C. + +The MariaDB log is showing errors: + +``` +2025-03-02 6:54:47 0 [ERROR] InnoDB: Failed to read page 449925 from file './db_webqtl/SnpAll.ibd': Page read from tablespace is corrupted. +2025-03-02 7:01:43 489015 [ERROR] Got error 180 when reading table './db_webqtl/ProbeSetXRef' +2025-03-02 8:10:32 489143 [ERROR] Got error 180 when reading table './db_webqtl/ProbeSetXRef' +``` + +Let's try and dump those tables when the backup is done.
+ +``` +mariadb-dump -uwebqtlout db_webqtl SnpAll +mariadb-dump: Error 1030: Got error 1877 "Unknown error 1877" from storage engine InnoDB when dumping table `SnpAll` at row: 0 +mariadb-dump -uwebqtlout db_webqtl ProbeSetXRef > ProbeSetXRef.sql +``` + +Eeep: + +``` +tux04:/etc$ mariadb-check -uwebqtlout -c db_webqtl ProbeSetXRef +db_webqtl.ProbeSetXRef +Warning : InnoDB: Index ProbeSetFreezeId is marked as corrupted +Warning : InnoDB: Index ProbeSetId is marked as corrupted +error : Corrupt +tux04:/etc$ mariadb-check -uwebqtlout -c db_webqtl SnpAll +db_webqtl.SnpAll +Warning : InnoDB: Index PRIMARY is marked as corrupted +Warning : InnoDB: Index SnpName is marked as corrupted +Warning : InnoDB: Index Rs is marked as corrupted +Warning : InnoDB: Index Position is marked as corrupted +Warning : InnoDB: Index Source is marked as corrupted +error : Corrupt +``` + +On tux01 we have a working database, we can test with + +``` +mysqldump --no-data --all-databases > table_schema.sql +mysqldump -uwebqtlout db_webqtl SnpAll > SnpAll.sql +``` + +Running the backup with rate limiting from: + +``` +Mar 02 17:09:59 tux04 sudo[548058]: pam_unix(sudo:session): session opened for user root(uid=0) by wrk(uid=1000) +Mar 02 17:09:59 tux04 sudo[548058]: wrk : TTY=pts/3 ; PWD=/export3/local/home/wrk/iwrk/deploy/gn-deploy-servers/scripts/tux04 ; USER=roo> +Mar 02 17:09:55 tux04 sudo[548058]: pam_unix(sudo:auth): authentication failure; logname=wrk uid=1000 euid=0 tty=/dev/pts/3 ruser=wrk rhost= > +Mar 02 17:04:26 tux04 su[548006]: pam_unix(su:session): session opened for user ibackup(uid=1003) by wrk(uid=0) +``` + +Oh oh + +Tux04 is showing errors on all disks. We have to bail out. I am copying the potentially corrupted files to tux01 right now. We have backups, so nothing serious I hope. 
I am only worried about the myisam files we have because they have no strong internal validation: + +``` +2025-03-04 8:32:45 502 [ERROR] db_webqtl.ProbeSetData: Record-count is not ok; is 5264578601 Should be: 5264580806 +2025-03-04 8:32:45 502 [Warning] db_webqtl.ProbeSetData: Found 28665 deleted space. Should be 0 +2025-03-04 8:32:45 502 [Warning] db_webqtl.ProbeSetData: Found 2205 deleted blocks Should be: 0 +2025-03-04 8:32:45 502 [ERROR] Got an error from thread_id=502, ./storage/myisam/ha_myisam.cc:1120 +2025-03-04 8:32:45 502 [ERROR] MariaDB thread id 502, OS thread handle 139625162532544, query id 837999 localhost webqtlout Checking table +CHECK TABLE ProbeSetData +2025-03-04 8:34:02 79695 [ERROR] mariadbd: Table './db_webqtl/ProbeSetData' is marked as crashed and should be repaired +``` + +See also + +=> https://dev.mysql.com/doc/refman/8.4/en/myisam-check.html + +Tux04 will require open heart 'disk controller' surgery and some severe testing before we move back. We'll also look at tux05-8 to see if they have similar problems. + +## Recovery + +According to the logs tux04 started showing serious errors on March 2nd - when I introduced sanitizing the mariadb backup: + +``` +Mar 02 05:00:42 tux04 kernel: I/O error, dev sde, sector 2071078320 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 2 +Mar 02 05:00:58 tux04 kernel: I/O error, dev sde, sector 2083650928 op 0x0:(READ) flags 0x80700 phys_seg 59 prio class 2 +... +``` + +The log started on Feb 23 when we had our last reboot. It probably is a good idea to turn on persistent logging! Anyway, it is likely files were fine until March 2nd. Similarly the mariadb logs also show + +``` +2025-03-02 6:53:52 489007 [ERROR] mariadbd: Index for table './db_webqtl/ProbeSetData.MYI' is corrupt; try to repair it +2025-03-02 6:53:52 489007 [ERROR] db_webqtl.ProbeSetData: Can't read key from filepos: 2269659136 +``` + +So, if we can restore a backup from March 1st we should be reasonably confident it is sane. 
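A quick way to confirm that cut-off is to pull the first day with an [ERROR] line out of the mariadb log. This is a minimal sketch, assuming log lines start with an ISO date as in the excerpts above; the `first_error_day` helper name is made up for illustration and the sample lines are copied from this issue.

```shell
# Print the earliest date (first field) of any line tagged [ERROR].
# ISO dates sort correctly as plain text, so sort + head is enough.
first_error_day() {
    awk '/\[ERROR\]/ { print $1 }' | sort | head -n 1
}

# Three log lines from this issue, deliberately out of order.
first_day=$(printf '%s\n' \
    "2025-03-04 8:32:45 502 [ERROR] db_webqtl.ProbeSetData: Record-count is not ok; is 5264578601 Should be: 5264580806" \
    "2025-03-02 6:53:52 489007 [ERROR] mariadbd: Index for table './db_webqtl/ProbeSetData.MYI' is corrupt; try to repair it" \
    "2025-03-04 8:32:45 502 [Warning] db_webqtl.ProbeSetData: Found 2205 deleted blocks Should be: 0" \
    | first_error_day)
echo "$first_day"
```

On the real machine the function would read the mariadb error log on stdin (its location depends on the local configuration). The backup to restore is then the last one taken before that day.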
+ +First is to backup the existing database(!) Next restore the new DB by changing the DB location (symlink in /var/lib/mysql as well as check /etc/mysql/mariadb.cnf). + +When upgrading it is an idea to switch on these in mariadb.cnf + +``` +# forcing recovery with these two lines: +innodb_force_recovery=3 +innodb_purge_threads=0 +``` + +Make sure to disable (and restart) once it is up and running! + +So the steps are: + +* [X] install updated guix version of mariadb in /usr/local/guix-profiles (don't use Debian!!) +* [X] repair borg backup +* [X] Stop old mariadb (on new host tux02) +* [X] backup old mariadb database +* [X] restore 'sane' version of DB from borg March 1st +* [X] point to new DB in /var/lib/mysql and cnf file +* [X] update systemd settings +* [X] start mariadb new version with recovery setting in cnf +* [X] check logs +* [X] once running revert on recovery setting in cnf and restart + +OK, looks like we are in business again. In the next phase we need to validate files. 
Normal files can be checked with + +``` +find -type f \( -not -name "md5sum.txt" \) -exec md5sum '{}' \; > md5sum.txt +``` + +and compared with another set on a different server with + +``` +md5sum -c md5sum.txt +``` + +* [X] check genotype file directory - some MAGIC files missing on tux01 + +gn-docs is a git repo, so that is easily checked + +* [X] check gn-docs and sync with master repo + + +## Other servers + +``` +journalctl -r|grep -i "I/O error"|less +# tux05 +Nov 18 02:19:55 tux05 kernel: XFS (sdc2): metadata I/O error in "xfs_da_read_buf+0xd9/0x130 [xfs]" at daddr 0x78 len 8 error 74 +Nov 05 14:36:32 tux05 kernel: blk_update_request: I/O error, dev sdb, sector 1993616 op 0x1:(WRITE) flags +0x0 phys_seg 35 prio class 0 +Jul 27 11:56:22 tux05 kernel: blk_update_request: I/O error, dev sdc, sector 55676616 op 0x0:(READ) flags +0x80700 phys_seg 26 prio class 0 +Jul 27 11:56:22 tux05 kernel: blk_update_request: I/O error, dev sdc, sector 55676616 op 0x0:(READ) flags +0x80700 phys_seg 26 prio class 0 +# tux06 +Apr 15 08:10:57 tux06 kernel: I/O error, dev sda, sector 21740352 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 2 +Dec 13 12:56:14 tux06 kernel: I/O error, dev sdb, sector 3910157327 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2 +# tux07 +Mar 27 08:00:11 tux07 mfschunkserver[1927469]: replication error: failed to create chunk (No space left) +# tux08 +Mar 27 08:12:11 tux08 mfschunkserver[464794]: replication error: failed to create chunk (No space left) +``` + +Tux04, 05 and 06 show disk errors. Tux07 and Tux08 are overloaded with a full disk, but no other errors. We need to babysit Lizard more! 
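To compare hosts it can help to tally the kernel I/O errors per device instead of reading the raw journal. This is a minimal sketch, assuming kernel lines in the format shown above; `count_io_errors` is a made-up helper name and the sample lines are the tux05 entries, shortened after the operation field.

```shell
# Reduce kernel "I/O error, dev X" lines to "device count" pairs.
# Filesystem-level messages (e.g. the XFS metadata error) do not match and are skipped.
count_io_errors() {
    sed -n 's|.*I/O error, dev \([a-z0-9]*\),.*|\1|p' | sort | uniq -c | awk '{ print $2, $1 }'
}

io_error_summary=$(printf '%s\n' \
    'Nov 18 02:19:55 tux05 kernel: XFS (sdc2): metadata I/O error in "xfs_da_read_buf+0xd9/0x130 [xfs]" at daddr 0x78 len 8 error 74' \
    'Nov 05 14:36:32 tux05 kernel: blk_update_request: I/O error, dev sdb, sector 1993616 op 0x1:(WRITE)' \
    'Jul 27 11:56:22 tux05 kernel: blk_update_request: I/O error, dev sdc, sector 55676616 op 0x0:(READ)' \
    'Jul 27 11:56:22 tux05 kernel: blk_update_request: I/O error, dev sdc, sector 55676616 op 0x0:(READ)' \
    | count_io_errors)
printf '%s\n' "$io_error_summary"
```

On a live host the input would be something like `journalctl -k | count_io_errors`, run once per machine.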
+ +``` +stress -v -d 1 +``` + +Write test: + +``` +dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct +``` + +Read test: + +``` +/sbin/sysctl -w vm.drop_caches=3 +dd if=./test of=/dev/zero bs=512k count=2048 +``` + + +smartctl -a /dev/sdd -d megaraid,0 + +RAID Controller in SL 3: Dell PERC H755N Front + +# The story continues + +I don't know what happened but the server gave a hard +error in the logs: + +``` +racadm getsel # get system log +Record: 340 +Date/Time: 05/31/2025 09:25:17 +Source: system +Severity: Critical +Description: A high-severity issue has occurred at the Power-On +Self-Test (POST) phase which has resulted in the system BIOS to +abruptly stop functioning. +``` + +Woops! I fixed it by resetting idrac and rebooting remotely. Nasty. + +Looking around I found this link + +=> +https://tomaskalabis.com/wordpress/a-high-severity-issue-has-occurred-at-the-power-on-self-te +st-post-phase-which-has-resulted-in-the-system-bios-to-abruptly-stop-functioning/ + +suggesting we should upgrade idrac firmware. I am not going to do that +without backups and a fully up-to-date fallback online. It may fix the +other hardware issues we have been seeing (who knows?). + +Fred, the boot sequence is not perfect yet. Turned out the network +interfaces do not come up in the right order and nginx failed because +of a missing /var/run/nginx. The container would not restart because - +missing above - it could not check the certificates. + +## A week later + +``` +[SMM] APIC 0x00 S00:C00:T00 > ASSERT [AmdPlatformRasRsSmm] u:\EDK2\MdePkg\Library\BasePciSegmentLibPci\PciSegmentLib.c(766): ((Address) & (0xfffffffff0000000ULL | (3))) == 0 !!!! X64 Exception Type - 03(#BP - Breakpoint) CPU Apic ID - 00000000 !!!! 
+RIP - 0000000076DA4343, CS - 0000000000000038, RFLAGS - 0000000000000002 +RAX - 0000000000000010, RCX - 00000000770D5B58, RDX - 00000000000002F8 +RBX - 0000000000000000, RSP - 0000000077773278, RBP - 0000000000000000 +RSI - 0000000000000087, RDI - 00000000777733E0 R8 - 00000000777731F8, R9 - 0000000000000000, R10 - 0000000000000000 +R11 - 00000000000000A0, R12 - 0000000000000000, R13 - 0000000000000000 +R14 - FFFFFFFFA0C1A118, R15 - 000000000005B000 +DS - 0000000000000020, ES - 0000000000000020, FS - 0000000000000020 +GS - 0000000000000020, SS - 0000000000000020 +CR0 - 0000000080010033, CR2 - 0000000015502000, CR3 - 0000000077749000 +CR4 - 0000000000001668, CR8 - 0000000000000001 +DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400 +GDTR - 000000007773C000 000000000000004F, LDTR - 0000000000000000 IDTR - 0000000077761000 00000000000001FF, TR - 0000000000000040 +FXSAVE_STATE - 0000000077772ED0 +!!!! Find image based on IP(0x76DA4343) u:\Build_Genoa\DellBrazosPkg\DEBUG_MYTOOLS\X64\DellPkgs\DellChipsetPkgs\AmdGenoaModulePkg\Override\AmdCpmPkg\Features\PlatformRas\Rs\Smm\AmdPlatformRasRsSmm\DEBUG\AmdPlatformRasRsSmm.pdb (ImageBase=0000000076D3E000, EntryPoint=0000000076D3E6C0) !!!! +``` + +New error in system log: + +``` +Record: 341 Date/Time: 06/04/2025 19:47:08 +Source: system +Severity: Critical Description: A high-severity issue has occurred at the Power-On Self-Test (POST) phase which has resulted in the system BIOS to abruptly stop functioning. +``` + +The error appears to relate to AMD Brazos which is probably part of the on board APU/GPU. + +The code where it segfaulted is online at: + +=> https://github.com/tianocore/edk2/blame/master/MdePkg/Library/BasePciSegmentLibPci/PciSegmentLib.c + +and has to do with PCI registers and that can actually be caused by the new PCIe card we hosted. 
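For reference, the ASSERT expression in the dump only checks the shape of the address: nothing above bit 27 may be set and the low two bits must be zero (4-byte alignment). This is a minimal sketch of that same test in shell arithmetic. The `pci_address_ok` helper and the sample addresses are made up for illustration and are not the values the firmware saw.

```shell
# Bits 2..27 (0x0ffffffc) are the only ones the ASSERT allows to be set,
# so masking them away must leave zero. This mirrors
# ((Address) & (0xfffffffff0000000 | 3)) == 0 from the log.
pci_address_ok() {
    [ $(( $1 & ~0x0ffffffc )) -eq 0 ]
}

# Hypothetical addresses: aligned, unaligned, and one with bit 28 set.
for address in 0x00100040 0x00100041 0x10000000; do
    if pci_address_ok "$address"; then
        echo "$address passes"
    else
        echo "$address would trip the assert"
    fi
done
```

So a trip means some code asked for a PCI config access with an out-of-range or misaligned address. That fits the suspicion about the new card but does not prove it.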
+ +# Sept 2025 + +We moved production away from tux04, so now we should be able to work on this machine. + + +## System crash on tux04 + +And tux04 is down *again*. Wow, glad we moved off! I want to fix that machine and we had to move production off! I left the terminal open and the last message is: + +``` +tux04:~$ [SMM] APIC 0x00 S00:C00:T00 > ASSERT [AmdPlatformRasRsSmm] u:\EDK2\MdePkg\Library\BasePciSegmentLibPci\PciSegmentLib.c(766): ((Address) & (0xfffffffff0000000ULL | (3))) == 0 +!!!! X64 Exception Type - 03(#BP - Breakpoint) CPU Apic ID - 00000000 !!!! +RIP - 0000000076DA4343, CS - 0000000000000038, RFLAGS - 0000000000000002 +RAX - 0000000000000010, RCX - 00000000770D5B58, RDX - 00000000000002F8 +RBX - 0000000000000000, RSP - 0000000077773278, RBP - 0000000000000000 +RSI - 0000000000000000, RDI - 00000000777733E0 +R8 - 00000000777731F8, R9 - 0000000000000000, R10 - 0000000000000000 +R11 - 00000000000000A0, R12 - 0000000000000000, R13 - 0000000000000000 +R14 - FFFFFFFFAC41A118, R15 - 000000000005B000 +DS - 0000000000000020, ES - 0000000000000020, FS - 0000000000000020 +GS - 0000000000000020, SS - 0000000000000020 +CR0 - 0000000080010033, CR2 - 00007F67F5268030, CR3 - 0000000077749000 +CR4 - 0000000000001668, CR8 - 0000000000000001 +DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 +DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400 +GDTR - 000000007773C000 000000000000004F, LDTR - 0000000000000000 +IDTR - 0000000077761000 00000000000001FF, TR - 0000000000000040 +FXSAVE_STATE - 0000000077772ED0 +!!!! Find image based on IP(0x76DA4343) u:\Build_Genoa\DellBrazosPkg\DEBUG_MYTOOLS\X64\DellPkgs\DellChipsetPkgs\AmdGenoaModulePkg\Override\AmdCpmPkg\Features\PlatformRas\Rs\Smm\AmdPlatformRasRsSmm\DEBUG\AmdPlatformRasRsSmm.pdb (ImageBase=0000000076D3E000, EntryPoint=0000000076D3E6C0) !!!! 
+``` + +and the racadm system log says + +``` +Record: 362 +Date/Time: 09/11/2025 21:47:02 +Source: system +Severity: Critical +Description: A high-severity issue has occurred at the Power-On Self-Test (POST) phase which has resulted in the system BIOS to abruptly stop functioning. +``` + +I have seen that before and it is definitely a hardware/driver issue on the Dell itself. I'll work on that later. Luckily it always reboots. diff --git a/issues/systems/tux04-production.gmi b/issues/systems/tux04-production.gmi new file mode 100644 index 0000000..58ff8c1 --- /dev/null +++ b/issues/systems/tux04-production.gmi @@ -0,0 +1,279 @@ +# Production on tux04 + +Lately we have been running production on tux04. Unfortunately Debian got broken and I don't see a way to fix it (something with python versions that break apt!). Also mariadb is giving problems: + +=> issues/production-container-mechanical-rob-failure.gmi + +and that is alarming. We might as well try an upgrade. I created a new partition on /dev/sda4 using debootstrap. + +The hardware RAID has proven unreliable on this machine (and perhaps others). + +We added a drive on a PCIe riser outside the RAID. Use this for bulk data copying. We still bootstrap from the RAID. + +Luckily not too much is running on this machine and if we mount things again, most should work. + +# Tasks + +* [X] cleanly shut down mariadb +* [X] reboot into new partition /dev/sda4 +* [X] git in /etc +* [X] make sure serial boot works (/etc/default/grub) +* [X] fix groups and users +* [X] get guix going +* [X] get mariadb going +* [X] fire up GN2 service +* [X] fire up SPARQL service +* [X] sheepdog +* [ ] fix CRON jobs and backups +* [ ] test full reboots + + +# Boot in new partition + +``` +blkid /dev/sda4 +/dev/sda4: UUID="4aca24fe-3ece-485c-b04b-e2451e226bf7" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="2e3d569f-6024-46ea-8ef6-15b26725f811" +``` + +After debootstrap there are two things to take care of: the /dev directory and grub. 
For good measure +I also capture some state + +``` +cd ~ +ps xau > cron.log +systemctl > systemctl.txt +cp /etc/network/interfaces . +cp /boot/grub/grub.cfg . +``` + +we should still have access to the old root partition, so I don't need to capture everything. + +## /dev + +I ran MAKEDEV and that may not be needed with udev. + +## grub + +We need to tell grub to boot into the new partition. The old root is on +UUID=8e874576-a167-4fa1-948f-2031e8c3809f /dev/sda2. + +Next I ran + +``` +tux04:~$ update-grub2 /dev/sda +Generating grub configuration file ... +Found linux image: /boot/vmlinuz-5.10.0-32-amd64 +Found initrd image: /boot/initrd.img-5.10.0-32-amd64 +Found linux image: /boot/vmlinuz-5.10.0-22-amd64 +Found initrd image: /boot/initrd.img-5.10.0-22-amd64 +Warning: os-prober will be executed to detect other bootable partitions. +Its output will be used to detect bootable binaries on them and create new boot entries. +Found Debian GNU/Linux 12 (bookworm) on /dev/sda4 +Found Windows Boot Manager on /dev/sdd1@/efi/Microsoft/Boot/bootmgfw.efi +Found Debian GNU/Linux 11 (bullseye) on /dev/sdf2 +``` + +Very good. Do a diff on grub.cfg and you see it even picked up the serial configuration. It only shows it added menu entries for the new boot. Very nice. + +At this point I feel safe to boot as we should be able to get back into the old partition. 
+ +# /etc/fstab + +The old fstab looked like + +``` +UUID=8e874576-a167-4fa1-948f-2031e8c3809f / ext4 errors=remount-ro 0 1 +# /boot/efi was on /dev/sdc1 during installation +UUID=998E-68AF /boot/efi vfat umask=0077 0 1 +# swap was on /dev/sdc3 during installation +UUID=cbfcd84e-73f8-4cec-98ee-40cad404735f none swap sw 0 0 +UUID="783e3bd6-5610-47be-be82-ac92fdd8c8b8" /export2 ext4 auto 0 2 +UUID="9e6a9d88-66e7-4a2e-a12c-f80705c16f4f" /export ext4 auto 0 2 +UUID="f006dd4a-2365-454d-a3a2-9a42518d6286" /export3 auto auto 0 2 +/export2/gnu /gnu none defaults,bind 0 0 +# /dev/sdd1: PARTLABEL="bulk" PARTUUID="b1a820fe-cb1f-425e-b984-914ee648097e" +# /dev/sdb4 /export ext4 auto 0 2 +# /dev/sdd1 /export2 ext4 auto 0 2 +``` + +# reboot + +Next we are going to reboot, and we need a serial connector to the Dell out-of-band using racadm: + +``` +ssh IP +console com2 +racadm getsel +racadm serveraction powercycle +racadm serveraction powerstatus + +``` + +The main trick is to hit ESC, wait 2 sec and hit 2 when you want the bios boot menu. Ctrl-\ to escape console. Otherwise ESC (wait) ! to get to the boot menu. + +# First boot + +It still boots by default into the old root. That gave an error: + +[FAILED] Failed to start File Syste…a-2365-454d-a3a2-9a42518d6286 + +This is /export3. We can fix that later. + +When I booted into the proper partition the console clapped out. Also the racadm password did not work on tmux -- I had to switch to a standard console to log in again. Not sure why that is, but next I got: + +``` +Give root password for maintenance +(or press Control-D to continue): +``` + +and giving the root password I was in maintenance mode on the correct partition! + +To rerun grub I had to add `GRUB_DISABLE_OS_PROBER=false`. + +Once booted up it is a matter of mounting partitions and ticking the check boxes above. 
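Since the failed unit was a UUID mount (/export3), it helps to list the UUIDs that fstab refers to before the next reboot. This is a minimal sketch, assuming the fstab layout shown above; `fstab_uuids` is a made-up helper name and the sample lines are taken from the old fstab.

```shell
# Print the UUIDs of active (non-comment) fstab entries, one per line.
# awk strips leading whitespace, so commented-out lines have "#" as first field.
fstab_uuids() {
    awk '$1 ~ /^UUID=/ { gsub(/^UUID=|"/, "", $1); print $1 }'
}

wanted_uuids=$(printf '%s\n' \
    'UUID=998E-68AF /boot/efi vfat umask=0077 0 1' \
    '# /dev/sdb4 /export ext4 auto 0 2' \
    'UUID="f006dd4a-2365-454d-a3a2-9a42518d6286" /export3 auto auto 0 2' \
    '/export2/gnu /gnu none defaults,bind 0 0' \
    | fstab_uuids)
printf '%s\n' "$wanted_uuids"
```

On the machine itself `fstab_uuids < /etc/fstab` gives the list. Any UUID that does not show up in `blkid -s UUID -o value` should be a mount that fails at boot.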
+ +The following contained errors: + +``` +/dev/sdd1 3.6T 1.8T 1.7T 52% /export2 +``` + +# Guix + +Getting guix going is a bit tricky because we want to keep the store! + +``` +cp -vau /mnt/old-root/var/guix/ /var/ +cp -vau /mnt/old-root/usr/local/guix-profiles /usr/local/ +cp -vau /mnt/old-root/usr/local/bin/* /usr/local/bin/ +cp -vau /mnt/old-root/etc/systemd/system/guix-daemon.service* /etc/systemd/system/ +cp -vau /mnt/old-root/etc/systemd/system/gnu-store.mount* /etc/systemd/system/ +``` + +Also had to add guixbuild users and group by hand. + +# nginx + +We use the streaming facility. Check that + +``` +nginx -V +``` + +lists --with-stream=static, see + +=> https://serverfault.com/questions/858067/unknown-directive-stream-in-etc-nginx-nginx-conf86/858074#858074 + +and load at the start of nginx.conf: + +``` +load_module /usr/lib/nginx/modules/ngx_stream_module.so; +``` + +and + +``` +nginx -t +``` + +passes + +Now the container responds to the browser with `Internal Server Error`. + +# container web server + +Visit the container with something like + +``` +nsenter -at 2838 /run/current-system/profile/bin/bash --login +``` + +The nginx log in the container has many + +``` +2025/02/22 17:23:48 [error] 136#0: *166916 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: genenetwork.org, request: "GET /gn3/gene/aliases/st%2029:1;o;s HTTP/1.1", upstream: "http://127.0.0.1:9800/gene/aliases/st%2029:1;o;s", host: "genenetwork.org" +``` + +that is interesting. Acme/https is working because GN2 is working: + +``` +curl https://genenetwork.org/api3/version +"1.0" +``` + +Looking at the logs it appears it is a redis problem first for GN2. + +Fred builds the container with `/home/fredm/opt/guix-production/bin/guix`. 
Machines are defined in + +``` +fredm@tux04:/export3/local/home/fredm/gn-machines +``` + +The shared dir for redis is at + +--share=/export2/guix-containers/genenetwork/var/lib/redis=/var/lib/redis + +with + +``` +root@genenetwork-production /var# ls lib/redis/ -l +-rw-r--r-- 1 redis redis 629328484 Feb 22 17:25 dump.rdb +``` + +In production.scm it is defined as + +``` +(service redis-service-type + (redis-configuration + (bind "127.0.0.1") + (port 6379) + (working-directory "/var/lib/redis"))) +``` + +The defaults are the same as the definition of redis-service-type (in guix). Not sure why we are duplicating. + +After starting redis by hand I get another error `500 DatabaseError: The following exception was raised while attempting to access http://auth.genenetwork.org/auth/data/authorisation: database disk image is malformed`. The problem is it created +a DB in the wrong place. Alright, the logs in the container say: + +``` +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:C 23 Feb 2025 14:04:31.040 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:C 23 Feb 2025 14:04:31.040 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=3977, just started +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:C 23 Feb 2025 14:04:31.040 # Configuration loaded +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.041 * Increased maximum number of open files to 10032 (it was originally set to 1024). +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.041 * monotonic clock: POSIX clock_gettime +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.041 * Running mode=standalone, port=6379. 
+Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.042 # Server initialized +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.042 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.042 # Wrong signature trying to load DB from file +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.042 # Fatal error loading the DB: Invalid argument. Exiting. +Feb 23 14:04:31 genenetwork-production shepherd[1]: Service redis (PID 3977) exited with 1. +``` + +This is caused by a newer version of redis. This is odd because we are using the same version from the container?! + +Actually it turned out the redis DB was corrupted on the SSD! Same for some other databases (ugh). + +Fred copied all data to an enterprise level storage, and we rolled back to some older DBs, so hopefully we'll be OK for now. + +# Reinstating backups + +In the next step we need to restore backups as described in + +=> /topics/systems/backups-with-borg + +I already created an ibackup user. Next we test the backup script for mariadb. 
+ +One important step is to check the database: + +``` +/usr/bin/mariadb-check -c -u user -p* db_webqtl +``` + +A successful mariadb backup consists of multiple steps + +``` +2025-02-27 11:48:28 +0000 (ibackup@tux04) SUCCESS 0 <32m43s> mariabackup-dump +2025-02-27 11:48:29 +0000 (ibackup@tux04) SUCCESS 0 <00m00s> mariabackup-make-consistent +2025-02-27 12:16:37 +0000 (ibackup@tux04) SUCCESS 0 <28m08s> borg-tux04-sql-backup +2025-02-27 12:16:46 +0000 (ibackup@tux04) SUCCESS 0 <00m07s> drop-rsync-balg01 +``` diff --git a/issues/xapian_bug.gmi b/issues/xapian_bug.gmi index f11b604..068d8eb 100644 --- a/issues/xapian_bug.gmi +++ b/issues/xapian_bug.gmi @@ -5,6 +5,7 @@ * assigned: zsloan * priority: high * type: search +* status: closed * keywords: xapian, gn2, gn3 ## Description |
