author Alexander_Kabui 2022-09-05 20:08:20 +0300
committer Alexander_Kabui 2022-09-05 20:08:20 +0300
commit 93cfcd34d73be3c4ab6811b11d9703e7ac091d1b
tree eb25dfda7d42acc03b039eefdbaa4510591354e2 /issues
parent 47b0a949735cbf776b276db08a7e497cdefbfb72
parent f52cfbb325ad28cd743ea94b83859977f0063230
Merge branch 'main' of https://github.com/genenetwork/gn-gemtext-threads into main
Diffstat (limited to 'issues')
-rw-r--r-- issues/gemma/report-missing-genotype-file.gmi 3
-rw-r--r-- issues/genenetwork/genewiki.gmi 37
-rw-r--r-- issues/genenetwork/global-search.gmi 13
-rw-r--r-- issues/genenetwork/issue-404-in-logs.gmi 11
-rw-r--r-- issues/resurrect-mechanical-rob.gmi 8
-rw-r--r-- issues/sql-too-many-connections.gmi 4
-rw-r--r-- issues/sqlalchemy.gmi 59
-rw-r--r-- issues/systems/gn2-time-machines.gmi 187
-rw-r--r-- issues/systems/second-production-tux02.gmi 9
-rw-r--r-- issues/systems/tux01-ram-problem.gmi 109
-rw-r--r-- issues/tests-for-genodb.gmi 6
11 files changed, 424 insertions, 22 deletions
diff --git a/issues/gemma/report-missing-genotype-file.gmi b/issues/gemma/report-missing-genotype-file.gmi
new file mode 100644
index 0000000..a801d70
--- /dev/null
+++ b/issues/gemma/report-missing-genotype-file.gmi
@@ -0,0 +1,3 @@
+# GEMMA should report name of missing genotype file
+
+When genenetwork is unable to find a genotype file that GEMMA needs, it should report the name of the missing file in the error message. The correct way to do this is to raise a FileNotFoundError lower down, close to the GEMMA call, and handle it higher up, close to the web UI.
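A minimal sketch of that pattern in Python. The names `find_genotype_file`, `run_gemma` and `gemma_view` are hypothetical and not taken from the GN2 code base; the point is only that the low-level code raises `FileNotFoundError` with the path attached, and the UI-level code turns it into a message naming the file:

```python
import errno
from pathlib import Path


def find_genotype_file(genotype_dir: str, name: str) -> Path:
    """Low level, next to the GEMMA call (hypothetical helper)."""
    path = Path(genotype_dir) / name
    if not path.is_file():
        # The three-argument form stores the path on the exception as
        # `err.filename`, so callers do not have to parse the message.
        raise FileNotFoundError(errno.ENOENT, "Genotype file not found", str(path))
    return path


def run_gemma(genotype_dir: str, name: str) -> str:
    """Hypothetical stand-in for the code that actually invokes GEMMA."""
    geno = find_genotype_file(genotype_dir, name)
    return f"running GEMMA with genotypes from {geno}"


def gemma_view(genotype_dir: str, name: str) -> str:
    """High level, near the web UI (hypothetical): report the missing file."""
    try:
        return run_gemma(genotype_dir, name)
    except FileNotFoundError as err:
        return f"GEMMA could not run: genotype file {err.filename} is missing"
```

Because the path travels on the exception, the layers in between do not need to know anything about the web UI.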
diff --git a/issues/genenetwork/genewiki.gmi b/issues/genenetwork/genewiki.gmi
index e0a0a00..ce54afb 100644
--- a/issues/genenetwork/genewiki.gmi
+++ b/issues/genenetwork/genewiki.gmi
@@ -25,3 +25,40 @@ with an edit button, similar to
* keywords: GN1, documentation
## Tasks
+
+* [ ] Export Genewiki to markdown - one file per gene and store in git@github.com:genenetwork/gn-docs.git
+* [ ] Format output for GN using markdown parser (similar to other docs)
+* [ ] Provide edit link to github
+
+Later we'll add automated links to wikidata and Uniprot etc.
+
+## Notes
+
+Zach writes: How exactly do we want to store all of this? It appears to currently be
+stored across three SQL tables - GeneRIF, GeneRIFXRef, and GeneCategory.
+The first contains a row for each item a user adds (when displaying all
+items it queries by gene symbol), and the latter two are for storing the
+checkbox stuff (so there will presumably be a row in GeneRIFXRef for every
+checked box for each symbol, though this isn't totally clear to me because
+it's linked by GeneRIF.Id - which isn't unique - rather than GeneRIF.symbol
+which is what I would have assumed).
+
+IIRC the issue I ran into (that isn't immediately apparent from looking at
+the web page) is that it's currently stored as a list of items. There isn't
+a single "free text" area - when a user edits they are either adding a new
+text item with its own row in the DB or editing one of the existing items,
+so I'm not sure how best to reasonably convert the current contents and
+editing method to markdown. Currently it doesn't even support any sort of
+user styling/formatting - users just enter basic text into a form. And if
+they were converted to markdown, how would we be storing the checkbox
+content?
+
+It's probably possible to write a script that goes through those tables and
+generates a bunch of markdown files from them (one for each gene symbol, I
+think?), with the list of items just being converted into a single markdown
+file with those items formatted into a list. This would de-link GN1's
+GeneWiki from GN2's in the future, though (since the way things are stored
+would be fundamentally changed).
+
+Pj: That is what we want. Create a markdown file for each gene symbol.
+Checklist can be part of that using markdown syntax.
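A minimal sketch of such a conversion script in Python. It assumes the GeneRIF, GeneRIFXRef and GeneCategory tables have already been joined into hypothetical `(symbol, comment, checked_categories)` rows; the real join, the output headings and the file layout are illustrative only. Items for one gene symbol become a markdown list, and checked boxes become a markdown task list:

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical flattened row: (gene symbol, item text, checked category names).
Row = tuple[str, str, list[str]]


def rows_to_markdown(rows: list[Row]) -> dict[str, str]:
    """Group GeneRIF items by gene symbol and render one markdown doc each."""
    items = defaultdict(list)
    categories = defaultdict(set)
    for symbol, comment, checked in rows:
        items[symbol].append(comment.strip())
        categories[symbol].update(checked)

    docs = {}
    for symbol in sorted(items):
        lines = [f"# {symbol}", "", "## GeneWiki", ""]
        lines += [f"* {item}" for item in items[symbol]]
        if categories[symbol]:
            # The old checkboxes become a markdown task list.
            lines += ["", "## Categories", ""]
            lines += [f"- [x] {c}" for c in sorted(categories[symbol])]
        docs[symbol] = "\n".join(lines) + "\n"
    return docs


def write_docs(docs: dict[str, str], outdir: str) -> None:
    """Write one <symbol>.md file per gene, e.g. into a gn-docs checkout."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for symbol, text in docs.items():
        (out / f"{symbol}.md").write_text(text)
```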
diff --git a/issues/genenetwork/global-search.gmi b/issues/genenetwork/global-search.gmi
index a39da80..156145d 100644
--- a/issues/genenetwork/global-search.gmi
+++ b/issues/genenetwork/global-search.gmi
@@ -1,16 +1,19 @@
# Global search problems
-Global search is the top bar of GN2
+Global search is the top bar of GN2.
+
+Note that we are replacing search with Xapian, so this is less important.
## Tags
* assigned: pjotrp, zsloan
-* status: unclear
-* priority: critical
+* status: later
+* priority: low
* type: bug
* keywords: global search, BRCA2
## Tasks
-* [ ] BRCA2 does not render results in table
-* [ ] 'Brca2' with quotes gives a SQL error
+* [X] BRCA2 does not render results in table
+* [ ] 'Brca2' with quotes gives a SQL error, see
+=> http://genenetwork.org/gsearch?type=gene&terms=%27Brca2%27
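A minimal sketch of how a quoted term can produce a SQL error, assuming the search term is interpolated directly into the SQL text (not verified against the GN2 search code). Python's built-in sqlite3 stands in for MariaDB, and the `genes` table is hypothetical:

```python
import sqlite3

# sqlite3 stands in for MariaDB; the `genes` table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (symbol TEXT)")
conn.execute("INSERT INTO genes VALUES ('Brca2')")


def search_formatted(term: str) -> list:
    # The raw term is pasted into the SQL text, so the quotes in
    # 'Brca2' terminate the string literal and produce a syntax error.
    sql = f"SELECT symbol FROM genes WHERE symbol = '{term}'"
    return conn.execute(sql).fetchall()


def search_parameterized(term: str) -> list:
    # The driver binds the value separately from the SQL text, so quotes
    # in the term are matched literally and cannot break the statement.
    return conn.execute("SELECT symbol FROM genes WHERE symbol = ?", (term,)).fetchall()
```

With binding, `'Brca2'` simply finds no rows instead of failing; whether quoted input should mean an exact-match search is a separate design decision.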
diff --git a/issues/genenetwork/issue-404-in-logs.gmi b/issues/genenetwork/issue-404-in-logs.gmi
index 0006896..8e69838 100644
--- a/issues/genenetwork/issue-404-in-logs.gmi
+++ b/issues/genenetwork/issue-404-in-logs.gmi
@@ -1,6 +1,4 @@
-# 404 error in logs
-
-We get many 404's in GN logs. Can we rewire that so no log entries appear as a full stack dump?
+# Better Logging
## Tags
@@ -14,12 +12,9 @@ We get many 404's in GN logs. Can we rewire that so no log entries appear as a f
=> https://flask.palletsprojects.com/en/2.0.x/errorhandling/
-Some of those 404's in our log
-mean that we forgot to package something; for
-example:
+Some of those 404's in our log mean that we forgot to package something; for example:
-=>
-https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics/commit/e80fe4ddcf15e21004b8135cf8af34b458697f64
+=> https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics/commit/e80fe4ddcf15e21004b8135cf8af34b458697f64
Removing the 404's would prevent us from catching important errors if ever they occur. I suggest we fix the 404's; some of them have a cascading effect, like the font-awesome missing "webfonts" folder I just fixed that leads to a lot of unnecessary 404s.
diff --git a/issues/resurrect-mechanical-rob.gmi b/issues/resurrect-mechanical-rob.gmi
index bea2a78..d456864 100644
--- a/issues/resurrect-mechanical-rob.gmi
+++ b/issues/resurrect-mechanical-rob.gmi
@@ -9,3 +9,11 @@ We need to run Mechanical Rob tests as part of our continuous integration tests.
* status: in progress
* type: enhancement
* priority: medium
+
+## Resolution
+
+The Mechanical Rob CI tests are functioning again now. To see how to run Mechanical Rob, see the CI job definition in the genenetwork-machines repo.
+=> https://git.genenetwork.org/arunisaac/genenetwork-machines/src/branch/main/genenetwork-development.scm
+The invocation procedure is bound to change as the many environment variables in genenetwork2 are cleared up.
+
+* closed
diff --git a/issues/sql-too-many-connections.gmi b/issues/sql-too-many-connections.gmi
index 68ed23d..93b8587 100644
--- a/issues/sql-too-many-connections.gmi
+++ b/issues/sql-too-many-connections.gmi
@@ -8,8 +8,8 @@
## Tasks
-* [ ] Figure out root cause
-* [ ] Send patch
+* [x] Figure out root cause
+* [x] Send patch
## Description
diff --git a/issues/sqlalchemy.gmi b/issues/sqlalchemy.gmi
new file mode 100644
index 0000000..e3ea894
--- /dev/null
+++ b/issues/sqlalchemy.gmi
@@ -0,0 +1,59 @@
+# Replace sqlalchemy with MySQLdb
+
+## Tags
+
+* assigned: bonfacem, zachs
+* type: refactor
+* priority: medium
+
+## Description
+
+Connections that use sqlalchemy are the only place in GN2 where connections remain open indefinitely rather than being closed after use. If many users are active at the same time, say during one of Rob's classes, and they each do a search, we end up with N connections held open until their sessions are killed. Removing this is trivial; to demonstrate, here is a random example from GN2 (/wqflask/wqflask/search_results.py):
+
+```
+def get_GO_symbols(a_search):
+    query = """SELECT genes
+               FROM GORef
+               WHERE goterm='{0}:{1}'""".format(a_search['key'], a_search['search_term'][0])
+
+    gene_list = g.db.execute(query).fetchone()[0].strip().split()
+
+    new_terms = []
+    for gene in gene_list:
+        this_term = dict(key=None,
+                         separator=None,
+                         search_term=[gene])
+
+        new_terms.append(this_term)
+
+    return new_terms
+```
+
+could be replaced with:
+
+```
+ def get_GO_symbols(a_search):
+-    query = """SELECT genes
+-               FROM GORef
+-               WHERE goterm='{0}:{1}'""".format(a_search['key'], a_search['search_term'][0])
+-
+-    gene_list = g.db.execute(query).fetchone()[0].strip().split()
+-
+-    new_terms = []
+-    for gene in gene_list:
+-        this_term = dict(key=None,
+-                         separator=None,
+-                         search_term=[gene])
+-
+-        new_terms.append(this_term)
+-
+-    return new_terms
++    genes = []
++    with database_connection() as conn:
++        with conn.cursor() as cursor:
++            cursor.execute("SELECT genes FROM GORef WHERE goterm=%s",
++                           (f"{a_search['key']}:{a_search['search_term'][0]}",))
++            genes = cursor.fetchone()[0].strip().split()
++    return [dict(key=None, separator=None, search_term=[gene])
++            for gene in genes]
+```
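For illustration, the same rewrite as a self-contained, runnable sketch: a context-manager `database_connection()` that always closes its connection, plus the rewritten function. This is not GN's actual `database_connection`. Python's built-in sqlite3 stands in for MySQLdb (hence `?` placeholders instead of `%s`, and `contextlib.closing` because sqlite3 cursors are not context managers), and the `db_path` parameter is an assumption:

```python
import sqlite3
from contextlib import closing, contextmanager
from typing import Iterator


@contextmanager
def database_connection(db_path: str) -> Iterator[sqlite3.Connection]:
    """Open a connection for one unit of work and always close it."""
    conn = sqlite3.connect(db_path)
    try:
        yield conn
        conn.commit()
    finally:
        conn.close()  # runs on success and when an exception escapes


def get_GO_symbols(db_path: str, a_search: dict) -> list[dict]:
    goterm = f"{a_search['key']}:{a_search['search_term'][0]}"
    with database_connection(db_path) as conn:
        with closing(conn.cursor()) as cursor:
            # Placeholder binding: the driver quotes the value for us.
            cursor.execute("SELECT genes FROM GORef WHERE goterm = ?", (goterm,))
            row = cursor.fetchone()
    # Unlike the original, a missing GO term yields [] instead of a TypeError.
    genes = row[0].strip().split() if row else []
    return [dict(key=None, separator=None, search_term=[gene]) for gene in genes]
```

The connection's lifetime is bounded by the `with` block, so no connection outlives the request that opened it.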
diff --git a/issues/systems/gn2-time-machines.gmi b/issues/systems/gn2-time-machines.gmi
index 68ddaa9..513a91a 100644
--- a/issues/systems/gn2-time-machines.gmi
+++ b/issues/systems/gn2-time-machines.gmi
@@ -2,9 +2,7 @@
GN1 time machines are pretty straightforward. With GN2 the complexity has increased a lot because of interacting services and a larger dependency graph.
-Here I track what it takes today to install an instance of GN2 that is 'frozen' in time.
-
-- [X] Install Mariadb and recover production DB (est. 3-4 hrs)
+Here I track what it takes today to install a fallback instance of GN2 that is 'frozen' in time.
## Tags
@@ -16,14 +14,14 @@ Here I track what it takes today to install an instance of GN2 that is 'frozen'
## Tasks
-General time line:
+Timeline:
-* [X] Install machine software and physical (4 hours)
+* [X] Install physical machine and software (est. 4-8 hours)
* [X] Sync backups on a daily basis and add monitoring (2 hours)
* [X] Set up Mariadb and sync from backup (4 hours)
-* [ ] GN2 production environment
-* [ ] GN3 aliases server (Racket)
+* [X] GN2 production environment with nginx & genotype_files (2 hours)
* [ ] GN3 Genenetwork3 service (Python)
+* [ ] GN3 aliases server (Racket)
* [ ] GN3 auth proxy (Racket)
* [ ] set up https and letsencrypt
* [ ] setup logrotate for production log files
@@ -41,6 +39,18 @@ guix pull -p ~/opt/guix-pull
guix package -i mariadb -p /usr/local/guix-profiles/mariadb
```
+To install genenetwork we use a Guix channel. The last working channel on the CI can be downloaded from https://ci.genenetwork.org/channels.scm. Then run:
+
+```
+guix pull -C channels.scm -p ~/opt/guix-gn-channel
+. ~/opt/guix-gn-channel/etc/profile
+guix package -i genenetwork2 -p ~/opt/genenetwork2
+```
+
+That installs genenetwork2 into the profile ~/opt/genenetwork2.
+
+Note that these commands may take a while. When guix starts building lots of software, it may be necessary to configure a substitute server (we use guix.genenetwork.org) by adding --substitute-urls="http://guix.genenetwork.org https://ci.guix.info".
+
### Mariadb (est. 1-2 hours)
Set up a global Mariadb
@@ -129,3 +139,166 @@ In the process I discover that ibdata1 file has grown to 100GB. Not a problem ye
=> https://www.percona.com/blog/2013/08/20/why-is-the-ibdata1-file-continuously-growing-in-mysql/
(obviously we don't want to use mysqldump right now, but I'll need to do some future work).
+
+### Setting up GN2
+
+Create a gn2 user and check out the git repo in /home/gn2/production/gene. Note that there is also a backup of gn2 in borg, which contains a 'run_production.sh' script.
+
+Running the script will report what is missing:
+
+```
+su gn2
+cd /home/gn2/production/
+sh run_production.sh
+```
+
+You'll find that you need the Guix install of gn2; start with the Guix section above.
+
+### Genotype files
+
+GN2 requires a set of genotype files that are in the backup:
+
+```
+borg extract borg-genenetwork::borg-ZACH-home-20220819-04:04-Fri home/zas1024/gn2-zach/genotype_files/
+```
+
+Move the genotype_files directory into place and update its path in `gn2_settings.py`, which is in the same directory as the run_production.sh script.
+
+### Configure Nginx
+
+You'll need to tell Nginx to forward to the web server. Something like:
+
+```
+server {
+    listen 80;
+    server_name gn2-fallback.genenetwork.org;
+
+    access_log /var/log/nginx/gn2-danny-access.log;
+    error_log /var/log/nginx/gn2-danny-error.log;
+
+    location / {
+        proxy_pass http://127.0.0.1:5000/;
+        proxy_redirect off;
+
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+
+        client_max_body_size 8050m;
+        proxy_read_timeout 300;
+        proxy_connect_timeout 300;
+        proxy_send_timeout 300;
+    }
+}
+```
+
+### Setting up GN3
+
+Without gn3 the menu will not show on the main page and you will see 'There was an error retrieving and setting the menu. Try again later.'
+
+GN3 is a separate REST server that has its own dependencies. A bit confusingly it is also a Python module dependency for GN2. So we need to set up both 'routes'.
+
+First check out the genenetwork3 repo as the gn2 user:
+
+```
+su gn2
+cd /home/gn2
+mkdir -p gn3_production
+cd gn3_production
+git clone https://github.com/genenetwork/genenetwork3.git
+```
+
+Check the genenetwork3 README for the latest instructions on starting the service in a Guix container. Typically:
+
+```
+guix shell -C --network --expose=$HOME/production/genotype_files/ -Df guix.scm
+```
+
+where genotype_files is the dir you installed earlier.
+
+Run it with, for example
+
+```
+export FLASK_APP="main.py"
+flask run --port=8081
+```
+
+I.e., the same port as GN2 expects in gn2_settings.py. Test with
+
+```
+curl localhost:8081/api/version
+"1.0"
+```
+
+Next, expose the API externally with nginx by adding the following location to the server block above:
+
+```
+    location /gn3 {
+        rewrite /gn3/(.*) /$1 break;
+        proxy_pass http://127.0.0.1:8081/;
+        proxy_redirect off;
+        proxy_set_header Host $host;
+    }
+```
+
+and if DNS is correct you should get
+
+```
+curl gn2-fallback.genenetwork.org/gn3/api/version
+"1.0"
+```
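The curl checks in this section can also be scripted, e.g. for monitoring. A minimal sketch using only the Python standard library; `fetch_json` is a hypothetical helper, and the hosts, ports and paths are the ones used above:

```python
import json
from urllib.request import urlopen


def fetch_json(base_url: str, path: str, timeout: float = 10.0):
    """GET <base_url>/<path> and decode the JSON body."""
    # Normalise slashes so "http://host/gn3/" + "/api/version" joins cleanly.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_json("http://localhost:8081", "api/version")` should return "1.0", and `fetch_json("http://gn2-fallback.genenetwork.org/gn3", "api/menu/generate/json")` exercises the menu endpoint; an HTTP error status surfaces as `urllib.error.HTTPError`.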
+
+To generate the main menu, the page's JavaScript makes a request with
+$.ajax(gn_server_url + 'api/menu/generate/json'). On production that is
+https://genenetwork.org/api3/api/menu/generate/json, which is actually gn3(!)
+
+```
+curl http://gn2-fallback.genenetwork.org/gn3/api/menu/generate/json
+```
+
+If this gives an error, check the gn3 output log.
+
+Perhaps obviously, on a production server GN3 should be running as a proper service.
+
+### Alias service
+
+Another GN3 service resolves gene aliases from Wikidata:
+
+```
+su gn2
+cd ~/gn3_production
+git clone https://github.com/genenetwork/gn3.git
+```
+
+Follow the instructions in the README and you should get:
+
+```
+curl localhost:8000/gene/aliases/Shh
+["Hx","ShhNC","9530036O11Rik","Dsh","Hhg1","Hxl3","M100081","ShhNC"]
+```
+
+### Authentication proxy
+
+The proxy also needs to run.
+
+```
+su gn2
+cd ~/gn3_production
+git clone https://github.com/genenetwork/gn-proxy.git
+```
+
+See the README.
+
+### Troubleshooting
+
+Check the server log for errors. It should be in /home/gn2/production/tmp/. For example, you may see
+
+```
+ERROR:wqflask:404: Not Found: 7:20AM UTC Aug 20, 2022: http://gn2-fallback.genenetwork.org/api/api/menu/generate/json
+```
+
+which shows that the server URL setting in gn2_settings.py is wrong (note the doubled /api/api/).
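The doubled /api/api/ follows from plain string concatenation in the front end. A small sketch of that logic, assuming the value configured in gn2_settings.py ends up verbatim as `gn_server_url` in the JavaScript; `menu_url` and `config_problems` are hypothetical helpers for sanity-checking a candidate value:

```python
def menu_url(gn_server_url: str) -> str:
    """Mirror the front end's gn_server_url + 'api/menu/generate/json'."""
    return gn_server_url + "api/menu/generate/json"


def config_problems(gn_server_url: str) -> list[str]:
    """Flag the two misconfigurations that plain concatenation exposes."""
    problems = []
    if not gn_server_url.endswith("/"):
        # Without the slash the path is glued on: ".../gn3api/menu/..."
        problems.append("missing trailing slash")
    if "/api/api/" in menu_url(gn_server_url):
        # This is the pattern seen in the log line above.
        problems.append("base URL already ends in /api/")
    return problems
```

With this logic, a value of http://gn2-fallback.genenetwork.org/api/ reproduces the logged URL, while http://gn2-fallback.genenetwork.org/gn3/ (matching the nginx location above) passes.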
+
+Use the browser's developer console to see which JavaScript errors you get.
+
+If you get CORS errors, it is usually because you are running on a server other than genenetwork.org; this is a configuration issue.
diff --git a/issues/systems/second-production-tux02.gmi b/issues/systems/second-production-tux02.gmi
new file mode 100644
index 0000000..161629a
--- /dev/null
+++ b/issues/systems/second-production-tux02.gmi
@@ -0,0 +1,9 @@
+# Second production on tux02
+
+* assigned: aruni
+
+Set up a second production system on tux02. This will be fully configured using Guix and will be able to roll back to previous states easily. The Guix configuration of this system should go into the genenetwork-machines repo.
+=> https://git.genenetwork.org/arunisaac/genenetwork-machines genenetwork-machines repo
+
+This issue likely obsoletes
+=> /issues/systems/tux02-production
diff --git a/issues/systems/tux01-ram-problem.gmi b/issues/systems/tux01-ram-problem.gmi
new file mode 100644
index 0000000..90b37a0
--- /dev/null
+++ b/issues/systems/tux01-ram-problem.gmi
@@ -0,0 +1,109 @@
+# tux01 running out of RAM
+
+Tux01 ran out of steam: on Sep 3, 2022, around 2am, the kernel OOM killer killed mysqld.
+
+## Tags
+
+* assigned: pjotrp, zsloan
+* type: systems
+* keywords: database
+* status: unclear
+* priority: medium
+
+## Tasks
+
+* [X] post-mortem (see below)
+* [ ] free up disk space
+* [ ] update nvme firmware
+* [ ] convert remaining tables to innodb
+* [ ] monitor mariadb internals
+* [ ] find out what can have caused an OOM
+
+## Notes
+
+Some post mortem:
+
+* GN1 uses 10% of RAM, which is a bit high
+* Other services are behaving fine
+* Directories look fine, though /home only has 80G left
+* dmesg shows serial console crashes
+ + kthread starved
+ + RIP: 0010:serial8250_console_write+0x3d/0x2b0
+ + Out of memory: Kill process 4361 (mysqld)
+ + mysqld was using 14006154 pages of 4096 bytes = 53 GiB RAM
+* daemon log shows restart of mysql at 2am
+* syslog: Sep 3 02:07:01 tux01 kernel: [18254757.549855] oom_reaper: reaped process 4361 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
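As a sanity check on the RAM figure, using the 4096-byte page size the kernel reports:

```latex
14\,006\,154 \times 4096\,\mathrm{B} = 57\,369\,206\,784\,\mathrm{B}
  = \frac{57\,369\,206\,784}{2^{30}}\,\mathrm{GiB} \approx 53.4\,\mathrm{GiB} \;(\approx 57.4\,\mathrm{GB})
```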
+
+On to the mysql logs. After the crash:
+
+* 2022-09-03 2:07:50 68 [ERROR] mysqld: Table './db_webqtl/CaseAttributeXRefNew' is marked as crashed and should be repaired
+* 2022-09-03 2:07:50 9 [ERROR] mysqld: Table './db_webqtl/GeneRIF' is marked as crashed and should be repaired
+
+Right before the crash:
+
+* 2022-09-03 2:05:16 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5476ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
+
+The mysql slow query log shows a number of slow queries before 2am, which is not normal, so it seems things were leading up to a crash. Most of these queries refer to GeneRIF:
+
+```
+# Time: 220903 1:25:34
+# User@Host: webqtlout[webqtlout] @ [128.169.4.67]
+# Query_time: 772.880949 Lock_time: 0.001206 Rows_sent: 157432 Rows_examined: 472318
+# Rows_affected: 0 Bytes_sent: 29887236
+SET timestamp=1662168334;
+select distinct Species.FullName, GeneRIF_BASIC.GeneId, GeneRIF_BASIC.comment, GeneRIF_BASIC.PubMed_ID from GeneRIF_BASIC, Species where GeneRIF_BASIC.symbol=''OR ELT(6676=1964,1964) AND '4YK4' LIKE '4YK4' and GeneRIF_BASIC.SpeciesId = Species.Id order by Species.Id, GeneRIF_BASIC.createtime;
+# Time: 220903 1:26:31
+```
+
+Let's run that by hand. It returns 157432 rows in set (2.523 sec), so it is fine now. The table may have been fixed on reboot, but we'll check the tables anyway. First, take a look at the state of the engine itself as described in
+
+=> ../database-not-responding.gmi
+
+Also
+
+```
+MariaDB [db_webqtl]> CHECK TABLE GeneRIF;
++-------------------+-------+----------+----------+
+| Table | Op | Msg_type | Msg_text |
++-------------------+-------+----------+----------+
+| db_webqtl.GeneRIF | check | status | OK |
++-------------------+-------+----------+----------+
+1 row in set (0.014 sec)
+```
+
+So the tables were repaired on restarting mariadb - something we set it up to do. We should convert these tables to innodb (from myisam), but I have been postponing that until we have a large enough SSD for mariadb.
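For the pending InnoDB conversion, a minimal sketch that turns a list of table names into `ALTER TABLE ... ENGINE=InnoDB` statements for review before running them by hand. The `information_schema` query lists the remaining MyISAM tables; `innodb_conversions` is a hypothetical helper. Each ALTER rebuilds the whole table, so it needs extra disk space for a full copy and can run for a long time on big tables, which fits with waiting for the larger SSD:

```python
import re

# Run this in the mysql client to get the list of remaining MyISAM tables.
LIST_MYISAM_TABLES = (
    "SELECT TABLE_NAME FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = 'db_webqtl' AND ENGINE = 'MyISAM';"
)

# Only plain identifiers are accepted, so names can be safely backquoted.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def innodb_conversions(tables: list[str], schema: str = "db_webqtl") -> list[str]:
    """Return one ALTER TABLE statement per table, to be reviewed before use."""
    statements = []
    for name in tables:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected table name: {name!r}")
        statements.append(f"ALTER TABLE `{schema}`.`{name}` ENGINE=InnoDB;")
    return statements
```

Converting the tables that crashed (GeneRIF, CaseAttributeXRefNew) first would be a natural order, but that is a judgement call.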
+
+## Check RAID and disks
+
+/dev/sda is on a PERC H740P Adp. controller. A quick search shows that there are no real known issues with these RAID controllers after 4 years. Pretty impressive.
+
+The following show no errors logged:
+
+```
+hdparm -I /dev/sda
+smartctl -a /dev/sda -d megaraid,0
+```
+
+Same for disk /dev/sdb and
+
+```
+smartctl -x /dev/nvme0
+smartctl -x /dev/nvme1
+```
+
+It looks like there is nothing to worry about.
+
+A search for 'Dell Express Flash PM1725a problems' shows a known issue where these NVMe disks go offline; it is solved by the firmware update Dell Express Flash NVMe PCIe SSD PM1725a, version 1.1.2, A03.
+We are on 1.0.4.
+
+=> https://www.dell.com/support/kbdoc/fr-fr/000177934/dell-technologies-a-pm1725a-may-go-offline-with-various-errors-including-nvme-remove-namespaces?lang=en
+
+Dell engineers have observed an infrequent issue during system operations, using the Dell PM1725a Express Flash NVMe PCIe SSD, in which the device may go offline and remain inaccessible. The drive may be accessible again after a reboot.
+
+The disks /dev/sda and /dev/sdb are Model Family: Seagate Barracuda 2.5 5400, Device Model: ST5000LM000-2AN170, and appear to be behaving well.
+
+## Conclusion
+
+No real problems surfaced in those checks, so it looks like a table went out of whack and killed mariadb. That does not explain the RAM issue, though: why did the OOM killer kill mariadb at 50GB? It was the largest process, but not all RAM was used.
+
+Recommendations: see tasks above
diff --git a/issues/tests-for-genodb.gmi b/issues/tests-for-genodb.gmi
index e4398d2..957dca7 100644
--- a/issues/tests-for-genodb.gmi
+++ b/issues/tests-for-genodb.gmi
@@ -8,3 +8,9 @@ The genodb genotype database implementation is explained in detail at
=> /topics/genotype-database.html
* assigned: aruni
+
+## Resolution
+
+Tests have now been written.
+
+* closed