1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
|
# Meeting Notes
## 2024-06-21
### Outage for 2024-06-20
What happened?
2024-06-19: Dev experienced intermittent ssh connections, and got logged out
2024-06-19: Dev assumed this wasn't related to the server problems and clocked out
2024-06-20: Another team mate experienced problems accessing git.genenetwork.org and reported this to the genenetwork sphere channel
2024-06-20: We noted that CI and CD services were also down
2024-06-20: We assumed it could be a DNS issue and followed up with our providers??
2024-06-20: Realized it was because the server was out of RAM. Killing processes resolved the issue
What can we learn from this?
* If we experience network problems, communicate to other team members.
* Our gunicorn processes are expensive each taking 17GB.
* We don't have monitoring and alerting for our server resources.
* Shouldn't OOM have helped with this?
* How much memory/resources do the scripts we run use? This can help us know the impact before running in production.
* If we start multiple processes from python, similar to the index-genenetwork script, does sending SIGTERM or SIGKILL kill the children processes or will they remain as orphans? Note: there were orphans, so we should investigate ways of killing the script next time completely.
How do we prevent something similar from happening in the future?
* Ask if anything our server is slow and attempt to inspect this. Add documentation for quick bash scripts to run for this.
* Reduce the amount of gunicorn processes to reduce their memory footprints. How did we end up with the no of processes we currently have? What impact will reducing this have on our users?
* Attempt to get an estimate memory footprint for `index-genenetwork` and use this to determine when it's safe to run the script or not. This can even end up integrated into the cron job.
* Create some alerting mechanism for sane thresholds that can be send to a common channel/framework e.g. when CPU usage > 90%, memory usage > 90% etc. This allows someone to be on the look out in case something drastic needs to be taken.
* Python doesn't kill child processes when SIGTERM is used. This means when testings, we were creating more and more orphaned processes. Investigate how to propagate the SIGTERM signal to all children processes.
=> https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.terminate
## 2024-06-18
### Agenda
* Last week review: DONE, made good progress on what we planned. We need to make sure we're more aware of the TODOs we had.
* Plan for this week: DONE, LET'S GOOO!
* Reviewing patches: added to week's goals.
* Search bugs discussions: Boni will create an issue with all details and we'll plan further on how to attack this.
If we have a plan for the week, and something comes up that breaks our plan:
* Are we aware that it broke our plans?
* Communicate the impact this may have on the plan.
### Plan for Week
* TODO: @bmunyoki rebuild guix container with new mcron changes
* TODO: @jnduli attempts to make UI change that shows all supported keys in the search
* TODO: @bmunyoki create an issue with all the problems experienced with search and potential solutions. Make sure it has replication steps, and plans for solutions.
* TODO: @bmunyoki follows up to make sure RDF changes are visible in production and fix issues that come up
* TODO: @bmunyoki and @jnduli genewiki indexing
* TODO: @bmunyoki demoes and documents how to run and test guix cron job for indexing
* TODO: @bmunyoki trains @jnduli on how to review patchsets from emails
* TODO: @jnduli attempts to add stronger types to index-genenetwork script, to make it explicit that we're using MonadicDicts
* TODO: @jnduli Documentation improvements for GN2 and GN3 and auth? None done last week.
* TODO: @jnduli Follow up notes on setting up local index-genenetwork search
Nice to haves:
* TODO: minor: bonfacem makes sure that mypy in CI runs against the index-genenetwork script.
* TODO: @bmunyoki improve search documentation and fix bugs in the frontend: binary term search doesn't work as expected
* TODO: @bmunyoki follow up with Rob to makes sure he tests search after everything is complete
* TODO: @bmunyoki follow up how do we make sure that xapian prefix changes in code retrigger xapian indexing?
## 2024-06-11
### Agenda
* Local checks to do before PRs:
In gn3, make sure to run:
> pylint main.py setup.py wsgi.py setup_commands tests gn3 scripts
> TODO jnduli run this: mypy --show-error-codes .
DONE: led to a PR that fixed all mypy errors in index-genenetwork
> pytest -k unit_test
* Set up new DB before sync + fixing any problems that occur: DONE, no errors after Jnduli set up new DB.
### Generif Indexing
* Probeset data exists in SQL. Generif metadata exists in RDF.
* Write code that queries RDF for Generif metadata and enriches exising Probeset query.
* DONE: checksums in Generif rdf output
* TODO: jnduli look at xapian docs and their example for python bindings: DONE, led to a WIP PR for wiki data indexing and a local custom script to search the index without the need to run genenetwork web service
* TODO: bonfacem makes sure tux02 indexing works: DONE, there's a working patchset sent to Arun for review.
* TODO: bonfacem make changes to mcron and guix-machines: DONE, there's a working patch set sent to Arun. Follow up needs to be done for the learnings gained
### How would GeneWiki work?
* GeneWiki = GeneRif data from NCBI
* Workflow would be similar to Generif Indexing. We need to figure out if we'll need an extra RDF query or if we can modify the existing SPARQL query.
* TODO: jnduli attempts to add stronger types to index-genenetwork script, to make it explicit that we're using MonadicDicts: Not DONE, got stuck trying to run index-genenetwork locally and removing global variables from the script..
* TODO: bonfacem makes sure that mypy in CI runs against the index-genenetwork script: NOT DONE, needs a separate PR that will be sent to Arun.
|