blob: c36c3458fd6b0b5bb750ce2d6420e10f6b2c8d58 (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
|
# Correlations time out
We are seeing errors with correlations which happen for the larger requests, such as
=> http://gn2.genenetwork.org/show_trait?trait_id=ENSG00000244734&dataset=GTEXv8_Wbl_tpm_0220
It looks like correlations don't finish - and worse the running processes take a significant chunk of RAM, around 10GB each. Eventually they disappear.
## Tags
* assigned: pjotrp, zachs, alexm
* keywords: correlations, time out
* type: bug
* status: closed, completed
* priority: critical
## Tasks
* [X] Set up OOM killer
* [X] Prevent many GN2 threads taking too much RAM
* [ ] Disable URL messages
## Duplicate
* duplicate
=> slow-correlations.gmi
## OOM Killer
To prevent future OOM killing I set the memory allocation in sysctl.conf to
```
vm.overcommit_ratio=95
vm.overcommit_memory=2
```
The running kernel did not accept these, so it requires a reboot (later).
## GN2 threads
One error I am seeing in the logs is
```
/export/local/home/zas1024/opt/genenetwork2_20210805/lib/python3.8/site-packages/scipy/stats/stats.py:3913: PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.warnings.warn(PearsonRConstantInputWarning())
```
This is bleeding through to GN2 as a library which explains why it is
not the GN3 server growing.
In all, the instability is probably caused be a computation going out of whack. What is worrisome is the amount of RAM all processes take. Python is not cleaning up. We should start by lowering gunicorn cleanup routines, such as --max-requests INT and --max-requests-jitter INT in
=> https://docs.gunicorn.org/en/stable/settings.html Gunicorn settings
see
=> https://github.com/genenetwork/genenetwork2/commit/d29dfb72ceb005ab045203a36b1fc1552544b1e2
## GN3 Threads
Currently the GN3 API is run with
```
env FLASK_DEBUG=1 FLASK_APP="main.py" flask run --port=8086
```
I added gunicorn for production
=> https://github.com/genenetwork/genenetwork3#running-the-flask-app GN3 gunicorn instructions
## Log noise
In the logs we see also quite a bit of noise. Should disable that:
```
Werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
```
## 2022-11-22
Tested this issue, and there was no timeout. Closing this as completed.
|