# Phenotype Correlation Error
## Tags
* type: bug
* priority: high
* status: closed
* assigned: fredm, zachs
* keywords: correlation
## Fixed: Correlation against phenotypes fails (for at least some datasets)
Example: Run a correlation against BXD Published Phenotypes (at the top of the drop-down menu) from here -
=> https://genenetwork.org/show_trait?trait_id=1442370_at&dataset=HC_M2_0606_P
The bug appears to occur in the Rust correlation tool, so I'm not sure how to debug it myself. The last few lines of the stack trace are as follows:
```
File "/export2/local/home/zas1024/gn2-zach/gene/wqflask/wqflask/correlation/rust_correlation.py", line 262, in __compute_sample_corr__
return run_correlation(
File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/gn3/computations/rust_correlation.py", line 58, in run_correlation
subprocess.run(command_list, check=True)
File "/gnu/store/qar3sks5fwzm91bl3d3ngyrvxs7ipj5z-python-3.9.9/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/local/guix-profiles/gn-latest-20220820/bin/correlation_rust', '/home/zas1024/gn2-zach/tmp/gn2/correlation/IoaglmTgDJ.json', '/home/zas1024/gn2-zach/tmp/gn2']' died with <Signals.SIGSEGV: 11>.
```
## Fixed: Processing for Output too Early
After fixing the issues in the interaction with the Rust correlation code, I now run into the following error when I run a correlation against the "Hippocampus Consortium M430v2 (Jun06) PDNN" dataset with the same trait from the URI above:
```
Traceback (most recent call last):
File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
rv = self.dispatch_request()
File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/views.py", line 820, in corr_compute_page
correlation_results = set_template_vars(request.form, correlation_results)
File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/show_corr_results.py", line 54, in set_template_vars
table_json = correlation_json_for_table(correlation_data,
File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/show_corr_results.py", line 104, in correlation_json_for_table
target_trait_ob = create_trait(dataset=target_dataset_ob,
File "/home/frederick/genenetwork/genenetwork2/wqflask/base/trait.py", line 44, in create_trait
the_trait = retrieve_trait_info(
File "/home/frederick/genenetwork/genenetwork2/wqflask/base/trait.py", line 599, in retrieve_trait_info
raise KeyError(repr(trait.name)
KeyError: "'1' information is not found in the database."
```
The error above was caused by processing the data for output way too early. This has been fixed.
## Tissue Correlation: Probeset Trait Against Publish/Genotype Dataset
Running "Tissue" correlations on
=> https://genenetwork.org/show_trait?trait_id=1442370_at&dataset=HC_M2_0606_P
against the "BXD Published Phenotypes" database fails with the error below. It also fails if you run it against the "BXD Genotypes" dataset.
```
Traceback (most recent call last):
File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/home/gn2/production/gene/wqflask/wqflask/views.py", line 820, in corr_compute_page
correlation_results = set_template_vars(request.form, correlation_results)
File "/home/gn2/production/gene/wqflask/wqflask/correlation/show_corr_results.py", line 54, in set_template_vars
table_json = correlation_json_for_table(correlation_data,
File "/home/gn2/production/gene/wqflask/wqflask/correlation/show_corr_results.py", line 104, in correlation_json_for_table
target_trait_ob = create_trait(dataset=target_dataset_ob,
File "/home/gn2/production/gene/wqflask/base/trait.py", line 44, in create_trait
the_trait = retrieve_trait_info(
File "/home/gn2/production/gene/wqflask/base/trait.py", line 599, in retrieve_trait_info
raise KeyError(repr(trait.name)
KeyError: "'1422223_at' information is not found in the database."
```
So far, I have narrowed the issue down to the possibility that the "target_dataset" value is not used
=> https://github.com/genenetwork/genenetwork2/blob/53aa084fd2c9c930ac791ee43affffb3f788547c/wqflask/wqflask/correlation/rust_correlation.py#L271-L289 in this function.
## Literature Correlation: Probeset Trait Against Publish/Genotype Dataset
Run literature correlation for
=> http://localhost:5033/show_trait?trait_id=1442370_at&dataset=HC_M2_0606_P this trait
against the "BXD Published Phenotypes" database and observe the exception below. It also fails if you run it against the "BXD Genotypes" dataset.
```
ERROR:wqflask:http://localhost:5033/corr_compute ( 4:26AM UTC Oct 03, 2022)
Traceback (most recent call last):
File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
rv = self.dispatch_request()
File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/views.py", line 818, in corr_compute_page
correlation_results = compute_correlation(request.form, compute_all=True)
File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/correlation_gn3_api.py", line 199, in compute_correlation
return compute_correlation_rust(
File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/rust_correlation.py", line 326, in compute_correlation_rust
results = corr_type_fns[corr_type](
File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/rust_correlation.py", line 299, in __compute_lit_corr__
(this_trait_geneid, geneid_dict, species) = do_lit_correlation(
File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/correlation_gn3_api.py", line 237, in do_lit_correlation
geneid_dict = this_dataset.retrieve_genes("GeneId")
AttributeError: 'PhenotypeDataSet' object has no attribute 'retrieve_genes'
```
The literature correlations computation calls the `retrieve_genes` method, which is only present in the `base.data_set.mrnaassaydataset.MrnaAssayDataSet` class, the class that handles traits of type "ProbeSet".
The code seems to imply that we should not run literature correlations against any dataset that is not of type "ProbeSet".
## Some Reflections
The `target_dataset` is not used in the
=> https://github.com/genenetwork/genenetwork2/blob/c38bee43c1256c3515bbd1d805745d8dfb8ce390/wqflask/wqflask/correlation/rust_correlation.py#L271-L289 tissue correlations which seems like an error to me (fredm).
In my (fredm) work on partial correlations, before doing the computations,
=> https://github.com/genenetwork/genenetwork3/blob/ff34aee0f39c2e91db243461d7d67405e7aea0e3/gn3/computations/partial_correlations.py#L704-L750 there were error checks
that were run.
Should these be present for the full correlations too?
The failures above with the Publish/Genotype datasets imply one of two things:
* The code is not general enough, or
* We need to handle the exceptions, and present the selection errors as appropriate.
Better yet, we should probably not present invalid choices to the user, i.e. do not offer the user a dataset that would lead to errors if a correlation of a particular type is run against it with the given trait.
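The idea of not offering invalid datasets could be sketched roughly as follows. This is only an illustration, not the GN2 code: the `ALLOWED_TARGET_TYPES` mapping, the `selectable_datasets` function, the correlation-type keys and the "Publish"/"Geno" type names are all assumptions made for the example, as is the claim that sample correlations are valid against all three dataset types.

```python
# Hypothetical mapping from correlation type to the dataset types it may
# target. The keys and the "Publish"/"Geno" type names are assumptions.
ALLOWED_TARGET_TYPES = {
    "sample": {"ProbeSet", "Publish", "Geno"},
    "tissue": {"ProbeSet"},
    "lit": {"ProbeSet"},
}


def selectable_datasets(corr_type, datasets):
    """Return only the datasets whose type is valid for `corr_type`.

    An unknown correlation type yields an empty list rather than an error.
    """
    allowed = ALLOWED_TARGET_TYPES.get(corr_type, set())
    return [ds for ds in datasets if ds["type"] in allowed]


# Illustrative records only; the real dataset objects look different.
datasets = [
    {"name": "HC_M2_0606_P", "type": "ProbeSet"},
    {"name": "BXDPublish", "type": "Publish"},
    {"name": "BXDGeno", "type": "Geno"},
]

tissue_choices = selectable_datasets("tissue", datasets)
```

With a filter like this applied when building the drop-down menu, the Publish and Genotype datasets would simply not be offered for tissue or literature correlations.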
## Trial Against GN1
@zsloan @alexm: Running the failing tissue and literature correlations above with the same trait against the "BXD Published Phenotypes" and the "BXD Genotypes" on
=> http://gn1.genenetwork.org/
I got the error
```
Wrong correlation type
Sorry! Error occurred while processing your request.
The nature of the error generated is as follows:
Correlation Type Error :
It is not possible to compute the Tissue Correlation (Pearson's r) between your trait and data in this BXDGeno database. Please try again after selecting another type of correlation.
```
for the tissue correlations and
```
Wrong correlation type
Sorry! Error occurred while processing your request.
The nature of the error generated is as follows:
Correlation Type Error :
It is not possible to compute the SGO Literature Correlation between your trait and data in this BXDPublish database. Please try again after selecting another type of correlation.
```
for the literature correlations.
My initial hunch was correct. We should not be running the tissue and literature correlations in the way we were in the cases above.
We now need to check for these combinations and display an error to the user, as is done in GN1.
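A minimal sketch of such a check, with the message modelled on the GN1 output quoted above. The `CorrelationTypeError` class, the `ensure_valid_combination` function and the `PROBESET_ONLY` table are hypothetical names introduced for illustration; they are not part of GN2 or GN3.

```python
class CorrelationTypeError(Exception):
    """Hypothetical error for an unsupported correlation/dataset pairing."""


# Assumption: these correlation types only make sense against "ProbeSet"
# datasets. The keys and labels are illustrative.
PROBESET_ONLY = {
    "tissue": "Tissue Correlation",
    "lit": "Literature Correlation",
}


def ensure_valid_combination(corr_type, target_type, target_name):
    """Raise before any computation starts if the pairing is unsupported."""
    label = PROBESET_ONLY.get(corr_type)
    if label is not None and target_type != "ProbeSet":
        raise CorrelationTypeError(
            f"It is not possible to compute the {label} between your trait "
            f"and data in this {target_name} database. Please try again "
            "after selecting another type of correlation."
        )
```

Calling this before the computation is dispatched would turn the `KeyError` and `AttributeError` failures above into a single error that the view can catch and render for the user.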
The error reported above
```
raise KeyError(
KeyError: "'1' information is not found in the database for dataset 'HC_M2_0606_P' with id '112'."
```
causes the correlations listed below to fail. For maintainability, and to fix the current bugs, the code that preprocesses their data needs to be modified:
* Tissue correlation data
* top n sample correlation data
* top n tissue correlation data
## Tags
* assigned: alexm, fredm, zsloan
* type: bug
* keywords: correlations
* status: closed, completed
* priority: high
## Notes
* 2022-09-29: Successfully reproduced on production
* 2022-09-29: Fix file format issues
* 2022-09-30: Fix issues in rust correlation code
* 2022-10-03: Fix: avoid processing for output early