author: zsloan, 2022-12-12 14:26:46 -0600
committer: GitHub, 2022-12-12 14:26:46 -0600
commit: c7db1c0a4e91eb1e0a5a88cfe1a98d70591efb9c
tree: cf0c4c8d5c418190dae8161f48c64e338df4babf /issues/correlation_wrong_results.gmi
parent: b16d9429ef0e8208e722bb6a5b90fa43950e414e
Create correlation_wrong_results.gmi
Diffstat (limited to 'issues/correlation_wrong_results.gmi')
-rw-r--r-- issues/correlation_wrong_results.gmi | 27
1 files changed, 27 insertions, 0 deletions
diff --git a/issues/correlation_wrong_results.gmi b/issues/correlation_wrong_results.gmi
new file mode 100644
index 0000000..c2685f3
--- /dev/null
+++ b/issues/correlation_wrong_results.gmi
@@ -0,0 +1,27 @@
+# Correlation results wrong for certain traits/datasets
+
+## Tags
+
+* assigned: alexm, zsloan, fredm
+* priority: high
+* status: ongoing
+
+* keywords: correlations
+
+## Description
+
+(Note that this uses the update that switches to GN! text files, but I don't think that update is the cause.)
+
+There are still a few remaining issues with correlations where the results are at least partially wrong. The ones I'm aware of are as follows:
+
+### Examples
+
+- http://gn2-zach.genenetwork.org/show_trait?trait_id=10710&dataset=BXDPublish (my branch linked because it's using the text file update)
+
+Correlate against "HQF Striatum Affy Mouse Exon 1.0ST Exon Level (Dec09) RMA"
+
+The results are a mix of correct and wrong ones. The top GN2 results have the same r/p values as their GN1 counterparts, but a number of top GN1 results are far lower on the list of GN2 results and have r values that are drastically lower in magnitude. For example, 5200673 has the same r/p values in both GN1 and GN2, whereas 5169291 is the top GN1 result (with an r of -0.755) but has an r of just 0.275 in GN2.
+
+- https://genenetwork.org/show_trait?trait_id=24638&dataset=BXDPublish
+
+Correlate against BXD Published Phenotypes (the default). These results are almost all wrong, but in a way that is close to correct. I suspect the issue is that 0 values are being omitted, since this seems to always occur when correlating against traits/datasets that have many sample values of 0.
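The zero-omission hypothesis in the second example can be illustrated with a minimal sketch. It is not taken from the GN2 code base: the sample names, the values, and the helpers `pearson_r` and `paired_values` are all made up for illustration. It assumes the bug, if it exists, comes from a truthiness check that treats 0 the same as a missing value.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def paired_values(primary, target, drop_falsy):
    """Pair up values for samples present in both traits.

    drop_falsy=True mimics the suspected bug: `if x and y` skips 0.0
    as well as None. drop_falsy=False only skips missing (None) values.
    """
    xs, ys = [], []
    for sample, x in primary.items():
        y = target.get(sample)
        if drop_falsy:
            keep = bool(x) and bool(y)  # hypothetical bug: 0.0 is falsy
        else:
            keep = x is not None and y is not None
        if keep:
            xs.append(x)
            ys.append(y)
    return xs, ys


# Hypothetical sample data: the primary trait has several true 0 values.
primary = {"BXD1": 0.0, "BXD2": 0.0, "BXD5": 1.0, "BXD6": 2.0, "BXD8": 3.0, "BXD9": None}
target = {"BXD1": 2.0, "BXD2": 3.0, "BXD5": 1.0, "BXD6": 2.5, "BXD8": 4.0, "BXD9": 1.0}

xs_ok, ys_ok = paired_values(primary, target, drop_falsy=False)
xs_bug, ys_bug = paired_values(primary, target, drop_falsy=True)

r_correct = pearson_r(xs_ok, ys_ok)
r_buggy = pearson_r(xs_bug, ys_bug)

print(f"zeros kept:    n={len(xs_ok)}, r={r_correct:.3f}")
print(f"zeros dropped: n={len(xs_bug)}, r={r_buggy:.3f}")
```

If the real cause is along these lines, comparing the number of samples used per correlation between GN1 and GN2 for trait 24638 should show GN2 using fewer samples wherever values of 0 occur.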