From e5290ab69eadc70ea40b35f652ee1ddf4286914f Mon Sep 17 00:00:00 2001
From: Frederick Muriuki Muriithi
Date: Wed, 23 Nov 2022 08:10:12 +0300
Subject: issues: (correlations-return-fewer-results): update issue with notes

* issues/correlations-return-fewer-results.gmi: Update the issue with some notes on the findings so far
---
 issues/correlations-return-fewer-results     | 44 ---------------------
 issues/correlations-return-fewer-results.gmi | 57 ++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 44 deletions(-)
 delete mode 100644 issues/correlations-return-fewer-results
 create mode 100644 issues/correlations-return-fewer-results.gmi

diff --git a/issues/correlations-return-fewer-results b/issues/correlations-return-fewer-results
deleted file mode 100644
index 82a0a10..0000000
--- a/issues/correlations-return-fewer-results
+++ /dev/null
@@ -1,44 +0,0 @@
-
-# Tags
-
-* assigned: alexm
-* priority: high
-* status: ongoing
-
-* keywords: correlations,fewer results
-
-
-## Description
-
-in some cases correlaton return fewer number of results
-than are required
-
-an example of such a case is computing the below dataset against the against BXD Phenotype, gets 477 results when you select Top 500
-
-=> http://gn2-zach.genenetwork.org/show_trait?trait_id=rsm10000006649&dataset=BXDGeno
-
-
-## Notes
-
-Probably causes are :-
-
-=> https://github.com/genenetwork/gn-gemtext-threads/blob/main/issues/fix-include-f1-parents-correlations.gmi
-
-## Updates
-
-The samplelist issue doesn't appear to be causing the issue with fewer results, since it still exists after the fix. There also seems to be an additional - or related - issue where it's either returning wrong results or not returning the actual top results (or both)
-
-Using the sample example above, after the change, the first result has a sample(r) of 0.265. This isn't the top result when run in GN1. There also appears to be a mismatch between the result displayed in the table and the r displayed in the scatterplot (what you see if you click the sample(r) link); those should be roughly the same.
-
-An additional error has been reported by Beni where there's an error about NoneType being passed into string formatting (so I think it's returning None for some results). Steps to reproduce are below:
-- https://genenetwork.org/show_trait?trait_id=ENSMUST00000031535&dataset=UTHSC-BXD-Harv_Liv-1019
-- Correlate against the default dataset (same one as the trait)
-
-
-
-## Notes 13/11/22
-
- issue on handling non float values while parsing
- addressed on this Pr
-
-=> https://github.com/genenetwork/genenetwork2/pull/746/files
diff --git a/issues/correlations-return-fewer-results.gmi b/issues/correlations-return-fewer-results.gmi
new file mode 100644
index 0000000..fa7a79a
--- /dev/null
+++ b/issues/correlations-return-fewer-results.gmi
@@ -0,0 +1,57 @@
+# Correlations: Returning Fewer Results
+
+## Tags
+
+* assigned: alexm
+* priority: high
+* status: ongoing
+
+* keywords: correlations,fewer results
+
+
+## Description
+
+in some cases correlaton return fewer number of results
+than are required
+
+an example of such a case is computing the below dataset against the against BXD Phenotype, gets 477 results when you select Top 500
+
+=> http://gn2-zach.genenetwork.org/show_trait?trait_id=rsm10000006649&dataset=BXDGeno
+
+
+## Notes
+
+Probably causes are :-
+
+=> https://github.com/genenetwork/gn-gemtext-threads/blob/main/issues/fix-include-f1-parents-correlations.gmi
+
+## Updates
+
+The samplelist issue doesn't appear to be causing the issue with fewer results, since it still exists after the fix. There also seems to be an additional - or related - issue where it's either returning wrong results or not returning the actual top results (or both)
+
+Using the sample example above, after the change, the first result has a sample(r) of 0.265. This isn't the top result when run in GN1. There also appears to be a mismatch between the result displayed in the table and the r displayed in the scatterplot (what you see if you click the sample(r) link); those should be roughly the same.
+
+An additional error has been reported by Beni where there's an error about NoneType being passed into string formatting (so I think it's returning None for some results). Steps to reproduce are below:
+- https://genenetwork.org/show_trait?trait_id=ENSMUST00000031535&dataset=UTHSC-BXD-Harv_Liv-1019
+- Correlate against the default dataset (same one as the trait)
+
+
+
+## Notes 13/11/22
+
+ issue on handling non float values while parsing
+ addressed on this Pr
+
+=> https://github.com/genenetwork/genenetwork2/pull/746/files
+
+## Notes 2022-11-23
+
+=> https://genenetwork.org/show_trait?trait_id=rsm10000006649&dataset=BXDGeno On production
+
+with a selection of top 500 results I got the following:
+
+* Without changing any of the filters, I get 500 results
+* Location: Chr: 5, Mb: 105 to 107 => I get 18 results
+* Location: Chr: 5, Mb: 70 to 150 => I get 327 results
+
+I think the issue here is the sequence of events - the system takes the top 500 results, and then applies the given filters, rather than applying the filters first, then selecting the top 500 of the filtered results.
-- 
cgit v1.2.3
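The hypothesis in the 2022-11-23 notes (the top 500 results are selected first and the location filter is applied afterwards) can be made concrete with a small Python sketch. It is not GeneNetwork's code: the `Hit` type, `location_filter`, `rank`, and both `top_n_*` functions are hypothetical, ranking by absolute r is an assumption, the Mb range is treated as inclusive by assumption, and the data are toy values rather than results from the runs above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical correlation result: trait id, chromosome, position in Mb, r."""
    trait: str
    chrom: str
    mb: float
    r: float


Filter = Callable[[Hit], bool]


def location_filter(chrom: str, start_mb: float, end_mb: float) -> Filter:
    """Hypothetical location filter; the Mb range is assumed to be inclusive."""
    def keep(hit: Hit) -> bool:
        return hit.chrom == chrom and start_mb <= hit.mb <= end_mb
    return keep


def rank(hits: List[Hit]) -> List[Hit]:
    """Order by |r|, strongest first (assumed ranking behind 'Top N')."""
    return sorted(hits, key=lambda hit: abs(hit.r), reverse=True)


def top_n_then_filter(hits: List[Hit], n: int, keep: Filter) -> List[Hit]:
    """Order suspected in the notes: truncate to N first, then filter.
    A matching hit ranked below N is already discarded, so fewer than N can come back."""
    return [hit for hit in rank(hits)[:n] if keep(hit)]


def filter_then_top_n(hits: List[Hit], n: int, keep: Filter) -> List[Hit]:
    """Alternative order: filter first, then keep the N strongest of the matches."""
    return rank([hit for hit in hits if keep(hit)])[:n]


if __name__ == "__main__":
    # Made-up toy data, not GeneNetwork output.
    hits = [
        Hit("a", "1", 10.0, 0.95),
        Hit("b", "2", 20.0, 0.90),
        Hit("c", "5", 80.0, 0.85),
        Hit("d", "5", 100.0, 0.60),
        Hit("e", "5", 120.0, -0.55),
        Hit("f", "7", 30.0, 0.50),
    ]
    keep = location_filter("5", 70, 150)
    print([h.trait for h in top_n_then_filter(hits, 3, keep)])  # ['c']
    print([h.trait for h in filter_then_top_n(hits, 3, keep)])  # ['c', 'd', 'e']
```

With the first ordering, a hit that matches the filter but ranks below N never reaches the filter, so the count can fall well below N; that would match the Chr 5 counts in the notes if the hypothesis holds. Filtering first returns up to N matching hits, and fewer only when fewer than N hits match at all.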