author | zsloan | 2022-02-21 21:18:46 +0000 |
---|---|---|
committer | zsloan | 2022-02-21 15:27:29 -0600 |
commit | 7c9e73f196575cd6d1de7df4430bc2b4ecb28466 (patch) | |
tree | 10c5f75b683f8438745b0a1d489069fc3225c6a9 /paper | |
parent | 17652b17455bd58bf82d130b60b3e80c57b7f80c (diff) | |
download | genenetwork2-7c9e73f196575cd6d1de7df4430bc2b4ecb28466.tar.gz |
Fix incorrect dataset trait data caching
Trait data caching wasn't working correctly because it didn't account for the samplelist, so caching behaved incorrectly whenever the target dataset's samplelist differed from that of the trait being correlated against. Trait data is stored as a dictionary where the keys are trait IDs and the values are *lists* of sample values. Because those values are matched by position, a cached entry is only valid for the exact same set of samples; otherwise samples end up mismatched (since "the third sample with a value" for one dataset's trait might not be the same as "the third sample with a value" for another dataset's trait). To fix this, I added the samplelist to the functions that generate and fetch the hash file. This will require more cache files, though, so this should probably be reexamined later to make the code work with only a single cache file for each dataset.
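
The idea of keying the cache on the samplelist can be illustrated with a minimal sketch. This is not the code from this commit: the helper name `cache_key`, its parameters, and the dataset and sample names are all hypothetical, and the choice of MD5 over a JSON payload is an assumption made only for illustration.

```python
import hashlib
import json


def cache_key(dataset_name, dataset_timestamp, samplelist):
    """Hypothetical helper: build a cache file name that depends on the
    dataset, its last-modified timestamp, and the exact ordered samplelist."""
    # Serialise the inputs deterministically so equal inputs give equal keys.
    payload = json.dumps([dataset_name, dataset_timestamp, list(samplelist)])
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Same (made-up) dataset, but two different sets of samples with values.
key_a = cache_key("ExampleDataset", "2022-02-21", ["S1", "S2", "S5"])
key_b = cache_key("ExampleDataset", "2022-02-21", ["S1", "S5", "S6"])

# Different samplelists map to different cache files, so positional
# lists of sample values are never reused against the wrong samples.
print(key_a != key_b)
```

The trade-off noted in the commit message is visible here: every distinct samplelist produces its own key, and therefore its own cache file, for the same dataset.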
Diffstat (limited to 'paper')
0 files changed, 0 insertions, 0 deletions