aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-08-04Avoid string interpolation: use prepared statementMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Following Arun's comment at https://github.com/genenetwork/genenetwork3/pull/31#issuecomment-890915813 this commit eliminates string interpolation, and adds a map of tables for the various types of traits dataset names
2021-08-04Merge branch 'main' of github.com:genenetwork/genenetwork3 into ↵Muriithi Frederick Muriuki
heatmap_decompose_db_retrieval
2021-08-03guix.scm: Add csvdiff as a propagated inputBonfaceKilz
2021-08-02sql: Document the table Strain.Arun Isaac
* sql/schema.org (Strain): New section.
2021-07-31sql: Add schema description.Arun Isaac
* sql/schema.org: New file.
2021-07-30Rework db functions to enable postprocessingMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Rework the database functions to return a dict of key-value pairs, which eases the postprocessing of the trait information. The postprocessing is mainly to try an maintain data compatibility with the code that is at the following locations: https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/base/webqtlTrait.py https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/base/webqtlDataset.py https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py This was mainly a proof-of-concept, and the functions do not have testing added for them: there is therefore need to add testing for the new functions, and probably even rework them if they are found to be complicated.
2021-07-30Remove extra spaceMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Remove extra space that was causing test to fail.
2021-07-30Add module for common utilitiesMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/function_helpers.py: new file Provides a new module to hold common programming utilities that are generic enough that they will find use across the entire application. The first utility function provided in this commit is the `compose` function, whose purpose, as indicated by its name, is to take a number of functions and compose them into a single function, which when called, will return the same result that would have been got had the user called the functions in a chain from right to left.
2021-07-30Return dict from query functionsMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/db/traits.py: return dicts rather than tuples/list * tests/unit/db/test_traits.py: Update tests Return dicts with the key-value pairs set up so as to ease with the data manipulation down the pipeline. This is also useful to help with the retrieval of all other extra information that was left out in the first iteration. This commit also updates the tests by ensuring they expect dicts rather than tuples.
2021-07-30Merge branch 'main' of github.com:genenetwork/genenetwork3 into ↵Muriithi Frederick Muriuki
heatmap_decompose_db_retrieval Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi Fix merge conflicts in: * gn3/db/traits.py * tests/unit/db/test_traits.py
2021-07-29Merge pull request #30 from genenetwork/Feature/Update-db-from-csv-dataBonfaceKilz
Feature/update db from csv data
2021-07-29Merge branch 'main' into Feature/Update-db-from-csv-dataBonfaceKilz
2021-07-29Delete "update_raw" and it's test-casesBonfaceKilz
2021-07-29Add method for updating values from a sample datasetBonfaceKilz
* gn3/db/traits.py (update_sample_data): New function. * tests/unit/db/test_traits.py: New test cases for ^^.
2021-07-29Add partial type annotations for slink moduleMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Add some type annotations for the `nearest` function. * Leave some comments regarding the issues experienced when trying to add some typing annotations to the function to help with future endeavours of the same.
2021-07-29Add type annotations to the functionMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Add some type annotations to the functions to reduce the chances of bugs creeping into the code.
2021-07-29Retrieve trait informationMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/db/traits.py: add functions to retrieve traits information * tests/unit/db/test_traits.py: add tests for new function Add functions to retrieve traits information as is done in genenetwork1 https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/base/webqtlTrait.py#L397-L456 At this point, the data retrieval functions are probably incomplete, as there is more of the `retrieveInfo` function in GN1 that has not been considered as of this commit.
2021-07-29Make name retrieval more generalMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/db/traits.py: make function more general * tests/unit/db/test_traits.py: parametrize the tests Make the name retrieval more general for the different types of traits by changing the column specification and table as appropriate.
2021-07-29Retrieve 'ProbeSet' trait nameMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/db/traits.py: new function (retrieve_probeset_trait_name) * tests/unit/db/test_traits.py: test(s) for new function Add a function to retrieve the name of a 'ProbeSet' trait in a manner similar to genenetwork1's retrieval of the same, as implemented here https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/base/webqtlDataset.py#L140-154 Unlike in genenetwork1, we do not mutate an object, instead, we return the values as retrieved from the database, and the caller will deal with the returned values as appropriate.
2021-07-29Add partial type annotations for slink moduleMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Add some type annotations for the `nearest` function. * Leave some comments regarding the issues experienced when trying to add some typing annotations to the function to help with future endeavours of the same.
2021-07-29Add type annotations to the functionMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Add some type annotations to the functions to reduce the chances of bugs creeping into the code.
2021-07-29db: traits: Remove publishdata columnBonfaceKilz
2021-07-28Retrieve trait informationMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/db/traits.py: add functions to retrieve traits information * tests/unit/db/test_traits.py: add tests for new function Add functions to retrieve traits information as is done in genenetwork1 https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/base/webqtlTrait.py#L397-L456 At this point, the data retrieval functions are probably incomplete, as there is more of the `retrieveInfo` function in GN1 that has not been considered as of this commit.
2021-07-28Make name retrieval more generalMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/db/traits.py: make function more general * tests/unit/db/test_traits.py: parametrize the tests Make the name retrieval more general for the different types of traits by changing the column specification and table as appropriate.
2021-07-28Retrieve 'ProbeSet' trait nameMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/db/traits.py: new function (retrieve_probeset_trait_name) * tests/unit/db/test_traits.py: test(s) for new function Add a function to retrieve the name of a 'ProbeSet' trait in a manner similar to genenetwork1's retrieval of the same, as implemented here https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/base/webqtlDataset.py#L140-154 Unlike in genenetwork1, we do not mutate an object, instead, we return the values as retrieved from the database, and the caller will deal with the returned values as appropriate.
2021-07-26tests: test_db: Add test case for "update_raw"BonfaceKilz
2021-07-26gn3: db: Create a raw update queryBonfaceKilz
* gn3/db/__init__.py (update_raw): New function.
2021-07-26db: traits: Fetch sample_data from a trait in csv formBonfaceKilz
2021-07-26Merge branch 'main' of github.com:genenetwork/genenetwork3Muriithi Frederick Muriuki
2021-07-26Fix issues caught by pylintMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Fix a myriad of issues caught by pylint to ensure the code passes all tests.
2021-07-26db: traits: Remove unused functionsBonfaceKilz
2021-07-26Check if corr_coefficient is NaN, since apparently it's stored as NaN ↵zsloan
instead of None when it can't be calculcated (which was messing up sorting); it may also be okay to remove the None check, but leaving it for now (#28)
2021-07-23sql: metadata_audit: Make char-set encoding explicit(utf8mb4)BonfaceKilz
See: https://www.eversql.com/mysql-utf8-vs-utf8mb4-whats-the-difference-between-utf8-and-utf8mb4/
2021-07-23Add data examples for `slink`. Implement function.Muriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: Copy the function, mostly verbatim from genenetwork1. See: https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/slink.py#L107-L138 * tests/unit/computations/test_slink.py: Add a test with some example data to test that the implementation gives the same results as that in genenetwork1
2021-07-23Add more test dataMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi
2021-07-23Iterate through all valid pairsMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: Fix the iteration construct. Given two lists of member coordinates, such as [0, 1] and [3, 5], the initial code would iterate over the pairs [0, 3] and [1, 5]. This commit fixes the iteration constructs such that the new code iterates over the pairs [0, 3], [0, 5], [1, 3] and [1, 5].
2021-07-23Extract function to flatten list of listsMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: Extract the `__flatten_list_of_lists` function since it is used in more than one place.
2021-07-23Fix issue caught in `nearest` while testing `slink`Muriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * While running tests for slink, to try and understand what it is doing in order to write the appropriate tests for it, an issue arose that pointed a blindspot in the former understanding of now `nearest` should work. This commit fixes the issue found in both the expected data, and the code.
2021-07-23New function (`slink`): return [] on exceptionMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: Add minimum code to pass new test * tests/unit/computations/test_slink.py: new test Add test to ensure that the new `slink` function return an empty list in case and exception is raised. Add the new `slink` function with minimum amount of code needed to pass the test.
2021-07-23Add dummy module documentationMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: Add initial dummy module documentation in form of a python docstring Update the names of what would be private methods/function to start with a double-underscore (__) so that they do not show up in the default python documentation.
2021-07-23Add docstring for `nearest' functionMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: add documentation for the `nearest` function in the `gn3.computations.slink` module in the form of a (hopefully correct) python docstring.
2021-07-22Get shortest distance from two lists/tuples of member coordinatesMuriithi Frederick Muriuki
* gn3/computations/slink.py: add code to ensure new test passes * tests/unit/computations/test_slink.py: new test This one is a little weird: from https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/slink.py#L57-L63 It gets rid of the last coordinates in both the lists of the member coordinates, and uses the remaining coordinates to find the shortest members. For example, given the following member coordinates: - i=[0,1,2] and j=[5,7,9], it uses [0,1] and [5,7] - i=[3,6,1] and j=[7,13], it uses [3,6] and [7] to find the shortest distances. I (fredmanglis) am not sure why it does it this way, since I'd have expected it to use all the coordinates, however, since at this time we need to retain bug-compatibility with the older code, I have done it as it is done in the old code. I also add a statement to raise an exception in the case where i and j are not lists of integers, or integers
2021-07-22Extract common `is_list_or_tuple' functionMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * Extract the common function `is_list_or_tuple' making it accessible to later parts of the code.
2021-07-22Test for shortest distance between members in a list and coordinateMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: add code to pass new test * tests/unit/computations/test_slink.py: new test Given a list of members in a group, and a coordinate for a member in the same group, find the distance of the closest member from the given coordinate in the group.
2021-07-22Merge branch 'main' of github.com:genenetwork/genenetwork3Muriithi Frederick Muriuki
2021-07-22Check that given list and both coordinates, we get shortest distanceMuriithi Frederick Muriuki
* gn3/computations/slink.py: Add code to compute the distance given the coordinate of both members on the parent list/tuple * tests/unit/computations/test_slink.py: * Change the name of the tests to more closely correspond to the business requirement the test is checking for * Update the comments to indicate some more things that might need to be done in the future
2021-07-22Check that all distances are positive or zeroMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: check that all distances between the 'somethings' are all either zero or positive. * tests/unit/computations/test_slink.py: * Remove data with all distances positive or zero, since it would fail the test * Change the expected message to more closely correspond to the business logic
2021-07-22Check that distance from A to B is same as from B to AMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: check that the distance from child A to B is the same as distance from child B to A. If not, throw an exception. * tests/unit/computations/test_slink.py: * Change the name of the test to more closely correspond to the business logic being tested. * Update the data in a separate test such that it does not error out due to failing to fulfill the expectations of separate requirement. - pass tests - Rename test - Fix errors: distances same both directions
2021-07-22Move sql updates into dirPjotr Prins
2021-07-22Check that child distance from itself is zeroMuriithi Frederick Muriuki
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi * gn3/computations/slink.py: Check that a child's distance from itself is zero. If not, throw an exception. The children lists are a list of distances of "something" from other "somethings". There is still some need to establish what those "somethings" are, so that the test names can reflect the ideas that are actually being tested for. * tests/unit/computations/test_slink.py: Change the name of the test so that it more closely corresponds to the business logic it is actually testing, and not the mechanics of testing the idea.