Age | Commit message | Author
2022-06-21 | db: datasets.py: Ignore results from sparql.queryAndConvert | BonfaceKilz
    ATM, it's very difficult to work out the correct type that is returned. Ignore this for now and fix this later.
2022-06-21 | mypy.ini: Ignore missing lmdb mypy stubs | BonfaceKilz
2022-06-20 | Update README: export env variables explicitly | Frederick Muriuki Muriithi
2022-06-20 | gn3: genodb: Retire get function. | Arun Isaac
    * gn3/genodb.py (get): Delete function. (matrix): Use db.txn.get instead of get.
2022-06-20 | gn3: genodb: Match class and function names of GenotypeMatrix. | Arun Isaac
    * gn3/genodb.py (GenotypeMatrix): Match class and function names.
2022-06-20 | gn3: genodb: Remove db, nrows and ncols fields from GenotypeMatrix. | Arun Isaac
    db is unused. nrows and ncols are available in the array and transpose numpy arrays.
    * gn3/genodb.py (GenotypeMatrix)[db, nrows, ncols]: Delete fields.
    * gn3/genodb.py (matrix): Do not initialize db, nrows and ncols fields.
2022-06-20 | gn3: genodb: Mention reading entire matrix in module docstring. | Arun Isaac
    * gn3/genodb.py: Mention reading entire matrix in module docstring.
2022-06-20 | Restrict partial correlation method choices | Frederick Muriuki Muriithi
    - Have "Pearson's r" and "Spearman's rho" as the only valid choices for the partial correlations
2022-06-17 | gn3: genodb: Rename Matrix named tuple to GenotypeMatrix. | Arun Isaac
    * gn3/genodb.py (Matrix): Rename to GenotypeMatrix. (matrix): Update invocation of Matrix.
2022-06-17 | gn3: genodb: Allow retrieval of the entire genotype matrix. | Arun Isaac
    * gn3/genodb.py: Document nparray in the module docstring. (nparray): New function.
2022-06-17 | gn3: genodb: Read optimized storage for the current matrix. | Arun Isaac
    The genotype database now stores the current version of the matrix alone in a read-optimized form, while storing the older versions of the matrix in a more compressed form. We are only interested in the current version of the matrix. So, always use the read optimized storage.
    * gn3/genodb.py (Matrix)[row_pointers, column_pointers]: Delete fields. [array, transpose]: New fields.
    * gn3/genodb.py (matrix, row, column): Read from read-optimized storage. (vector_ref): Delete function.
2022-06-09 | gn3: genodb: Remove blank line in module docstring. | Arun Isaac
    * gn3/genodb.py: Remove blank line in module docstring.
2022-06-09 | gn3: genodb: Rewrite without classes. | Arun Isaac
    We rewrite genodb using only functions. This makes for much more readable code.
    * gn3/genodb.py: Rewrite without classes.
2022-06-08 | gn3: genodb: Support reading columns. | Arun Isaac
    * gn3/genodb.py (Matrix.__init__): Retrieve column pointers from database. (row): Abstract out vector access code to ... (Matrix.__vector): ... here. (Matrix.column): New method.
2022-06-08 | gn3: genodb: Read only the most recent genotype matrix. | Arun Isaac
    The genotype database format now supports versioning of matrices. So, we update genodb.py to return only the most recent genotype matrix.
    * gn3/genodb.py (GenotypeDatabase.matrix): Return only the most recent genotype matrix.
2022-06-08 | gn3: genodb: Open genotype database in read-only mode. | Arun Isaac
    * gn3/genodb.py (GenotypeDatabase.__init__): Open genotype database in read-only mode.
2022-06-08 | gn3: genodb: Do not create genotype database if it does not exist. | Arun Isaac
    * gn3/genodb.py (GenotypeDatabase.__init__): Do not create genotype database if it does not exist.
2022-06-08 | gn3: genodb: Decide on little endianness. | Arun Isaac
    It has been decided that the genotype database will use little endianness wherever applicable.
    * gn3/genodb.py (Matrix.__init__): Remove TODO note to decide on endianness.
2022-06-08 | gn3: genodb: Do not terminate database strings with null. | Arun Isaac
    * gn3/genodb.py (GenotypeDatabase.get_metadata, GenotypeDatabase.matrix): Do not terminate database strings with the null character.
2022-06-03 | gn3: Add genodb. | Arun Isaac
    genodb is a tiny library to read our new genotype database file format.
    * gn3/genodb.py: New file.
2022-05-31 | Remove unnecessary statement | Frederick Muriuki Muriithi
2022-05-31 | Extract utility functions from `fetch_all_database_data` | Frederick Muriuki Muriithi
    Extract the utility functions to help with understanding what the `fetch_all_database_data` function is doing. This helps with maintenance.
2022-05-30 | Pass trait data as args to `fix_strains` and fix some bugs | Frederick Muriuki Muriithi
    The `fix_strains` function works on the trait data, not the basic trait info. This commit fixes the arguments passed to the function, and also some bugs in the function.
2022-05-27 | Move sql for CRUD operations on case-attrs from gn2 to gn3 | BonfaceKilz
2022-05-27 | Move sql for modifying case-attributes from gn2 to gn3 | BonfaceKilz
2022-05-27 | sql: caseattributes_audit.sql: New file | BonfaceKilz
    Create new table that stores edits related to case-attributes.
2022-05-27 | Return all the results from CaseAttributes column as is | BonfaceKilz
    * gn3/db/sample_data.py: Remove "collections" import. Add "Optional" import. (get_case_attributes): Return the results of "fetchall" from the case attributes.
    * tests/unit/db/test_sample_data.py (test_get_case_attributes): Update failing test.
2022-05-26 | Add Endpoint to get menu items for use in UI | Frederick Muriuki Muriithi
2022-05-24 | Run partial correlations with external script | Frederick Muriuki Muriithi
    Use new external script to run the partial correlations for both cases, i.e.
    - against an entire dataset, or
    - against selected traits
2022-05-24 | Fix some linting issues | Frederick Muriuki Muriithi
2022-05-24 | New script to compute partial correlations | Frederick Muriuki Muriithi
    * Add a new script to compute the partial correlations against:
      - a select list of traits, or
      - an entire dataset
      depending on the specified subcommand. This new script is meant to supersede the `scripts/partial_correlations.py` script.
    * Fix the check for errors
    * Reorganise the order of arguments for the `partial_correlations_with_target_traits` function: move the `method` argument before the `target_trait_names` argument so that the common arguments in the partial correlation computation functions share the same order.
2022-05-21 | Fix linting errors | Frederick Muriuki Muriithi
2022-05-21 | Use multiprocessing to improve performance | Frederick Muriuki Muriithi
2022-05-21 | Process primary, target and control traits in a single iteration | Frederick Muriuki Muriithi
    Rework the code to process the traits in a single iteration to improve performance.
2022-05-21 | Return generator object rather than tuples | Frederick Muriuki Muriithi
    Return generator objects rather than pre-computed tuples to reduce the number of iterations needed to process the data, and thus improve the performance of the system somewhat.
2022-05-16 | Run computation in one-shot asynchronous process | Frederick Muriuki Muriithi
    After reworking the worker/runner to have a one-shot mode, add a function that queues up the task and then runs the worker in the one-shot mode to process the computation in the background.
2022-05-16 | Enable running the worker in "one-shot" mode | Frederick Muriuki Muriithi
    Enable the running of the worker script in one-shot mode.
2022-05-06 | Fix linting and typing errors | Frederick Muriuki Muriithi
2022-05-06 | Hook up pcorrs with target traits computations | Frederick Muriuki Muriithi
    Enable the endpoint to actually compute partial correlations with selected target traits rather than against an entire dataset. Fix some issues caused by recent refactor that broke pcorrs against a dataset
2022-05-05 | Compute partial correlation with selected traits | Frederick Muriuki Muriithi
    Compute partial correlations against a selection of traits rather than against an entire dataset.
2022-05-05 | Extract common error checking. Rename function. | Frederick Muriuki Muriithi
    * Extract the common error checking code into a separate function
    * Rename the function to make its use clearer
2022-05-05 | Link to continuous deployment in README. | Arun Isaac
    * README.md: Link to continuous deployment.
2022-05-03 | Refactor: Remove unnecessary loop | Frederick Muriuki Muriithi
    Remove an unnecessary looping construct to help with speeding up the partial correlations somewhat.
2022-04-29 | Replace whole header with the longest one, instead of just the non-CaseAttribute headers | zsloan
    (Before this, issues arose if someone was adding case attributes to a file that already contained some case attributes.)
2022-04-29 | Get max string length instead when comparing headers | zsloan
    Apparently max(string1, string2) in Python gets the string that is highest alphabetically, but I'm pretty sure this line was intended to get the header with the most items (which this commit doesn't fully address; you could still end up with a situation where some case attributes were removed while others were added, though that should be rare)
2022-04-12 | Delete "get_allowable_sampledata_headers" | BonfaceKilz
    * gn3/csvcmp.py (get_allowable_sampledata_headers): Delete it.
    * tests/unit/test_csvcmp.py: Remove "get_allowable_sampledata_headers" import. (test_get_allowable_csv_headers): Delete it.
2022-04-12 | Strip any newline, tab or carriage-return chars from sample data | BonfaceKilz
    * gn3/db/sample_data.py (get_trait_csv_sample_data): Strip out "\n", "\t", or "\r" from the sample data. See: <https://issues.genenetwork.org/issues/csv-error-ITP_10001-longevity-data-set.html>
2022-04-12 | Test that a carriage return is removed when generating csv | BonfaceKilz
    * tests/unit/db/test_sample_data.py: import "get_trait_csv_sample_data". (test_get_trait_csv_sample_data): New test function.
2022-04-07 | Fix pylint errors | BonfaceKilz
2022-04-07 | Fix mypy error | BonfaceKilz
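
Note on the 2022-04-29 header-comparison commit: its message observes that Python's max() on two strings compares them alphabetically rather than by length. The sketch below illustrates that difference only; it is not the code from the commit. The function name `longest_header` and the sample header strings are hypothetical, and the commit itself notes that picking the longer string does not fully cover the case where some case attributes were removed while others were added.

```python
def longest_header(header_a: str, header_b: str) -> str:
    """Return the longer of two CSV header strings (hypothetical helper)."""
    # key=len makes max() compare by character count rather than
    # lexicographically, which is the default for strings.
    return max(header_a, header_b, key=len)


# Hypothetical sample headers: the first has four columns, the second three.
with_more_columns = "Strain,Value,SE,N"
with_fewer_columns = "Strain,Value,Sex"

# Plain max() compares character by character; "e" sorts after "E",
# so the header with fewer columns is returned.
alphabetical_pick = max(with_more_columns, with_fewer_columns)

# Comparing by length returns the longer header instead.
length_pick = longest_header(with_more_columns, with_fewer_columns)
```

To compare by number of columns rather than by character count, a key such as `lambda h: len(h.split(","))` could be used in place of `len`.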