Age | Commit message (Collapse) | Author |
|
|
|
* gn3/fs_helpers.py: Remove unused "pathlib" import.
(cache_ipfs_file): Disable "unused-argument" warting.
|
|
* tests/unit/test_file_utils.py (test_cache_ipfs_file_cache_hit): Skip
it.
(test_cache_ipfs_file_cache_miss): Ditto.
|
|
This library pollutes the Genenetwork2 profile with an old version
"dataclasses" thereby causing it to fail.
|
|
|
|
|
|
* gn3/db/correlations.py (__fetch_data__): Use a more readable code as
opposed to an error code.
|
|
* gn3/db/correlations.py (__fetch_data__): Ignore "Too many args" [R0913]
error.
|
|
* gn3/db/correlations.py (__build_query__): Ignore the "sample_ids" and
"joins" types when calling build_query_sgo_lit_corr
(fetch_all_database_data): Ignore the return type.
TODO: Ping Alex/Arun to fix this.
|
|
ATM, it's very difficult to work the correct type that is returned. Ignore
this for now and fix this later.
|
|
|
|
|
|
* gn3/genodb.py (get): Delete function.
(matrix): Use db.txn.get instead of get.
|
|
* gn3/genodb.py (GenotypeMatrix): Match class and function names.
|
|
db is unused. nrows and ncols are available in the array and transpose numpy
arrays.
* gn3/genodb.py (GenotypeMatrix)[db, nrows, ncols]: Delete fields.
* gn3/genodb.py (matrix): Do not initialize db, nrows and ncols fields.
|
|
* gn3/genodb.py: Mention reading entire matrix in module docstring.
|
|
- Have "Pearson's r" and "Spearman's rho" as the only valid choices for the
partial correlations
|
|
* gn3/genodb.py (Matrix): Rename to GenotypeMatrix.
(matrix): Update invocation of Matrix.
|
|
* gn3/genodb.py: Document nparray in the module docstring.
(nparray): New function.
|
|
The genotype database now stores the current version of the matrix alone in a
read-optimized form, while storing the older versions of the matrix in a more
compressed form. We are only interested in the current version of the
matrix. So, always use the read optimized storage.
* gn3/genodb.py (Matrix)[row_pointers, column_pointers]: Delete fields.
[array, transpose]: New fields.
* gn3/genodb.py (matrix, row, column): Read from read-optimized storage.
(vector_ref): Delete function.
|
|
* gn3/genodb.py: Remove blank line in module docstring.
|
|
We rewrite genodb using only functions. This makes for much more readable
code.
* gn3/genodb.py: Rewrite without classes.
|
|
* gn3/genodb.py (Matrix.__init__): Retrieve column pointers from database.
(row): Abstract out vector access code to ...
(Matrix.__vector): ... here.
(Matrix.column): New method.
|
|
The genotype database format now supports versioning of matrices. So, we
update genodb.py to return only the most recent genotype matrix.
* gn3/genodb.py (GenotypeDatabase.matrix): Return only the most recent
genotype matrix.
|
|
* gn3/genodb.py (GenotypeDatabase.__init__): Open genotype database in
read-only mode.
|
|
* gn3/genodb.py (GenotypeDatabase.__init__): Do not create genotype database
if it does not exist.
|
|
It has been decided that the genotype database will use little endianness
wherever applicable.
* gn3/genodb.py (Matrix.__init__): Remove TODO note to decide on endianness.
|
|
* gn3/genodb.py (GenotypeDatabase.get_metadata, GenotypeDatabase.matrix): Do
not terminate database strings with the null character.
|
|
genodb is a tiny library to read our new genotype database file format.
* gn3/genodb.py: New file.
|
|
|
|
Extract the utility functions to help with understanding the what the
`fetch_all_database_data` function is doing. This helps with maintenance.
|
|
The `fix_strains` function works on the trait data, not the basic trait
info. This commit fixes the arguments passed to the function, and also some
bugs in the function.
|
|
|
|
|
|
Create new table that stores edits related to case-attributes.
|
|
* gn3/db/sample_data.py: Remove "collections" import. Add "Optional" import.
(get_case_attributes): Return the results of "fetchall" from the case
attributes.
* tests/unit/db/test_sample_data.py (test_get_case_attributes): Update failing
test.
|
|
|
|
Use new external script to run the partial correlations for both cases,
i.e.
- against an entire dataset, or
- against selected traits
|
|
|
|
* Add a new script to compute the partial correlations against:
- a select list of traits, or
- an entire dataset
depending on the specified subcommand. This new script is meant to supercede
the `scripts/partial_correlations.py` script.
* Fix the check for errors
* Reorganise the order of arguments for the
`partial_correlations_with_target_traits` function: move the `method`
argument before the `target_trait_names` argument so that the common
arguments in the partial correlation computation functions share the same
order.
|
|
|
|
|
|
Rework the code to process the traits in a single iteration to improve
performance.
|
|
Return generator objects rather than pre-computed tuples to reduce the number
of iterations needed to process the data, and thus improve the performance of
the system somewhat.
|
|
After reworking the worker/runner to have a one-shot mode, add a function that
queues up the task and then runs the worker in the one-shot mode to process
the computation in the background.
|
|
Enable the running of the worker script in one-shot mode.
|
|
|
|
Enable the endpoint to actually compute partial correlations with selected
target traits rather than against an entire dataset.
Fix some issues caused by recent refactor that broke pcorrs against a dataset
|
|
Compute partial correlations against a selection of traits rather than against
an entire dataset.
|
|
* Extract the common error checking code into a separate function
* Rename the function to make its use clearer
|