Age | Commit message | Author |
|
* gn3/db/correlations.py (__fetch_data__): Ignore "Too many args" [R0913]
error.
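
For illustration, pylint's R0913 ("too-many-arguments") check can be silenced for a single function with an inline comment on the `def` line. This is only a minimal sketch: the function name and parameters below are hypothetical and are not the real `__fetch_data__` signature.

```python
# Hypothetical stand-in for a function that takes more arguments than
# pylint's default limit allows, which triggers R0913.
def fetch_data(  # pylint: disable=R0913
        conn, sample_ids, db_name, trait_table, joins, limit):
    """Return the arguments as a dict; a placeholder for a real query."""
    return {
        "conn": conn, "sample_ids": sample_ids, "db_name": db_name,
        "trait_table": trait_table, "joins": joins, "limit": limit}
```

Scoping the comment to one function keeps the check active for the rest of the module.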
|
|
* gn3/db/correlations.py (__build_query__): Ignore the "sample_ids" and
"joins" types when calling build_query_sgo_lit_corr
(fetch_all_database_data): Ignore the return type.
TODO: Ping Alex/Arun to fix this.
|
|
ATM, it's very difficult to work out the correct type that is returned. Ignore
this for now and fix it later.
|
|
* gn3/genodb.py (get): Delete function.
(matrix): Use db.txn.get instead of get.
|
|
* gn3/genodb.py (GenotypeMatrix): Match class and function names.
|
|
db is unused. nrows and ncols are available from the shapes of the array and
transpose numpy arrays.
* gn3/genodb.py (GenotypeMatrix)[db, nrows, ncols]: Delete fields.
* gn3/genodb.py (matrix): Do not initialize db, nrows and ncols fields.
|
|
* gn3/genodb.py: Mention reading entire matrix in module docstring.
|
|
- Have "Pearson's r" and "Spearman's rho" as the only valid choices for the
partial correlations
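
A restriction like this could be enforced with a small validation helper. The sketch below is an assumption, not the code from this commit: the names `VALID_METHODS` and `validate_method` are hypothetical, and so is the case-insensitive comparison.

```python
# The two method names come from the commit message; everything else
# here (constant name, function name, normalisation) is hypothetical.
VALID_METHODS = ("pearson's r", "spearman's rho")

def validate_method(method: str) -> str:
    """Return the normalised method name, or raise ValueError."""
    normalised = method.strip().lower()
    if normalised not in VALID_METHODS:
        raise ValueError(f"Invalid correlation method: {method!r}")
    return normalised
```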
|
|
* gn3/genodb.py (Matrix): Rename to GenotypeMatrix.
(matrix): Update invocation of Matrix.
|
|
* gn3/genodb.py: Document nparray in the module docstring.
(nparray): New function.
|
|
The genotype database now stores the current version of the matrix alone in a
read-optimized form, while storing the older versions of the matrix in a more
compressed form. We are only interested in the current version of the
matrix. So, always use the read optimized storage.
* gn3/genodb.py (Matrix)[row_pointers, column_pointers]: Delete fields.
[array, transpose]: New fields.
* gn3/genodb.py (matrix, row, column): Read from read-optimized storage.
(vector_ref): Delete function.
|
|
* gn3/genodb.py: Remove blank line in module docstring.
|
|
We rewrite genodb using only functions. This makes for much more readable
code.
* gn3/genodb.py: Rewrite without classes.
|
|
* gn3/genodb.py (Matrix.__init__): Retrieve column pointers from database.
(row): Abstract out vector access code to ...
(Matrix.__vector): ... here.
(Matrix.column): New method.
|
|
The genotype database format now supports versioning of matrices. So, we
update genodb.py to return only the most recent genotype matrix.
* gn3/genodb.py (GenotypeDatabase.matrix): Return only the most recent
genotype matrix.
|
|
* gn3/genodb.py (GenotypeDatabase.__init__): Open genotype database in
read-only mode.
|
|
* gn3/genodb.py (GenotypeDatabase.__init__): Do not create genotype database
if it does not exist.
|
|
It has been decided that the genotype database will use little endianness
wherever applicable.
* gn3/genodb.py (Matrix.__init__): Remove TODO note to decide on endianness.
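
To show what little endianness means when reading packed integers, here is a minimal sketch using the standard `struct` module. The 64-bit unsigned width is an assumption made for the example; the commit does not state which integer sizes the database uses.

```python
import struct

# Eight bytes encoding the integer 1 in little-endian order: the least
# significant byte comes first.
raw = bytes([1, 0, 0, 0, 0, 0, 0, 0])

# "<" selects little-endian byte order; "Q" is an unsigned 64-bit integer
# (the width is assumed here for illustration).
(value,) = struct.unpack("<Q", raw)

# The same bytes read as big-endian give a very different number.
(big_endian_value,) = struct.unpack(">Q", raw)
```

Fixing the byte order in the format string keeps reads independent of the host machine's native order.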
|
|
* gn3/genodb.py (GenotypeDatabase.get_metadata, GenotypeDatabase.matrix): Do
not terminate database strings with the null character.
|
|
genodb is a tiny library to read our new genotype database file format.
* gn3/genodb.py: New file.
|
|
|
|
Extract the utility functions to help with understanding what the
`fetch_all_database_data` function is doing. This helps with maintenance.
|
|
The `fix_strains` function works on the trait data, not the basic trait
info. This commit fixes the arguments passed to the function, and also some
bugs in the function.
|
|
|
|
|
|
* gn3/db/sample_data.py: Remove "collections" import. Add "Optional" import.
(get_case_attributes): Return the results of "fetchall" from the case
attributes.
* tests/unit/db/test_sample_data.py (test_get_case_attributes): Update failing
test.
|
|
|
|
Use new external script to run the partial correlations for both cases,
i.e.
- against an entire dataset, or
- against selected traits
|
|
* Add a new script to compute the partial correlations against:
- a select list of traits, or
- an entire dataset
depending on the specified subcommand. This new script is meant to supersede
the `scripts/partial_correlations.py` script.
* Fix the check for errors
* Reorganise the order of arguments for the
`partial_correlations_with_target_traits` function: move the `method`
argument before the `target_trait_names` argument so that the common
arguments in the partial correlation computation functions share the same
order.
|
|
|
|
|
|
Rework the code to process the traits in a single iteration to improve
performance.
|
|
Return generator objects rather than pre-computed tuples to reduce the number
of iterations needed to process the data, and thus improve the performance of
the system somewhat.
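
The difference between the two approaches can be sketched as follows. The trait records and function names are hypothetical and only illustrate the general technique, not the code changed in this commit.

```python
# Hypothetical trait records; the field names are illustrative only.
TRAITS = [{"name": "t1", "values": [1.0, 2.0]},
          {"name": "t2", "values": [3.0, 5.0]}]

def trait_means_eager(traits):
    """Build the whole tuple up front, before the caller sees any result."""
    return tuple((t["name"], sum(t["values"]) / len(t["values"]))
                 for t in traits)

def trait_means_lazy(traits):
    """Yield one result at a time; work happens as the caller iterates."""
    for trait in traits:
        yield trait["name"], sum(trait["values"]) / len(trait["values"])

# Consuming the generator computes and filters each trait in a single pass,
# without materialising an intermediate tuple.
names = [name for name, mean in trait_means_lazy(TRAITS) if mean > 2.0]
```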
|
|
After reworking the worker/runner to have a one-shot mode, add a function that
queues up the task and then runs the worker in the one-shot mode to process
the computation in the background.
|
|
|
|
Enable the endpoint to actually compute partial correlations with selected
target traits rather than against an entire dataset.
Fix some issues caused by the recent refactor that broke pcorrs against a dataset.
|
|
Compute partial correlations against a selection of traits rather than against
an entire dataset.
|
|
* Extract the common error checking code into a separate function
* Rename the function to make its use clearer
|
|
Remove an unnecessary looping construct to help with speeding up the partial
correlations somewhat.
|
|
non-CaseAttribute headers (previously, this caused issues if someone was
adding case attributes to a file that already contained some case
attributes)
|
|
Apparently max(string1, string2) in Python gets the string that is
highest alphabetically, but I'm pretty sure this line was intended
to get the header with the most items (which this commit doesn't fully
address; you could still end up with a situation where some case
attributes were removed while others were added, though that should be
rare)
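
The behaviour described above can be reproduced with a short sketch. The header strings are made up for the example, and selecting by column count with `key=` is one possible approach, not necessarily the change made in this commit.

```python
# Hypothetical header rows, as comma-separated strings.
old_headers = "Strain Name,Value,SE,N"
new_headers = "Strain Name,Value,SE,N,Sex,Age"
other = "Z"

# Without a key, max() compares strings lexicographically, so the short
# string "Z" beats the longer header row.
lexicographic = max(new_headers, other)

# With a key, max() can pick the header row with the most columns.
longest = max(old_headers, new_headers, key=lambda h: len(h.split(",")))
```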
|
|
* gn3/csvcmp.py (get_allowable_sampledata_headers): Delete it.
* tests/unit/test_csvcmp.py: Remove "get_allowable_sampledata_headers" import.
(test_get_allowable_csv_headers): Delete it.
|
|
* gn3/db/sample_data.py (get_trait_csv_sample_data): Strip out "\n", "\t", or
"\r" from the sample data. See:
<https://issues.genenetwork.org/issues/csv-error-ITP_10001-longevity-data-set.html>
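
Stripping these characters from a value can be sketched as below. The helper name and the sample input are hypothetical; the real `get_trait_csv_sample_data` may do this differently, for example inside the SQL query.

```python
import re

def strip_control_whitespace(value: str) -> str:
    """Remove newline, tab and carriage-return characters from a value."""
    return re.sub(r"[\n\t\r]", "", value)

# A made-up sample row containing stray tab and line-ending characters.
cleaned = strip_control_whitespace("BXD1\t,18.5\r\n")
```

Removing these characters before the data is written out keeps each sample on a single CSV row.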
|
|
|
|
|
|
* gn3/db/sample_data.py (delete_sample_data): If an id is present in the column header, use it.
* tests/unit/db/test_sample_data.py (test_delete_sample_data): Update tests to
capture the above.
|
|
* gn3/db/sample_data.py (insert_sample_data): If an id is present in the column header, use it.
* tests/unit/db/test_sample_data.py (test_insert_sample_data): Update tests to
capture the above.
|
|
* gn3/db/sample_data.py: Import "parse_csv_column".
(update_sample_data): If an id is present in the column header, use it.
* tests/unit/db/test_sample_data.py (test_update_sample_data): Update tests to
capture the above.
|
|
* gn3/db/sample_data.py (get_case_attributes): New function.
* tests/unit/db/test_sample_data.py (test_get_case_attributes): Test case for
the above.
|
|
* gn3/db/sample_data.py: Run "python black -l 79 ..."
|