Age | Commit message (Collapse) | Author |
|
Test that the partial correlations endpoint responds with an appropriate
"not-found" message and the corresponding 404 status code when the primary
trait named in the request does not exist in the database.
Summary of the changes in each file:
* gn3/api/correlation.py: generalise the building of the response
* gn3/computations/partial_correlations.py: return with a "not-found" if the
primary trait does not exist in the database
* gn3/db/partial_correlations.py: Fix a number of bugs that led to exceptions
in the case that the primary trait did not exist
* pytest.ini: register a `slow` pytest marker
* tests/integration/test_partial_correlations.py: Add a new test to check for
an appropriate 404 response in case of a primary trait that does not exist
in the database.
|
|
Add a test for the partial correlations endpoint, with:
- no data in the request
- missing items in the data
Fix the bugs caught by the test
|
|
Add property tests using pytest and hypothesis to test that the expected
properties hold for the
`gn3.computations.partial_correlations.dictify_by_samples`
function.
|
|
Do all the work in a single iteration to avoid unnecessary iterations that
hamper performance.
|
|
Web servers are long-running processes, and Python is not very good at
cleaning up after itself, especially in forked processes. This leads to memory
errors in the web server after a while.
This commit removes the use of multiprocessing to avoid such failures.
|
|
This commit refactors the code to make it possible to use multiprocessing to
speed up the computation of the partial correlations.
The major refactor is to move the `__compute_trait_info__` function to the
top-level of the module, and provide to it all the other necessary context via
the new args.
|
|
In Python3, when slicing,
seq[:min(some_val, len(seq))] == seq[:some_val]
because Python3 will just return a copy of the entire sequence if `some_val`
happens to be greater than the length of the sequence.
This commit removes the unnecessary call to `min()`.
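
A minimal sketch of the equivalence described above, using a hypothetical list
(`seq` and the tested values of `some_val` are illustrative only, not taken
from the code base):

```python
# Hypothetical data, purely for illustration.
seq = [10, 20, 30]

# The clamped and the plain slice agree for every stop index tried here,
# including ones past the end of the list.
for some_val in (0, 2, 3, 5, 100):
    clamped = seq[:min(some_val, len(seq))]
    plain = seq[:some_val]
    assert clamped == plain

# A stop index past the end simply yields a copy of the whole list.
whole = seq[:100]
```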
|
|
If a user replaces an individual value with an "x", delete that data entry
from the respective table. Deletion here is the only option since by default
the Nstrain, PublishData and PublishSE tables don't accept null values. Note
that deleting all 3 values is equivalent to removing the sample from the CSV
file.
* gn3/db/traits.py (update_sample_data): If a value is "x", delete it from the
respective table.
|
|
When a user changed a value from "x" to "0" (or any other value), an "update"
statement was being run. Since there was no existing row to update, no new
value was being inserted, so to the end user the edit appeared to have no
effect. This commit fixes that by doing an insert whenever a change from "x"
to another value is performed.
* gn3/db/traits.py (update_sample_data): Add insert statements whenever an
"update" statement returns a 0 row-count.
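
The update-then-insert fallback can be sketched as follows. This is an
illustration only: it uses the standard-library `sqlite3` module and a
hypothetical `sample_data` table, not the actual database driver, tables or
`update_sample_data` code from `gn3/db/traits.py`.

```python
import sqlite3

def save_value(conn, strain_id, value):
    """Update the row if it exists; insert it when the update touches 0 rows."""
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE sample_data SET value = ? WHERE strain_id = ?",
        (value, strain_id))
    if cursor.rowcount == 0:
        # Nothing was updated, so the row does not exist yet: insert it.
        cursor.execute(
            "INSERT INTO sample_data (strain_id, value) VALUES (?, ?)",
            (strain_id, value))
    conn.commit()

# Hypothetical in-memory table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data (strain_id INTEGER PRIMARY KEY, value REAL)")
save_value(conn, 1, 0.0)  # no row yet, so this inserts
save_value(conn, 1, 2.5)  # row exists, so this updates
rows = conn.execute("SELECT strain_id, value FROM sample_data").fetchall()
```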
|
|
The PublishFreeze table isn't necessary in phenotype queries, since
PublishFreeze.Id = InbredSet.Id (for the purposes of identifying traits,
at least)
|
|
In line 91 of gn3/db/traits.py, there was an if statement "if
record[key] else 'x'" that was treating values of 0 as False, so I
changed it to explicitly check that values aren't None
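
A small sketch of the difference, assuming a hypothetical `record` dict (the
helper names below are illustrative and do not appear in `gn3/db/traits.py`):

```python
# Hypothetical record: a legitimate zero value and a missing value.
record = {"value": 0, "error": None}

def render_truthy(record, key):
    # Truthiness check: 0 is falsy, so it is wrongly rendered as "x".
    return record[key] if record[key] else "x"

def render_not_none(record, key):
    # Explicit None check: only a missing value is rendered as "x".
    return record[key] if record[key] is not None else "x"
```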
|
|
The PublishFreeze table is actually unnecessary for this query, since
the group ID (inbred_set_id) should be passed in and that ID is in the
PublishXRef table (so no need to join with PublishFreeze)
|
|
The function is a generator function, since it uses a `yield` statement, and
thus returns a generator object that contains a tuple object, rather than the
tuple itself. This commit fixes that. We also remove a duplicate import.
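
The distinction the message describes can be illustrated with two hypothetical
functions (the names and the tuple are made up; the message does not say
which function in the code base was affected):

```python
import types

def with_yield():
    # Any function containing `yield` is a generator function.
    yield ("trait", 1)

def with_return():
    # A plain function hands back the tuple itself.
    return ("trait", 1)

gen = with_yield()      # a generator object, not a tuple
first_item = next(gen)  # the tuple only appears once the generator is consumed
direct = with_return()
```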
|
|
Indent the code correctly.
|
|
The queries run in the `get_trait_csv_sample_data` and
`retrieve_publish_trait_data` functions in the `gn3.db.traits` module were
mostly similar. This commit removes that duplication, by making the
`get_trait_csv_sample_data` function make use of the results from calling the
`retrieve_publish_trait_data` function.
|
|
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/partial-correlations.gmi
|
|
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/partial-correlations.gmi
|
|
* Use the correct case for the keys in order to retrieve the correct values.
|
|
* The string had the f-string syntax to format the values to be inserted into
the string, but was missing the 'f' before the opening quotes to signify to
Python that this was an f-string. This commit fixes that.
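
For illustration, with a hypothetical variable `name` (the actual string from
the commit is not shown here):

```python
name = "BXD"  # hypothetical value

# Without the `f` prefix the braces are kept literally.
without_prefix = "Group: {name}"

# With the `f` prefix the value is interpolated.
with_prefix = f"Group: {name}"
```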
|
|
* Some traits have a name composed of all numerals, which leads to the names
being interpreted as numbers. This commit casts them to strings to avoid
subtle bugs where the code fails.
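
One way such a bug can show up, sketched with a hypothetical mapping keyed by
trait name (the data and variable names are assumptions for illustration):

```python
# Hypothetical lookup table keyed by trait names stored as strings.
traits = {"10001": "an all-numeral trait name"}

# A name that arrives as a number does not match the string key.
name_from_source = 10001
missed = traits.get(name_from_source)

# Casting to str first makes the lookup behave as intended.
found = traits.get(str(name_from_source))
```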
|
|
* Remove all key-value pairs whose value is None.
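
A minimal sketch of this kind of filtering, using a hypothetical dict and
helper name. Note that only `None` is dropped; falsy values such as `0` or an
empty string are kept.

```python
def drop_none_values(mapping):
    """Return a new dict without the pairs whose value is None."""
    return {key: value for key, value in mapping.items() if value is not None}

# Hypothetical input for demonstration.
cleaned = drop_none_values({"name": "t1", "mean": 0, "lrs": None, "note": ""})
```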
|
|
Issue:
https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/partial-correlations.gmi
* Replace unoptimised function with one optimised to give better performance.
The optimisation done here is to fetch multiple items/traits from the
database per query, rather than the original form, which fetched a single
item/trait from the database per query.
|
|
Issue: https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/partial-correlations.gmi
Comment:
https://github.com/genenetwork/genenetwork3/pull/67#issuecomment-1000828159
* Convert NaN values to None to avoid possible bugs with the string replace
method used before.
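
A sketch of converting NaN values to `None` on the values themselves, rather
than by replacing text in a serialised string. The helper name and sample data
are hypothetical, and this is not necessarily how the commit implements it.

```python
import math

def nan_to_none(value):
    """Return None for a float NaN, and the value unchanged otherwise."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

# Hypothetical result row for demonstration.
row = [0.25, float("nan"), "NaNtrait", 3]
converted = [nan_to_none(item) for item in row]
```

Because the check is done per value, a string that merely contains the text
"NaN" (such as `"NaNtrait"` above) is left untouched.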
|
|
Issue:
* Function
`gn3.computations.partial_correlations_optimised.partial_correlations_entry`
is a copy of the
`gn3.computations.partial_correlations.partial_correlations_entry`
function that is optimised for better performance.
The optimised function is intended to replace the unoptimised one, but it is
included in this commit for comparison purposes, and to maintain some
historical context for doing it this way.
|
|
Issue:
https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/partial-correlations.gmi
* In an attempt to optimise the performance of the partial correlations
feature, this commit reworks some database access functions to fetch
multiple items from the database, per query, unlike their original forms
which would fetch a single item per query.
This reduces queries to the database, and should hopefully improve the
responsiveness of the partial correlations feature.
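
The idea of fetching several items per query can be sketched as below. The
example uses the standard-library `sqlite3` module with a hypothetical
`traits` table and `fetch_many` helper; the real functions, schema and
database driver in gn3 differ.

```python
import sqlite3

def fetch_many(conn, names):
    """Fetch all the requested traits with a single query."""
    if not names:
        return []
    # One placeholder per name keeps the query parameterised.
    placeholders = ", ".join("?" for _ in names)
    query = (
        "SELECT name, description FROM traits "
        f"WHERE name IN ({placeholders}) ORDER BY name")
    return conn.execute(query, tuple(names)).fetchall()

# Hypothetical in-memory data for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traits (name TEXT PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO traits (name, description) VALUES (?, ?)",
    [("t1", "first"), ("t2", "second"), ("t3", "third")])
results = fetch_many(conn, ["t1", "t3", "missing"])
```

Here one round trip replaces three single-item queries; names that are not in
the table are simply absent from the result.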
|
|
Adds type hint for normalize_values function
|
|
The problem with using the "value" record is that it's a floating point
number. See
<https://www.bonfacemunyoki.com/post/2021-10-21-comparing-floating-point-numbers/>
on why comparing floating point numbers can be an issue.
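
A short, generic illustration of the issue (not taken from the commit itself):

```python
import math

total = 0.1 + 0.2

# Exact equality fails because of binary floating point rounding.
exact_match = total == 0.3

# A tolerance-based comparison treats the two values as equal.
close_match = math.isclose(total, 0.3)
```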
|
|
Sometimes, a user will try to insert data twice, or in some instances, two
different users will attempt the same inserts of the same records separately.
In such cases, ignore the insert, and return early.
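
A sketch of the early return, assuming a hypothetical `sample_data` table and
`insert_once` helper, with the standard-library `sqlite3` module standing in
for the real database:

```python
import sqlite3

def insert_once(conn, strain_id, value):
    """Insert a row unless it already exists; return True if a row was added."""
    existing = conn.execute(
        "SELECT 1 FROM sample_data WHERE strain_id = ?",
        (strain_id,)).fetchone()
    if existing is not None:
        return False  # duplicate insert: ignore it and return early
    conn.execute(
        "INSERT INTO sample_data (strain_id, value) VALUES (?, ?)",
        (strain_id, value))
    conn.commit()
    return True

# Hypothetical in-memory table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data (strain_id INTEGER PRIMARY KEY, value REAL)")
first = insert_once(conn, 1, 4.2)
second = insert_once(conn, 1, 4.2)
row_count = conn.execute("SELECT COUNT(*) FROM sample_data").fetchone()[0]
```

A check followed by an insert is not atomic on its own; in this sketch the
primary key would still reject a duplicate that slipped in between the two
statements.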
|
|
In the case when the user tries to delete the same data twice, prior to this
commit, an error was being generated. This commit remedies this by checking
if a record exists prior to deleting it.
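
The check-before-delete can be sketched in the same spirit. The table and the
`delete_if_exists` helper are hypothetical, and `sqlite3` is used only to keep
the example self-contained.

```python
import sqlite3

def delete_if_exists(conn, strain_id):
    """Delete the row if present; return True only if something was deleted."""
    existing = conn.execute(
        "SELECT 1 FROM sample_data WHERE strain_id = ?",
        (strain_id,)).fetchone()
    if existing is None:
        return False  # already gone: nothing to do, and no error
    conn.execute("DELETE FROM sample_data WHERE strain_id = ?", (strain_id,))
    conn.commit()
    return True

# Hypothetical in-memory table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data (strain_id INTEGER PRIMARY KEY, value REAL)")
conn.execute("INSERT INTO sample_data (strain_id, value) VALUES (1, 4.2)")
first_delete = delete_if_exists(conn, 1)
second_delete = delete_if_exists(conn, 1)
remaining = conn.execute("SELECT COUNT(*) FROM sample_data").fetchone()[0]
```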
|
|