From e0386dcad4d10f5d303b96fb7335d4a90e473297 Mon Sep 17 00:00:00 2001
From: Frederick Muriuki Muriithi
Date: Thu, 20 Oct 2022 07:28:43 +0300
Subject: Issues (materialised-views-for-correlations): Update issue

* issues/materialised-views-for-correlations.gmi: Add information on the possible data that could show up in the materialised views and the database tables that data could come from.
---
 issues/materialised-views-for-correlations.gmi | 36 ++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

(limited to 'issues')

diff --git a/issues/materialised-views-for-correlations.gmi b/issues/materialised-views-for-correlations.gmi
index 10b878d..c8617fe 100644
--- a/issues/materialised-views-for-correlations.gmi
+++ b/issues/materialised-views-for-correlations.gmi
@@ -22,7 +22,7 @@ There is some work on
 => /topics/genotype-database the genotype database
 that should allow intermediate materialised views to be stored in lmdb
-There might need to be multiple materialised views for the different types of traits, i.e.
+There might need to be multiple materialised views for the different types of datasets/traits, i.e.
 * Phenotypes (Publish)
 * Genotypes (Geno)
 * mRNA (ProbeSet)
@@ -52,9 +52,41 @@ Possible candidate queries for materialisation are:
 The method above is doing way too much - it should probably be split into separate methods for each class, to simplify the code a little and make it clearer what each part does before reworking the queries for the materialized view.
-=> https://github.com/genenetwork/genenetwork2/blob/a2b837801d479ed2fb06ca33c07de9c271532c46/wqflask/base/trait.py#L386-L610
 The method above is also doing way too much.
 Both methods above do not have the metadata, so probably also have a look at adding the metadata to the materialized views
+
+### Possible "Entities" in Materialised Views
+
+In my early (2022-10-10) assessment, for each of the different types of datasets/traits mentioned in the description, we might need the data for the following "entities" in the materialized views:
+
+* Species data (db tables: Species)
+* Strains data (db tables: Strain, NStrain)
+* Group info (db tables: InbredSet)
+* Dataset Info and Metadata
+* Trait data and metadata
+
+The following are the database tables classified into their various functions (in the order order: Genotypes, mRNA, Phenotypes, Temp):
+
+#### Dataset Information Tables
+
+* GenoFreeze
+* ProbeSetFreeze
+* PublishFreeze
+
+#### Dataset Metadata Tables
+
+* Geno
+* ProbeSet
+* Phenotype
+* Temp
+
+#### Sample Data
+
+* GenoData and GenoSE
+* ProbeSetData and ProbeSetSE
+* PublishData and PublishSE
+* TempData
-- 
cgit v1.2.3
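The table classification added by this patch can be read as a lookup from dataset/trait type to candidate source tables. The sketch below is one possible way to write that down in Python. It is not code from GeneNetwork: `TraitTables`, `TABLES_BY_TYPE`, `SHARED_TABLES` and `source_tables` are hypothetical names, and pairing the tables with each type by their position in the lists (Genotypes, mRNA, Phenotypes, Temp) is an assumption based on the order stated in the issue.

```python
from typing import List, NamedTuple, Optional, Tuple


class TraitTables(NamedTuple):
    """Candidate source tables for one dataset/trait type (hypothetical type)."""
    dataset_info: Optional[str]    # "Freeze" table; the issue lists none for Temp
    metadata: str                  # dataset metadata table
    sample_data: Tuple[str, ...]   # sample values and, where listed, standard errors


# Table names are taken from the issue text; the keys and the pairing by list
# position are assumptions, not something the issue states explicitly.
TABLES_BY_TYPE = {
    "Genotypes": TraitTables("GenoFreeze", "Geno", ("GenoData", "GenoSE")),
    "mRNA": TraitTables("ProbeSetFreeze", "ProbeSet", ("ProbeSetData", "ProbeSetSE")),
    "Phenotypes": TraitTables("PublishFreeze", "Phenotype", ("PublishData", "PublishSE")),
    "Temp": TraitTables(None, "Temp", ("TempData",)),
}

# Species, strain and group tables that the issue lists for every type.
SHARED_TABLES = ("Species", "Strain", "NStrain", "InbredSet")


def source_tables(trait_type: str) -> List[str]:
    """Return the tables a materialised view for `trait_type` might draw from."""
    entry = TABLES_BY_TYPE[trait_type]
    tables = list(SHARED_TABLES)
    if entry.dataset_info is not None:
        tables.append(entry.dataset_info)
    tables.append(entry.metadata)
    tables.extend(entry.sample_data)
    return tables
```

Calling `source_tables("Temp")` under these assumptions returns the shared tables followed by `Temp` and `TempData`, which makes it visible that the Temp type has no "Freeze" table in the lists above.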