From 2d7f9bd464905d8205ecc4acb14ee9e261bf5417 Mon Sep 17 00:00:00 2001
From: Munyoki Kilyungi
Date: Tue, 8 Aug 2023 12:33:14 +0300
Subject: Create new issue regarding modelling phenotype data

Signed-off-by: Munyoki Kilyungi
---
 issues/modelling-phenotype-data.gmi | 44 +++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 issues/modelling-phenotype-data.gmi

diff --git a/issues/modelling-phenotype-data.gmi b/issues/modelling-phenotype-data.gmi
new file mode 100644
index 0000000..7005bfe
--- /dev/null
+++ b/issues/modelling-phenotype-data.gmi
@@ -0,0 +1,44 @@
+# Modelling Phenotype Data
+
+* assigned: robw, bonfacem
+* tags: critical
+* contact: pjotrp
+
+## Introduction
+
+Consider the following columns from our phenotype
+table:
+
+    Pre_publication_description
+    Post_publication_description
+    Original_description
+
+    Pre_publication_abbreviation
+    Post_publication_abbreviation
+
+
+Ideally, every trait in GeneNetwork would have pre- and post-publication descriptions and abbreviations from initial data entry. However, this is not the case.
+
+Also, the pre- and post-publication values do not always match. For example, the following query counts the phenotypes whose pre- and post-publication descriptions are both set but differ:
+
+```
+MariaDB [db_webqtl]> SELECT COUNT(*) FROM Phenotype where Pre_publication_description != Post_publication_description AND Post_publication_description IS NOT NULL AND Pre_publication_description IS NOT NULL;
++----------+
+| COUNT(*) |
++----------+
+|     4684 |
++----------+
+1 row in set (0.03 sec)
+```
+
+Pre-publication descriptions and abbreviations are shown until a PMID is attached. However, many users forget to attach the PMID after the paper has been published. Moreover, many traits in GN are never published, yet their value as data depends on the full "post" description.
+
+After the RDF work, we should explore linking pre-prints to their canonical publications to avoid duplication.
+
+## Meeting Agenda
+
+Date: TBA
+
+* How do we handle private/public data and metadata? Here, data are the vectors of numbers; metadata include the pre-/post-publication descriptions and abbreviations.
+
+* Given the above problem, what is the FAIR way to approach it? How do we enable data sharing in a way that encourages even the paranoid to share?
-- 
cgit v1.2.3