author: Munyoki Kilyungi, 2023-08-08 12:33:14 +0300
committer: Munyoki Kilyungi, 2023-08-08 12:33:14 +0300
commit: 2d7f9bd464905d8205ecc4acb14ee9e261bf5417 (patch)
tree: 8c664ddf1381c62e337cd0816aab564fc8ff75d5 /issues/modelling-phenotype-data.gmi
parent: 3c9091681d58c3bdb4c819c93226e66279847042 (diff)
Create new issue regarding modelling phenotype data
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
Diffstat (limited to 'issues/modelling-phenotype-data.gmi')
-rw-r--r-- issues/modelling-phenotype-data.gmi 44
1 files changed, 44 insertions, 0 deletions
diff --git a/issues/modelling-phenotype-data.gmi b/issues/modelling-phenotype-data.gmi
new file mode 100644
index 0000000..7005bfe
--- /dev/null
+++ b/issues/modelling-phenotype-data.gmi
@@ -0,0 +1,44 @@
+# Modelling Phenotype Data
+
+* assigned: robw, bonfacem
+* tags: critical
+* contact: pjotrp
+
+## Introduction
+
+Consider the following columns from our phenotype
+table:
+
+ Pre_publication_description
+ Post_publication_description
+ Original_description
+
+ Pre_publication_abbreviation
+ Post_publication_abbreviation
+
+
+Ideally, all traits in GeneNetwork would have pre- and post-publication descriptions and abbreviations upon initial data entry. This, however, is not the case.
+
+Also, the pre- and post-publication data are not always the same, as evidenced by:
+
+```
+MariaDB [db_webqtl]> SELECT COUNT(*) FROM Phenotype where Pre_publication_description != Post_publication_description AND Post_publication_description IS NOT NULL AND Pre_publication_description IS NOT NULL;
++----------+
+| COUNT(*) |
++----------+
+| 4684 |
++----------+
+1 row in set (0.03 sec)
+```
+
+Pre-publication descriptions/abbreviations are shown until a PMID is attached. However, many users forget to attach the PMID after the paper has been published. Regardless, many traits in GN are never published, and their value is a function of the full "post" description.
+
+After the RDF work, we should explore pre-linking pre-prints with canonical publications to avoid duplication.
+
+## Meeting Agenda
+
+Date: TBA
+
+* How do we handle private/public data and metadata? Data are the vectors of numbers; metadata include the pre-/post-publication descriptions and abbreviations.
+
+* Given the above problem, what's the FAIR way to go about it? How do we allow data sharing in a way that encourages even the paranoid?
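
As an aid to the discussion, the display rule from the introduction (pre-publication fields are shown until a PMID is attached) can be sketched against a toy table. This is only an illustration, not GeneNetwork code: it uses an in-memory SQLite database rather than MariaDB, the `Id` and `pmid` columns and the `shown_description` helper are hypothetical, all rows are invented, and it assumes the post-publication description is what gets shown once a PMID exists. Only the two `*_description` column names and the counting predicate come from the issue.

```python
import sqlite3

# Toy stand-in for the Phenotype table. Only the *_description column names
# come from the issue; the Id and pmid columns and all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Phenotype (
        Id INTEGER PRIMARY KEY,
        Pre_publication_description TEXT,
        Post_publication_description TEXT,
        pmid INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO Phenotype VALUES (?, ?, ?, ?)",
    [
        (1, "pre A", "post A", 12345),  # PMID attached: post is shown
        (2, "pre B", "post B", None),   # no PMID: pre is shown
        (3, None, "post C", None),      # no PMID and no pre: nothing shown
        (4, "same", "same", None),      # pre and post agree
    ],
)


def shown_description(pre, post, pmid):
    """Assumed rule: show post once a PMID is attached, otherwise pre."""
    return post if pmid is not None else pre


shown = {
    row[0]: shown_description(*row[1:])
    for row in conn.execute(
        "SELECT Id, Pre_publication_description, "
        "Post_publication_description, pmid FROM Phenotype ORDER BY Id"
    )
}

# Same predicate as the COUNT(*) query in the issue.
(differing,) = conn.execute(
    "SELECT COUNT(*) FROM Phenotype "
    "WHERE Pre_publication_description != Post_publication_description "
    "AND Post_publication_description IS NOT NULL "
    "AND Pre_publication_description IS NOT NULL"
).fetchone()
```

Row 3 shows the case the issue worries about: an unpublished trait whose only useful text sits in the "post" column is displayed with nothing under this rule.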