author Arun Isaac 2022-07-19 15:02:48 +0530
committer Arun Isaac 2022-07-19 15:02:48 +0530
commit 35c4cec2c3c1593b59bc29fa5a738f857ecc270f
tree 182237d08d59a74505f5d418f0905050cc2e5b00 /issues/quality-control
parent 44d951234e82dc27541035d0050cc6c04719ab14
download gn-gemtext-35c4cec2c3c1593b59bc29fa5a738f857ecc270f.tar.gz
Rescue quality control issues from topics.
Diffstat (limited to 'issues/quality-control')
-rw-r--r-- issues/quality-control/qc-checks.gmi | 55
-rw-r--r-- issues/quality-control/qc.gmi | 41
-rw-r--r-- issues/quality-control/ui-design.gmi | 49
3 files changed, 145 insertions, 0 deletions
diff --git a/issues/quality-control/qc-checks.gmi b/issues/quality-control/qc-checks.gmi
new file mode 100644
index 0000000..dc18f94
--- /dev/null
+++ b/issues/quality-control/qc-checks.gmi
@@ -0,0 +1,55 @@
+# Quality Control Checks
+
+1. ProbeSetId (Affymetrix format):
+
+We favour using Illumina, Affymetrix, and other platform formats.
+
+Custom formats require a new annotation file to be created.
+
+We usually use Ensembl IDs or Gene IDs.
+
+1.1 Ensembl transcript IDs usually have duplicates that need to be pruned.
+
+ENSMBL1234
+
+## Example Gene Symbol to ProbeSetId
+
+AFFX-BkGr-GC03_st -> TC0500002136.mm.2
+
+2. Inbred strain names should use the long form:
+
+B6 -> C57BL/6
+D2 -> DBA/2
+
+3. Probeset IDs that don't have any values should be pruned:
+
+For example, an Affymetrix data set might have ~28,000 entries, while the pruned
+data set that is allowed into GeneNetwork will have 22,000 entries.
+
+4. The standard error between male and female mice has to be computed.
+
+5. SE values have to be computed to at least 6 decimal places.
+
+6. The average between male and female mice has to be computed to 3 decimal places.
+
+7. Datasets/studies having the same ProbeSetID should be grouped together.
+
+8. There should be no trailing spaces in data cells.
+
+9. Entries should have the same capitalization style.
+
+10. Assess phenotypes for normality with the Shapiro-Wilk test.
+
+11. Check for an annotations file.
+
+12. Check for CRLF line endings.
+
+13. Check for UTF-8 encoding.
+
+## Tags
+
+* assigned: jgart
+* type: feature-request
+* priority: high
+* status: unclear
+* keywords: quality control
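Checks 8, 12, and 13 in qc-checks.gmi are mechanical enough to sketch in a few lines. The following is a minimal Python illustration, not the project's implementation: the function name `check_raw_bytes` and the message wording are hypothetical, and it assumes the upload is a tab-separated file read as raw bytes.

```python
def check_raw_bytes(data: bytes) -> list[str]:
    """Return human-readable problems found in the bytes of a TSV file."""
    problems = []
    # Check 13: the file must decode as UTF-8; stop early if it does not.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8 (byte offset {err.start})"]
    # Check 12: flag Windows-style CRLF line endings.
    if "\r\n" in text:
        problems.append("file contains CRLF line endings")
    # Check 8: no data cell may end with a space.
    for lineno, line in enumerate(text.splitlines(), start=1):
        for colno, cell in enumerate(line.split("\t"), start=1):
            if cell.endswith(" "):
                problems.append(f"line {lineno}, column {colno}: trailing space")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.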
diff --git a/issues/quality-control/qc.gmi b/issues/quality-control/qc.gmi
new file mode 100644
index 0000000..7b5d1e4
--- /dev/null
+++ b/issues/quality-control/qc.gmi
@@ -0,0 +1,41 @@
+# Quality Control Project
+
+Develop an app with a web interface to automate the job of cleaning TSV data
+files for entry. The app would be used by a group of users on a network to
+upload data.
+
+QC should be embedded in the data uploader that Bonface has written.
+
+* Upload data through REST API - it goes into a temp dir for a user (data is in
+ escrow) - Bonface wrote this already
+* Run QC - what Arthur proposes (start here)
+* Show results - run tools (hard part!)
+* User can say - please accept data (Bonface wrote this)
+* Curator accepts data (different person!) (Bonface wrote this)
+* Data gets piped into GN proper
+
+The QC step consists of
+
+* Standard checks - some GN tools, such as outliers
+* Run mapping
+
+So, even though the data is in 'escrow' we should be able to use it as
+something that is in the database. GN1 does some of that. This is
+where Arun comes in - we need to have a common handler for data that
+is in the database and data that is in escrow. My idea is that this
+will all be text files (truth files). A simple first QC step is to
+check that all fields in the table are numbers where they should be, not
+text.
+
+Note we could run QC through the REST API too. That would allow it to
+be run from R and Python and Jupyter notebooks. Make it part of GN3.
+
+The tricky part is still how the data is handled in escrow.
+
+## Tags
+
+* assigned: jgart
+* priority: high
+* type: feature-request
+* status: in progress, beta
+* keywords: quality control
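The "simple first QC step" described in qc.gmi, checking that fields are numbers where they should be, might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `non_numeric_cells` and the `label_columns` parameter are hypothetical, the first row is assumed to be a header, and the leading column(s) are assumed to hold identifiers rather than measurements.

```python
import csv
import io


def non_numeric_cells(tsv_text, label_columns=1):
    """Yield (line, column, value) for data cells that do not parse as numbers.

    Assumes the first row is a header and the first `label_columns`
    columns hold identifiers rather than measurements.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for lineno, row in enumerate(reader, start=2):
        for colno, cell in enumerate(row[label_columns:], start=label_columns + 1):
            try:
                float(cell)
            except ValueError:
                yield lineno, colno, cell
```

Note that `float` also accepts strings such as "nan" and "inf" and tolerates surrounding whitespace, so a stricter check may be needed depending on how missing values are meant to be encoded. Empty cells are reported as non-numeric.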
diff --git a/issues/quality-control/ui-design.gmi b/issues/quality-control/ui-design.gmi
new file mode 100644
index 0000000..029a2b8
--- /dev/null
+++ b/issues/quality-control/ui-design.gmi
@@ -0,0 +1,49 @@
+# UI Design
+
+1. Input/Receive Data in UI (drag and drop/upload submit form)
+
+2. Select Mouse
+
+"What type of Group are you using?"
+
+> (AKXD, BXH, Mouse Diversity Panel, BXD)
+
+3. "What is your platform?"
+
+> (Aff, Illumina, ...)
+
+If Affymetrix (Aff) is selected then there should be various options
+like Clariom S.
+
+If the platform you chose is not available:
+
+ Tell the PI that they should request that their platform be added to the list.
+
+ They can contact us via email.
+
+4. Allow Excel file upload?
+
+## More Example UI Interactions and Checks
+
+"If your dataset does not comply with GN then you can try uploading your
+dataset so that we can inspect it."
+
+"Your dataset has two erroneous entries: Gene Accession Gene."
+
+"The last two columns have the wrong format for the strain name."
+
+"Here's the format your dataset should follow."
+
+> ProbeSetID Strains ...
+
+"Inbred Set ID 1 is the same as BXD"
+
+> These are the strains: ...
+
+## Tags
+
+* assigned: jgart
+* type: feature-request
+* status: unclear
+* priority: medium
+* keywords: UI, quality control
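The example messages in ui-design.gmi suggest a header check against the expected "ProbeSetID Strains ..." layout. A minimal sketch of how such messages could be produced follows. Everything here is an assumption made for illustration: the function name `header_messages`, the message wording, and the idea that the caller supplies the set of valid strain names for the selected group.

```python
def header_messages(header, known_strains):
    """Return UI messages for a header row shaped like: ProbeSetID, strain, strain, ..."""
    messages = []
    # The first column is expected to carry the probe set identifier.
    if not header or header[0] != "ProbeSetID":
        messages.append('The first column should be named "ProbeSetID".')
    # Every remaining column should be a strain name known for the group.
    bad = [i for i, name in enumerate(header[1:], start=2) if name not in known_strains]
    if bad:
        cols = ", ".join(str(i) for i in bad)
        messages.append(f"Unrecognised strain name in column(s): {cols}.")
    return messages
```

For instance, calling `header_messages(["ProbeSetID", "BXD1", "bxd-2"], {"BXD1", "BXD2"})` reports column 3, which a UI could turn into a message like the "wrong format for the strain name" example above.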