Diffstat (limited to 'topics/quality-control')
-rw-r--r-- topics/quality-control/qc-checks.gmi 47
-rw-r--r-- topics/quality-control/shapiro-wilk-test.gmi 33
-rw-r--r-- topics/quality-control/ui-design.gmi 41
3 files changed, 121 insertions, 0 deletions
diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
new file mode 100644
index 0000000..0d04625
--- /dev/null
+++ b/topics/quality-control/qc-checks.gmi
@@ -0,0 +1,47 @@
+# Quality Control Checks
+
+1. ProbeSetId (Affymetrix format):
+
+We favour using Illumina, Affymetrix, and other platform formats.
+
+Custom formats require a new annotation file to be created.
+
+We usually use Ensembl IDs or Gene IDs.
+
+1.1 Ensembl transcript IDs usually have duplicates that need to be pruned.
+
+ENSMBL1234
+
+## Example Gene Symbol to ProbeSetId
+
+AFFX-BkGr-GC03_st -> TCO500002136.mm.2
+
+2. Inbred Strain names should prefer long form:
+
+B6 -> C57BL/6
+D2 -> DBA/2
+
+3. Probeset IDs that don't have any values should be pruned:
+
+For example, an Affymetrix data set might have ~28,000 entries, of which
+only 22,000 will be allowed into GeneNetwork.
+
+4. The standard error between male and female mice has to be computed.
+
+5. SE values have to be computed to at least 6 decimal places.
+
+6. The average between male and female mice has to be computed to 3 decimal places.
+
+7. Datasets/studies having the same ProbeSetID should be grouped together.
+
+8. There should be no trailing spaces in data cells.
+
+9. Entries should have the same capitalization style.
+
+10. Assess phenotypes for normality with the Shapiro-Wilk test.
+
+11. Check for annotations file.
+
+12. Check for CRLF.
+
+13. Check for UTF-8 encoding.
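Note on qc-checks.gmi: several of the checks above (2, 3, 8, 12 and 13) are mechanical enough to sketch in code. The following is a minimal Python sketch, not part of the commit. All function names are hypothetical, the strain mapping holds only the two examples given in check 2, and it assumes that CRLF line endings are something to flag and that the first cell of each row is the ProbeSetID.

```python
# Hypothetical helpers illustrating QC checks 2, 3, 8, 12 and 13.

# Short-to-long strain names; only the two examples from check 2.
STRAIN_LONG_FORM = {"B6": "C57BL/6", "D2": "DBA/2"}


def check_raw_bytes(raw):
    """Checks 12 and 13: report CRLF line endings and invalid UTF-8."""
    problems = []
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 (byte offset {exc.start})")
    if b"\r\n" in raw:
        problems.append("contains CRLF line endings")
    return problems


def cells_with_trailing_space(rows):
    """Check 8: return 1-based (row, column) positions of offending cells."""
    return [
        (r, c)
        for r, row in enumerate(rows, start=1)
        for c, cell in enumerate(row, start=1)
        if cell != cell.rstrip()
    ]


def expand_strain(name):
    """Check 2: prefer the long form; unknown names pass through unchanged."""
    return STRAIN_LONG_FORM.get(name, name)


def prune_empty(rows):
    """Check 3: drop rows whose value cells (all but the first) are empty."""
    return [row for row in rows if any(cell.strip() for cell in row[1:])]
```

Each helper takes already-parsed input (raw bytes or a list of rows) so that it can be combined with whatever file reader the upload step ends up using.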
diff --git a/topics/quality-control/shapiro-wilk-test.gmi b/topics/quality-control/shapiro-wilk-test.gmi
new file mode 100644
index 0000000..b1bdbfd
--- /dev/null
+++ b/topics/quality-control/shapiro-wilk-test.gmi
@@ -0,0 +1,33 @@
+# Shapiro-Wilk Test
+
+This document contains more information about QC step 10.
+
+https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html
+https://vedexcel.com/how-to-perform-a-shapiro-wilk-test-in-python/
+
+```
+Here is a simple QC procedure we may want to consider that was used by
+Megan and Camron in a recent paper that I have attached.
+
+
+Phenotypes were assessed for normality using the Shapiro–Wilk Test. Because
+some of the data residuals deviated significantly from normality, we used
+the orderNorm function to perform Ordered Quantile normalization [43] on all
+phenotypes.
+
+--
+Rob
+```
+
+```
+QTL analysis was performed in F2 mice using the R package R/qtl
+(RRID:SCR_009085) as previously described [11,29,42]. Quality checking
+of genotypes and QTL analysis were performed in R (https://www.r-project.org/)
+using R/bestNormalize (https://github.com/petersonR/bestNormalize) and
+R/qtl [42]. Phenotypes were assessed for normality using the Shapiro–Wilk
+Test. Because some of the data residuals deviated significantly from
+normality, we used the orderNorm function to perform Ordered Quantile
+normalization [43] on all phenotypes. QTL
+--
+A quantitative trait variant in Gabra2 underlies increased methamphetamine stimulant sensitivity
+```
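Note on shapiro-wilk-test.gmi: the quoted procedure (test for normality, then apply Ordered Quantile normalization) could be sketched in Python roughly as follows. This is not part of the commit and not the paper's code, which used R/bestNormalize. The function names and the 0.05 threshold are assumptions. `order_norm` uses the rank-based formula documented for orderNorm, g(x) = Φ⁻¹((rank(x) − 0.5) / n), with ties given their average rank. `looks_normal` relies on `scipy.stats.shapiro` from the SciPy page linked above, so it needs SciPy installed.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def order_norm(values):
    """Ordered Quantile style transform: map ranks to normal quantiles."""
    n = len(values)
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


def looks_normal(values, alpha=0.05):
    """True if the Shapiro-Wilk test does not reject normality at alpha."""
    from scipy.stats import shapiro  # imported here so SciPy stays optional

    statistic, p_value = shapiro(values)
    return p_value > alpha
```

A QC step could call `looks_normal` on each phenotype and, as in the quoted paper, apply `order_norm` when some phenotypes deviate. Whether to transform all phenotypes or only the failing ones is a design decision the notes above leave open.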
diff --git a/topics/quality-control/ui-design.gmi b/topics/quality-control/ui-design.gmi
new file mode 100644
index 0000000..84f7748
--- /dev/null
+++ b/topics/quality-control/ui-design.gmi
@@ -0,0 +1,41 @@
+# UI Design
+
+1. Input/Receive Data in UI (drag and drop/upload submit form)
+
+2. Select Mouse
+
+"What type of Group are you using?"
+
+> (AKXD, BXH, Mouse Diversity Panel, BXD)
+
+3. "What is your platform?"
+
+> (Aff, Illumina, ...)
+
+If Affymetrix (Aff) is selected then there should be various options
+like Clarion S.
+
+If the platform you chose is not available:
+
+ Tell the PI to request that their platform be added to the list.
+
+ They can contact us via email.
+
+4. Allow Excel file upload?
+
+## More Example UI Interactions and Checks
+
+"If your dataset does not comply with GN then you can try uploading your
+dataset so that we can inspect it."
+
+"Your dataset has two erroneous entries: Gene Accession Gene."
+
+"The last two columns have the wrong format for the strain name."
+
+"Here's the format your dataset should follow."
+
+> ProbeSetID Strains ...
+
+"Inbred Set ID 1 is the same as BXD"
+
+> These are the strains: ...
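Note on ui-design.gmi: the example messages suggest a header check that compares an uploaded file against the expected "ProbeSetID Strains ..." layout. A minimal Python sketch of that idea follows. It is not part of the commit, the function name and message wording are hypothetical, and the list of valid strains is assumed to be supplied by the caller because the notes above do not spell it out.

```python
def validate_header(header, known_strains):
    """Return user-facing messages for a header row; empty list means OK.

    header: list of column names from the uploaded file.
    known_strains: set of accepted strain names for the selected group
                   (supplied by the caller; not defined in these notes).
    """
    messages = []
    if not header or header[0] != "ProbeSetID":
        messages.append('The first column should be "ProbeSetID".')
    # Remaining columns are expected to be strain names; columns are 1-based.
    for col, name in enumerate(header[1:], start=2):
        if name not in known_strains:
            messages.append(
                f'Column {col} has an unrecognised strain name: "{name}".'
            )
    return messages
```

For example, with `known_strains = {"BXD1", "BXD2"}` (illustrative values only), a header of `["ProbeSetID", "BXD1", "bxd2"]` yields one message pointing at column 3.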