Diffstat (limited to 'topics/quality-control')
-rw-r--r-- | topics/quality-control/qc-checks.gmi | 47 |
-rw-r--r-- | topics/quality-control/shapiro-wilk-test.gmi | 33 |
-rw-r--r-- | topics/quality-control/ui-design.gmi | 41 |
3 files changed, 121 insertions, 0 deletions
diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
new file mode 100644
index 0000000..0d04625
--- /dev/null
+++ b/topics/quality-control/qc-checks.gmi
@@ -0,0 +1,47 @@
+# Quality Control Checks
+
+1. ProbeSetId (Affymetrix format):
+
+We favour using Illumina, Affymetrix, and other platform formats.
+
+Custom formats require a new annotation file to be created.
+
+We usually use Ensembl IDs or Gene IDs.
+
+1.1 Ensembl transcript IDs usually have duplicates that need to be pruned.
+
+ENSMBL1234
+
+## Example Gene Symbol to ProbeSetId
+
+AFFX-BkGr-GC03_st -> TCO500002136.mm.2
+
+2. Inbred strain names should prefer the long form:
+
+B6 -> C57BL/6
+D2 -> DBA/2
+
+3. Probeset IDs that don't have any values should be pruned:
+
+For example, an Affymetrix data set might have ~28,000 entries, while the data set
+that is allowed into GeneNetwork will have 22,000 entries.
+
+4. The standard error between male and female mice has to be computed.
+
+5. SE values have to be computed to 6 or more decimal places.
+
+6. The average between male and female mice has to be computed to 3 decimal places.
+
+7. Datasets/studies having the same ProbeSetID should be grouped together.
+
+8. There should be no trailing spaces in data cells.
+
+9. Entries should have the same capitalization style.
+
+10. Assess phenotypes for normality with the Shapiro-Wilk test.
+
+11. Check for an annotations file.
+
+12. Check for CRLF.
+
+13. Check for UTF-8 encoding.
diff --git a/topics/quality-control/shapiro-wilk-test.gmi b/topics/quality-control/shapiro-wilk-test.gmi
new file mode 100644
index 0000000..b1bdbfd
--- /dev/null
+++ b/topics/quality-control/shapiro-wilk-test.gmi
@@ -0,0 +1,33 @@
+# Shapiro-Wilk Test
+
+This document contains more info about QC step 10.
+
+https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html
+https://vedexcel.com/how-to-perform-a-shapiro-wilk-test-in-python/
+
+```
+Here is a simple QC procedure we may want to consider that was used by
+Megan and Camron in a recent paper that I have attached.
+
+
+Phenotypes were assessed for normality using the Shapiro–Wilk Test. Because
+some of the data residuals deviated significantly from normality, we used
+the orderNorm function to perform Ordered Quantile normalization43 on all
+phenotypes.
+
+--
+Rob
+```
+
+```
+QTL analysis was performed in F2 mice using the R package R/qtl
+(RRID:SCR_009085) as previously described.11,29,42 Quality checking
+of genotypes and QTL analysis were performed in R (https://www.r-project.org/)
+using R/bestNormalize (https://github.com/petersonR/bestNormalize)
+and R/qtl.42 Phenotypes were assessed for normality
+using the Shapiro–Wilk Test. Because some of the data residuals
+deviated significantly from normality, we used the orderNorm function to
+perform Ordered Quantile normalization43 on all phenotypes. QTL
+--
+A quantitative trait variant in Gabra2 underlies increased methamphetamine stimulant sensitivity
+```
diff --git a/topics/quality-control/ui-design.gmi b/topics/quality-control/ui-design.gmi
new file mode 100644
index 0000000..84f7748
--- /dev/null
+++ b/topics/quality-control/ui-design.gmi
@@ -0,0 +1,41 @@
+# UI Design
+
+1. Input/Receive Data in UI (drag and drop/upload submit form)
+
+2. Select Mouse
+
+"What type of Group are you using?"
+
+> (AKXD, BXH, Mouse Diversity Panel, BXD)
+
+3. "What is your platform?"
+
+> (Aff, Illumina, ...)
+
+If Affymetrix (Aff) is selected then there should be various options
+like Clariom S.
+
+If the platform you chose is not available:
+
+  Tell the PI to request that their platform be added to the list.
+
+  They can contact us via email.
+
+4. Allow Excel file upload?
+
+## More Example UI Interactions and Checks
+
+"If your dataset does not comply with GN then you can try uploading your
+dataset so that we can inspect it."
+
+"Your dataset has two erroneous entries: Gene Accession Gene."
+
+"The last two columns have the wrong format for the strain name."
+
+"Here's how your dataset should look."
+
+> ProbeSetID Strains ...
+
+"Inbred Set ID 1 is the same as BXD"
+
+> These are the strains: ...
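Review note: a few of the checks listed in qc-checks.gmi (2, 8, 12 and 13) are mechanical enough to sketch in code. The following is a minimal illustration only, using the Python standard library. The function names and the `STRAIN_LONG_FORMS` table are hypothetical and are not taken from the GeneNetwork code base; the two strain mappings are the ones given under check 2.

```python
# Hypothetical helpers illustrating QC checks 2, 8, 12 and 13.
# None of these names come from the GeneNetwork code base.

# Short-to-long strain names, copied from check 2 in qc-checks.gmi.
STRAIN_LONG_FORMS = {"B6": "C57BL/6", "D2": "DBA/2"}


def check_raw_bytes(raw: bytes) -> list[str]:
    """Checks 12 and 13: report CRLF line endings and invalid UTF-8."""
    problems = []
    if b"\r\n" in raw:
        problems.append("file contains CRLF line endings")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems


def check_cells(rows: list[list[str]]) -> list[str]:
    """Check 8: report every data cell that ends in whitespace."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column_number, cell in enumerate(row, start=1):
            if cell != cell.rstrip():
                problems.append(
                    f"row {row_number}, column {column_number}: trailing whitespace"
                )
    return problems


def expand_strain_name(name: str) -> str:
    """Check 2: return the long form of a strain name when one is known."""
    return STRAIN_LONG_FORMS.get(name, name)
```

Returning a list of messages rather than raising on the first failure is a design assumption here: it would let the upload UI described in ui-design.gmi show all problems with a dataset at once.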