# Quality Control Checks

1. Gene Symbols to ProbeSetId (Affymetrix format):

   We favour established platform formats such as Illumina and Affymetrix. Custom formats require a new annotation file to be created. We usually use Ensembl IDs or Gene IDs.

   1.1 Ensembl transcript IDs usually have duplicates that need to be pruned.

   ENSMBL1234

## Example Gene Symbol to ProbeSetId

AFFX-BkGr-GC03_st -> TCO500002136.mm.2

2. Inbred strain names should prefer the long form:

   - B6 -> C57BL/6
   - D2 -> DBA/2

3. Probeset IDs that don't have any values should be pruned. For example, an Affymetrix data set might start with ~28,000 entries, of which 22,000 are allowed into GeneNetwork.

4. The standard error between male and female mice has to be computed.

5. SE values have to be computed to 8 decimal places.

6. The average between male and female mice has to be computed.

7. AVG values have to be computed to only 3 decimal places.

8. Datasets/studies having the same ProbeSetID should be grouped together.

9. There should be no trailing spaces in data cells.

10. Entries should have the same capitalization style.

11. Assessing phenotypes for normality with the Shapiro-Wilk test.

    - https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html
    - https://vedexcel.com/how-to-perform-a-shapiro-wilk-test-in-python/

```
Here is a simple QC procedure we may want to consider that was used by
Megan and Camron in a recent paper that I have attached.

Phenotypes were assessed for normality using the Shapiro–Wilk Test.
Because some of the data residuals deviated significantly from
normality, we used the orderNorm function to perform Ordered Quantile
normalization [43] on all phenotypes.
-- Rob
```

```
QTL analysis was performed in F2 mice using the R package R/qtl
(RRID:SCR_009085) as previously described [11,29,42]. Quality checking
of genotypes and QTL analysis were performed in R
(https://www.r-project.org/) using R/bestNormalize
(https://github.com/petersonR/bestNormalize) and R/qtl [42].
Phenotypes were assessed for normality using the Shapiro–Wilk Test.
Because some of the data residuals deviated significantly from
normality, we used the orderNorm function to perform Ordered Quantile
normalization [43] on all phenotypes. QTL

-- A quantitative trait variant in Gabra2 underlies increased
   methamphetamine stimulant sensitivity
```
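
Checks 2, 3, 4-7 and 9 from the list above could be sketched in plain Python roughly as follows. This is only an illustration, not the GeneNetwork implementation: the row layout `(probeset_id, strain, male, female)`, the `STRAIN_LONG_FORM` table and all function names are hypothetical. The SE formula (sample standard deviation divided by the square root of the number of values) is also an assumption, since the checklist does not spell it out.

```python
import math
import statistics

# Hypothetical lookup built only from the two examples in check 2.
STRAIN_LONG_FORM = {"B6": "C57BL/6", "D2": "DBA/2"}


def parse_value(cell):
    """Strip surrounding whitespace (check 9); return a float, or None if empty."""
    cell = cell.strip()
    return float(cell) if cell else None


def long_strain_name(name):
    """Prefer the long strain name when a mapping is known (check 2)."""
    name = name.strip()
    return STRAIN_LONG_FORM.get(name, name)


def summarise(values):
    """Return (AVG rounded to 3 places, SE rounded to 8 places) (checks 4-7).

    Assumption: SE = sample standard deviation / sqrt(n). With fewer than
    two values the SE is undefined, so None is returned in its place.
    """
    avg = round(statistics.mean(values), 3)
    if len(values) < 2:
        return avg, None
    se = round(statistics.stdev(values) / math.sqrt(len(values)), 8)
    return avg, se


def qc_rows(rows):
    """Clean (probeset_id, strain, male, female) tuples of strings.

    Rows whose male and female cells are both empty are pruned (check 3).
    """
    cleaned = []
    for probeset_id, strain, male_cell, female_cell in rows:
        parsed = (parse_value(male_cell), parse_value(female_cell))
        values = [v for v in parsed if v is not None]
        if not values:
            continue  # no values at all for this probeset: prune it
        avg, se = summarise(values)
        cleaned.append({
            "probeset_id": probeset_id.strip(),
            "strain": long_strain_name(strain),
            "avg": avg,
            "se": se,
        })
    return cleaned


if __name__ == "__main__":
    # Made-up demo rows; the second one has no values and is dropped.
    demo = [
        ("AFFX-BkGr-GC03_st ", "B6", "10.0", " 12.0"),
        ("probe_without_values", "D2", "", "  "),
    ]
    for row in qc_rows(demo):
        print(row)
```

Strain names without a known long form are passed through unchanged rather than rejected, so the lookup table can grow over time without breaking existing uploads.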