# Quality Control Checks

1. ProbeSetId (Affymetrix format):

We favour Illumina, Affymetrix, and other platform formats.

Custom formats require a new annotation file to be created.

We usually use Ensembl IDs or gene IDs.

1.1 Ensembl transcript IDs usually contain duplicates that need to be pruned.

Example: ENSMBL1234
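A minimal Python sketch of the pruning in 1.1. It assumes rows are (identifier, values) pairs and that the first occurrence of each ID is the one to keep; both the keep-first policy and the second placeholder ID are illustrative, not an existing GeneNetwork rule.

```python
def prune_duplicate_ids(rows):
    """Keep the first row seen for each identifier and drop later duplicates."""
    seen = set()
    kept = []
    for identifier, values in rows:
        if identifier in seen:
            continue  # duplicate ID: skip this row
        seen.add(identifier)
        kept.append((identifier, values))
    return kept


# ENSMBL1234 is the placeholder ID used above; ENSMBL5678 is hypothetical.
rows = [
    ("ENSMBL1234", [7.1, 7.3]),
    ("ENSMBL1234", [7.0, 7.2]),
    ("ENSMBL5678", [5.4, 5.9]),
]
print(prune_duplicate_ids(rows))
```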

## Example Gene Symbol to ProbeSetId

AFFX-BkGr-GC03_st -> TC0500002136.mm.2

2. Inbred strain names should use the long form:

B6 -> C57BL/6
D2 -> DBA/2
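The renaming can be sketched as a lookup table. The mapping holds only the two examples above, the function name is hypothetical, and names not in the table pass through unchanged apart from whitespace stripping.

```python
# Short-to-long inbred strain names; only the two examples from this checklist.
STRAIN_LONG_NAMES = {
    "B6": "C57BL/6",
    "D2": "DBA/2",
}


def normalize_strain(name: str) -> str:
    """Return the long-form strain name; unknown names are returned stripped."""
    cleaned = name.strip()
    return STRAIN_LONG_NAMES.get(cleaned, cleaned)


print([normalize_strain(s) for s in ["B6", " D2 ", "C57BL/6"]])
# ['C57BL/6', 'DBA/2', 'C57BL/6']
```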

3. ProbeSet IDs that have no values should be pruned.

For example, an Affymetrix data set might have ~28,000 entries, while the pruned data set allowed into GeneNetwork will have 22,000 entries.
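A sketch of this pruning step, assuming each ProbeSet ID maps to a list of per-sample values in which None or a blank string marks a missing value. The IDs are hypothetical placeholders.

```python
def has_value(v):
    """Treat None and blank strings as missing (an assumption about the input)."""
    return v is not None and str(v).strip() != ""


def prune_empty_probesets(data):
    """Drop ProbeSet IDs whose per-sample values are all missing."""
    return {
        probeset: values
        for probeset, values in data.items()
        if any(has_value(v) for v in values)
    }


data = {
    "probeset_a": [7.2, 7.5, None],
    "probeset_b": [None, "", None],  # no values: pruned
    "probeset_c": [6.1, 6.4, 6.0],
}
kept = prune_empty_probesets(data)
print(len(data), "->", len(kept))  # 3 -> 2
```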

4. The standard error (SE) across male and female mice must be computed.

5. SE values must be computed to at least 6 decimal places.

6. The mean across male and female mice must be computed to 3 decimal places.
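Items 4 to 6 can be sketched as below. This assumes SE means the standard error of the mean (sample standard deviation divided by the square root of n) over the pooled male and female values; if the intended average is instead the mean of the two per-sex means, the result differs whenever the groups have different sizes. The sample values are made up.

```python
import math
import statistics


def mean_and_se(male_values, female_values):
    """Return (mean, SE) over the pooled male and female values.

    SE is taken to be the standard error of the mean: the sample standard
    deviation divided by sqrt(n). At least two values are required.
    """
    values = list(male_values) + list(female_values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    # Item 6: mean to 3 decimal places; item 5: SE to (at least) 6.
    return round(mean, 3), round(se, 6)


male = [8.1, 8.3]  # made-up example values
female = [7.9, 8.1]
print(mean_and_se(male, female))
```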

7. Datasets/studies with the same ProbeSetId should be grouped together.
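One way to do the grouping, assuming each record is a (ProbeSetId, dataset, value) triple. The dataset names and IDs are hypothetical.

```python
from collections import defaultdict


def group_by_probeset(records):
    """Group (probeset_id, dataset, value) records by ProbeSetId.

    Groups appear in the order their ProbeSetId is first seen.
    """
    groups = defaultdict(list)
    for probeset_id, dataset, value in records:
        groups[probeset_id].append((dataset, value))
    return dict(groups)


records = [
    ("probeset_a", "study_1", 7.2),
    ("probeset_b", "study_1", 5.9),
    ("probeset_a", "study_2", 7.4),
]
for probeset_id, rows in group_by_probeset(records).items():
    print(probeset_id, rows)
```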

8. There should be no trailing spaces in data cells.
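A sketch that reports and strips trailing whitespace, assuming the table has already been split into rows of string cells (for example with Python's csv module). The function names are hypothetical.

```python
def find_trailing_whitespace(rows):
    """Return (row_index, column_index) for each cell ending in whitespace."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if cell != cell.rstrip()
    ]


def strip_trailing_whitespace(rows):
    """Return a copy of the table with trailing whitespace removed from every cell."""
    return [[cell.rstrip() for cell in row] for row in rows]


rows = [["ProbeSetId", "C57BL/6 "], ["probeset_a", "7.25"]]
print(find_trailing_whitespace(rows))  # [(0, 1)]
print(find_trailing_whitespace(strip_trailing_whitespace(rows)))  # []
```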

9. Entries should use a consistent capitalization style.
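The checklist does not say which capitalization style to use, so this sketch only flags entries that differ from each other by case alone. The example values are hypothetical.

```python
from collections import defaultdict


def case_variants(entries):
    """Map each case-folded entry to its spellings, keeping only entries
    that occur with more than one capitalization."""
    spellings = defaultdict(set)
    for entry in entries:
        spellings[entry.casefold()].add(entry)
    return {key: sorted(forms) for key, forms in spellings.items() if len(forms) > 1}


entries = ["Hippocampus", "hippocampus", "Liver", "Liver"]
print(case_variants(entries))
# {'hippocampus': ['Hippocampus', 'hippocampus']}
```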

10. Assess phenotypes for normality with the Shapiro-Wilk test.
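A sketch using SciPy's scipy.stats.shapiro, assuming SciPy is installed. The 0.05 threshold is a common convention rather than one set by this checklist, and the function name is hypothetical.

```python
from statistics import NormalDist

from scipy import stats


def passes_shapiro_wilk(values, alpha=0.05):
    """Return (is_normal, W, p) from the Shapiro-Wilk test.

    is_normal is True when p >= alpha, i.e. normality is not rejected.
    scipy.stats.shapiro needs at least 3 values.
    """
    statistic, p_value = stats.shapiro(values)
    return bool(p_value >= alpha), float(statistic), float(p_value)


# Made-up inputs: evenly spaced normal quantiles vs. a strongly skewed series.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [2 ** i for i in range(15)]
print(passes_shapiro_wilk(normal_like)[0], passes_shapiro_wilk(skewed)[0])
```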

11. Check that an annotation file is present.

12. Check for CRLF (Windows-style) line endings.
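CRLF detection works on raw bytes, so a file should be read in binary mode; Python's text mode translates line endings on read and would hide them. A minimal sketch with hypothetical function names:

```python
def has_crlf(data: bytes) -> bool:
    """Return True if the raw file contents contain any CRLF line ending."""
    return b"\r\n" in data


def crlf_to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF."""
    return data.replace(b"\r\n", b"\n")


# With a real file: data = pathlib.Path(path).read_bytes()  (keeps CRLF intact)
sample = b"ProbeSetId\tvalue\r\nprobeset_a\t7.25\r\n"
print(has_crlf(sample), has_crlf(crlf_to_lf(sample)))  # True False
```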

13. Check that files are UTF-8 encoded.
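A minimal UTF-8 check: the raw bytes either decode as UTF-8 or raise UnicodeDecodeError. Plain ASCII files also pass, since ASCII is a subset of UTF-8. The function name is hypothetical.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_utf8("café".encode("utf-8")))    # True
print(is_utf8("café".encode("latin-1")))  # False
```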