From 35c4cec2c3c1593b59bc29fa5a738f857ecc270f Mon Sep 17 00:00:00 2001
From: Arun Isaac
Date: Tue, 19 Jul 2022 15:02:48 +0530
Subject: Rescue quality control issues from topics.

---
 topics/quality-control/qc-checks.gmi | 55 ------------------------------------
 1 file changed, 55 deletions(-)
 delete mode 100644 topics/quality-control/qc-checks.gmi

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
deleted file mode 100644
index dc18f94..0000000
--- a/topics/quality-control/qc-checks.gmi
+++ /dev/null
@@ -1,55 +0,0 @@
-# Quality Control Checks
-
-1. ProbeSetId (Affymetrix format):
-
-We favour using Illumina, Affimetrix, and other platform formats.
-
-Custom formats require a new annotation file to be created.
-
-We usually use Ensemble ID or Gene IDs.
-
-1.1 Ensemble transcript IDs usually have duplicates that need to be pruned.
-
-ENSMBL1234
-
-## Example Gene Symbol to ProbeSetId
-
-AFFX-BkGr-GC03_st -> TCO500002136.mm.2
-
-2. Inbred Strain names should prefer long form:
-
-B6 -> C57BL/6
-D2 -> DBA/2
-
-3. Probeset IDs that don't have any values should be pruned:
-
-For example an Affymetrix data set might have ~28,000 entries and the data set that
-is allowed into the GeneNetwork will be 22,000 entries.
-
-4. The standard error between male and female mice has to be computed.
-
-5. SE values have to be computed to 6 or greater decimal places.
-
-6. The average between male and female mice has to be computed to 3 decimal places.
-
-7. Datasets/studies having the same ProbeSetID should be grouped together.
-
-8. There should be no trailing spaces in data cells.
-
-9. Entries should have the same capitalization style.
-
-10. Assesing Phenotypes for normality with Shapiro-Wilk Test.
-
-11. Check for annotations file.
-
-12. Check for CRLF.
-
-13. Check for UTF-8 encoding.
-
-## Tags
-
-* assigned: jgart
-* type: feature-request
-* priority: high
-* status: unclear
-* keywords: quality control
--
cgit v1.2.3
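
Checks 8, 12, and 13 from the deleted list (trailing spaces, CRLF, UTF-8) are mechanical enough to automate. The following is only a minimal sketch in Python, assuming tab-separated data files; the function name `check_raw_data` and the default delimiter are hypothetical and are not taken from GeneNetwork's code.

```python
def check_raw_data(raw: bytes, delimiter: str = "\t") -> list[str]:
    """Return human-readable QC problems found in the raw bytes of a data file.

    Hypothetical helper: the name, the delimiter, and the message wording are
    assumptions made for illustration only.
    """
    problems = []

    # Check 13: the file must decode as UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # The remaining checks are not reliable on undecodable input.
        return [f"not valid UTF-8 at byte {exc.start}"]

    # Check 12: flag Windows-style CRLF line endings.
    if "\r\n" in text:
        problems.append("file contains CRLF line endings")

    # Check 8: flag data cells that end in one or more spaces.
    for row_no, line in enumerate(text.splitlines(), start=1):
        for col_no, cell in enumerate(line.split(delimiter), start=1):
            if cell != cell.rstrip(" "):
                problems.append(
                    f"row {row_no}, column {col_no}: trailing space in {cell!r}"
                )

    return problems
```

Calling `check_raw_data(open(path, "rb").read())` on an uploaded file would return an empty list when none of the three problems is present. Reading the file in binary mode matters here, because text mode would silently translate line endings before the CRLF check runs.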