# Quality Control Checks

1. ProbeSetId (Affymetrix format):

We favour Illumina, Affymetrix, and other platform formats.

Custom formats require a new annotation file to be created.

We usually use Ensembl IDs or Gene IDs.

1.1 Ensembl transcript IDs usually have duplicates that need to be pruned.

ENSMBL1234
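
A minimal sketch of pruning duplicate IDs, assuming the rule is "keep the first row seen for each ID". The rule, the function name `prune_duplicate_ids`, the second ID, and the values are illustrative assumptions, not part of the actual pipeline.

```python
def prune_duplicate_ids(rows):
    """Keep the first row seen for each ID and drop later repeats.

    Assumption: `rows` is a list of (id, value) pairs and the first
    occurrence is the one to keep.
    """
    seen = set()
    kept = []
    for row_id, value in rows:
        if row_id in seen:
            continue  # duplicate ID: skip it
        seen.add(row_id)
        kept.append((row_id, value))
    return kept


# Hypothetical rows; only ENSMBL1234 comes from the text above.
rows = [("ENSMBL1234", 7.1), ("ENSMBL5678", 6.4), ("ENSMBL1234", 7.3)]
pruned = prune_duplicate_ids(rows)
print(pruned)
```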

## Example Gene Symbol to ProbeSetId

AFFX-BkGr-GC03_st -> TCO500002136.mm.2

2. Inbred strain names should use the long form:

B6 -> C57BL/6
D2 -> DBA/2
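
The mapping above could be applied with a small lookup table. This is a sketch only: `LONG_FORM` holds just the two pairs listed here, and `to_long_form` is a hypothetical helper name.

```python
# Short-form to long-form strain names, taken from the two examples above.
LONG_FORM = {
    "B6": "C57BL/6",
    "D2": "DBA/2",
}


def to_long_form(name):
    """Return the long form of a strain name, or the name unchanged."""
    cleaned = name.strip()
    return LONG_FORM.get(cleaned, cleaned)


print(to_long_form("B6"))
print(to_long_form("D2"))
```

Names that are not in the table pass through unchanged, so a name already in long form is not altered.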

3. Probeset IDs that don't have any values should be pruned:

For example, an Affymetrix data set might have ~28,000 entries, of which only about 22,000
remain in the data set that is allowed into GeneNetwork after pruning.
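
A sketch of the pruning step, assuming a data set is held as a mapping from probeset ID to a list of cell values, and that `None`, an empty string, and "NA" all count as "no value". Both the data layout and the set of missing markers are assumptions.

```python
# Assumption: these markers mean "no value" in a data cell.
MISSING = {"", "NA"}


def has_values(values):
    """True if at least one cell holds something other than a missing marker."""
    return any(v is not None and str(v).strip() not in MISSING for v in values)


def prune_empty(dataset):
    """Drop probeset IDs whose cells are all missing."""
    return {pid: vals for pid, vals in dataset.items() if has_values(vals)}


# Hypothetical values; the IDs are the two shown earlier on this page.
dataset = {
    "AFFX-BkGr-GC03_st": [7.2, 6.9],
    "TCO500002136.mm.2": ["", None, "NA"],
}
pruned_dataset = prune_empty(dataset)
print(sorted(pruned_dataset))
```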

4. The standard error between male and female mice has to be computed.

5. SE values have to be computed to 8 decimal places.

6. The average between male and female mice has to be computed.

7. AVG values have to be computed to only 3 decimal places.
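
One possible reading of items 4 to 7, sketched under the assumption that each entry has a single male value and a single female value, and that SE means the sample standard deviation divided by the square root of the number of values. The function names are hypothetical.

```python
import math
import statistics


def avg_and_se(male, female):
    """Return (AVG, SE) for one male and one female measurement.

    Assumption: SE = sample standard deviation / sqrt(n), with n = 2.
    AVG is rounded to 3 decimal places and SE to 8.
    """
    values = [male, female]
    avg = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return round(avg, 3), round(se, 8)


def format_avg_se(avg, se):
    """Format AVG with 3 decimals and SE with 8 decimals for output."""
    return f"{avg:.3f}", f"{se:.8f}"


avg, se = avg_and_se(10.0, 12.0)
print(format_avg_se(avg, se))
```

Formatting with a fixed number of decimals keeps trailing zeros, which plain rounding would drop.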

8. Datasets/studies having the same ProbeSetID should be grouped together.
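
A sketch of the grouping, assuming the input is a list of (dataset name, ProbeSetID) pairs. The dataset names and the function name are made up for illustration.

```python
from collections import defaultdict


def group_by_probeset(records):
    """Map each ProbeSetID to the datasets that contain it, in input order."""
    groups = defaultdict(list)
    for dataset_name, probeset_id in records:
        groups[probeset_id].append(dataset_name)
    return dict(groups)


# Hypothetical dataset names; the IDs are the two shown earlier on this page.
records = [
    ("study_a", "AFFX-BkGr-GC03_st"),
    ("study_b", "TCO500002136.mm.2"),
    ("study_c", "AFFX-BkGr-GC03_st"),
]
groups = group_by_probeset(records)
print(groups)
```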

9. There should be no trailing spaces in data cells.

10. Entries should have the same capitalization style.
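
Items 9 and 10 can be checked with plain string methods. This is a sketch that assumes cells arrive as strings and treats "same capitalization style" as "all upper case, all lower case, or all mixed". That definition and the helper names are assumptions.

```python
def find_trailing_spaces(cells):
    """Return the cells that end in whitespace."""
    return [cell for cell in cells if cell != cell.rstrip()]


def capitalization_style(text):
    """Classify a cell as 'upper', 'lower', or 'mixed'."""
    if text.isupper():
        return "upper"
    if text.islower():
        return "lower"
    return "mixed"


def has_consistent_capitalization(cells):
    """True if every cell that contains letters uses the same style."""
    styles = {
        capitalization_style(cell)
        for cell in cells
        if any(ch.isalpha() for ch in cell)
    }
    return len(styles) <= 1


print(find_trailing_spaces(["C57BL/6", "DBA/2 "]))
print(has_consistent_capitalization(["C57BL/6", "dba/2"]))
```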

11. Assess phenotypes for normality with the Shapiro-Wilk test.
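
A sketch of the normality check using `scipy.stats.shapiro`, which returns the test statistic and a p-value. The 0.05 threshold, the helper name `looks_normal`, and the sample values are assumptions. SciPy must be installed.

```python
from scipy import stats


def looks_normal(values, alpha=0.05):
    """True if the Shapiro-Wilk test does not reject normality at `alpha`."""
    statistic, p_value = stats.shapiro(values)
    return bool(p_value > alpha)


# Made-up phenotype values: one roughly symmetric sample, one with an outlier.
symmetric = [-1.5, -1.0, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 1.0, 1.5]
with_outlier = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 50.0]

print(looks_normal(symmetric))
print(looks_normal(with_outlier))
```

A p-value above the threshold only means that normality was not rejected. It does not prove that the phenotype is normally distributed.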

12. Check that an annotation file is present.