# Quality Control Checks
1. Gene Symbols to ProbeSetId (Affymetrix format):
We favour Illumina, Affymetrix, and other standard platform formats.
Custom formats require a new annotation file to be created.
We usually use Ensembl IDs or Gene IDs.
1.1 Ensembl transcript IDs often contain duplicates that need to be pruned.
ENSMBL1234
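
The pruning step in 1.1 can be sketched as follows. This is a minimal illustration under several assumptions:

- Rows arrive as `(Ensembl ID, value)` pairs.
- The first occurrence of each ID is the one kept. The checklist does not say which duplicate wins.
- `prune_duplicate_ids` and its `drop_version` option are hypothetical helpers, not existing GeneNetwork tooling.
- `ENSMBL5678` is a made-up placeholder next to the `ENSMBL1234` example.

```python
def prune_duplicate_ids(rows, drop_version=False):
    """Keep the first row for each Ensembl ID, preserving input order.

    ASSUMPTION: "first occurrence wins"; the checklist does not say which
    duplicate to keep. With drop_version=True, a trailing ".N" version
    suffix is removed first, so versioned copies of one ID count as duplicates.
    """
    seen = set()
    kept = []
    for ensembl_id, value in rows:
        key = ensembl_id.strip()
        if drop_version:
            key = key.split(".", 1)[0]
        if key in seen:
            continue  # duplicate ID: prune it
        seen.add(key)
        kept.append((key, value))
    return kept


# Hypothetical rows; ENSMBL5678 is a placeholder ID.
rows = [("ENSMBL1234", 1.0), ("ENSMBL5678", 2.0), ("ENSMBL1234", 3.0)]
print(prune_duplicate_ids(rows))  # [('ENSMBL1234', 1.0), ('ENSMBL5678', 2.0)]
```

If a different rule is wanted (for example, keeping the row with the highest mean expression), only the `if key in seen` branch needs to change.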
## Example Gene Symbol to ProbeSetId
AFFX-BkGr-GC03_st -> TCO500002136.mm.2
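
Converting gene symbols to ProbeSetIds amounts to a lookup through the platform's annotation file. Below is a minimal sketch with these assumptions:

- The annotation is a CSV file with hypothetical `gene_symbol` and `probeset_id` columns. Real Affymetrix or Illumina annotation files use their own headers.
- The `HYPOTHETICAL_PS_*` values are placeholders, not real probesets.

One symbol can map to several probesets, so each symbol maps to a list. Symbols with no match are returned separately so they can be reviewed rather than silently dropped.

```python
import csv
import io
from collections import defaultdict

# Hypothetical annotation content; a real file would come from the array
# vendor, or be created for a custom format.
ANNOTATION_CSV = """gene_symbol,probeset_id
Gnai3,HYPOTHETICAL_PS_0001
Cdc45,HYPOTHETICAL_PS_0002
Cdc45,HYPOTHETICAL_PS_0003
"""


def load_annotation(handle):
    """Build a gene symbol -> [ProbeSetId, ...] lookup from a CSV annotation."""
    lookup = defaultdict(list)
    for row in csv.DictReader(handle):
        lookup[row["gene_symbol"].strip()].append(row["probeset_id"].strip())
    return dict(lookup)


def to_probeset_ids(symbols, annotation):
    """Return (mapped, unmapped) so symbols without a probeset stay visible."""
    mapped, unmapped = {}, []
    for symbol in symbols:
        if symbol in annotation:
            mapped[symbol] = annotation[symbol]
        else:
            unmapped.append(symbol)
    return mapped, unmapped


annotation = load_annotation(io.StringIO(ANNOTATION_CSV))
mapped, unmapped = to_probeset_ids(["Cdc45", "NotAGene"], annotation)
print(mapped)    # {'Cdc45': ['HYPOTHETICAL_PS_0002', 'HYPOTHETICAL_PS_0003']}
print(unmapped)  # ['NotAGene']
```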
2. Inbred strain names should use the long form:
B6 -> C57BL/6
D2 -> DBA/2
3. Probeset IDs that have no values should be pruned:
For example, an Affymetrix data set might have ~28,000 entries, of which only
22,000 are allowed into GeneNetwork.
4. The standard error (SE) for male and female mice must be computed.
5. SE values must be rounded to 8 decimal places.
6. The average (AVG) for male and female mice must be computed.
7. AVG values must be rounded to 3 decimal places.
8. Datasets/studies having the same ProbeSetID should be grouped together.
9. There should be no trailing spaces in data cells.
10. Entries should have the same capitalization style.
11. Assess phenotypes for normality with the Shapiro-Wilk test:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html
https://vedexcel.com/how-to-perform-a-shapiro-wilk-test-in-python/
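
The cleaning and summary checks in items 2 to 9 can be sketched in Python with only the standard library. This is a hedged illustration, not GeneNetwork's actual pipeline. Its assumptions are:

- The input shapes are hypothetical: a dict of ProbeSetId to raw cell strings, lists of per-sex values, and `(probeset_id, dataset, value)` records.
- All function names are hypothetical.
- The checklist does not define how SE is computed. The sketch pools male and female values and reports the standard error of the mean (sample standard deviation / sqrt(n)).

```python
import math
import statistics
from collections import defaultdict

# Item 2: short strain names -> long form (pairs taken from the checklist).
STRAIN_LONG_NAMES = {"B6": "C57BL/6", "D2": "DBA/2"}


def normalize_strain(name):
    """Strip stray whitespace (item 9), then map short names to long form."""
    name = name.strip()
    return STRAIN_LONG_NAMES.get(name, name)


def clean_cell(cell):
    """Item 9: strip surrounding spaces; an empty cell becomes None (missing)."""
    cell = cell.strip()
    return cell if cell else None


def prune_empty_probesets(table):
    """Item 3: drop probesets whose cells are all empty.

    `table` maps a ProbeSetId to its list of raw cell strings.
    """
    kept = {}
    for probeset_id, cells in table.items():
        values = [clean_cell(cell) for cell in cells]
        if any(value is not None for value in values):
            kept[probeset_id.strip()] = values
    return kept


def avg_and_se(male, female):
    """Items 4-7: AVG rounded to 3 dp and SE rounded to 8 dp.

    ASSUMPTION: male and female values are pooled and SE is the standard
    error of the mean. Needs at least two values in total.
    """
    values = [float(v) for v in [*male, *female]]
    avg = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return round(avg, 3), round(se, 8)


def group_by_probeset(records):
    """Item 8: collect (probeset_id, dataset, value) records per ProbeSetId."""
    groups = defaultdict(list)
    for probeset_id, dataset, value in records:
        groups[probeset_id.strip()].append((dataset.strip(), value))
    return dict(groups)
```

For example, `avg_and_se([1.0, 2.0], [3.0, 4.0])` returns `(2.5, 0.64549722)`. Rounding happens only at the end, so full precision is kept during the calculation.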
```
Here is a simple QC procedure we may want to consider that was used by
Megan and Camron in a recent paper that I have attached.
Phenotypes were assessed for normality using the Shapiro–Wilk Test. Because
some of the data residuals deviated significantly from normality, we used
the orderNorm function to perform Ordered Quantile normalization [43] on all
phenotypes.
--
Rob
```
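
The procedure quoted above can be approximated in Python.

- **Normality test:** `scipy.stats.shapiro`, linked in item 11, runs the Shapiro-Wilk test.
- **Normalization:** the paper used the R function orderNorm from the bestNormalize package. The sketch instead swaps in a plain rank-based inverse normal transform, `Phi^-1((rank - 0.5) / n)`, with average ranks for ties. It approximates the idea behind Ordered Quantile normalization but is not a port of orderNorm, whose exact offsets and handling of new data may differ.
- **Assumptions:** the function names and the 0.05 threshold are assumed. The paper tested residuals, whereas this sketch tests whatever values are passed in. SciPy is needed only for `fails_shapiro_wilk`.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ordered_quantile_normalize(values):
    """Rank-based inverse normal transform: z = Phi^-1((rank - 0.5) / n)."""
    n = len(values)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


def fails_shapiro_wilk(values, alpha=0.05):
    """True if the Shapiro-Wilk test rejects normality at `alpha` (needs n >= 3)."""
    from scipy.stats import shapiro  # lazy import: only this check needs SciPy
    _statistic, p_value = shapiro(values)
    return p_value < alpha
```

In the quoted procedure, the transform was applied to all phenotypes once some of them deviated from normality. With the sketch, that would be `if any(fails_shapiro_wilk(p) for p in phenotypes): phenotypes = [ordered_quantile_normalize(p) for p in phenotypes]`.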