author | Frederick Muriuki Muriithi | 2021-12-22 11:49:30 +0300 |
---|---|---|
committer | Frederick Muriuki Muriithi | 2021-12-22 11:49:30 +0300 |
commit | 2f56ee37183938270197d9bd968648e65584513c (patch) | |
tree | 195c6ca08188a61d1669403c3109de8a96427a7a /topics/quality-control/qc-checks.gmi | |
parent | 512bc12aaac7189253a62b2be105472a34821263 (diff) | |
parent | be16a6a7f1a7e2dfa074e858c26ff6a9b6aa86de (diff) | |
download | gn-gemtext-2f56ee37183938270197d9bd968648e65584513c.tar.gz |
Merge branch 'main' of github.com:genenetwork/gn-gemtext-threads
Diffstat (limited to 'topics/quality-control/qc-checks.gmi')
-rw-r--r-- | topics/quality-control/qc-checks.gmi | 47 |
1 files changed, 47 insertions, 0 deletions
diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
new file mode 100644
index 0000000..0d04625
--- /dev/null
+++ b/topics/quality-control/qc-checks.gmi
@@ -0,0 +1,47 @@
+# Quality Control Checks
+
+1. ProbeSetId (Affymetrix format):
+
+We favour using Illumina, Affymetrix, and other platform formats.
+
+Custom formats require a new annotation file to be created.
+
+We usually use Ensembl IDs or Gene IDs.
+
+1.1 Ensembl transcript IDs usually have duplicates that need to be pruned.
+
+ENSMBL1234
+
+## Example Gene Symbol to ProbeSetId
+
+AFFX-BkGr-GC03_st -> TCO500002136.mm.2
+
+2. Inbred Strain names should prefer long form:
+
+B6 -> C57BL/6
+D2 -> DBA/2
+
+3. Probeset IDs that don't have any values should be pruned:
+
+For example, an Affymetrix data set might have ~28,000 entries, and the data set that
+is allowed into GeneNetwork will have 22,000 entries.
+
+4. The standard error between male and female mice has to be computed.
+
+5. SE values have to be computed to 6 or more decimal places.
+
+6. The average between male and female mice has to be computed to 3 decimal places.
+
+7. Datasets/studies having the same ProbeSetID should be grouped together.
+
+8. There should be no trailing spaces in data cells.
+
+9. Entries should have the same capitalization style.
+
+10. Assess phenotypes for normality with the Shapiro-Wilk test.
+
+11. Check for annotations file.
+
+12. Check for CRLF.
+
+13. Check for UTF-8 encoding.
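
A few of the checks listed in the file are mechanical enough to sketch in code. The Python below is only an illustration, not part of this commit or of any existing GeneNetwork tool: the function names, the `STRAIN_LONG_FORMS` table, and the assumption that the data arrive as rows of string cells are all hypothetical. It covers checks 2 (long-form strain names), 5 (decimal places on SE values), 8 (trailing spaces), 12 (CRLF), and 13 (UTF-8).

```python
import re

# Hypothetical lookup table; it holds only the two pairs named in the checklist.
STRAIN_LONG_FORMS = {"B6": "C57BL/6", "D2": "DBA/2"}


def check_raw_bytes(raw: bytes) -> list[str]:
    """File-level checks 12 and 13: CRLF line endings and UTF-8 encoding."""
    problems = []
    if b"\r\n" in raw:
        problems.append("file contains CRLF line endings")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems


def check_cells(rows: list[list[str]]) -> list[str]:
    """Cell-level checks 2 and 8: short strain names and trailing whitespace."""
    problems = []
    for row_no, row in enumerate(rows, start=1):
        for col_no, cell in enumerate(row, start=1):
            where = f"row {row_no}, column {col_no}"
            if cell != cell.rstrip():
                problems.append(f"{where}: trailing whitespace")
            short = cell.strip()
            if short in STRAIN_LONG_FORMS:
                problems.append(
                    f"{where}: use {STRAIN_LONG_FORMS[short]} instead of {short}"
                )
    return problems


def has_min_decimals(value: str, places: int) -> bool:
    """Check 5: True if the numeric string has at least `places` decimal digits."""
    match = re.fullmatch(r"-?\d+\.(\d+)", value.strip())
    return bool(match) and len(match.group(1)) >= places
```

Each function returns its findings rather than raising, so a caller could collect every problem in a file before reporting; for example, `has_min_decimals(cell, 6)` could be applied to each cell of an SE column. Checks that need domain data (pruning empty probesets, grouping by ProbeSetID, the Shapiro-Wilk test) are deliberately left out of this sketch.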