From 5300d6034fad3a57e45795a485f70b0aeef1f665 Mon Sep 17 00:00:00 2001
From: jgart
Date: Wed, 17 Nov 2021 01:13:38 -0500
Subject: add qc checks list

---
 topics/quality-control/qc-checks.gmi | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 topics/quality-control/qc-checks.gmi

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
new file mode 100644
index 0000000..55b4847
--- /dev/null
+++ b/topics/quality-control/qc-checks.gmi
@@ -0,0 +1,29 @@
+# Quality Control Checks
+
+1. Gene Symbols to ProbeSetId (Affymetrix format):
+
+AFFX-BkGr-GC03_st -> TCO500002136.mm.2
+
+2. Inbred Strain names should prefer long form:
+
+B6 -> C57BL/6
+D2 -> DBA/2
+
+3. Probeset IDs that don't have any values should be pruned:
+
+For example an Affymetrix data set might have ~28,000 entries and the data set that
+is allowed into the GeneNetwork will be 22,000 entries.
+
+4. The standard error between male and female mice has to be computed.
+
+5. SE values have to be computed to 8 decimal places.
+
+6. The average between male and female mice has to be computed.
+
+7. AVG values have to be computed to only 3 decimal places.
+
+8. Datasets/studies having the same ProbeSetID should be grouped together.
+
+9. There should be no trailing spaces in data cells.
+
+10. Entries should have the same capitalization style.
--
cgit v1.2.3

From eb2c81c4fdeab43f1d8b0b746638075562829040 Mon Sep 17 00:00:00 2001
From: jgart
Date: Fri, 19 Nov 2021 01:43:21 -0500
Subject: Add more info to qc checks

---
 topics/quality-control/qc-checks.gmi | 12 ++++++++++++
 1 file changed, 12 insertions(+)

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
index 55b4847..f2e47fd 100644
--- a/topics/quality-control/qc-checks.gmi
+++ b/topics/quality-control/qc-checks.gmi
@@ -2,6 +2,18 @@

 1. Gene Symbols to ProbeSetId (Affymetrix format):

+We favour using Illumina, Affimetrix, and other platform formats.
+
+Custom formats require a new annotation file to be created.
+
+We usually use Ensemble ID or Gene IDs.
+
+1.1 Ensemble transcript IDs usually have duplicates that need to be pruned.
+
+ENSMBL1234
+
+## Example Gene Symbol to ProbeSetId
+
 AFFX-BkGr-GC03_st -> TCO500002136.mm.2

 2. Inbred Strain names should prefer long form:
--
cgit v1.2.3

From f46ed96a01617514547da4636b6121fadc5f2326 Mon Sep 17 00:00:00 2001
From: jgart
Date: Wed, 24 Nov 2021 00:05:03 -0500
Subject: Add Shapiro-Wilk Test QC Check

scipy has an implementation
---
 topics/quality-control/qc-checks.gmi | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
index f2e47fd..30236e1 100644
--- a/topics/quality-control/qc-checks.gmi
+++ b/topics/quality-control/qc-checks.gmi
@@ -39,3 +39,22 @@ is allowed into the GeneNetwork will be 22,000 entries.
 9. There should be no trailing spaces in data cells.

 10. Entries should have the same capitalization style.
+
+11. Assesing Phenotypes for normality with Shapiro-Wilk Test.
+
+https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html
+https://vedexcel.com/how-to-perform-a-shapiro-wilk-test-in-python/
+
+```
+Here is a simple QC procedure we may want to consider that was used by
+Megan and Camron in a recent paper that I have attached.
+
+
+Phenotypes were assessed for normality using the Shapiro–Wilk Test. Because
+some of the data residuals devi- ated significantly from normality, we used
+the orderNorm function to perform Ordered Quantile normalization43 on all
+phenotypes.
+
+--
+Rob
+```
--
cgit v1.2.3

From 1259ed80a7f60a9f7ba82f3751990ac24a8e9436 Mon Sep 17 00:00:00 2001
From: jgart
Date: Wed, 24 Nov 2021 00:39:40 -0500
Subject: Add citation for Shapiro-Wilk Test

---
 topics/quality-control/qc-checks.gmi | 13 +++++++++++++
 1 file changed, 13 insertions(+)

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
index 30236e1..713fa0b 100644
--- a/topics/quality-control/qc-checks.gmi
+++ b/topics/quality-control/qc-checks.gmi
@@ -58,3 +58,16 @@ phenotypes.
 --
 Rob
 ```
+
+```
+QTL analysis was performed in F2 mice using the R package R/qtl
+(RRID:SCR_009085) as previously described.11,29,42 Quality checking
+of genotypes and QTL analysis were performed in R (https://www.r-
+project.org/) using R/bestNormalize (https://github.com/petersonR/
+bestNormalize) and R/qtl.42 Phenotypes were assessed for normality
+using the Shapiro–Wilk Test. Because some of the data residuals devi-
+ated significantly from normality, we used the orderNorm function to
+perform Ordered Quantile normalization43 on all phenotypes. QTL
+--
+A quantitative trait variant in Gabra2 underlies increased methamphetamine stimulant sensitivity
+```
--
cgit v1.2.3

From e867ad5edbebfab57e2efd09cde7246f3cb79cc7 Mon Sep 17 00:00:00 2001
From: jgart
Date: Wed, 24 Nov 2021 01:26:09 -0500
Subject: Check for annotations file

---
 topics/quality-control/qc-checks.gmi | 30 +-----------------------------
 1 file changed, 1 insertion(+), 29 deletions(-)

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
index 713fa0b..149557c 100644
--- a/topics/quality-control/qc-checks.gmi
+++ b/topics/quality-control/qc-checks.gmi
@@ -42,32 +42,4 @@ is allowed into the GeneNetwork will be 22,000 entries.

 11. Assesing Phenotypes for normality with Shapiro-Wilk Test.

-https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html
-https://vedexcel.com/how-to-perform-a-shapiro-wilk-test-in-python/
-
-```
-Here is a simple QC procedure we may want to consider that was used by
-Megan and Camron in a recent paper that I have attached.
-
-
-Phenotypes were assessed for normality using the Shapiro–Wilk Test. Because
-some of the data residuals devi- ated significantly from normality, we used
-the orderNorm function to perform Ordered Quantile normalization43 on all
-phenotypes.
-
---
-Rob
-```
-
-```
-QTL analysis was performed in F2 mice using the R package R/qtl
-(RRID:SCR_009085) as previously described.11,29,42 Quality checking
-of genotypes and QTL analysis were performed in R (https://www.r-
-project.org/) using R/bestNormalize (https://github.com/petersonR/
-bestNormalize) and R/qtl.42 Phenotypes were assessed for normality
-using the Shapiro–Wilk Test. Because some of the data residuals devi-
-ated significantly from normality, we used the orderNorm function to
-perform Ordered Quantile normalization43 on all phenotypes. QTL
---
-A quantitative trait variant in Gabra2 underlies increased methamphetamine stimulant sensitivity
-```
+12. Check for annotations file.
--
cgit v1.2.3

From c10687b09b144b0fd38b4eeeea4b92f6dc968aa1 Mon Sep 17 00:00:00 2001
From: jgart
Date: Wed, 1 Dec 2021 01:05:09 -0500
Subject: remove mention of conversion of gene symbols since we are not converting

---
 topics/quality-control/qc-checks.gmi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
index 149557c..756d2ee 100644
--- a/topics/quality-control/qc-checks.gmi
+++ b/topics/quality-control/qc-checks.gmi
@@ -1,6 +1,6 @@
 # Quality Control Checks

-1. Gene Symbols to ProbeSetId (Affymetrix format):
+1. ProbeSetId (Affymetrix format):

 We favour using Illumina, Affimetrix, and other platform formats.
--
cgit v1.2.3

From 89c182b5418852fbe8ef1efeb66866f7bef550f9 Mon Sep 17 00:00:00 2001
From: jgart
Date: Wed, 1 Dec 2021 01:18:50 -0500
Subject: qc-checks: check for crlf

---
 topics/quality-control/qc-checks.gmi | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
index 756d2ee..cbf7e5b 100644
--- a/topics/quality-control/qc-checks.gmi
+++ b/topics/quality-control/qc-checks.gmi
@@ -43,3 +43,5 @@ is allowed into the GeneNetwork will be 22,000 entries.
 11. Assesing Phenotypes for normality with Shapiro-Wilk Test.

 12. Check for annotations file.
+
+13. Check for CRLF.
--
cgit v1.2.3

From fe53e6c30090cdea1cd947de876d6ee2f7b28983 Mon Sep 17 00:00:00 2001
From: jgart
Date: Tue, 14 Dec 2021 04:04:19 -0500
Subject: add qc check for utf-8 encoding

---
 topics/quality-control/qc-checks.gmi | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
index cbf7e5b..9c8fcfd 100644
--- a/topics/quality-control/qc-checks.gmi
+++ b/topics/quality-control/qc-checks.gmi
@@ -45,3 +45,5 @@ is allowed into the GeneNetwork will be 22,000 entries.
 12. Check for annotations file.

 13. Check for CRLF.
+
+14. Check for UTF-8 encoding.
--
cgit v1.2.3

From 7c7df39586d850fb560987081984069c66d70b33 Mon Sep 17 00:00:00 2001
From: jgart
Date: Sat, 18 Dec 2021 09:33:59 -0500
Subject: qc-checks: Update SE values decimal size

---
 topics/quality-control/qc-checks.gmi | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

(limited to 'topics/quality-control/qc-checks.gmi')

diff --git a/topics/quality-control/qc-checks.gmi b/topics/quality-control/qc-checks.gmi
index 9c8fcfd..0d04625 100644
--- a/topics/quality-control/qc-checks.gmi
+++ b/topics/quality-control/qc-checks.gmi
@@ -28,22 +28,20 @@ is allowed into the GeneNetwork will be 22,000 entries.

 4. The standard error between male and female mice has to be computed.

-5. SE values have to be computed to 8 decimal places.
+5. SE values have to be computed to 6 or greater decimal places.

-6. The average between male and female mice has to be computed.
+6. The average between male and female mice has to be computed to 3 decimal places.

-7. AVG values have to be computed to only 3 decimal places.
+7. Datasets/studies having the same ProbeSetID should be grouped together.

-8. Datasets/studies having the same ProbeSetID should be grouped together.
+8. There should be no trailing spaces in data cells.

-9. There should be no trailing spaces in data cells.
+9. Entries should have the same capitalization style.

-10. Entries should have the same capitalization style.
+10. Assesing Phenotypes for normality with Shapiro-Wilk Test.

-11. Assesing Phenotypes for normality with Shapiro-Wilk Test.
+11. Check for annotations file.

-12. Check for annotations file.
+12. Check for CRLF.

-13. Check for CRLF.
-
-14. Check for UTF-8 encoding.
+13. Check for UTF-8 encoding.
--
cgit v1.2.3
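
Several checks in the final list (5: SE decimal places, 8: trailing spaces, 12: CRLF, 13: UTF-8) are mechanical enough to sketch in code. The following is a minimal illustration only, not the project's implementation. It assumes a tab-separated file with a header row; the function name `qc_report`, the column name `SE`, and the sample row are hypothetical and do not come from the patches.

```python
from typing import List


def qc_report(raw: bytes, se_column: str = "SE", min_se_decimals: int = 6) -> List[str]:
    """Return a list of QC problems found in a tab-separated data file.

    Assumptions (not from the patches): the first row is a header and the
    standard-error column is named by `se_column`.
    """
    # Check 13: the file must decode as UTF-8; nothing else is checked if it does not.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    problems = []

    # Check 12: flag CRLF line endings.
    if "\r\n" in text:
        problems.append("file contains CRLF line endings")

    lines = text.splitlines()
    if not lines:
        return problems

    header = lines[0].split("\t")
    for line_number, line in enumerate(lines[1:], start=2):
        for name, cell in zip(header, line.split("\t")):
            # Check 8: no trailing spaces in data cells.
            if cell != cell.rstrip(" "):
                problems.append(f"line {line_number}, column {name!r}: trailing space")
            # Check 5: SE values need at least `min_se_decimals` decimal places.
            if name == se_column:
                decimals = len(cell.strip().partition(".")[2])
                if decimals < min_se_decimals:
                    problems.append(
                        f"line {line_number}, column {name!r}: "
                        f"{decimals} decimal places, expected at least {min_se_decimals}"
                    )
    return problems


if __name__ == "__main__":
    # Made-up sample: CRLF header line, trailing space in AVG, SE with 2 decimals.
    sample = b"ProbeSetID\tAVG\tSE\r\n1415670_at\t8.123 \t0.12\n"
    for problem in qc_report(sample):
        print(problem)
```

The sketch decodes first because the later checks operate on text; the Shapiro-Wilk check (10) is left out since it needs a statistics library such as scipy.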