<blockquote><strong>Probe (cell) level data from the CEL file: </strong>These CEL values produced by <a class="fs14" href="http://www.affymetrix.com/support/technical/product_updates/gcos_download.affx" target="_blank">GCOS</a> are 75% quantiles from a set of 91 pixel values per cell.
<ul>
	<li>Step 1: We added an offset of 1.0 unit to each cell signal so that the logarithm of every value is non-negative. We then computed the log base 2 of each cell.</li>
	<li>Step 2: We performed a quantile normalization for the log base 2 values for the total set of 104 arrays (all three batches) using the same initial steps used by the RMA transform.</li>
	<li>Step 3: We computed the Z scores for each cell value.</li>
	<li>Step 4: We multiplied all Z scores by 2.</li>
	<li>Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.</li>
	<li>Step 6: We corrected for technical variance introduced by three large batches at the probe level. To do this we determined the ratio of the batch mean to the mean of all three batches and used this as a single multiplicative probe-specific batch correction factor. The consequence of this simple correction is that the mean probe signal value for each of the three batches is the same.</li>
	<li>Step 7a: The 430A and 430B arrays include a set of 100 shared probe sets (a total of 2200 probes) that have identical sequences. These probes and probe sets provide a way to calibrate expression of the 430A and 430B arrays to a common scale. To bring the two arrays into alignment, we regressed Z scores of the common set of probes to obtain a linear regression correction to rescale the 430B arrays to the 430A array. In our case this involved multiplying all 430B Z scores by the slope of the regression and adding or subtracting a small offset. The result of this step is that the mean of the 430A expression is fixed at a value of 8, whereas that of the 430B chip is typically reduced to 7. The average of the merged 430A and 430B array data set is approximately 7.5.</li>
	<li>Step 7b: We recentered the merged 430A and 430B data sets to a mean of 8 and a standard deviation of 2. This involved reapplying Steps 3 through 5.</li>
	<li>Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. Technical replicates were averaged before computing the mean for independent biological samples. Note that we have not (yet) corrected for variance introduced by differences in sex, age, source of animals, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We eventually hope to add statistical controls and adjustments for these variables.</li>
</ul>
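<p>The probe-level steps above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the code used to build the data set. The function names are hypothetical, and the layout is assumed: one row per probe and one column per array. Three choices are assumptions where the text does not say: Z scores are computed within each array, the batch target is the mean of the batch means, and the 430A/430B alignment is an ordinary least-squares fit on the shared probes.</p>

```python
import numpy as np

def log2_offset(cells, offset=1.0):
    """Step 1: add an offset of 1.0, then take log base 2."""
    return np.log2(np.asarray(cells, dtype=float) + offset)

def quantile_normalize(x):
    """Step 2: give every array (column) the same empirical distribution.
    Ties are broken by sort order here; production code usually averages them."""
    order = np.argsort(x, axis=0)
    ranks = np.argsort(order, axis=0)            # rank of each cell in its column
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the sorted columns
    return reference[ranks]

def modified_z(x):
    """Steps 3-5: Z score (assumed per array), times 2, plus 8."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return 2.0 * z + 8.0

def batch_correct(x, batches):
    """Step 6: one multiplicative factor per probe and batch, chosen so that
    each batch ends up with the same mean for that probe.
    Assumption: the target is the mean of the batch means."""
    batches = np.asarray(batches)
    labels = np.unique(batches)
    batch_means = np.stack(
        [x[:, batches == b].mean(axis=1) for b in labels], axis=1
    )
    target = batch_means.mean(axis=1)
    out = x.copy()
    for j, b in enumerate(labels):
        factor = target / batch_means[:, j]
        out[:, batches == b] *= factor[:, None]
    return out

def rescale_b_to_a(shared_a, shared_b, all_b):
    """Step 7a: regress the shared-probe values of 430A on those of 430B,
    then apply slope and offset to every 430B value."""
    slope, offset = np.polyfit(shared_b, shared_a, 1)
    return slope * np.asarray(all_b, dtype=float) + offset
```

<p>Step 7b would then amount to calling <code>modified_z</code> again on the merged 430A and 430B values.</p>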
<strong>Probe set data: </strong>The expression data were processed by Yanhua Qu (UTHSC). Probe set data were generated from the fully normalized CEL files (quantile and batch corrected) using the standard MAS 5 Tukey biweight procedure. A 1-unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.</blockquote>
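<p>For readers unfamiliar with the Tukey biweight, the estimator itself can be sketched as below. This is a minimal illustration of a one-step biweight average, not the MAS 5 implementation: MAS 5 applies it to background-adjusted probe values inside a larger pipeline that is not shown here. The function name is hypothetical, and the default tuning constant <code>c</code> and the small <code>epsilon</code> are assumptions based on values commonly cited for MAS 5.</p>

```python
import numpy as np

def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a robust average that down-weights outliers.
    c and epsilon are assumed defaults, not taken from the text above."""
    x = np.asarray(values, dtype=float)
    center = np.median(x)
    mad = np.median(np.abs(x - center))        # median absolute deviation
    u = (x - center) / (c * mad + epsilon)     # scaled distance from the median
    # Bisquare weights: 1 at the median, falling to 0 once |u| reaches 1.
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return np.sum(w * x) / np.sum(w)
```

<p>Given probe values such as 7.9, 8.0, 8.1, 8.0 and a single outlier at 14.0, the outlier receives zero weight, so the estimate stays near 8 rather than being pulled toward the arithmetic mean.</p>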