<blockquote><strong>Probe (cell) level data from the CEL file: </strong> Probe signal intensity estimates in the Affymetrix CEL files are the 75% quantile value taken from a set of <a class="fs14" href="images/AffyU74.pdf" target="_blank">36</a> (6x6) pixels per probe cell in the DAT image file.
<ul>
	<li>Step 1: We added an offset of 1.0 unit to each cell signal to ensure that all values could be logged without generating negative values. We then computed the log base 2 of each cell.</li>
	<li>Step 2: We performed a quantile normalization for the log base 2 values for the total set of 97 arrays (all seven batches) using the same initial steps used by the RMA transform.</li>
	<li>Step 3: We computed the Z scores for each cell value.</li>
	<li>Step 4: We multiplied all Z scores by 2.</li>
	<li>Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.</li>
	<li>Step 6: We corrected for technical variance introduced by seven batches at the probe level. To do this we determined the ratio of the batch mean to the mean of all seven batches and used this as a single multiplicative probe-specific batch correction factor. The consequence of this simple correction is that the mean probe signal value for each of the seven batches is the same.</li>
	<li>Step 7: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. Technical replicates were averaged before computing the mean for independent biological samples. Note that we have not (yet) corrected for variance introduced by differences in sex, age, source of animals, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We eventually hope to add statistical controls and adjustments for these variables.</li>
</ul>
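<p>Steps 1 through 5 above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used to build the data set: the function names, the cells-by-arrays matrix layout, the tie handling in the quantile step, and the use of the population standard deviation are all assumptions. Z scores are computed within each array.</p>

```python
import numpy as np


def quantile_normalize(log_values):
    """Give every array (column) the same distribution.

    The reference distribution is the mean of the sorted columns.
    Assumption: ties are ranked in arbitrary order rather than averaged.
    """
    order = np.argsort(log_values, axis=0)
    ranks = np.argsort(order, axis=0)          # rank of each cell within its array
    reference = np.sort(log_values, axis=0).mean(axis=1)
    return reference[ranks]


def modified_z_scores(raw, offset=1.0, scale=2.0, shift=8.0):
    """Apply Steps 1-5 to a (cells x arrays) matrix of raw signal values."""
    logged = np.log2(raw + offset)             # Step 1: offset, then log base 2
    normalized = quantile_normalize(logged)    # Step 2: quantile normalization
    # Step 3: Z score within each array (assumption: population SD, ddof=0)
    z = (normalized - normalized.mean(axis=0)) / normalized.std(axis=0)
    return z * scale + shift                   # Steps 4 and 5: multiply by 2, add 8


# Hypothetical input: 1000 cells on 5 arrays of simulated intensities.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=6.0, sigma=1.5, size=(1000, 5))
scores = modified_z_scores(raw)
print(scores.mean(axis=0))
print(scores.std(axis=0))
```

<p>In this sketch each array ends up with a mean of 8 and a standard deviation of 2, matching the description in Step 5.</p>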
<strong>Probe set data from the CHP file: </strong>Probe set estimates of expression were initially generated using the standard Affymetrix MAS 5 algorithm. The CHP values were then processed following precisely the same six steps listed above to normalize expression and stabilize the variance of all 97 arrays. The mean expression within each array is therefore 8 units with a standard deviation of 2 units. A 1-unit difference represents roughly a 2-fold difference in expression level. Expression levels below 5 are close to the background noise level. While a value of 8 units is nominally the average expression, this average includes all those transcripts with negligible expression in the brain that would often be eliminated from subsequent analysis (so-called &quot;absent&quot; and &quot;marginal&quot; calls in the CHP file).</blockquote>
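<p>The batch correction (Step 6) and strain averaging (Step 7) can be sketched as follows. This is an illustration under stated assumptions, not the original code. The function names and labels are hypothetical. The "mean of all seven batches" is taken to be the mean of the per-batch means. Each value is divided by the batch-to-overall ratio (that is, multiplied by its reciprocal), so that every batch ends up with the same mean for each probe, which is the consequence the text describes.</p>

```python
import numpy as np


def batch_correct(values, batches):
    """Step 6 sketch for a (probes x arrays) matrix and one batch label per array."""
    values = np.asarray(values, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    # Per-probe mean within each batch, shape (probes x n_batches)
    batch_means = np.column_stack(
        [values[:, batches == b].mean(axis=1) for b in labels]
    )
    overall = batch_means.mean(axis=1)         # assumption: mean of the batch means
    corrected = values.copy()
    for j, b in enumerate(labels):
        mask = batches == b
        ratio = batch_means[:, j] / overall    # batch mean relative to overall mean
        corrected[:, mask] = values[:, mask] / ratio[:, None]
    return corrected


def strain_means(values, strains, samples):
    """Step 7 sketch: average technical replicates per sample, then samples per strain."""
    values = np.asarray(values, dtype=float)
    strains = np.asarray(strains)
    samples = np.asarray(samples)
    result = {}
    for strain in np.unique(strains):
        in_strain = strains == strain
        sample_means = [
            values[:, in_strain & (samples == s)].mean(axis=1)
            for s in np.unique(samples[in_strain])
        ]
        result[str(strain)] = np.mean(sample_means, axis=0)
    return result


# Hypothetical toy data: 2 probes on 4 arrays run in 2 batches.
values = np.array([[7.0, 9.0, 9.0, 11.0],
                   [6.0, 6.0, 6.0, 6.0]])
batches = ["b1", "b1", "b2", "b2"]
strains = ["A", "B", "A", "B"]
samples = ["a1", "b1", "a1", "b2"]   # arrays 0 and 2 are technical replicates of a1

corrected = batch_correct(values, batches)
means = strain_means(corrected, strains, samples)
print(corrected)
print(means)
```

<p>Averaging technical replicates first keeps a sample that was hybridized several times from counting more heavily than a sample run once.</p>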