<blockquote><strong>Probe (cell) level data from the CEL file: </strong>These CEL values produced by <a class="fs14" href="http://www.affymetrix.com/support/technical/product_updates/gcos_download.affx" target="_blank">GCOS</a> are 75% quantiles from a set of 91 pixel values per cell.</blockquote>

<blockquote>
<ul>
	<li>Step 1: We added an offset of 1 to the CEL expression values for each cell to ensure that all values could be logged without generating negative values.</li>
	<li>Step 2: We took the log base 2 of each probe signal.</li>
	<li>Step 3: We computed the Z scores for each probe signal.</li>
	<li>Step 4: We multiplied all Z scores by 2.</li>
	<li>Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.</li>
	<li>Step 6a: The 430A and 430B arrays include a set of 100 shared probe sets (2200 probes) that have identical sequences. These probes provide a way to calibrate expression of the A and B arrays to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array. To bring the two arrays into alignment, we regressed Z scores of the common set of probes to obtain a linear regression correction that rescales the 430B arrays to the 430A array. In our case this involved multiplying all 430B Z scores by the slope of the regression and adding or subtracting a very small offset. The result of this step is that the mean of the 430A GeneChip expression is fixed at a value of 8, whereas that of the 430B chip is typically 7. Thus the average of the A and B arrays is approximately 7.5.</li>
	<li>Step 6b: We recentered the whole set of 430A and 430B transcripts to a mean of 8 and a standard deviation of 2. This involved reapplying Steps 3 through 5 above, but now using the entire set of probes and probe sets from the merged 430A and 430B data set.</li>
	<li>Step 7: We corrected for technical variance introduced by two batches. For each gene, the values of the first batch were adjusted so that their mean matches the mean of the second batch.</li>
	<li>Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. This data set has modest numbers of replicates, and for this reason we do not yet provide error terms for transcripts or probes. Note that we have not (yet) corrected for variance introduced by differences in sex, age, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We expect to add statistical controls and adjustments for these variables in subsequent versions of WebQTL.</li>
</ul>
</blockquote>
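<p>Steps 1 through 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, its parameters, and the input values are hypothetical, and the population standard deviation is used because the text above does not say which estimator was applied.</p>

```python
import math
import statistics

def modified_z_scores(cel_values, offset=1.0, scale=2.0, center=8.0):
    """Offset, log2-transform, Z-score, then rescale to mean 8 and SD 2."""
    # Steps 1 and 2: add the offset, then take log base 2.
    logged = [math.log2(v + offset) for v in cel_values]
    # Step 3: Z scores need the mean and standard deviation of the logged values.
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # assumption: population SD
    # Steps 4 and 5: multiply the Z scores by 2 and add 8.
    return [scale * (x - mean) / sd + center for x in logged]

# Hypothetical CEL intensities, chosen only to make the example concrete.
values = [0.0, 63.0, 255.0, 1023.0, 4095.0]
scores = modified_z_scores(values)
```

<p>By construction, the returned values have a mean of 8 and a standard deviation of 2, whatever the scale of the input intensities.</p>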

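<p>The calibration in Step 6a can be illustrated with an ordinary least squares fit on the shared probes. The sketch below assumes that the 430A values are regressed on the 430B values and that the fitted line is then applied to every 430B value; the function names and the numbers are hypothetical and are not taken from the data set.</p>

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rescale_b_to_a(shared_a, shared_b, all_b):
    """Map 430B Z scores onto the 430A scale using the shared probes."""
    # Assumption: 430A is the response and 430B the predictor.
    slope, intercept = fit_line(shared_b, shared_a)
    return [slope * z + intercept for z in all_b]

# Hypothetical Z scores for four shared probes on each array.
shared_a = [6.0, 8.0, 10.0, 12.0]
shared_b = [5.0, 7.0, 9.0, 11.0]
rescaled = rescale_b_to_a(shared_a, shared_b, [5.0, 8.0, 11.0])
```

<p>In this toy case the fitted slope is 1 and the offset is 1, so every 430B value is shifted up by one unit.</p>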
<blockquote>
<p><strong>Probe set data: </strong>The original expression values in the Affymetrix CEL files were read into <a class="fs14" href="http://odin.mdacc.tmc.edu/~zhangli/PerfectMatch/" target="_blank">PerfectMatch</a> to generate the normalized PDNN data set.</p>

<p>PDNN values of each array were subsequently normalized to achieve a mean value of 8 units and a variance of 2 units.</p>

<p>When necessary, we computed the arithmetic mean for technical replicates and treated these as single samples. We then computed the arithmetic mean for the set of 2 to 5 biological replicates for each strain.</p>
</blockquote>
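<p>The two-stage averaging described above (technical replicates first, then biological replicates within a strain) might look like the following sketch. The record layout, the strain labels, and the animal identifiers are hypothetical placeholders.</p>

```python
from collections import defaultdict
from statistics import fmean

def strain_means(samples):
    """Average technical replicates per animal, then animals per strain.

    samples: iterable of (strain, animal_id, value) for one probe set.
    Rows sharing strain and animal_id are treated as technical replicates.
    """
    by_animal = defaultdict(list)
    for strain, animal_id, value in samples:
        by_animal[(strain, animal_id)].append(value)

    by_strain = defaultdict(list)
    for (strain, _animal_id), values in by_animal.items():
        by_strain[strain].append(fmean(values))  # one value per animal

    return {strain: fmean(values) for strain, values in by_strain.items()}

# Hypothetical expression values for a single probe set.
samples = [
    ("S1", "m1", 8.0), ("S1", "m1", 8.4),  # two technical replicates
    ("S1", "m2", 9.0),
    ("S2", "m3", 7.0), ("S2", "m4", 7.5),
]
result = strain_means(samples)
```

<p>Collapsing technical replicates first keeps an animal that was hybridized twice from counting double in its strain mean.</p>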

<p>About the array probe sets names:</p>

<blockquote>
<p>Most probe sets on the mouse 430A and 430B arrays consist of a total of 22 probes, divided into 11 perfect match (PM) probes and 11 mismatch (MM) controls. Each set of these 25-nucleotide-long probes has an identifier code that includes a unique number, an underscore character, several suffix characters that highlight design features, and a final A or B character to specify the array pair member. The most common probe set suffix is <strong>at</strong>. This code indicates that the probes should hybridize relatively selectively with the complementary anti-sense target (i.e., the complementary RNA) produced from a single gene.</p>
</blockquote>
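<p>As an illustration of the naming scheme, the sketch below splits an identifier into its number, suffix, and array letter. The exact pattern is an assumption based only on the description above, and the example identifiers are made up for this purpose; identifiers that follow a different layout simply return <code>None</code>.</p>

```python
import re

# Assumed layout: digits, underscore, lowercase suffix parts, underscore, A or B.
PROBE_SET_ID = re.compile(
    r"^(?P<number>\d+)_(?P<suffix>[a-z]+(?:_[a-z]+)*)_(?P<array>[AB])$"
)

def parse_probe_set_id(identifier):
    """Return (number, suffix, array) or None if the pattern does not match."""
    match = PROBE_SET_ID.match(identifier)
    if match is None:
        return None
    return match.group("number"), match.group("suffix"), match.group("array")

# Hypothetical identifiers used only to exercise the pattern.
plain = parse_probe_set_id("1415670_at_A")
compound = parse_probe_set_id("1415670_s_at_B")
unmatched = parse_probe_set_id("not-a-probe-set")
```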