1 files changed, 17 insertions, 0 deletions
diff --git a/general/datasets/CB_M_1004_R/processing.rtf b/general/datasets/CB_M_1004_R/processing.rtf
new file mode 100644
index 0000000..cdfa27a
--- /dev/null
+++ b/general/datasets/CB_M_1004_R/processing.rtf
@@ -0,0 +1,17 @@
+<blockquote><strong>Probe (cell) level data from the CEL file: </strong>These CEL values produced by <a class="fs14" href="http://www.affymetrix.com/support/technical/product_updates/gcos_download.affx" target="_blank">GCOS</a> are 75% quantiles from a set of 91 pixel values per cell.
+<ul>
+	<li>Step 1: We added a constant offset of 1 to the CEL expression values for each cell so that all values could be log-transformed without generating negative values.</li>
+	<li>Step 2: We took the log2 of each cell signal level.</li>
+	<li>Step 3: We computed the Z scores for each cell within its array.</li>
+	<li>Step 4: We multiplied all Z scores by 2.</li>
+	<li>Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is a set of rescaled Z scores with a mean of 8 units, a variance of 4 units, and a standard deviation of 2 units.</li>
+	<li>Step 6a: The 430A and 430B GeneChips include a set of 100 shared probe sets that have identical sequences. These 100 probe sets and 2200 probes provide a good way to adjust expression of the two GeneChips to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array (the A array contains the more commonly expressed transcripts). To bring the two arrays into numerical alignment, we regressed Z scores of the common set of 2200 probes to obtain linear regression corrections to rescale the 430B arrays to values that match the 430A array. This involved multiplying all 430B Z scores by the slope of the regression and adding a very small offset (the regression intercept). The result of this adjustment is that the mean of the 430A array expression is fixed at a value of 8, whereas that of the 430B chip is typically 7.</li>
+	<li>Step 6b: We recentered the entire combined set of 430A and 430B transcripts to a mean of 8 and a standard deviation of 2. This involved reapplying Steps 3 through 5 above, but now using the entire set of probes and probe sets from the merged 430A and 430B data set.</li>
+	<li>Step 7: No batch correction was applied.</li>
+	<li>Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. This data set has only a very modest number of replicates, and for this reason we do not yet provide error terms for transcripts or probes. Note that this data set does not provide any correction for variance introduced by differences in sex, age, tissue source, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We expect to add statistical controls and adjustments for these variables in subsequent versions of WebQTL.</li>
+</ul>
+<strong>Probe set data from the CHP file: </strong>These CHP files were generated using MAS 5. The same simple steps described above were also applied to these values. A 1-unit difference represents roughly a 2-fold difference in expression level. Expression levels below 5 are close to the noise level.</blockquote>
+
+<p>About the chromosome and megabase position values:</p>
+
+<blockquote>The chromosomal locations of probe sets and gene markers on the 430A and 430B microarrays were determined by BLAT analysis using the Mouse Genome Sequencing Consortium Oct 2003 Assembly (see <a class="fs14" href="http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&amp;org=mouse">http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&amp;org=mouse</a>). We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.</blockquote>
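
Reviewer note: the transformations in Steps 1 through 5 and the 430B rescaling in Step 6a can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the added file, not the pipeline that produced the data. The function names (`normalize_array`, `fit_line`, `rescale_430b`) are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator was used), and ordinary least squares of 430A values on 430B values is assumed for the regression.

```python
import math
import statistics


def normalize_array(cel_values, target_mean=8.0, target_sd=2.0):
    """Steps 1-5: offset by 1, take log2, Z score within one array, rescale."""
    logged = [math.log2(v + 1.0) for v in cel_values]  # Steps 1 and 2
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # assumption: population SD
    # Steps 3-5: Z score, multiply by 2, add 8
    return [target_sd * (x - mean) / sd + target_mean for x in logged]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rescale_430b(b_values, shared_b, shared_a):
    """Step 6a: map 430B scores onto the 430A scale via the shared probes.

    shared_b and shared_a hold the scores of the same shared probes as
    measured on the 430B and 430A arrays, in the same order.
    """
    slope, intercept = fit_line(shared_b, shared_a)
    return [slope * v + intercept for v in b_values]


# Usage with made-up intensities (not real CEL data)
scores = normalize_array([120.0, 450.0, 90.0, 3000.0, 760.0])
print(statistics.fmean(scores), statistics.pstdev(scores))
```

After `normalize_array`, the scores of a single array have a mean close to 8 and a standard deviation close to 2, matching Step 5. Step 6b would then amount to calling `normalize_array`-style recentering (Steps 3 through 5 only, without the offset and log) on the merged 430A and rescaled 430B values.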