1 files changed, 30 insertions, 0 deletions
diff --git a/general/datasets/Hc_u_0304_r/processing.rtf b/general/datasets/Hc_u_0304_r/processing.rtf
new file mode 100644
index 0000000..6de5cfc
--- /dev/null
+++ b/general/datasets/Hc_u_0304_r/processing.rtf
@@ -0,0 +1,30 @@
+<p>About data processing:</p>
+
+<blockquote><strong>Probe (cell) level data from the CEL file: </strong>The CEL values produced by MAS 5 are the 75% quantiles from a set of <a class="normal" href="images/AffyU74.pdf" target="_blank">36 pixel</a> values per cell.
+
+<ul>
+	<li>Step 1: We added an offset of 1.0 to the CEL expression value of each cell so that taking the logarithm never produces a negative value.</li>
+	<li>Step 2: We took the log2 of each cell.</li>
+	<li>Step 3: We computed the Z score for each cell.</li>
+	<li>Step 4: We multiplied all Z scores by 2.</li>
+	<li>Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1-unit difference.</li>
+	<li>Step 6: We computed the arithmetic mean of the values for the set of microarrays for each of the individual strains.</li>
+</ul>
+<strong>Probe set data from the TXT file: </strong>These TXT files were generated using MAS 5. The same steps described above were applied to these values, so every microarray data set has a mean expression of 8 and a standard deviation of 2. A 1-unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.</blockquote>
+
+<p>About the array probe set names:</p>
+
+<blockquote>
+<p>Most probe sets on the U74Av2 array consist of a total of 32 probes, divided into 16 perfect match probes and 16 mismatch controls. Each set of these 25-nucleotide-long probes has an identifier code that includes a unique number, an underscore character, and several suffix characters that highlight design features. The most common probe set suffix is <strong>at</strong>. This code indicates that the probes should hybridize relatively selectively with the complementary anti-sense target (i.e., the complementary RNA) produced from a single gene. Other codes include:</p>
+
+<ul>
+	<li><strong>f_at (sequence family)</strong>: Some probes in this probe set will hybridize to identical and/or slightly different sequences of related gene transcripts.</li>
+	<li><strong>s_at (similarity constraint)</strong>: All probes in this probe set target common sequences found in transcripts from several genes.</li>
+	<li><strong>g_at (common groups)</strong>: Some probes in this set target identical sequences in multiple genes and some target unique sequences in the intended target gene.</li>
+	<li><strong>r_at (rules dropped)</strong>: Probe sets for which it was not possible to pick a full set of unique probes using the Affymetrix probe selection rules. Probes were picked after dropping some of the selection rules.</li>
+	<li><strong>i_at (incomplete)</strong>: Designates probe sets for which there are fewer than the standard number of unique probes specified in the design (16 perfect match for the U74Av2).</li>
+	<li><strong>st (sense target)</strong>: Designates a sense target; almost always generated in error.</li>
+</ul>
+
+<p>Descriptions for the probe set extensions were taken from the Affymetrix <a class="normal" href="./dbdoc/data_analysis_fundamentals_manual.pdf">GeneChip Expression Analysis Fundamentals</a>.</p>
+</blockquote>