After RMA processing, all arrays were rank-order normalized. This second round of quantile normalization removes much of the residual non-linearity across arrays and forces every array to have the same distribution of values as the mean of all arrays. Comparative array data quality was then evaluated in DataDesk: outlier arrays were flagged by visual inspection, usually of scatter plots, and more quantitatively by generating a correlation matrix of all arrays. Arrays with a mean correlation of less than 0.96 against all other arrays indicate either technical trouble or a biological outlier. In some cases, outliers were expected, such as samples from strains with retinal degeneration (C3H/HeJ and BXD24) and samples from wild subspecies such as CAST/Ei, PWD/Ph, and PWK/Ph. However, arrays that were anomalous both within strain and across strains were often simply discarded; we tended to keep arrays that conformed to expectation. The assumption in these cases is that anomalous data are much more likely due to experimental problems and errors than to informative biological variation. Approximately 8 arrays total were discarded in batches 1 and 2 combined.
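The two steps above (quantile normalization to a common reference distribution and the correlation-matrix screen) can be summarized in a short sketch. This is an illustrative reimplementation, not the DataDesk workflow itself: it assumes a probe-set-by-array matrix held in a pandas DataFrame, uses the 0.96 mean-correlation cutoff quoted in the text, and all function names are hypothetical.

```python
import numpy as np
import pandas as pd

def quantile_normalize(expr: pd.DataFrame) -> pd.DataFrame:
    """Force every array (column) to share the distribution of the mean of all arrays."""
    ranks = expr.rank(method="first").astype(int)           # per-array ranks, 1..n
    reference = np.sort(expr.values, axis=0).mean(axis=1)   # mean of sorted values = target distribution
    return ranks.apply(lambda col: pd.Series(reference[col.values - 1], index=col.index))

def flag_outlier_arrays(expr: pd.DataFrame, cutoff: float = 0.96) -> list:
    """Flag arrays whose mean correlation with all other arrays falls below the cutoff."""
    corr = expr.corr()                       # array-by-array correlation matrix
    np.fill_diagonal(corr.values, np.nan)    # ignore self-correlations
    mean_corr = corr.mean(axis=0, skipna=True)
    return list(mean_corr.index[mean_corr < cutoff])
```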
After this process, the acceptable set of arrays was renormalized by repeating all of the steps above, starting with the original RMA procedure.
We then categorized arrays into XXX major "technical groups" based on expression patterns noted in scatter plots. This process of defining technical groups was done in DataDesk by manually "typing" arrays. The technical groups appear to be due to subtle within-batch effects that we do not yet understand and that are not corrected by quantile normalization. These XXX major technical groups are not obviously related to strain, sex, age, or any other known biological variable, nor to any of the Affymetrix QC measures (3'/5' ratios, gain, etc.). Once the technical groups were defined, we forced the mean of each probe set in each of the XX technical groups to the same value. This simple process partially removes a technical error of unknown origin in large expression array data sets.
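A minimal sketch of this mean-alignment step, assuming a probe-set-by-array matrix and a dictionary assigning each array to a technical group (the group labels and function name are illustrative assumptions): each group's per-probe-set mean is shifted to the grand mean for that probe set.

```python
import pandas as pd

def align_technical_groups(expr: pd.DataFrame, group_of: dict) -> pd.DataFrame:
    """Shift each technical group so every probe set has the same mean in every group."""
    aligned = expr.copy()
    grand_mean = expr.mean(axis=1)                       # per-probe-set mean over all arrays
    for group in set(group_of.values()):
        cols = [a for a in expr.columns if group_of[a] == group]
        offset = grand_mean - expr[cols].mean(axis=1)    # per-probe-set shift for this group
        aligned[cols] = expr[cols].add(offset, axis=0)
    return aligned
```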
We reviewed the final data set using a new method developed by RW Williams, Jeremy Peirce, and Hongqiang Li. For the full set of 140 arrays that passed the standard QC protocols described above, we computed the strain means for the BXD strains, B6, D2, and the F1s. Using this set of strain means, we computed LRS scores for all 45,101 probe sets and counted the number of transcripts that generated QTLs with LRS values greater than 50. This value (e.g., 1000) represents the QTL harvest for the full data set. We then dropped a single array from the data set (n = 139 arrays), recomputed the strain means, and recounted the number of transcripts with LRS scores greater than 50. Removing an array typically reduces the number of QTLs that reach the criterion (e.g., 950 QTLs with LRS > 50). This process was repeated for every array to obtain an array-specific difference value: the effect of removing that array on the total QTL count. For example, the loss of a single array might cause a decrease of 50 QTLs (950 instead of 1000, a difference value of -50). Values ranged from -90 (good) to +38 (bad). This procedure is similar in some ways to a jackknife protocol, although we are not using it to estimate an error term, but rather as a final method to polish a data set. By applying this procedure we discovered that a set of XX (7?) arrays could be excluded while simultaneously improving the total number of QTLs with LRS values above 50.
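The leave-one-array-out procedure can be sketched as below. The mapping step is left to a caller-supplied function, since computing LRS scores for all probe sets requires genotypes and an interval-mapping routine that are outside the scope of this sketch; the loop structure and the sign convention of the difference values follow the description above, and all names are hypothetical.

```python
from typing import Callable
import pandas as pd

def array_effects(expr: pd.DataFrame,
                  strain_of: dict,
                  count_qtls_above: Callable[[pd.DataFrame], int]) -> pd.Series:
    """For each array, the change in the QTL count caused by dropping that array.

    `count_qtls_above` is a caller-supplied function that maps every probe set from a
    probe-set-by-strain table of means and counts QTLs with LRS above the criterion
    (e.g., 50). Negative values (the count drops) mark informative arrays; positive
    values mark suspect ones.
    """
    def strain_means(frame: pd.DataFrame) -> pd.DataFrame:
        # Average the arrays belonging to the same strain (probe sets x strains).
        return frame.T.groupby(lambda array: strain_of[array]).mean().T

    baseline = count_qtls_above(strain_means(expr))          # e.g., 1000 QTLs above criterion
    effects = {}
    for array in expr.columns:
        reduced = expr.drop(columns=[array])                 # n - 1 arrays
        effects[array] = count_qtls_above(strain_means(reduced)) - baseline
    return pd.Series(effects)
```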
During this final process we discovered that nearly XX arrays in the second batch had been mislabeled at some point in processing. We determined the correct strain membership of each array by using a large number of Mendelian probe sets (more than 50) and comparing their calls to standard SNP and microsatellite markers and to the original array data set of November 2005. This allowed us to rescue a large number of arrays that were of very high quality.
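A minimal sketch of this strain-identity check, assuming the Mendelian probe sets have already been dichotomized into B/D calls for each array and that a table of expected B/D genotypes per strain is available at matched markers; the function name and call encoding are assumptions, and the actual check also drew on the November 2005 array data set.

```python
import pandas as pd

def best_matching_strain(array_calls: pd.Series, strain_genotypes: pd.DataFrame) -> tuple:
    """Return the strain whose marker genotypes best match this array's Mendelian calls,
    together with the fraction of matching calls (markers x strains table assumed)."""
    shared = strain_genotypes.index.intersection(array_calls.index)
    match_rate = strain_genotypes.loc[shared].eq(array_calls.loc[shared], axis=0).mean()
    best = match_rate.idxmax()
    return best, float(match_rate[best])

# Hypothetical usage: strain, rate = best_matching_strain(calls["array_42"], genotypes)
```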