INIA Brain mRNA M430 (Feb04) MAS5

Accession number: GN11

    Summary:

The February 2004 freeze provides estimates of mRNA expression in brains of BXD recombinant inbred mice measured using the Affymetrix MOE430 microarrays that replaced the U74 series of arrays in 2003. Data were generated at the University of Tennessee Health Science Center (UTHSC) as part of a research project funded by the NIAAA. Brain samples from BXD strains were hybridized in small pools (n=3) to M430A and M430B arrays. Data were processed using the Microarray Suite 5 (MAS 5) protocol of Affymetrix. To simplify comparison between transforms, MAS 5 values of each array were adjusted to an average of 8 units and a standard deviation of 2 units (a variance of 4). This data set was essentially run as a single large batch, with careful attention to balancing samples by sex and age.

    About the cases used to generate this set of data:

We have exploited a set of BXD recombinant inbred strains. The parental strains from which all BXD lines are derived are C57BL/6J (B6 or B) and DBA/2J (D2 or D). Both B and D strains have been almost fully sequenced (8x coverage for B6 by a public consortium and approximately 1.5x coverage for D by Celera). Chromosomes of the two parental strains have been recombined and fixed randomly in the many different BXD strains. BXD2 through BXD32 were produced by Benjamin A. Taylor starting in the late 1970s. BXD33 through BXD42 were also produced by Taylor, but they were generated in the 1990s. These strains are all available from the Jackson Laboratory, Bar Harbor, Maine. BXD43 through BXD99 were produced by Lu Lu, Jeremy Peirce, Lee M. Silver, and Robert W. Williams in the late 1990s and early 2000s using advanced intercross progeny (Peirce et al. 2004).

In this mRNA expression data set we generally used progeny of stock obtained from The Jackson Laboratory between 1999 and 2001. Animals were generated in-house at the University of Alabama by John Mountz and Hui-Chen Hsu and at the University of Tennessee Health Science Center by Lu Lu and Robert Williams.

    About the tissue used to generate these data:

This INIA M430 brain Database (February04) consists of 30 pairs of Affymetrix 430A and 430B arrays. Each pair was hybridized in succession (A then B) with cRNA generated from a pool of three brains from adult mice of the same age and sex. The brain region included most of the forebrain and midbrain, bilaterally. The sample excluded the olfactory bulbs, the retinas, and the posterior pituitary (all formally part of the forebrain).

RNA was extracted at UTHSC by Lu Lu, Zhiping Jia, and Hongtao Zhai.

All samples were subsequently processed in the INIA Bioanalytical Core at the W. Harry Feinstone Center of Excellence by Thomas R. Sutter and colleagues at the University of Memphis. Before running the main batch of 30 pairs of arrays, we ran four test samples (one male and one female pool from each of the two parental strains, C57BL/6J and DBA/2J). The main batch of 30 array pairs includes the same four samples (in other words, we have four technical replicates shared between the test batch and the single main batch), two F1 hybrid samples (each run twice for within-batch technical replication), and 22 BXD strains. The February04 data set therefore consists of one male and one female pool from each of C57BL/6J, DBA/2J, and the B6D2F1 hybrid, plus 11 female BXD samples and 11 male BXD samples. Note that the four technical replicates were eventually combined with a correction for a highly significant batch effect. This was done at both the probe and probe set levels to numerically align the test batch values with the main batch values. (The ratio of the probe average in the four test arrays to the average of the same probe in the four corresponding main batch arrays was used as a correction factor.) The F1 within-batch technical replicates were simply averaged. In the next batch we will reverse the sex of the BXD samples to achieve a balance, with at least 22 BXD strains represented by one male and one female sample each.
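The ratio-based batch correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, each array is represented as a plain list of probe values in the same probe order, and dividing the test-batch values by the ratio is our reading of how the correction factor was applied. The source does not give the exact procedure.

```python
from statistics import mean


def batch_correction_factors(test_arrays, main_arrays):
    """Per-probe ratio: mean over test arrays / mean over main-batch arrays.

    Each array is a list of probe values; all arrays share one probe order.
    """
    n_probes = len(test_arrays[0])
    factors = []
    for p in range(n_probes):
        test_avg = mean(arr[p] for arr in test_arrays)
        main_avg = mean(arr[p] for arr in main_arrays)
        factors.append(test_avg / main_avg)
    return factors


def align_test_to_main(test_arrays, factors):
    """Assumption: test values are divided by the ratio so that their
    per-probe average matches the main-batch average."""
    return [[v / f for v, f in zip(arr, factors)] for arr in test_arrays]
```

After alignment, the per-probe average of the corrected test arrays equals the per-probe average of the corresponding main-batch arrays, which is the sense of "numerically align" used in the paragraph above.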

Strain      Sex  Age  Sample_name  Result date
B6D2F1      F    127  919-F1       Jan04
B6D2F1      F    127  919-F2       Jan04
B6D2F1      M    127  920-F1       Jan04
B6D2F1      M    127  920-F2       Jan04
C57BL/6J    F    65   903-F1       Nov03
C57BL/6J    F    65   903-F2       Jan03
C57BL/6J    M    66   906-F1       Nov03
C57BL/6J    M    66   906-F2       Jan04
DBA/2J      F    60   917-F1       Nov03
DBA/2J      F    60   917-F2       Jan04
DBA/2J      M    60   918-F1       Nov03
DBA/2J      M    60   918-F2       Jan04
BXD1        F    95   895-F1       Jan04
BXD5        M    71   728-F1       Jan04
BXD6        M    92   902-F1       Jan04
BXD8        F    72   S167-F1      Jan04
BXD9        M    86   909-F1       Jan04
BXD12       M    64   897-F1       Jan04
BXD13       F    86   748-F1       Jan04
BXD14       M    91   912-F1       Jan04
BXD18       F    108  771-F1       Jan04
BXD19       F    56   S236-F1      Jan04
BXD21       F    67   740-F1       Jan04
BXD23       F    88   815-F1       Jan04
BXD24       M    71   913-F1       Jan04
BXD25       F    74   S373-F1      Jan04
BXD28       F    79   910-F1       Jan04
BXD29       F    76   693-F1       Jan04
BXD32       F    93   898-F1       Jan04
BXD33       M    77   915-F1       Jan04
BXD34       M    72   916-F1       Jan04
BXD36       M    77   926-F1       Jan04
BXD38       M    69   731-F1       Jan04
BXD42       M    97   936-F1       Jan04

    About the array platform:

Affymetrix MOE430 GeneChip Set: The expression data were generated using MOE430A and MOE430B arrays. Chromosomal positions of probe sets were determined by BLAT analysis of concatenated probe sequences using the Mouse Genome Sequencing Consortium May 2004 (mm5) assembly. This BLAT analysis is performed periodically by Yanhua Qu as each new build of the mouse genome is released (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis. It is possible to confirm the BLAT alignment results yourself simply by clicking on the Verify link in the Trait Data and Editing Form (right side of the Location line).

    About data processing:

Probe (cell) level data from the CEL file: These CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell.
  • Step 1: We added a constant offset of 1 to the CEL expression values for each cell to ensure that all values could be logged without generating negative values.
  • Step 2: We took the log2 of each cell signal level.
  • Step 3: We computed the Z scores for each cell.
  • Step 4: We multiplied all Z scores by 2.
  • Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8 units, a variance of 4 units, and a standard deviation of 2 units. The advantage of this modified Z score is that a 2-fold difference in expression level corresponds approximately to a 1 unit difference.
  • Step 6a: The 430A and 430B GeneChips include a set of 100 shared probe sets (2200 probes) that have identical sequences. These 2200 probes and 100 probe sets provide a way to calibrate expression of the two GeneChips to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array (the A array contains the more commonly expressed transcripts). To bring the two arrays into numerical alignment, we regressed Z scores of the common set of 2200 probes to obtain a linear regression correction that rescales the 430B arrays to values that match the 430A array. In our case this involved multiplying all 430B Z scores by the slope of the regression and adding or subtracting a very small offset (the regression intercept). The result of this adjustment is that the mean of the 430A GeneChip expression is fixed at a value of 8, whereas that of the 430B chip is typically 7. Thus the average of the A and B arrays is approximately 7.5.
  • Step 6b: We recentered the entire combined set of 430A and B transcripts to a mean of 8 and a standard deviation of 2. This involves reapplying Steps 3 through 5 above but now using the entire set of probes and probe sets from a merged 430A and B data set.
  • Step 7: When necessary, we correct for technical variance introduced by running multiple batches. However, this data set is essentially a single batch with a few technical replicates in a first test batch.
  • Step 8: Finally, we compute the arithmetic mean of the values for the set of microarrays for each strain. In this data set we have only a very modest number of replicates, and for this reason we do not yet provide error terms for transcripts or probes. Note that this data set does not provide any correction for variance introduced by differences in sex, age, tissue source, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We expect to add statistical controls and adjustments for these variables in subsequent versions of WebQTL.
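Steps 1 through 6b above can be sketched in Python as follows. This is a minimal illustration, not the code used to build the data set. All function names are hypothetical, arrays are plain lists of values, the population standard deviation is assumed (the text does not say which estimator was used), and the regression is an ordinary least-squares fit of the 430A shared-probe values on the 430B shared-probe values.

```python
from math import log2
from statistics import mean, pstdev


def rescale(values):
    """Steps 3-5: Z score, multiply by 2, add 8 (mean 8, SD 2)."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD
    return [2.0 * (v - mu) / sd + 8.0 for v in values]


def modified_z(cel_values, offset=1.0):
    """Steps 1-5 for one array: add offset, take log2, then rescale."""
    logged = [log2(v + offset) for v in cel_values]
    return rescale(logged)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def align_b_to_a(a_shared, b_shared, b_all):
    """Step 6a: fit A on B over the shared probes, then apply the
    slope and intercept to every 430B value."""
    slope, intercept = fit_line(b_shared, a_shared)
    return [slope * v + intercept for v in b_all]


def recenter_merged(a_all, b_aligned):
    """Step 6b: reapply Steps 3-5 to the merged 430A + 430B values."""
    return rescale(list(a_all) + list(b_aligned))
```

In this sketch, a_shared and b_shared would hold the modified Z scores of the 2200 shared probes on the A and B arrays in matching order, and b_all would hold all 430B Z scores for the same sample.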
Probe set data from the CHP file: These CHP files were generated using MAS 5. The same simple steps described above were also applied to these values. A 1-unit difference represents roughly a 2-fold difference in expression level. Expression levels below 5 are close to the noise level.
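The relation between units and fold difference follows from the transform itself: a value is 2 x (log2 signal - mean) / SD + 8, so a difference of d units corresponds to a log2 difference of d x SD / 2. The short sketch below makes that arithmetic explicit. The function name is hypothetical, and the default SD of 2 for the log2 values is an assumption chosen because it makes 1 unit equal exactly 2-fold; the actual SD of the log2 values on a given array is not stated in this text, which is why the correspondence is only approximate.

```python
def approx_fold_change(unit_difference, log2_sd=2.0):
    """Convert a difference in transformed units to a fold difference.

    log2_sd is the SD of the log2 signal values before rescaling
    (assumed to be about 2; not given in the source).
    """
    log2_difference = unit_difference * log2_sd / 2.0
    return 2.0 ** log2_difference
```

With the default assumption, a 3-unit difference corresponds to roughly an 8-fold difference in expression. An array whose log2 values have a smaller SD would yield a smaller fold difference per unit.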

    Data source acknowledgment:

Array data were generated with funds from the NIAAA INIA to RWW and Thomas Sutter. Informatics resources are supported primarily by an NIMH/NIDA Human Brain Project. All arrays were processed at the University of Memphis by Thomas Sutter and colleagues with support of the INIA Bioanalytical Core.

    Information about this text file:

This text file was originally generated by RWW, YHQ, and EJC, March 2004. Updated by RWW, October 30, 2004.