M430 Microarray brain February04 / WebQTL

INIA M430 brain RMA Database (February/04 Freeze)

Accession number: GN43

About the mice used to map microarray data:

We have exploited a set of BXD recombinant inbred strains. The parental strains from which all BXD lines are derived are C57BL/6J (B6 or B) and DBA/2J (D2 or D). Both B and D strains have been almost fully sequenced (8x coverage for B6 by a public consortium and approximately 1.5x coverage for D by Celera). Chromosomes of the two parental strains have been recombined and fixed randomly in the many different BXD strains. BXD lines 2 through 32 were produced by Dr. Benjamin Taylor starting in the late 1970s. BXD33 through 42 were also produced by Dr. Taylor, but they were generated in the 1990s. All of these strains are available from The Jackson Laboratory. Lines such as BXD43, BXD67, BXD68, etc. are BXD recombinant inbred strains that are part of a large set produced by Drs. Lu Lu and Jeremy Peirce. There are approximately 45 of these new BXD strains. For additional background on recombinant inbred strains please see Peirce et al. 2004.

About the tissue used to generate these data:

The INIA M430 brain Database (February04) consists of 30 Affymetrix MOE 430A and MOE 430B GeneChip microarray pairs. Each AB pair of arrays was hybridized in sequence (A array first, B array second) with a pool of brain tissue (forebrain minus olfactory bulb, plus the entire midbrain) taken from three adult animals of closely matched age and the same sex. RNA was extracted at UTHSC by Lu Lu, Zhiping Jia, and Hongtao Zhai. All samples were subsequently processed in the INIA Bioanalytical Core at the W. Harry Feinstone Center of Excellence by Thomas R. Sutter and colleagues at the University of Memphis. Before running the main batch of 30 array pairs, we ran four "test" samples (one male and one female pool from each of the two parental strains, C57BL/6J and DBA/2J). The main batch of 30 array pairs includes the same four samples (in other words, we have four technical replicates between the test and the main batches), two F1 hybrid samples (each run two times for within-batch technical replication), and 22 BXD strains. The February04 data set therefore consists of one male and one female pool from each of C57BL/6J, DBA/2J, and the B6D2F1 hybrid, plus 11 female BXD samples and 11 male BXD samples. We should note that the four technical replicates between batches were eventually combined with a correction for a highly significant batch effect. This was done at both the probe and probe set levels to "align" the test batch values with the two main batches. (The ratio of the probe average in the four test arrays to the average of the same probe in the four corresponding main batch arrays was used as a correction factor.) The F1 within-batch technical replicates were simply averaged. In the next batch we will reverse the sex of the BXD samples to achieve a balance, with at least 22 BXD strains having one male and one female sample each.

Strain     Sex  Age  Sample_name  Result date
B6D2F1     F    127  919-F1       Jan04
B6D2F1     F    127  919-F2       Jan04
B6D2F1     M    127  920-F1       Jan04
B6D2F1     M    127  920-F2       Jan04
C57BL/6J   F    65   903-F1       Nov03
C57BL/6J   F    65   903-F2       Jan03
C57BL/6J   M    66   906-F1       Nov03
C57BL/6J   M    66   906-F2       Jan04
DBA/2J     F    60   917-F1       Nov03
DBA/2J     F    60   917-F2       Jan04
DBA/2J     M    60   918-F1       Nov03
DBA/2J     M    60   918-F2       Jan04
BXD1       F    95   895-F1       Jan04
BXD5       M    71   728-F1       Jan04
BXD6       M    92   902-F1       Jan04
BXD8       F    72   S167-F1      Jan04
BXD9       M    86   909-F1       Jan04
BXD12      M    64   897-F1       Jan04
BXD13      F    86   748-F1       Jan04
BXD14      M    91   912-F1       Jan04
BXD18      F    108  771-F1       Jan04
BXD19      F    56   S236-F1      Jan04
BXD21      F    67   740-F1       Jan04
BXD23      F    88   815-F1       Jan04
BXD24      M    71   913-F1       Jan04
BXD25      F    74   S373-F1      Jan04
BXD28      F    79   910-F1       Jan04
BXD29      F    76   693-F1       Jan04
BXD32      F    93   898-F1       Jan04
BXD33      M    77   915-F1       Jan04
BXD34      M    72   916-F1       Jan04
BXD36      M    77   926-F1       Jan04
BXD38      M    69   731-F1       Jan04
BXD42      M    97   936-F1       Jan04
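
The between-batch correction described above (the ratio of each probe's average in the four test arrays to its average in the four corresponding main batch arrays) can be illustrated with a minimal Python sketch. The function names are hypothetical, and dividing the test values by the ratio is an assumption about how the factor was applied; the text does not state which scale (raw or transformed) the ratio was computed on.

```python
from statistics import mean

def batch_correction_factors(test_arrays, main_arrays):
    """Per-probe ratio: mean over test arrays / mean over main arrays.

    Each array is a list of probe values in the same probe order.
    """
    n_probes = len(test_arrays[0])
    factors = []
    for p in range(n_probes):
        test_mean = mean(arr[p] for arr in test_arrays)
        main_mean = mean(arr[p] for arr in main_arrays)
        factors.append(test_mean / main_mean)
    return factors

def align_test_to_main(test_array, factors):
    """Assumption: test values are divided by the ratio to match the main batch."""
    return [value / factor for value, factor in zip(test_array, factors)]

# Toy example with two test arrays, two main arrays, and two probes.
test_arrays = [[10.0, 6.0], [12.0, 6.0]]
main_arrays = [[5.0, 3.0], [6.0, 3.0]]
factors = batch_correction_factors(test_arrays, main_arrays)
aligned = align_test_to_main(test_arrays[0], factors)
```

In this toy example every test probe averages twice its main batch counterpart, so each factor is 2.0 and the aligned test values fall on the main batch scale.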

About data processing:

Probe (cell) level data from the .CEL file: These .CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell.

Step 1: We added an offset of 1.0 to the .CEL expression values for each cell to ensure that all values could be logged without generating negative values.
Step 2: We took the log base 2 of each cell.
Step 3: We computed the Z scores for each cell.
Step 4: We multiplied all Z scores by 2.
Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
Step 6a: The 430A and 430B GeneChips include a set of 100 shared probe sets (2200 probes) that have identical sequences. These probes and probe sets provide a way to calibrate expression of the two GeneChips to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array. To bring the two arrays into alignment, we regressed Z scores of the common set of probes to obtain a linear regression correction that rescales the 430B arrays to the 430A array. In our case this involved multiplying all 430B Z scores by the slope of the regression and adding or subtracting a very small offset. The result of this step is that the mean of the 430A GeneChip expression is fixed at a value of 8, whereas that of the 430B chip is typically 7. Thus the average of the A and B arrays is approximately 7.5.
Step 6b: We recenter the whole set of 430A and B transcripts to a mean of 8 and a standard deviation of 2. This involves reapplying Steps 3 through 5 above but now using the entire set of probes and probe sets from a merged 430A and B data set.
Step 7: We corrected for technical variance introduced by the two batches. For each gene, the difference in means between the two batches was corrected using the data from the two strains common to both batches.
Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. In this data set we have modest numbers of replicates, and for this reason we do not yet provide error terms for transcripts or probes. Note that we have not (yet) corrected for variance introduced by differences in sex, age, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the .CEL file. We expect to add statistical controls and adjustments for these variables in subsequent versions of WebQTL.
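
Steps 1 through 5 and the regression in Step 6a can be sketched in Python as follows. This is an illustration only, not the code used to build the database: the function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator was used), and the regression is assumed to be an ordinary least-squares fit of the 430A values on the 430B values for the shared probes.

```python
import math
from statistics import mean, pstdev

def modified_z_scores(cel_values, offset=1.0):
    """Steps 1-5: offset, log2, Z score, multiply by 2, add 8."""
    logged = [math.log2(v + offset) for v in cel_values]   # Steps 1-2
    mu = mean(logged)
    sigma = pstdev(logged)  # assumption: population SD
    return [2.0 * (x - mu) / sigma + 8.0 for x in logged]  # Steps 3-5

def fit_b_to_a(shared_a, shared_b):
    """Step 6a: least-squares slope and intercept mapping 430B onto 430A."""
    mean_a, mean_b = mean(shared_a), mean(shared_b)
    sxy = sum((b - mean_b) * (a - mean_a) for a, b in zip(shared_a, shared_b))
    sxx = sum((b - mean_b) ** 2 for b in shared_b)
    slope = sxy / sxx
    intercept = mean_a - slope * mean_b
    return slope, intercept

def rescale_b(b_values, slope, intercept):
    """Apply the correction to all 430B Z scores."""
    return [slope * z + intercept for z in b_values]

# Toy values: after the transform the scores have mean 8 and SD 2.
z = modified_z_scores([1.0, 3.0, 7.0, 15.0])
slope, intercept = fit_b_to_a([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

Step 6b would then amount to calling `modified_z_scores`-style recentering (Steps 3 through 5 only, without the offset and log) on the merged 430A and rescaled 430B values.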

Probe set data: These expression data were generated by the RMA method. The raw expression values in .CEL files were read into the R environment (Ihaka and Gentleman, 1996). These were normalized using the RMA method of background correction and normalization (Irizarry et al., 2003). The same simple steps described above were also applied to these values. A 1-unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
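
As a rough reading aid for the rescaled values, the rule of thumb above (1 unit is roughly a two-fold difference; levels below 5 are usually near background) can be written as a small Python sketch. The function names and the hard threshold of 5.0 are illustrative assumptions, and the fold change is only approximate.

```python
def approx_fold_change(level_a, level_b):
    """Approximate fold difference implied by two rescaled expression levels."""
    return 2.0 ** (level_a - level_b)

def near_background(level, threshold=5.0):
    """True if a level falls below the approximate background noise threshold."""
    return level < threshold

fold_up = approx_fold_change(9.0, 8.0)     # one unit higher
fold_down = approx_fold_change(8.0, 10.0)  # two units lower
```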

About the chromosome and megabase position values:

The chromosomal locations of probe sets and gene markers on the 430A and 430B microarrays were determined by BLAT analysis using the Mouse Genome Sequencing Consortium Oct 2003 (mm4) Assembly (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Dr. Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.

Data source acknowledgment:

Data were generated with funds from multiple sources, including NIAAA INIA support to RWW and Thomas Sutter, an NIMH Human Brain Project, and the Dunavant Chair of Excellence, University of Tennessee Health Science Center. All arrays were processed at the University of Memphis by Dr. Thomas Sutter and colleagues with support of the INIA Bioanalytical Core.