M430 Microarray October03 / WebQTL

SJUT mRNA M430 (Oct03) MAS5

Accession number: GN9

Summary:

This October 2003 freeze provides estimates of mRNA expression in cerebellum of 26 adult BXD recombinant inbred strains, as well as C57BL/6J, DBA/2J, and their F1 hybrid, measured using the Affymetrix M430A and B microarrays. Data were generated by a small consortium of investigators at St. Jude Children's Research Hospital (SJ) and the University of Tennessee Health Science Center (UT). Data were processed using the Microarray Suite 5 (MAS 5) protocol of Affymetrix. To simplify comparison between transforms, MAS 5 values of each array were adjusted to an average of 8 units and a standard deviation of 2 units. This data set was run in two large batches with careful attention to balancing samples by sex and age. Eighteen strains have been profiled using two or three independent samples. All other strains were sampled once.

About the cases used to generate this set of data:

We have exploited a set of BXD recombinant inbred strains. The parental strains from which all BXD lines are derived are C57BL/6J (B6 or B) and DBA/2J (D2 or D). Both B and D strains have been almost fully sequenced (8x coverage for B6 by a public consortium and approximately 1.5x coverage for D by Celera Discovery Systems). Chromosomes of the two parental strains have been recombined and fixed randomly in the many different BXD strains. BXD2 through BXD32 were produced by Benjamin A. Taylor starting in the late 1970s. BXD33 through BXD42 were also produced by Taylor, but they were generated in the 1990s. These strains are all available from The Jackson Laboratory, Bar Harbor, Maine.

In this mRNA expression data set we generally used progeny of stock obtained from The Jackson Laboratory between 1999 and 2001. Animals were generated in-house at the University of Alabama by John Mountz and Hui-Chen Hsu and at the University of Tennessee Health Science Center by Lu Lu and Robert Williams.

About the tissue used to generate these data:

The October 2003 data set was processed in two large batches. The first batch (the May 2003 data set) consists of 20 pooled samples from 20 strains run on pairs of Affymetrix 430A and 430B arrays (40 arrays total). The second batch consists of 29 samples, including many biological replicates, 2 technical replicates, and data for 9 new strains. Each individual array experiment involved a pool of whole cerebellum taken from three adult animals of the same age and sex. The age range may look broad, but translated into human terms it corresponds to a range of about 20 to 50 years.

RNA was extracted at UTHSC by Zhiping Jia and Hongtao Zhai.

All samples were subsequently processed at the Hartwell Center Affymetrix laboratory at SJCRH by Jay Morris.

The table below summarizes information on strain, sex, age, sample name, and batch number.

Strain     Sex   Age (days)   SampleID    Batch
B6D2F1     M     127          766-C1      2
B6D2F1     M     94           S347-1C1    1
C57BL/6J   F     116          773-C1      2
C57BL/6J   M     109          S054-1C2    1
DBA/2J     F     71           S175-1C1    1
DBA/2J     F     91           782-C1      2
BXD1       F     57           813-C1      2
BXD2       F     142          751-C1      1
BXD2       F     78           774-C1      2
BXD5       F     56           802-C1      2
BXD5       M     71           752-C1      1
BXD6       F     92           719-C1      1
BXD8       F     72           S173-1C1    1
BXD9       M     86           737-C1      1
BXD11      F     441          S200-1C1    1
BXD11      M     92           790-C1      2
BXD12      F     130          776-C1      2
BXD12      M     64           756-C1      2
BXD14      F     190          794-C1      2
BXD14      M     91           758-C1      2
BXD16      F     163          750-C1      1
BXD19      F     61           772-C1      2
BXD21      F     116          711-C1      1
BXD21      M     64           803-C1      2
BXD22      F     65           S174-1C1    1
BXD23      F     88           814-C1      2
BXD24      F     71           805-C1      2
BXD24      M     71           759-C1      2
BXD25      M     90           S429-1C1    1
BXD28      F     113          785-C1      2
BXD28      F     427          S203-1C1    1
BXD29      F     82           777-C1      2
BXD29      M     76           714-C1      2
BXD29      M     76           714-C1      1
BXD31      F     142          816-C1      2
BXD32      F     62           778-C1      2
BXD32      M     218          786-C1      2
BXD33      F     184          793-C1      2
BXD33      M     124          715-C1      1
BXD34      F     56           725-C1      1
BXD34      M     91           789-C1      2
BXD38      F     55           781-C1      2
BXD38      M     65           761-C1      2
BXD39      M     165          723-C1      1
BXD40      F     56           718-C1      1
BXD40      F     56           718-C1      2
BXD40      M     73           812-C1      2
BXD42      F     100          799-C1      2
BXD42      M     97           709-C1      1

About the array platform:

Affymetrix Mouse Expression 430 GeneChip set: The expression data were generated using 430A and 430B arrays. Chromosomal positions of probe sets were determined by BLAT analysis of concatenated probe sequences using the Mouse Genome Sequencing Consortium May 2004 (mm5) assembly. This BLAT analysis is performed periodically by Yanhua Qu as each new build of the mouse genome is released (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis. It is possible to confirm the BLAT alignment results yourself simply by clicking on the Verify link in the Trait Data and Editing Form (right side of the Location line).

About data processing:

Probe (cell) level data from the CEL file: These CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell.

Step 1: We added a constant offset of 1 to the CEL expression values for each cell to ensure that all values could be logged without generating negative values.
Step 2: We took the log2 of each cell signal level.
Step 3: We computed the Z scores for each cell within its array.
Step 4: We multiplied all Z scores by 2.
Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8 units, a variance of 4 units, and a standard deviation of 2 units.
Step 6a: The 430A and 430B GeneChips include a set of 100 shared probe sets that have identical sequences. These 100 probe sets and 2200 probes provide a good way to adjust expression of the two GeneChips to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array (the A array contains the more commonly expressed transcripts). To bring the two arrays into numerical alignment, we regressed Z scores of the common set of 2200 probes to obtain linear regression corrections to rescale the 430B arrays to values that match the 430A array. This involved multiplying all 430B Z scores by the slope of the regression and adding a very small offset (the regression intercept). The result of this adjustment is that the mean of the 430A array expression is fixed at a value of 8, whereas that of the 430B chip is typically 7.
Step 6b: We recentered the entire combined set of 430A and 430B transcripts to a mean of 8 and a standard deviation of 2. This involves reapplying Steps 3 through 5 above but now using the entire set of probes and probe sets from a merged 430A and B data set.
Step 7: No batch correction was applied.
Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. In this data set we have only a very modest number of replicates, and for this reason we do not yet provide error terms for transcripts or probes. Note that this data set does not provide any correction for variance introduced by differences in sex, age, tissue source, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We expect to add statistical controls and adjustments for these variables in subsequent versions of WebQTL.
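
As a rough illustration, Steps 1 through 6b and Step 8 above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used to build this data set: all function names are hypothetical, the population standard deviation is assumed for the Z scores, and the Step 6a regression is assumed to treat the 430A values as the response and the 430B values as the predictor.

```python
import math
from statistics import fmean, pstdev

def standardize(values, mean=8.0, sd=2.0):
    """Steps 3-5: Z score the values, multiply by 2, then add 8.
    Assumption: population SD (pstdev); the text does not say which SD was used."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma * sd + mean for v in values]

def transform_array(cel_values):
    """Steps 1-5 for one array: add 1, take log2, then standardize."""
    logged = [math.log2(v + 1.0) for v in cel_values]
    return standardize(logged)

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def rescale_b_to_a(a_shared, b_shared, b_all):
    """Step 6a: fit 430A ~ 430B on the shared probes (assumed direction),
    then apply slope and intercept to every 430B value."""
    slope, intercept = fit_line(b_shared, a_shared)
    return [slope * v + intercept for v in b_all]

def strain_means(arrays, strains):
    """Step 8: per-probe arithmetic mean over the arrays of each strain.
    arrays: list of equal-length value lists; strains: parallel list of names."""
    grouped = {}
    for values, strain in zip(arrays, strains):
        grouped.setdefault(strain, []).append(values)
    return {s: [fmean(col) for col in zip(*rows)] for s, rows in grouped.items()}
```

For one 430A/430B pair, one would call transform_array on each array, pass the shared probes to rescale_b_to_a, and then apply standardize to the concatenated 430A and rescaled 430B values (Step 6b).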
Probe set data from the CHP file: These CHP files were generated using MAS 5. The same simple steps described above were also applied to these values. A 1-unit difference represents roughly a 2-fold difference in expression level. Expression levels below 5 are close to the noise level.
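
The rule of thumb that 1 unit is roughly a 2-fold difference can be related to the transform as follows. A transformed value is 2 times the Z score of the log2 signal, plus 8, so a difference of 1 unit equals half of one within-array standard deviation on the log2 scale. The sketch below makes that arithmetic explicit. The function name is hypothetical, and the default log2_sd of 2.0 is an assumption chosen only because it reproduces the 2-fold rule; the actual standard deviation differs from array to array, and the recentering in Step 6b shifts the mapping slightly.

```python
def fold_change(delta_units, log2_sd=2.0, scale=2.0):
    """Approximate fold difference implied by a difference in transformed units.

    delta_units: difference between two transformed expression values.
    log2_sd:     assumed within-array SD of the log2 signal (not from the data set).
    scale:       multiplier applied to the Z scores (2 in Step 4).
    """
    delta_log2 = delta_units * log2_sd / scale  # back to log2 units
    return 2.0 ** delta_log2

# With the assumed SD of 2 log2 units, 1 unit corresponds to a 2-fold difference.
one_unit = fold_change(1.0)
```

If the within-array standard deviation were smaller, for example 1 log2 unit, the same 1-unit difference would correspond to a fold difference of only about 1.4.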

About the chromosome and megabase position values:

The chromosomal locations of probe sets and gene markers on the 430A and 430B microarrays were determined by BLAT analysis using the Mouse Genome Sequencing Consortium Oct 2003 Assembly (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.

Data source acknowledgment:

Data were generated with funds contributed equally by The UTHSC-SJCRH Cerebellum Transcriptome Profiling Consortium. Our members include:

Tom Curran
Dan Goldowitz
Kristin Hamre
Lu Lu
Peter McKinnon
James Morgan
Clayton Naeve
Richard Smeyne
Robert W. Williams
The Center of Genomics and Bioinformatics at UTHSC
The Hartwell Center at SJCRH

Information about this text file:

This text file was originally generated by RWW and YHQ, September 2003. Updated by RWW, October 30, 2004.