M430 Microarray May03 / WebQTL

SJUT M430 Cerebellum Database (May/03)

About the mice used to map microarray data:

The set of mouse strains used for mapping (a mapping panel) consists of groups of genetically unique BXD recombinant inbred strains. The ancestral strains from which all BXD lines are derived are C57BL/6J (B6 or B) and DBA/2J (D2 or D). Both the B and D strains have been almost fully sequenced (8x coverage for B6, and 1.5x coverage for D by Celera Genomics). Chromosomes of the two parental strains are recombined randomly in the many different BXD strains. BXD lines 2 through 32 were produced by Dr. Benjamin Taylor starting in the late 1970s. BXD33 through 42 were also produced by Dr. Taylor, but they were generated in the 1990s. All of these strains are available from The Jackson Laboratory. Lines such as BXDA12, BXDA20, etc. are BXD Advanced recombinant inbred strains that are part of a large set now being produced by Drs. Lu Lu, Guomin Zhou, Lee Silver, Jeremy Peirce, and Robert Williams. There will eventually be ~45 of these BXDA strains. For additional background on recombinant inbred strains, please see http://www.nervenet.org/papers/bxn.html.

About the tissue used to generate these data:

The May03 data were run as a single batch. Each individual array experiment involved a pool of brain tissue (intact whole cerebellum) taken from three adult animals, usually of the same age. RNA was extracted at UTHSC. Twenty samples passed quality control at SJCRH and were run on Affymetrix MOE430A and MOE430B GeneChip pairs (40 arrays total).

The May03 data set includes a single Affymetrix GeneChip pair (abbreviated 430AB) processed with labeled messenger RNA taken from each of 20 strains. Please note that the variation of sex and age is intentional, and that this data set is only the first of many batches that will be required to obtain a fully balanced design by sex and age. However, we note that there is still only quite modest evidence of sex differences in cerebellar transcriptional profiles (beyond obvious transcripts such as Xist and Dby). The age range may look very broad, but translated into human terms it corresponds to a range from about 20 years to 50 years.

SampleID  Strain    Case_ID_UT  Age (days)  Sex
S347-1    B6D2F1/J  021202.01    94         M
S054-1    C57BL/6J  101201.01   109         M
S175-1    DBA/2J    011402.07    71         F
751-C     BXD2      022003.02   142         F
752-C     BXD5      031103.01    71         M
719-C     BXD6      010803.01    92         F
S173-1    BXD8      011402.01    72         F
737-C     BXD9      031903.04    86         M
S200-1    BXD11     011602.04   441         F
750-C     BXD16     021402.04   163         F
711-C     BXD21     121102.01   116         F
S174-1    BXD22     011402.04    65         F
S429-1    BXD25     030702.01    90         M
S203-1    BXD28     011602.13   427         F
714-C     BXD29     020503.04    76         M
715-C     BXD33     121002.01   124         M
725-C     BXD34     111902.07    56         F
723-C     BXD39     120902.01   165         M
718-C     BXD40     111902.04    56         F
709-C     BXD42     011303.01    97         M

About data processing:

Probe (cell) level data from the .CEL file: The .CEL values produced by MAS 5.0 are approximately 75% quantiles from a set of 22 pixel values per cell (the 6th-ranked pixel).

Step 1: We added an offset of 1.0 to the .CEL expression values for each cell to ensure that all values could be logged without generating negative values.
Step 2: We took the log base 2 of each cell.
Step 3: We computed the Z-score for each cell.
Step 4: We multiplied all Z-scores by 2.
Step 5: We added 8 to the value of all Z-scores. The consequence of this simple set of transformations is to produce a set of Z-scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z-score is that a two-fold difference in expression level corresponds approximately to a 1-unit difference.
Step 6: The 430A and 430B GeneChips include a set of 100 shared probe sets (2200 probes) that have identical sequences. These probes and probe sets provide a way to calibrate expression of the two GeneChips to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array. To bring the two arrays into alignment, we regressed Z-scores of the common set of probes to obtain a linear regression correction that rescales the 430B arrays to the 430A array. In our case this involved multiplying all 430B Z-scores by approximately 0.87 and subtracting a very small offset. The result of this step is that the mean of the 430A GeneChip expression is fixed at a value of 8, whereas that of the 430B chip is typically 7. Thus the average of the A and B arrays is approximately 7.5.
Step 7: This first batch of data intentionally includes no technical or biological replicates. Those are all included in the September03 data set and will also be included in all subsequent large batches. For this particular data set we therefore did not need to compute the arithmetic mean of the values for the set of microarrays for each strain. We have not (yet) corrected for variance introduced by sex, age, or a sex-by-age interaction. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the .CEL file.
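
Steps 1 through 5 can be sketched in a few lines of Python. This is only an illustration, not the code used to build the database: the function name modified_z_scores and the intensity values are made up, and we assume that the Z-score is computed across all cells of a single array using the population standard deviation.

```python
import math
import statistics


def modified_z_scores(cel_values):
    """Apply Steps 1-5 to the raw .CEL intensities of one array (hypothetical helper)."""
    # Step 1 and Step 2: add an offset of 1.0, then take log base 2.
    logged = [math.log2(v + 1.0) for v in cel_values]
    # Step 3: Z-score. Assumption: mean and SD are taken over all cells of this array.
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)
    # Step 4 and Step 5: multiply by 2, then add 8.
    return [2.0 * (x - mean) / sd + 8.0 for x in logged]


# Made-up intensities, for illustration only.
raw = [35.0, 120.0, 480.0, 90.0, 2300.0, 15.0, 760.0, 64.0]
scores = modified_z_scores(raw)

# By construction the transformed values have a mean of 8 and an SD of 2.
print(statistics.fmean(scores), statistics.pstdev(scores))
```

A two-fold change in raw intensity is 1 unit on the log2 scale, which becomes 2/SD units after Steps 3 and 4 (SD being the standard deviation of the log2 values). The 1-unit rule of thumb therefore holds to the extent that this standard deviation is close to 2.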
Probe set data from the .TXT file: These .TXT files were generated using MAS 5.0. The same simple steps described above were also applied to these values. A 1-unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
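
The calibration described in Step 6 can be sketched as an ordinary least-squares fit on the shared probes. This is a minimal illustration under stated assumptions: the helper names fit_line and rescale_430b are hypothetical, the 430A values are treated as the response and the 430B values as the predictor, and the numbers are synthetic values chosen to lie on a line with slope 0.87 and a small negative offset.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rescale_430b(all_b, shared_b, shared_a):
    """Rescale every 430B Z-score using the line fitted on the shared probes."""
    slope, intercept = fit_line(shared_b, shared_a)
    return [slope * z + intercept for z in all_b], slope, intercept


# Synthetic Z-scores for the shared probes as measured on each array.
shared_b = [6.0, 8.0, 10.0, 12.0, 14.0]
shared_a = [0.87 * b - 0.05 for b in shared_b]

# Synthetic 430B Z-scores (mean of 8 before calibration).
all_b = [5.0, 7.0, 8.0, 9.0, 11.0]
rescaled, slope, intercept = rescale_430b(all_b, shared_b, shared_a)

print(slope, intercept, statistics.fmean(rescaled))
```

With a slope near 0.87 and an offset near zero, a 430B array whose Z-scores average 8 ends up with a mean close to 7 after rescaling, which matches the typical value quoted in Step 6.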

About the chromosome and megabase position values:

The chromosomal locations of probe sets and gene markers on the 430A and 430B microarrays were determined by BLAT analysis using the Mouse Genome Sequencing Consortium Oct 2003 Assembly (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Dr. Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.

Data source acknowledgment:

Data were generated with funds contributed equally by The UTHSC-SJCRH Cerebellum Transcriptome Profiling Consortium. Our members include:

Tom Curran
Dan Goldowitz
Kristin Hamre
Lu Lu
Peter McKinnon
Jim Morgan
Clayton Naeve
Richard Smeyne
Robert Williams
The Center of Genomics and Bioinformatics at UTHSC
The Hartwell Center at SJCRH