M430 Microarray January 04 / WebQTL

SJUT M430 Cerebellum Database (January/04 Freeze)

About the mice used to map microarray data:

The set of mouse strains used for mapping (a mapping panel) consists of groups of genetically unique BXD recombinant inbred strains. The ancestral strains from which all BXD lines are derived are C57BL/6J (B6 or B) and DBA/2J (D2 or D). Both B and D strains have been almost fully sequenced (8x coverage for B6, and 1.5x coverage for D by Celera Genomics). Chromosomes of the two parental strains are recombined randomly in the many different BXD strains. BXD lines 2 through 32 were produced by Dr. Benjamin Taylor starting in the late 1970s. BXD33 through 42 were also produced by Dr. Taylor, but they were generated in the 1990s. All of these strains are available from The Jackson Laboratory. Lines such as BXDA12, BXDA20, etc. are BXD Advanced recombinant inbred strains that are part of a large set now being produced by Drs. Lu Lu, Guomin Zhou, Lee Silver, Jeremy Peirce, and Robert Williams. There will eventually be ~45 of these BXDA strains. For additional background on recombinant inbred strains, please see http://www.nervenet.org/papers/bxn.html.

About the tissue used to generate these data:

The January04 data are the same as the October03 data. The samples were processed in two large batches, and we corrected for the difference between the two batches based on the October03 data set. The first batch (the May03 data set) consisted of 20 samples from 20 strains run on Affymetrix MOE 430A and MOE430B GeneChip pairs (40 arrays total). The second batch of 29 samples included many biological replicates, 2 technical replicates, and data for 9 new strains. Each individual array experiment involved a pool of brain tissue (intact whole cerebellum) taken from three adult animals, usually of the same age. RNA was extracted at UTHSC and all samples were processed at the Hartwell Center (SJCRH, Memphis). We expect eventually to achieve a good, but not perfect, balance of samples by sex and age. The age range may look broad, but translated into human terms it corresponds to a range of about 20 to 50 years.

Strain Sex Age Sample_name

B6D2F1 M 127 766-C1

B6D2F1 M 94 S347-1C1

C57BL/6J F 116 773-C1

C57BL/6J M 109 S054-1C2

DBA/2J F 71 S175-1C1

DBA/2J F 91 782-C1

BXD1 F 57 813-C1

BXD2 F 142 751-C1

BXD2 F 78 774-C1

BXD5 F 56 802-C1

BXD5 M 71 752-C1

BXD6 F 92 719-C1

BXD8 F 72 S173-1C1

BXD9 M 86 737-C1

BXD11 F 441 S200-1C1

BXD11 M 92 790-C1

BXD12 F 130 776-C1

BXD12 M 64 756-C1

BXD14 F 190 794-C1

BXD14 M 91 758-C1

BXD16 F 163 750-C1

BXD19 F 61 772-C1

BXD21 F 116 711-C1

BXD21 M 64 803-C1

BXD22 F 65 S174-1C1

BXD23 F 88 814-C1

BXD24 F 71 805-C1

BXD24 M 71 759-C1

BXD25 M 90 S429-1C1

BXD28 F 113 785-C1

BXD28 F 427 S203-1C1

BXD29 F 82 777-C1

BXD29 M 76 714-C1

BXD29 M 76 714-C1

BXD31 F 142 816-C1

BXD32 F 62 778-C1

BXD32 M 218 786-C1

BXD33 F 184 793-C1

BXD33 M 124 715-C1

BXD34 F 56 725-C1

BXD34 M 91 789-C1

BXD38 F 55 781-C1

BXD38 M 65 761-C1

BXD39 M 165 723-C1

BXD40 F 56 718-C1

BXD40 F 56 718-C1

BXD40 M 73 812-C1

BXD42 F 100 799-C1

BXD42 M 97 709-C1
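
To check the balance of samples by sex and strain, the table above can be tallied with a few lines of Python. This is only a sketch: the function name parse_rows is hypothetical, only the first six rows of the table are pasted in to keep the example short, and the Age column is read as a plain integer in whatever unit the table uses.

```python
from collections import Counter

# First six rows of the sample table, copied as-is (paste the full table to tally everything).
SAMPLE_ROWS = """\
Strain Sex Age Sample_name
B6D2F1 M 127 766-C1
B6D2F1 M 94 S347-1C1
C57BL/6J F 116 773-C1
C57BL/6J M 109 S054-1C2
DBA/2J F 71 S175-1C1
DBA/2J F 91 782-C1
"""

def parse_rows(text):
    """Parse whitespace-delimited rows; skip blank lines and the header."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # blank line, header, or malformed row
        strain, sex, age, sample = fields
        rows.append({"strain": strain, "sex": sex, "age": int(age), "sample": sample})
    return rows

rows = parse_rows(SAMPLE_ROWS)
by_sex = Counter(r["sex"] for r in rows)
by_strain = Counter(r["strain"] for r in rows)
ages = [r["age"] for r in rows]
print(by_sex, by_strain, min(ages), max(ages))
```

Rows that share a sample name (the technical replicates) are counted separately here; collapse them on the "sample" field first if one count per tissue pool is wanted.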

About data processing:

Probe (cell) level data from the .CEL file: These .CEL values produced by MAS 5.0 are approximately the 75% quantiles from a set of 22 pixel values per cell (the 6th-ranked pixel).

Step 1: We added an offset of 1.0 to the .CEL expression values for each cell to ensure that all values could be logged without generating negative values.
Step 2: We took the log base 2 of each cell.
Step 3: We computed the Z scores for each cell.
Step 4: We multiplied all Z scores by 2.
Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
Step 6a: The 430A and 430B GeneChips include a set of 100 shared probe sets (2200 probes) that have identical sequences. These probes and probe sets provide a way to calibrate expression of the two GeneChips to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array. To bring the two arrays into alignment, we regressed Z scores of the common set of probes to obtain a linear regression correction that rescales the 430B arrays to the 430A array. In our case this involved multiplying all 430B Z scores by the slope of the regression and adding or subtracting a very small offset. The result of this step is that the mean of the 430A GeneChip expression is fixed at a value of 8, whereas that of the 430B chip is typically 7. Thus the average of the A and B arrays is approximately 7.5.
Step 6b: We recenter the whole set of 430A and B transcripts to a mean of 8 and a standard deviation of 2. This involves reapplying Steps 3 through 5 above but now using the entire set of probes and probe sets from a merged 430A and B data set.
Step 7: Finally, we compute the arithmetic mean of the values for the set of microarrays for each strain. In this October03 data set we have relatively modest numbers of replicates, and for this reason we do not yet provide error terms for transcripts or probes. Note that we have not (yet) corrected for variance introduced by differences in sex, age, array batch, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the .CEL file. We expect to add statistical controls and adjustments for these variables in subsequent versions of WebQTL.
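
The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used to build the database: the function names are hypothetical, Z scores are assumed to be computed within each array using the population standard deviation, the Step 6a regression is assumed to be an ordinary least-squares fit of 430A scores on 430B scores for the shared probes, and the demonstration data are synthetic.

```python
import numpy as np

def modified_z(values, scale=2.0, center=8.0):
    """Steps 3-5: Z score within one array, multiply by 2, add 8."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()  # population SD (an assumption)
    return z * scale + center

def transform_cel(cel_values, offset=1.0):
    """Steps 1-5 for the cell values of one array."""
    logged = np.log2(np.asarray(cel_values, dtype=float) + offset)  # Steps 1-2
    return modified_z(logged)

def rescale_b_to_a(a_shared, b_shared, b_all):
    """Step 6a: fit a = slope * b + intercept on shared probes, apply to all 430B scores."""
    slope, intercept = np.polyfit(b_shared, a_shared, deg=1)
    return slope * np.asarray(b_all, dtype=float) + intercept

def strain_means(array_scores, strains):
    """Step 7: arithmetic mean over the arrays that belong to each strain."""
    array_scores = np.asarray(array_scores, dtype=float)  # shape (n_arrays, n_probes)
    strains = np.asarray(strains)
    return {str(s): array_scores[strains == s].mean(axis=0) for s in np.unique(strains)}

# Synthetic demonstration data (not real intensities).
rng = np.random.default_rng(0)
a_cel = rng.lognormal(mean=6.0, sigma=1.5, size=5000)
b_cel = rng.lognormal(mean=5.0, sigma=1.5, size=5000)
b_cel[:100] = 0.5 * a_cel[:100]  # pretend the first 100 cells are the shared probes

a_z = transform_cel(a_cel)
b_z = transform_cel(b_cel)
b_on_a = rescale_b_to_a(a_z[:100], b_z[:100], b_z)   # Step 6a
merged = modified_z(np.concatenate([a_z, b_on_a]))   # Step 6b
print(round(merged.mean(), 6), round(merged.std(), 6))
```

Reusing modified_z for Step 6b mirrors the description of that step as a reapplication of Steps 3 through 5 to the merged 430A and 430B values.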
Probe set data from the .TXT file: These .TXT files were generated using MAS 5.0. The same simple steps described above were also applied to these values. A 1-unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
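
The "roughly two-fold per unit" rule can be made explicit. If a modified score is 8 + 2 * (x - mean) / SD, where x is a log2 value, then a difference of d units corresponds to a log2 difference of d * SD / 2. One unit is therefore exactly two-fold only when the SD of the log2 values is 2; that this SD is close to 2 is our reading of the word "approximately", not something stated above. The helper below is a hypothetical sketch of that arithmetic.

```python
def approx_fold_change(delta_units, log2_sd=2.0):
    """Convert a difference in modified Z units into an approximate fold change.

    log2_sd is the standard deviation of the log2 values before Z scoring.
    The default of 2.0 is an assumption under which 1 unit is exactly two-fold.
    """
    log2_difference = delta_units * log2_sd / 2.0  # undo the "multiply Z by 2" step
    return 2.0 ** log2_difference

print(approx_fold_change(1.0))               # 2.0 when log2_sd is 2
print(approx_fold_change(3.0))               # 8.0 when log2_sd is 2
print(approx_fold_change(1.0, log2_sd=1.5))  # less than two-fold for a narrower array
```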

About the chromosome and megabase position values:

The chromosomal locations of probe sets and gene markers on the 430A and 430B microarrays were determined by BLAT analysis using the Mouse Genome Sequencing Consortium Feb 2002 Assembly (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Dr. Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.

Resolving Gene Identity and Position Problems:

Probe sets that are intended to target transcripts from a single gene occasionally map to different chromosomes. For example, an M430 probe set that supposedly targets the thyroid hormone alpha receptor (Thra, 1416958_at on M430A) maps to Chr 14 at 13.556 Mb. Since Thra maps to Chr 11 rather than Chr 14, it is likely that one or all of the Thra probe sets are mismapped or mislabeled as Thra. To determine which problem is more likely, we suggest that you re-BLAT the perfect match probe sequences. This is quite simple: paste all of the perfect match probes (odd numbered probes) into a single BLAT query. WebQTL will do this automatically for you from the bottom of any Probe Sequence Table. To do this:

Go to the Trait Data and Editing Form.
Select the Link: Probe sequences.
Scroll to the bottom of this page.
Click on the "BLAT PM Probes" button.
Click on the "browser" action link for the top row of the BLAT Search Results page.
Click on the "zoom out" 3x or 10x button.
Review the relation of "Your sequence from BLAT Search" with the "Known Genes" or any of the other Genome Browser tracks.

(NOTE: BLAT is insensitive to sequence overlap and extra spaces. The query sequence is a concatenation of all PM probes without any concern for probe overlap. The Perfect Match sequences are available on WebQTL by selecting the Probe link on the Trait Data and Editing window.)

This will return a BLAT Search Results page.

The result confirms that the probe set maps to Chr 14 (a score of 211 is good). However, if you click on the browser link in the BLAT Search Results window, you will see that the gene that these probes target is actually BC008556 (a nuclear receptor subfamily 1, group D, member 2 gene), not Thra.
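
For readers who prefer to assemble the BLAT query by hand rather than use the "BLAT PM Probes" button, the concatenation can be sketched as follows. The function names are hypothetical, the sequences are made-up placeholders rather than real Thra probes, and "odd numbered probes" is taken to mean the 1st, 3rd, 5th, ... entries of a list that alternates perfect match and mismatch probes.

```python
def pm_probes(interleaved_probes):
    """Return the odd-numbered entries (1st, 3rd, 5th, ...), assumed to be the PM probes."""
    return list(interleaved_probes[0::2])

def build_blat_query(pm_sequences):
    """Concatenate PM probe sequences into one query string.

    Whitespace is removed and case is normalized only for tidiness;
    BLAT tolerates extra spaces and overlapping probes.
    """
    cleaned = ["".join(seq.split()).upper() for seq in pm_sequences]
    return "".join(cleaned)

# Placeholder 25-mers in PM, MM, PM, MM order (hypothetical, for illustration only).
probe_table = [
    "ACGTACGTACGTACGTACGTACGTA",
    "ACGTACGTACGTTCGTACGTACGTA",
    "ttgc aatt ggcc aatt ggcc aattg",
    "ttgc aatt ggcc tatt ggcc aattg",
]

query = build_blat_query(pm_probes(probe_table))
print(len(query))
print(query)
```

The resulting string can be pasted into the BLAT form at the genome.ucsc.edu address given above.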

Data source acknowledgment:

Data were generated with funds contributed equally by The UTHSC-SJCRH Cerebellum Transcriptome Profiling Consortium. Our members include:

Tom Curran
Dan Goldowitz
Kristin Hamre
Lu Lu
Peter McKinnon
Jim Morgan
Clayton Naeve
Richard Smeyne
Robert Williams
The Center of Genomics and Bioinformatics at UTHSC
The Hartwell Center at SJCRH

Reference: None yet specifically for this project and data set

Wang J, Williams RW, Manly KF (2003) WebQTL: Web-based complex trait analysis. Neuroinformatics 1: 299-308.