SJUT M430 Cerebellum RMA Database (October/04 Freeze)

Accession number: GN44

    About the mice used to map microarray data:

The set of mouse strains used for mapping (a mapping panel) consists of groups of genetically unique BXD recombinant inbred strains. The ancestral strains from which all BXD lines are derived are C57BL/6J (B6 or B) and DBA/2J (D2 or D). Both the B and D strains have been almost fully sequenced (8x coverage for B6, and 1.5x coverage for D by Celera Genomics). Chromosomes of the two parental strains are recombined randomly in the many different BXD strains. BXD lines 2 through 32 were produced by Dr. Benjamin Taylor starting in the late 1970s. BXD33 through 42 were also produced by Dr. Taylor, but they were generated in the 1990s. All of these strains are available from The Jackson Laboratory. Lines such as BXD67, BXD68, etc. are BXD Advanced recombinant inbred strains that are part of a large set now being produced by Drs. Lu Lu, Guomin Zhou, Lee Silver, Jeremy Peirce, and Robert Williams. There will eventually be 45 of these BXD strains. For additional background on recombinant inbred strains, please see http://www.nervenet.org/papers/bxn.html.

    About the tissue used to generate these data:

The October04 data set was processed in two large batches. The first batch (the May 2003 data set) consists of 20 samples from 20 strains run on pairs of Affymetrix 430A and 430B arrays (40 arrays total). The second batch consists of 29 samples, including many biological replicates, 2 technical replicates, and data for 9 new strains. Each individual array experiment involved a pool of whole cerebellum taken from three adult animals of the same age and sex. The age range may look broad but, translated into human terms, it corresponds to a range from about 20 to 50 years.

RNA was extracted at UTHSC by Lu Lu, Zhiping Jia, and Hongtao Zhai.

All samples were subsequently processed at the Hartwell Center Affymetrix laboratory at SJCRH by Jay Morris.

The table below summarizes information on strain, sex, age, sample name, and batch number.
Strain     Sex   Age   SampleID   Batch
B6D2F1     M     127   766-C1     2
B6D2F1     M     94    S347-1C1   1
C57BL/6J   F     116   773-C1     2
C57BL/6J   M     109   S054-1C2   1
DBA/2J     F     71    S175-1C1   1
DBA/2J     F     91    782-C1     2
BXD1       F     57    813-C1     2
BXD2       F     142   751-C1     1
BXD2       F     78    774-C1     2
BXD5       F     56    802-C1     2
BXD5       M     71    752-C1     1
BXD6       F     92    719-C1     1
BXD8       F     72    S173-1C1   1
BXD9       M     86    737-C1     1
BXD11      F     441   S200-1C1   1
BXD11      M     92    790-C1     2
BXD12      F     130   776-C1     2
BXD12      M     64    756-C1     2
BXD14      F     190   794-C1     2
BXD14      M     91    758-C1     2
BXD16      F     163   750-C1     1
BXD19      F     61    772-C1     2
BXD21      F     116   711-C1     1
BXD21      M     64    803-C1     2
BXD22      F     65    S174-1C1   1
BXD23      F     88    814-C1     2
BXD24      F     71    805-C1     2
BXD24      M     71    759-C1     2
BXD25      M     90    S429-1C1   1
BXD28      F     113   785-C1     2
BXD28      F     427   S203-1C1   1
BXD29      F     82    777-C1     2
BXD29      M     76    714-C1     2
BXD29      M     76    714-C1     1
BXD31      F     142   816-C1     2
BXD32      F     62    778-C1     2
BXD32      M     218   786-C1     2
BXD33      F     184   793-C1     2
BXD33      M     124   715-C1     1
BXD34      F     56    725-C1     1
BXD34      M     91    789-C1     2
BXD38      F     55    781-C1     2
BXD38      M     65    761-C1     2
BXD39      M     165   723-C1     1
BXD40      F     56    718-C1     1
BXD40      F     56    718-C1     2
BXD40      M     73    812-C1     2
BXD42      F     100   799-C1     2
BXD42      M     97    709-C1     1

    About the array platform:

Affymetrix Mouse Expression 430 GeneChip set: The expression data were generated using 430A and 430B arrays. Chromosomal positions of probe sets were determined by BLAT analysis of concatenated probe sequences using the Mouse Genome Sequencing Consortium May 2004 (mm5) assembly. This BLAT analysis is performed periodically by Yanhua Qu as each new build of the mouse genome is released. We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis. It is possible to confirm the BLAT alignment results yourself simply by clicking on the Verify link in the Trait Data and Editing Form (right side of the Location line).

    About data processing:

Probe (cell) level data from the CEL file: These CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell.
  • Step 1: We added a constant offset of 1 to the CEL expression values for each cell to ensure that all values could be logged without generating negative values.
  • Step 2: We took the log2 of each cell signal level.
  • Step 3: We computed the Z scores for each cell within its array.
  • Step 4: We multiplied all Z scores by 2.
  • Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8 units, a variance of 4 units, and a standard deviation of 2 units.
  • Step 6a: The 430A and 430B arrays include a set of 100 shared probe sets that have identical probe sequences. These 100 probe sets and 2200 probes provide a good way to adjust expression of the two GeneChips to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array (the A array contains the more commonly expressed transcripts). To bring the two arrays into numerical alignment, we regressed Z scores of the common set of 2200 probes to obtain linear regression corrections to rescale the 430B arrays to values that match the 430A array. This involved multiplying all 430B Z scores by the slope of the regression and adding a very small offset (the regression intercept). The result of this adjustment is that the mean of the 430A array expression is fixed at a value of 8, whereas that of the 430B chip is typically 7.
  • Step 6b: We recentered the combined set of 430A and 430B transcripts to a mean of 8 and a standard deviation of 2. This involves reapplying Steps 3 through 5 above but now using the entire set of probes and probe sets from a merged 430A and B data set.
  • Step 7: We corrected for technical variance introduced by running two large batches. Individual probe means for the two batches (n = 20 and 29 samples, respectively) were calculated separately. Probe values of the smaller batch (batch 1) were then adjusted by multiplying the batch 1 probe estimates by the Batch_2/Batch_1 ratio of the averages for that probe.
  • Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. In this data set we have only a very modest number of replicates, and for this reason we do not yet provide error terms for transcripts or probes. Note that this data set does not provide any correction for variance introduced by differences in sex, age, tissue source, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We expect to add statistical controls and adjustments for these variables in subsequent versions of WebQTL.
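The within-array transformation (Steps 1 through 5) and the 430B-to-430A rescaling (Step 6a) can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to build the database: the function names are hypothetical, the use of the population standard deviation is an assumption, and so is the choice to regress the 430A values on the 430B values for the shared probes.

```python
import math
from statistics import mean, pstdev


def normalize_array(cel_values):
    """Steps 1-5: add 1, take log2, Z-score within the array, then rescale."""
    logged = [math.log2(v + 1.0) for v in cel_values]  # Steps 1 and 2
    mu = mean(logged)
    sigma = pstdev(logged)  # population SD (assumption; the text does not say)
    # Steps 3-5: Z score, multiply by 2, add 8 (mean 8, SD 2)
    return [2.0 * (x - mu) / sigma + 8.0 for x in logged]


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rescale_b_to_a(b_values, shared_b, shared_a):
    """Step 6a: fit the shared probes (A as a function of B), apply to all B values."""
    slope, intercept = fit_line(shared_b, shared_a)
    return [slope * v + intercept for v in b_values]


if __name__ == "__main__":
    toy_cells = [120.0, 85.0, 940.0, 15.0, 3300.0, 410.0]  # made-up CEL values
    z = normalize_array(toy_cells)
    print(mean(z), pstdev(z))  # should be very close to 8 and 2
```

Step 6b then amounts to calling the same Z-score rescaling again on the merged 430A and rescaled 430B values, working on values that are already on the log2 scale.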
Probe set data: These expression data were generated by the MAS5 method. We fixed the .CEL files with Step 6 above. The raw expression values in the fixed .CEL files were read into the R environment (Ihaka and Gentleman, 1996). These were normalized using the MAS5 method of background correction and normalization. The same simple steps described above were also applied to these values. A 1-unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
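The batch correction and strain averaging (Steps 7 and 8) apply to probe set values in the same way as to probe-level values. The sketch below is a hedged illustration, not the original code: the function names and data layout are hypothetical, and it assumes that Step 7 rescales the batch 1 values by the per-probe Batch_2/Batch_1 ratio of means, so that the batch 1 mean for each probe matches the batch 2 mean.

```python
from collections import defaultdict
from statistics import mean


def batch_correct(batch1, batch2):
    """Step 7 (assumed reading): scale batch 1 by the per-probe ratio of batch means.

    batch1 and batch2 are lists of samples; each sample is a list with one
    value per probe, in the same probe order.
    """
    n_probes = len(batch1[0])
    ratios = []
    for p in range(n_probes):
        m1 = mean(sample[p] for sample in batch1)
        m2 = mean(sample[p] for sample in batch2)
        ratios.append(m2 / m1)  # Batch_2 / Batch_1 ratio for this probe
    return [[v * r for v, r in zip(sample, ratios)] for sample in batch1]


def strain_means(samples):
    """Step 8: arithmetic mean per probe over all arrays of the same strain.

    samples is a list of (strain, values) pairs.
    """
    by_strain = defaultdict(list)
    for strain, values in samples:
        by_strain[strain].append(values)
    return {
        strain: [mean(column) for column in zip(*arrays)]
        for strain, arrays in by_strain.items()
    }


if __name__ == "__main__":
    # Made-up values for two probes; strain names are only labels here.
    b1 = [[8.0, 6.0], [10.0, 6.0]]
    b2 = [[9.9, 7.0], [9.9, 7.0], [9.9, 7.0]]
    print(batch_correct(b1, b2))
    print(strain_means([("BXD2", [8.0, 6.0]), ("BXD2", [10.0, 8.0])]))
```

A multiplicative correction of this kind leaves the ratios between samples within batch 1 unchanged for each probe; it only shifts the batch 1 mean onto the batch 2 mean.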

    About the chromosome and megabase position values:

The chromosomal locations of probe sets and gene markers on the 430A and 430B microarrays were determined by BLAT analysis using the Mouse Genome Sequencing Consortium Oct 2003 Assembly (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Dr. Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.

    Data source acknowledgment:

Data were generated with funds contributed equally by The UTHSC-SJCRH Cerebellum Transcriptome Profiling Consortium. Our members include:
  • Tom Curran
  • Dan Goldowitz
  • Kristin Hamre
  • Lu Lu
  • Peter McKinnon
  • Jim Morgan
  • Clayton Naeve
  • Richard Smeyne
  • Robert Williams
  • The Center of Genomics and Bioinformatics at UTHSC
  • The Hartwell Center at SJCRH