SJUT Cerebellum mRNA M430 (Mar05) RMA

Accession number: GN56

    Summary:

This March 2005 data freeze provides estimates of mRNA expression in adult cerebellum of 48 lines of mice including 45 BXD recombinant inbred strains, C57BL/6J, DBA/2J, and F1 hybrids. Data were generated by a consortium of investigators at St. Jude Children's Research Hospital (SJ) and the University of Tennessee Health Science Center (UT). Cerebellar samples were hybridized in small pools (n = 3) to Affymetrix M430A and B arrays. Data were processed using the RMA protocol. To simplify comparison between transforms, RMA values of each array were adjusted to an average of 8 units and a standard deviation of 2 units.

    About the cases used to generate this set of data:

We have exploited a set of BXD recombinant inbred strains. All BXD lines are derived from crosses between C57BL/6J (B6 or B) and DBA/2J (D2 or D). Both B and D parental strains have been almost fully sequenced (8x coverage for B6 by a public consortium and approximately 1.5x coverage for D by Celera Discovery Systems) and data for 1.75 million B vs D SNPs are incorporated into WebQTL's genetic maps for the BXDs. BXD2 through BXD32 were produced by Benjamin A. Taylor starting in the late 1970s. BXD33 through BXD42 were also produced by Taylor, but they were generated in the 1990s. These strains are all available from The Jackson Laboratory, Bar Harbor, Maine. BXD43 through BXD99 were produced by Lu Lu, Jeremy Peirce, Lee M. Silver, and Robert W. Williams in the late 1990s and early 2000s using advanced intercross progeny (Peirce et al. 2004).

Most BXD animals were generated in-house at the University of Tennessee Health Science Center by Lu Lu and Robert Williams using stock obtained from The Jackson Laboratory between 1999 and 2004. All BXD strains with numbers above 42 are new advanced intercross type BXDs (Peirce et al. 2004) that are currently available from UTHSC. Additional cases were provided by Glenn Rosen, John Mountz, and Hui-Chen Hsu. These cases were bred either at The Jackson Laboratory (GR) or at the University of Alabama (JM and HCH).

Legend: Santiago Ramón y Cajal. 1899 drawing of two Purkinje cells (A) and five granule cells (B). These are the two major cell types that generate expression signal in this data set.

    About the tissue used to generate this set of data:

The March 2005 data set consists of a total of 102 array pairs (Affymetrix 430A and 430B) from 49 different genotypes. Each sample consists of whole cerebellum taken from three adult animals of the same age and sex. Two sets of technical replicates (BXD14 n = 2; BXD29 n = 3) were combined before generating group means, giving a total of 101 biologically independent data sets. The two reciprocal F1s (D2B6F1 and B6D2F1) were combined to give a single F1 mean estimate of gene expression. The 430A and 430B arrays were processed in three large batches. The first batch (May03 data) consists of 17 samples from 17 strains balanced by sex (8M and 9F). The second batch consists of 29 samples, and includes biological replicates, 2 technical replicates, and data for 9 new strains. The third batch consists of 56 samples, and also includes biological replicates, 2 technical replicates, and data for 15 additional strains.

Replication and Sample Balance: Our goal is to obtain data for independent biological sample pools from both sexes for each strain. Six of 48 genotypes are still represented by single samples: BXD13, BXD20, BXD27 are female-only strains, whereas BXD25, BXD77, BXD90 are male-only. Ten strains are represented by three independent samples with the following breakdown by sex: C57BL/6J (1F 2M), DBA/2J (2F 1M), B6D2F1 (1F 2M), BXD2 (2F 1M), BXD11 (2F 1M), BXD28 (2F 1M), BXD40 (2F 1M), BXD51 (1F 2M), BXD60 (1F 2M), BXD92 (2F 1M).

The age range of samples is relatively narrow. Only 18 samples were taken from animals older than 99 days and only two samples are older than 7 months of age. BXD11 includes an extra (third) 441-day-old female sample and BXD28 includes an extra 427-day-old sample.

RNA was extracted at UTHSC by Lu Lu, Zhiping Jia, and Hongtao Zhai.

All samples were subsequently processed at the Hartwell Center Affymetrix laboratory at SJCRH by Jay Morris.

The table below summarizes information on strain, sex, age, sample name, and batch number.
Id   Strain    Sex  Age (days)  SampleName  BatchID  Source
1    C57BL/6J  F    116         R0773C      2        UAB
2    C57BL/6J  M    109         R0054C      1        JAX
3    C57BL/6J  M    71          R1450C      3        UTM DG
4    DBA/2J    F    71          R0175C      1        UAB
5    DBA/2J    F    91          R0782C      2        UAB
6    DBA/2J    M    62          R1121C      3        UTM RW
7    B6D2F1    F    60          R1115C      3        UTM RW
8    B6D2F1    M    94          R0347C      1        JAX
9    B6D2F1    M    127         R0766C      2        UTM JB
10   D2B6F1    F    57          R1067C      3        UTM RW
11   D2B6F1    M    60          R1387C      3        UTM RW
12   BXD1      F    57          R0813C      2        UAB
13   BXD1      M    181         R1151C      3        UTM JB
14   BXD2      F    142         R0751C      1        UAB
15   BXD2      F    78          R0774C      2        UAB
16   BXD2      M    61          R1503C      3        HarvardU GR
17   BXD5      F    56          R0802C      2        UMemphis
18   BXD6      F    92          R0719C      1        UMemphis
19   BXD6      M    92          R0720C      3        UMemphis
20   BXD8      F    72          R0173C      1        UAB
21   BXD8      M    59          R1484C      3        HarvardU GR
22   BXD9      F    86          R0736C      3        UMemphis
23   BXD9      M    86          R0737C      1        UMemphis
24   BXD11     F    441         R0200C      1        UAB
25   BXD11     F    97          R0791C      3        UAB
26   BXD11     M    92          R0790C      2        UMemphis
27   BXD12     F    130         R0776C      2        UAB
28   BXD12     M    64          R0756C      2        UMemphis
29   BXD13     F    86          R1144C      3        UMemphis
30   BXD14     F    190         R0794C      2        UAB
31   BXD14     F    190         R0794C      3        UAB
32   BXD14     M    91          R0758C      2        UMemphis
33   BXD14     M    65          R1130C      3        UTM RW
34   BXD15     F    60          R1491C      3        HarvardU GR
35   BXD15     M    61          R1499C      3        HarvardU GR
36   BXD16     F    163         R0750C      1        UAB
37   BXD16     M    61          R1572C      3        HarvardU GR
38   BXD19     F    61          R0772C      2        UAB
39   BXD19     M    157         R1230C      3        UTM JB
40   BXD20     F    59          R1488C      3        HarvardU GR
41   BXD21     F    116         R0711C      1        UAB
42   BXD21     M    64          R0803C      2        UMemphis
43   BXD22     F    65          R0174C      1        UAB
44   BXD22     M    59          R1489C      3        HarvardU GR
45   BXD23     F    88          R0814C      2        UAB
46   BXD24     F    71          R0805C      2        UMemphis
47   BXD24     M    71          R0759C      2        UMemphis
48   BXD25     M    90          R0429C      1        UTM RW
49   BXD27     F    60          R1496C      3        HarvardU GR
50   BXD28     F    113         R0785C      2        UTM RW
51   BXD28     M    79          R0739C      3        UMemphis
52   BXD29     F    82          R0777C      2        UAB
53   BXD29     M    76          R0714C      1        UMemphis
54   BXD29     M    76          R0714C      2        UMemphis
55   BXD29     M    76          R0714C      3        UMemphis
56   BXD31     F    142         R0816C      2        UAB
57   BXD31     M    61          R1142C      3        UTM RW
58   BXD32     F    62          R0778C      2        UAB
59   BXD32     M    218         R0786C      2        UAB
60   BXD33     F    184         R0793C      2        UAB
61   BXD33     M    124         R0715C      1        UAB
62   BXD34     F    56          R0725C      1        UMemphis
63   BXD34     M    91          R0789C      2        UMemphis
64   BXD36     F    64          R1667C      3        UTM RW
65   BXD36     M    61          R1212C      3        UMemphis
66   BXD38     F    55          R0781C      2        UAB
67   BXD38     M    65          R0761C      2        UMemphis
68   BXD39     F    59          R1490C      3        HarvardU GR
69   BXD39     M    165         R0723C      1        UAB
70   BXD40     F    56          R0718C      2        UMemphis
71   BXD40     M    73          R0812C      2        UMemphis
72   BXD42     F    100         R0799C      2        UAB
73   BXD42     M    97          R0709C      1        UMemphis
74   BXD43     F    61          R1200C      3        UTM RW
75   BXD43     M    63          R1182C      3        UTM RW
76   BXD44     F    61          R1188C      3        UTM RW
77   BXD44     M    58          R1073C      3        UTM RW
78   BXD45     F    63          R1404C      3        UTM RW
79   BXD45     M    93          R1506C      3        UTM RW
80   BXD48     F    64          R1158C      3        UTM RW
81   BXD48     M    65          R1165C      3        UTM RW
82   BXD51     F    66          R1666C      3        UTM RW
83   BXD51     M    62          R1180C      3        UTM RW
84   BXD51     M    79          R1671C      3        UTM RW
85   BXD60     F    64          R1160C      3        UTM RW
86   BXD60     M    61          R1103C      3        UTM RW
87   BXD60     M    99          R1669C      3        UTM RW
88   BXD62     M    61          R1149C      3        UTM RW
89   BXD62     M    60          R1668C      3        UTM RW
90   BXD69     F    60          R1440C      3        UTM RW
91   BXD69     M    64          R1197C      3        UTM RW
92   BXD73     F    60          R1276C      3        UTM RW
93   BXD73     M    77          R1665C      3        UTM RW
94   BXD77     M    62          R1424C      3        UTM RW
95   BXD85     F    79          R1486C      3        UTM RW
96   BXD85     M    79          R1487C      3        UTM RW
97   BXD86     F    58          R1408C      3        UTM RW
98   BXD86     M    58          R1412C      3        UTM RW
99   BXD90     M    74          R1664C      3        UTM RW
100  BXD92     F    62          R1391C      3        UTM RW
101  BXD92     F    63          R1670C      3        UTM RW
102  BXD92     M    59          R1308C      3        UTM RW

    About data processing:

Probe (cell) level data from the CEL file: These CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell.
  • Step 1: We added an offset of 1.0 unit to each cell signal to ensure that all values could be logged without generating negative values. We then computed the log base 2 of each cell.
  • Step 2: We performed a quantile normalization for the log base 2 values for the total set of 104 arrays (all three batches) using the same initial steps used by the RMA transform.
  • Step 3: We computed the Z scores for each cell value.
  • Step 4: We multiplied all Z scores by 2.
  • Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
  • Step 6: We corrected for technical variance introduced by three large batches at the probe level. To do this we determined the ratio of the batch mean to the mean of all three batches and used this as a single multiplicative probe-specific batch correction factor. The consequence of this simple correction is that the mean probe signal value for each of the three batches is the same.
  • Step 7a: The 430A and 430B arrays include a set of 100 shared probe sets (a total of 2200 probes) that have identical sequences. These probes and probe sets provide a way to calibrate expression of the 430A and 430B arrays to a common scale. To bring the two arrays into alignment, we regressed Z scores of the common set of probes to obtain a linear regression correction to rescale the 430B arrays to the 430A array. In our case this involved multiplying all 430B Z scores by the slope of the regression and adding or subtracting a small offset. The result of this step is that the mean of the 430A expression is fixed at a value of 8, whereas that of the 430B chip is typically reduced to 7. The average of the merged 430A and 430B array data set is approximately 7.5.
  • Step 7b: We recentered the merged 430A and 430B data sets to a mean of 8 and a standard deviation of 2. This involved reapplying Steps 3 through 5.
  • Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. Technical replicates were averaged before computing the mean for independent biological samples. Note that we have not (yet) corrected for variance introduced by differences in sex, age, source of animals, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We eventually hope to add statistical controls and adjustments for these variables.
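
The probe-level Steps 1 through 7a above can be sketched in Python with NumPy. This is only an illustration, not the code used to build the data set. All function names are hypothetical. Two details are assumptions because the text leaves them open: "the mean of all three batches" is taken here as the mean of the three batch means, and each value is divided by the batch ratio so that batch means become equal. The quantile normalization is a simplified version that breaks ties by position.

```python
import numpy as np

def log2_with_offset(cells, offset=1.0):
    """Step 1: add an offset so every value can be logged, then take log2."""
    return np.log2(np.asarray(cells, dtype=float) + offset)

def quantile_normalize(x):
    """Step 2: give every array (column) the same distribution.

    x has shape (n_cells, n_arrays). Simplified: ties are broken by position.
    """
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of each quantile
    return reference[ranks]

def rescale_z(x, mean=8.0, sd=2.0):
    """Steps 3-5: Z score within each array, multiply by 2, add 8."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return z * sd + mean

def batch_correct(x, batch_ids):
    """Step 6: one multiplicative factor per probe (row) and batch.

    Assumption: the target is the mean of the batch means, and values are
    divided by (batch mean / target) so all batch means match the target.
    """
    batch_ids = np.asarray(batch_ids)
    batches = np.unique(batch_ids)
    batch_means = np.stack(
        [x[:, batch_ids == b].mean(axis=1) for b in batches], axis=1
    )
    target = batch_means.mean(axis=1)
    out = np.array(x, dtype=float)
    for j, b in enumerate(batches):
        cols = batch_ids == b
        ratio = batch_means[:, j] / target
        out[:, cols] = x[:, cols] / ratio[:, None]
    return out

def align_b_to_a(a_shared, b_shared, b_all):
    """Step 7a: regress shared 430A values on shared 430B values,
    then apply slope and offset to all 430B values."""
    slope, intercept = np.polyfit(b_shared, a_shared, 1)
    return slope * np.asarray(b_all, dtype=float) + intercept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.uniform(50, 5000, size=(1000, 6))   # made-up cell intensities
    z = rescale_z(quantile_normalize(log2_with_offset(raw)))
    corrected = batch_correct(z, [1, 1, 2, 2, 3, 3])
    print(corrected.mean(), corrected.std())
```

Step 7b would then amount to calling rescale_z again on the merged 430A and 430B values.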
Probe set data: The expression data were processed by Yanhua Qu (UTHSC). The original CEL files were read into the R environment (Ihaka and Gentleman 1996). Data were processed using the Robust Multichip Average (RMA) method (Irizarry et al. 2003). Values were log2 transformed. Probe set values listed in WebQTL are the averages of biological replicates within strain. A few technical replicates were averaged and treated as single samples. A 1-unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.

This data set includes further normalization to produce final estimates of expression that can be compared directly to the other transforms (average of 8 units and stabilized standard deviation of 2 units within each array). Please see Bolstad and colleagues (2003) for a helpful comparison of RMA and two other common methods of processing Affymetrix array data sets.
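
The two-stage averaging described above (technical replicates first, then biological replicates within strain) can be illustrated with a short Python sketch. The function name, the record layout, the "F1" label for the combined reciprocal hybrids, and all expression values are hypothetical. Only the strain and sample names come from the sample table.

```python
from collections import defaultdict
from statistics import mean

def strain_means(records, merge=None):
    """Average technical replicates per sample, then samples per strain.

    records: iterable of (strain, sample_name, value) for one probe set.
    merge:   optional mapping that pools strains under one label,
             for example the two reciprocal F1 crosses.
    """
    merge = merge or {}
    by_sample = defaultdict(list)
    for strain, sample, value in records:
        by_sample[(merge.get(strain, strain), sample)].append(value)

    by_strain = defaultdict(list)
    for (strain, _sample), values in by_sample.items():
        # Arrays that share a sample name are technical replicates.
        by_strain[strain].append(mean(values))

    return {strain: mean(values) for strain, values in by_strain.items()}

# Made-up expression values for a single probe set.
example = [
    ("BXD29", "R0777C", 8.0),
    ("BXD29", "R0714C", 9.0),    # R0714C was run three times
    ("BXD29", "R0714C", 9.5),
    ("BXD29", "R0714C", 10.0),
    ("B6D2F1", "R1115C", 7.5),
    ("D2B6F1", "R1067C", 8.5),
]
result = strain_means(example, merge={"B6D2F1": "F1", "D2B6F1": "F1"})
print(result)
```

In this example the three R0714C arrays collapse to one value before the BXD29 mean is taken, so the technical replicates do not outweigh the single female sample.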

    About the chromosome and megabase position values:

The chromosomal locations of probe sets and gene markers on the 430A and 430B microarrays were determined by BLAT analysis using the Mouse Genome Sequencing Consortium May 2004 Assembly (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Dr. Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.

    Data source acknowledgment:

Data were generated with funds contributed equally by The UTHSC-SJCRH Cerebellum Transcriptome Profiling Consortium. Our members include:
  • Tom Curran
  • Dan Goldowitz
  • Kristin Hamre
  • Lu Lu
  • Peter McKinnon
  • Jim Morgan
  • Clayton Naeve
  • Richard Smeyne
  • Robert Williams
  • The Center of Genomics and Bioinformatics at UTHSC
  • The Hartwell Center at SJCRH

    Information about this text file:

This text file was originally generated by RWW and YHQ, March 21, 2005. Updated by RWW, March 23, 2005.