Information on the HQF Striatum Exon Array data of July 2007

The High Q Foundation Striatum Exon 1.0 Array Expression Dataset of July 2007

Summary:

EXPERIMENTAL EXON ST TEST DATA SET (preliminary text, not error checked). The July 2007 data freeze provides estimates of mRNA expression in the striatum (caudate nucleus of the forebrain) of 50 lines of mice, including the C57BL/6J and DBA/2J parental strains, their F1 hybrid (B6D2F1), 30 BXD recombinant inbred strains, and 17 other common inbred strains of mice. Data were generated using the new Affymetrix Mouse Exon 1.0 ST short oligomer microarrays by Weikuan Gu, Yan Jiao, David Kulp, Lu Lu, Glenn D. Rosen, and Robert W. Williams with the support of a grant from the High Q Foundation. This is the first "all exons" array that we have entered into GeneNetwork and the data are still experimental. Approximately 300 brain samples (males and females) from 50 strains were used in this experiment. This data set includes 97 arrays that passed very stringent quality control procedures. Data were processed using the RMA method of Irizarry, Bolstad, Speed, and colleagues. To simplify comparison among transforms, RMA values of each array were adjusted to an average expression of 8 units and a standard deviation of 2 units.

About the strains and cases used to generate this set of data:

We have used a set of 30 BXD recombinant inbred strains generated by crossing C57BL/6J (B6 or B) with DBA/2J (D2 or D). The BXDs are particularly useful for systems genetics because both parental strains have been sequenced (8x coverage of B6 and 1.5x coverage of D). Physical maps in WebQTL incorporate approximately 1.75 million B vs D SNPs from Celera. BXD2 through BXD32 were bred by Benjamin A. Taylor starting in the late 1970s. BXD33 through BXD42 were bred by Taylor in the 1990s. All of these strains are available from The Jackson Laboratory.

Mouse Diversity Panel (MDP). We have also profiled an MDP consisting of a total of 19 inbred strains (this number includes the C57BL/6J and DBA/2J strains) and one F1 hybrid (B6D2F1 only; not D2B6F1 yet). Strains were selected for several reasons:

genetic and phenotypic diversity, including use by the Phenome Project
their use in making genetic reference populations including recombinant inbred strains, consomic strains, congenic and recombinant congenic strains
their use by the Complex Trait Consortium to make the Collaborative Cross (Nairobi/Wellcome, Oak Ridge/DOE, and Perth/UWA)
genome sequence data from three sources (NHGRI, Celera, and Perlegen-NIEHS)
availability from The Jackson Laboratory

Seven of the eight parents of the Collaborative Cross (129, A, C57BL/6J, NOD, NZO, PWK, and WSB) have been included. CAST/Ei is the member of the Collaborative Cross that is currently missing from this data set. Thirteen of the MDP strains have been sequenced by Celera, NIH, or by Perlegen for the NIEHS. This panel will be extremely helpful in systems genetic analysis of a wide variety of traits, and will be a powerful adjunct in fine mapping modulators using what is essentially an association analysis of sequence variants.

129S1/SvImJ
    Collaborative Cross strain sequenced by NIEHS; background for many knockouts; Phenome Project A list
A/J
    Collaborative Cross strain sequenced by Perlegen/NIEHS; parent of the AXB/BXA panel
AKR/J
    Sequenced by NIEHS; Phenome Project B list
BALB/cByJ
    Sequenced by NIEHS; maternal parent of the CXB panel; Phenome Project A list
BTBR T<+> tf/J
     Phenome Project group D strain. Used in mutagenesis studies. This black and tan strain carries the recessive tufted allele and is wildtype at the T locus (brachyury).
BXSB/MpJ
    An isolated recombinant inbred strain generated by crossing C57BL/6J and SB/Le that is used to study autoimmune disease. Males are deficient in pre-B cells.
C3H/HeJ
    Sequenced by Perlegen/NIEHS; paternal parent of the BXH panel; Phenome Project A list
C57BL/6J
    Sequenced by NHGRI; parental strain of AXB/BXA, BXD, and BXH; Phenome Project A list
DBA/2J
    Sequenced by Perlegen/NIEHS and Celera; paternal parent of the BXD panel; Phenome Project A list
FVB/NJ
    Sequenced by Perlegen/NIEHS. Phenome Project group A strain.
KK/HlJ
    Sequenced by Perlegen/NIEHS
MOLF/EiJ
    Sequenced by Perlegen/NIEHS. Phenome Project B strain.
NZB/BlNJ
    Phenome Project B list. Please note that the substrain is BlNJ (B-el-N-J), not BINJ (B-eye-N-J).
NOD/LtJ
    Collaborative Cross strain sequenced by NIEHS; Phenome Project B list; diabetic
NZO/HlLtJ
    Collaborative Cross strain
NZW/LacJ
    Phenome Project D strain
PWD/PhJ
    Sequenced by Perlegen/NIEHS; parental strain for a consomic set by Forejt and colleagues. Not part of the Phenome Project.
PWK/PhJ
    Collaborative Cross strain; Phenome Project D list
WSB/EiJ
    Collaborative Cross strain sequenced by NIEHS; Phenome Project C list
B6D2F1
    This F1 hybrid was generated by crossing C57BL/6J with DBA/2J at The Jackson Laboratory. It is also designated (incorrectly) as B6D2F1/J.

All of these strains are available from The Jackson Laboratory.

About the tissue used to generate this set of data:

Many of the tissue samples used in this exon array study were also used in our previous M430 analysis of the striatum, providing a partially matched Exon-M430 pair of data sets. However, the previous study included fewer samples (47) and fewer strains (31 total). Animals were obtained from The Jackson Laboratory and housed for several weeks at BIDMC until they reached ~2 months of age (range from 55 to 62 days). Mice were killed by cervical dislocation and brains were removed and placed in RNAlater for 20 to 25 minutes prior to dissection. Cerebella and olfactory bulbs were removed; brains were hemisected, and both striata were dissected using a medial approach by GD Rosen that typically yields 5 to 7 mg of tissue per side.
All striatal dissections were performed by one person (GD Rosen) using a midsagittal approach that minimizes the likelihood of contamination across tissues. This dissection recovers most, but not all, of the neostriatum. We have histologically examined dissected tissue and have found no evidence of inclusion of cortical or thalamic tissue at the margins. We have further confirmed the dissections by comparative assays for acetylcholinesterase (AChE) protein levels using Western blots. The concentration of AChE in the striatum is far higher than that in cortex or cerebellum. A pool of dissected tissue from 3 or 4 adults (approximately 25 to 30 mg of tissue) of the same strain, sex, and age was collected in one session and used to generate cRNA samples.
Roughly 90 to 95% of all cells in the striatum are medium spiny neurons (see Gerfen, 1992, for a review of the structure and function of the neostriatum).

RNA Extraction: RNA was extracted by Rosen and colleagues between June 2, 2004 and March 8, 2006. In brief, we used the RNA STAT-60 protocol (TEL-TEST "B" Bulletin No. 1), steps 5.1A (homogenization of tissue), 5.2 (RNA extraction), 5.3 (RNA precipitation), and 5.4 (RNA wash). In Step 5.4 we stopped after adding 75% ethanol (1 ml per 1 ml RNA STAT-60) and stored the mix at -80 deg C until further use. Before RNA labeling we thawed samples and proceeded with the remainder of Step 5.4: pelleting, drying, and redissolving the pellet in RNase-free water.
RNA samples were then processed by the array core at the VA Medical Center by Drs. Yan Jiao and Weikuan Gu (Director of the DNA Discovery Core of the UTHSC Center of Genomics and Bioinformatics). Labeled cRNA was generated using the standard Affymetrix whole transcript sense target labeling protocol.

Legend: Summary of protocol (from http://www.affymetrix.com/products/reagents/wt_cdna_synthesis_amp_chart.jsp) as carried out by Dr. Yan Jiao.

Replication and Sample Balance: The aim of our standard operating procedure is to obtain data for independent biological sample pools from each sex for all strains. We have succeeded for 44 of 50 strains. Several strains are represented by only a single sex or a single sample pool. This sex imbalance can lead to bias with respect to transcripts that have genuine sex differences. One way to handle this issue is to study the correlation between a proxy variable for this bias, as represented by the Xist probe set 5153684, and a data set of interest.
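
One way to sketch this check in Python is to correlate strain means of the Xist probe set with strain means of a trait of interest. The strain values below are invented for illustration and the helper name pearson_r is hypothetical; in real data, a strong correlation would suggest that the trait is confounded with the sex imbalance.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical strain means (not real data): Xist probe set 5153684
# and a transcript of interest.
xist = {"BXD1": 8.9, "BXD2": 9.1, "BXD5": 11.2, "KK/HlJ": 6.4, "BXD29": 11.0}
trait = {"BXD1": 9.5, "BXD2": 9.4, "BXD5": 10.6, "KK/HlJ": 8.1, "BXD29": 10.4}

# Only strains present in both data sets can be compared.
strains = sorted(set(xist) & set(trait))
r = pearson_r([xist[s] for s in strains], [trait[s] for s in strains])
print(f"r = {r:.2f} across {len(strains)} strains")
```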

Legend: Sex balance in this data set is illustrated using the sex-specific Xist gene and one of its probe sets (Affy Exon ST probe set: 5153684). Most strains include one male sample pool with very low Xist expression (6 or 7) and one female sample pool with high Xist expression (10 to 12). As a result, 43 of the 50 strains have both intermediate values and high variance. The B6D2F1 sample has no error bar due to an early data entry error. Strains for which samples are only male or only female are at the extreme left and right sides of this bar chart, respectively.

Strains with two male samples: KK/HlJ, BTBR T<+> tf/J
Strains with two female samples: BXD5, BXD22
Only a single female sample: BXD29
The status of BXD23 is not clear and may represent a single male sample or a possible mixed sex pool.
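
The Xist values quoted in the legend above suggest a simple rule for checking the sex of each sample pool, sketched below in Python. The cutoffs (below 8 for male, above 9 for female) are assumptions chosen to fall between the reported male (6 or 7) and female (10 to 12) ranges, and the sample values and function names are invented for illustration.

```python
def call_sex_from_xist(xist_value, male_max=8.0, female_min=9.0):
    """Return 'M', 'F', or 'unclear' for one sample pool's Xist expression."""
    if xist_value < male_max:
        return "M"
    if xist_value > female_min:
        return "F"
    return "unclear"  # for example, a possible mixed-sex pool

def summarize_strain(xist_values):
    """Describe the sex composition of one strain's sample pools."""
    calls = sorted(call_sex_from_xist(v) for v in xist_values)
    if "unclear" in calls:
        return "unclear"
    if set(calls) == {"F", "M"}:
        return "both sexes"
    return f"{len(calls)} x {calls[0]} only"

# Invented Xist values per strain, for illustration only.
examples = {
    "balanced strain": [6.5, 11.3],
    "two male pools": [6.8, 7.1],
    "single female pool": [10.9],
    "ambiguous pool": [8.4],
}
for label, values in examples.items():
    print(label, "->", summarize_strain(values))
```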

Batch Structure: This data set consists of 97 arrays processed in 8 batches. All arrays were processed by a single skilled operator (Dr. Yan Jiao) between October 20 and November 29, 2006 (scan dates from October 26 to November 29). In general, the male and female samples from a single strain were run within a single batch.

Data Table 1:

Mouse Exon 1.0 ST data: The table below lists arrays by strain, age (in days), sex, case ID, and batch ID. Each array was hybridized to a pool of mRNA from 3 to 4 mice. All mice were between 48 and 71 days of age.

RNA ID   Strain   Age   Sex   Case ID   Batch ID   Source

R3101SA C57BL/6J 58 F 073106.70 6 GDRosen

R3102SA C57BL/6J 59 M 073106.01 6 GDRosen

R3105SA DBA/2J 58 F 073106.65 7 GDRosen

R3106SA DBA/2J 59 M 073106.02 7 GDRosen

R3031SA B6D2F1/J 59 F 073106.69 2 GDRosen

R3032SA B6D2F1/J 59 M 073106.67 2 GDRosen

R3037SA BXD1 59 F 073106.04 2 GDRosen

R3038SA BXD1 59 M 073106.38 2 GDRosen

R3055SA BXD2 61 M 073106.06 3 GDRosen

R3056SA BXD2 61 F 073106.05 3 GDRosen

R3089SA BXD5 58 F 073106.42 6 GDRosen

R3090SA BXD5 58 F 073106.41 6 GDRosen

R3091SA BXD6 59 F 073106.09 6 GDRosen

R3092SA BXD6 59 M 073106.08 6 GDRosen

R3093SA BXD8 61 F 073106.21 6 GDRosen

R3094SA BXD8 61 M 073106.20 6 GDRosen

R3095SA BXD9 60 F 073106.15 6 GDRosen

R3096SA BXD9 60 M 073106.14 6 GDRosen

R3039SA BXD11 59 F 073106.07 2 GDRosen

R3040SA BXD11 59 M 073106.24 2 GDRosen

R3041SA BXD12 62 F 073106.27 2 GDRosen

R3042SA BXD12 59 M 073106.26 2 GDRosen

R3044SA BXD13 60 M 073106.32 3 GDRosen

R3043SA BXD13 60 F 073106.33 8 GDRosen

R3045SA BXD14 59 F 073106.51 3 GDRosen

R3144SA BXD14 59 M 073106.52 3 GDRosen

R3047SA BXD15 60 F 073106.50 3 GDRosen

R3048SA BXD15 60 M 073106.49 3 GDRosen

R3049SA BXD16 61 F 073106.22 3 GDRosen

R3050SA BXD16 61 M 073106.23 3 GDRosen

R3051SA BXD18 59 F 073106.60 3 GDRosen

R3052SA BXD18 59 M 073106.59 3 GDRosen

R3053SA BXD19 60 F 073106.40 3 GDRosen

R3054SA BXD19 60 M 073106.39 3 GDRosen

R3057SA BXD20 60 F 073106.75 3 GDRosen

R3058SA BXD20 60 M 073106.74 3 GDRosen

R3059SA BXD21 48 F 073106.62 3 GDRosen

R3060SA BXD21 48 M 073106.61 4 GDRosen

R3061SA BXD22 58 F 073106.73 4 GDRosen

R3062SA BXD22 60 M 073106.77 4 GDRosen

R3064SA BXD23 60 M 073106.57 4 GDRosen

R3063SA BXD23 60 F 073106.58 8 GDRosen

R3065SA BXD24 59 F 073106.03 4 GDRosen

R3066SA BXD24 60 M 073106.12 4 GDRosen

R3067SA BXD27 60 F 073106.55 4 GDRosen

R3068SA BXD27 60 M 073106.56 4 GDRosen

R3069SA BXD28 60 F 073106.47 4 GDRosen

R3070SA BXD28 60 M 073106.48 4 GDRosen

R3071SA BXD29 58 F 073106.45 4 GDRosen

R3072SA BXD29 58 M 073106.46 5 GDRosen

R3074SA BXD31 60 M 073106.43 8 GDRosen

R3073SA BXD31 60 F 073106.44 5 GDRosen

R3075SA BXD32 57 F 073106.37 8 GDRosen

R3076SA BXD32 57 M 073106.36 5 GDRosen

R3077SA BXD33 59 F 073106.35 5 GDRosen

R3078SA BXD33 59 M 073106.34 5 GDRosen

R3079SA BXD34 60 F 073106.29 5 GDRosen

R3080SA BXD34 60 M 073106.28 5 GDRosen

R3081SA BXD36 57 F 073106.31 5 GDRosen

R3082SA BXD36 57 M 073106.30 5 GDRosen

R3083SA BXD38 60 F 073106.19 5 GDRosen

R3084SA BXD38 60 M 073106.18 5 GDRosen

R3085SA BXD40 60 F 073106.17 5 GDRosen

R3086SA BXD40 60 M 073106.16 6 GDRosen

R3087SA BXD42 58 F 073106.11 6 GDRosen

R3088SA BXD42 58 M 073106.10 6 GDRosen

R3025SA 129S1/SvImJ 60 F 073106.12 1 GDRosen

R3026SA 129S1/SvImJ 59 M 073106.87 1 GDRosen

R3027SA A/J 59 F 073106.93 1 GDRosen

R3028SA A/J 59 M 073106.95 1 GDRosen

R3029SA AKR/J 59 F 073106.89 1 GDRosen

R3030SA AKR/J 59 M 073106.91 1 GDRosen

R3033SA BALB/cByJ 59 M 073106.99 2 GDRosen

R3036SA BTBR T<+> tf/J 60 M 073106.10 2 GDRosen

R3034SA BTBR T<+> tf/J 59 F 073106.97 2 GDRosen

R3035SA BTBR T<+> tf/J 59 M 073106.11 2 GDRosen

R3097SA BXSB/MpJ 61 F 073106.15 6 GDRosen

R3098SA BXSB/MpJ 61 M 073106.14 6 GDRosen

R3099SA C3H/HeJ 60 F 073106.11 6 GDRosen

R3100SA C3H/HeJ 60 M 073106.11 6 GDRosen

R3107SA FVB/NJ 60 F 073106.11 7 GDRosen

R3108SA FVB/NJ 60 M 073106.11 7 GDRosen

R3109SA KK/HlJ 61 M 073106.13 7 GDRosen

R3110SA KK/HlJ 61 M 073106.13 7 GDRosen

R3111SA MOLF/EiJ 60 F 073106.13 7 GDRosen

R3112SA MOLF/EiJ 60 M 073106.13 7 GDRosen

R3113SA NOD/LtJ 58 F 073106.79 7 GDRosen

R3114SA NOD/LtJ 58 M 073106.81 7 GDRosen

R3115SA NZB/BlNJ 61 F 073106.16 7 GDRosen

R3116SA NZB/BlNJ 58 M 073106.10 7 GDRosen

R3117SA NZO/HlLtJ 61 F 073106.12 8 GDRosen

R3118SA NZO/HlLtJ 61 M 073106.12 8 GDRosen

R3119SA NZW/LacJ 65 F 073106.13 8 GDRosen

R3120SA NZW/LacJ 70 M 073106.12 8 GDRosen

R3121SA PWD/PhJ 70 F 073106.14 8 GDRosen

R3122SA PWD/PhJ 70 M 073106.14 8 GDRosen

R3123SA PWK/PhJ 59 F 073106.12 8 GDRosen

R3124SA PWK/PhJ 60 M 073106.13 8 GDRosen

R3125SA WSB/EiJ 71 F 073106.13 8 GDRosen

R3126SA WSB/EiJ 71 M 073106.11 8 GDRosen
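
For readers who want to work with Data Table 1 programmatically, the sketch below parses rows of the form shown above and tallies sexes per strain. It assumes whitespace-separated rows and reads fields from both ends of the line so that a strain name containing spaces still parses; the function name is hypothetical and only three rows are reproduced here.

```python
from collections import Counter, defaultdict

def parse_row(line):
    """Split one table row into its seven fields.

    The first field is the RNA ID and the last five are age, sex, case ID,
    batch ID, and source; whatever lies between is the strain name.
    """
    parts = line.split()
    rna_id = parts[0]
    age, sex, case_id, batch, source = parts[-5:]
    strain = " ".join(parts[1:-5])
    return {
        "rna_id": rna_id, "strain": strain, "age": int(age), "sex": sex,
        "case_id": case_id, "batch": int(batch), "source": source,
    }

rows = [
    "R3101SA C57BL/6J 58 F 073106.70 6 GDRosen",
    "R3102SA C57BL/6J 59 M 073106.01 6 GDRosen",
    "R3089SA BXD5 58 F 073106.42 6 GDRosen",
]

# Count male and female sample pools for each strain.
sex_counts = defaultdict(Counter)
for row in rows:
    rec = parse_row(row)
    sex_counts[rec["strain"]][rec["sex"]] += 1

for strain, counts in sex_counts.items():
    print(strain, dict(counts))
```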

About the array platform:

Affymetrix Mouse Exon 1.0 ST array: The Exon 1.0 ST (sense target) array consists of approximately 4.5 million useful 25-nucleotide probes that estimate the expression of approximately 1 million exon clusters. The array sequences were selected in 2006 using Unigene Build XXX.

About data processing:

Probe (cell) level and probe set data from the CEL file:

Step 1: Probes overlapping SNPs were removed from the design file
Step 2: The Affymetrix Power Tools (APT) package was used to extract CEL values and perform RMA normalization
Step 3: Probe set values were normalized to mean=8 and sd=2 (per array)
Step 4: Strain averages were calculated by averaging over all arrays that belong to the same strain (3 maximum in this data set)

Probe set data from the CHP file: The expression values were generated by Manjunatha in David Kulp's group at the University of Massachusetts Amherst using RMA. The same simple steps described above were also applied to these values. Every microarray data set therefore has a mean expression of 8 with a standard deviation of 2. A 1 unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
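
As a minimal sketch of Steps 3 and 4 above, the following Python code rescales each array to a mean of 8 and a standard deviation of 2, then averages arrays by strain. The function names and toy values are hypothetical, and the use of the population standard deviation is an assumption; the actual processing scripts are not part of this file.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def rescale_array(values, target_mean=8.0, target_sd=2.0):
    """Shift and scale one array's values to the target mean and SD."""
    mean = fmean(values)
    sd = pstdev(values)  # assumption: population SD; sample SD is also plausible
    return [target_mean + target_sd * (v - mean) / sd for v in values]

def strain_averages(arrays):
    """Average rescaled arrays that share a strain label.

    `arrays` is a list of (strain, values) pairs in which every list of
    values holds the same probe sets in the same order.
    """
    by_strain = defaultdict(list)
    for strain, values in arrays:
        by_strain[strain].append(rescale_array(values))
    return {
        strain: [fmean(column) for column in zip(*rows)]
        for strain, rows in by_strain.items()
    }

# Toy input (made-up values): two arrays for one strain, one for another.
toy = [
    ("BXD1", [5.0, 7.0, 9.0, 11.0]),
    ("BXD1", [4.0, 8.0, 8.0, 12.0]),
    ("BXD2", [6.0, 6.5, 7.0, 7.5]),
]
averages = strain_averages(toy)
```

Rescaling each array before averaging mirrors the order of Steps 3 and 4, so every array contributes on the same scale regardless of its original intensity range.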

Data quality control: A total of 97 samples passed RNA quality control.
Part 1: Testing if replicates come from the same strain

RMA normalized values were used in this analysis
Pair-wise correlations were calculated between all the arrays using the probesets with high variance and high median
Probability density of correlations between non-replicate pairs and replicate-pairs were calculated
A correlation threshold of 0.85 was chosen using a maximum likelihood estimate
In total, 5 sets of replicates might not have come from the same strain. (They are marked as 0 in the Manju_Quality Score column)
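
The replicate check in Part 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used for this data set: the probe set filter (fixed minimum variance and minimum median), the array IDs, and all values are hypothetical, and the density estimation step is not reproduced. Only the 0.85 threshold comes from the text above.

```python
from itertools import combinations
from math import sqrt
from statistics import median, pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def informative_probesets(arrays, min_variance, min_median):
    """Indices of probe sets with high variance and high median across arrays."""
    columns = list(zip(*(values for _, values in arrays.values())))
    return [i for i, col in enumerate(columns)
            if pvariance(col) >= min_variance and median(col) >= min_median]

def suspect_replicates(arrays, threshold=0.85, min_variance=0.5, min_median=7.0):
    """Return replicate pairs (same strain label) that correlate below threshold."""
    keep = informative_probesets(arrays, min_variance, min_median)
    flagged = []
    for (id_a, (strain_a, a)), (id_b, (strain_b, b)) in combinations(arrays.items(), 2):
        if strain_a != strain_b:
            continue  # only replicate pairs are tested here
        r = pearson_r([a[i] for i in keep], [b[i] for i in keep])
        if r < threshold:
            flagged.append((id_a, id_b, round(r, 3)))
    return flagged

# Made-up arrays: array ID -> (strain label, RMA values for six probe sets).
arrays = {
    "A1": ("BXD1", [6.0, 9.0, 12.0, 8.0, 10.0, 4.0]),
    "A2": ("BXD1", [6.2, 9.1, 11.8, 8.1, 10.2, 4.1]),
    "B1": ("BXD2", [6.1, 12.0, 9.0, 10.0, 8.0, 4.0]),
    "B2": ("BXD2", [6.0, 9.2, 12.1, 8.2, 9.9, 4.2]),
}
flagged = suspect_replicates(arrays)
```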

Part 2: Testing if strain labeling is correct

RMA normalized values were used in this analysis
Only BXD strains were tested
A set of strongly cis-linked probesets were identified (using linkage to nearest marker)
The expression of these probesets was used to re-estimate the genotype of nearest marker
The values of all re-estimated marker genotypes were compared to genotypes of all the BXD strains and optimal match was identified
In total, four sets of replicates were found to be mislabeled.
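
The strain-label check in Part 2 can be illustrated with the Python sketch below. It assumes that each cis-linked probe set has already been reduced to a cutoff separating the B and D alleles and a flag stating which allele gives higher expression; those inputs, the genotype strings, and all names are hypothetical and are not taken from this data set.

```python
def infer_genotypes(expression, cis_rules):
    """Call 'B' or 'D' at each marker from cis-linked probe set expression.

    `cis_rules` holds one (cutoff, high_allele) pair per probe set: values
    above the cutoff are assigned `high_allele`, all others the other allele.
    """
    calls = []
    for value, (cutoff, high_allele) in zip(expression, cis_rules):
        low_allele = "D" if high_allele == "B" else "B"
        calls.append(high_allele if value > cutoff else low_allele)
    return "".join(calls)

def best_matching_strain(inferred, strain_genotypes):
    """Return (strain, fraction of matching markers) for the closest strain."""
    def match_fraction(genotype):
        return sum(a == b for a, b in zip(inferred, genotype)) / len(inferred)
    best = max(strain_genotypes, key=lambda s: match_fraction(strain_genotypes[s]))
    return best, match_fraction(strain_genotypes[best])

# Hypothetical rules and reference genotypes at six markers.
cis_rules = [(8.0, "B"), (9.0, "D"), (7.5, "B"), (10.0, "D"), (8.5, "B"), (9.5, "D")]
strain_genotypes = {"BXD1": "BBDDBD", "BXD2": "DBDBBD", "BXD5": "BDBDDB"}

# A made-up sample that was labeled BXD1.
sample_label = "BXD1"
sample_expression = [7.1, 8.2, 6.9, 9.1, 9.4, 10.3]

inferred = infer_genotypes(sample_expression, cis_rules)
best, score = best_matching_strain(inferred, strain_genotypes)
if best != sample_label:
    print(f"labeled {sample_label}, but best match is {best} ({score:.0%})")
```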

Probe set level QC: The final normalized array data were evaluated for outliers. XXX arrays were considered outliers. These XXX suspect arrays were eliminated from this data set. The following arrays were eliminated: XXX, YYY, ZZZ.

Data source acknowledgment:

Data were generated with funds to Weikuan Gu, Rob Williams, and Glenn Rosen from the High Q Foundation. Samples and arrays were processed by Dr. Yan Jiao at the array core of the University of Tennessee Health Science Center and VA Medical Center, Memphis.

About this text file:

This text file was originally generated by RWW on July 24, 2007 using a template from a previous M430 Striatum data set. Updated by RWW July 26, 2007; MJ and RWW, Aug 7, 2007.