The High Q Foundation Striatum Exon 1.0 Array Expression Dataset of July 2007

    Summary:

EXPERIMENTAL EXON ST TEST DATA SET (preliminary text, not error checked). The July 2007 data freeze provides estimates of mRNA expression in the striatum (caudate nucleus of the forebrain) of 50 lines of mice, including the C57BL/6J and DBA/2J parental strains, their F1 hybrid (B6D2F1), 30 BXD recombinant inbred strains, and 17 other common inbred strains of mice. Data were generated using the new Affymetrix Mouse Exon 1.0 ST short oligomer microarrays by Weikuan Gu, Yan Jiao, David Kulp, Lu Lu, Glenn D. Rosen, and Robert W. Williams with the support of a grant from the High Q Foundation. This is the first "all exons" array data set that we have entered into GeneNetwork, and the data are still experimental. Approximately 300 brain samples (males and females) from 50 strains were used in this experiment. This data set includes 97 arrays that passed very stringent quality control procedures. Data were processed using the RMA method of Irizarry, Bolstad, Speed, and colleagues. To simplify comparison among transforms, RMA values of each array were adjusted to an average expression of 8 units and a standard deviation of 2 units.

    About the strains and cases used to generate this set of data:

We have used a set of 30 BXD recombinant inbred strains generated by crossing C57BL/6J (B6 or B) with DBA/2J (D2 or D). The BXDs are particularly useful for systems genetics because both parental strains have been sequenced (8x coverage of B6 and 1.5x coverage of D). Physical maps in WebQTL incorporate approximately 1.75 million B vs D SNPs from Celera. BXD2 through BXD32 were bred by Benjamin A. Taylor starting in the late 1970s. BXD33 through BXD42 were bred by Taylor in the 1990s. All of these strains are available from The Jackson Laboratory.

Mouse Diversity Panel (MDP). We have also profiled an MDP consisting of a total of 19 inbred strains (this number includes the C57BL/6J and DBA/2J strains) and one F1 hybrid (B6D2F1 only; not D2B6F1 yet). Strains were selected for several reasons:

  • genetic and phenotypic diversity, including use by the Phenome Project
  • their use in making genetic reference populations including recombinant inbred strains, consomic strains, congenic and recombinant congenic strains
  • their use by the Complex Trait Consortium to make the Collaborative Cross (Nairobi/Wellcome, Oak Ridge/DOE, and Perth/UWA)
  • genome sequence data from three sources (NHGRI, Celera, and Perlegen-NIEHS)
  • availability from The Jackson Laboratory

Seven of the eight parents of the Collaborative Cross (129, A, C57BL/6J, NOD, NZO, PWK, and WSB) have been included. CAST/Ei is the member of the Collaborative Cross that is currently missing from this data set. Thirteen of the MDP strains have been sequenced by Celera, NIH, or by Perlegen for the NIEHS. This panel will be extremely helpful in systems genetic analysis of a wide variety of traits, and will be a powerful adjunct in fine mapping modulators using what is essentially an association analysis of sequence variants.

  1. 129S1/SvImJ
        Collaborative Cross strain sequenced by NIEHS; background for many knockouts; Phenome Project A list
  2. A/J
        Collaborative Cross strain sequenced by Perlegen/NIEHS; parent of the AXB/BXA panel
  3. AKR/J
        Sequenced by NIEHS; Phenome Project B list
  4. BALB/cByJ
        Sequenced by NIEHS; maternal parent of the CXB panel; Phenome Project A list
  5. BTBR T<+> tf/J
         Phenome Project group D strain. Used in mutagenesis studies. This black and tan strain carries the recessive tufted allele and is wildtype at the T locus (brachyury).
  6. BXSB/MpJ
        An isolated recombinant inbred strain generated by crossing C57BL/6J and SB/Le that is used to study autoimmune disease. Males are deficient in pre-B cells.
  7. C3H/HeJ
        Sequenced by Perlegen/NIEHS; paternal parent of the BXH panel; Phenome Project A list
  8. C57BL/6J
        Sequenced by NHGRI; parental strain of AXB/BXA, BXD, and BXH; Phenome Project A list
  9. DBA/2J
        Sequenced by Perlegen/NIEHS and Celera; paternal parent of the BXD panel; Phenome Project A list
  10. FVB/NJ
        Sequenced by Perlegen/NIEHS. Phenome Project group A strain.
  11. KK/HlJ
        Sequenced by Perlegen/NIEHS
  12. MOLF/EiJ
        Sequenced by Perlegen/NIEHS. Phenome Project B strain.
  13. NZB/BlNJ
         Phenome Project B list. Please note that the substrain is B-el-J not B-eye-NJ.
  14. NOD/LtJ
        Collaborative Cross strain sequenced by NIEHS; Phenome Project B list; diabetic
  15. NZO/HlLtJ
        Collaborative Cross strain
  16. NZW/LacJ
        Phenome Project D strain
  17. PWD/PhJ
        Sequenced by Perlegen/NIEHS; parental strain for a consomic set by Forejt and colleagues. Not part of the Phenome Project.
  18. PWK/PhJ
        Collaborative Cross strain; Phenome Project D list
  19. WSB/EiJ
        Collaborative Cross strain sequenced by NIEHS; Phenome Project C list
  20. B6D2F1
        This F1 hybrid was generated by crossing C57BL/6J with DBA/2J at The Jackson Laboratory. These mice are also designated (incorrectly) as B6D2F1/J.

All of these strains are available from The Jackson Laboratory.

    About the tissue used to generate this set of data:

Many of the tissue samples used in this exon array study were also used in our previous M430 analysis of the striatum, providing a partially matched Exon-M430 pair of data sets. However, the previous study included fewer samples (47) and fewer strains (31 total). Animals were obtained from The Jackson Laboratory and housed for several weeks at BIDMC until they reached ~2 months of age (range from 55 to 62 days). Mice were killed by cervical dislocation and brains were removed and placed in RNAlater for 20 to 25 minutes prior to dissection. Cerebella and olfactory bulbs were removed; brains were hemisected, and both striata were dissected using a medial approach by GD Rosen that typically yields 5 to 7 mg of tissue per side.

All striatal dissections were performed by one person (GD Rosen) using a midsagittal approach that minimizes the likelihood of contamination across tissues. This dissection recovers most, but not all, of the neostriatum. We have histologically examined dissected tissue and have found no evidence of inclusion of cortical or thalamic tissue at the margins. We have further confirmed the dissections by comparative assays for acetylcholinesterase (AChE) protein levels using Western blots. The concentration of AChE in the striatum is far higher than that in cortex or cerebellum. A pool of dissected tissue from 3 or 4 adults (approximately 25 to 30 mg of tissue) of the same strain, sex, and age was collected in one session and used to generate cRNA samples.

Roughly 90 to 95% of all cells in the striatum are medium spiny neurons (see Gerfen, 1992, for a review of the structure and function of the neostriatum).

RNA Extraction: RNA was extracted by Rosen and colleagues between June 2, 2004 and March 8, 2006. In brief, we used the RNA STAT-60 protocol (TEL-TEST "B" Bulletin No. 1), steps 5.1A (homogenization of tissue), 5.2 (RNA extraction), 5.3 (RNA precipitation), and 5.4 (RNA wash). In Step 5.4 we stopped after adding 75% ethanol (1 ml per 1 ml RNA STAT-60) and stored the mix at -80 deg C until further use. Before RNA labeling we thawed samples and proceeded with the remainder of Step 5.4: pelleting, drying, and redissolving the pellet in RNase-free water.

RNA samples were then processed by the array core at the VA Medical Center by Drs. Yan Jiao and Weikuan Gu (Director of the DNA Discovery Core of the UTHSC Center of Genomics and Bioinformatics). Labeled cRNA was generated using the standard Affymetrix whole transcript sense target labeling protocol.

Legend: Summary of protocol (from http://www.affymetrix.com/products/reagents/wt_cdna_synthesis_amp_chart.jsp) as carried out by Dr. Yan Jiao.

Replication and Sample Balance: The aim of our standard operating procedure is to obtain data for independent biological sample pools from each sex for all strains. We have succeeded for 44 of 50 strains. Several strains are represented by only a single sex or a single sample pool. This sex imbalance can lead to bias with respect to transcripts that have genuine sex differences. One way to handle this issue is to study the correlation between a proxy variable for this bias, as represented by the Xist probe set 5153684, and a data set of interest.

Legend: Sex balance in this data set is illustrated using the sex-specific Xist gene and one of its probe sets (Affy Exon ST probe set: 5153684). Most strains include one male sample pool with very low Xist expression (6 or 7) and one female sample pool with high Xist expression (10 to 12). As a result, 43 of the 50 strains have both intermediate values and high variance. The B6D2F1 sample has no error bar due to an early data entry error. Strains for which samples are only male or only female are at the extreme left and right sides of this bar chart, respectively.

  • Strains with two male samples: KK/HlJ, BTBR T<+> tf/J
  • Strains with two female samples: BXD5, BXD22
  • Only a single female sample: BXD29
  • The status of BXD23 is not clear and may represent a single male sample or a possible mixed sex pool.
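
The proxy-variable check described under Replication and Sample Balance can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the dictionaries of per-strain mean values are hypothetical, and Pearson correlation is assumed because the text says only "correlation". A strong correlation between a trait of interest and Xist probe set 5153684 across strains suggests that the trait may be affected by sex imbalance.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance; correlation is undefined")
    return sxy / sqrt(sxx * syy)


def sex_bias_correlation(xist_by_strain, trait_by_strain):
    """Correlate a trait with Xist expression over the strains both share.

    Both arguments are hypothetical dicts mapping strain name to a
    strain mean. Returns (correlation, number of shared strains).
    """
    strains = sorted(set(xist_by_strain) & set(trait_by_strain))
    xist = [xist_by_strain[s] for s in strains]
    trait = [trait_by_strain[s] for s in strains]
    return pearson(xist, trait), len(strains)
```

A call such as `sex_bias_correlation(xist_means, trait_means)` returns the correlation together with the number of strains it is based on, which helps when judging how much weight the value deserves.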

Batch Structure: This data set consists of 97 arrays processed in 8 batches. All arrays were processed by a single skilled operator (Dr. Yan Jiao) between October 20 and November 29, 2006 (scan dates from October 26 to November 29). In general, the male and female samples from a single strain were run within a single batch.

    Data Table 1:

Mouse Exon 1.0 ST data: The table below lists arrays by strain, age, sex, case ID, and batch ID. Each array was hybridized to a pool of mRNA from 3 to 4 mice. All mice were between 48 and 71 days of age.
RNA ID   Strain        Age  Sex  Case ID    Batch ID  Source
R3101SA  C57BL/6J      58   F    073106.70  6         GDRosen
R3102SA  C57BL/6J      59   M    073106.01  6         GDRosen
R3105SA  DBA/2J        58   F    073106.65  7         GDRosen
R3106SA  DBA/2J        59   M    073106.02  7         GDRosen
R3031SA  B6D2F1/J      59   F    073106.69  2         GDRosen
R3032SA  B6D2F1/J      59   M    073106.67  2         GDRosen
R3037SA  BXD1          59   F    073106.04  2         GDRosen
R3038SA  BXD1          59   M    073106.38  2         GDRosen
R3055SA  BXD2          61   M    073106.06  3         GDRosen
R3056SA  BXD2          61   F    073106.05  3         GDRosen
R3089SA  BXD5          58   F    073106.42  6         GDRosen
R3090SA  BXD5          58   F    073106.41  6         GDRosen
R3091SA  BXD6          59   F    073106.09  6         GDRosen
R3092SA  BXD6          59   M    073106.08  6         GDRosen
R3093SA  BXD8          61   F    073106.21  6         GDRosen
R3094SA  BXD8          61   M    073106.20  6         GDRosen
R3095SA  BXD9          60   F    073106.15  6         GDRosen
R3096SA  BXD9          60   M    073106.14  6         GDRosen
R3039SA  BXD11         59   F    073106.07  2         GDRosen
R3040SA  BXD11         59   M    073106.24  2         GDRosen
R3041SA  BXD12         62   F    073106.27  2         GDRosen
R3042SA  BXD12         59   M    073106.26  2         GDRosen
R3044SA  BXD13         60   M    073106.32  3         GDRosen
R3043SA  BXD13         60   F    073106.33  8         GDRosen
R3045SA  BXD14         59   F    073106.51  3         GDRosen
R3144SA  BXD14         59   M    073106.52  3         GDRosen
R3047SA  BXD15         60   F    073106.50  3         GDRosen
R3048SA  BXD15         60   M    073106.49  3         GDRosen
R3049SA  BXD16         61   F    073106.22  3         GDRosen
R3050SA  BXD16         61   M    073106.23  3         GDRosen
R3051SA  BXD18         59   F    073106.60  3         GDRosen
R3052SA  BXD18         59   M    073106.59  3         GDRosen
R3053SA  BXD19         60   F    073106.40  3         GDRosen
R3054SA  BXD19         60   M    073106.39  3         GDRosen
R3057SA  BXD20         60   F    073106.75  3         GDRosen
R3058SA  BXD20         60   M    073106.74  3         GDRosen
R3059SA  BXD21         48   F    073106.62  3         GDRosen
R3060SA  BXD21         48   M    073106.61  4         GDRosen
R3061SA  BXD22         58   F    073106.73  4         GDRosen
R3062SA  BXD22         60   M    073106.77  4         GDRosen
R3064SA  BXD23         60   M    073106.57  4         GDRosen
R3063SA  BXD23         60   F    073106.58  8         GDRosen
R3065SA  BXD24         59   F    073106.03  4         GDRosen
R3066SA  BXD24         60   M    073106.12  4         GDRosen
R3067SA  BXD27         60   F    073106.55  4         GDRosen
R3068SA  BXD27         60   M    073106.56  4         GDRosen
R3069SA  BXD28         60   F    073106.47  4         GDRosen
R3070SA  BXD28         60   M    073106.48  4         GDRosen
R3071SA  BXD29         58   F    073106.45  4         GDRosen
R3072SA  BXD29         58   M    073106.46  5         GDRosen
R3074SA  BXD31         60   M    073106.43  8         GDRosen
R3073SA  BXD31         60   F    073106.44  5         GDRosen
R3075SA  BXD32         57   F    073106.37  8         GDRosen
R3076SA  BXD32         57   M    073106.36  5         GDRosen
R3077SA  BXD33         59   F    073106.35  5         GDRosen
R3078SA  BXD33         59   M    073106.34  5         GDRosen
R3079SA  BXD34         60   F    073106.29  5         GDRosen
R3080SA  BXD34         60   M    073106.28  5         GDRosen
R3081SA  BXD36         57   F    073106.31  5         GDRosen
R3082SA  BXD36         57   M    073106.30  5         GDRosen
R3083SA  BXD38         60   F    073106.19  5         GDRosen
R3084SA  BXD38         60   M    073106.18  5         GDRosen
R3085SA  BXD40         60   F    073106.17  5         GDRosen
R3086SA  BXD40         60   M    073106.16  6         GDRosen
R3087SA  BXD42         58   F    073106.11  6         GDRosen
R3088SA  BXD42         58   M    073106.10  6         GDRosen
R3025SA  129S1/SvImJ   60   F    073106.12  1         GDRosen
R3026SA  129S1/SvImJ   59   M    073106.87  1         GDRosen
R3027SA  A/J           59   F    073106.93  1         GDRosen
R3028SA  A/J           59   M    073106.95  1         GDRosen
R3029SA  AKR/J         59   F    073106.89  1         GDRosen
R3030SA  AKR/J         59   M    073106.91  1         GDRosen
R3033SA  BALB/cByJ     59   M    073106.99  2         GDRosen
R3036SA  BTBR/T+tf/J   60   M    073106.10  2         GDRosen
R3034SA  BTBR/T+tf/J   59   F    073106.97  2         GDRosen
R3035SA  BTBRT+tf/J    59   M    073106.11  2         GDRosen
R3097SA  BXSB/MpJ      61   F    073106.15  6         GDRosen
R3098SA  BXSB/MpJ      61   M    073106.14  6         GDRosen
R3099SA  C3H/HeJ       60   F    073106.11  6         GDRosen
R3100SA  C3H/HeJ       60   M    073106.11  6         GDRosen
R3107SA  FVB/NJ        60   F    073106.11  7         GDRosen
R3108SA  FVB/NJ        60   M    073106.11  7         GDRosen
R3109SA  KK/HlJ        61   M    073106.13  7         GDRosen
R3110SA  KK/HlJ        61   M    073106.13  7         GDRosen
R3111SA  MOLF/EiJ      60   F    073106.13  7         GDRosen
R3112SA  MOLF/EiJ      60   M    073106.13  7         GDRosen
R3113SA  NOD/LtJ       58   F    073106.79  7         GDRosen
R3114SA  NOD/LtJ       58   M    073106.81  7         GDRosen
R3115SA  NZB/BinJ      61   F    073106.16  7         GDRosen
R3116SA  NZB/BinJ      58   M    073106.10  7         GDRosen
R3117SA  NZO/HlLtJ     61   F    073106.12  8         GDRosen
R3118SA  NZO/HlLtJ     61   M    073106.12  8         GDRosen
R3119SA  NZW/LacJ      65   F    073106.13  8         GDRosen
R3120SA  NZW/LacJ      70   M    073106.12  8         GDRosen
R3121SA  PWD/PhJ       70   F    073106.14  8         GDRosen
R3122SA  PWD/PhJ       70   M    073106.14  8         GDRosen
R3123SA  PWK/PhJ       59   F    073106.12  8         GDRosen
R3124SA  PWK/PhJ       60   M    073106.13  8         GDRosen
R3125SA  WSB/EiJ       71   F    073106.13  8         GDRosen
R3126SA  WSB/EiJ       71   M    073106.11  8         GDRosen

    About the array platform:

Affymetrix Mouse Exon 1.0 ST array: The Exon 1.0 ST (sense target) array consists of approximately 4.5 million useful 25-nucleotide probes that estimate the expression of approximately 1 million exon clusters. The array sequences were selected in 2006 using Unigene Build XXX.

    About data processing:

Probe (cell) level and probe set data from the CEL file:
  • Step 1: Probes overlapping SNPs were removed from the design file
  • Step 2: The Affymetrix Power Tools (APT) package was used to extract CEL values and perform RMA normalization
  • Step 3: Probe set values were normalized to mean=8 and sd=2 (per array)
  • Step 4: Strain averages were calculated by averaging over all arrays that belong to the same strain (3 maximum in this data set)
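
Steps 3 and 4 above can be sketched as follows. This is a minimal illustration, not the code used to build the data set: the function names and the input layout (a dict of array ID to a list of RMA probe set values, plus a dict of array ID to strain) are hypothetical, and the use of the population standard deviation is an assumption because the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rescale_array(values, target_mean=8.0, target_sd=2.0):
    """Step 3: shift and scale one array to the target mean and SD."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD
    if sd == 0:
        raise ValueError("array has zero variance; cannot rescale")
    return [target_mean + target_sd * (v - m) / sd for v in values]


def strain_averages(arrays, strain_of):
    """Step 4: average rescaled arrays, probe set by probe set, per strain.

    arrays: hypothetical dict of array ID -> list of probe set values
    strain_of: hypothetical dict of array ID -> strain name
    """
    by_strain = defaultdict(list)
    for array_id, values in arrays.items():
        by_strain[strain_of[array_id]].append(rescale_array(values))
    return {
        strain: [mean(column) for column in zip(*rescaled)]
        for strain, rescaled in by_strain.items()
    }
```

Because every array is rescaled before averaging, arrays with different overall intensity contribute equally to the strain mean.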
Probe set data from the CHP file: The expression values were generated by Manjunatha in David Kulp's group at the University of Massachusetts Amherst using RMA. The same simple steps described above were also applied to these values. Every microarray data set therefore has a mean expression of 8 with a standard deviation of 2. A 1 unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
Data quality control: A total of 97 samples passed RNA quality control.

Part 1: Testing if replicates come from the same strain

  1. RMA normalized values were used in this analysis
  2. Pair-wise correlations were calculated between all the arrays using the probesets with high variance and high median
  3. Probability densities of correlations between non-replicate pairs and between replicate pairs were calculated
  4. A threshold of 0.85 was chosen using a maximum likelihood estimate
  5. In total, 5 sets of replicates might not have come from the same strain. (They are marked as 0 in the Manju_Quality Score column.)
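
The replicate test in Part 1 can be sketched as shown here. The sketch takes the 0.85 threshold from the list above as given and does not reproduce the maximum likelihood step that produced it. The function names, the input layout, and the `min_median` and `min_variance` cutoffs are hypothetical, since the text does not state how "high variance and high median" were defined.

```python
from itertools import combinations
from math import sqrt
from statistics import median, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance; correlation is undefined")
    return sxy / sqrt(sxx * syy)


def select_probesets(arrays, min_median, min_variance):
    """Indices of probe sets with high median and high variance across arrays."""
    columns = list(zip(*arrays.values()))
    return [
        i for i, col in enumerate(columns)
        if median(col) >= min_median and pvariance(col) >= min_variance
    ]


def flag_replicate_pairs(arrays, strain_of, threshold=0.85):
    """Return same-strain array pairs whose correlation is below threshold."""
    flagged = []
    for a, b in combinations(sorted(arrays), 2):
        if strain_of[a] != strain_of[b]:
            continue  # only replicate pairs are tested here
        r = pearson(arrays[a], arrays[b])
        if r < threshold:
            flagged.append((a, b, r))
    return flagged
```

In practice the arrays passed to `flag_replicate_pairs` would first be restricted to the indices returned by `select_probesets`.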

Part 2: Testing if strain labeling is correct

  1. RMA normalized values were used in this analysis
  2. Only BXD strains were tested
  3. A set of strongly cis-linked probesets were identified (using linkage to nearest marker)
  4. The expression of these probesets was used to re-estimate the genotype of nearest marker
  5. The values of all re-estimated marker genotypes were compared to genotypes of all the BXD strains and optimal match was identified
  6. In total, four sets of replicates were found to be mislabeled.
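
Steps 4 and 5 of Part 2 can be sketched as below. The text does not say how genotypes were re-estimated from expression, so the nearest-mean rule used here is an assumption, as are the function names and the input layout (per probe set mean expression for strains carrying the B allele and the D allele, and a dict of known marker genotypes per BXD strain).

```python
def call_genotypes(expression, b_means, d_means):
    """Call 'B' or 'D' at each cis-linked probe set for one array.

    Assumption: the allele whose class mean is nearer to the observed
    expression value is taken as the re-estimated genotype.
    """
    return [
        "B" if abs(x - b) <= abs(x - d) else "D"
        for x, b, d in zip(expression, b_means, d_means)
    ]


def best_matching_strain(calls, strain_genotypes):
    """Return (strain, agreement) for the strain that best matches the calls.

    strain_genotypes: hypothetical dict of strain -> list of 'B'/'D'
    genotypes at the markers nearest to each probe set, in the same order.
    """
    def agreement(genotypes):
        return sum(c == g for c, g in zip(calls, genotypes)) / len(calls)

    best = max(strain_genotypes, key=lambda s: agreement(strain_genotypes[s]))
    return best, agreement(strain_genotypes[best])
```

An array is a candidate for mislabeling when the best-matching strain differs from the strain recorded for that array.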

Probe set level QC: The final normalized array data were evaluated for outliers. XXX arrays were considered outliers. These XXX suspect arrays were eliminated from this data set. The following arrays were eliminated: XXX, YYY, ZZZ.

    Data source acknowledgment:

Data were generated with funds to Weikuan Gu, Rob Williams, and Glenn Rosen from the High Q Foundation. Samples and arrays were processed by Dr. Yan Jiao in the Array Core at the University of Tennessee Health Science Center and VA Medical Center, Memphis.

    About this text file:

This text file was originally generated by RWW on July 24, 2007 using a template from a previous M430 Striatum data set. Updated by RWW July 26, 2007; MJ and RWW, Aug 7, 2007.