Information on Human Data Sets

These human data sets are under development and not all features have been implemented in GeneNetwork. Mapping functions have not been implemented but it is possible to study the expression and covariation of transcripts.

Human Liver Cohort (GSE9588 from GEO, entered into GeneNetwork, March 2011):

Please review and cite: Mapping the genetic architecture of gene expression in human liver. Eric E. Schadt, Cliona Molony, Eugene Chudin, Ke Hao, Xia Yang, Pek Y. Lum, Andrew Kasarskis, Bin Zhang, Susanna Wang, Christine Suver, Jun Zhu, Joshua Millstein, Solveig Sieberts, John Lamb, Debraj GuhaThakurta, Jonathan Derry, John D. Storey, Iliana Avila-Campillo, Mark J. Kruger, Jason M. Johnson, Carol A. Rohl, Atila van Nas, Margarete Mehrabian, Thomas A. Drake, Aldons J. Lusis, Ryan C. Smith, F. Peter Guengerich, Stephen C. Strom, Erin Schuetz, Thomas H. Rushmore, Roger Ulrich. PLoS Biol, 2008. 6(5): p. e107. PMID: 18462017

Systematic Genetic and Genomic Analysis of Cytochrome P450 Enzyme Activities in Human Liver. Xia Yang, Bin Zhang, Cliona Molony, Eugene Chudin, Ke Hao, Jun Zhu, Christine Suver, Hua Zhong, F. Peter Guengerich, Stephen C. Strom, Erin Schuetz, Thomas H. Rushmore, Roger G. Ulrich, J. Greg Slatter, Eric E. Schadt, Andrew Kasarskis, Pek Yee Lum. Genome Res. 2010 Aug;20(8):1020-36.

The Human Liver Cohort (HLC) study aimed to characterize the genetic architecture of gene expression in human liver using genotyping, gene expression profiling, and enzyme activity measurements of Cytochrome P450. The HLC was assembled from a total of 780 liver samples screened. These liver samples were acquired from Caucasian individuals from three independent tissue collection centers. DNA samples were genotyped on the Affymetrix 500K SNP and Illumina 650Y SNP genotyping arrays, representing a total of 782,476 unique single nucleotide polymorphisms (SNPs). Only the genotype data from samples collected postmortem are accessible in dbGaP. These 228 samples represent a subset of the 427 samples included in the Human Liver Cohort publication (Schadt, Molony et al. 2008). RNA samples were profiled on a custom Agilent 44,000 feature microarray composed of 39,280 oligonucleotide probes targeting transcripts representing 34,266 known and predicted genes, including high-confidence, noncoding RNA sequences. Each of the liver samples was processed into cytosol and microsomes using a standard differential centrifugation method. The activities of nine P450 enzymes (CYP1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4) were measured in microsomes isolated from 398 HLC liver samples using probe substrate metabolism assays, and are expressed as nmol/min/mg protein. Each enzyme was measured with a single substrate, except for CYP3A4, whose activity was measured using two substrates, midazolam and testosterone.

Summary from GEO: "To uncover the genetic determinants affecting expression in a metabolically active tissue relevant to the study of obesity, diabetes, atherosclerosis, and other common human diseases, we profiled 427 human liver samples on a comprehensive gene expression microarray targeting greater than 40,000 transcripts and genotyped DNA from each of these samples at greater than 1,000,000 SNPs. The relatively large sample size of this study and the large number of SNPs genotyped provided the means to assess the relationship between genetic variants and gene expression and it provided this look for the first time in a non-blood derived, metabolically active tissue. A comprehensive analysis of the liver gene expression traits revealed that thousands of these traits are under the control of well defined genetic loci, with many of the genes having already been implicated in a number of human diseases."

Alzheimer's disease Cases and Controls Liang (July 2009):

Please cite: Liang WS, Reiman EM, Valla J, Dunckley T, Beach TG, Grover A, Niedzielko TL, Schneider LE, Mastroeni D, Caselli R, Kukull W, Morris JC, Hulette CM, Schmechel D, Rogers J, Stephan DA (2008) Alzheimer's disease is associated with reduced expression of energy metabolism genes in posterior cingulate neurons. Proc Natl Acad Sci USA 105:4441-4446.

Summary from GEO: "Information about the genes that are preferentially expressed during the course of Alzheimer's disease (AD) could improve our understanding of the molecular mechanisms involved in the pathogenesis of this common cause of cognitive impairment in older persons, provide new opportunities in the diagnosis, early detection, and tracking of this disorder, and provide novel targets for the discovery of interventions to treat and prevent this disorder. Information about the genes that are preferentially expressed in relationship to normal neurological aging could provide new information about the molecular mechanisms that are involved in normal age-related cognitive decline and a host of age-related neurological disorders, and they could provide novel targets for the discovery of interventions to mitigate some of these deleterious effects."

Alzheimer's disease Cases and Controls Myers (April 2009):

Expression quantitative trait loci (eQTL) study of 363 human brain cortical samples. Genotyping used the Affymetrix 500K chip and expression profiling used the Illumina ref-seq 8 chip. Genotypes are available at dbGaP.

Please cite: Webster JA, Gibbs JR, Clarke J, Ray M, Zhang W, Holmans P, Rohrer K, Zhao A, Marlowe L, Kaleem M, McCorquodale DS 3rd, Cuello C, Leung D, Bryden L, Nath P, Zismann VL, Joshipura K, Huentelman MJ, Hu-Lince D, Coon KD, Craig DW, Pearson JV; NACC-Neuropathology Group, Heward CB, Reiman EM, Stephan D, Hardy J, Myers AJ (2009) Genetic control of human brain transcript expression in Alzheimer disease. Am J Hum Genet 84:445-58.

Summary from GEO: Myers and colleagues generated massive neocortical transcriptome data sets for a set of unrelated elderly neurologically and neuropathologically normal humans and from confirmed late onset Alzheimer's disease patients (LOAD, n = 187 normal and 176 LOAD cases, see DOI:10.1016/j.ajhg.2009.03.011 for detail). They used an Illumina Sentrix Bead array (HumanRef-8) that measures expression of approximately 19,730 curated RefSeq sequences (Human Build 34).

Case identifiers: All case identifiers (IDs) in GeneNetwork begin with a capital C followed by a six digit GEO identifier, followed by the sex and age in years. Non-Alzheimer cases are labeled with the suffix letter N: C225652M85N. Alzheimer cases are labeled with the suffix letter A: C388217F97A.
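The identifier scheme described above can be unpacked programmatically. The following is a minimal sketch, not part of GeneNetwork itself: the function name parse_case_id and the CaseId record are hypothetical, and the pattern assumes that sex is always coded as M or F and that age is written as one to three digits, as in the two examples given.

```python
import re
from typing import NamedTuple


class CaseId(NamedTuple):
    """Fields encoded in a GeneNetwork case identifier (hypothetical record)."""
    geo_id: str      # six-digit GEO identifier, kept as text
    sex: str         # "M" or "F" (assumed coding, based on the examples)
    age: int         # age in years
    alzheimer: bool  # True for suffix A, False for suffix N


# C + six digits + sex + age in years + N (non-Alzheimer) or A (Alzheimer)
_CASE_ID_PATTERN = re.compile(r"C(\d{6})([MF])(\d{1,3})([NA])")


def parse_case_id(case_id: str) -> CaseId:
    """Split an identifier such as C225652M85N into its parts."""
    match = _CASE_ID_PATTERN.fullmatch(case_id)
    if match is None:
        raise ValueError(f"Unrecognized case identifier: {case_id!r}")
    geo_id, sex, age, status = match.groups()
    return CaseId(geo_id=geo_id, sex=sex, age=int(age), alzheimer=(status == "A"))


if __name__ == "__main__":
    for example in ("C225652M85N", "C388217F97A"):
        print(example, parse_case_id(example))
```

Identifiers that do not fit the assumed pattern raise a ValueError rather than being silently guessed at, so unexpected codings are easy to spot.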

Data were initially downloaded from the NCBI GEO archive under the experiment ID GSE15222. All data were generated using the Illumina HumanRef-8 expression BeadChip (GPL2700) v2 Rev0. This data set in GeneNetwork includes data for 24,354 probes. We have realigned the 50-mer sequences by BLAT to the latest version of the human genome (Feb 2009, hg19) and reannotated the array (August 2009). The annotation in GN will differ from that provided in GEO for this platform. We were unable to obtain 50-mer sequences for several thousand probes (e.g., HTT), and these probes have therefore not been realigned to the human genome.

The GEO data set was processed by Myers and colleagues using Illumina's Rank Invariant transform. We performed a series of QC and renormalization steps on the data to allow easier comparison with other data sets in GeneNetwork. In brief, data were log2 transformed. We then recentered each array to a mean expression of 8 units and a standard deviation of 2 units (2z + 8 transform). The values are therefore modified z scores, and each unit represents roughly a two-fold difference in expression. Average expression across all 363 cases ranges from a low of 6 units (e.g., SYT15) to a high of 19 units for ARSK. APOE has an average expression of 15 units and APP has an average expression of 11.5 units. The distribution is far from normal, with a great excess of measurements of genes with low to moderate expression clustered between 6.5 and 8.5 units.
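The 2z + 8 transform can be illustrated for a single array as follows. This is only a sketch of the arithmetic described above, not the code used to prepare the GeneNetwork data: the function name two_z_plus_eight is hypothetical, the use of the sample standard deviation is an assumption (the text does not say which estimator was used), and the QC steps are not represented.

```python
import math
import statistics


def two_z_plus_eight(raw_intensities):
    """Rescale one array: log2, then z score, then 2z + 8.

    Assumes all intensities are positive (math.log2 raises ValueError
    otherwise) and that the array is not constant (nonzero SD).
    """
    logged = [math.log2(value) for value in raw_intensities]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)  # sample SD; an assumption, see lead-in
    # z score each probe, then stretch to SD 2 and shift to mean 8
    return [2 * (value - mean) / sd + 8 for value in logged]


if __name__ == "__main__":
    example_array = [100, 200, 400, 800, 1600]  # made-up intensities
    rescaled = two_z_plus_eight(example_array)
    print([round(value, 2) for value in rescaled])
    print(round(statistics.fmean(rescaled), 6), round(statistics.stdev(rescaled), 6))
```

After the transform every array has the same mean (8) and standard deviation (2), which is what makes values comparable across arrays and across data sets.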

The CANDLE STUDY: Conditions Affecting Neurocognitive Development and Learning (June 2011):

The CANDLE Study is a large multidisciplinary study of early child development that involves genetic, genomic, environmental, and large-scale behavioral evaluation of children and their families from the second trimester of development through to 4 years of age. The full study involves more than 1000 children and their mothers and fathers.

For information on genomic and genetic studies related to CANDLE, please contact: Drs. Ronald M. Adkins (ronald.m.adkins@gmail.com) and Julia Krushkal (jkrushka@uthsc.edu).

For information on the overall design of CANDLE, please contact: Dr. Frances A. Tylavsky (ftylavsk@uthsc.edu).

Summary from The Urban Child Institute: The primary goal of the CANDLE study is to study factors that affect brain development in young children. To this end, the current study will test specific hypotheses regarding factors that may negatively influence cognitive development in children. Participants in this cohort study will include 1,500 mother-child dyads, recruited during the second trimester of pregnancy and followed from birth to age 3. Data on a wide range of possible influences on children's cognitive outcomes is being collected during pregnancy, at delivery, and at 1, 2, 3, and 4 years of age from numerous sources, including questionnaires, interviews, psychosocial assessments, medical chart abstraction, environmental samples from the child's home environment, blood and urine samples from the mother, cord blood, and placental tissue. The primary outcomes of the current study are those associated with cognitive measures. Outcomes are being measured using standardized cognitive assessments conducted at 12 months, 24 months, and 36 months of age. Epidemiological, clinical, and laboratory-based research may be undertaken using data from the project, with sub-studies including, but not limited to, molecular genetics, environmental exposure assessments, and micronutrient deficiency analyses. Results of this cohort study may provide information that will ultimately lead to improvements in the health, development, and well-being of children in Shelby County, Tennessee through interventions and policy enforcement and/or development. Full participant recruitment and complete data collection began in November 2006.

Associated References:

  1. Adkins RM, Thomas F, Tylavsky FA, Krushkal J (2011) Parental ages and levels of DNA methylation in the newborn are correlated. BMC Med Genet. 2011 Mar 31;12:47.
  2. Adkins RM, Krushkal J, Tylavsky FA, Thomas F (2011) Racial differences in gene-specific DNA methylation levels are present at birth. Birth Defects Res A Clin Mol Teratol. 2011 Feb 9. doi: 10.1002/bdra.20770.

CEPH Immortalized B Cells (October 2008):

UTHSC CEPH B-cells Illumina (Sept09) RankInv data were generated by Malak Kotb, Robert W. Williams, and colleagues. Please contact Robert Williams at UTHSC regarding use of these data.

Monks CEPH B-cells Agilent (Dec04) Log10Ratio data were generated by Stephanie Monks (Stephanie Santorico), Eric Schadt, and collaborators.

About this file:

This file was started Aug 6, 2009 by AC. Last updated by RWW, June 7, 2011.