{
"question": [
"Is all genetic regulation done through DNA (e.g., prompters, repressors, activators) or are there other forms of genetic regulation? Please make you answer accessible to a non-expert.",
"What are the different relationship between traits?",
"Can landscape of QTL and GWAS hits be used to find relationships between traits ?"
],
"answer": [
"Genetic regulation is not only done through DNA elements like promoters, repressors, and activators. It also involves other components such as transcription factors, which are proteins that control the rate of transcription of genetic information from DNA to messenger RNA. Additionally, non-coding RNAs, which do not code for proteins, play a significant role in gene regulation. There are also epigenetic factors, which influence gene expression without changing the DNA sequence. These include chemical modifications to the DNA or proteins associated with it. So, genetic regulation is a complex process involving multiple elements and layers of control.",
"The relationships between traits can be described by four basic models: one-to-one, where one gene gives rise to one trait; one-to-many, where one gene affects many traits (pleiotropy); many-to-one, where many genes affect one trait (polygeny); and many-to-many, where multiple genes interact to influence multiple traits. Additionally, traits can also be related through genetic correlation, where the directions of effect are consistently aligned. Furthermore, traits can be interconnected through complex developmental processes and environmental interactions.",
"Yes, the landscape of QTL and GWAS hits can be used to find relationships between traits. This is done by mapping genome regions to variation in a large number of traits, thereby inferring biological relationships between those traits and connecting them into networks. This approach can help identify the genetic basis of variation in complex traits."
],
"contexts": [
[
"At the intermediate level, there are regulatory unitsmade up of multiple components, such as gene-promoter pairs. At the highest level, regulatory units interact to create a particulargene circuit, e.g. , two gene-promoter pairs can be arranged in amutually inhibitory network to create a genetic toggle switch. Ateach of these levels, one can identify sequence representationsthat define certain aspects of regulation and control, as well ascompositional relationships (e.g. , spatial arrangement and orientation) and interactions between biomolecules, molecular components, and/or sub-components that impact functional outputsand behaviors.",
"These regulatory programs are apparent across a variety of jointcontributions, from the independent contribution of each of the regulatory mechanisms to acooperative contribution of several mechanisms. A regulatory program may include a varietyof mechanisms such as transcription factors, chromatin remodeling complexes, and promoterregulatory elements. Natural genetic variations may provide important insights into regulatory programs. Inparticular, transcription profiles can be integrated with genotypic data across a population toidentify genomic loci that have an effect on gene expression (Mackay et al. , 2009), and hence itis possible to use these loci as potential regulatory mechanisms.",
"During the development,genes are turned on and off in a pre-programmed fashion, a process orchestratedby TFs, whose binding sites aggregate in the promoters near their controlled genes. A combinatorial control is achieved via different combinations of ubiquitous andcell-specific regulatory factors. Moreover, genes can initiate transcription at multiple loci (alternative promoters), creating RNA isoforms with different 5 regions. Alternative promoters are potentially important for gene-expression regulation orgenerating different protein products. Complex regulation in vivo can also involvemany more features, such as enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs).Tightly regulated gene expression for specific cell types and developmental stages inresponse to different physiological conditions is driven by the orchestration of complex and multilayered gene regulatory networks (GRNs) (Maniatis and Reed, 2002). Inferring GRNs is of fundamental importance and a great challenge for molecularbiologists and geneticists. Mutations, including point mutations, insertions and deletions, translocations,and duplications, play critical roles in determining biological phenotypes and disease susceptibilities by perturbing the GRNs. Among them, single nucleotide polymorphisms (SNPs) generated by point mutations occur approximately one per 1000bases and are the predominant variations in man.",
"Gene expression directs the process of cellular differentiation, in which19specialized cells are generated for the different tissue types. The regulation of gene expression (i.e. gene regulation) controls the amount and timing of changes to the geneproduct. This is the basic mechanism for modifying cell function and thereby the versatility and adaptability of an organism. Therefore, gene expression and regulation functionas a bridge between genetic makeup and expression of observable traits. Despite its vital importance, determining the precise roles of given transcripts remainsa fundamental challenge.",
"INTRODUCTIONThe field of gene regulation is currently undergoing a renaissance.With the successful annotation of most of the protein-coding portion of the human genome [1], the focus of much research has shifted toward deciphering the regulatory logic governing the temporal, spatial and quantitative aspects of gene expression that is embedded in the remaining 98% of DNA that does not encode for protein [2].A flurry of papers stemming, in large part, from two broad areas of investigation has recently made a significant impact on the field of gene regulation.The first revolves around the genetic basis of human disease.Fueled by the power of linkage and genome-wide association studies, an ever-expanding list of human diseases has been associated with single nucleotide polymorphisms (SNPs) residing in noncoding regions of the genome [3].These disease-associated SNPs are thought to directly control some aspect of target gene expression, or are linked to other DNA variants that possess regulatory activity.In a small but growing number of cases, the regulatory SNPs identified in human genetic studies have led to the identification of disease susceptibility loci and have served as useful entry points for unraveling the complexities of the gene regulatory landscape (Table 1) [3].The second line of investigation that has revitalized gene expression research relates to the development of functional genomic approaches to screen noncoding DNA for regulatory potential.Genome-wide surveys of sequence conservation [4][5][6], histone modifications [7][8], DNAse I hypersensitivity [9] and DNA structure [10], have all significantly improved the detection of functional cis-acting regulatory sequences.This review will highlight recent examples from the literature that have successfully integrated genetic and genomic approaches to uncover the molecular basis by which cis-regulatory mutations alter gene expression and contribute to human disease.",
"Complexity of gene regulationGene regulation is a complex multi-layered process involving numerous proteins and non-coding RNAs which may act at a great distance from their target gene.Elaborate multi-protein/RNA complexes must be assembled at the site of regulation.The regulatory mechanism may be intricate and variable, potentially involving transcript rearrangement and mRNA degradation.It is now clear that RNA has a diverse set of functions and is more than just a messenger between gene and protein.The mammalian genome is extensively transcribed, giving rise to thousands of RNA transcripts that are never translated into proteins.Whether all of these transcripts are functional is currently debatable, but it is evident that these include families of RNA molecules with a regulatory function [34].The presence of a gene expression change, which is strongly correlated with relevant physiological changes, in the absence of proximate significant GWAS signals, suggests that relatively distant regulatory variants (and potentially many such variants) may act in combination to regulate the expression of the target gene of interest.Such putative gene expression-modulating variants could potentially act upon target gene expression through the mediation of non-protein-coding regulatory RNAs.For example, recent studies have shown that the expression of many genes is modulated by small interfering RNAs (siRNAs) and micro-RNAs, e.g.reviews by [10,30], which do not encode proteins.In addition to microRNAs, many nonprotein-coding RNA species (or \"RNA genes\"), such as long noncoding RNAs [42], are transcribed from the genome.Thus, there is compelling evidence that most of the genome may be transcribed [5,6,9,19,38,53,58,59,62] and the potential role of non-protein coding RNA genes in the modulation of protein-coding gene expression remains to be fully evaluated.",
"Transcription factors that bind to DNA recognize this sequence and use it to correctly position RNA polymerase, the enzyme that actually generates the transcript.Other sequences, called enhancers and repressors, speed up and slow down, respectively, the rate of transcription.Enhancer and repressor sequences can be quite distant from the gene's coding region.Other transcription factors recognize these sequences and further control how much and how fast mRNA is generated.All of these sequences are part of a gene and are required to generate the many proteins that control the overall maintenance and general metabolism of all of our cells.Genes that are expressed in all cell types, such as RNA polymerase and transcription factors, are called housekeeping genes.Concepts in the 21st Century: Genetic and Epigenetic Regulation of Gene ExpressionWe now know that only about 1% of our genome encodes proteins.Alternative splicing is the primary mechanism by which our approximately 20,000 genes can code for hundreds of thousands of proteins.Alternative splicing refers to modification of the primary mRNA produced during transcription (Figure 8).Only a portion of the transcript contains sequences that are translated into a protein.Introns, or intervening sequences, are removed after transcription, and the remaining sequences, known as exons, are spliced together.One transcript can be processed in multiple ways, such that different combinations of exons can be spliced together, producing many different proteins from the same primary transcript.The discovery of alternative splicing has changed our thinking about the central dogma because we now know that the concept of one gene encoding one protein is not true.",
"Of the total 20,000-25,000 protein-coding genes, occupying only 1.2% of the human genome, about six percent are functionally classified as TFs [8].However, some 93% of our genome is transcribed, by far the greatest part expressed as non-protein-coding RNAs (ncRNA), including the miR-NAs [9].An order of magnitude more numerous than all the proteins which make up living organisms are the transcrip-*Address correspondence to this author at the School of Medicine, University of Louisville, 580 S. Preston St., Louisville, KY 40202, USA; Tel: 502-852-2554; Fax: 502-852-2555; E-mail: Eugenia.Wang@Louisville.edution start sites (TSSs), located in promoter-proximal element regions, as well as an increasing number of putative promoter-distal elements, identified by the pilot ENCODE project [9].These recent findings, together with the fact that nonprotein-coding genomic sequence elements-such as miR-NAs-predominate and are evolutionarily conserved in our genome, challenge our traditional understanding of the definition of a gene, which has been generally considered a unit of genome sequence that is transcribed to produce a protein product for a given cellular function.Nevertheless, as the ENCODE consortium suggests, a gene may be defined as \"a union of genomic sequences encoding a coherent set of potentially overlapping functional products\" that eventually orchestrate the complex regulation and function of the host organism's cellular activities [10].An even bolder scenario is proposed by John S. 
Mattick, who suggests that the genome may consist largely of massively embedded RNA coding sequences directing regulatory networks, which may have co-evolved with proteins.These two complementary genomic sets may ultimately form the interacting RNAprotein regulatory networks which control the complex layers of signaling communication within all cells [11,12].Thus, the intriguing notion of epigenomic regulation of essential processes such as cell proliferation, differentiation, apoptosis, etc., characterized by feed-forward RNA regulatory networks, is becoming increasingly important in our appreciation of the epigenetic information required for the development of multi-cellular organisms [11].In this report, we focus our discussion on the suggestion that derailment of the RNA-protein interaction, and its subsequent impact on the regulatory networks which they direct, may constitute a significant fraction of the molecular mechanisms controlling the aging process.",
"During the development,genes are turned on and off in a pre-programmed fashion, a process orchestratedby TFs, whose binding sites aggregate in the promoters near their controlled genes. A combinatorial control is achieved via different combinations of ubiquitous andcell-specific regulatory factors. Moreover, genes can initiate transcription at multiple loci (alternative promoters), creating RNA isoforms with different 5 regions. Alternative promoters are potentially important for gene-expression regulation orgenerating different protein products. Complex regulation in vivo can also involvemany more features, such as enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs).Tightly regulated gene expression for specific cell types and developmental stages inresponse to different physiological conditions is driven by the orchestration of complex and multilayered gene regulatory networks (GRNs) (Maniatis and Reed, 2002). Inferring GRNs is of fundamental importance and a great challenge for molecularbiologists and geneticists. Mutations, including point mutations, insertions and deletions, translocations,and duplications, play critical roles in determining biological phenotypes and disease susceptibilities by perturbing the GRNs. Among them, single nucleotide polymorphisms (SNPs) generated by point mutations occur approximately one per 1000bases and are the predominant variations in man.Gene expression regulation can take place at any step during the path of expression, including transcription, mRNA splicing and processing, export and subcellularlocalization, translation and post-translational modifications. These steps are oftencoupled with each other (Maniatis and Reed, 2002). Currently, it is still too early tobuild comprehensive and accurate dynamic models for truly realistic GRNs. 
The majority of computational methods attempt to detect cis-trans relationships, the basicbuilding blocks of GRNs, by modern statistical or machine learning approaches.",
"Other possible regulatory regions includeenhancers and silencer etc. In the coding regions of a gene, Triplets of nucleotides,known as codons, each encode for one of 20 amino acids or a signal. 3The process that a ribonucleic acid (RNA) synthesized from DNA is calledtranscription. One strand of DNA is served as template during transcription. The RNAtranscribed from the template DNA is identical in sequence with the other strand of theDNA which is called coding strand.",
"Third, instructions encoded within the embryos DNA can directly control if, andwhen, a nearby gene becomes activated; this is known as cis-regulation. Finally, similar instructionscan also control genes that are situated elsewhere in the embryos DNA through indirectmechanisms; this is known as trans-regulation. Now, Spies, Smith et al. have investigated these four processes in the offspring of two differentstrains of mice, one originally from Europe and the other from Southeast Asia. The two strains werecrossbred and the resulting embryos were analyzed to see which of the four processes affected geneactivity.",
"During the development,genes are turned on and off in a pre-programmed fashion, a process orchestratedby TFs, whose binding sites aggregate in the promoters near their controlled genes. A combinatorial control is achieved via different combinations of ubiquitous andcell-specific regulatory factors. Moreover, genes can initiate transcription at multiple loci (alternative promoters), creating RNA isoforms with different 5 regions. Alternative promoters are potentially important for gene-expression regulation orgenerating different protein products. Complex regulation in vivo can also involvemany more features, such as enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs).Tightly regulated gene expression for specific cell types and developmental stages inresponse to different physiological conditions is driven by the orchestration of complex and multilayered gene regulatory networks (GRNs) (Maniatis and Reed, 2002). Inferring GRNs is of fundamental importance and a great challenge for molecularbiologists and geneticists. Mutations, including point mutations, insertions and deletions, translocations,and duplications, play critical roles in determining biological phenotypes and disease susceptibilities by perturbing the GRNs. Among them, single nucleotide polymorphisms (SNPs) generated by point mutations occur approximately one per 1000bases and are the predominant variations in man.",
"REGULATION OF GENE EXPRESSIONApart from the protein coding sequences, there are other biologically relevant nucleic acid sequences that play other important roles in the genome such as regulation of gene expression and maintenance of the chromatin structure (Pique-Regis et al., 2011).Regulation of gene expression involves a process that leads to increase or decrease in the production of specific proteins (Jacob and Monod, 1961).It is an important aspect of the cell because it increases the versatility and adaptability of an organism by allowing the cell to produce proteins only when they are needed (Payankaulam, 2010;Jacob and Monod, 1961).Gene expression is regulated at the level of transcription (described in 2.8), which can only occur if transcription factors bind to the DNA.Binding occurs within special nucleotide sequences called regulatory regions that are usually several hundred base pairs long (Lodish et al., 2000).Regulatory regions surround transcription start sites (TSSs) of genes apart from some sequences called enhancers that are located far upstream or downstream of their target gene (Birney et al., 2007;Dineen et al., 2007).",
"During the development,genes are turned on and off in a pre-programmed fashion, a process orchestratedby TFs, whose binding sites aggregate in the promoters near their controlled genes. A combinatorial control is achieved via different combinations of ubiquitous andcell-specific regulatory factors. Moreover, genes can initiate transcription at multiple loci (alternative promoters), creating RNA isoforms with different 5 regions. Alternative promoters are potentially important for gene-expression regulation orgenerating different protein products. Complex regulation in vivo can also involvemany more features, such as enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs).Tightly regulated gene expression for specific cell types and developmental stages inresponse to different physiological conditions is driven by the orchestration of complex and multilayered gene regulatory networks (GRNs) (Maniatis and Reed, 2002). Inferring GRNs is of fundamental importance and a great challenge for molecularbiologists and geneticists. Mutations, including point mutations, insertions and deletions, translocations,and duplications, play critical roles in determining biological phenotypes and disease susceptibilities by perturbing the GRNs. Among them, single nucleotide polymorphisms (SNPs) generated by point mutations occur approximately one per 1000bases and are the predominant variations in man."
],
[
"Examples of economically important traits, their heritabilities, and relative economic values.",
"Genetic correlation is different from pleiotropy.Two traits have a pleiotropic relationship if many variants affect both.Genetic correlation is a stronger condition than pleiotropy: to exhibit genetic correlation, the directions of effect must also be consistently aligned.",
"This means that it is the developmentalbasis of trait integration, not simply the strength of the genetic correlations and observable patterns of covariation among traits, that will affect how components of a scalingrelationship can evolve. Although these powerful phenotype landscape models have generated important insights into the evolution of complex traits such as scaling relationships, they are difficult totest empirically (see Rice 2008).A, Shape variation within a group of organisms isshown by a line fit to a data cloud representing the size of two traits for a group of organisms. Shapevariation within a group of organisms is shown by a line (dark line) fit to a data cloud (gray elipse)representing the size of two traits for a group of organisms, in this case the brain-body size relationshipin humans (data from Koh 2005). B, Scaling relationships are divided into three classes based on thepattern of variation they describe.At the phenotypic level, detailed studies of physiology, morphology, and biochemistry canelucidate whether a higher-level trait has evolved via changes in different subordinate traits. At the genetic level, a first-pass black box approach to determine whether different genesunderlie the response to selection in replicate lines is to cross those lines and examine thetraits of interest in the F1, F2, and/or backcross populations (see also Rhodes and Kaweckithis volume).Particularly relevant for the evolution of scaling relationships,these models have revealed that the developmental basis of genetic correlations (e.g. , thedegree to which a given genetic correlation results from additive or nonadditive epistaticinteractions among traits) can profoundly affect the evolutionary malleability of the correlation, trait covariation, and the evolutionary trajectory of the complex phenotype (Wolfet al. 
2001, 2004; Rice 2002, 2004a, 2008).The phenotype landscapeapproach has been extended to connect with existing quantitative genetic treatments ofmultivariate evolution, yielding an emergent theory exploring how developmental integration, or entanglement, among traits affects the symmetry and rates of trait evolution;the evolution of heritabilities; the impact of genetic correlations on evolutionary trajectoriesacross different time scales; the evolutionary relationships among trait means, variances,THE EVOLUTION OF ANIMAL FORM437and covariances; and the distribution of traits in phenotypic space (Wolf et al. 2001,2004; Rice 2004b, 2008).",
"In contrast, and consistently with our goal of identifying novel relationships among traits, module nos. 3, 4 and 5 suggest previously unknown connections between traits. We next characterized pairs of traits within each group of traits (trait pairs) to show that thequality of these pairs is not lower than in existing methods. We focused on three main properties oftrait pairs: the correlation among traits in a pair; the correlation between a trait pair and thetranscripts; and the knowledge-based relationships among traits.However, in most cases the genes and molecular mechanisms involved are not yet known so it ismore difficult to work out how the traits are connected. Computing techniques make it possible to assess the relationships between hundreds orthousands of traits at the same time. These high volume analyses can also allow scientists to identifyless obvious relationships that might be missed in more traditional types of study. Here, Oren et al. created a new computer algorithm to identify related traits, their shared geneticbasis, and the molecular mechanisms behind them.",
"This means that it is the developmentalbasis of trait integration, not simply the strength of the genetic correlations and observable patterns of covariation among traits, that will affect how components of a scalingrelationship can evolve. Although these powerful phenotype landscape models have generated important insights into the evolution of complex traits such as scaling relationships, they are difficult totest empirically (see Rice 2008).A, Shape variation within a group of organisms isshown by a line fit to a data cloud representing the size of two traits for a group of organisms. Shapevariation within a group of organisms is shown by a line (dark line) fit to a data cloud (gray elipse)representing the size of two traits for a group of organisms, in this case the brain-body size relationshipin humans (data from Koh 2005). B, Scaling relationships are divided into three classes based on thepattern of variation they describe.At the phenotypic level, detailed studies of physiology, morphology, and biochemistry canelucidate whether a higher-level trait has evolved via changes in different subordinate traits. At the genetic level, a first-pass black box approach to determine whether different genesunderlie the response to selection in replicate lines is to cross those lines and examine thetraits of interest in the F1, F2, and/or backcross populations (see also Rhodes and Kaweckithis volume).Particularly relevant for the evolution of scaling relationships,these models have revealed that the developmental basis of genetic correlations (e.g. , thedegree to which a given genetic correlation results from additive or nonadditive epistaticinteractions among traits) can profoundly affect the evolutionary malleability of the correlation, trait covariation, and the evolutionary trajectory of the complex phenotype (Wolfet al. 
2001, 2004; Rice 2002, 2004a, 2008).The phenotype landscapeapproach has been extended to connect with existing quantitative genetic treatments ofmultivariate evolution, yielding an emergent theory exploring how developmental integration, or entanglement, among traits affects the symmetry and rates of trait evolution;the evolution of heritabilities; the impact of genetic correlations on evolutionary trajectoriesacross different time scales; the evolutionary relationships among trait means, variances,THE EVOLUTION OF ANIMAL FORM437and covariances; and the distribution of traits in phenotypic space (Wolf et al. 2001,2004; Rice 2004b, 2008).",
"As outlined by Lewontin (2011), the relationship between genotype and phenotype can be described by four basic models that have been, and still are, used in genetics: one-to-one, one-to-many, many-to-one, and many-to-many (see Fig. 1).The first goes back to the unit factor theory at the beginning of the twentieth century, i.e., one gene gives rise to one trait (Mayr 1982).The second model describes one gene affecting many traits (pleiotropy), while the third model accounts for many genes affecting one trait (polygeny).It is undoubtedly correct that every part of the genome is connected causally with the phenome (a set of phenotypes) by at least some molecular mechanistic pathways, but there is variation in this relation, which can make all of these four models valid at least for some cases.But generally for most eukaryotic organisms, model 4 (many-to-many) is the most acceptable description for most cases of the relationship between phenotype and genotype (Lewontin 2011).And often, the many-to-many model is insufficient, since genes and environment are usually both involved in the development of phenotypes, as captured by the norm-of-reaction concept (see, e.g., Falk 2001).",
"At the otherend are traits, such as growth, which are likely to be affected by many genes that each contributea small portion to the overall phenotype. Between these two extremes are traits that areregulated by more than one genetic locus (and are possibly also influenced by environmentalfactors), which show several intermediate phenotypes. Generally, the more loci that areinvolved in determining a quantitative trait, the more difficult it is to map and identify all ofthe causative QTLs.",
"Genetic Correlations Among Multiple TraitsWhen a sufficient number of traits have been tested in the same inbred strains, the geneticrelationships among the traits can be determined and a genetic framework developed usingmultivariate statistical methods. A growing literature of SI and RI strain surveys exists, withonline resources to search these data and to directly compare previous and new strain surveysMethods Mol Biol. Author manuscript; available in PMC 2011 January 1. Lariviere and MogilPage 4NIH-PA Author Manuscript(e.g. , http://www.jax.org/phenome, http://www.genenetwork.org).",
"However, common practice in genetics treats this relationshipas a straightforward one-to-one mapping from genotype to phenotype. The roots of this practicecan be traced to Mendel who chose traits with a direct relationship between genetic variation andphenotypic variation in formulating his particulate theory of inheritance. It has been furthersolidified by the successes of modern genetics in identifying genes involved in many simpleWtraits, such as rare human diseases. However, most traits are not simple and to understandcomplex traits it is necessary to decipher the developmental processes that occur between genesIEand traits.It was believed by many that for each trait variant we should expect to find acorresponding genetic change, or gene for that trait. Through historical happenstance therelationship between genes and traits was set up and treated as if it were one-to-one. But theproduction of a trait involves not only genes, but also their interactions with each other and theenvironment, and chance.Two approaches to understanding the genotype-phenotype relationship are describedand examples given of how both lead to a many-to-many relationship. First, cellular and geneticmechanisms, such as alternative splicing, DNA and chromatin modification, cellular gene choice,and gene regulation, which lead from DNA sequence to protein structure, are discussed. And,second, examples of variation in the genotype-phenotype relationship which can producevariable phenotypes from the same genetic information and stable phenotypes despite geneticvariation are presented. iiiTo examine how normal variation in complex repeated traits such as the mammaliandentition is produced two experimental approaches are taken."
],
[
"Another striking finding has been the revelation of the existence ofgenome regions to which variation in large number of traits can be mapped [29];such regions have been designated as QTL hotspots. This genetic information wasthen used to try to infer biological relationships between those traits and to connectthem into networks [30] (for example transcriptional networks). In more recentstudies, efforts have been devoted to the integration of phenotypes from differentlevels, jointly studying gene expression, proteome, metabolome and sometimesclassical traits such as diseases [31, 32].",
"First, it is possible to map Mendelian traitsand even quantitative traits with modest LOD scores with good precision, even whenusing a small numbers of strains7577. Second, a good way to transition from QTLs tospecific genes, variants, and mechanisms is often to use complementary resources suchas panels of common inbred strains, Collaborative Cross (CC), or Diversity Outbred (DO)cases, efficient screens of candidate genes using in vitro and in vivo assays 48,76, and evenhuman genome-wide association study (GWAS) data 7882.",
"For example, in comparative genomics, QTLs coming from different species andassociated with a given complex phenotype are aligned based on the syntenybetween these species. The overlapping genetic region is considered very likely tocontain the causal gene for this complex trait. In Chapter 9, we wondered whether it197Chapter 10is possible to apply this approach to the currently available data regarding thegenetic basis of physical activity in mice and humans in order to discover novelcandidate genes for this phenotype.",
"It is now widely appreciated that even when an association can be localized to a singlegene, that gene may not be the cause of the association [Smemo-2014], meaning that proximity tothe peak SNP is not sufficient to identify the causal gene. Therefore, a major goal of our study was tointegrate behavioral QTL and eQTL data. eQTLs can provide the crucial link between a regionimplicated by GWAS and the biological processes that underlie that association. We exploited theeasy access to tissue, which is a critical advantage of model organisms, to map eQTLs.Theseexamples illustrate the utility of combining GWAS with eQTL data to identify the molecularmechanism by which a chromosomal region influences a complex trait. DiscussionWe performed a GWAS in a commercially available outbred mouse population, which identifiednumerous physiological, behavioral, and expression QTLs. In several cases the implicated loci weresmaller than 1 Mb and contained just a handful of genes that included an obvious candidate. Inaddition, we used the eQTL results to further parse among the genes in the intervals that wereimplicated in the behavioral traits.",
"The authors analyzed GWAS data to confirm that annotating SNPs with a scorereflecting the strength of the evidence that the SNP is an eQTL can improve the ability todiscover true associations and may further clarify the nature of the mechanism driving theassociations. This raises the possibility that eQTL data may increase the proportion ofheritability explained by identifiable genetic factors, and be used to gain a betterunderstanding of the biology underlying complex traits.",
"Network analysesWe now have two QTL, and we have picked potentially interesting genes within each, but nowwe want to build up more evidence for which gene in our QTL interval is causal. The first, andmost obvious way, is to see what genes our trait of interest correlates with, in tissues that weexpect to be related to the trait. We calculated the Spearmans correlation between the traitBXD_17850 and all probes with expression data in T helper cells (GN319).",
"The advent of largerpanels and denser marker maps, in conjunction with high quality gene expression data, now means that expression QTLs arestatistically robust enough to be considered starting points forfurther study in their own right. This can be used to great effectin reverse complex trait analysis, a powerful new approach inwhich segregating genetic variation, as evidenced by a strongQTL, is mapped to other potentially interacting genes, and ultimately back to candidate phenotypes.",
"Since our driving application is toidentify the genes that cause variation in complex traits, it is necessary to show the relationship or distance between genes and QTLs. For that, we need an additional relationaltable describing the exact location of QTLs in the unit of megabases. Graph theoretic algorithms provide valuable information that is otherwise hard to discern about the data. However, many such algorithms incur long compute times and arefar from being interactive.",
"Using this tool, a QTL analysis may also shed light onwhether differences in phenotype are due to one or two largeeffect genes or many loci of small effect (Stapley et al. , 2010). A model constructed by Malcom (2011) highlights the importance of considering the genetic architecture when attempting topredict evolutionary trajectories by suggesting that a trait controlled by a small gene network will adapt more rapidly but reacha less than optimal endpoint, whereas a trait controlled by a largegene network will evolve more slowly but more accurately.",
"Network analysesWe now have two QTL, and we have picked potentially interesting genes within each, but nowwe want to build up more evidence for which gene in our QTL interval is causal. The first, andmost obvious way, is to see what genes our trait of interest correlates with, in tissues that weexpect to be related to the trait. We calculated the Spearmans correlation between the traitBXD_17850 and all probes with expression data in T helper cells (GN319).",
"We [16,18], and others [19,20] have indicated that the combined use of gene expression datatogether with QTL (quantitative trait locus) analysis canprovide for a better understanding of the genetics of complex traits.",
"These relationships provide important information forbiologists to understand and search for the genetic basis ofeQTL. An eQTL can span physically a large genomicregion, depending on the mapping experimental design. Due to the limitations of linkage studies it is difficult topin down which gene within an eQTL is the source ofeTrait variation [20]. By relating eTraits and genetic markers to their corresponding genes, our eQTL Viewer organizes each eQTL as a list of pairwise relationships betweenan eTrait gene and the multiple candidate genes in theeQTL region.",
"On the onehand, the genomic location that are in suspicion to be involved in the trait can still involvelarge genomic segments, e.g. , millions of basepairs that include many genes within the segment. On the other hand, GWAS may point toseveral or even many genomic locations for thetrait of interest, complicating further functionalanalysis. Analysis of Quantitative Trait Loci (QTL)QTL analysis reveals statistically signicantlinkage between phenotypes and genotypes,thereby providing explanation for the geneticbasis of variation in complex traits (Falconerand Mackay, 1996; Lynch and Walsh, 1998).",
"It is now widely appreciated that even when an association can be localized to a singlegene, that gene may not be the cause of the association [Smemo-2014], meaning that proximity tothe peak SNP is not sufficient to identify the causal gene. Therefore, a major goal of our study was tointegrate behavioral QTL and eQTL data. eQTLs can provide the crucial link between a regionimplicated by GWAS and the biological processes that underlie that association. We exploited theeasy access to tissue, which is a critical advantage of model organisms, to map eQTLs.Theseexamples illustrate the utility of combining GWAS with eQTL data to identify the molecularmechanism by which a chromosomal region influences a complex trait. DiscussionWe performed a GWAS in a commercially available outbred mouse population, which identifiednumerous physiological, behavioral, and expression QTLs. In several cases the implicated loci weresmaller than 1 Mb and contained just a handful of genes that included an obvious candidate. Inaddition, we used the eQTL results to further parse among the genes in the intervals that wereimplicated in the behavioral traits.",
"The remarkable success in mappinggenes linked to a number of disease traits using genomewide association studies (GWAS) in human cohorts hasrenewed interest in applying this same technique in modelorganisms such as inbred laboratory mice (Su et al. 2010). Unlike classical phenotypic traits, gene expression traitsgiving rise to cis-acting eQTL provide us with a prioriknowledge of the true QTL location (Doss et al. 2005),which can be used to empirically estimate the power of aGWAS performed at a similar scale (Hao et al. 2008;Schadt et al. 2008).",
"Genomic regions linked to complex traits can be identified by genetic mappingand quantitative trait locus (QTL) analysis (Shehzad and Okuno 2014). 7QTL mappingQTL mapping with molecular markers is the first strategy in genetic studies. In plantbreeding, QTL mapping is an essential step required for marker-assisted selection(Mohan et al. 1997; Shehzad and Okuno 2014). The fundamental idea underlying QTLanalysis is to associate genotype and phenotype in a population exhibiting a geneticvariation (Broman and Sen 2009).",
"QTL mapping studies thenseek to detect the polymorphisms underlying the complex traits of interest byscanning for alleles that co-vary withthe traits. Similar experiments also can be conducted with special derivatives of inbredstrains known as recombinant inbred(RI) mice. These animals are derivedby cross-breeding two or more distinctparental strains (which often divergewidely for the trait of interest), followedby inbreeding of the offspring for severalgenerations (Bailey 1971). Given thecorrect breeding strategy, this method1This is an issue faced by GWASs researchers when classifyingsamples as cases or controls.The investigatorsfirst identified all QTLs associated witha classical phenotype and then winnowed the list of potentially associatedgene-expression traits on the basis oftheir correlation or eQTL overlap withthe phenotype of interest. Candidategenes then were ranked by applyingthe LCMS technique, which uses theeQTL data to establish causal relationships between DNA loci and transcripts as well as between transcriptsand phenotypes and finally identifiesa model that best fits the data."
]
],
"task_id": [
"44B088326CD80B4980D810738D88A284",
"BF1705D2C26044038FF1483258548167",
"68AB7A78543D5B36206274837824091B"
]
}
|