{
  "question": [
    "How does one tell the difference between X and Y DNA, with respect to DNA tracing and determining QTLs?",
    "Once a sperm combines with an egg, what determines how traits are passed on to the resulting lifeform?",
    "How can I add a new species to the GeneNetwork database?",
    "Which genes are typically associated with diabetes in QTL analyses?",
    "In which diseases is the gene TCF7L2 involved?"
  ],
  "answer": [
    "The text does not provide specific information on how to differentiate between X and Y DNA in the context of DNA tracing and determining Quantitative Trait Loci (QTLs). However, it discusses the process of identifying QTLs, which involves using mapping data from crosses involving more than two inbred strains and sequence variants in the progenitor strains. The strain distribution pattern in the progenitor strains is tested for consistency with the observed genetic effect of the QTL. This process helps assign a probability that any sequence variant is a Quantitative Trait Nucleotide (QTN). The text does not mention the specific roles of X and Y DNA in this process.",
    "Traits are passed on to the resulting lifeform through the process of meiosis. Each parent contributes 23 unpaired chromosomes, which combine to form the full complement of 23 chromosome pairs in the zygote. These chromosomes carry the genetic material that determines the traits of the offspring. The process of recombination or crossing over during meiosis also contributes to genetic variation in the offspring.",
    "The text does not provide specific instructions on how to add a new species to the GeneNetwork database. It is recommended to contact the administrators or operators of the GeneNetwork database for guidance on this process.",
    "The genes typically associated with diabetes in QTL analyses include TCF7L2, HHEX-IDE, EXT2, FTO, SLC30A8, IGF2BP2, CDKAL1, CDKN2A-CDKN2B, JAZF1, TSPAN8-LGR5, THADA, ADAMTS9, NOTCH2-ADAM30, CDC123-CAMK1D, KCNQ1, PPARG, and KCNJ11.",
    "The background text does not provide information on the involvement of the TCF7L2 gene in any diseases."
  ],
  "contexts": [
    [
      "Genetic mapping in mouse strains enhances the power of detecting modifier genes and identifying complex genetic interactions. Genomewide quantitative trait locus (QTL) analysis, as described in more detail below, represents a promising approach to detect genetic variants that are associated with specific phenotypes and interact with each other. [ACCEPTED MANUSCRIPT] In experimental crosses of two (inbred) strains the first generation (F1) of offspring is genetically heterozygous but equal. Then in the next generation (F2) the strain-specific genetic information is distributed across the genomes of their progeny and each offspring is genetically unique.",
      "This contrast can be exploited to identify subregions that underlie the trans-QTLs [67]. SNPs were counted for all four pairs of parental haplotypes (B vs D, B vs H, B vs C, and L vs S) and SNP profiles for the four crosses were compared (figure 6). Qrr1 is a highly polymorphic [PLoS Genetics | www.plosgenetics.org | November 2008 | Volume 4 | Issue 11 | e1000260; QTL Hotspot on Mouse Distal Chromosome 1] Figure 5. QTL for aminoacyl-tRNA synthetases in distal Qrr1.",
      "The traditional approach to QTL mapping is to use two strains that differ maximally in the phenotype as parental strains for genetic crosses, with the following caveats. QTL analysis based on a single cross will most likely reflect only a small portion of the net genetic variation, and QTL detection will be limited to regions where the two progenitor strains have functional polymorphisms. Data from multiple crosses, or from an HS, will overcome this limitation and can also be used to reduce QTL intervals [5,30].",
      "These candidate genes are then sequenced in the two parental inbred strains looking for sequence differences in coding or regulatory regions. After fine mapping the QTL interval and shortening the list of plausible candidate polymorphisms, the major challenge remains: proving definitively which nucleotide polymorphism underlies the QTL. The most direct proof would be replacing one strain's allele with another strain's allele (creating a FIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). On the right, the parental, F1 hybrid, and intercross (F2) mouse generations are depicted.",
      "Furthermore, splicing QTLs (sQTLs) rather than eQTLs could comprise the molecular mechanism linking DNA variants with YFP53; thus, sQTL analysis could uncover genes that would not normally be detected at the level of differential gene expression (DGE),53 and thus, a differentially [Molecular-Genetic and Statistical Techniques for Behavioral and Neural Research] Figure 8.5 Schematic for immediate, rapid fine mapping in select F2 recombinants of the RCC-F2 cross. Top panel: Genome-wide significant QTL (green trace; red dashed line = significance threshold; blue vertical lines = Bayes credible interval).",
      "Interval-specific haplotype analysis. Approximately 97% of the genetic variation between inbred mouse strains is ancestral [22], so regions of identity by descent (IBD) between two strains used to detect a QTL are highly unlikely to contain the causal genetic polymorphism underlying the QTL [28]. For example, a cross between C57BL/6J and A/J mice detected a blood pressure QTL on Chr 1 [7].",
      "At present, the BXD panel is composed of 80 different strains that all have been fully genotyped.26 Variation in any quantifiable trait can be associated with the segregation of parental alleles, and linkage genetics can map this variation to quantitative trait loci (QTLs), thereby identifying the genomic region(s) affecting that trait. An overview of the QTL mapping approach is depicted in Figure 2. Classical QTL analysis has permitted the identification of loci that are associated with variation in HSC traits.",
      "This comparison gives information about the reliability of the observed genotype information: The more the marker locations differ between the two maps (which signifies variation in marker positions), the higher the possibility of genotyping errors. QTL mapping was done in several stages to identify loci acting individually and QTL that interacted, either additively or epistatically. To determine individually-acting QTL, a single QTL genome scan was conducted with the function scanone. In general, linking genetic variation with trait variation identifies QTL and a significant linkage of phenotype and genotype suggests that the DNA status helps to determine trait expression. As stated above, mouse QTL studies provide distinct advantages over human studies in the examination of genetic causes of a quantitative trait (e.g. alcoholism), even in the absence of specific hypotheses regarding its aetiology or candidate genes. The progenitor mouse strains should have sufficient variation for the traits of interest and they should be genetically diverse enough to enable genetic mapping (BENNETT et al. 2006; FLINT 2003; GRISEL 2000). The sample size required for the identification of QTL depends largely on the effect size that a QTL contributes to phenotypes of interest. Inference about QTL can be made if one or more genetic markers are over- or underrepresented in the analysed individuals. Genotyping is often done by means of microsatellite markers, which contain mono-, di-, tri-, or tetranucleotide tandem repeats flanked by specific sequences (Figure 4a).",
      "Importantly, whereas these studies required substantial labor, time, and resources, X-QTL is a quick and easy approach to achieve a comparable level of genetic dissection. The levels of complexity observed here (e.g. 14 loci explaining 70% of the genetic variance for 4-NQO resistance) are still dramatically lower than those seen for some human traits in GWAS (e.g. 40 loci explaining 5% of the variance for height 2,5). One obvious explanation is the difference in experimental designs (line crosses vs. population association studies), but differences in genetic architectures among species and traits may also contribute.",
      "The method uses two pieces of information: mapping data from crosses that involve more than two inbred strains and sequence variants in the progenitor strains within the interval containing a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that any sequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals except at a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.",
      "which allows the estimation of the degree of dominance of detected QTLs. but point out that various designs or linkage analysis methods have advantages and disadvantages and need to be chosen for the question at hand. In general, there are two common statistical approaches for detecting and/or locating QTL. Both approaches involve moving along the chromosome and considering data for one or several markers at a time and relating these to the traits of interest.",
      "Genotyping all the individual progeny for markers that show allelic variation between the parental strains (either single nucleotide polymorphisms or simple sequence repeats) will allow the detection of associations between trait values and marker genotype, and in this way demonstrate to which set of markers a QTL is linked. To reduce the genotyping effort, selective genotyping of the individuals at the extremes of the phenotypic spectrum can be performed (20,23). Although these three approaches are in general considered to be the best to detect and map QTL, they have several disadvantages for quantitative traits involving HSC.",
      "So, how do you go about planning and performing a QTL study, and how do you identify the responsible gene within a QTL that you have identified? Generally, one starts by performing a strain survey to find two parental inbred strains that have a markedly different trait. One can now look up many different traits of inbred mice online at the Mouse Phenome Database (http://phenome.jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). However, the trait you may want to study may not be present in wild type mice, so you may want to cross a mutant (or genetically engineered) strain onto several inbred strains. By using a segregating population it is possible to determine whether the observed difference in mRNA abundance between two parental lines is caused by a sequence difference within the gene itself or close to it (cis-eQTL) or to a factor from a different location (trans or distal eQTL). If it is possible to map both the gene itself (a binary trait) and the variation in the abundance of mRNA expressed from that gene (a quantitative trait) onto the chromosomes (as above) then cis- and trans-regulation can be easily differentiated.",
      "QTL mapping studies then seek to detect the polymorphisms underlying the complex traits of interest by scanning for alleles that co-vary with the traits. Similar experiments also can be conducted with special derivatives of inbred strains known as recombinant inbred (RI) mice. These animals are derived by cross-breeding two or more distinct parental strains (which often diverge widely for the trait of interest), followed by inbreeding of the offspring for several generations (Bailey 1971). Given the correct breeding strategy, this method [Footnote 1: This is an issue faced by GWASs researchers when classifying samples as cases or controls.]"
    ],
    [
      "Complex trait variation in natural and experimental populations is due to specific DNA sequence polymorphisms, environmental effects, and the interactions between these factors (Johannes et al. 2009). Testis weight is a complex trait that holds direct implications for reproductive success, as developmental abnormalities can lead to irregular sperm production and infertility in adulthood (Sharpe 2001). Variation in testis size has been linked to environmental factors such as social dominance, social organization, and seasonal changes across numerous species.",
      "Selection could occur at multiple levels, from germ cell generation and propagation to fertilization and early embryonic growth. Chromosomal abnormalities, including aneuploidy, were found in 10-20% of spermatozoa and oocytes (20) and in the cleaved embryo, with a 21% rate of abnormalities in preimplantation embryos (21). These findings led to a model for natural selection against chromosome abnormalities (21). Selection extends to the end of gestation: Only approximately 30% of all conceptions result in a live birth, with more than half of aborted fetuses containing chromosomal abnormalities (22), a number likely to be an underestimate because of technological limitations in measuring all possible mutations. But even in the very small fraction of germ cell duos that survive this withering genome attack and result in a live birth, a number of severe de novo mutations will still be found (23). The data on gross chromosomal alterations suggest that overall, mutation frequency early in life is very high. The functional consequence, however, is limited because of selection. Somewhat surprisingly, this picture points toward an initial decline in genomic alterations, allowing the adult individual to acquire a somatic genome optimally equipped to provide function.",
      "We now need to investigate the candidates identified here and how their effects on parental and offspring traits are integrated into the gene networks determining individual development. By controlling for genetic variation in either mothers or offspring we have been able to show that levels of maternal provisioning and offspring solicitation are unique to specific genotypes (here each BXD line) and that solicitation is costly.",
      "Sex brings harmful alleles together into the same genetic background, allowing selection to more efficiently purge them from the population and potentially producing some offspring that are fitter than either parent. However, the benefit of recombining deleterious mutations may depend on the nature of the epistatic interactions between them. The mutational deterministic hypothesis (Kondrashov 1988) depends partly on this epistasis. In most plants and animals, sex is a necessary component of reproduction, and the question for evolutionary biologists is why reproductive mechanisms have evolved that way. In one of the experiments described next, evolutionary geneticists have nevertheless devised a way to compare evolution with and without recombination in the obligately sexual fruit fly. This disparity in investment is the basis for the twofold cost: asexual females hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. But sex usually also involves the basic process of physical recombination: the breakage and reunion of two different DNA or RNA molecules. Of these two processes, recombination is clearly the more widespread feature of sexual reproduction. A variety of reproductive systems, such as selfing and automixis, involve recombination but not outcrossing. In contrast, relatively few reproductive systems have outcrossing without recombination. Long ago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci were jointly responsible for the production of phenotypes, sex has the potential to break apart coadapted gene complexes, as it moves alleles away from genetic backgrounds where beneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?",
      "The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population. In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population. For example, the HbS mutation in the HBB gene (which produces the β subunit of hemoglobin) causes sickle cell disease when present in both alleles, a detrimental consequence, but protects against malaria when present in 1 allele, a beneficial consequence, ensuring that the mutation persists in populations in areas of the world where malaria is endemic. Genes are passed from parents to offspring via the process of meiosis by which gametes, the egg cells in the mother and the sperm cells in the father, are generated. Ordinarily, each cell has 23 pairs of chromosomes; the gametes have 23 unpaired chromosomes. In meiosis, the 23 pairs are split so that each gamete receives 1 chromosome from each pair (Figures 8 and 9). Two gametes (egg and sperm) ultimately join into a single cell, the zygote, which has the full complement of 23 chromosome pairs restored. If all goes well, the zygote gives rise to a live offspring. Crossing over: The swapping of genetic material that occurs in the germline. During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes cross over one another. Crossing over results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring. This process is also known as meiotic recombination. Recombination (meiotic recombination): The swapping of genetic material that occurs in the germline. During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes recombine with one another. Recombination results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring. Also known as crossing over.",
      "However, we expect that only at this level, the most significant contributions brought by integrating epigenetics will be made. Concluding Remarks and Future Prospects. Fish sex ratios are the result of a complex interaction between genetic, biochemical, and environmental interactions. The ultimate result of these interactions at the individual level is gender: male or female. However, at the population level, the combination of sex determination and differentiation sets the sex ratio. In turn, sex ratios define the reproductive capacity of populations and, if sex growth dimorphism exists, also the growth characteristics, something very important in an aquaculture context. Traditionally, it has been agreed that the final sex of an individual (phenotypic sex) depends on two sequential processes: the sex determination system of the species and the gonad differentiation process (Valenzuela, 2008). However, recently, these two seemingly distinct processes are viewed as part of a general process leading to gonad formation and sex ratios (Sarre et al., 2004; Quinn et al., 2011; Uller and Helantera, 2011).",
      "Obehav is, in turn, influenced by offspring genes and environment (Ogene and Oenvir respectively). Hence, indirect genetic effects (blue arrows) and direct genetic effects (red arrow) are important influencers of behaviour. B) Parent-offspring conflict theory predicts that parental resource investment and offspring solicitation behaviours are influenced by the fitness benefit to a focal individual (O), cost to a social partner such as a sibling (S1 and S2) or parent (P), and by their coefficient of relatedness (black arrows). Figure 2: Genomic imprinting can result in divergent phenotypes from the same genotype. A) A paternally imprinted gene, i.e. maternally expressed.",
      "Therefore, the resulting phenotypic patterns lag a generation behind the genetic transmission of the causal variants. The most well-studied parental genetic effects are caused by deposition of maternal transcripts into the egg prior to fertilization, resulting in differences in early embryonic development depending on the genotype of the mother. Certain genes have also been shown to respond to maternal influence after birth through genetically defined maternal behaviors (Weaver et al., 2004). Because of the small contribution, through the sperm, of the paternal transcriptome to the fertilized zygote, and because of the stronger maternal contribution to child rearing in most model organisms, parental effects are typically thought of as synonymous with maternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading to differences in the birth weight of mice depending on the genotype of the mother (Cowley et al., 1989; Wolf et al., 2011).",
      "It was believed by many that for each trait variant we should expect to find a corresponding genetic change, or gene for that trait. Through historical happenstance the relationship between genes and traits was set up and treated as if it were one-to-one. But the production of a trait involves not only genes, but also their interactions with each other and the environment, and chance."
    ],
    [
      "The hierarchical organization of GN's main Select and Search menu is simple and makes it relatively easy to find relevant data sets (Fig. 1). To get data, after opening the browser, select the most appropriate Species from the dropdown menu. For an open-ended search of phenotypes you can also select All Species at the bottom of the menu. The next steps are to select the Group, Type, and Data Set from the drop-down menus. For many groups, a combination of phenotypes, genotypes, and molecular data are available.",
      "GeneNetwork contains data from a wide range of species, from humans to soybeans, but most of the available phenotypic data is from mice. Within the mouse dataset there are groups of families, crosses, non-genetic groupings, and individual data. The type of dataset must be selected after defining the species and sample population. While genotypes, mRNA, methylated DNA, protein, metagenomic, and metabolome datasets are available (i.e. [bioRxiv preprint doi: https://doi.org/10.1101/2020.12.23.424047; this version posted December 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.]",
      "Search and Data Retrieval. Point your browser to www.genenetwork.org. This brings you by default to the Search page, from which you can retrieve data from many GN data sets. We will focus on the default data set, defined by Species: Mouse, Group: BXD, Type: Whole Brain, Database: INIA Brain mRNA M430 (Apr05) PDNN. Enter Kcnj* into the ALL or ANY field and click the Search button. Note the location and annotation of available potassium channel genes in the Search Results page that opens. Use the browser Back button to return to previous page.",
      "Add information on data provenance by giving details in Investigation, Protocols and ProtocolApplications. Customize: Customize my XGAP database with extended variants of Trait and Subject. In the online XGAP demonstrator, Probe traits have a sequence and genome location and Strain subjects have parent strains and (in)breeding method. Describe extensions using MOLGENIS language and the generator automatically changes XGAP database software to your research. Upload: Upload data from measurement devices, public databases, collaborating XGAP databases, or a public XGAP repository with community data. However, a suitable and customizable integration of these elements to support high throughput genotype-to-phenotype experiments is still needed [34]: dbGaP, GeneNetwork and the model organism databases are designed as international repositories and not to serve as general data infrastructure for individual projects; many of the existing bespoke data models are too complicated and specialized, hard to integrate between profiling technologies, or lack software support to easily connect to new analysis tools; and customization of the existing infrastructures dbGaP, GeneNetwork or other international repositories [35,36] or assembly of Bioconductor and generic model organism database components to suit particular experimental designs, organisms and biotechnologies still requires many minor and sometimes major manual changes in the software code that go beyond what individual lab bioinformaticians can or should do, and result in duplicated efforts between labs if attempted.",
      ", 2014; see Section 9). GeneNetwork is a database that enables searching for 4000 phenotypes from multiple studies in the BXD, HXB, and in other recombinant inbred rodent families, as well as in other model organisms and even humans (Mulligan et al., 2017). GeneNetwork employed a somewhat different strategy than MPD in that it did not rely solely on researchers submitting their data. Instead the database operators extracted the data from the scientific literature and integrated them into a uniform format (Chesler et al., 2003).",
      "However, a suitable and customizable integration of these elements to support high throughput genotype-to-phenotype experiments is still needed [340]: dbGaP, GeneNetwork and the model organism databases are designed as international repositories and not to serve as general data infrastructure for individual projects; many of the existing bespoke data models are too complicated and specialized, hard to integrate between profiling technologies, or lack software support to easily connect to new analysis tools; and customization of the existing infrastructures dbGaP, GeneNetwork or other international repositories [384, 154] or assembly of Bioconductor and generic model organism database components to suit particular experimental designs, organisms and biotechnologies still requires many minor and sometimes major manual changes",
      "All data presented in this paper were deposited in the online database GeneNetwork (www.genenetwork.org), an open web resource that contains genotypic, gene expression, and phenotypic data from several genetic reference populations of multiple species (e.g. mouse, rat and human) and various cell types and tissues.35;36 It provides a valuable tool to integrate gene networks and phenotypic traits, and also allows cross-cell type and cross-species comparative gene expression and eQTL analyses.",
      "There is a good chance that you will be able to apply these new techniques to specific problems, even while you read. If you have a computer with an Internet connection, so much the better, and you can read and work along at the same time. This short review and primer will take you on a tour of a web site called GeneNetwork that embeds many large data sets that are relevant to studies of behavioral variation. GeneNetwork is an unusual site because it contains a coherent \"universe\" of data, as well as many powerful analytic tools.",
      "The GeneNetwork database provides open access to BXD and other RI strain derived microarray data, single nucleotide polymorphism (SNP) data, and phenotypic data for quantitative trait loci analysis and gene expression correlation analyses. Gene expression data were exported for manually selected probes in the PDNN hippocampus database (Hippocampus Consortium M430v2), and the PDNN whole brain database (INIA Brain mRNA M430). The Hippocampus database was chosen as one of the most elaborate brain databases, as well as most highly recommended dataset on GeneNetwork itself (http://www.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112).",
      "2016) and can also be accessed in GeneNetwork by entering Record ID 18494 in the Get Any space on the Search page and clicking on the Search button. Alternatively, enter data by hand into the designated boxes provided by GeneNetwork. These latter options also allow for the inclusion of trait variance. It is a good idea to name the trait in the box provided. Then click Next, and manually enter the data for each RI strain, F1, and founder strain. [Author Manuscript] 3. After entering the data, click on the blue plus sign button called Add. To submit multiple phenotypes at the same time, select the option for Batch Submission under the Home tab. This allows users to submit up to 100 traits for analysis by GeneNetwork. Here, select BXD as the cross or RI set to analyze from the first pull-down menu. The phenotype file should follow the format described in the Sample text (http://genenetwork.org/sample.txt). After uploading the appropriate file using the Browse button, enter a name for the file in the Dataset space. The data will be stored in the GeneNetwork server for 24 hours. Click Next. Materials. Here we will provide detailed instructions for using GeneNetwork along with some worked examples taken from the recent study of intravenous cocaine self-administration by Dickson et al. (2016) in BXD RI mice. A complete overview of GeneNetwork is beyond the scope of this protocol, but is extensively covered elsewhere (see Mulligan et al. 2016; Williams & Mulligan 2012 for excellent reviews on GeneNetwork). A computer with an internet connection and current web browser. See the GeneNetwork.org site for information on supported browser versions. Method. Entering Data. 1. Link to http://www.genenetwork.org.",
      "Species in GenAge model organisms",
      "Data are reviewed before entry inGeneNetwork by the senior author. Phenotypes are currently split into 15 broadphenotypic categories (Supplementary Data 1). Phenome curation and descriptionwas initiated by R.W.W. and Dr Elissa Chesler in 2002 by literature review and dataextraction. The early work is described briey in Chesler et al.51,52. Most work overthe past 5 years has been performed by two of the coauthors (R.W.W. andM.K.M.). We have used a controlled vocabulary and set of rules described here(http://www.genenetwork.org/faq.html#Q-22).",
      "9) To bring your data to GeneWeaver,click on the GeneWeaver icon, making sure to be previouslylogin to your GeneWeaver account. You will be brought to theGeneSet upload page with the Genes Uploaded and theGeneweaver Analysis Platform139Fig. 5 Default settings at GeneNetwork.org are set to search Mouse, Phenotypes, from among the BXDPublished Phenotypes data set. Here the term nociception was searched forFig. 6 The search results page in GeneNetwork showing the 33 records retrieved from the phenotype searchfor nociception.Users may also share their data with other users selectively,make it public, or keep it restricted to a private account. Data can beimported by users, uploading their gene set data directly or exporting to GeneWeaver from within another online resource such asNeuro Informatics Framework (NIF) [8], Grappa [9], MousePhenome Database (MPD) [10] or GeneNetwork [11]. These datasets can then be added to your collection to be analyzed togetherwith other gene sets retrieved from the GeneWeaver database. To begin a GeneWeaver analysis a user must collect GeneSetstogether in a Project.Alternatively the spreadsheet can be saved as a .txt fileand uploaded by clicking on Switch to file upload. Oncecomplete click on upload GeneSet. 7. Once completed you are taken to the GeneSet detail page. Ifthere are errors in your uploaded data you can correct them byclicking on Edit. 8. Use the Add Selected to Project, and create a new project, e.g. Chronic Cocaine. 9. Now using the Search function populate this project with additional gene sets related to this study trying Queries such asCocaine Addiction, Chronic Cocaine."
    ],
    [
      "Figure 3 | Association statistics from one of the five type 2 diabetes genome-wide association studies 20 .The y axis represents the -log10 p value and the x axis represents each of the ~400,000 SNPs used in this scan.The point of each arrow indicates the location of the most strongly associated SNP in each of nine known type 2 diabetes gene regions.Two signals, in SLC30A8 and TCF2, were not captured on the Affymetrix chip.The plot was generated using Haploview.CDKAL1, CDK5 regulatorysubunit-associated protein 1-like 1; CDKN2, cyclin-dependent kinase inhibitor 2A; FTO, fat mass and obesity-associated; HHEX, haematopoietically expressed homeobox; IDE, insulin-degrading enzyme; IGF2BP2, insulin-like growth factor 2 mRNA-binding protein 2; KCNJ11, potassium inwardly-rectifying channel, subfamily J, member 11; PPARG, peroxisome proliferator-activated receptor- gene; SLC30A8, solute carrier family 30 (zinc transporter), member 8; TCF2, transcription factor 2, hepatic; TCF7L2, transcription factor 7-like 2 (T-cell specific, HMg-box).",
      ", for the Diabetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium 9Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) [1][2][3][4][5][6][7][8][9][10][11] .Established associations to common and rare variants explain only a small proportion of the heritability of T2D.As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and B2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975.We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P  5.0  10 -14 ), CDC123-CAMK1D (P  1.2  10 -10 ), TSPAN8-LGR5 (P  1.1  10 -9 ), THADA (P  1.1  10 -9 ), ADAMTS9 (P  1.2  10 -8 ) and NOTCH2 (P  4.1  10 -8 ) gene regions.Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D. 
[3][4][5]7,10 , for the Diabetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium 9Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) [1][2][3][4][5][6][7][8][9][10][11] .Established associations to common and rare variants explain only a small proportion of the heritability of T2D.As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and B2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975.We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P  5.0  10 -14 ), CDC123-CAMK1D (P  1.2  10 -10 ), TSPAN8-LGR5 (P  1.1  10 -9 ), THADA (P  1.1  10 -9 ), ADAMTS9 (P  1.2  10 -8 ) and NOTCH2 (P  4.1  10 -8 ) gene regions.Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D.",
      "Although these are considered to be loci convincingly associated with susceptibility to type 2 diabetes in populations of European descent, other genes related to susceptibility to the disease are probably still unidentified, particularly those for populations of other ancestries.In order to uncover genetic variants that increase the risk of type 2 diabetes, we conducted a genome-wide association study in Japanese individuals with type 2 diabetes and unrelated controls.We first genotyped 268,068 SNPs, which covered approximately 56% of common SNPs in the Japanese, in 194 individuals with type 2 diabetes and diabetic retinopathy (case 1) and in 1,558 controls (control 1) collected in the BioBank Japan.We compared the allele frequencies of 207,097 successfully genotyped SNPs and selected the 8,323 SNPs showing the lowest P values.We then attempted to genotype these 8,323 SNPs in 1,367 individuals with type 2 diabetes and diabetic retinopathy (case 2) and for 1,266 controls (control 2) (stage 2), and successfully obtained data for 6,731 SNPs (the P value distribution in the second test is shown in Supplementary Fig. 1a online).The results of principal component analysis 8 in the stage 1 and 2 samples and HapMap samples revealed that there was no evidence for population stratification between the case and control groups throughout the present tests (Supplementary Fig. 
1b,c).We selected the 9 SNP loci showing P values o0.0001 (additive model in stage 2, Table 1) and genotyped a third set of cases and controls comprising 3,557 Japanese individuals with type 2 diabetes (cases 3,4,5) and 1,352 controls (controls 3,4).We evaluated the differences in the population structure among these three sets of case and two sets of control groups by Wright's F test.As the results indicated that there was no difference in the population structure among these groups (Supplementary Table 1b online), we combined these populations for the third test of case-control study.The third set of analysis identified the significant associations for six SNPs (Table 1), including the CDKAL1 locus at 6p22.3 (rs4712524, rs9295475 and rs9460546), the IGF2BP2 locus at 3q27.2 (rs6769511 and rs4376068) and the KCNQ1 locus at 11p15.5 (rs2283228).The remaining three SNPs (rs13259803, rs612774 and rs10836097) had P values of 40.05 in the third test and were not further examined.CDKAL1 and IGF2BP2 were previously reported as susceptibility genes for type 2 diabetes in the Japanese population 9 .Therefore, we focused on the KCNQ1 locus, which was highly associated with type 2 diabetes.",
      "Genome-wide association studies (GWAS) have recently revealed many novel SNPs associated with type 2 diabetes.These include SNPs located in the regions near TCF7L2, HHEX-IDE, EXT2, FTO, SLC30A8, IGF2BP2, CDKAL1, and CDKN2A-CDKN2B [8][9][10][11][12][13].A second phase of studies identified many additional variants, including those near JAZF1, TSPAN8-LGR5, THADA, ADAMTS9, NOTCH2-ADAM30, CDC123-CAMK1D, and KCNQ1 [14,15].The two genes in which common variants were previously convincingly associated with type 2 diabetes, PPARG and KCNJ11, were also identified in these GWAS [12,16,17].More recently, numerous other SNPs have been identified in additional GWAS and meta-analyses [18].",
      "As noted by Below et al. and others [34,44], there is a significant enrichment of eQTLs among top type 2 diabetesassociated loci.Genetic heritability estimates for type 2 diabetes are markedly higher than can be explained by the variation identified to date; to characterize this Bmissing^heritability, Torres et al. composed multiple SNP subsets by partitioning interrogated maker sets into groups by status as eQTL in several insulin-responsive peripheral tissues [45].They discovered that these subsets explain a greater portion of type 2 diabetes risk than expected by chance, suggesting a significant role of regulatory variation in diabetes susceptibility.Several reasons have been suggested as to why so much of the genetic heritability of type 2 diabetes remains unmapped to risk loci [46].Conclusive identification of less common (0.5-5 % MAF) variation of modest effect will require investments in extremely large sample sizes.The heterogeneous nature of Hispanic populations increases the challenge because to detect variation or effects specific to groups or environments may require sample sizes beyond what exist to be collected.There is evidence that parent of origin may influence effects of variants on type 2 diabetes risk [47].Studies in mouse models also demonstrate that some genetic effects on type 2 diabetes and related traits are modified by sex, diet, and epigenetic effects, indicating that careful environmental modelling and stratification will be necessary to identify some loci subject to interaction effects [48].Genetic characterizations of larger Hispanic samples are underway, but especially in the case of extremely rare or private variation, a return to family-based study designs will improve power through enrichment of allelic observations and increased environmental and genetic homogeneity [49].] 
performed a GWA study for lipid traits in a cohort of 3642 Hispanic participants from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) and reported genome-wide significant signals within or near the genes GCKR, LPL, and APOA/APOC for TG and CETP and APOA/APOC for HDL-C.These authors also showed that there is a substantial overlap in the genes associated with lipid traits in different population groups.When testing the markers showing genome-wide significance or suggestive evidence of association (p  10 5 ) in European GWA studies in the Hispanic and African American WHI cohorts, a strong enrichment of small p values was observed in both cohorts.Additionally, there was a significant correlation of the allelic effects of markers with p  10 5 identified in Europeans in the Hispanic and African American cohorts.The genomic regions showing association in Europeans accounted for a disproportionate amount of variance in both cohorts.",
      "Attempts to apply similar approaches to families in which either common forms of diabetes or obesity is segregating have proved to be largely unrewarding, 5 and the second wave of discovery involved a switch to tests of association.Although intrinsically more powerful than linkage analysis, association analysis suffers from the disadvantage that the signal can be detected only if one examines the causal variant itself or a nearby marker with which it is tightly correlated.Until the advent of methods that enabled genomewide surveys of association, researchers were therefore obliged to direct their attention to specific candidate variants or genes of interest. 6In retrospect, it is obvious that most such studies were seriously underpowered or focused on inappropriate candidates. 6Nevertheless, by accruing data over the course of multiple studies, some genuine susceptibility variants were identified.Common coding variants in PPARG and KCNJ11 (each of which encodes a protein that acts as a target for classes of therapeutic agents widely used in diabetes management) were shown to have modest effects on the risk of type 2 diabetes. 7,8Resequencing of the gene encoding the melanocortin-4 receptor (MC4R) resulted in the identification of low-frequency coding variants that explain approximately 2 to 3% of cases of severe obesity. 9he third, and most successful, wave of discovery has been driven by systematic, large-scale surveys of association between common DNA sequence variants and disease.The first demonstration that unbiased discovery efforts could reveal new insights into the pathogenesis of type 2 diabetes resulted from identification of the association between type 2 diabetes and variants within TCF7L2 (encoding transcription factor 7-like 2, a protein not previously identified as a biologic candidate). 10TCF7L2 has now been shown to modulate pancreatic islet function. 
113][14][15][16][17][18] Together, these studies revealed six new associations, including variants near CDKAL1, CDKN2A, and CDKN2B (which encode putative or known regulators of cyclin-dependent kinases) and HHEX (which is transcribed into a homeobox protein implicated in beta-cell development).Typically each copy of a susceptibility allele at one of these loci is associated with a 15 to 20% increase in the risk of diabetes.Since then, the dominant approach to discovery has involved ever-larger aggregations of genomewide association data from multiple samples so as to improve the power to identify variants of modest effect: these studies have revealed more than 20 additional confirmed signals of susceptibility to type 2 diabetes [19][20][21][22] (Table 1 and Fig. 1).Though early studies were restricted to samples obtained from persons of European descent, genomewide association analyses conducted in other ethnic groups are now emerging. 23,24,29he current total of approximately 40 confirmed type 2 diabetes loci includes variants in or near WFS1 (wolframin) and the hepatocyte nuclear factors HNF1A and HNF1B (genes that also harbor rare mutations responsible for monogenic forms of diabetes) [30][31][32][33] ; the melatonin-receptor gene MTNR1B (which highlights the link between circadian and metabolic regulation) [26][27][28] ; and IRS1 (encoding insulin-receptor substrate 1), one of a limited number of type 2 diabetes loci with a primary effect on insulin action rather than on secretion. 25enomewide association studies of genetic variants influencing body-mass index (BMI) and obesity have been similarly productive, with three main strategies being adopted (Table 2 and Fig. 2).Genomewide association studies of population-based samples to examine the full range of BMI values have identified approximately 30 loci influencing BMI and the risk of obesity.The strongest signal remains the association with variants within FTO (the fat-mass and obesity-related gene). 
13,34,45Other signals near BDNF, SH2B1, and NEGR1 (all implicated in aspects of neuronal function) reinforce the view of obesity as a disorder of hypothalamic function. 35,37,38,43A second approach, focusing on case-control analysis of persons selected from the extremes of the BMI distribution, has delivered a complementary, only partly overlapping, set of loci. 39,42,46,47Finally, genomewide analyses of patterns of fat distribution, prompted by the particularly deleterious health effects of visceral fat accumulation, have characterized approximately 15 loci that are largely distinct from those influencing overall adiposity 36,40,41,44 : many of the 15 display markedly stronger associations in women than in men.",
      "Recent large genome-wide association studies (GWAS) have identified multiple loci which harbor genetic variants associated with type 2 diabetes mellitus (T2D), many of which encode proteins not previously suspected to be involved in the pathogenesis of T2D.Most GWAS for T2D have focused on populations of European descent, and GWAS conducted in other populations with different ancestry offer a unique opportunity to study the genetic architecture of T2D.We performed genome-wide association scans for T2D in 3,955 Chinese (2,010 cases, 1,945 controls), 2,034 Malays (794 cases, 1,240 controls), and 2,146 Asian Indians (977 cases, 1,169 controls).In addition to the search for novel variants implicated in T2D, these multi-ethnic cohorts serve to assess the transferability and relevance of the previous findings from European descent populations in the three major ethnic populations of Asia, comprising half of the world's population.Of the SNPs associated with T2D in previous GWAS, only variants at CDKAL1 and HHEX/IDE/KIF11 showed the strongest association with T2D in the meta-analysis including all three ethnic groups.However, consistent direction of effect was observed for many of the other SNPs in our study and in those carried out in European populations.Close examination of the associations at both the CDKAL1 and HHEX/IDE/KIF11 loci provided some evidence of locus and allelic heterogeneity in relation to the associations with T2D.We also detected variation in linkage disequilibrium between populations for most of these loci that have been previously identified.These factors, combined with limited statistical power, may contribute to the failure to detect associations across populations of diverse ethnicity.These findings highlight the value of surveying across diverse racial/ethnic groups towards the fine-mapping efforts for the casual variants and also of the search for variants, which may be population-specific.",
      "Discovery of Novel Genes Associated With Type 1 DiabetesMellitus A total of 24,984 genes were analyzed in the initial gene-based GWAS.Three quantile-quantile plots for gene-based P-values, SNP-based P-values inside genes and SNP-based P-values outside genes are shown in Figure 1.We observed dramatic deviations at the tails of the distributions for the three plots.The deviation was much stronger for the plot of gene-based P-values than the other two plots, suggesting relatively higher power for gene-based association analysis.In short, through a gene-based association study, we identified 183 type 1 diabetes mellitus-associated genes that were insignificant in the original SNP-based association tests.Among the 183 genes, 171 genes are 'novel' genes identified for type 1 diabetes mellitus.Replication studies and/or differential expression studies further supported the significance of 53 genes to type 1 diabetes mellitus.In particular, four non-HLA genes (RASIP1, STRN4, BCAR1 and MYL2) and three HLA genes (FYN, HLA-J and PPP1R11) were validated by both replication and differential expression studies.",
      "Genome-wide association studies (GWAS) have identified >100 independent SNPs that modulate the risk of type 2 diabetes (T2D) and related traits.However, the pathogenic mechanisms of most of these SNPs remain elusive.Here, we examined genomic, epigenomic, and transcriptomic profiles in human pancreatic islets to understand the links between genetic variation, chromatin landscape, and gene expression in the context of T2D.We first integrated genome and transcriptome variation across 112 islet samples to produce dense cis-expression quantitative trait loci (cis-eQTL) maps.Additional integration with chromatin-state maps for islets and other diverse tissue types revealed that cis-eQTLs for islet-specific genes are specifically and significantly enriched in islet stretch enhancers.High-resolution chromatin accessibility profiling using assay for transposase-accessible chromatin sequencing (ATACseq) in two islet samples enabled us to identify specific transcription factor (TF) footprints embedded in active regulatory elements, which are highly enriched for islet cis-eQTL.Aggregate allelic bias signatures in TF footprints enabled us de novo to reconstruct TF binding affinities genetically, which support the high-quality nature of the TF footprint predictions.Interestingly, we found that T2D GWAS loci were strikingly and specifically enriched in islet Regulatory Factor X (RFX) footprints.Remarkably, within and across independent loci, T2D risk alleles that overlap with RFX footprints uniformly disrupt the RFX motifs at high-information content positions.Together, these results suggest that common regulatory variations have shaped islet TF footprints and the transcriptome and that a confluent RFX regulatory grammar plays a significant role in the genetic component of T2D predisposition.",
      "Attention turned instead to association approaches in larger, unrelated samples sets (Merikangas and Risch 2003).Association analyses, however, rely upon typing the causal variant or a closely correlated proxy, and hence, initial efforts were constrained by practical limitations of genotyping cost and capacity to the evaluation of variants within pre-defined candidate genes.Nonetheless, this approach heralded the first wave of robustly associated variants.For T2D, non-synonymous variants in genes encoding the targets of two drugs widely used in T2D management [P12A in PPARG (Altshuler et al. 2000) for thiazolidinediones and E23K in KCNJ11 (Gloyn et al. 2003) for sulfonylureas] showed consistent, though modest (per-allele odds ratios of *1.2), evidence of association with disease risk.For obesity, variants within two genes already known to harbour mutations implicated in monogenic obesity-MC4R (V103I, I251L) and PCSK1 (N221D, Q665E-S690T)-were shown to be associated with common obesity risk (Heid et al. 2005;Geller et al. 2004;Benzinou et al. 2008).However, the candidate gene approach is restricted by its intrinsic reliance upon prior knowledge and expectation.When, as with T2D and obesity, our understanding of disease pathogenesis is imperfect, there is a manifest need to extend the search for susceptibility variants across the entire genome in an unbiased, hypothesis-free manner.The first gene to be implicated in T2D susceptibility without prior biological candidacy was TCF7L2, discovered following systematic association analysis across a region of previously identified linkage (Grant et al. 2006).The most strongly associated variants at this locus have the greatest effect on T2D susceptibility of any common variant so far identified.",
      "Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge.Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs.We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls.We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk.This brings the number of T2D loci now confidently identified to at least 10.",
      "Large-scale association studies conducted by DIAGRAM, in individuals overwhelmingly of European descent, have reported 65 lead SNPs associated with susceptibility to type 2 diabetes (1). Figure 1 illustrates how these SNPs and closely correlated proxy SNPs were systematically selected for allelic expression analysis.In brief, 1,525 proxy SNPs (r 2 .0.8, CEU, 1,000 Genomes Phase 1) were found.Of these SNPs (lead + proxies), 45/1,590 (2.8%) map to exons of 23 human RefSeq genes.For 18 of these genes, TaqMan SNP genotyping assays could be designed to map entirely to exonic sequence, thus allowing for amplification and measurement of mature (i.e., spliced) mRNA species and normalization of allelic expression using genomic DNA from the same individual.After exclusion of SNPs with ,4 heterozygotes (rs1801282, PPARG; rs3734621, KIF6) and assays where .50%cDNA samples yielded Ct values .36(rs2793823, ADAM30; rs7377, SRGN), indicating very low levels of gene expression, allelic expression could be determined for 14 genes in samples from 36 white nondiabetic donors.",
      "Associations of seven genes with type 2 diabetes in Chinese and Korean populationsRESULTSWe genotyped 13 representative SNPs from 7 genes implicated in type 2 diabetes in recent GWA studies in 3,041 type 2 diabetic case subjects and 3,678 nondiabetic control subjects from a Chinese population in Hong Kong and two Korean populations.The clinical characteristics of the subjects are summarized in Table 1.Table 2 showed the meta-analyses of type 2 diabetes association under a log additive model.There was no heterogeneity of ORs among the three study populations except for CDKN2A/B (rs10811661) (Q statistic P  0.03), with a random effect OR of 1.32 (1.15-1.52).Apart from two SNPs at CDKN2A/B (rs564398 and rs1333040), all other 11 SNPs were significantly associated with type 2 diabetes, with ORs ranging from 1.09 to 1.35 (1.3  10 12  P  0.016) in the combined samples (Table 2).Eight of the 11 SNPs remained significant after adjustment for multiple comparison by permutation (1.0  10 4  P empirical  0.012) (Table 2) despite nonsignificance of CDKN2A/B (rs10757278), TCF7L2 (rs7903146), and FTO (rs8050136).Because multiple SNPs with little or moderate linkage disequilibrium at CDKAL1 (r 2  0.56), CDKN2A/B (r 2  0.002-0.31),and HHEX (r 2  0.25-0.55)were studied (Supplementary Table 2), we examined haplotype associations but did not reveal more significant association than single marker analyses (Supplementary Table 3).Further haplotype analyses by conditioning rs7756992 on CDKAL1 haplotypes and rs7923837 on HHEX haplotypes revealed no significant residual associations (P  0.05; data not shown), suggesting that these two SNPs are sufficient to explain the respective multiple associations at CDKAL1 and HHEX.Although residual association was observed after conditioning rs10811661 on CDKN2A/B haplotypes (P  0.023), the much stronger single marker association of rs10811661 compared with rs10757278 (P  1.3  10 12 vs. 
0.015; Table 2) suggests the former is the key associated SNP.Taken together, seven key SNPs from these genes were significant without correction for multiple comparisons.In this regard, TCF7L2 (rs7903146) showed the strongest effect on type 2 diabetes risk (OR 1.35), followed by CDKN2A/B (rs10811661), CDKAL1 (rs7756992), HHEX (rs7923837), IGF2BP2 (rs4402960), SLC30A8 (rs13266634), and FTO (rs8050136).These seven SNPs were further examined in the subsequent analyses.DISCUSSIONOur study provides important insights for the impact of the new type 2 diabetes genes identified through GWA studies.To our knowledge, this is the largest replication study in Asians up to now.We confirm the type 2 diabetes association of seven representative risk alleles for these seven genes found in Europeans (3)(4)(5)(6)(7)(8), suggesting many of the variants associated with type 2 diabetes in Europeans are also associated in Asians.These genetic effects seem to be additive.Despite differences in effect size of each gene, a crude estimate suggests up to 3.3-fold increased type 2 diabetes risk in subjects carrying eight or more risk alleles compared with those carrying two or fewer risk alleles (Supplementary Fig. 1).Two adjacent regions near CDKN2A/B have been reported to be associated with type 2 diabetes and cardiovascular diseases.Our data confirm the association of type 2 diabetes for rs10811661, found in the European type 2 diabetes studies (3,4,8), but not rs564398, found only in the Wellcome Trust Case Control Consortium Study (8).In addition, we found that the cardiovascular disease risk loci (rs1333040 and rs10757278) (14 -16) were not associated with type 2 diabetes.",
      "Finally, we examined whether genes identified using our association studies were enriched within diabetes-related pathways.We collated a list of 42 genes to which 53 CpG sites associated with T2D traits (CS score 1.77, combined P < 0.017) mapped.Even in this small dataset, pathway analysis (Supplementary Material, Table S12) indicated significant enrichment in 31 pathways (Fisher's exact P < 0.05), including those related to circadian clock (P = 0.005), adipocytokine signaling (P = 0.009), leptin pathway (P = 0.023), HDL-mediated lipid transport (P = 0.031) and insulin signaling (P = 0.033).",
      "These associations were subsequently confirmed in three reports simultaneously published in Science in April 2007 [13, 14. 23].Although several GWAS had been performed on T2DM in recent years, these latter studies are particularly important because of: the unprecedented joint collaborative effort to combine findings and to perform replication and meta-analysis; the large number of cases examined (each had 1,900 or more cases and controls with a combined number of 14,586 patients and 17,968 controls); and the common European ancestry of all subjects (reviewed in [23]).Despite some differences in the selection of phenotypes, there was remarkable consistency in the genes identified in these studies as linked to T2DM.In addition to replicating positive associations for TCF/L2, KCNJ11, PPARG, HHEX-IDE and SLC30A8, new variants were found in an intron (non-coding, spacer, section of a gene) of cyclin-dependent kinase 5 (CDK5)-regulatory subunit associated protein 1-like 1 (CDKAL1), in an intron of insulin-like growth factor binding protein 2 (IGF2BP2), in non-coding regions near the genes for cyclin-dependent kinase-inhibitor A and B (CDKN2A/B) on chromosome 9, and in the fat mass and obesity associated (FTO) region.",
      "A r t i c l e sBy combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5  10 8 .These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A).The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation.We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits."
    ],
    [
      "Indicative diseases associated with the candidate aging genes",
      "In our study, genes regulated in NF1 blood samples appear to be potentially interesting for understanding the pathogenesis of NF1 and the function of NF1 gene.Adhesion molecules and genes involved in matrix remodeling may provide insights into the mechanism of bone malformations seen in some NF1 patients.Interestingly, six genes down-regulated in NF1 are implicated in craniofacial dysplasia (defects in neural crest development) and long bone dysplasia (defects in mesoderm development).The Armadillo repeat gene, deleted in velocardiofacial syndrome (ARVCF), is involved in some phenotypes associated with velocardiofacial syndrome, an autosomal dominant disorder characterized by cleft palate and facial dysmorphology and conotruncal heart defects [24].Chondroitin 6 sulfotransferase catalyzes the formation of choidroitin sulfate and a deficiency in this gene has been associated with a heritable form of spondyloepiphyseal dysplasia [28].Domain-specific mutations in transforming growth factor beta 1 have been implicated in Camurati -Engelmann disease, an autosomal dominant, progressive diaphyseal dysplasia characterized by hyperostosis and sclerosis of the diaphyses of long bones [12].Core-binding factor alpha1 subunit (CBFA1) is a runtdomain containing gene, mutations of which are responsible for cleidocranial dysplasia, a disease characterized by hypoplasia/aplasia of clavicles, patent fontanelles, supernumerary teeth, short stature, and other changes in skeletal patterning and growth [16].Mutations of TCOF1, are responsible for Treacher Collins Syndrome, one of the most common mandibulofacial dysostosis disorders [34].Finally, mutation of Pax 9 is associated with hypodontia [5,26].",
      "A Syndromic Intellectual Disability Disorder Caused by Variants in TELO2, a Gene Encoding a Component of the TTT Complex. Am J Hum Genet. 2016; 98: 909918. https://doi.org/10.1016/j.ajhg.2016.03.014 PMID: 2713259397. Moosa S, Altmuller J, Lyngbye T, Christensen R, Li Y, Nurnberg P, et al. Novel compound heterozygous mutations in TELO2 in a patient with severe expression of You-Hoover-Fong syndrome. MolGenet Genomic Med. 2017; 5: 580584. https://doi.org/10.1002/mgg3.287 PMID: 2894424098. Choy KR, Watters DJ. Neurodegeneration in ataxia-telangiectasia: Multiple roles of ATM kinase in cellular homeostasis. Developmental Dynamics. John Wiley and Sons Inc.; 2018. pp. 3346. https://doi. org/10.1002/dvdy.24522 PMID: 2854393599.",
      "Human Immunodeficiency-Centromeric Instability-Facial Anomalies (ICF) SyndromeICF syndrome is a rare genetic disorder caused by mutations in one of four identified genes: DNMT3b [57], HELLS [58], CDCA7 [58], and ZBTB24 [59].Most cases of ICF syndrome (55%) are caused by mutation in DNMT3b, the gene coding for the human de novo DNA methyltransferase [60].This disease causes severe immunodeficiency, increased susceptibility to infection, abnormal facial features, and cognitive disabilities [61,62].ICF syndrome is often diagnosed by the presence of stretched and fragile juxtacentromeric heterochromatin on chromosomes 1 and 16 in activated lymphocytes.Perhaps as a result of this phenotype, these chromosomes are more susceptible to breakage, missegregation resulting in aneuploidy, and micronuclei formation [62].Mutation in CDCA7, HELLS, and ZBTB24 also results in DNA methylation defects at alpha-satellites and satellite II DNA that is enriched on juxtacentromeric heterochromatin on chromosomes 1 and 16 [58,63].HELLS and CDCA7 work together in a complex known as CHIRRC (CDCA7-HELLS ICF-Related nucleosome Remodeling Complex) to catalyze nucleosome remodeling, which could modulate the accessibility of DNA for methylation [64].DNA methylation profiles in ICF patients with mutations in any of these three genes are different from that in ICF patients with DNMT3b mutation, indicating that these proteins may not all work in the same pathway [65].All observed ICF patients, however, have hypomethylation of the juxtacentromeric satellite II repeats, leading to the hypothesis that the chromosome fragility and disease symptoms are directly linked to DNA hypomethylation.",
      "A genetic factorsuch as M5ahl8/Tmc1m4 may be involved in the samepathway and could modify the deleterious effects associated with defects in other pathway components (such asFSCN2 and TMC1). D2 mice also have a deleteriousvariant of CDH23 (Noben-Trauth et al. 2003) that is epistatic to the effects of the FSCN2 variant (Johnson et al. 2008). CDH23 is a component of the stereocilia tip link,supporting the view that the polygenic nature of the progressive hearing loss of D2 mice is at least partially due todeficiencies in multiple stereocilia components conferringfunctional integrity.",
      "TCM Gene Disease",
      "To further understand these genes and potential phenotypic consequences, we looked for enrichment of deletions or duplications in a dataset of 29,085 individuals with NDDs and 19,584 controls. 50Six of the pLI >0.9 genes were enriched for deletions in individuals with NDDs (CDC37L1, NFIB, PTPRD, RFX3, SMARCA2, UHRF2), and all 27 were enriched for duplications in individuals with NDDs (BNC2, CDC37L1, CLTA, CNTFR, ELAVL2, MLLT3, NFIB, NOL6, PAX5, PSIP1, PTPRD, RFX3, RNF38, RPS6, RUSC2, SHB, SMARCA2, SMU1, TAF1L, TEK, TESK1, TLN1, TOPORS, UBAP1, UBE2R2, UHRF2, VCP).This observation suggests that the dosage of these genes may play a role in NDDs.The mappability of most of 9p is quite high for short-read WGS data, indicating that the detection of CNV should be robust (Figure S2).Copy-number assessments generated from short-read WGS data in individuals from the 1000 Genomes Project 54 reveal that the copy  number of the majority of 9p is not variable in the population (Figure S2).expand beyond NDDs, a search for other gene/disease associations was carried out (Table S3; Figure 2).This analysis revealed two genes in blood phenotypes 115501], TPM2).Importantly, 29 of these genes are known to be involved in autosomal recessive conditions, including DNAI1 in primary ciliary dyskinesia and GALT in galactosemia.Fourteen of these autosomal recessive genes are asso-ciated with neurological phenotypes (e.g., KANK1 in cerebral palsy and MPDZ in congenital hydrocephalus), which may contribute to atypical or severe NDD phenotypes in some patients with 9p CNVs.Disruption of these genes can thus potentially unmask recessive traits and contribute to phenotypic variability and should be explored in patients with complex presentations.",
      "CTCF has also been suggested to play a role in two human syndromes, Silver-Russell (SRS) and Beckwith-Wiedeman Syndrome (BWS) (Sparago et al. 2004;Eggermann et al. 2008).BWS is a developmental disorder with variable clinical symptoms including increased frequency of tumors, macroglossia, and overgrowth.It is caused by aberrant activation of the normally maternally imprinted H19/Igf2 locus on chromosome 11.Several sporadic and familial mutations have been characterized in the H19/Igf2 region, which lead to the loss of CTCF binding sites, directly implicating CTCF in the disease (Sparago et al. 2004;Eggermann et al. 2008).Similarly, in SRS, a developmental disorder characterized by severe growth retardation and body asymmetry caused by activation of the paternal allele of Igf2, mutations in CTCF binding sites have been identified (Scho nherr et al. 2008).CTCF has been implicated in various diseases.Trinucleotide repeat expansion diseases including Huntington's disease, fragile X mental retardation, and myotonic dystrophy are caused by excessive lengthening of microsatellite repeat sequences (reviewed in Orr and Zoghbi 2007).For example, although a healthy individual has a stretch of fewer than 27 CAG repeats in their HTT gene, a Huntington's disease patient has typically more than 35.The trigger for repeat expansion is unknown.Recent evidence suggests that CTCF might contribute to generating fragile sites within repeats, thus facilitating their expansion (Libby et al. 2008).Mutation of a CTCF-binding site near a repeat leads to increased genomic instability and increased repeat length, similar to that seen in disease situations.Although not tested, it is possible that interference with CTCF binding, either by mutation of its target site or mutations in an interacting partner, may contribute to trinucleotide repeat diseases (Libby et al. 
2008).Defects in CTCF, and other genome organizers, may also play a key role in destabilizing expanded microsatellite repeats in other trinucleotide repeat diseases (reviewed in Dion and Wilson 2009).",
      "This is a particular problem for very rare diseases where only one or twofamilies are diagnosed with the condition. An example of the latter is tylosis withoesophageal cancer (TOC). This is an autosomal dominant single gene disorder thatoccurs in three families, with two of the families related by disease haplotype analysis. The entire minimal region (34 kb) has been sequenced (except for highly repetitiveregions) and no obvious disease-causing mutation has been identified, but a diseasemechanism has been postulated (MacDonald et al. , 2006).",
      "9(HLA-DR4) [108], cystic fibrosis (CFTR) [22], and thrombophilia (FV, FII, MTHFR)[109-111] have been widely used to guide preventive care. For example, prophylacticmastectomy or oophorectomy is recommended to predisposed individuals and has shownto reduce the risk of cancer by 90-95% in women [112]. Similarly, genetic screenings areavailable for prenatal and newborns to detect birth defects and genetic diseases includingcystic fibrosis, severe combined immunodeficiencies, phenylketonuria, tyrosinemia,sickle cell anemia, hearing loss, and congenital heart defects.The firstsuccessful application of this approach identified genomic loci responsible for an Xlinked phagocytic disorderchronic granulomatous disease (CYBB) [20]. This was soonfollowed by identification of the loci and ultimately genes responsible for other geneticdisorders including Duchene muscular dystrophy (DMD) [21], cystic fibrosis (CFTR)[22], Huntington disease (HTT) [23,24], polycystic kidney disease (PKD1, PKD2 andPKHD1) [25-27] , phenylketonuria (PAH)[28], albinism (TYR) [29] and many more. Currently, Online Mendelian Inheritance in Man (OMIM) catalogues 4,500 humandisorders for which the underlying genetic mutations are known(http://omim.org/statistics/entry).",
      "In addition, Mendelian disorders showing certain symptoms are extremely helpful in identifying genes influencing the same symptoms: the gene polymorphism for Netherton's syndrome (ichthyotic dermatosis, severe atopy) was associated with the development of asthma and eczema in children without Netherton's syndrome (Cookson and Moffatt 2000).As the deficiency of the FOXP3/Scurfin gene causes intractable diarrhea, eczema, hemolytic anemia, diabetes mellitus, or thyroid autoimmunity in humans (Patel 2001), dysregulation of FOXP3/ Scurfin gene expression may promote the development of autoimmune diseases, especially type 1 diabetes, thyroid autoimmunity and hemolytic anemia under various effects from circumstances.From an immunogenetical point of view, the FOXP3/Scurfin gene is located on chromosome (Chr) Xp11.23, which includes one of the type 1 diabetes susceptiblity loci (Cucca et al. 1998;Nerup and Pociot 2001).",
      "Studies of syndromes that include CTDs, such as the 22q11 deletion syndrome, have provided some clues regarding the specific genes that may be involved in determining the risk of CTDs (e.g.TBX1 [13]).In addition, studies of rare, presumably pathogenic, copy number variants [14][15][16], and inherited [17] and de novo [17,18] single nucleotide variants have identified genes that may contribute to the risk of CTDs [18,19].Yet, most affected patients do not carry a confirmed or suspected rare, causative variant.Moreover, rare variants, in particular rare de novo variants, do not account for the observed increase in risk of CTDs among the relatives of affected patients.",
      "This is a particular problem for very rare diseases where only one or twofamilies are diagnosed with the condition. An example of the latter is tylosis withoesophageal cancer (TOC). This is an autosomal dominant single gene disorder thatoccurs in three families, with two of the families related by disease haplotype analysis. The entire minimal region (34 kb) has been sequenced (except for highly repetitiveregions) and no obvious disease-causing mutation has been identified, but a diseasemechanism has been postulated (MacDonald et al. , 2006).",
      "Examplesof this application include the detection of over 98% of deletions occurring onthe dystrophin gene for the diagnosis of Duchenne muscular dystrophy (9,10),;an 8-bp deletion in exon 3 of the P450c21B gene in individuals affected by21-hydroxylase deficiency, a recessively inherited disease (11), and the F508mutation, a 3-bp deletion in the gene CFTR that is the most frequently mutation found in individuals affected with cystic fibrosis (12). Another exampleis detection of the internal tandem duplication (ITD) in the juxtamembranedomain-coding sequence of the FLT3 gene in acute leukemias.",
      "This is a particular problem for very rare diseases where only one or twofamilies are diagnosed with the condition. An example of the latter is tylosis withoesophageal cancer (TOC). This is an autosomal dominant single gene disorder thatoccurs in three families, with two of the families related by disease haplotype analysis. The entire minimal region (34 kb) has been sequenced (except for highly repetitiveregions) and no obvious disease-causing mutation has been identified, but a diseasemechanism has been postulated (MacDonald et al. , 2006).",
      "The location of this gene in a genomic regionresponsible for the disorder, as well as its strong coexpression with other known mitochondrial genesacross tissues, hinted that it may be involved in thedisease. Additional experiments proved that the genedid indeed cause the disorder and was most likely amitochondrial gene. Tissue gene expression databases, along with other databases, have also been used tocategorize, at a whole-genome level, genes potentiallyinvolved in a particular type of disease category (Calvoet al. 2006).",
      "A Syndromic Intellectual Disability Disorder Caused by Variants in TELO2, a Gene Encoding a Component of the TTT Complex. Am J Hum Genet. 2016; 98: 909918. https://doi.org/10.1016/j.ajhg.2016.03.014 PMID: 2713259397. Moosa S, Altmuller J, Lyngbye T, Christensen R, Li Y, Nurnberg P, et al. Novel compound heterozygous mutations in TELO2 in a patient with severe expression of You-Hoover-Fong syndrome. MolGenet Genomic Med. 2017; 5: 580584. https://doi.org/10.1002/mgg3.287 PMID: 2894424098. Choy KR, Watters DJ. Neurodegeneration in ataxia-telangiectasia: Multiple roles of ATM kinase in cellular homeostasis. Developmental Dynamics. John Wiley and Sons Inc.; 2018. pp. 3346. https://doi. org/10.1002/dvdy.24522 PMID: 2854393599.",
      "This is a particular problem for very rare diseases where only one or twofamilies are diagnosed with the condition. An example of the latter is tylosis withoesophageal cancer (TOC). This is an autosomal dominant single gene disorder thatoccurs in three families, with two of the families related by disease haplotype analysis. The entire minimal region (34 kb) has been sequenced (except for highly repetitiveregions) and no obvious disease-causing mutation has been identified, but a diseasemechanism has been postulated (MacDonald et al. , 2006)."
    ]
  ],
  "task_id": [
    "C0015BEE5FE41769A65126B79BB1E40D",
    "3A16235DA1E02B9148B9288A06EE567E",
    "E94FFD042BB146E8A429200590A6792D",
    "FCFCE5BBA2A8B3D8818890B9D2308C5A",
    "FFA6EADA5502933C0C30C9D16DCAA073"
  ]
}