{
  "question": [
    "For text and biological resources, do you mean add some books (on biology stuff) or/and web resources (as ensembl) on your system?",
    "what is ensembl?",
    "What is the difference between QTL mapping and GWAS?",
    "How do I determine which gene in my QTL is causal for the trait?",
    "Why do males have two Y chromosomes and females only one?"
  ],
  "answer": [
    "The text suggests using online bioinformatics resources such as Ensembl, UCSC Human Genome Browser, and others for research and data analysis. It doesn't specifically mention adding books or web resources to a system, but refers to utilizing these online tools and databases for information retrieval and bioinformatic analysis.",
    "Ensembl is a joint project between the EBI and the Wellcome Trust Sanger Institute that provides a database for genome data. Launched in 1999, it was the first to provide a window on the draft genome, curating the results of computational analyses. It contains automatically annotated genomes and integrates data from a wide range of biological research sources. Ensembl also provides tools for data retrieval and analysis, and it includes quality checks for genetic variants in its variation pipeline.",
    "QTL mapping is a statistical method used to identify regions in the genome that correlate with variation in a phenotype. It has high power, meaning it can detect a QTL, but it does not precisely identify which of the many genes within the QTL is causal. On the other hand, GWAS (Genome-Wide Association Studies) is a method used to identify genetic variants associated with complex traits. It has low power, requiring large numbers of individuals, but it has high precision, often identifying smaller candidate regions.",
    "To determine which gene in your QTL is causal for the trait, you can start by narrowing down the list of candidate genes within the QTL. This can be done by performing a strain survey, using genetically engineered mice to alter the expression of a candidate gene, or using comparative genomics to narrow down the QTL to a region containing only a few genes. You can then investigate whether the expression of these genes correlates with the phenotype(s) of interest. If a variant within a gene causes a change in its expression, and the expression of that gene correlates with expression of a phenotypic trait of interest, it is considered a good candidate. Network analyses can also be used to build up more evidence for which gene is causal. Finally, you can confirm the candidate genes by complementation of a QTL, which can be achieved in several ways, including transgenic complementation.",
    "Actually, males have one Y chromosome and one X chromosome, while females have two X chromosomes. This combination determines the sex of an individual."
  ],
  "contexts": [
    [
      "For certain types of important digital objects, there are well-curated, deeply-integrated, special-purpose repositories such as Genbank3, Worldwide Protein Data Bank (wwPDB4), and UniProt5 in the life sciences; Space Physics Data Facility (SPDF; http://spdf.gsfc.nasa.gov/) and Set of Identifications, Measurements and Bibliography for Astronomical Data (SIMBAD6) in the space sciences. These foundational and critical core resources are continuously curating and capturing high-value reference datasets and fine-tuning them to enhance scholarly output, provide support for both human and mechanical users, and provide extensive tooling to access their content in rich, dynamic ways.",
      "These centralized data resources can often be very helpful for gaining a quick overview of an unfamiliar pathway or gene, but inevitably one needs to re-enter the literature to build up a fuller picture and to answer the questions that are most relevant to the target phenotype or gene. The Internet is also an excellent resource to help in this process; this probably makes the ubiquitous search engine Google (http://www.google.com) one of the most powerful bioinformatics tools. The expansion of Web-linked interoperativity and interrogation tools means that new options will already be available by the time this book is in print. One consequence of these advances could be the perception of a diminished necessity to perform bioinformatic analysis. Although this is true in the sense that secondary database include an increasing amount of precooked bioinformatic data, there is a paradox in that the more sophisticated the public annotation becomes, the more important it is to understand the underlying principles. Data retrieval at UCSC is facilitated by text and BLAT (Kent, 2002; a BLAST-like algorithm) searches and bulk downloads of annotation or sequence data. Other complementary tools at UCSC have extended the functionality of UCSC. For instance, the Proteome Browser graphically displays protein properties such as hydrophobicity, charge and structural features across any publicly available protein sequence (Hinrichs et al., 2006). As with Ensembl, the UCSC website has been well designed and is sympathetic to the naive user, but the UCSC graphical interface is more Spartan. If Ensembl is Disney, then UCSC is South Park. Data retrieval is extremely well catered for in Ensembl, with text searches of all database entries, BLAST searches of all sequences archived, and the availability of bulk downloads of all Ensembl data and even software source code. Ensembl annotation can also be viewed interactively on one's local machine with the Apollo viewer (Lewis et al., 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome Browser. The UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl; it, too, provides annotation of the NCBI assemblies, and it displays a similar array of features, including confirmed genes from Ensembl.",
      "To achieve this goal, we integrated and make available big reference data in chapters 2 and 3, bridged model organism to human data in chapter 4, translated generic methods into clinical applications in chapters 5 and 6, and developed a platform to bring innovations into practice in chapter 7. The resources currently available are already plentiful, and both the amount and types of molecular life science data is growing at a tremendous pace.",
      "We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse Brain Library), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and gene expression analyses (University of California, Santa Cruz Genome Browser, National Center for Biotechnology Information's Entrez Gene, and the Allen Brain Atlas), and information resources (PubMed).",
      "Useful Online Genomics Resources.",
      "There are online bioinformatics resources from which this type of information may be sourced.",
      "There are many resources for annotating the results of a genome-wide study, all located in diverse databases and other web content. Having the ability to harmonize and analyze historic data, together with highly curated public resource data such as that found in model organism databases, adds tremendous depth and orthogonal information sources to prioritize and refine the results of genetic analysis.",
      "These centralized data resources can often be very helpful for gaining a quick overview of an unfamiliar pathway or gene, but inevitably one needs to re-enter the literature to build up a fuller picture and to answer the questions that are most relevant to the target phenotype or gene. The Internet is also an excellent resource to help in this process; this probably makes the ubiquitous search engine Google (http://www.google.com) one of the most powerful bioinformatics tools. The expansion of Web-linked interoperativity and interrogation tools means that new options will already be available by the time this book is in print. One consequence of these advances could be the perception of a diminished necessity to perform bioinformatic analysis. Although this is true in the sense that secondary database include an increasing amount of precooked bioinformatic data, there is a paradox in that the more sophisticated the public annotation becomes, the more important it is to understand the underlying principles. Data retrieval at UCSC is facilitated by text and BLAT (Kent, 2002; a BLAST-like algorithm) searches and bulk downloads of annotation or sequence data. Other complementary tools at UCSC have extended the functionality of UCSC. For instance, the Proteome Browser graphically displays protein properties such as hydrophobicity, charge and structural features across any publicly available protein sequence (Hinrichs et al., 2006). As with Ensembl, the UCSC website has been well designed and is sympathetic to the naive user, but the UCSC graphical interface is more Spartan. If Ensembl is Disney, then UCSC is South Park."
    ],
    [
      "Annotation, preprocessing and categorization of data. We used Ensembl (version 39) as the annotation reference database. Homology between human and mouse genes was derived via BioMart. The total number of genes under study comprises 15,277 Ensembl mouse genes representing the union of the homologue genes from all data sources. An overview about the T2DM specific datasets is given in Table 1.",
      "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carried out and the way the results are displayed. 4.4.1 Ensembl. Ensembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbard et al., 2002; http://www.ensembl.org/), launched in 1999, was the first to provide a window on the draft genome, curating the results of a series of computational analyses. Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequence assemblies as its starting point, but it is now based upon NCBI assemblies. The Ensembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that are computationally predicted (by the GENSCAN gene prediction program) and also supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures. Data retrieval is extremely well catered for in Ensembl, with text searches of all database entries, BLAST searches of all sequences archived, and the availability of bulk downloads of all Ensembl data and even software source code. Ensembl annotation can also be viewed interactively on one's local machine with the Apollo viewer (Lewis et al., 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome Browser. The UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl; it, too, provides annotation of the NCBI assemblies, and it displays a similar array of features, including confirmed genes from Ensembl. Ensembl provides a DAS reference server giving access to a wide range of specialist annotations of the human genome (for more detail, see http://www.ensembl.org/das/). Data mining: The ability to query very large databases in order to satisfy a hypothesis (top-down data mining), or to interrogate a database in order to generate new hypotheses based on rigorous statistical correlations (bottom-up data mining). Domain (protein): A region of special biological interest within a single protein sequence. Another growing area of activity is in cataloguing the genetic variation present in human populations as Ensembl reflects the progress of the International Haplotype Map Project (Thorisson et al., 2005). More speculative data, such as GENSCAN-predicted exons that have not been incorporated into Ensembl-confirmed genes, may also be viewed. This means that the display can be used as a workbench for the user to develop personalized annotation.",
      "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carried out and the way the results are displayed. 4.4.1 Ensembl. Ensembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbard et al., 2002; http://www.ensembl.org/), launched in 1999, was the first to provide a window on the draft genome, curating the results of a series of computational analyses. Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequence assemblies as its starting point, but it is now based upon NCBI assemblies. The Ensembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that are computationally predicted (by the GENSCAN gene prediction program) and also supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures.",
      "Ensembl. Ensembl is a publicly available web resource that contains automatically annotated genomes. It is integrated with other available biological databases like Jasper for binding motifs. It is a much larger web resource than T1Dbase, and contains general information about the human genome including variants. These include SNPs, insertions, deletions and somatic mutations (Alterations in DNA that occur after conception, meaning that they are not inherited) for several species. Data from Ensembl can be accessed in a number of ways. The names of all the SNPs that occur in the T1D susceptibility regions can be collected from Ensembl using the Biomart tool (Kinsella et al., 2011). To achieve this, the coordinates of the T1D regions obtained from T1Dbase are uploaded to the biomart query page which allows one to search the genome browser and retrieve data like the names, chromosomal positions, and genic positions (referred to as \"consequence to transcript\", in Ensembl) of the SNPs. The SNP genic positions tell if a SNP is located within a gene, adjacent to a gene or whether they occur in inter-genic positions between gene coding regions, as well as the particular genes in which they are located. Advantages of Ensembl: There is a number of advantages to using Ensembl. (i) It is a larger web resource than T1Dbase and integrates data from a wide range of biological research sources into its database. Therefore, available information is quite comprehensive. (ii) Genic positions for 99% of the variants obtained from T1Dbase could be retrieved. (iii) Ensembl contains quality checks for genetic variants in its variation pipeline. A variant is flagged as failed if certain quality criteria are not met, for instance if none of the variant alleles match the reference allele of the variant. Generally, Ensembl was found to give more detailed information regarding the genic positions of variants compared to T1Dbase. Information about genes, including gene names, chromosomal coordinates, biotype (coding or non-coding), and number of splice variants, can also be retrieved from Ensembl.",
      "doi:10.1093/nar/gkp858 Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Giron CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T, Kahari AK, Keenan S, Martin FJ, Maurel T, McLaren W, Murphy DN, Nag R, Overduin B, Parker A, Patricio M, Perry E, Pignatelli M, Riat HS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, Zadissa A, Aken BL, Birney E, Harrow J, Kinsella R, Muffato M, Ruffier M, Searle SM, Spudich G, Trevanion SJ, Yates A, Zerbino DR, Flicek P (2015) Ensembl 2015.",
      "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carried out and the way the results are displayed. 4.4.1 Ensembl. Ensembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbard et al., 2002; http://www.ensembl.org/), launched in 1999, was the first to provide a window on the draft genome, curating the results of a series of computational analyses. Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequence assemblies as its starting point, but it is now based upon NCBI assemblies. The Ensembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that are computationally predicted (by the GENSCAN gene prediction program) and also supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures. Data retrieval is extremely well catered for in Ensembl, with text searches of all database entries, BLAST searches of all sequences archived, and the availability of bulk downloads of all Ensembl data and even software source code. Ensembl annotation can also be viewed interactively on one's local machine with the Apollo viewer (Lewis et al., 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome Browser. The UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl; it, too, provides annotation of the NCBI assemblies, and it displays a similar array of features, including confirmed genes from Ensembl. Ensembl provides a DAS reference server giving access to a wide range of specialist annotations of the human genome (for more detail, see http://www.ensembl.org/das/). Data mining: The ability to query very large databases in order to satisfy a hypothesis (top-down data mining), or to interrogate a database in order to generate new hypotheses based on rigorous statistical correlations (bottom-up data mining). Domain (protein): A region of special biological interest within a single protein sequence."
    ],
    [
      "However, the two methods do not necessarily give the same result because they are measuring complementary aspects of an association. As quantitative trait loci (QTL) are added to Q, we expect higher completeness because the QTL in Q cover more segments of each chromosome. However, if these quantitative trait loci (QTL) are unrelated to G, we expect many of them to be empty. Similarly, as genes are added to G, we expect higher accuracy because selected genes are found in more locations.",
      "QTL can be mapped through GWAS or GWLS. eQTL: An expression Quantitative Trait Locus is a region in the genome at which allelic variation correlates with the mRNA expression level variation of a certain gene. Distant eQTL: A distant (or trans) eQTL is an eQTL which is located far from the gene it controls (for example on a different chromosome). Local eQTL: A local (or cis) eQTL is an eQTL which is located nearby the gene it controls in the genome.",
      "QTL mapping, GWAS and genomic selection. Information from SNP markers are being increasingly used to generate a deeper knowledge of the genetic basis of important traits and speed up the genetic progress in aquaculture species by means of GWAS and genomic selection, respectively (Yáñez et al., 2014). GWAS allows the identification of genetic variants associated with complex traits (i.e. QTL). When one or few QTL explain a high percentage of genetic variance for a particular trait, it is possible to improve the trait more rapidly by means of MAS. However, the complexity of some traits and the absence of QTL with major effects constrain the successful implementation of MAS. In contrast, genomic selection is the most appropriate way to select for traits that are controlled by several loci of small effects (i.e. polygenic traits) (Meuwissen et al., 2013).",
      "Mouse QTL mapping has high power but low precision (i.e., we can detect a QTL, but do not know which of tens or hundreds of genes is causal), whereas human GWAS has low power but high precision (tens or hundreds of thousands of individuals are needed, but candidate regions are often smaller). By combining the power of mouse QTL mapping and the precision of human PheWAS, we can do more than both individually. Candidate genes might show up in our analysis here that did not show up in our above analysis for several reasons, the most common being that gene expression was not measured in the relevant cell type or timepoint.",
      "In order to differentiate eQTL from QTL for phenotypes other than gene expression (phenotypic, physiologic, or clinical QTL), here we will refer to the latter as pQTL. Although the GG approach is relatively recent and is still evolving, there are some general features of this approach that can be summarized as follows: (1) mapping of transcripts profiles as quantitative traits, (2) classification of eQTL in cis and trans acting mode of action, (3) identification of loci where large number of transcripts map, which",
      "Quantitative trait locus-mapping is a statistical method used to map chromosomal intervals (loci) that contribute to heritable variance in phenotypes. The method simply compares the inheritance of allelic variants (B or D genotypes in our case) with differences in phenotypes. A QTL will generally cover a region that includes 10-100 genes, and these positional candidates can then be ranked roughly on the basis of criteria such as the types of DNA variants, patterns of mRNA expression, data from complementary human genetic cohorts (GWAS and linkage) and relevant literature about gene effects on central nervous system structure and function.",
      "These approaches were used to identify dozens of quantitative trait loci (QTL) for BMD and other bone traits [15,16]. However, identifying causative genes underlying QTL proved challenging [17]. Over the last decade, gene mapping approaches have transitioned from low-resolution linkage mapping to high-resolution GWASs [11]. The first GWASs in mice used panels of inbred mouse strains [18-21] and by leveraging accumulated recombinations, this approach significantly increased mapping resolution [19].",
      "The process of QTL mapping includes phenotyping and genotyping at least several hundred animals from an informative mapping population (e.g., B6D2 F2). Once a QTL is detected and confirmed, additional work is needed to identify the specific gene(s) in the QTL interval responsible for the phenotypic variation. An elegant way to improve QTL mapping resolution is through the development and testing of interval-specific congenic strains (Darvasi, 1997). This method has been successfully used to fine map an ethanol withdrawal QTL on mouse chromosome 4 (Fehr et al., 2002; Shirley et al., 2004).",
      "This comparison gives information about the reliability of the observed genotype information: The more the marker locations differ between the two maps (which signifies variation in marker positions), the higher the possibility of genotyping errors. QTL mapping was done in several stages to identify loci acting individually and QTL that interacted, either additively or epistatically. To determine individually-acting QTL, a single QTL genome scan was conducted with the function scanone.",
      "Importantly, whereas these studies required substantial labor, time, and resources, X-QTL is a quick and easy approach to achieve a comparable level of genetic dissection. The levels of complexity observed here (e.g. 14 loci explaining 70% of the genetic variance for 4-NQO resistance) are still dramatically lower than those seen in for some human traits in GWAS (e.g. 40 loci explaining 5% of the variance for height 2,5). One obvious explanation is the difference in experimental designs (line crosses vs. population association studies), but differences in genetic architectures among species and traits may also contribute.",
      "Forward genetics approaches that combine traditional QTL mapping with expression quantitative trait mapping (eQTL; in which case transcript abundance is the quantitative trait) [32] are increasingly being used to successfully transition from QTL to QTG [33-35]. Traditional QTL analysis will identify the genomic regions affecting trait variation, while eQTL analysis can help in understanding which genes, pathways, and biological processes are also under the influence of a given QTL.",
      "On the one hand, the genomic location that are in suspicion to be involved in the trait can still involve large genomic segments, e.g., millions of basepairs that include many genes within the segment. On the other hand, GWAS may point to several or even many genomic locations for the trait of interest, complicating further functional analysis. Analysis of Quantitative Trait Loci (QTL). QTL analysis reveals statistically significant linkage between phenotypes and genotypes, thereby providing explanation for the genetic basis of variation in complex traits (Falconer and Mackay, 1996; Lynch and Walsh, 1998).",
      "This feature of eQTL mapping alone can often make the effort worthwhile; even with inexpensive genotyping, it is often cost effective in the long term to generate a TDM map of several thousand markers, which usually ensures that any two recombination events have a marker between them and that locations of genetic cross-overs are well-defined. TDMs can be integrated with Genotyping by Sequencing (GBS) approaches to anchor genetic maps to physical maps (Poland et al., 2012; Sonah et al., 2013). Surprisingly, eQTL mapping is actually the simple portion of eQTL analysis work.",
      ", we can detect a QTL, but do not know which of tens or hundreds of genes is causal), whereas human GWAS has low power but high precision (tens or hundreds of thousands of individuals are needed, but candidate regions are often smaller). By combining the power of mouse QTL mapping and the precision of human PheWAS, we can do more than both individually.",
      ", we can detect a QTL, butdo not know which of tens or hundreds of genes is causal), whereas human GWAS has lowpower but high precision (tens or hundreds of thousands of individuals are needed, butGenes 2022, 13, 61411 of 17candidate regions are often smaller). By combining the power of mouse QTL mapping andthe precision of human PheWAS, we can do more than both individually.",
      "The remarkable success in mappinggenes linked to a number of disease traits using genomewide association studies (GWAS) in human cohorts hasrenewed interest in applying this same technique in modelorganisms such as inbred laboratory mice (Su et al. 2010). Unlike classical phenotypic traits, gene expression traitsgiving rise to cis-acting eQTL provide us with a prioriknowledge of the true QTL location (Doss et al. 2005),which can be used to empirically estimate the power of aGWAS performed at a similar scale (Hao et al. 2008;Schadt et al. 2008).",
      "QTLmapping has been highly successful in determining causative loci underlying severaldisease phenotypes (Wang et al. 2004; Cervino et al. 2005; Abboud and Kaplowitz 2007)and can broadly be subdivided into two classes: linkage mapping and association mapping. For standard linkage mapping in experimental crosses, likelihood or regression approachesare used to map QTL, with flanking markers used to infer genotypes in the intervalsbetween widely spaced markers (i.e. > 1cM) (Lander and Botstein 1989; Haley and Knott1992).38Quantitative Trait Locus (QTL) mapping has been used to associate a specificgenotype with the variation in a single measured phenotype like high density lipoproteins(Wang and Paigen 2005) and ethanol tolerance (Grisel et al. 2002). At each locus in asegregating population, a model is fit which estimates the likelihood that this locus explainsthe variation in phenotype versus the likelihood that there is no genotypic effect on thephenotype. Interval mapping (Lander and Botstein 1989) is a variation on QTL mapping whichuses maximum likelihood estimation.",
      "QTL mapping studies thenseek to detect the polymorphisms underlying the complex traits of interest byscanning for alleles that co-vary withthe traits. Similar experiments also can be conducted with special derivatives of inbredstrains known as recombinant inbred(RI) mice. These animals are derivedby cross-breeding two or more distinctparental strains (which often divergewidely for the trait of interest), followedby inbreeding of the offspring for severalgenerations (Bailey 1971). Given thecorrect breeding strategy, this method1This is an issue faced by GWASs researchers when classifyingsamples as cases or controls."
    ],
    [
      "Prior belief or knowledge about the number of true causal and trueindependent links that might be expected in a typical QTL, depending on the studydesign, should be considered to safeguard against high false-positive rates (lowpositive predictive values). In studies that involve mapping gene expression (eQTL),protein (pQTL) or metabolite (mQTL) traits, information about co-localization ofQTL and genes that are functionally linked to the trait provides information aboutthe likelihood of causal links.",
      "The next step is to investigate whether the expression of these genes correlates with the phenotype(s) of interest.This would suggest a chain of causality: a variant within a gene causes a change in its expression, and the expression of that gene correlates with expression of a phenotypic trait of interest.To do this, we created a correlation matrix between all genes within a QTL with a cis-eQTL in any brain tissue as well as the phenotypes that contributed to the QTL (Supplementary Table S6).Any gene with a cis-eQTL and a significantly correlated expression was considered a good candidate.If the gene only had a cis-eQTL and correlation in a single brain region, then it suggested that this brain region might also be of interest for the phenotype (adding another link to this chain).",
      "One possible approach to facilitate this endeavor is to identify quantitative trait loci(QTL) that contribute to the phenotype and consequently unravel the candidategenes within these loci. Each proposed candidate locus contains multiple genes and,therefore, further analysis is required to choose plausible candidate genes. One ofsuch methods is to use comparative genomics in order to narrow down the QTL to aregion containing only a few genes. We illustrate this strategy by applying it togenetic findings regarding physical activity (PA) in mice and human.",
      "Network analysesWe now have two QTL, and we have picked potentially interesting genes within each, but nowwe want to build up more evidence for which gene in our QTL interval is causal. The first, andmost obvious way, is to see what genes our trait of interest correlates with, in tissues that weexpect to be related to the trait. We calculated the Spearmans correlation between the traitBXD_17850 and all probes with expression data in T helper cells (GN319).",
      "Anotherapproach to help to determine if a gene located near the mapped QTL wouldhave effects to influence the quantitative trait will be to use genetically engineered mice to determine if altering the expression of a candidate gene will alterthe phenotype of interest (38). However, it is possible that a quantitative trait isa combined effect of multiple genes located near the QTL (39).",
      "With a known QTL and abody of evidence suggesting possible roles for the affected gene,phenotypes can be predicted that may be modulated as a resultof this sequence variation. If this phenotype is of interest, itcan be directly measured and a traditional forward QTL analysis carried out to confirm the prediction. Such an approach isextremely attractive when the enormous cost and time requiredfor phenotyping a large panel is considered.",
      "The firststep is to narrow down the list ofcandidate causal genes within aFig1. IntervalmappingofoviductgrosspathologyacrosstheBXDstrainsQuantitative Trait Locus (QTL)arevealsaQTLondistalChr3. TheL RSvaluesareplottedinblueacrossthechromosomal region containinggenomeandmeasurethestrengthoftheassociationbetweensequence variants stronglychromosomeandMbposition(topandbottomX-axis,respectively)andassociated with phenotypicphenotypeexpression. Allelecontributionisshownbythered(C57BL/6J)andgreen(DBA/2J)lines. Redandgreyhorizontallinesindicategenome-variation.",
      "QTL mapping of traits in mouse cohorts often ends up with a genetic locus, composed of a list of candidategenes. Several studies proposed the use of mediation analysis to identify the causal gene (mediator) betweenthe genetic variant (independent variable) and the trait-of-interest (dependent variable) (Figure 1.4B) [7, 47,61, 77]. Mediation analysis can be used either on gene expression levels to identify the regulatory mechanisms[7, 47, 61], or on phenotypic traits to discover the potential causal drivers contributing to the phenotypicvariances [77] (Figure 1.4C upper).",
      "1a). Second-generation offspring are thenphenotyped and genotyped, and linkage analysis is carried out to identify a region that isassociated with the trait1. This approach has led to the identification of thousands of quantitative trait loci (QTLs) forvarious phenotypes and diseases. However, each QTL region is large, often tens ofmegabases, and contains hundreds of genes. The process of identifying the causal variantand the gene involved is therefore difficult and costly. Of the thousands of QTLs identified,only a small fraction of genes has been identified. NIH-PA Author Manuscript 2012 Macmillan Publishers Limited.",
      "Network analysesWe now have two QTL, and we have picked potentially interesting genes within each, but nowwe want to build up more evidence for which gene in our QTL interval is causal. The first, andmost obvious way, is to see what genes our trait of interest correlates with, in tissues that weexpect to be related to the trait. We calculated the Spearmans correlation between the traitBXD_17850 and all probes with expression data in T helper cells (GN319).",
      "10 JUNE 2016  VOL 352 ISSUE 6291aad0189-5R ES E A RC H | R E S EA R C H A R T I C LESolving QTLs: Finding the quantitativetrait geneFor cis-QTLs, the causal factors can be quicklyidentified: With few exceptions, they will be driven by variants within the gene itself or immediately adjacent. For trans-QTLs, mQTLs, andcQTLs, the identification of the causal quantitative trait gene (QTG) is challenging due to thewidth of the QTLs.",
      "Once the QTL interval is reduced to a reasonable size,the next step in the process involves sorting through thegenes within the interval and attempting to determinewhich is the QTG. This step is daunting because more thanone gene may be involved and the function of some geneswithin the interval may be unknown. Until recently, thisstep emphasized the detection of polymorphisms withincoding sequence (reviewed in Korstanje and Paigen, 2002and Glazier et al. 2002); for a polymorphism that producesan amino acid substitution, one can often infer and thentest for a functional consequence.",
      "To understand the genetic networks that underliequantitative variation in the trait, it is also very important todiscover genes whose expression is correlated with the traitafter accounting for the known effects of the QTL on thetrait. Many of these genes may have expression that isassociated with QTL genotype, and would therefore beidentified as important via the tests described above. Othergenes, however, may have expression values that are correlated with the trait but unassociated with genotype at theQTL.Theapproach is motivated by the fact that a research project isoften focused on a specific classical quantitative trait. If amajor QTL for this classical trait has been identified, it isoften desirable to test whether this QTL is also associatedwith the transcription level of any genes, which will provide clues as to which genes belong to the pathway that theQTL uses to modulate the classical trait.",
      "Quantitative trait loci (QTLs) can be identified in several ways, but isthere a definitive test of whether a candidate locus actually corresponds to a specific QTL? NIH-PA Author ManuscriptMuch of the genetic variation that underlies disease susceptibility and morphology is complexand is governed by loci that have quantitative effects on the phenotype. Gene-gene and geneenvironment interactions are common and make these loci difficult to analyse. Here, we presenta communitys view on the steps that are necessary to identify genetic loci that governquantitative traits, along with a set of interpretive guidelines.",
      "Confirmation of Candidate GenesThe next step is to prove that a particular gene is involved in the quantitative traitunder study. This is done by complementation of a QTL, which can be achieved inseveral ways (911,40). In principle, transgenic complementation is the most straightforward. This approach has been used successfully to demonstrate that Pla2g2a wasthe correct candidate gene for Mom1, a modifier of the apcmin allele that causesadenomatous polyposis coli (41).",
      "So, how do you go about planning and performing a QTL study, and howdo you identify the responsible gene within a QTL that you have identified? Generally, one starts by performing a strain survey to find two parental inbredstrains that have a markedly different trait. One can now look up many differenttraits of inbred mice online at the Mouse Phenome Database (http://phenome. jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). However, the trait you maywant to study may not be present in wild type mice, so you may want to crossa mutant (or genetically engineered) strain onto several inbred strains.",
      "Along with correlations, this tool also derives new traits representing theprincipal components (Figure 2d). The user can add these principal components to their TraitCollection and proceed to perform QTL mapping, as in the case of a single trait QTLmapping. The R/QTL (Broman et al. 2003) and R/CAPE (Tyler et al. 2013) packages can beused for deeper analysis of epistasis and pleiotropy for multiple traits and multipleregulatory loci. Prioritizing Candidate Genes7Author ManuscriptFollowing the identification of a significant QTL, focus shifts to identifying theparticular gene(s) that cause the QTL.",
      "The investigatorsfirst identified all QTLs associated witha classical phenotype and then winnowed the list of potentially associatedgene-expression traits on the basis oftheir correlation or eQTL overlap withthe phenotype of interest. Candidategenes then were ranked by applyingthe LCMS technique, which uses theeQTL data to establish causal relationships between DNA loci and transcripts as well as between transcriptsand phenotypes and finally identifiesa model that best fits the data.",
      "The goal of QTL mapping is clearly theidentification and eventual confirmation of candidate genes(QTGs) underlying the phenotype. The evidence required forsuch confirmation has engendered much discussion (ComplexTrait Consortium 2003; Glazier et al. 2002) and is likely to varydepending on the nature of the trait and specific resourcesavailable to pin down underlying genes (e.g. availability ofknock-in or knock-outs, specific antibodies, siRNA, etc.). Thepaucity of QTGs meeting such multifaceted standards is testament to the difficulty of narrowing the confidence intervalsufficiently to identify and test suitable candidate genes (Flintet al."
    ],
    [
      "Y chromosome in peripheral blood cells increases with age in men (6) and is correlated with increased risk of cancer mortality and Alzheimer's disease (6,7).X chromosome mosaicism in women also increases with age (8), as does autosomal mosaicism in both sexes (9,10).Recent studies have shown that the prevalence of age-related mosaic abnormalities is greater in men than women (9,10); however, mechanisms underlying the sex differences observed in chromosomal mosaicism in humans are unknown.",
      "Recent reports suggested a role of Y chromosome loss in risk for all-cause mortality and common age-related disease such as cancer, Alzheimer disease as well as severe atherosclerosis [12][13][14][15][16][17][18][19][20].Building on such reports, we aimed to evaluate the contribution of male Y chromosome mosaicism to the risk for late-stage AMD.",
      "Box 1. Sex-specific cytonuclear interactionsSeveral predictions about the nature of cytonuclear conflicts follow from the patterns of chromosomal inheritance (Table I).In a mated pair of animals, mtDNA is co-transmitted with half of the autosomal genes, two-thirds of the X-linked genes and none of the Y-linked genes [76].This predicts that, relative to the autosomal case, positive nuclear-mitochondrial interactions are more likely to evolve for X-linked loci whereas deleterious interactions between Y-linked genes and mtDNA should accumulate (or cannot be purged efficiently).",
      "In addition to genetic data, the 9p Network Cohort dataset also lists the gender for all 719 individuals.Of these individuals, 406 individuals are female and 313 are male, indicating a female bias (Binomial test p  0.0006).This result was surprising considering that no female bias has been previously reported in 9p deletion and duplication syndromes.A possible explanation for the significant bias in the 9p Network Cohort dataset is the XY sex reversal phenotype, which is commonly observed in individuals with 9p deletion syndrome.This phenotype could lead to individuals with XY sex chromosomes being listed in the dataset as having a female gender.To further examine this hypothesis, we subset our dataset to include only the 236 individuals whose sex chromosomes are listed in their genetic information.For this much smaller subset, 125 individuals had female sex chromosomes and 111 had male sex chromosomes, indicating no significant sex bias (Binomial test p  0.4).We also found no significant gender bias in this group (Binomial test p  0.2), although we did confirm that four of the individuals with XY sex chromosomes had a gender of female.This comparison suggests that the XY sex reversal phenotype may be responsible for a female gender bias, but not a sex bias, in 9p deletion and duplication syndrome cohorts.",
      "Duplicated variants with multiple alternative alleles and variants in sex chromosomes X and Y",
      "Autosome-One of the numbered, or nonsex, chromosomes (1 through 22).X and Y are the sex chromosomes.",
      "Given such a high abundance of young male-biased genes, we asked whether their parental genes are also male-biased.We found that fewer parental genes of X-linked male-biased duplicates were also male-biased (20%, 2/10) compared to the parental genes of autosomal young male-biased duplicates (32%, 12/37).These data, despite the small sample sizes and being statistically not significant, may suggest that compared to autosomal young genes, X-linked young genes more often evolved novel male-biased expression.However, as the majority of young genes are the result of intrachromosomal duplication events, the pattern might also reflect the fact that X-linked old genes are less likely to be male-biased.A slight excess of X-linked female-biased genes was also detected (Fig. 2).Although most of them are old, a few recently arose on the X chromosome over 4 to 6 Myr in the common ancestor of the D. melanogaster and D. simulans clade (branch 5).This can be interpreted in the context of the dominance model of the sexual antagonism hypothesis.In this case, a dominant, X-linked gene that is favorable to females but disadvantageous for males can become fixed.The slow accumulation of female-biased genes in the X reflects an overall low rate of female gene origination, either due to a small dominance effect (the degree of dominance h!1/2), or a minor disadvantageous effect on males (the ratio of fitness effects of male relative to female k!0) along with a favorable effect on females (Vicoso and Charlesworth 2006, Equation 10).Regarding the second step in the evolution of male-biased genes, namely X!A transposition, sexual antagonism favorable for autosomal fixation (Vicoso and Charlesworth 2006) and/or MSCI (Lifschytz and Lindsley 1972;Betran et al. 2002) may play a role in this process.On the other hand, the within-chromosomal duplication rate is higher than the between-chromosomal duplication rate (Emerson et al. 
2008), which may contribute to the slow pace of X!A transposition.It has been observed that male-biased genes in Drosophila are overrepresented on autosomes (Parisi et al. 2003;Ranz et al. 2003).Consistent with this result, a dynamic process that can explain the nonrandom autosomal distribution has also been observed, in which autosomal new genes with X-linked parental genes are often male-biased.Specifically, a significant excess of autosomal testisexpressed retrogenes were identified as RNA-duplicates of X-linked parental genes (Betran et al. 2002).Recently, similar X!A gene traffic was observed in the DNA-level duplication and relocation data set of the Drosophila genus (Vibranovski et al. 2009b), and was further confirmed for DNA-level duplications in the D. pseudoobscura neo-X chromosome (Meisel et al. 2009).In addition, selective extinction of neo-X linked male-biased genes also occurred in D. pseudoobscura (Sturgill et al. 2007).These three lines of genome-wide investigation support a common pattern of outof-X traffic for male-biased genes, resulting in an enrichment of these genes on autosomes in the long term.It has been reported that the initial manifestations of new gene emergence, namely polymorphic duplicates, occur at a lower frequency on the X chromosome, thus indicating that these duplicates are subject to stronger purifying selection (Emerson et al. 2008).Therefore, the excessive fixation of X-linked duplicates might not occur via neutral processes.Positive selection could have facilitated the fixation of X-linked young genes in addition to driving their subsequent sequence evolution.",
      "Occasionally, Y chromosome DNA is detected in the maternal plasma, and the fetus appears to have female genitalia on sonographic examination.The underlying mechanisms for this include a twin demise, a maternal disorder of sexual differentiation, such as Swyer syndrome, or that the mother has undergone a bone marrow or solid organ transplant from a male donor (Bianchi, 2018;Hartwig, Ambye, Sorensen, & Jorgensen, 2017).",
      "Becauseof the differences in sex chromosome number, the sexunmatched comparison contains internal controls, i.e. ,in this comparison, genes on the X-chromosome andY-chromosome (but not those on the autosomes) shouldshow copy number imbalances reective of a single copychange. We showed that the sample that is not sexmatched had readily detectable differences in aCGHsignals for genes on the X and Y chromosomes. No suchpatterns were evident for the autosomes of the sexunmatched individuals or for the sex chromosomes of thesex matched samples.",
      "Sex chromosome:The X or Y chromosome in human beings that determines the sex of an individual.Females have two X chromosomes in diploid cells; males have an X and a Y chromosome.The sex chromosomes comprise the 23rd chromosome pair in a karyotype.See also: autosome Sex-linked: Traits or diseases associated with the X or Y chromosome; generally seen in males.X chromosome: One of the two sex chromosomes, X and Y. See also: Y chromosome, sex chromosome Y chromosome: One of the two sex chromosomes, X and Y. See also; X chromosome, sex chromosome",
      "The male heterogamety (XY) is the mostcommon reported system, but many specieshave female heterogamety (ZW), and moreoccasionally, multiple chromosome systems(Almeida-Toledo and Foresti, 2001; Devlinand Nagahama, 2002; Penman and Piferrer,2008). Given the low resolution of optical microscopy to differentiate sex chromosomes insh, researchers have looked for an alternativein the tenfold longer meiotic chromosomes todetect mispairing tracts at the synaptonemalcomplex as an indication of the sex differentiated region with variable success.The exclusive femaleconstitution of gynogenetic genomes providesinformation on the SD system, especially in aXX/XY system, where all female progenies areexpected. If ZZ/ZW is the underlying system,male offspring always will be present, but theinterpretation is more complex and will dependon the distance of the SD region to centromereand on the viability of WW offspring (Devlinand Nagahama, 2002; Penman and Piferrer,2008). Induced triploids, on the other hand, areconstituted by the combination of two femaleand one male genomes (Piferrer et al.",
      "The existence of a maternally silenced X-linked imprinted locus playing a role in social cognition could explain why males (X m Y) are more vulnerable to disorders of social cognition such as autism spectrum disorders than are females (X m X p ).The absence of the expression of this gene would not lead to autism itself, but would eliminate a putative protective factor, making an individual more susceptible to the effects of other ASD-predisposing genetic mutations or environmental factors.",
      "When meiosis takes place, a pair of chromosomes may fail to separate properly, creating a sperm or egg that has either two copies or no copy of a specific chromosome.This is a sporadic event and it is called nondisjunction.Nondisjunction can lead to an extra chromosome, called trisomy, or a missing chromosome, called monosomy (GHR, 2008l).Down syndrome is an example of trisomy.Individuals who have Down syndrome have an extra chromosome number 21. Turner syndrome is an example of monosomy.Girls who have Turner syndrome have only one X chromosome.This causes them to have short stature and be infertile (NHGRI, 2008l).X-Linked InheritanceX-linked genetic disorders (also called sex-linked) are caused by gene mutations on the X chromosome.Most often X-linked genetic disorders are seen in males.Males inherit the X chromosome from their mother and the Y chromosome from their father.Because males have only one X chromosome, if they inherit a gene mutation on the X chromosome from their mother, they will have the disorder.Examples of X-linked genetic disorders occurring in males include hemophilia and Duchenne muscular dystrophy (GHR, 2008o)."
    ]
  ],
  "task_id": [
    "19EBC6CA7E425D0C1279475BD0B411B6",
    "9CFE932D7898C83E473E590BC77B4FCB",
    "6DBC070B2E4DC2FE8036E5BA7480B755",
    "5594EA025D9631328071B6A1A7EF1375",
    "C6B9A982C9283DE065A3371F1264095C"
  ]
}