path: root/gnqa/paper1_eval/src/data/datasets/old/experts_general2_dataset.json
{
  "question": [
    "What about recombination in human centromeres?",
    "What about recombination in the human genome?",
    "How can I add a new species to the GeneNetwork database?",
    "Describe the role of mitochondrial DNA in heredity and how it differs from nuclear DNA.",
    "what is ensembl?"
  ],
  "answer": [
    "Human centromeres contain the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B. This motif is present on every chromosome except for the Y chromosome. It is necessary for the formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres. Mice lacking CENP-B are viable and fertile.",
    "The text mentions the application of site-specific recombinase technology, which allows investigators to engineer genes in the mouse that will allow for the deletion, insertion, inversion, or exchange of chromosomal DNA with high fidelity. However, it does not provide specific details about recombination in the human genome.",
    "The text does not provide specific instructions on how to add a new species to the GeneNetwork database. It is recommended to contact the administrators or operators of the GeneNetwork database for guidance on this process.",
    "Mitochondrial DNA (mtDNA) is inherited solely from the maternal lineage, unlike nuclear DNA which requires both paternal and maternal contributions. This uniparental transmission of mtDNA is ensured by complex mechanisms that eliminate paternal mitochondria from sperm during fertilization. mtDNA also exhibits a higher mutation rate than nuclear DNA, leading to significant population-level mtDNA polymorphisms. These polymorphisms can be clustered into distinct haplogroups that represent major branch points on the mitochondrial phylogenetic tree. In contrast, nuclear DNA undergoes bi-parental recombination.",
    "Ensembl is a joint project between the EBI and the Wellcome Trust Sanger Institute. It is a publicly available web resource that contains automatically annotated genomes and integrates data from a wide range of biological research sources. Launched in 1999, the Ensembl database was the first to provide a window on the draft genome, curating the results of a series of computational analyses. It provides comprehensive information about the human genome including variants like SNPs, insertions, deletions, and somatic mutations for several species. Ensembl also offers tools for data retrieval and analysis."
  ],
  "contexts": [
    [
      "Primate and mouse centromeric DNA contain the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B.In humans, the CENP-B box is present on every chromosome except for the Y chromosome [20].It is paradoxically necessary for formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres, and mice lacking CENP-B are viable and fertile [21,22].",
      "Box 3 Mechanism of homologous recombination and end joining  The severe phenotype of the mouse mutants and the highly cancer-prone human syndromes highlight the importance of homologous recombination.Mouse KU mutants display sensitivity to agents that lead to breaks in DNA, and have immunological problems because the KU proteins are involved in V(D)J recombination of antibody gene sequences.In addition, these mutants display poor development, several features of premature ageing and increased apoptosis of postmitotic neurons in the developing brain.Mice with defects in DNA-PK cs (SCID mice) display a similar but generally milder phenotype.In contrast, XRCC4-and ligase IV-knockout mice seem more severe, with late embryonic lethality resulting from massive ATM-and p53-dependent neuronal apoptosis 33,38 .",
      "Cells in G1 have only the homologous chromosome for recombination repair.However, this may be difficult to find in the complex genome.Moreover, it is potentially dangerous as a template for repair as it may lead to homozygosity for recessive mutations.As an alternative, the end-joining reaction simply links ends of a DSB together, without any template, using the end-binding KU70/80 complex and DNA-PK cs , followed by ligation by XRCC4-ligase4 (reviewed by 27,33; see the right panel of the figure, stages V-VII).The function of KU70/80 might involve end protection and approximating the ends, in addition to a signalling function by DNA-PK cs .End joining may be further facilitated when the ends are still held together through nucleosomes or other structures.End joining is sometimes associated with gain or loss of a few nucleotides if internal microhomologies are used for annealing before sealing.This implies the involvement of DNA polymerases and/or nucleases.Note that the KU complex is also involved in telomere metabolism 27,62 .found to be lethal 34 .Inactivation of ATR by itself is inviable already at the blastocyst stage.Inactivation of BRCA1 and BRCA2 in mice is also embryonically lethal; cell lines display defects in homologous recombination [35][36][37] .",
      "371  A tentative scenario for the homologousrecombination reaction is depicted in the left panel of the figure.To promote strand invasion into homologous sequences, the 5\u1371-3\u1371 exonuclease activity of the RAD50/MRE11/NBS1 complex (also a substrate for ATM phosphorylation) exposes both 3\u1371 ends 30 (I).RPA facilitates assembly of a RAD51 nucleoprotein filament that probably includes RAD51-related proteins XRCC2, XRCC3, RAD51B, C and D. RAD52 stimulates filament assembly (II).RAD51 has, like its Escherichia coli RecA counterpart, the ability to exchange the single strand with the same sequence from a double-stranded DNA molecule.Correct positioning of the sister chromatids by cohesins probably facilitates the identification of a homologous sequence.A candidate for the complex chromatin transactions associated with these DNA gymnastics is RAD54, a member of the SWI/SNF family of DNA-dependent ATPases.After identification of the identical sister chromatid sequence, the intact double-stranded copy is used as a template to properly heal the broken ends by DNA synthesis (III).Finally, the so-called Hollidayjunctions are resolved by resolvases 27,33,60 (IV).Homologous recombination involves the simultaneous action of large numbers of the same molecules, which are found to be concentrated in radiation-induced nuclear foci.These depend on, and also include, the BRCA1 and BRCA2 proteins 36 .Recent evidence implicates BRCA2 directly or indirectly in nuclear translocation of RAD51 (ref.61).",
      "This picture poses more questions than it seeks to answer.Is the grouping of the regions by product rather than by type of region correct?Given that the recombina- tion fraction between HLA-A and HLA-B is of the order of .08%,and that this is likely to represent a distance of at least hundreds of thousands of nucleotides, how are the pieces put together over such relatively long distances?Is it possible that regions of the DNA loop out, so that transcripts can be made directly from noncon- tiguous DNA sequences, the loops being held in place by small RNAs as suggested for the control of splicing by Steitz, and her colleagues [24] and by others [25]?If these small RNAs are coded for well outside the HLA region, does this provide a mechanism for control of expression of products by unlinked genes, as may be the case for one of the constituent polypeptides of the HLA-DR product?What might be the nature of the signals that control which of a multiple set of alternative regions is expressed by any given chromosome?",
      "Mamm Genome. 2006; 17:220\u2013229. [PubMed: 16518689] 72. Romanoski CE, et al. Systems genetics analysis of gene-by-environment interactions in human cells. Am J Hum Genet. 2010; 86:399\u2013410. [PubMed: 20170901] 73. Myers S, Freeman C, Auton A, Donnelly P, McVean G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nature Genet. 2008; 40:1124\u2013 1129. [PubMed: 19165926] 74. Myers S, et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science. 2010; 327:876\u2013879. [PubMed: 20044541] 75. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nature Rev Genet. 2009; 10:392\u2013404.",
      "Classification of common conserved sequences in mammalian intergenic regions. Hum. Mol. Genet. 2002, 11, 669\u2013674. 25. Zhu, L.; Swergold, G.D.; Seldin, M.F. Examination of sequence homology between human chromosome 20 and the mouse genome: Intense conservation of many genomic elements. Hum. Genet. 2003, 113, 60\u201370. 26. Pevzner, P.; Tesler, G. Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. Proc. Natl. Acad. Sci. USA 2003, 100, 7672\u20137677. 27. Christmann, R.B. ; Sampaio-Barros, P.; Stifano, G.; Borges, C.L. ; de Carvalho, C.R. ; Kairalla, R.; Parra, E.R. ; Spira, A.; Simms, R.; Capellozzi, V.L. ; et al.",
      "a The table lists proteins in which mutations have been shown to increase homologous recombination (HR), gross chromosomal rearrangements (GCRs), chromosomal instability (CIN), sister chromatid exchanges (SCEs), tri-nucleotide repeat expansions and contractions (TNR), telomere fusions (Tel fusion), or fragile telomeres (Tel fragility).A phenotype inside brackets ([ ]) indicates that it is caused by overexpression of the protein.For further details and references see Supplementary Table1.Abbreviations: DSB, double-strand break; PCNA, proliferating cell nuclear antigen; RFC, replication factor C complex; SCF, Skp1-Cdc53/Cullin-F-box.",
      "Figure 3 Intermediates and chromosome structural alterations, as observed by different techniques. (a) Replication fork stalling, as monitored by 2D-gel electrophoresis and Southern analysis in yeast (for details about the technique, see Reference 161). (b) Slower human replication forks covering shorter DNA synthesis tracks, as determined by incorporation of IdU and CldU via DNA combing (52), which permits visualization of the process of replication on DNA fibers. (c) Accumulation of double-strand breaks (DSBs) or replicative stress, as inferred by \u03b3H2AX foci or by \u03b3H2AX pan staining, respectively, in human cells. (d ) DSBs or ssDNA (single-stranded DNA) gaps as seen directly by nuclear \"comet tails\" via single-cell electrophoresis assays in human cells (52). (e) Sister-chromatid exchanges (SCEs), as determined by Giemsa staining in human cells (207). ( f ) Hyper-recombination, as determined by colony sectoring in yeast (5). ( g) Gross chromosomal rearrangements (GCRs), as determined by spectral karyotyping in mouse cells (118). (h) Translocations, as visualized by pulse-field gel electrophoresis in yeast (168). (i ) Fragile sites, as detected by mitotic spreads in human cells (109). ( j) Telomere fusions, as determined by CO-FISH (chromosome-orientation fluorescent in situ hybridization) in mouse cells (124). 
(k) Anaphase bridges, presumably resulting from unfinished replication, dicentric chromosomes, and sister-chromatid nondisjunction, as detected by fluorescence microscopy in mouse cells.Arrows indicate the specific structural alterations referred to in each panel; in panel h, closed and open arrows indicate the position where the translocated or missing parental chromosome migrate or should migrate, respectively.When necessary, a normal control is shown on top of the panel, with the exception of panel a, which is shown on the left.Detailed description of each technique can be found in the references provided.Photos are from the laboratories of A. Nussenzweig ( g), A. Losada (k), M. Blasco ( j), L. Tora (i ), and ours (all others).Abbreviations: HR, homologous recombination; NHEJ, nonhomologous end-joining.",
      "In humans, the pericentromeric region of chromosome 9 is densely packed with segmental genomic duplications (segdups) and is prone to microdeletions and microduplications. 5In order to evaluate this region for microdeletions and microduplications in family T, we screened genomic DNA from affected individual II-7 by arrayCGH with the Nimblegen HD2 platform with the previously described CHP-SKN sample 6 as the reference.Data were normalized and CNVs were called by identifying regions where Z-scores consistently deviated from the diploid mean.At 9q21.11, a genomic duplication of ~270 kb was apparent in the genomic DNA of II-7 (Figure 1D).The Genomic duplications may or may not be in tandem with their parent segment and may be either in the same or inverted orientation. 7We developed primers that would uniquely amplify genomic DNA with the duplication under each of these conditions.Forward (5 0 -CCCAGCAGA AGCAATGGTGGTAGCC-3 0 ) and reverse (5 0 -GGTGGTGAA TCCAAAAACACAAGAACAAAGTC-3 0 ) primers diagnostic for a tandem inverted duplication (Figure 2A) yielded products of expected size in family T relatives with hearing loss, but yielded no product in unaffected family T relatives (Figure 2B).Genotypes of all 58 participating relatives in family T indicated that the tandem inverted duplication was coinherited with hearing loss.The duplication spans approximately positions 71,705,804 to 71,974,823 (hg19) on chromosome 9 for a size of ~269,023 bp.The duplication includes the entire locus for the tight junction protein TJP2, which spans positions 71,788,971 to 71,870,124 (hg19).",
      "Chromosomal context of human NORs  Human NORs are positioned on the short arms of the acrocentric chromosomes that still remain unsequenced and thus missing from the current human genome draft, GRCh38.p7.Seeking an understanding of the chromosomal context of human NORs and to identify potential NOR regulatory elements, my laboratory has begun to characterize the sequences on both proximal (centromeric) and distal (telomeric) sides of the rDNA arrays (Fig. 3A; Floutsakou et al. 2013).Building on earlier reports of sequences distal and proximal to the rDNA array on HSA21 and HSA22, respectively (Worton et al. 1988;Sakai et al. 1995;Gonzalez and Sylvester 1997), 207 kb of sequence immediately proximal and 379 kb distal to rDNA arrays have been reported recently (Floutsakou et al. 2013).Consensus proximal junction (PJ) and distal junction (DJ) sequences were constructed mostly from chromosome 21 BACs (bacterial artificial chromosomes).Comparison of these sequences with BACs and cosmids derived from the other acrocentrics revealed that the PJ and DJ sequences are, respectively, \u223c95% and 99% identical between all five acrocentric chromosomes.Conservation of DJ sequences among the acrocentrics is consistent with frequent recombination between the rDNA arrays on each of the acrocentric chromosomes (Worton et al. 
1988).However, conservation of PJ sequences suggests that there must also be frequent recombination events in the interval between the centromere and rDNA arrays.Proximal sequences are almost entirely segmentally duplicated, similar to the regions bordering centromeres.Consequently, they are unlikely to contain any specific elements that would regulate the activity of the linked NOR.In contrast, the distal sequence is predominantly unique to the acrocentric short arms and is dominated by a very large inverted repeat.Each arm of the inverted repeat is >100 kb, and they share an average sequence identity of 80%.There is a large (\u223c40-kb) block of a 48base-pair (bp) satellite repeat, CER, at the distal end of the DJ (Fig. 3A).CER blocks are found distal to the rDNA on all acrocentric chromosomes, with additional pericentromeric blocks on chromosomes 14 and 22. Finally, there are two blocks of a novel 138-bp tandem repeat, ACRO138, present within the DJ.",
      "The conservation of DJ sequence between the five human acrocentric chromosomes provides a unique opportunity to visualize NORs by FISH.Whereas the rDNA content of NORs can vary greatly, probing of human metaphase chromosome spreads with a DJ BAC results in signal that is consistent between NORs (Floutsakou et al. 2013).Using this probing scheme, it was observed that in most human cell lines analyzed, including multiple primary lines, at least one and sometimes as many as four of the NORs present have very little or no detectable rDNA (C van Vuuren and B McStay, unpubl. ).Many studies have used silver staining of metaphase spreads prepared from stimulated human peripheral blood lymphocytes to determine how many NORs are active in normal human cells.The number of active NORs ranges from seven to 10, with an average of eight (Heliot et al. 2000).Possibly, NORs with low rDNA content are active but fall below a detection threshold in silver staining.At this point, it is worth considering the distribution of active versus silent rDNA repeats in humans and other mammals.If 50% of rDNA repeats are truly repressed, there are insufficient \"silent\" NORs to house them.We must conclude that active NORs are a mosaic of active and silent repeats.",
      "However, excluding some cases, recombination suppression occurs in a small genomic tract where these genes are located, and it does not extend over most of the sex chromosome pair, as occurs in mammals and birds (Bergero and Charlesworth, 2009). It is not clear if this suppression occurs by the presence of inversions or as a modulation of the recombination mechanism itself, but both could be involved (Bergero and Charlesworth, 2009). Evidence of recombination in the SD region in sex reversal individuals supports the second hypothesis.",
      "Orthologous chromosomes between baboon and human",
      "Lichter P, Cremer T, Borden J, Manuelidis L, Ward DC (1988) Delineation of individual human chromosomes in metaphase and interphase cells by in situ suppression hybridization using recombinant DNA libraries. Hum Genet 80:224\u2013234 3. Jang W, Yonescu R, Knutsen T, Brown T, Reppert T, Sirotkin K, Schuler GD, Ried T, Kirsch IR (2006) Linking the human cytogenetic map with nucleotide sequence: the CCAP clone set. Cancer Genet Cytogenet 168:89\u201397 4.",
      "Nature Genet 1:222\u2013225 55. Foote S, Vollrath D, Hilton A, Page DC (1992) The human Y chromosome: overlapping DNA clones spanning the euchromatic region. Science 258:60\u201366 56. Chumakov IM, Rigault P, Le Gall I et al (1995) A YAC contig map of the human genome. Nature 377:175\u2013297 57. Hudson TJ, Stein LD, Gerety SS et al (1995) An STS-based map of the human genome. Science 270:1945\u20131954 58. Coffey AJ, Roberts RG, Green ED et al (1992) Construction of a 2.6-Mb contig in yeast artificial chromosomes spanning the human dystrophin gene using an STSbased approach. Genomics 12:474\u2013484 59.",
      "Figure 4 Schematic depiction of proposed mechanisms for observed intrachromosomal rearrangements.The blue and red arrows indicate the orientation of the integrated plasmid loci and the recovered mouse sequences, respectively, on the original non-rearranged chromosome (left column).All four combinations are given for an arbitrarily orientated chromosome (green line).The middle column shows how two breakpoints (lightning signs) could lead to the inversion or deletion of the encompassed chromosomal sequence (yellow-orange dual tone line) and result in a recoverable mutation in the right column.The last row indicates the two options for a transposition, in which either the transgene locus or the recovered mouse sequence is copied or excised (as indicated by the pink and light blue arrows) and integrates in the breakpoint at the other location.",
      "As mentioned above, by taking into account that for a genome rearrangement to be detected, the 5\u0408 plasmid sequence of the breakpoint in lacZ must remain intact and end immediately in front of the recovered mouse sequence, the simplest intrachromosomal mutation that could have taken place was inferred (Fig. 4).Rearrangements with breakpoints in the mouse genome on either site of the integrated plasmid concatamer, but with reversely orientated sequences, could be inversions (Fig. 4).Rearrangements in the direction of the integrated plasmids, proximal for chromosome 3 and distal for chromosome 4 (Fig. 3), with similarly orientated breakpoints in the mouse genome, could be deletions (Fig. 4).Rearrangements in the reverse direction of the integrated plasmids, with reversely orientated mouse sequences, are more complicated and might be owing to transpositions (Fig. 4).According to these schemes, half of the intrachromosomal rearrangements would have been inversions, whereas deletions and transpositions each made up one fourth (Fig. 3).Alternatively, these rearrangements could be explained by translocations involving the transgene clusters integrated on either the homolog or the other chromosome.",
      "FIGURE 3. Telomere arrays of chicken and human chromosomes: the chicken genome contains more telomere sequence than the human genome.Chicken (a) and human (b) metaphase chromosomes and interphase cells hybridized with a telomeric sequence-peptide nucleic acid (PNA)-fluorescein probe.Human and chicken slide preparations were processed, and images were captured using the same parameters.Qualitatively, the telomere-positive fluorescent signals (white spots) from chicken cells and chromosomes have greater intensity than those of human (4\u2032,6 diamidino-2-phenylindole, DAPI counterstain).",
      "In a previous study on the accumulation of spontaneous genome rearrangements in normal mice with aging, we discovered that 50% of the events were intrachromosomal, i.e., large deletions or inversions [22].In contrast, in this present study most of the rearrangements resulted from inter-chromosomal recombination, in both the Ercc1-mutant and control animals (Table 3).Previously, we used lacZ-plasmid line 60 mice with integration sites on Chromosomes 3 and 4, while in the present study line 30 mice were used with a single integration site on Chromosome 11.This indicates that the relative frequency of translocations is founder line specific and could be due to the position of the lacZ-plasmid cluster on the chromosome.Indeed, the chromosomal integration sites in line 60 mice are in the E1 region of Chromosome 3 (half way along the chromosome) and the C5 region of Chromosome 4 (two-thirds of the way along the chromosome) [22], while the integration site of founder line 30 (used in this study) is on the centromeric tip of Chromosome 11 (region A1-A2; not shown).The proximal location on Chromosome 11 prevents the detection of all but relatively small intra-chromosomal recombinations; larger events would lead to loss of the centromere and, therefore, the entire chromosome.If the orientation of the integration site in line 30, which is currently unknown, is towards the centromere, transpositions and inversions towards the distal end are the only detectable large intra-chromosomal rearrangements (for a detailed explanation of the different chromosomal events that can occur at the lacZ locus, see [22])."
    ],
    [
      "Genome Res, 2011, 21: 1769\u20131776 Mattick JS, Dinger ME. The extent of functionality in the human genome. HUGO J, 2013, 7, doi:10.1186/1877-6566-1187-1182 ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature, 2012, 489: 57\u201374 Pheasant M, Mattick JS. Raising the estimate of functional human sequences. Genome Res, 2007, 17: 1245\u20131253 Hu T, Long M, Yuan D, Zhu Z, Huang Y, Huang S. The genetic equidistance result, misreading by the molecular clock and neutral theory and reinterpretation nearly half of a century later.",
      "This approach enables, on the one hand, studying the process of mammalian evolution and, on the other hand, translational studies using model organisms of complex human phenotypes. Detection of regions conserved between distant species points to high functional importance of these fragments of the DNA sequence. Human and mouse developmental lines diverged about 75 million years ago, and ever since evolutionary forces shaped the two genotypes in a different manner (Waterston et al. , 2002). Nevertheless, the extent of the changes is, however, small enough for conservation of local gene order (Waterston et al. , 2002).",
      "First, the human and mouse genome projects elucidated the sequences of over 20,000 genes [Lander et al. , 2001; Venter et al. , 2001], and most are expressed in the CNS. The availability of gene sequences has allowed rapid analysis of candidate human disease and disorder genes and the isolation of the mouse homologues. Second, the application of site-speci\ufb01c recombinase technology provides investigators with the opportunity to engineer genes in the mouse that will allow for the deletion, insertion, inversion, or exchange of chromosomal DNA with high \ufb01delity (for review see Branda and Dymechi, 2004].",
      "In some cases, structural variations, such as copy number polymorphisms, exist (Feuk et al. , 2006); however, because of the nature of the genome assembly process, these will invariably be collapsed into a single contig that does not reflect the natural sequence. To address the technical challenges of whole-genome assembly, the human genome is released as defined \u2018builds\u2019 on a quarterly basis (Lander et al. , 2001; reviewed in Chapter 4). The increasing complexity of processes that map data to the genome implicitly involves some lag in availability of the most current sequence assembly.",
      "In practical terms, this has meant that we acquire many fragments, from a few hundred bases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the human genome, two unfinished \u2018draft\u2019 sequences were produced by different methods, one by the International Human Genome Sequencing Consortium (IHGSC) and one by Celera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).",
      "4 Assembling a View of the Human Genome Colin A. M. Semple Bioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK  4.1 Introduction The miraculous birth of the draft human genome sequence took place against the odds. It was only made possible by parallel revolutions in the technologies used to produce, store and analyse the sequence data, and by the development of new, large-scale consortia to organize and obtain funding for the work (Watson, 1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.",
      "THE HUMAN GENOME PROJECT IS generating vast amounts of new information at breakneck speed and causing a fundamental shift in disease research.Now with the availability of a nearly complete, high-accuracy sequence of the mouse genome (7), a new and powerful paradigm for biomedical research is established.The remarkable similarity of mouse and human genomes, in both synteny and sequence, unconditionally validates the mouse as an exceptional model organism for understanding human biology.The discovery among inbred mouse strains of defined regions of high and low genomic variation inherited primarily from two ancestral Mus subspecies (6) holds great promise to make mapping and positional cloning more rapid and feasible.Haplotype maps of inbred mouse strains combined with sophisticated delineation of their phenotypic variation and gene expression patterns will enable complex trait analysis on an unprecedented scale.This issue of Journal of Applied Physiology highlights inbred strain surveys exploring phenotypic variation in drug responses [see Crabbe et al. (1) and Watters et al. (8)  in this issue].These mouse initiatives demonstrate a viable, cost-effective alternative to human research requiring family studies, population linkage analysis, or genome-wide genotyping on a multitude of individuals for association mapping.",
      "How Many Genes are There in the Human Genome?",
      "The Landscape of Human Genome Variation",
      "In some cases, structural variations, such as copy number polymorphisms, exist (Feuk et al. , 2006); however, because of the nature of the genome assembly process, these will invariably be collapsed into a single contig that does not reflect the natural sequence. To address the technical challenges of whole-genome assembly, the human genome is released as defined \u2018builds\u2019 on a quarterly basis (Lander et al. , 2001; reviewed in Chapter 4). The increasing complexity of processes that map data to the genome implicitly involves some lag in availability of the most current sequence assembly.",
      "In practical terms, this has meant that we acquire many fragments, from a few hundred bases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the human genome, two unfinished \u2018draft\u2019 sequences were produced by different methods, one by the International Human Genome Sequencing Consortium (IHGSC) and one by Celera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).",
      "4 Assembling a View of the Human Genome Colin A. M. Semple Bioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK  4.1 Introduction The miraculous birth of the draft human genome sequence took place against the odds. It was only made possible by parallel revolutions in the technologies used to produce, store and analyse the sequence data, and by the development of new, large-scale consortia to organize and obtain funding for the work (Watson, 1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.",
      "Science 291:1304\u2013 1351 3. Lander ES et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860\u2013921 4. Engle LJ, Simpson CL, Landers JE (2006) Using high-throughput SNP technologies to study cancer. Oncogene 25:1594\u20131601 5. Elston RC, Anne Spence M (2006) Advances in statistical human genetics over the last 25 years. Stat Med 25:3049\u20133080 6. Larson GP et al (2005) Genetic linkage of prostate cancer risk to the chromosome 3 region bearing FHIT. Cancer Res 65:805\u2013814 7. Botstein D, Risch N (2003) Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease.",
      "McPherson JD, Marra M, Hillier L et al (2001) A physical map of the human genome. Nature 409:934\u2013941 13. Burke DT, Carle GF, Olson MV. (1987) Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236:806\u2013812 14. Fleischmann RD, Adams MD, White O et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd Science 269:496\u2013512 15. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796\u2013815 16.",
      "T he human genome has been cracked wide open in recent years and is spilling many of its secrets.More than 100 genome wide association studies have been conducted for scores of hu man diseases, identifying hun dreds of polymorphisms that are widely seen to influence disease risk.After many years in which the study of complex human traits was mired in false claims and methodologic inconsistencies, ge nomics has brought not only com prehensive representation of com mon variation but also welcome rigor in the interpretation of sta tistical evidence.Researchers now know how to properly account for most of the multiple hypothesis testing involved in mining the ge nome for associations, and most reported associations reflect real biologic causation.But do they matter?",
      "In some cases, structural variations, such as copy number polymorphisms, exist (Feuk et al. , 2006); however, because of the nature of the genome assembly process, these will invariably be collapsed into a single contig that does not reflect the natural sequence. To address the technical challenges of whole-genome assembly, the human genome is released as defined \u2018builds\u2019 on a quarterly basis (Lander et al. , 2001; reviewed in Chapter 4). The increasing complexity of processes that map data to the genome implicitly involves some lag in availability of the most current sequence assembly.",
      "In practical terms, this has meant that we acquire many fragments, from a few hundred bases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the human genome, two unfinished \u2018draft\u2019 sequences were produced by different methods, one by the International Human Genome Sequencing Consortium (IHGSC) and one by Celera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).",
      "In some cases, structural variations, such as copy number polymorphisms, exist (Feuk et al. , 2006); however, because of the nature of the genome assembly process, these will invariably be collapsed into a single contig that does not reflect the natural sequence. To address the technical challenges of whole-genome assembly, the human genome is released as defined \u2018builds\u2019 on a quarterly basis (Lander et al. , 2001; reviewed in Chapter 4). The increasing complexity of processes that map data to the genome implicitly involves some lag in availability of the most current sequence assembly.",
      "In practical terms, this has meant that we acquire many fragments, from a few hundred bases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the human genome, two unfinished \u2018draft\u2019 sequences were produced by different methods, one by the International Human Genome Sequencing Consortium (IHGSC) and one by Celera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).",
      "4 Assembling a View of the Human Genome Colin A. M. Semple Bioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK  4.1 Introduction The miraculous birth of the draft human genome sequence took place against the odds. It was only made possible by parallel revolutions in the technologies used to produce, store and analyse the sequence data, and by the development of new, large-scale consortia to organize and obtain funding for the work (Watson, 1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond."
    ],
    [
      "The hierarchical organization of GN\u2019s main Select and Search menu is simple and makes it relatively easy to find relevant data sets (Fig. 1). To get data, after opening the browser, select the most appropriate Species from the dropdown menu. For an open-ended search of phenotypes you can also select All Species at the bottom of the menu. The next steps are to select the Group, Type, and Data Set from the drop-down menus. For many groups, a combination of phenotypes, genotypes, and molecular data are available.",
      "GeneNetwork contains data from a wide range of species, from humans to soybeans, but most of the available phenotypic data is from mice. Within the mouse dataset there are groups of families, crosses, non-genetic groupings, and individual data. The type of dataset must be selected after defining the species and sample population. While genotypes, mRNA, methylated DNA, protein, metagenomic, and 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.23.424047; this version posted December 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. metabolome datasets are available (i.e.",
      "The hierarchical organization of GN\u2019s main Select and Search menu is simple and makes it relatively easy to find relevant data sets (Fig. 1). To get data, after opening the browser, select the most appropriate Species from the dropdown menu. For an open-ended search of phenotypes you can also select All Species at the bottom of the menu. The next steps are to select the Group, Type, and Data Set from the drop-down menus. For many groups, a combination of phenotypes, genotypes, and molecular data are available.",
      "Search and Data Retrieval Point your browser to www.genenetwork.org. This brings you by default to the Search page, from which you can retrieve data from many GN data sets. We will focus on the default data set, defined by Species: Mouse, Group: BXD, Type: Whole Brain, Database: INIA Brain mRNA M430 (Apr05) PDNN Enter \u201cKcnj*\u201d into the ALL or ANY field and click the Search button. Note the location and annotation of available potassium channel genes in the Search Results page that opens. Use the browser Back button to return to previous page.",
      "Add information on data provenance by giving details in Investigation, Protocols and ProtocolApplications  Customize Customize \u2018my\u2019 XGAP database with extended variants of Trait and Subject. In the online XGAP demonstrator, Probe traits have a sequence and genome location and Strain subjects have parent strains and (in)breeding method. Describe extensions using MOLGENIS language and the generator automatically changes XGAP database software to your research Upload  Upload data from measurement devices, public databases, collaborating XGAP databases, or a public XGAP repository with community data.",
      "However, a suitable and customizable integration of these elements to support high throughput genotype-tophenotype experiments is still needed [34]: dbGaP, GeneNetwork and the model organism databases are designed as international repositories and not to serve as general data infrastructure for individual projects; many of the existing bespoke data models are too complicated and specialized, hard to integrate between profiling technologies, or lack software support to easily connect to new analysis tools; and customization of the existing infrastructures dbGaP, GeneNetwork or other international repositories [35,36] or assembly of Bioconductor and generic model organism database components to suit particular experimental designs, organisms and biotechnologies still requires many minor and sometimes major manual changes in the software code that go beyond what individual lab bioinformaticians can or should do, and result in duplicated efforts between labs if attempted.",
      ", 2014; see Section 9). GeneNetwork is a database that enables searching for \u223c4000 phenotypes from multiple studies in the BXD, HXB, and in other recombinant inbred rodent families, as well as in other model organisms and even humans (Mulligan et al. , 2017). GeneNetwork employed a somewhat di\ufb00erent strategy than MPD in that it did not rely solely on researchers submitting their data. Instead the database operators extracted the data from the scienti\ufb01c literature and integrated them into a uniform format (Chesler et al. , 2003).",
      "GeneNetwork contains data from a wide range of species, from humans to soybeans, but most of the available phenotypic data is from mice. Within the mouse dataset there are groups of families, crosses, non-genetic groupings, and individual data. The type of dataset must be selected after defining the species and sample population. While genotypes, mRNA, methylated DNA, protein, metagenomic, and 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.23.424047; this version posted December 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. metabolome datasets are available (i.e.",
      "However, a suitable and customizable integration of these elements to support high throughput genotype-to-phenotype experiments is still needed[340]: dbGaP, GeneNetwork and the model organism databases are designed as international repositories and not to serve as general data infrastructure for individual projects; many of the existing bespoke data models are too complicated and specialized, hard to integrate between pro\ufb01ling technologies, or lack software support to easily connect to new analysis tools; and customization of the existing infrastructures dbGaP, GeneNetwork or other international repositories[384, 154] or assembly of Bioconductor and generic model organism database components to suit particular experimental designs, organisms and biotechnologies still requires many minor and sometimes major manual changes 38 2.1.",
      "All data presented in this paper were deposited in the online database GeneNetwork (www.genenetwork.org), an open web resource that contains genotypic, gene expression, and phenotypic data from several genetic reference populations of multiple species (e.g. mouse, rat and human) and various cell types and tissues.35;36 It provides a valuable tool to integrate gene networks and phenotypic traits, and also allows cross-cell type and cross-species comparative gene expression and eQTL analyses.",
      "There is a good chance that you will be able to apply these new techniques to specific problems, even while you read. If you have a computer with an Internet connection\u2014so much the better, and you can read and work along at the same time. This short review and primer will take you on a tour of a web site called GeneNetwork that embeds many large data sets that are relevant to studies of behavioral variation. GeneNetwork is an unusual site because it contains a coherent \"universe\" of data, as well as many powerful analytic tools.",
      "The GeneNetwork database provides open access to BXD and other RI strain derived microarray data, single nucleotide polymorphism (SNP) data, and phenotypic data for quantitative trait loci analysis and gene expression correlation analyses. Gene expression data were exported for manually selected probes in the PDNN hippocampus database (Hippocampus Consortium M430v2), and the PDNN whole brain database (INIA Brain mRNA M430). The Hippocampus database was chosen as one of the most elaborate brain databases, as well as most highly recommended dataset on GeneNetwork itself (http://www.genenetwork.org/ webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112).",
      "2016) and can also be accessed in GeneNetwork by entering Record ID 18494 in the Get Any space on the Search page and clicking on the Search button. Alternatively, enter data by hand into the designated boxes provided by GeneNetwork. These latter options also allow for the inclusion of trait variance. It is a good idea to name the trait in the box provided. Then click Next, and manually enter the data for each RI strain, F1, and founder strain. 3  Author Manuscript  After entering the data, click on the blue plus sign button called Add.",
      "To submit multiple phenotypes at the same time, select the option for Batch Submission under the Home tab. This allows users to submit up to 100 traits for analysis by GeneNetwork. Here, select BXD as the cross or RI set to analyze from the first pull-down menu. The phenotype file should follow the format described in the Sample text (http:// genenetwork.org/sample.txt). After uploading the appropriate file using the Browse button, enter a name for the file in the Dataset space. The data will be stored in the GeneNetwork server for 24 hours. Click Next.",
      "Author Manuscript  Materials Here we will provide detailed instructions for using GeneNetwork along with some \u201cworked\u201d examples taken from the recent study of intravenous cocaine self-administration by Dickson et al. (2016) in BXD RI mice. A complete overview of GeneNetwork is beyond the scope of this protocol, but is extensively covered in elsewhere (see Mulligan et al. 2016; Williams & Mulligan 2012 for excellent reviews on GeneNetwork). A computer with an internet connection and current web browser. See the GeneNetwork.org site for information on supported browser versions. Author Manuscript  Method Entering Data  Author Manuscript  1  Link to http://www.genenetwork.org.",
      "Species in GenAge model organisms",
      "Data are reviewed before entry in GeneNetwork by the senior author. Phenotypes are currently split into 15 broad phenotypic categories (Supplementary Data 1). Phenome curation and description was initiated by R.W.W. and Dr Elissa Chesler in 2002 by literature review and data extraction. The early work is described brie\ufb02y in Chesler et al.51,52. Most work over the past 5 years has been performed by two of the coauthors (R.W.W. and M.K.M.). We have used a controlled vocabulary and set of rules described here (http://www.genenetwork.org/faq.html#Q-22).",
      "9) To bring your data to GeneWeaver, click on the GeneWeaver icon, making sure to be previously login to your GeneWeaver account. You will be brought to the GeneSet upload page with the Genes Uploaded and the Geneweaver Analysis Platform  139  Fig. 5 Default settings at GeneNetwork.org are set to search \u201cMouse\u201d, \u201cPhenotypes\u201d, from among the \u201cBXD Published Phenotypes\u201d data set. Here the term nociception was searched for  Fig. 6 The search results page in GeneNetwork showing the 33 records retrieved from the phenotype search for nociception.",
      "Users may also share their data with other users selectively, make it public, or keep it restricted to a private account. Data can be imported by users, uploading their gene set data directly or exporting to GeneWeaver from within another online resource such as Neuro Informatics Framework (NIF) [8], Grappa [9], Mouse Phenome Database (MPD) [10] or GeneNetwork [11]. These datasets can then be added to your collection to be analyzed together with other gene sets retrieved from the GeneWeaver database. To begin a GeneWeaver analysis a user must collect \u201cGeneSets\u201d together in a \u201cProject\u201d.",
      "Alternatively the spreadsheet can be saved as a .txt file and uploaded by clicking on \u201cSwitch to file upload.\u201d Once complete click on upload GeneSet. 7. Once completed you are taken to the GeneSet detail page. If there are errors in your uploaded data you can correct them by clicking on \u201cEdit\u201d. 8. Use the Add Selected to Project, and create a new project, e.g. \u201cChronic Cocaine\u201d. 9. Now using the Search function populate this project with additional gene sets related to this study trying Queries such as \u201cCocaine Addiction\u201d, \u201cChronic Cocaine\u201d."
    ],
    [
      "Oxidative stress and mitochondrial DNA  Not long after it was discovered that mitochondria have their own genetic apparatus, Harman proposed that mitochondria play a central role in the free radical theory of aging [16].This idea was developed further by Miquel et al. [330], and the notion that mtDNA mutagenesis played a role in aging took hold.The phenotypical importance of mutations in mtDNA was demonstrated by Wallace et al. [331] and Holt et al. [332], who first showed that Leber's hereditary optic neuropathy and mitochondrial myopathies were caused by mtDNA mutations (reviewed in [333]).Because mtDNA is so close to the site of mitochondrial ROS production, it is exposed to considerably higher oxidative stress, resulting in 3-fold higher levels of DNA oxidative damage (the previously quoted 20-fold figure is apparently due to an isolation artifact [334,335]).In the 1990s a series of papers reported that the frequency of mitochondrial DNA deletions increases dramatically with age, being essentially undetectable in young individuals and reaching levels as high as 2% of mtDNA in old individuals.This age-related increase in mtDNA deletions was found in organisms as diverse as worms, mice, and humans (reviewed in [24,336]).The same is also true with mtDNA point mutations [337,338].Certain mtDNA polymorphisms have been found in increased frequency in centenarians, implying a protective effect during aging [339][340][341].Similar protective effects of mtDNA polymorphisms have been reported for the age-related neurodegenerative condition, Parkinson's disease [342].",
      "Variation in the structure and function of mitochondria underlies variation in organismal energetics broadly (Seebacher et al., 2010) and evidence for the importance of mitochondrial function in the evolution of natural populations continues to accumulate (Ballard and Melvin, 2010;Glanville et al., 2012;Hicks et al., 2012;Kurbalija Novi\u010di\u0107 et al., 2015).For example, variation in mitochondrial DNA sequences (mtDNA) can determine whole-organism metabolism, i.e., the rate at which organisms process energy from their environment, a phenomenon widespread across animal taxa (Arnqvist et al., 2010;Ballard et al., 2007;Ballard and Pichaud, 2014;Havird et al., 2019;Hood et al., 2018;James et al., 2016;Wolff et al., 2014).Specifically, mtDNA sequence variants are linked to functional metabolic differences in fish (Chapdelaine et al., 2020;Flight et al., 2011;Healy et al., 2019), birds (Scott et al., 2011), and mammals (Fontanillas et al., 2005), including humans (Amo and Brand, 2007;Dato et al., 2004;Niemi et al., 2003;Tranah et al., 2011).These mtDNA variants are often correlated with environmental factors such as temperature and altitude (Storz et al., 2010).However, other studies attempting to link mitochondrial function to mitochondrial DNA (mtDNA) sequence variation or environmental factors have offered mixed reports (Amo and Brand, 2007;Flight et al., 2011;Fontanillas et al., 2005;Hicks et al., 2012).",
      "The results here point to several potentially fruitful research directions.We have identified how nonsynonymous mutations in the mitochondrial genome associate with variation in whole-organism metabolism (including CytB, ND1, ND5 and ND6).A next step will be to characterize the molecular details of how these changes affect molecular function.It would also be beneficial to describe how variation in cellular oxygen consumption rate scales up to determine whole-organism metabolic rate across a range of temperatures, thus identifying potential mismatches across levels of organization that may impact organismal performance (Gangloff and Telemeco, 2018).While the interconnected processes that shape organismal and population-level responses to environmental variation do not lend themselves to simple narratives, and many molecular processes interact to produce the emergent ecotypic divergences at the phenotypic level, it is clear that the mitochondria play a central role even as that role may change across populations and ecological contexts (Fig. 1).Research within well-characterized natural systems, such as these garter snake populations, can offer illustrative case studies of how mitochondria respond to their environments, and thus impact physiological pathways and evolutionary patterns, creating variation in life histories and aging.",
      "Despite the complexities underlying observed variation in mitochondrial function, recent work has demonstrated examples of how evolution and plasticity in mitochondrial function across populations within a species can shape life histories.For example, evidence from Drosophila has demonstrated the effect of temperature on components of the ETC and has linked mtDNA variants to metabolic thermosensitivity (Pichaud et al., 2012), to differences in whole-organism metabolic rates (Kurbalija Novi\u010di\u0107 et al., 2015), and to fitness-related traits (Ballard et al., 2007;Pichaud et al., 2011;Pichaud et al., 2010).In general, studies in birds and mammals demonstrate that mitochondria of longer-lived species are more efficient in ATP production, produce less reactive oxygen species, and demonstrate increased antioxidant capacities (Barja and Herrero, 2000;Ku et al., 1993;Lambert et al., 2007).While some studies in lizards and snakes demonstrate a similar pattern (Olsson et al., 2008;Robert et al., 2007), the extent to which these results are generalizable across vertebrate taxa is not yet known.The diversity of life-history traits and immense variation in longevity demonstrated by reptiles, both within and among species, make these taxa ideal candidates for understanding how variation in mitochondrial physiology drives this variation in whole-organism traits (reviewed in Hoekstra et al., 2019).Such work has moved to the forefront with a recent focus on the ecological and evolutionary significance of aging processes in wild populations (reviewed in Nussey et al., 2013;Fletcher and Selman, 2015;Gaillard and Lema\u00eetre, 2020).",
      "Over evolutionary time, differential mortality rates are a selective force in shaping genetic structure.This results in divergence of a variety of physiological networks that shape, ultimately, patterns of aging and longevity in different habitats (Monaghan et al., 2008;Stojkovi\u0107 et al., 2017).Such selective pressures can have differential effects on the nuclear and mitochondrial genomes (McKenzie et al., 2019;Wolff et al., 2014).Genetic variation in the mitochondrial genome is known to drive mitochondrial function in many species (Ballard and Melvin, 2010;McKenzie et al., 2019;Novelletto et al., 2016) and we find this in our system as well.Whole organism metabolic rate varies with the mitochondrial genome haplogroups we identified in this study.T. elegans individuals with the introgressed T. sirtalis mitochondrial genome had the lowest metabolic rate and had 68 amino acid changes in the ETC genes relative to the T. elegans mitochondrial genomes.As species divergence are a continuation of population divergence, this introgression provides additional insight into how genetic variation can alter mitochondrial function.Whether the lower metabolic rate in our snakes with the introgressed mitochondrial genome is due to the fixed amino acid changes between the species or a mismatch between the coadapted nuclear and mitochondrially-encoded ETC proteins that could alter function of the mitochondria (Burton et al., 2013;Haenel, 2017;Rawson and Burton, 2002;Toews et al., 2014;Wolff et al., 2014) will require further comparisons to T. sirtalis individuals.",
      "Building on previous work in this system, the current study tests three primary hypotheses about how variation in mtDNA and mitochondrial function relate to variation in life-history traits and aging within this system (Fig. 1): (1) First, we test whether rates of cellular oxygen consumption in isolated immune cells exhibit patterns that are consistent with the hypothesis that cellular processes drive whole-organism senescence and aging, and if these patterns differ between the SA and FA ecotypes and between sexes.By measuring basal, ATP-production associated, and maximal rates of cellular oxygen consumption, we further test for evidence that phenotypic divergence is dependent on a specific aspect of oxidative phosphorylation within immune cells.The energetics of these cells are particularly important given their essential role in modulating disease and infection, important factors contributing to senescence (Metcalf et al., 2019).We predict that SA snakes will maintain levels of cellular oxygen consumption across age, whereas the FA snakes will show a decline with age, especially in ATP-associated rates, possibly due to continual degradation of electron transport chain functionality from accumulating oxidative damage and reduced DNA repair mechanisms (Robert and Bronikowski, 2010;Schwartz and Bronikowski, 2013). ( 2) Second, we expand our mitochondrial genomics dataset to quantify mtDNA genetic structure across the landscape and test whether mtDNA haplotypes, and alleles at a nonsynonymous SNP in the Cytochrome B (CytB) gene correlate with aging ecotypes. 
(3) Third, we test the hypothesis that variation in mtDNA correlates with whole-organism variation in metabolic rates, suggesting a pathway linking mitochondrial genetic variation in mtDNA to whole-organism energetics.We first test whether different haplotypes differ in resting metabolic rate.Then, we test the effects of the nonsynonymous SNP in CytB on resting metabolic rate.The CytB gene encodes a component of complex III of the ETC, and was previously found to segregate between these life-history ecotypes (Schwartz et al., 2015).This SNP results in an amino acid substitution from isoleucine (aliphatic, hydrophobic) to threonine (hydrophilic) on a region that comes into close contact with a nuclear-encoded subunit (Schwartz et al., 2015).We combine previously published and new data on whole-organism resting metabolic rates (oxygen consumption) to test for the effects of this nonsynonymous mutation in three populations where we find heterogeneity at this nucleotide, thus allowing us to disentangle the effects of shared environment (population) from sequence variation (SNP).We predict that this SNP will correlate with variation in whole-organism metabolic rate, demonstrating a putatively adaptive difference between the derived and ancestral sequence.By utilizing this integrative data setfrom genes to organelles to whole organisms to populationsin a known life-history context, we are able to test hypotheses across levels of organization to provide a more complete picture of the complicated story of mitochondria and life history (Havird et al., 2019).",
      "mtDNA Diversity  Unlike the nuclear genome, which requires both paternal and maternal contributions, mtDNA is inherited solely from the maternal lineage.It is unclear what advantage a uniparental mtDNA transmission confers, but one possibility is to minimize the number of distinct genomes to maximize the efficiency of a multi-genomic system (Hill et al. 2019).In fact, humans have developed complex, redundant mechanisms to ensure uniparental inheritance of mtDNA (DeLuca and O'Farrell 2012; Rojansky et al. 2016).Paternal mitochondria from sperms that enter into the egg during fertilization are actively and selectively eliminated via mitophagy through two E3 ligases, PARKIN, and MUL1 (Rojansky et al. 2016).PARKIN and MUL1 serve redundant purposes, and mitophagy becomes insufficient to eliminate paternal mtDNA only in the absence of both (Rojansky et al. 2016).Even though oocytes have  at least a thousand-fold more mitochondria than a sperm cell (Rojansky et al. 2016) and heteroplasmy levels would be very low if paternal mtDNA were to contaminate the embryo, the results can still be non-trivial.However, challenging this notion, a recent study provides evidence of potential paternal transmission (Luo et al. 2018), but awaits further corroborating studies (Lutz-Bonengel and Parson 2019).",
      "MtDNA exhibit a higher mutation rate than nuclear DNA, leading to significant population-level mtDNA polymorphisms (van Oven and Kayser 2009; Wallace 1999; Wallace and Chalkia 2013).In fact, the co-evolution of the mitonuclear genomes has been proposed to be driven by mtDNA mutations that select for compensatory changes in the nuclear genome (Havird and Sloan 2016).Populations that share similar mtDNA polymorphisms can be clustered into distinct haplogroups that are designated using all letters of the alphabet (i.e., A through Z).The mtDNA haplogroups represent major branch points on the mitochondrial phylogenetic tree that have strong regional ties around the globe, thus supporting the concept of a 'mitochondrial eve' (Wallace 1999).Haplogroups present inherently different mitonuclear interactions (Zaidi and Makova 2019), which eventually affect the aging process (Wolff et al. 2016).For example, one haplogroup commonly found in Ashkenazi Jews can interact with a specific enrichment of an amino acid sequence in complex I, and result in altered susceptibility to type 2 diabetes mellitus (Gershoni et al. 2014).The effect of mitonuclear compatibility on lifespan is influenced by environmental cues in flies (Drummond et al. 2019).It is unclear if mitonuclear compatibility is invariable throughout an organism's life, or antagonistically pleiotropic during aging, making it a difficult moving target to understand.",
      "Background: The accumulation of mitochondrial DNA (mtDNA) mutations, and the reduction of mtDNA copy number, both disrupt mitochondrial energetics, and may contribute to aging and age-associated phenotypes.However, there are few genetic and epidemiological studies on the spectra of blood mtDNA heteroplasmies, and the distribution of mtDNA copy numbers in different age groups and their impact on age-related phenotypes.In this work, we used whole-genome sequencing data of isolated peripheral blood mononuclear cells (PBMCs) from the UK10K project to investigate in parallel mtDNA heteroplasmy and copy number in 1511 women, between 17 and 85 years old, recruited in the TwinsUK cohorts.",
      "Discussion  Two significant questions are raised by the findings that mitochondrial DNA can integrate into the nucleus.Firstly, is this an extraordinarily rare event or is it occurring continually and at high frequency?Secondly, can such an event have pathological consequences to the organism?",
      "Phylogeny  The mtDNA is maternally inherited (120) by offspring through the oocyte cytoplasm; namely, the mother transmits her mtDNAs to all of her offspring, and her daughters transmit their mtDNAs to the next generation.This is the consequence of the fact that the mature oocyte such as mouse (304) or bovine (144) contains lOO-1,000 times more mtDNA than is found in somatic cells.Hence, the few sperm mtDNAs that enter the egg (130) have little effect on the genotype.The maternal inheritance results in sequentially diverged mtDNA polymorphism of modern human, as shown in Figure 2. The polymorphism derives from the combinations of small deletions and additions of <14 bp in noncoding region and base substitutions including some point mutations in coding region.",
      "There have been few reports on distinct correlation between mitochondrial morphology and human aging, except changes in number and size of mitochondria associated with age.Concerning the gross structure of mitochondria, the overwhelming importance of the cell nucleus in mitochondrial biogenesis should be noted, because the major parts of mitochondrial proteins are encoded by nuclear genes that are stable during life with the efficient repair mechanism for nDNA.",
      "Early data on DNA polymorphism detected by restriction endonuclease (263) have suggested that the evolutionary change of mtDNA in higher animals occurs mainly by nucleotide substitution rather than by deletion and insertion.The mtDNA nucleotide sequence evolves 6-17 times faster than comparable nuclear DNA gene sequences (51,52,405).Rapid evolution of mtDNA of higher primates including human, 0.02 base substitutions per site per million years, was calculated from the restriction map of mtDNA (51).Because orthodox recombination mechanism appears to be absent in mtDNA (128), germline mutation seems to go down to posterity as maternal inheritance from our common ancestor (57).",
      "A number of conclusions may be drawn from these results.Firstly, the data begin to answer the question of how closely mtDNA replication is kept in synchrony with nuclear DNA replication: it would appear to be regulated not by direct coupling to the nuclear DNA replication, but rather by the cell mass to be serviced by mitochondria.",
      "It may be that high mtDNA levels are indeed indicative of compromised mitochondria, but that the underlying defects are unrelated to alterations in the DNA sequence.Alternatively, elevated quantities of mtDNA might be associated with increased metabolic requirements of the embryo, rather than organelles of suboptimal function.It is possible that embryos produced by older oocytes are under some form of stress and therefore have larger energy requirements.Functional experiments will be required to address these questions.Whatever the underlying basis, the current study has unequivocally demonstrated that female reproductive aging is associated with changes in the mtDNA content at the blastocyst stage.",
      "Age-associated alterations of the mitochondrial genome occur in several different species; however, their physiological relevance remains unclear.The age-associated changes of mitochondrial DNA (mtDNA) include nucleotide point mutations and modifications, as well as deletions.In this review, we summarize the current literature on age-associated mtDNA mutations and deletions and comment on their abundance.A clear need exists for a more thorough evaluation of the total damage to the mitochondrial genome that accumulates in aged tissues.\u1b67 1997 Elsevier Science Inc.",
      "Mitochondrial genetics  One underexplored avenue for determining maternal risk for preterm birth involves the influence of the mitochondrial genome.The high mutation rate of mito chondrial DNA (mtDNA), together with the fact that most of its encoded proteins are evolutionarily con served, allowing for the selection of neutral or beneficial variants, has generated interest in defining human mtDNA variations and their roles in human biology [58].",
      "Clearly, as mitochondrial metabolic and genetic therapies advance for treating mitochondrial disease, they will also be available to enhance the personal lives of others.However, mitochondrial genetic variation appears to have been one of the primary factors that permitted our ancestors to adapt to new environments, survive adverse conditions, and multiple throughout the globe.Is it possible that by taking over control of individual mtDNA variation, we might also be setting our species on the road to functional decline and ultimately extinction?",
      "Mitochondrial therapeutics and performance enhancement  It is now clear that not all mtDNA variation is deleterious.Indeed, about 25% of all ancient mtDNA variation appears to have caused functional mitochondrial changes and thus been adaptive.Those mtDNA variants that are adapted to warm climates have mtDNA variants that result in tightly coupled OXPHOS, thus maximizing ATP output and minimizing heat production.The presence of these mtDNAs permits maximum muscle performance but also predispose sedentary individuals that consume excess calories to multiple problems.They would be prone to be overweight and their mitochondria would generate excessive ROS, thus making them susceptible to a variety of degenerative diseases, cancer and premature aging.Partially uncoupled mitochondria generate more heat, but at the expense of ATP production.Individual's with these variants are better able to tolerate the cold, and are less prone to obesity.They also generate less ROS making then resistant to degenerative diseases and aging.Finally, the mitochondria are why we breathe.Hence, mitochondrial variation might be an important factor in individual predisposition to altitude sickness.",
      "Human mtDNA codes for 13 essential polypeptide components of the mitochondrial oxidative phosphorylation (OXPHOS) system.mtDNA undergoes strict maternal inheritance, resulting in the absence of bi-parental recombination (Elson et al., 2001) and has a high mutation rate (Tuppen et al., 2010).As such, the evolution of mtDNA is characterised by the emergence of distinct lineages (or haplogroups) (Hernstadt et al., 2002).This results in high levels of mtDNA variation at the population level despite its rather small size, which is also illustrated by the large number of sub-haplogroups (van Oven and Kayser, 2009).Africa"
    ],
    [
      "Annotation, preprocessing and categorization of data  We used Ensembl (version 39) as the annotation reference database.Homology between human and mouse genes was derived via BioMart.The total number of genes under study comprises 15,277 Ensembl mouse genes representing the union of the homologue genes from all data sources.An overview about the T2DM specific datasets is given in Table 1.",
      "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carried out and the way the results are displayed. 4.4.1 Ensembl Ensembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbard et al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide a window on the draft genome, curating the results of a series of computational analyses.",
      "Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequence assemblies as its starting point, but it is now based upon NCBI assemblies. The Ensembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify \u2018confirmed\u2019 genes that are computationally predicted (by the GENSCAN gene prediction program) and also supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures.",
      "Data retrieval is extremely well catered for in Ensembl, with text searches of all database entries, BLAST searches of all sequences archived, and the availability of bulk downloads of all Ensembl data and even software source code. Ensembl annotation can also be viewed interactively on one\u2019s local machine with the Apollo viewer (Lewis et al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome Browser The UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl; it, too, provides annotation of the NCBI assemblies, and it displays a similar array of features, including confirmed genes from Ensembl.",
      "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carried out and the way the results are displayed. 4.4.1 Ensembl Ensembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbard et al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide a window on the draft genome, curating the results of a series of computational analyses.",
      "Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequence assemblies as its starting point, but it is now based upon NCBI assemblies. The Ensembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify \u2018confirmed\u2019 genes that are computationally predicted (by the GENSCAN gene prediction program) and also supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures.",
      "Data retrieval is extremely well catered for in Ensembl, with text searches of all database entries, BLAST searches of all sequences archived, and the availability of bulk downloads of all Ensembl data and even software source code. Ensembl annotation can also be viewed interactively on one\u2019s local machine with the Apollo viewer (Lewis et al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome Browser The UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl; it, too, provides annotation of the NCBI assemblies, and it displays a similar array of features, including confirmed genes from Ensembl.",
      "Ensembl provides a DAS reference server giving access to a wide range of specialist annotations of the human genome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy a hypothesis (\u2018top-down\u2019 data mining), or to interrogate a database in order to generate new hypotheses based on rigorous statistical correlations (\u2018bottom-up\u2019 data mining). Domain (protein) A region of special biological interest within a single protein sequence.",
      "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carried out and the way the results are displayed. 4.4.1 Ensembl Ensembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbard et al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide a window on the draft genome, curating the results of a series of computational analyses.",
      "Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequence assemblies as its starting point, but it is now based upon NCBI assemblies. The Ensembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify \u2018confirmed\u2019 genes that are computationally predicted (by the GENSCAN gene prediction program) and also supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures.",
      "Data retrieval is extremely well catered for in Ensembl, with text searches of all database entries, BLAST searches of all sequences archived, and the availability of bulk downloads of all Ensembl data and even software source code. Ensembl annotation can also be viewed interactively on one\u2019s local machine with the Apollo viewer (Lewis et al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome Browser The UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl; it, too, provides annotation of the NCBI assemblies, and it displays a similar array of features, including confirmed genes from Ensembl.",
      "Ensembl provides a DAS reference server giving access to a wide range of specialist annotations of the human genome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy a hypothesis (\u2018top-down\u2019 data mining), or to interrogate a database in order to generate new hypotheses based on rigorous statistical correlations (\u2018bottom-up\u2019 data mining). Domain (protein) A region of special biological interest within a single protein sequence.",
      "Ensembl  Ensembl is a publicly available web resource that contains automatically annotated genomes.It is integrated with other available biological databases like Jasper for binding motifs.It is a much larger web resource than T1Dbase, and contains general information about the human genome including variants.These include SNPs, insertions, deletions and somatic mutations (Alterations in DNA that occur after conception, meaning that they are not inherited) for several species.Data from Ensembl can be accessed in a number of ways.The names of all the SNPs that occur in the T1D susceptibility regions can be collected from Ensembl using the Biomart tool (Kinsella et al., 2011).To achieve this, the coordinates of the T1D regions obtained from T1Dbase are uploaded to the biomart query page which allows one to search the genome browser and retrieve data like the names, chromosomal positions, and genic positions (referred to as \"consequence to transcript\", in Ensembl) of the SNPs.The SNP genic positions tell if a SNP is located within a gene, adjacent to a gene or whether they occur in inter-genic positions between gene coding regions, as well as the particular genes in which they are located.",
      "Advantages of Ensembl:  There is a number of advantages to using Ensembl. (i) It is a larger web resource than T1Dbase and integrates data from a wide range of biological research sources into its database.Therefore, available information is quite comprehensive. (ii) Genic positions for 99% of the variants obtained from T1Dbase could be retrieved. (iii) Ensembl contains quality checks for genetic variants in its variation pipeline.A variant is flagged as failed if certain quality criteria are not met, for instance if none of the variant alleles match the reference allele of the variant.Generally, Ensembl was found to give more detailed information regarding the genic positions of variants compared to T1Dbase.",
      "Information about genes, including gene names, chromosomal coordinates, biotype (coding or non-coding), and number of splice variants, can also be retrieved from Ensembl.",
      "doi:10.1093/nar/gkp858 Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Giron CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T, Kahari AK, Keenan S, Martin FJ, Maurel T, McLaren W, Murphy DN, Nag R, Overduin B, Parker A, Patricio M, Perry E, Pignatelli M, Riat HS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, Zadissa A, Aken BL, Birney E, Harrow J, Kinsella R, Muffato M, Ruffier M, Searle SM, Spudich G, Trevanion SJ, Yates A, Zerbino DR, Flicek P (2015) Ensembl 2015.",
      "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carried out and the way the results are displayed. 4.4.1 Ensembl Ensembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbard et al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide a window on the draft genome, curating the results of a series of computational analyses.",
      "Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequence assemblies as its starting point, but it is now based upon NCBI assemblies. The Ensembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify \u2018confirmed\u2019 genes that are computationally predicted (by the GENSCAN gene prediction program) and also supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures.",
      "Data retrieval is extremely well catered for in Ensembl, with text searches of all database entries, BLAST searches of all sequences archived, and the availability of bulk downloads of all Ensembl data and even software source code. Ensembl annotation can also be viewed interactively on one\u2019s local machine with the Apollo viewer (Lewis et al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome Browser The UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl; it, too, provides annotation of the NCBI assemblies, and it displays a similar array of features, including confirmed genes from Ensembl.",
      "Ensembl provides a DAS reference server giving access to a wide range of specialist annotations of the human genome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy a hypothesis (\u2018top-down\u2019 data mining), or to interrogate a database in order to generate new hypotheses based on rigorous statistical correlations (\u2018bottom-up\u2019 data mining). Domain (protein) A region of special biological interest within a single protein sequence."
    ]
  ]
}