path: root/gnqa/paper1_eval/src/data/datasets/human/dataset_citizenscientist_general_2.json
blob: c8cde59c00b3d6f0f8b73c379fa81b70740f1770
{
  "question": [
    "What about recombination in human centromeres?",
    "How does recombination work in human centromeres?",
    "What about recombination in the human genome?",
    "Create a how to guide for genetic sequencing ",
    "What is the significance of the length of telomeres? "
  ],
  "answer": [
    "Human centromeres contain the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B. This motif is present on every chromosome except for the Y chromosome. It is necessary for the formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres. Mice lacking CENP-B are viable and fertile.",
    "In human centromeres, recombination involves the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B. This motif is present on every chromosome except for the Y chromosome. It is necessary for the formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres. Despite this, mice lacking CENP-B are viable and fertile. This suggests that recombination in human centromeres involves complex interactions between specific DNA motifs and proteins.",
    "The text mentions the application of site-specific recombinase technology, which allows for the deletion, insertion, inversion, or exchange of chromosomal DNA with high fidelity. This suggests that recombination, a process that can cause these types of genetic changes, is a significant aspect of the human genome.",
    "1. Start with an initial linkage or association in a genetic study.\n2. Use bioinformatics to analyze the genomic sequence, which is a common thread through every stage.\n3. Establish true orthology between genes in different species, especially if one is identified to play a role in a disease model.\n4. Apply the approaches to sequence data from any collection of organisms, with an emphasis on human genetics.\n5. Understand the genome structure and content to provide context for subsequent discussions.\n6. Utilize genotyping and sequencing technologies to produce, store, and analyze the sequence data.\n7. Use the genome sequence as a framework for integration of genetic and biological data.\n8. Analyze short-read, whole genome, DNA sequences.\n9. Perform comparative analysis of the genome sequences from members of a family to define sequencing errors and genetic heterozygosity.\n10. Track sequence changes/inconsistencies in inheritance from parent to offspring.",
    "The length of telomeres is significant as it is associated with aging and disease. Shorter telomeres are considered a sign of advanced age and have been linked to age-related diseases, mortality, and higher risk of heart disease and infection-related death. Longer telomeres, on the other hand, are associated with protection from age-related diseases, better cognitive function, and healthier aging. Telomere length can also be influenced by factors such as stress, with high-stress situations potentially leading to faster telomere shortening and thus, accelerated aging."
  ],
  "contexts": [
    [
      "Primate and mouse centromeric DNA contain the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B. In humans, the CENP-B box is present on every chromosome except for the Y chromosome [20]. It is paradoxically necessary for formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres, and mice lacking CENP-B are viable and fertile [21,22].",
      "Box 3 Mechanism of homologous recombination and end joining. The severe phenotype of the mouse mutants and the highly cancer-prone human syndromes highlight the importance of homologous recombination. Mouse KU mutants display sensitivity to agents that lead to breaks in DNA, and have immunological problems because the KU proteins are involved in V(D)J recombination of antibody gene sequences. In addition, these mutants display poor development, several features of premature ageing and increased apoptosis of postmitotic neurons in the developing brain. Mice with defects in DNA-PKcs (SCID mice) display a similar but generally milder phenotype. In contrast, XRCC4- and ligase IV-knockout mice seem more severe, with late embryonic lethality resulting from massive ATM- and p53-dependent neuronal apoptosis 33,38. Cells in G1 have only the homologous chromosome for recombination repair. However, this may be difficult to find in the complex genome. Moreover, it is potentially dangerous as a template for repair as it may lead to homozygosity for recessive mutations. As an alternative, the end-joining reaction simply links ends of a DSB together, without any template, using the end-binding KU70/80 complex and DNA-PKcs, followed by ligation by XRCC4-ligase4 (reviewed by 27,33; see the right panel of the figure, stages V-VII). The function of KU70/80 might involve end protection and approximating the ends, in addition to a signalling function by DNA-PKcs. End joining may be further facilitated when the ends are still held together through nucleosomes or other structures. End joining is sometimes associated with gain or loss of a few nucleotides if internal microhomologies are used for annealing before sealing. This implies the involvement of DNA polymerases and/or nucleases. Note that the KU complex is also involved in telomere metabolism 27,62. found to be lethal 34. Inactivation of ATR by itself is inviable already at the blastocyst stage. Inactivation of BRCA1 and BRCA2 in mice is also embryonically lethal; cell lines display defects in homologous recombination [35][36][37]. A tentative scenario for the homologous recombination reaction is depicted in the left panel of the figure. To promote strand invasion into homologous sequences, the 5'-3' exonuclease activity of the RAD50/MRE11/NBS1 complex (also a substrate for ATM phosphorylation) exposes both 3' ends 30 (I). RPA facilitates assembly of a RAD51 nucleoprotein filament that probably includes RAD51-related proteins XRCC2, XRCC3, RAD51B, C and D. RAD52 stimulates filament assembly (II). RAD51 has, like its Escherichia coli RecA counterpart, the ability to exchange the single strand with the same sequence from a double-stranded DNA molecule. Correct positioning of the sister chromatids by cohesins probably facilitates the identification of a homologous sequence. A candidate for the complex chromatin transactions associated with these DNA gymnastics is RAD54, a member of the SWI/SNF family of DNA-dependent ATPases. After identification of the identical sister chromatid sequence, the intact double-stranded copy is used as a template to properly heal the broken ends by DNA synthesis (III). Finally, the so-called Holliday junctions are resolved by resolvases 27,33,60 (IV). Homologous recombination involves the simultaneous action of large numbers of the same molecules, which are found to be concentrated in radiation-induced nuclear foci. These depend on, and also include, the BRCA1 and BRCA2 proteins 36. Recent evidence implicates BRCA2 directly or indirectly in nuclear translocation of RAD51 (ref. 61).",
      "This picture poses more questions than it seeks to answer. Is the grouping of the regions by product rather than by type of region correct? Given that the recombination fraction between HLA-A and HLA-B is of the order of .08%, and that this is likely to represent a distance of at least hundreds of thousands of nucleotides, how are the pieces put together over such relatively long distances? Is it possible that regions of the DNA loop out, so that transcripts can be made directly from noncontiguous DNA sequences, the loops being held in place by small RNAs as suggested for the control of splicing by Steitz and her colleagues [24] and by others [25]? If these small RNAs are coded for well outside the HLA region, does this provide a mechanism for control of expression of products by unlinked genes, as may be the case for one of the constituent polypeptides of the HLA-DR product? What might be the nature of the signals that control which of a multiple set of alternative regions is expressed by any given chromosome?",
      "Mamm Genome. 2006; 17:220-229. [PubMed: 16518689] 72. Romanoski CE, et al. Systems genetics analysis of gene-by-environment interactions in human cells. Am J Hum Genet. 2010; 86:399-410. [PubMed: 20170901] 73. Myers S, Freeman C, Auton A, Donnelly P, McVean G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nature Genet. 2008; 40:1124-1129. [PubMed: 19165926] 74. Myers S, et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science. 2010; 327:876-879. [PubMed: 20044541] 75. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nature Rev Genet. 2009; 10:392-404.",
      "Classification of common conserved sequences in mammalian intergenic regions. Hum. Mol. Genet. 2002, 11, 669-674. 25. Zhu, L.; Swergold, G.D.; Seldin, M.F. Examination of sequence homology between human chromosome 20 and the mouse genome: Intense conservation of many genomic elements. Hum. Genet. 2003, 113, 60-70. 26. Pevzner, P.; Tesler, G. Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. Proc. Natl. Acad. Sci. USA 2003, 100, 7672-7677. 27. Christmann, R.B.; Sampaio-Barros, P.; Stifano, G.; Borges, C.L.; de Carvalho, C.R.; Kairalla, R.; Parra, E.R.; Spira, A.; Simms, R.; Capellozzi, V.L.; et al.",
      "a The table lists proteins in which mutations have been shown to increase homologous recombination (HR), gross chromosomal rearrangements (GCRs), chromosomal instability (CIN), sister chromatid exchanges (SCEs), tri-nucleotide repeat expansions and contractions (TNR), telomere fusions (Tel fusion), or fragile telomeres (Tel fragility). A phenotype inside brackets ([ ]) indicates that it is caused by overexpression of the protein. For further details and references see Supplementary Table 1. Abbreviations: DSB, double-strand break; PCNA, proliferating cell nuclear antigen; RFC, replication factor C complex; SCF, Skp1-Cdc53/Cullin-F-box. Figure 3 Intermediates and chromosome structural alterations, as observed by different techniques. (a) Replication fork stalling, as monitored by 2D-gel electrophoresis and Southern analysis in yeast (for details about the technique, see Reference 161). (b) Slower human replication forks covering shorter DNA synthesis tracks, as determined by incorporation of IdU and CldU via DNA combing (52), which permits visualization of the process of replication on DNA fibers. (c) Accumulation of double-strand breaks (DSBs) or replicative stress, as inferred by H2AX foci or by H2AX pan staining, respectively, in human cells. (d) DSBs or ssDNA (single-stranded DNA) gaps as seen directly by nuclear \"comet tails\" via single-cell electrophoresis assays in human cells (52). (e) Sister-chromatid exchanges (SCEs), as determined by Giemsa staining in human cells (207). (f) Hyper-recombination, as determined by colony sectoring in yeast (5). (g) Gross chromosomal rearrangements (GCRs), as determined by spectral karyotyping in mouse cells (118). (h) Translocations, as visualized by pulse-field gel electrophoresis in yeast (168). (i) Fragile sites, as detected by mitotic spreads in human cells (109). (j) Telomere fusions, as determined by CO-FISH (chromosome-orientation fluorescent in situ hybridization) in mouse cells (124). (k) Anaphase bridges, presumably resulting from unfinished replication, dicentric chromosomes, and sister-chromatid nondisjunction, as detected by fluorescence microscopy in mouse cells. Arrows indicate the specific structural alterations referred to in each panel; in panel h, closed and open arrows indicate the position where the translocated or missing parental chromosome migrate or should migrate, respectively. When necessary, a normal control is shown on top of the panel, with the exception of panel a, which is shown on the left. Detailed description of each technique can be found in the references provided. Photos are from the laboratories of A. Nussenzweig (g), A. Losada (k), M. Blasco (j), L. Tora (i), and ours (all others). Abbreviations: HR, homologous recombination; NHEJ, nonhomologous end-joining.",
      "In humans, the pericentromeric region of chromosome 9 is densely packed with segmental genomic duplications (segdups) and is prone to microdeletions and microduplications.5 In order to evaluate this region for microdeletions and microduplications in family T, we screened genomic DNA from affected individual II-7 by arrayCGH with the Nimblegen HD2 platform with the previously described CHP-SKN sample 6 as the reference. Data were normalized and CNVs were called by identifying regions where Z-scores consistently deviated from the diploid mean. At 9q21.11, a genomic duplication of ~270 kb was apparent in the genomic DNA of II-7 (Figure 1D). Genomic duplications may or may not be in tandem with their parent segment and may be either in the same or inverted orientation.7 We developed primers that would uniquely amplify genomic DNA with the duplication under each of these conditions. Forward (5'-CCCAGCAGAAGCAATGGTGGTAGCC-3') and reverse (5'-GGTGGTGAATCCAAAAACACAAGAACAAAGTC-3') primers diagnostic for a tandem inverted duplication (Figure 2A) yielded products of expected size in family T relatives with hearing loss, but yielded no product in unaffected family T relatives (Figure 2B). Genotypes of all 58 participating relatives in family T indicated that the tandem inverted duplication was coinherited with hearing loss. The duplication spans approximately positions 71,705,804 to 71,974,823 (hg19) on chromosome 9 for a size of ~269,023 bp. The duplication includes the entire locus for the tight junction protein TJP2, which spans positions 71,788,971 to 71,870,124 (hg19).",
      "Chromosomal context of human NORs. Human NORs are positioned on the short arms of the acrocentric chromosomes that still remain unsequenced and thus missing from the current human genome draft, GRCh38.p7. Seeking an understanding of the chromosomal context of human NORs and to identify potential NOR regulatory elements, my laboratory has begun to characterize the sequences on both proximal (centromeric) and distal (telomeric) sides of the rDNA arrays (Fig. 3A; Floutsakou et al. 2013). Building on earlier reports of sequences distal and proximal to the rDNA array on HSA21 and HSA22, respectively (Worton et al. 1988; Sakai et al. 1995; Gonzalez and Sylvester 1997), 207 kb of sequence immediately proximal and 379 kb distal to rDNA arrays have been reported recently (Floutsakou et al. 2013). Consensus proximal junction (PJ) and distal junction (DJ) sequences were constructed mostly from chromosome 21 BACs (bacterial artificial chromosomes). Comparison of these sequences with BACs and cosmids derived from the other acrocentrics revealed that the PJ and DJ sequences are, respectively, 95% and 99% identical between all five acrocentric chromosomes. Conservation of DJ sequences among the acrocentrics is consistent with frequent recombination between the rDNA arrays on each of the acrocentric chromosomes (Worton et al. 1988). However, conservation of PJ sequences suggests that there must also be frequent recombination events in the interval between the centromere and rDNA arrays. Proximal sequences are almost entirely segmentally duplicated, similar to the regions bordering centromeres. Consequently, they are unlikely to contain any specific elements that would regulate the activity of the linked NOR. In contrast, the distal sequence is predominantly unique to the acrocentric short arms and is dominated by a very large inverted repeat. Each arm of the inverted repeat is >100 kb, and they share an average sequence identity of 80%. There is a large (40-kb) block of a 48-base-pair (bp) satellite repeat, CER, at the distal end of the DJ (Fig. 3A). CER blocks are found distal to the rDNA on all acrocentric chromosomes, with additional pericentromeric blocks on chromosomes 14 and 22. Finally, there are two blocks of a novel 138-bp tandem repeat, ACRO138, present within the DJ. The conservation of DJ sequence between the five human acrocentric chromosomes provides a unique opportunity to visualize NORs by FISH. Whereas the rDNA content of NORs can vary greatly, probing of human metaphase chromosome spreads with a DJ BAC results in signal that is consistent between NORs (Floutsakou et al. 2013). Using this probing scheme, it was observed that in most human cell lines analyzed, including multiple primary lines, at least one and sometimes as many as four of the NORs present have very little or no detectable rDNA (C van Vuuren and B McStay, unpubl.). Many studies have used silver staining of metaphase spreads prepared from stimulated human peripheral blood lymphocytes to determine how many NORs are active in normal human cells. The number of active NORs ranges from seven to 10, with an average of eight (Heliot et al. 2000). Possibly, NORs with low rDNA content are active but fall below a detection threshold in silver staining. At this point, it is worth considering the distribution of active versus silent rDNA repeats in humans and other mammals. If 50% of rDNA repeats are truly repressed, there are insufficient \"silent\" NORs to house them. We must conclude that active NORs are a mosaic of active and silent repeats.",
      "However, excluding some cases, recombination suppression occurs in a small genomic tract where these genes are located, and it does not extend over most of the sex chromosome pair, as occurs in mammals and birds (Bergero and Charlesworth, 2009). It is not clear if this suppression occurs by the presence of inversions or as a modulation of the recombination mechanism itself, but both could be involved (Bergero and Charlesworth, 2009). Evidence of recombination in the SD region in sex reversal individuals supports the second hypothesis.",
      "Orthologous chromosomes between baboon and human",
      "Lichter P, Cremer T, Borden J, Manuelidis L, Ward DC (1988) Delineation of individual human chromosomes in metaphase and interphase cells by in situ suppression hybridization using recombinant DNA libraries. Hum Genet 80:224-234 3. Jang W, Yonescu R, Knutsen T, Brown T, Reppert T, Sirotkin K, Schuler GD, Ried T, Kirsch IR (2006) Linking the human cytogenetic map with nucleotide sequence: the CCAP clone set. Cancer Genet Cytogenet 168:89-97 4. Nature Genet 1:222-225 5. Foote S, Vollrath D, Hilton A, Page DC (1992) The human Y chromosome: overlapping DNA clones spanning the euchromatic region. Science 258:60-66 6. Chumakov IM, Rigault P, Le Gall I et al (1995) A YAC contig map of the human genome. Nature 377:175-297 7. Hudson TJ, Stein LD, Gerety SS et al (1995) An STS-based map of the human genome. Science 270:1945-1954 8. Coffey AJ, Roberts RG, Green ED et al (1992) Construction of a 2.6-Mb contig in yeast artificial chromosomes spanning the human dystrophin gene using an STS-based approach. Genomics 12:474-484 9.",
      "Figure 4 Schematic depiction of proposed mechanisms for observed intrachromosomal rearrangements. The blue and red arrows indicate the orientation of the integrated plasmid loci and the recovered mouse sequences, respectively, on the original non-rearranged chromosome (left column). All four combinations are given for an arbitrarily orientated chromosome (green line). The middle column shows how two breakpoints (lightning signs) could lead to the inversion or deletion of the encompassed chromosomal sequence (yellow-orange dual tone line) and result in a recoverable mutation in the right column. The last row indicates the two options for a transposition, in which either the transgene locus or the recovered mouse sequence is copied or excised (as indicated by the pink and light blue arrows) and integrates in the breakpoint at the other location. As mentioned above, by taking into account that for a genome rearrangement to be detected, the 5' plasmid sequence of the breakpoint in lacZ must remain intact and end immediately in front of the recovered mouse sequence, the simplest intrachromosomal mutation that could have taken place was inferred (Fig. 4). Rearrangements with breakpoints in the mouse genome on either site of the integrated plasmid concatamer, but with reversely orientated sequences, could be inversions (Fig. 4). Rearrangements in the direction of the integrated plasmids, proximal for chromosome 3 and distal for chromosome 4 (Fig. 3), with similarly orientated breakpoints in the mouse genome, could be deletions (Fig. 4). Rearrangements in the reverse direction of the integrated plasmids, with reversely orientated mouse sequences, are more complicated and might be owing to transpositions (Fig. 4). According to these schemes, half of the intrachromosomal rearrangements would have been inversions, whereas deletions and transpositions each made up one fourth (Fig. 3). Alternatively, these rearrangements could be explained by translocations involving the transgene clusters integrated on either the homolog or the other chromosome.",
      "FIGURE 3. Telomere arrays of chicken and human chromosomes: the chicken genome contains more telomere sequence than the human genome. Chicken (a) and human (b) metaphase chromosomes and interphase cells hybridized with a telomeric sequence-peptide nucleic acid (PNA)-fluorescein probe. Human and chicken slide preparations were processed, and images were captured using the same parameters. Qualitatively, the telomere-positive fluorescent signals (white spots) from chicken cells and chromosomes have greater intensity than those of human (4',6-diamidino-2-phenylindole, DAPI counterstain).",
      "In a previous study on the accumulation of spontaneous genome rearrangements in normal mice with aging, we discovered that 50% of the events were intrachromosomal, i.e., large deletions or inversions [22]. In contrast, in this present study most of the rearrangements resulted from inter-chromosomal recombination, in both the Ercc1-mutant and control animals (Table 3). Previously, we used lacZ-plasmid line 60 mice with integration sites on Chromosomes 3 and 4, while in the present study line 30 mice were used with a single integration site on Chromosome 11. This indicates that the relative frequency of translocations is founder line specific and could be due to the position of the lacZ-plasmid cluster on the chromosome. Indeed, the chromosomal integration sites in line 60 mice are in the E1 region of Chromosome 3 (half way along the chromosome) and the C5 region of Chromosome 4 (two-thirds of the way along the chromosome) [22], while the integration site of founder line 30 (used in this study) is on the centromeric tip of Chromosome 11 (region A1-A2; not shown). The proximal location on Chromosome 11 prevents the detection of all but relatively small intra-chromosomal recombinations; larger events would lead to loss of the centromere and, therefore, the entire chromosome. If the orientation of the integration site in line 30, which is currently unknown, is towards the centromere, transpositions and inversions towards the distal end are the only detectable large intra-chromosomal rearrangements (for a detailed explanation of the different chromosomal events that can occur at the lacZ locus, see [22])."
    ],
    [
      "Primate and mouse centromeric DNA contain the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B. In humans, the CENP-B box is present on every chromosome except for the Y chromosome [20]. It is paradoxically necessary for formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres, and mice lacking CENP-B are viable and fertile [21,22].",
      "Box 3 Mechanism of homologous recombination and end joining. The severe phenotype of the mouse mutants and the highly cancer-prone human syndromes highlight the importance of homologous recombination. Mouse KU mutants display sensitivity to agents that lead to breaks in DNA, and have immunological problems because the KU proteins are involved in V(D)J recombination of antibody gene sequences. In addition, these mutants display poor development, several features of premature ageing and increased apoptosis of postmitotic neurons in the developing brain. Mice with defects in DNA-PKcs (SCID mice) display a similar but generally milder phenotype. In contrast, XRCC4- and ligase IV-knockout mice seem more severe, with late embryonic lethality resulting from massive ATM- and p53-dependent neuronal apoptosis 33,38. Cells in G1 have only the homologous chromosome for recombination repair. However, this may be difficult to find in the complex genome. Moreover, it is potentially dangerous as a template for repair as it may lead to homozygosity for recessive mutations. As an alternative, the end-joining reaction simply links ends of a DSB together, without any template, using the end-binding KU70/80 complex and DNA-PKcs, followed by ligation by XRCC4-ligase4 (reviewed by 27,33; see the right panel of the figure, stages V-VII). The function of KU70/80 might involve end protection and approximating the ends, in addition to a signalling function by DNA-PKcs. End joining may be further facilitated when the ends are still held together through nucleosomes or other structures. End joining is sometimes associated with gain or loss of a few nucleotides if internal microhomologies are used for annealing before sealing. This implies the involvement of DNA polymerases and/or nucleases. Note that the KU complex is also involved in telomere metabolism 27,62. found to be lethal 34. Inactivation of ATR by itself is inviable already at the blastocyst stage. Inactivation of BRCA1 and BRCA2 in mice is also embryonically lethal; cell lines display defects in homologous recombination [35][36][37]. A tentative scenario for the homologous recombination reaction is depicted in the left panel of the figure. To promote strand invasion into homologous sequences, the 5'-3' exonuclease activity of the RAD50/MRE11/NBS1 complex (also a substrate for ATM phosphorylation) exposes both 3' ends 30 (I). RPA facilitates assembly of a RAD51 nucleoprotein filament that probably includes RAD51-related proteins XRCC2, XRCC3, RAD51B, C and D. RAD52 stimulates filament assembly (II). RAD51 has, like its Escherichia coli RecA counterpart, the ability to exchange the single strand with the same sequence from a double-stranded DNA molecule. Correct positioning of the sister chromatids by cohesins probably facilitates the identification of a homologous sequence. A candidate for the complex chromatin transactions associated with these DNA gymnastics is RAD54, a member of the SWI/SNF family of DNA-dependent ATPases. After identification of the identical sister chromatid sequence, the intact double-stranded copy is used as a template to properly heal the broken ends by DNA synthesis (III). Finally, the so-called Holliday junctions are resolved by resolvases 27,33,60 (IV). Homologous recombination involves the simultaneous action of large numbers of the same molecules, which are found to be concentrated in radiation-induced nuclear foci. These depend on, and also include, the BRCA1 and BRCA2 proteins 36. Recent evidence implicates BRCA2 directly or indirectly in nuclear translocation of RAD51 (ref. 61).",
      "Homologous Recombination Repair. During HR, broken DNA ends are first recognized by the Mre11-Rad50-Xrs2 (MRX) complex (Mre11-Rad50-Nbs1 (MRN) complex in mammals) and are processed by MRE11 to 3' single-strand DNA (ssDNA) tails through a series of 5'-3' strand resection activities [6,9,45]. The C-terminus of NBS1 interacts with ATM and recruits it to DSBs [46]. ATM belongs to the phosphatidylinositol-3-like kinase-related kinase (PIKK) family and plays an important role in the propagation of the initial DSB lesion by phosphorylating a number of downstream substrates. In undamaged cells, ATM forms inactive dimers or multimers. Upon induction of DSBs, ATM is autophosphorylated at serine 1981, leading to its dissociation into activated monomers [47]. Activated ATM rapidly phosphorylates and activates downstream repair factors to directly promote their recruitment to sites of DNA damage. Perhaps the most important event is the ATM-dependent phosphorylation of the histone variant H2AX at the C-terminal of the protein, corresponding to Ser139 (γ-H2AX) [48]. Other substrates for activated ATM include the proteins SMC1, NBS1, CHK2, p53, BRCA1 and MDC1 [49]. Key amongst these substrates are the Chk2 kinase and p53, which act to reduce cyclin-dependent kinase (CDK) activity and arrest cells in the various stages of cell cycle to allow time for the completion of DNA repair. Following recruitment and activation of ATM, BRCA1, BRCA2 and RAD52 epistasis group proteins including XRCC2, XRCC3, RAD51B, RAD51C and RAD51D [50] are also recruited to DSBs to further transmit signals to downstream processing enzymes. The single-strand overhangs are then rapidly bound by ssDNA-binding protein replication protein A (RPA), and recruit Rad51 and Rad52 to the damaged sites [45]. Loading of Rad51 onto the ssDNA tail subsequently results in the formation of ssDNA-Rad51 nucleoprotein filament, which then searches for its homologous counterpart in the corresponding intact sister chromatid. If the specific region of duplex DNA is found, strand invasion is initiated in the presence of another set of HR facilitating proteins (Rad54, Rad55, and Rad57) followed by strand exchange and joint molecule formation [45,51]. Once the Holliday junctions are resolved, distal broken ends are sealed through DNA synthesis by DNA polymerase, resulting in an error-free repair event and preserving genetic contents [44]. In response to antigen or humoral stimulation, class switch recombination (CSR) further diversifies antibodies by switching their isotypes [63]. CSR occurs between two switch (S) regions located upstream of C H (constant regions of immunoglobulin heavy chain). Similar to V(D)J recombination, CSR also involves DSB generation and NHEJ. Upon humoral stimulation, activation-induced cytidine deaminase (AID) deaminates deoxycytidine (dC) resulting in deoxyuracil (dU) bases on both strands of two transcriptionally active S regions [64]. The dU is excised by the uracil DNA glycosylases (UNG) and the resultant abasic site is further cut by apurinic/apyrimidinic endonuclease 1/2 (APE-1/2), generating single strand breaks (SSBs). Either two adjacent SSBs on opposite strands spontaneously lead to one DSB, or the MMR machinery is triggered to convert SSB to DSB [65]. Deficiency of AID, UNG, APE or any of the MMR components, including Msh2, Msh6, Mlh1, Pms2 and Exo1, leads to loss or reduction of CSR in B cells [63]. After DSB formation, the NHEJ pathway is activated. The Ku70-Ku80 heterodimers bind to the DNA ends and recruit necessary proteins to process the DNA ends to facilitate the ligation mediated by Xrcc4-DNA ligase IV complex [66]. CSR in Ku70-/- and Ku80-/- B cells is nearly ablated [67,68]. Either Xrcc4 or DNA ligase IV deficiency causes significant reduction in CSR [69,70]. While compatible ends are joined rapidly by canonical NHEJ components, complex lesions need substantial processing and are re-ligated slowly. In the latter case, ATM, 53BP1 and MRN complex cooperate with canonical NHEJ components to mediate end-joining recombination. Disruption of ATM, 53BP1 or MRN complex in mice leads to defects in either V(D)J recombination or CSR or both [71][72][73][74]. Recent studies in mouse models deficient in NHEJ core components revealed a robust alt-NHEJ pathway that utilizes microhomology to mediate the end joining in CSR [69,70]. Alt-NHEJ leads to Ig locus deletion and translocation. However, the molecular mechanisms underlying alt-NHEJ are not well elucidated so far.",
      "Action of RecQ helicases on DNA recombination intermediatesTwo key intermediates of HR (homologous recombination) are the four-stranded HJ and the three-stranded D-loop.An early important observation was made that BLM and WRN selectively bind HJ structures in vitro and are capable of efficiently promoting ATP-dependent HJ branch migration through greater than 2 kb of DNA [54,55], suggesting they may act upon such four-stranded structures at blocked or collapsed replication forks to allow processing into mature recombinants.Other RecQ helicases that have been shown to efficiently unwind HJ structures include E. coli RecQ, Sgs1, RECQ1 and RECQ5 [15,16,56,57].The bacterial HJ core recognition protein RuvA inhibits HJ branch migration by BLM, WRN, RECQ1 or RECQ5 [15,16,55,58], suggesting that these RecQ helicases specifically recognize the HJ core where they initiate unwinding.",
      "This picture poses more questions than it seeks to answer.Is the grouping of the regions by product rather than by type of region correct?Given that the recombina- tion fraction between HLA-A and HLA-B is of the order of .08%,and that this is likely to represent a distance of at least hundreds of thousands of nucleotides, how are the pieces put together over such relatively long distances?Is it possible that regions of the DNA loop out, so that transcripts can be made directly from noncon- tiguous DNA sequences, the loops being held in place by small RNAs as suggested for the control of splicing by Steitz, and her colleagues [24] and by others [25]?If these small RNAs are coded for well outside the HLA region, does this provide a mechanism for control of expression of products by unlinked genes, as may be the case for one of the constituent polypeptides of the HLA-DR product?What might be the nature of the signals that control which of a multiple set of alternative regions is expressed by any given chromosome?",
      "Mamm Genome. 2006; 17:220229. [PubMed: 16518689]72. Romanoski CE, et al. Systems genetics analysis of gene-by-environment interactions in humancells. Am J Hum Genet. 2010; 86:399410. [PubMed: 20170901]73. Myers S, Freeman C, Auton A, Donnelly P, McVean G. A common sequence motif associatedwith recombination hot spots and genome instability in humans. Nature Genet. 2008; 40:11241129. [PubMed: 19165926]74. Myers S, et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meioticrecombination. Science. 2010; 327:876879. [PubMed: 20044541]75. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nature Rev Genet. 2009; 10:392404.",
      "a The table lists proteins in which mutations have been shown to increase homologous recombination (HR), gross chromosomal rearrangements (GCRs), chromosomal instability (CIN), sister chromatid exchanges (SCEs), tri-nucleotide repeat expansions and contractions (TNR), telomere fusions (Tel fusion), or fragile telomeres (Tel fragility).A phenotype inside brackets ([ ]) indicates that it is caused by overexpression of the protein.For further details and references see Supplementary Table1.Abbreviations: DSB, double-strand break; PCNA, proliferating cell nuclear antigen; RFC, replication factor C complex; SCF, Skp1-Cdc53/Cullin-F-box.Figure 3 Intermediates and chromosome structural alterations, as observed by different techniques. (a) Replication fork stalling, as monitored by 2D-gel electrophoresis and Southern analysis in yeast (for details about the technique, see Reference 161). (b) Slower human replication forks covering shorter DNA synthesis tracks, as determined by incorporation of IdU and CldU via DNA combing (52), which permits visualization of the process of replication on DNA fibers. (c) Accumulation of double-strand breaks (DSBs) or replicative stress, as inferred by H2AX foci or by H2AX pan staining, respectively, in human cells. (d ) DSBs or ssDNA (single-stranded DNA) gaps as seen directly by nuclear \"comet tails\" via single-cell electrophoresis assays in human cells (52). (e) Sister-chromatid exchanges (SCEs), as determined by Giemsa staining in human cells (207). ( f ) Hyper-recombination, as determined by colony sectoring in yeast (5). ( g) Gross chromosomal rearrangements (GCRs), as determined by spectral karyotyping in mouse cells (118). (h) Translocations, as visualized by pulse-field gel electrophoresis in yeast (168). (i ) Fragile sites, as detected by mitotic spreads in human cells (109). ( j) Telomere fusions, as determined by CO-FISH (chromosome-orientation fluorescent in situ hybridization) in mouse cells (124). 
(k) Anaphase bridges, presumably resulting from unfinished replication, dicentric chromosomes, and sister-chromatid nondisjunction, as detected by fluorescence microscopy in mouse cells. Arrows indicate the specific structural alterations referred to in each panel; in panel h, closed and open arrows indicate the position where the translocated or missing parental chromosome migrates or should migrate, respectively. When necessary, a normal control is shown on top of the panel, with the exception of panel a, where it is shown on the left. A detailed description of each technique can be found in the references provided. Photos are from the laboratories of A. Nussenzweig (g), A. Losada (k), M. Blasco (j), L. Tora (i), and ours (all others). Abbreviations: HR, homologous recombination; NHEJ, nonhomologous end-joining.",
      "Chromosomal context of human NORsHuman NORs are positioned on the short arms of the acrocentric chromosomes that still remain unsequenced and thus missing from the current human genome draft, GRCh38.p7.Seeking an understanding of the chromosomal context of human NORs and to identify potential NOR regulatory elements, my laboratory has begun to characterize the sequences on both proximal (centromeric) and distal (telomeric) sides of the rDNA arrays (Fig. 3A; Floutsakou et al. 2013).Building on earlier reports of sequences distal and proximal to the rDNA array on HSA21 and HSA22, respectively (Worton et al. 1988;Sakai et al. 1995;Gonzalez and Sylvester 1997), 207 kb of sequence immediately proximal and 379 kb distal to rDNA arrays have been reported recently (Floutsakou et al. 2013).Consensus proximal junction (PJ) and distal junction (DJ) sequences were constructed mostly from chromosome 21 BACs (bacterial artificial chromosomes).Comparison of these sequences with BACs and cosmids derived from the other acrocentrics revealed that the PJ and DJ sequences are, respectively, 95% and 99% identical between all five acrocentric chromosomes.Conservation of DJ sequences among the acrocentrics is consistent with frequent recombination between the rDNA arrays on each of the acrocentric chromosomes (Worton et al. 
1988). However, conservation of PJ sequences suggests that there must also be frequent recombination events in the interval between the centromere and rDNA arrays. Proximal sequences are almost entirely segmentally duplicated, similar to the regions bordering centromeres. Consequently, they are unlikely to contain any specific elements that would regulate the activity of the linked NOR. In contrast, the distal sequence is predominantly unique to the acrocentric short arms and is dominated by a very large inverted repeat. Each arm of the inverted repeat is >100 kb, and they share an average sequence identity of 80%. There is a large (40-kb) block of a 48-base-pair (bp) satellite repeat, CER, at the distal end of the DJ (Fig. 3A). CER blocks are found distal to the rDNA on all acrocentric chromosomes, with additional pericentromeric blocks on chromosomes 14 and 22. Finally, there are two blocks of a novel 138-bp tandem repeat, ACRO138, present within the DJ.",
      "However, excluding some cases, recombinationsuppression occurs in a small genomic tractwhere these genes are located, and it doesnot extend over most of the sex chromosomepair, as occurs in mammals and birds (Bergeroand Charlesworth, 2009). It is not clear if thissuppression occurs by the presence of inversions or as a modulation of the recombinationmechanism itself, but both could be involved(Bergero and Charlesworth, 2009). Evidence ofrecombination in the SD region in sex reversalindividuals supports the second hypothesis.",
      "Figure2| The homologous and nonhomologous DNA end-joining pathways.a | Homologous recombination requires that the cell be diploid for the DNA region involved.A break on one chromosome can invade the homologous region on the homologous chromosome with the aid of radiation-sensitivity protein 51 (RAD51) and the single-stranded-binding protein, replication protein A (RPA).Subsequent steps involve DNA synthesis by DNA polymerase to copy the information from the intact chromosome, before ligating the newly synthesized region back to the chromosome that is undergoing repair (lower portion of diagram).The DNA crossovers are resolved to generate the two resulting intact duplexes (not shown).If the chromosome being copied is wild type, then the information content of the broken chromosome is restored to wild type.Other proteins that participate in this pathway include RAD54, RAD55, RAD57, breast cancer protein 1 (BRCA1) and BRCA2, and the Fanconi anaemia gene products76 .b | The NHEJ pathway starts with the binding of the Ku heterodimer (Ku70-Ku86) to the DNA ends32  .Ku is thought to then recruit the Artemis-DNA-PK cs complex, which functions as an endonuclease to trim 5 and 3 overhangs.After fill-in synthesis, Ku recruits the XRCC4-DNA-ligase-IV complex to carry out the ligation.When the Artemis-DNA-PK cs complex trims the DNA ends, nucleotides at the DNA ends are permanently lost.",
      "Though the pathway is not fullyknown it employs a number of proteins including XRCC1, PARP1 (Poly ADP Ribose22polymerase 1, DNA ligase III, Polynucleotide kinase (PNK) , Flap endonuclease 1(Fen1), Mre11, Rad50 and Nbs1 [111-113]. Homologous recombination: In homologous recombination the broken end of a DSB isjoined to its correct partner by using the information in the sister chromatid (in G2phase), homologous chromosome or a similar repeat in the DNA. This pathway isstarted by recognition of the DSB by MRN complex.",
      "Lichter P, Cremer T, Borden J, Manuelidis L, Ward DC (1988) Delineation ofindividual human chromosomes in metaphase and interphase cells by in situ suppression hybridization using recombinant DNA libraries. Hum Genet 80:2242343. Jang W, Yonescu R, Knutsen T, Brown T, Reppert T, Sirotkin K, Schuler GD, RiedT, Kirsch IR (2006) Linking the human cytogenetic map with nucleotide sequence:the CCAP clone set. Cancer Genet Cytogenet 168:89974.",
      "Figure 4 Schematic depiction of proposed mechanisms for observed intrachromosomal rearrangements.The blue and red arrows indicate the orientation of the integrated plasmid loci and the recovered mouse sequences, respectively, on the original non-rearranged chromosome (left column).All four combinations are given for an arbitrarily orientated chromosome (green line).The middle column shows how two breakpoints (lightning signs) could lead to the inversion or deletion of the encompassed chromosomal sequence (yellow-orange dual tone line) and result in a recoverable mutation in the right column.The last row indicates the two options for a transposition, in which either the transgene locus or the recovered mouse sequence is copied or excised (as indicated by the pink and light blue arrows) and integrates in the breakpoint at the other location.As mentioned above, by taking into account that for a genome rearrangement to be detected, the 5 plasmid sequence of the breakpoint in lacZ must remain intact and end immediately in front of the recovered mouse sequence, the simplest intrachromosomal mutation that could have taken place was inferred (Fig. 4).Rearrangements with breakpoints in the mouse genome on either site of the integrated plasmid concatamer, but with reversely orientated sequences, could be inversions (Fig. 4).Rearrangements in the direction of the integrated plasmids, proximal for chromosome 3 and distal for chromosome 4 (Fig. 3), with similarly orientated breakpoints in the mouse genome, could be deletions (Fig. 4).Rearrangements in the reverse direction of the integrated plasmids, with reversely orientated mouse sequences, are more complicated and might be owing to transpositions (Fig. 4).According to these schemes, half of the intrachromosomal rearrangements would have been inversions, whereas deletions and transpositions each made up one fourth (Fig. 
3). Alternatively, these rearrangements could be explained by translocations involving the transgene clusters integrated on either the homolog or the other chromosome.",
      "FIGURE 3. Telomere arrays of chicken and human chromosomes: the chicken genome contains more telomere sequence than the human genome.Chicken (a) and human (b) metaphase chromosomes and interphase cells hybridized with a telomeric sequence-peptide nucleic acid (PNA)-fluorescein probe.Human and chicken slide preparations were processed, and images were captured using the same parameters.Qualitatively, the telomere-positive fluorescent signals (white spots) from chicken cells and chromosomes have greater intensity than those of human (4,6 diamidino-2-phenylindole, DAPI counterstain).",
      "In a previous study on the accumulation of spontaneous genome rearrangements in normal mice with aging, we discovered that 50% of the events were intrachromosomal, i.e., large deletions or inversions [22].In contrast, in this present study most of the rearrangements resulted from inter-chromosomal recombination, in both the Ercc1-mutant and control animals (Table 3).Previously, we used lacZ-plasmid line 60 mice with integration sites on Chromosomes 3 and 4, while in the present study line 30 mice were used with a single integration site on Chromosome 11.This indicates that the relative frequency of translocations is founder line specific and could be due to the position of the lacZ-plasmid cluster on the chromosome.Indeed, the chromosomal integration sites in line 60 mice are in the E1 region of Chromosome 3 (half way along the chromosome) and the C5 region of Chromosome 4 (two-thirds of the way along the chromosome) [22], while the integration site of founder line 30 (used in this study) is on the centromeric tip of Chromosome 11 (region A1-A2; not shown).The proximal location on Chromosome 11 prevents the detection of all but relatively small intra-chromosomal recombinations; larger events would lead to loss of the centromere and, therefore, the entire chromosome.If the orientation of the integration site in line 30, which is currently unknown, is towards the centromere, transpositions and inversions towards the distal end are the only detectable large intra-chromosomal rearrangements (for a detailed explanation of the different chromosomal events that can occur at the lacZ locus, see [22])."
    ],
    [
      "Genome Res, 2011, 21: 17691776Mattick JS, Dinger ME. The extent of functionality in the humangenome. HUGO J, 2013, 7, doi:10.1186/1877-6566-1187-1182ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I,Green ED, Gunter C, Snyder M. An integrated encyclopedia of DNAelements in the human genome. Nature, 2012, 489: 5774Pheasant M, Mattick JS. Raising the estimate of functional humansequences. Genome Res, 2007, 17: 12451253Hu T, Long M, Yuan D, Zhu Z, Huang Y, Huang S. The geneticequidistance result, misreading by the molecular clock and neutraltheory and reinterpretation nearly half of a century later.",
      "This approach enables, on the one hand, studying the process ofmammalian evolution and, on the other hand, translational studies using modelorganisms of complex human phenotypes. Detection of regions conserved betweendistant species points to high functional importance of these fragments of the DNAsequence. Human and mouse developmental lines diverged about 75 million years ago, andever since evolutionary forces shaped the two genotypes in a different manner(Waterston et al. , 2002). Nevertheless, the extent of the changes is, however, smallenough for conservation of local gene order (Waterston et al. , 2002).",
      "First, the human and mouse genome projectselucidated the sequences of over 20,000 genes [Lander et al. ,2001; Venter et al. , 2001], and most are expressed in the CNS. The availability of gene sequences has allowed rapid analysis ofcandidate human disease and disorder genes and the isolation ofthe mouse homologues. Second, the application of site-specicrecombinase technology provides investigators with the opportunity to engineer genes in the mouse that will allow for thedeletion, insertion, inversion, or exchange of chromosomalDNA with high delity (for review see Branda and Dymechi,2004].",
      "In some cases, structural variations, such as copy number polymorphisms,exist (Feuk et al. , 2006); however, because of the nature of the genome assemblyprocess, these will invariably be collapsed into a single contig that does not reflectthe natural sequence. To address the technical challenges of whole-genome assembly,the human genome is released as defined builds on a quarterly basis (Lander et al. ,2001; reviewed in Chapter 4). The increasing complexity of processes that mapdata to the genome implicitly involves some lag in availability of the most currentsequence assembly.Inpractical terms, this has meant that we acquire many fragments, from a few hundredbases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the humangenome, two unfinished draft sequences were produced by different methods, oneby the International Human Genome Sequencing Consortium (IHGSC) and one byCelera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.",
      "THE HUMAN GENOME PROJECT IS generating vast amounts of new information at breakneck speed and causing a fundamental shift in disease research.Now with the availability of a nearly complete, high-accuracy sequence of the mouse genome (7), a new and powerful paradigm for biomedical research is established.The remarkable similarity of mouse and human genomes, in both synteny and sequence, unconditionally validates the mouse as an exceptional model organism for understanding human biology.The discovery among inbred mouse strains of defined regions of high and low genomic variation inherited primarily from two ancestral Mus subspecies (6) holds great promise to make mapping and positional cloning more rapid and feasible.Haplotype maps of inbred mouse strains combined with sophisticated delineation of their phenotypic variation and gene expression patterns will enable complex trait analysis on an unprecedented scale.This issue of Journal of Applied Physiology highlights inbred strain surveys exploring phenotypic variation in drug responses [see Crabbe et al. (1) and Watters et al. (8)  in this issue].These mouse initiatives demonstrate a viable, cost-effective alternative to human research requiring family studies, population linkage analysis, or genome-wide genotyping on a multitude of individuals for association mapping.",
      "How Many Genes are There in the Human Genome?",
      "The Landscape of Human Genome Variation",
      "In some cases, structural variations, such as copy number polymorphisms,exist (Feuk et al. , 2006); however, because of the nature of the genome assemblyprocess, these will invariably be collapsed into a single contig that does not reflectthe natural sequence. To address the technical challenges of whole-genome assembly,the human genome is released as defined builds on a quarterly basis (Lander et al. ,2001; reviewed in Chapter 4). The increasing complexity of processes that mapdata to the genome implicitly involves some lag in availability of the most currentsequence assembly.Inpractical terms, this has meant that we acquire many fragments, from a few hundredbases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the humangenome, two unfinished draft sequences were produced by different methods, oneby the International Human Genome Sequencing Consortium (IHGSC) and one byCelera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.",
      "Science 291:130413513. Lander ES et al (2001) Initial sequencing and analysis of the human genome. Nature 409:8609214. Engle LJ, Simpson CL, Landers JE (2006) Using high-throughput SNP technologies to study cancer. Oncogene 25:159416015. Elston RC, Anne Spence M (2006) Advances in statistical human genetics over thelast 25 years. Stat Med 25:304930806. Larson GP et al (2005) Genetic linkage of prostate cancer risk to the chromosome3 region bearing FHIT. Cancer Res 65:8058147. Botstein D, Risch N (2003) Discovering genotypes underlying human phenotypes:past successes for mendelian disease, future approaches for complex disease.McPherson JD, Marra M, Hillier L et al (2001) A physical map of the humangenome. Nature 409:93494113. Burke DT, Carle GF, Olson MV. (1987) Cloning of large segments of exogenousDNA into yeast by means of artificial chromosome vectors. Science 236:80681214. Fleischmann RD, Adams MD, White O et al (1995) Whole-genome randomsequencing and assembly of Haemophilus influenzae Rd Science 269:49651215. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of theflowering plant Arabidopsis thaliana. Nature 408:79681516.",
      "T he human genome has been cracked wide open in recent years and is spilling many of its secrets.More than 100 genome wide association studies have been conducted for scores of hu man diseases, identifying hun dreds of polymorphisms that are widely seen to influence disease risk.After many years in which the study of complex human traits was mired in false claims and methodologic inconsistencies, ge nomics has brought not only com prehensive representation of com mon variation but also welcome rigor in the interpretation of sta tistical evidence.Researchers now know how to properly account for most of the multiple hypothesis testing involved in mining the ge nome for associations, and most reported associations reflect real biologic causation.But do they matter?",
      "In some cases, structural variations, such as copy number polymorphisms,exist (Feuk et al. , 2006); however, because of the nature of the genome assemblyprocess, these will invariably be collapsed into a single contig that does not reflectthe natural sequence. To address the technical challenges of whole-genome assembly,the human genome is released as defined builds on a quarterly basis (Lander et al. ,2001; reviewed in Chapter 4). The increasing complexity of processes that mapdata to the genome implicitly involves some lag in availability of the most currentsequence assembly.Inpractical terms, this has meant that we acquire many fragments, from a few hundredbases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the humangenome, two unfinished draft sequences were produced by different methods, oneby the International Human Genome Sequencing Consortium (IHGSC) and one byCelera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).",
      "In some cases, structural variations, such as copy number polymorphisms,exist (Feuk et al. , 2006); however, because of the nature of the genome assemblyprocess, these will invariably be collapsed into a single contig that does not reflectthe natural sequence. To address the technical challenges of whole-genome assembly,the human genome is released as defined builds on a quarterly basis (Lander et al. ,2001; reviewed in Chapter 4). The increasing complexity of processes that mapdata to the genome implicitly involves some lag in availability of the most currentsequence assembly.Inpractical terms, this has meant that we acquire many fragments, from a few hundredbases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the humangenome, two unfinished draft sequences were produced by different methods, oneby the International Human Genome Sequencing Consortium (IHGSC) and one byCelera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond."
    ],
    [
      "We (Hein, Schierup and Wiuf) have published a300 page book on molecular population genetics titled Gene Genealogies, Sequence Variation and Evolution OxfordUniversity Press, and are presently developing a tutorial in association mapping that we hope to publish as a booklet in2006 and are also involved in a very large EU collaboration (Holland, Denmark, Iceland and UK) to find susceptibilitygenes for breast and prostate cancer. In comparative genomics, the most fundamental investigation is to find genes in a pair of aligned genomes.",
      "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. 
However, knowledge of the genome sequence did not immediately clarify the nature and structure of human genetic variation.",
      "2011) human genome reference sequence provides a basis foranalyzing short-read, whole genome, DNA sequences andFunct Integr Genomics (2012) 12:19sequencing of individuals from families segregating for aparticular phenotype (Roach et al. 2010; Dewey et al. 2011;Tian et al. 2012). The comparative analysis of the genomesequences from members of a family enabled the noise inDNA sequencing, namely, sequencing errors and genetic heterozygosity in DNA from a diploid organism, to be defined(Roach et al. 2010; Dewey et al. 2011). Through the process oftracking sequence changes/inconsistencies in inheritance fromparent to offspring, Roach et al.",
      "Characteristics of genotyping and sequencing technologies",
      "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. 
However, knowledge of the genome sequence did not immediately clarify the nature and structure of human genetic variation.",
      "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. 
However, knowledge of the genome sequence did notimmediately clarify the nature and structure of human genetic variation.",
      "Ample time was allotted to answer questions and a copy of \"A Guide to Your Genome\" (National Human Genome Research Institute 2007) was provided to further assist participants' understanding and ability to communicate results with family members or others.",
      "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. 
However, knowledge of the genome sequence did notimmediately clarify the nature and structure of human genetic variation."
    ],
    [
      "In some organisms, there is no clear relationship between telomere length and lifespan. Age-related telomere attrition could not be detected in Daphnia pulex (57) or sea urchin species (Strongylocentrotus franciscanus and Lytechinus variegatus) (58). Studies in C. elegans examining natural variation in telomere length and experimentally manipulated telomere length detect no correlation with lifespan (59,60), and in Drosophila, which uses a telomerase-independent mechanism for telomere maintenance, there is a similar lack of correlation between longevity and telomere length (61). Similarly, data on sex differences in age-related telomere shortening are mixed. For example, in the ant species Lasius niger, the rate of telomere shortening is more rapid in short-lived males compared to longer-lived females. But mean telomere length does not differ between the two types of females, queens and workers, despite the fact that queens live much longer than workers (up to 28 years vs 2-3 months) (62). These findings suggest that the questions of how telomere shortening affects aging across species and how sex affects telomere attrition rates are complex.",
      "With new methodologies to assess relative telomere length by Q-PCR, studies were designed to address the impact of telomere length on aging, aging-associated pathologies, and mortality. One such study has correlated shorter leukocyte telomere lengths at age 60 with a three times higher risk of heart disease and an eightfold increase in risk of infection-related death (36), thereby associating measured relative cellular aging with disease and life expectancy. In a similar way, chronic stress was shown to correlate with short leukocyte telomere length, a phenomenon attributed to higher levels of oxidative stress at the cellular level (70). More recent studies have linked telomere length in smooth muscle cells with senescence and disease severity in patients with atherosclerosis (141,150). Leukocyte telomere length was also short in a cohort of similar patients and associated with a higher risk of developing occult cardiovascular disease (71). More data are needed to understand and validate the use of leukocyte telomere length as a biomarker for cardiovascular and other diseases.",
      "Shortening of the telomeres at the ends of chromosomes has been associated with age-related disease and mortality [16][17][18]. A recent study identified a common haplotype of four SNPs in the human telomerase reverse transcriptase gene (hTERT) that is enriched in centenarians and associated with longer telomere length [19]. It was also shown that centenarians and their offspring maintain longer telomeres compared with controls and that longer telomeres are associated with protection from age-related diseases, better cognitive function and lipid profiles of healthy ageing [19].",
      "New research has indicated how social factors, such as subordination, may translate into biological effects (Epel et al. 2004; Chae et al. 2014). In a now classic study, Epel et al. (2004) examined the telomere lengths of fifty-eight healthy premenopausal women who either had a healthy child (n = 19) or were giving care to a chronically ill child (n = 39). They measured perceived stress, years of caregiving, telomere length, and oxidative stress. They found highly statistically significant differences in telomere length between women taking care of chronically ill children and those who had healthy children. They found highly statistically significant negative correlations between telomere length and perceived stress and years of caregiving. Telomerase activity had highly statistically significant negative correlations with perceived stress and years of caregiving. Oxidative stress was highly positively correlated with perceived stress and years of caregiving. They concluded that the telomere length shortening was equivalent to 9 to 17 years of aging in the high stress group. Telomere length is considered a biomarker of aging (Finch and Kirkwood 2000). Thus, this study showed that caregiver stress had essentially aged these women 9 to 17 years compared to women who had healthy children.",
      "Adult studies have also found a negative correlation with baseline telomere length, suggesting a negative feedback regulation of leukocyte telomere length (Farzaneh-Far et al. 2010; Aviv et al. 2009; Epel et al. 2008; Nordfjall et al. 2009). It is possible that while our follow-up period was shorter than Shalev et al. 2013 and adult studies, which had a minimum of 5-year intervals with the exception of Puterman et al. (2015) who followed for a one-year time period, there may be biological regulation of telomere length at 4 and 5 years of age such that shorter telomeres are more robustly maintained, whereas longer telomeres have greater rates of decline, over a short period of one year. It is unlikely that this relationship is due to assay error or regression to the mean given the consistency of our findings across studies. We have had similar findings of longer telomeres having greater rates of decline and shorter telomeres being maintained in our different studies (Farzaneh-Far et al. 2010; Epel et al. 2008; Puterman et al. 2015). The single, consistent predictor of the rate of telomere attrition shown in multiple adult and the few child longitudinal studies is the baseline measurement of telomere length at the start of each study. This suggests the importance of understanding predictors of telomere length prior to adulthood, as it determines in part the rate of change (Revesz et al. 2014a, b; Nordfjall et al. 2009). Moreover, longitudinal studies in adults have found that telomere attrition rate is dependent on baseline telomere length independent of any phenotypic predictors of shortening, such as disease or demographic variables (Nordfjall et al. 
2009), attesting to the importance of studies to evaluate risk factors for shortening prior to adulthood. Rates of decline in childhood may be particularly relevant for later chronic disease risk as shorter telomere length has been implicated in disease progression through exposure to cellular senescence, inflammatory cytokines and adipocyte hypertrophy (Raschenberger et al. 2015; Willeit et al. 2014; Monickaraj et al. 2012; Fyhrquist et al. 2013). Telomere attrition in adults. Additional early childhood telomere studies are also needed so as to better interpret disease across the lifespan. Specifically, retrospective adult studies have pointed to the importance of early life exposures, finding associations of shorter telomere length in adulthood with perinatal complications, and stressful and deprived early childhood environments, including factors associated with lower socioeconomic status and those of physical or social neglect (Drury et al. 2012; Shalev et al. 2014; Tyrka et al. 2010; Kananen et al. 2010). Some adult studies have found that predictors of shorter telomere length involving inflammation and oxidative stress exposures such as depression are only observed in younger adults compared with middle aged and older ones (Philips et al. 2013), suggesting it may be harder to tease out risk factors for accelerated attrition as, firstly, these processes happen early in life, and, secondly, repeat exposures to inflammation and oxidative stress may statistically plateau out across older age groups. While a strong maternal heritability to telomere length is widely reported (Broer et al. 2013), with overall heritability of telomere length estimated to be 64 % (95 % CI 39-83 %) (Hjelmborg et al. 2015a, b), the heritability of attrition rate is poorly understood. Hjelmborg et al. 
recently published data on studies of monozygotic and dizygotic twins and calculated heritability of attrition rate to be 28 % (95 % CI 16-44 %), less than the heritability of telomere length, suggesting a sizable environmental component. However, this twin study emphasized the importance of understanding environmental factors at birth and in the first years of life to better understand telomere dynamics in adulthood, as environmental exposures in adulthood played a small role in explaining adult rate of telomere loss. Additionally, studies suggest that the genetic variants associated with leukocyte telomere length in adults do not determine leukocyte telomere length in children. Different genetic determinants impact child compared with adult telomere length (Stathopoulou et al. 2015). Stathopoulou et al. suggest that single nucleotide polymorphisms (SNPs), associated with generally minor effects on telomere length in adulthood, may determine adult telomere maintenance processes versus different SNPs which are responsible for regulating telomere attrition in childhood. Abstract: Telomeres are the protective complexes at the end of chromosomes, required for genomic stability. Little is known about predictors of attrition in young children or the relationship between parental and child patterns of telomere change. Telomere length was assessed twice over one year, at 4 and at 5 years of age, in Latino preschool children (n = 77) and their mothers (n = 70) in whole blood leukocytes. Maternal and child rates of attrition during the same time period were compared in 70 mother-child pairs. More children showed lengthened telomeres over one year compared to their mothers and very few children showed attrition (2.6 %). Approximately 31 % of children and 16 % of mothers displayed lengthening over one year while 66 % of children showed maintenance in contrast with 74 % of mothers. The strongest predictor for child telomere length change was child's baseline telomere length (r = 0.61, p < 
0.01). Maternal rate of change was associated with child rate of change (r = 0.33, p < 0.01). After controlling for child baseline telomere length, the relationship between child and maternal rate of change trended towards significance (Coeff = 0.20, 95 % CI 0.03 to 0.43; p = 0.08).",
      "Blackburn and Epel, a health psychologist who did original research on how specific lifestyle and psychological habits can protect telomeres, published The Telomere Effect (Blackburn & Epel, 2017), in which they suggested that individuals with shorter telomeres developed diseases earlier in life (a shorter \"disease span\"). What follows is the evidence from these authors, their colleagues, and other researchers describing how length of telomeres contributes to mind-body connection and healthy longevity.",
      "As early as at the time of birth, each of the 92 telomeres of the human genome has its own characteristic length. Additionally, each telomere shortens by its individual attrition rate. In general, longer telomeres at birth are associated with higher age-dependent attrition rates and vice versa. Overall, telomere shortening appears more dynamic in males. It is generally accepted that telomeres shorten during DNA replication both in vitro and in vivo. In individuals, short telomeres are considered to be a sign of advanced age. Cawthon and coworkers (2003) showed that telomere shortening in humans likely contributes to mortality, supporting the hypothesis that they might act as a mitotic clock (Allsopp et al., 1992). Telomere length dynamics, however, does not seem to Abstract. During aging, telomeres are gradually shortened, eventually leading to cellular senescence. By T/C-FISH (telomere/centromere-FISH), we investigated human telomere length differences on single chromosome arms of 205 individuals in different age groups and sexes. For all chromosome arms, we found a linear correlation between telomere length and donor age. Generally, males had shorter telomeres and higher attrition rates. Every chromosome arm had its individual age-specific telomere length and erosion pattern, resulting in an unexpected heterogeneity in chromosome-specific regression lines. This differential erosion pattern, however, does not seem to be accidental, since we found a correlation between average telomere length of single chromosome arms in newborns and their annual attrition rate. Apart from the above-mentioned sex-specific discrepancies, chromosome arm-specific telomere lengths were strikingly similar in men and women. This implies a mechanism that arm specifically regulates the telomere length independent of gender, thus leading to interchromosomal telomere variations. In conclusion, these data suggest that with increasing physical and genetic length of whole chromosomes, the 
corresponding telomeres also tend to be longer and that recombination rate and telomere length are inversely proportional. In conclusion, a combination of overall and chromosome-specific shorter telomeres and more pronounced age-dependent telomere erosion could be observed in males. There is a prospective clinical study strongly suggesting that longer telomeres decrease the risk of dying (Cawthon et al., 2003). With this in mind, the telomere length discrepancies between the sexes may indeed be a factor influencing the differences in their life expectancy. In every chromosome a linear decline of telomere length with age was observed, being more pronounced in men independent of the examined chromosome arm. This might suggest that telomere length on single chromosome arms may be influenced by the same factors which determine overall telomere length. S. Mayer a S. Brüderlein a S. Perner a I. Waibel a A. Holdenried a N. Ciloglu a C. Hasel a T. Mattfeldt a K.V. Nielsen b P. Möller a a Institute of Pathology, University of Ulm, Ulm (Germany); b DakoCytomation A/S, Glostrup (Denmark) follow uniformity. In previous studies, sex-specific differences in telomere length and attrition rate of men and women were found (Benetos et al., 2001; Cawthon et al., 2003; Nawrot et al., 2004), suggesting gender differences in behavior of telomeres. In individual chromosome arms, telomere length was also shown not to be homogeneous (Lansdorp et al., 1996; Benn, 1997; Martens et al., 1998; Surralles et al., 1999; Hao and Tan, 2001; Londono-Vallejo et al., 2001; Graakjaer et al., 2003), some telomeres being significantly shorter, others longer than the average length. To date, these characteristics in telomere lengths could not be set in a biological context, as only a few groups have provided detailed information about chromosome-specific patterns of telomere distribution (Lansdorp et al., 1996; Graakjaer et al., 2003). Whether accumulation of short telomeres (Martens et al., 2000; Londono-Vallejo et al., 
2001) or rather the shortest telomere of one specific chromosome arm (Hemann et al., 2001) elicits senescence, remains an open question so far. In recent literature, there are hints that the average telomere length may be higher in women and that their annual shortening rate may be somewhat lower (Vaziri et al., 1993; Rufer et al., 1998; Jeanclos et al., 2000), but these reported differences failed to reach statistical significance except for one study (Jeanclos et al., 2000). Here, we provide compelling evidence that this is indeed the case.",
      "Shortening of the telomeres at the ends of chromosomes has been associated with age-related disease and mortality [16][17][18]. A recent study identified a common haplotype of four SNPs in the human telomerase reverse transcriptase gene (hTERT) that is enriched in centenarians and associated with longer telomere length [19]. It was also shown that centenarians and their offspring maintain longer telomeres compared with controls and that longer telomeres are associated with protection from age-related diseases, better cognitive function and lipid profiles of healthy ageing [19]."
    ]
  ],
  "task_id": [
    "2F8796A8C3DC633F00DB901C9BA396DA",
    "DEE6D385D1B01B4155AA4ABE59515893",
    "10ABD2210053119B18D94F1FE266E73E",
    "C52A9690417093A861C669A0753689BD",
    "93DE2EF005059DFEA5A7FBBA3BD17D03"
  ]
}