{ "question": [ "Once a sperm combines with an egg, what determines how traits are passed on to the resulting lifeform?", "Can you explain what a ribosomal binding site is at a high level and make it accessible to a non-expert?", "Once a sperm combines with an egg, what determines how traits are passed on to the resulting lifeform?", "Can you explain the difference between sequencing with short reads vs long reads? Please make your answer accessible to a non-expert.", "Can you explain why using a pangenome-based reference might be more useful than simply using a single linear reference? Please make your answer accessible to a non-expert." ], "answer": [ "The traits are determined by the combination of genes from both the sperm and the egg. These genes are located on the chromosomes that each parent contributes. The process of meiosis ensures that each gamete (sperm or egg) contributes one chromosome from each pair, resulting in the offspring having a full set of 23 chromosome pairs (in humans). Additionally, certain parental genetic effects, such as maternal transcripts deposited into the egg prior to fertilization, can influence early embryonic development and result in differences in traits depending on the genotype of the mother.", "A ribosomal binding site is a specific sequence on a molecule of mRNA (messenger RNA) that the ribosome recognizes and binds to when it's time to make a protein. Think of it like a 'start' sign that tells the ribosome where to begin translating the mRNA into a protein. This process is part of how our bodies use genetic information to build the proteins we need to function.", "Traits are passed on to the resulting lifeform through the process of meiosis, where each gamete (sperm and egg) contributes one chromosome to each pair in the zygote. Meiosis also shuffles genetic material through recombination (crossing over), which is a significant cause of genetic variation among offspring.
Additionally, certain traits can be influenced by maternal transcripts deposited into the egg prior to fertilization, leading to differences in early embryonic development depending on the genotype of the mother.", "Sequencing with short reads and long reads refers to the length of DNA segments that are sequenced in one go. Short-read sequencing, like that done by Illumina, generates many small pieces of DNA sequence, typically around 100-150 base pairs long. This method is relatively inexpensive and produces a high volume of data, but can struggle with complex parts of the genome.\n\nOn the other hand, long-read sequencing, like that done by Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT), sequences much longer pieces of DNA, sometimes up to 100,000 base pairs. This can provide more complete information about the genome and can handle complex regions better. However, it tends to have a higher error rate and is more expensive.\n\nIn simple terms, imagine trying to solve a jigsaw puzzle: short-read sequencing gives you many small pieces, which can be harder to put together, especially in complex regions. Long-read sequencing gives you fewer, but much larger pieces, which can make the puzzle easier to solve, but might be more costly and have more mistakes.", "Using a single linear reference, like a specific genome, can limit the scope of genetic variation we can study. It assumes that all genetic variations align neatly with this reference, which isn't always the case in reality. A pangenome-based reference, on the other hand, includes multiple genomes and thus captures a wider range of genetic variations. This can help us better understand and study the complexity of genetic diversity." 
], "contexts": [ [ "Selection could occur at multiple levels, from germ cell generation and propagation to fertilization and early embryonic growth.Chromosomal abnormalities, including aneuploidy, were found in 10-20% of spermatozoa and oocytes (20) and in the cleaved embryo, with a 21% rate of abnormalities in preimplantation embryos (21).These findings led to a model for natural selection against chromosome abnormalities (21).Selection extends to the end of gestation: Only approximately 30% of all conceptions result in a live birth, with more than half of aborted fetuses containing chromosomal abnormalities (22), a number likely to be an underestimate because of technological limitations in measuring all possible mutations.But even in the very small fraction of germ cell duos that survive this withering genome attack and result in a live birth, a number of severe de novo mutations will still be found (23).The data on gross chromosomal alterations suggest that overall, mutation frequency early in life is very high.The functional consequence, however, is limited because of selection.Somewhat surprisingly, this picture points toward an initial decline in genomic alterations, allowing the adult individual to acquire a somatic genome optimally equipped to provide function.", "We now need to investigate the candidates identified here and how their effects on parental and offspring traits are integrated into thegene networks determining individual development. By controlling for genetic variation in eithermothers or offspring we have been able to show that levels of maternal provisioning and offspringsolicitation are unique to specific genotypes (here each BXD line) and that solicitation is costly.", "In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. 
In one of the experiments described next, evolutionary geneticists have nevertheless devised a way to compare evolution with and without recombination in the obligately sexual fruit fly. Sex brings harmful alleles together into the same genetic background, allowing selection to more efficiently purge them from the population and potentially producing some offspring that are fitter than either parent. However, the benefit of recombining deleterious mutations may depend on the nature of the epistatic interactions between them. The mutational deterministic hypothesis (Kondrashov 1988) depends partly on this epistasis. Long ago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci were jointly responsible for the production of phenotypes, sex has the potential to break apart coadapted gene complexes, as it moves alleles away from genetic backgrounds where beneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs? This disparity in investment is the basis for the twofold cost: asexual females hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. But sex usually also involves the basic process of physical recombination: the breakage and reunion of two different DNA or RNA molecules. Of these two processes, recombination is clearly the more widespread feature of sexual reproduction. A variety of reproductive systems, such as selfing and automixis, involve recombination but not outcrossing.
In contrast, relatively few reproductive systems have outcrossing without recombination.", "The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population. In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population. For example, the HbS mutation in the HBB gene (which produces the β subunit of hemoglobin) causes sickle cell disease when present in both alleles, a detrimental consequence, but protects against malaria when present in 1 allele, a beneficial consequence, ensuring that the mutation persists in populations in areas of the world where malaria is endemic. Genes are passed from parents to offspring via the process of meiosis by which gametes, the egg cells in the mother and the sperm cells in the father, are generated. Ordinarily, each cell has 23 pairs of chromosomes; the gametes have 23 unpaired chromosomes. In meiosis, the 23 pairs are split so that each gamete receives 1 chromosome from each pair (Figures 8 and 9). Two gametes (egg and sperm) ultimately join into a single cell, the zygote, which has the full complement of 23 chromosome pairs restored. If all goes well, the zygote gives rise to a live offspring. Crossing over: The swapping of genetic material that occurs in the germline. During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes cross over one another. Crossing over results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring. This process is also known as meiotic recombination. Recombination (meiotic recombination): The swapping of genetic material that occurs in the
germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes recombine with one another.Recombination results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring.Also known as crossing over.", "In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. In one of the experimentsdescribed next, evolutionary geneticists have nevertheless devised a way to compareevolution with and without recombination in the obligately sexual fruit fly.Sex brings harmful alleles together into thesame genetic background, allowing selection to more efficiently purge them fromthe population and potentially producing some offspring that are fitter than eitherparent. However, the benefit of recombining deleterious mutations may depend on thenature of the epistatic interactions between them. The mutational deterministic hypothesis(Kondrashov 1988) depends partly on this epistasis.Longago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci werejointly responsible for the production of phenotypes, sex has the potential to break apartcoadapted gene complexes, as it moves alleles away from genetic backgrounds wherebeneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?This disparity in investment is the basis for the twofold cost: asexualfemales hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. 
But sex usually also involves the basic process of physical recombination: the breakage and reunion of two different DNA or RNA molecules. Of these two processes, recombination is clearly the more widespread feature of sexual reproduction. A variety of reproductive systems, such as selfing and automixis, involve recombination but not outcrossing. In contrast, relatively few reproductive systems have outcrossing without recombination.", "Aberrant recombination patterns on chromosomes that have missegregated have also been identified as an important factor, in both male and female gametes (Table I). This is because recombination together with cohesion of sister chromatids establish the unique 'bivalent' chromosome structure where homologous partner chromosomes are tethered together, a configuration that is critical for their accurate segregation in meiosis I (Fig. 2A). The remarkable feature is that recombination occurs in foetal oocytes whereas chromosome segregation takes place decades later (Fig. 2A). Since mammalian oocytes are arrested at the G2/M transition (or dictyate stage), this raises the intriguing question of how the bivalent is maintained until the meiotic divisions.", "However, we expect that only at this level, the most significant contributions brought by integrating epigenetics will be made. Concluding Remarks and Future Prospects. Fish sex ratios are the result of a complex interaction between genetic, biochemical, and environmental interactions. The ultimate result of these interactions at the individual level is gender: male or female. However, at the population level, the combination of sex determination and differentiation sets the sex ratio.
In turn, sex ratios define the reproductive capacity of populations and, if sex growth dimorphism exists, also the growth characteristics, something very important in an aquaculture context. Traditionally, it has been agreed that the final sex of an individual (phenotypic sex) depends on two sequential processes: the sex determination system of the species and the gonad differentiation process (Valenzuela, 2008). However, recently, these two seemingly distinct processes are viewed as part of a general process leading to gonad formation and sex ratios (Sarre et al., 2004; Quinn et al., 2011; Uller and Helantera, 2011).", "Obehav is, in turn, influenced by offspring genes and environment (Ogene and Oenvir respectively). Hence, indirect genetic effects (blue arrows) and direct genetic effects (red arrow) are important influencers of behaviour. B) Parent-offspring conflict theory predicts that parental resource investment and offspring solicitation behaviours are influenced by the fitness benefit to a focal individual (O), cost to a social partner such as a sibling (S1 and S2) or parent (P), and by their coefficient of relatedness (black arrows). Figure 2: Genomic imprinting can result in divergent phenotypes from the same genotype. A) A paternally imprinted gene, i.e. maternally expressed.", "Therefore, the resulting phenotypic patterns lag a generation behind the genetic transmission of the causal variants. The most well-studied parental genetic effects are caused by deposition of maternal transcripts into the egg prior to fertilization, resulting in differences in early embryonic development depending on the genotype of the mother. Certain genes have also been shown to respond to maternal influence after birth through genetically defined maternal behaviors (Weaver et al.
, 2004). Because of the small contribution, through the sperm, of the paternal transcriptome to the fertilized zygote, and because of the stronger maternal contribution to child rearing in most model organisms, parental effects are typically thought of as synonymous with maternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading to differences in the birth weight of mice depending on the genotype of the mother (Cowley et al., 1989; Wolf et al., 2011).", "It was believed by many that for each trait variant we should expect to find a corresponding genetic change, or gene for that trait. Through historical happenstance the relationship between genes and traits was set up and treated as if it were one-to-one. But the production of a trait involves not only genes, but also their interactions with each other and the environment, and chance." ], [ "SpliceAid2, a tool that allows us to know which proteins can bind to the desired RNA sequence, was used to investigate the effects of studied SNPs in protein binding affinity. This tool is entirely based on the human true splicing site and experimentally assessed target motif. The results of mutation effects on splicing binding in the SpliceAid2 database demonstrate information on creation of splice sites or strengthening of cryptic splice sites and provide information on intron retention, appearance, and disappearance of new alternative splice site forms (Piva et al., 2012).", "The variations of binding sites can be collected from known target genes, mutagenesis studies (Hallikas et al., 2006), phylogenetic shadowing (orthologous binding sites in different species) (Ostrin et al., 2006), and in vitro SELEX experiments (Liu and Stormo, 2005). Several recent technologies, such as SELEX-SAGE (Roulet et al., 2002) and protein-binding microarray (PBM) (Mukherjee et al.
, 2004), allow the determination of binding specificity in a high throughput manner. The profile or motif of binding sites can then be described with a consensus sequence. An alternative mechanism for translation initiation has been identified that does not obey the first AUG rule; this involves cap-independent internal [264 CH 11 PREDICTIVE FUNCTIONAL ANALYSIS OF POLYMORPHISMS] ribosome binding mediated by a Y-shaped secondary structure, denoted the internal ribosome entry site (IRES), located in the 5' UTR of 5-10 per cent of human mRNA molecules (see Le and Maizel, 1997, for a review of these elements). IRES elements are complex stem loop structures, and there is no reliable sequence consensus to allow prediction of the possible functional effects of polymorphisms in these elements.", "Based on structural modeling of the observed protein sequences, the authors suggest that the glycine-rich conserved regions offer flexibility to the protein in the form of a wobble (Riede et al. 1987; Drexler et al. 1989). A number of recent studies have explicitly investigated the issue of specialist-generalist trade-offs. For example, Turner and Elena (2000) evolved replicate populations of an RNA virus on novel hosts using a single novel host or alternating novel hosts. The authors observed improvements in fitness on the novel hosts.", "5.5.3 Candidates in Qrr1d and Possible Link with Local Protein Synthesis. Transfer RNAs are direct biological partners of the ARS, and the cluster of tRNAs in the highly polymorphic intergenic region of Qrr1d is an enticing candidate. In addition to their role in shuttling amino acids, tRNAs also act as sensors of cellular amino acid levels and regulate transcription of genes involved in amino acid metabolism and the ARS (Ryckelynck et al., 2005).
There is tissue specificity in the expression of different tRNA isoforms (Dittmar et al.", "The elucidation of the structure of the ribonuclease inhibitor, a protein containing 15 LRRs, revealed that the LRR sequence is a structural motif, each consisting of a β-strand connected by variable loops to a helical or extended part of about 24 amino acid residues in length with a conserved 11-residue segment corresponding to the β-strand and adjacent Correspondence: Dr Antoni Matilla Duenas, Institute of Child Health, University College London, 30 Guilford Street, London WC1N 1EH, UK. Fax: +44(0)207 905 2301.", "The elucidation of the structure of the ribonuclease inhibitor, a protein containing 15 LRRs, revealed that the LRR sequence is a structural motif, each consisting of a β-strand connected by variable loops to a helical or extended part of about 24 amino acid residues in length with a conserved 11-residue segment corresponding to the β-strand and adjacent Correspondence: Dr Antoni Matilla Duenas, Institute of Child Health, University College London, 30 Guilford Street, London WC1N 1EH, UK. Fax: +44(0)207 905 2301.", "Based on structural modeling of the observed protein sequences, the authors suggest that the glycine-rich conserved regions offer flexibility to the protein in the form of a wobble (Riede et al. 1987; Drexler et al. 1989). A number of recent studies have explicitly investigated the issue of specialist-generalist trade-offs. For example, Turner and Elena (2000) evolved replicate populations of an RNA virus on novel hosts using a single novel host or alternating novel hosts. The authors observed improvements in fitness on the novel hosts.", "Figure 4. Structure of the ribosomal DNA gene cluster in S.
cerevisiae.The location of the rDNA cluster on chromosome XII is shown at the top, with the telomere (TEL) and centromere (CEN) indicated.A detailed view of an rDNA repeat unit is shown below.The 35S and 5S rRNA gene-coding regions are indicated, as is the rDNA origin of replication (rARS).The RFB (red box) is bound by Fob1p (pink).The locations of the 35S promoter and the bidirectional noncoding promoter E-pro (blue box), silenced by Sir2p, are indicated.", "An alternative mechanism for translation initiation has been identified thatdoes not obey the first AUG rule; this involves cap-independent internal264CH 11 PREDICTIVE FUNCTIONAL ANALYSIS OF POLYMORPHISMSribosome binding mediated by a Y-shaped secondary structure, denoted the internal ribosome entry site (IRES), located in the 5 UTR of 510 per cent ofhuman mRNA molecules (see Le and Maizel, 1997, for a review of these elements). IRES elements are complex stem loop structures, and there is no reliablesequence consensus to allow prediction of the possible functional effects of polymorphisms in these elements.The variations of binding sites can be collected from knowntarget genes, mutagenesis studies (Hallikas et al. , 2006), phylogenetic shadowing (orthologous binding sites in different species) (Ostrin et al. , 2006), and in vitro SELEXexperiments (Liu and Stormo, 2005). Several recent technologies, such as SELEXSAGE (Roulet et al. , 2002) and protein-binding microarray (PBM) (Mukherjee et al. ,2004), allow the determination of binding specificity in a high throughput manner. The profile or motif of binding sites can then be described with a consensussequence.", "Schluenzen F, Tocilj A, Zarivach R, Harms J, Glyehmann M, Janell D, Bashan A,Bartels H, Agmon I, Franceschi F, Yonath A (2000) Structure of functionally activated small ribosomal subunit at 3.3 Angstroms resolution. Cell 102:61562319. Hendrickson WA (1991) Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. 
Science 254:515820. Yang C, Pflugrath JW, Courville DA, Stence CN, Ferrara JD (2003) Away fromthe edge: SAD phasing from the sulfur anomalous signal measured in-house withchromium radiation. Acta Crystallogr D59(11):1943195721. Kissinger CR, Gehlhaar DK, Smith BA, Bouzida D (2001) Molecular replacementby evolutionary search.Using this technique, the 6xHis-tagged ExoS ADP-ribosyltransferaseprotein was shown to bind 14-3-3 in a direct fashion (56). The two purifiedproteins were mixed together, and were subjected to nickel-charged affinitychromatography to isolate 6xHis-protein complexes. To quantify the interaction, the amount of 14-3-3 eluted from the 6xHis-ExoS resin was determined. It was found that essentially molar equivalents of ExoS and 14-3-3 werereversibly bound to the affinity matrix while a ligand binding mutant of 14-3-3or egg albumin was detected primarily in the unbound phase (56).Proc Natl Acad Sci USA91:9022902630. Wilson DS, Keefe AD, Szostak JW (2001) The use of mRNA display to selecthigh-affinity protein-binding peptides. Proc Natl Acad Sci USA 98:3750375531. Hanes J, Jermutus L, Weber-Bornhauser S, Bosshard HR, Plckthun A (1998)Ribosome display efficiently selects and evolves high-affinity antibodies in vitrofrom immune libraries. Proc Natl Acad Sci USA 95:141301413532. Fields S, Song O (1989) A novel genetic system to detect protein-protein interactions. Nature 340:24524633. Ma J, Ptashne M (1988) Converting a eukaryotic transcriptional inhibitor into anactivator. Cell 55:44344634.Khler F, Mller KM (2003) Adaptation of the Ras-recruitment system to the analysisof interactions between membrane-associated proteins. Nucleic Acids Res 31:e2841. Osborne MA, Dalton S, Kochan JP (1995) The yeast tribrid system-geneticdetection of trans-phosphorylated ITAM-SH2-interactions. Biotechnology (NY)13:1474147842. Zhang J, Lautar S (1996) A yeast three-hybrid method to clone ternary proteincomplex components. Anal Biochem 242:687243. 
Licitra EJ, Liu JO (1996) A three-hybrid system for detecting small lig-and-proteinreceptor interactions. Proc Natl Acad Sci USA 93:128171282144. Tafelmeyer P, Johnsson N, Johnsson K (2004) Transforming a (beta/alpha)8-barrelenzyme into a split-protein sensor through directed evolution.", "The variations of binding sites can be collected from knowntarget genes, mutagenesis studies (Hallikas et al. , 2006), phylogenetic shadowing (orthologous binding sites in different species) (Ostrin et al. , 2006), and in vitro SELEXexperiments (Liu and Stormo, 2005). Several recent technologies, such as SELEXSAGE (Roulet et al. , 2002) and protein-binding microarray (PBM) (Mukherjee et al. ,2004), allow the determination of binding specificity in a high throughput manner. The profile or motif of binding sites can then be described with a consensussequence.An alternative mechanism for translation initiation has been identified thatdoes not obey the first AUG rule; this involves cap-independent internal264CH 11 PREDICTIVE FUNCTIONAL ANALYSIS OF POLYMORPHISMSribosome binding mediated by a Y-shaped secondary structure, denoted the internal ribosome entry site (IRES), located in the 5 UTR of 510 per cent ofhuman mRNA molecules (see Le and Maizel, 1997, for a review of these elements). IRES elements are complex stem loop structures, and there is no reliablesequence consensus to allow prediction of the possible functional effects of polymorphisms in these elements.", "Gribskov M, Robinson NL: Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Computerand Chemistry 1996, 20:25-33. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP,Young RA: Genome-wide location and function of DNA binding proteins. Science 2000, 290(5500):2306-2309. 
Iyer VR, Horak CE, Scafe CS, Botstein D, Snyder M, Brown PO:Genomic binding sites of the yeast cell-cycle transcriptionfactors SBF and MBF. Nature 2001, 409(6819):533-538.", "The variations of binding sites can be collected from knowntarget genes, mutagenesis studies (Hallikas et al. , 2006), phylogenetic shadowing (orthologous binding sites in different species) (Ostrin et al. , 2006), and in vitro SELEXexperiments (Liu and Stormo, 2005). Several recent technologies, such as SELEXSAGE (Roulet et al. , 2002) and protein-binding microarray (PBM) (Mukherjee et al. ,2004), allow the determination of binding specificity in a high throughput manner. The profile or motif of binding sites can then be described with a consensussequence.An alternative mechanism for translation initiation has been identified thatdoes not obey the first AUG rule; this involves cap-independent internal264CH 11 PREDICTIVE FUNCTIONAL ANALYSIS OF POLYMORPHISMSribosome binding mediated by a Y-shaped secondary structure, denoted the internal ribosome entry site (IRES), located in the 5 UTR of 510 per cent ofhuman mRNA molecules (see Le and Maizel, 1997, for a review of these elements). IRES elements are complex stem loop structures, and there is no reliablesequence consensus to allow prediction of the possible functional effects of polymorphisms in these elements." ], [ "(1234567890)Complex trait variation in natural and experimental populations is due to specific DNA sequence polymorphisms, environmental effects, and the interactions between these factors(Johannes et al. 2009). Testis weight is a complex trait thatholds direct implications for reproductive success, as developmental abnormalities can lead to irregular sperm production and infertility in adulthood (Sharpe 2001). 
Variation in testis size has been linked to environmental factors such as social dominance, social organization, and seasonal changes across numerous species.", "Selection could occur at multiple levels, from germ cell generation and propagation to fertilization and early embryonic growth. Chromosomal abnormalities, including aneuploidy, were found in 10-20% of spermatozoa and oocytes (20) and in the cleaved embryo, with a 21% rate of abnormalities in preimplantation embryos (21). These findings led to a model for natural selection against chromosome abnormalities (21). Selection extends to the end of gestation: Only approximately 30% of all conceptions result in a live birth, with more than half of aborted fetuses containing chromosomal abnormalities (22), a number likely to be an underestimate because of technological limitations in measuring all possible mutations. But even in the very small fraction of germ cell duos that survive this withering genome attack and result in a live birth, a number of severe de novo mutations will still be found (23). The data on gross chromosomal alterations suggest that overall, mutation frequency early in life is very high. The functional consequence, however, is limited because of selection. Somewhat surprisingly, this picture points toward an initial decline in genomic alterations, allowing the adult individual to acquire a somatic genome optimally equipped to provide function.", "It is a cornerstone of modern biology that a purposeful genetic program drives all biological processes that occur from conception to reproductive maturation. But, once reproductive maturation is reached, thought is divided in respect to whether the aging process is a continuation of the genetic program or whether it is the result of random losses in molecular fidelity. Despite the claim by many researchers to the contrary, there is no direct evidence that genes drive age changes. I will discuss how genes are involved in the finitude of life subsequently.", "In
most plants and animals, sex is a necessary component of reproduction, and the question for evolutionary biologists is why reproductive mechanisms have evolved that way. In one of the experiments described next, evolutionary geneticists have nevertheless devised a way to compare evolution with and without recombination in the obligately sexual fruit fly. Sex brings harmful alleles together into the same genetic background, allowing selection to more efficiently purge them from the population and potentially producing some offspring that are fitter than either parent. However, the benefit of recombining deleterious mutations may depend on the nature of the epistatic interactions between them. The mutational deterministic hypothesis (Kondrashov 1988) depends partly on this epistasis. Long ago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci were jointly responsible for the production of phenotypes, sex has the potential to break apart coadapted gene complexes, as it moves alleles away from genetic backgrounds where beneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs? This disparity in investment is the basis for the twofold cost: asexual females hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. But sex usually also involves the basic process of physical recombination: the breakage and reunion of two different DNA or RNA molecules. Of these two processes, recombination is clearly the more widespread feature of sexual reproduction. A variety of reproductive systems, such as selfing and automixis, involve recombination but not outcrossing.
Incontrast, relatively few reproductive systems have outcrossing without recombination.", "Crossing over-The swapping of genetic material that occurs in the germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes cross over one another.Crossing over results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring.This process is also known as meiotic recombination.The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population.In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population.For example, the HbS mutation in the HBB gene (which produces the subunit of hemoglobin) causes sickle cell disease when present in both alleles, a detrimental consequence, but protects against malaria when present in 1 allele, a beneficial consequence, ensuring that the mutation persists in populations in areas of the world where malaria is endemic.Genes are passed from parents to offspring via the process of meiosis by which gametes, the egg cells in the mother and the sperm cells in the father, are generated.Ordinarily, each cell has 23 pairs of chromosomes; the gametes have 23 unpaired chromosomes.In meiosis, the 23 pairs are split so that each gamete receives 1 chromosome from each pair (Figures 8 and 9).Two gametes (egg and sperm) ultimately join into a single cell, the zygote, which has the full complement of 23 chromosome pairs restored.If all goes well, the zygote gives rise to a live offspring.Recombination (meiotic recombination)-The swapping of genetic material that occurs in the 
germline. During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes recombine with one another. Recombination results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring. Also known as crossing over.", "In most plants and animals, sex is a necessary component of reproduction, and the question for evolutionary biologists is why reproductive mechanisms have evolved that way. In one of the experiments described next, evolutionary geneticists have nevertheless devised a way to compare evolution with and without recombination in the obligately sexual fruit fly. Sex brings harmful alleles together into the same genetic background, allowing selection to more efficiently purge them from the population and potentially producing some offspring that are fitter than either parent. However, the benefit of recombining deleterious mutations may depend on the nature of the epistatic interactions between them. The mutational deterministic hypothesis (Kondrashov 1988) depends partly on this epistasis. Long ago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci were jointly responsible for the production of phenotypes, sex has the potential to break apart coadapted gene complexes, as it moves alleles away from genetic backgrounds where beneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs? This disparity in investment is the basis for the twofold cost: asexual females hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. 
But sex usually also involves the basic process of physical recombination: the breakage and reunion of two different DNA or RNA molecules. Of these two processes, recombination is clearly the more widespread feature of sexual reproduction. A variety of reproductive systems, such as selfing and automixis, involve recombination but not outcrossing. In contrast, relatively few reproductive systems have outcrossing without recombination.", "However, we expect that only at this level, the most significant contributions brought by integrating epigenetics will be made. Concluding Remarks and Future Prospects. Fish sex ratios are the result of a complex interaction between genetic, biochemical, and environmental interactions. The ultimate result of these interactions at the individual level is gender: male or female. However, at the population level, the combination of sex determination and differentiation sets the sex ratio. In turn, sex ratios define the reproductive capacity of populations and, if sex growth dimorphism exists, also the growth characteristics, something very important in an aquaculture context. Traditionally, it has been agreed that the final sex of an individual (phenotypic sex) depends on two sequential processes: the sex determination system of the species and the gonad differentiation process (Valenzuela, 2008). However, recently, these two seemingly distinct processes are viewed as part of a general process leading to gonad formation and sex ratios (Sarre et al., 2004; Quinn et al., 2011; Uller and Helantera, 2011).", "Obehav is, in turn, influenced by offspring genes and environment (Ogene and Oenvir respectively). Hence, indirect genetic effects (blue arrows) and direct genetic effects (red arrow) are important influencers of behaviour. 
B) Parent-offspring conflict theory predicts that parental resource investment and offspring solicitation behaviours are influenced by the fitness benefit to a focal individual (O), cost to a social partner such as a sibling (S1 and S2) or parent (P), and by their coefficient of relatedness (black arrows). Figure 2: Genomic imprinting can result in divergent phenotypes from the same genotype. A) A paternally imprinted gene, i.e. maternally expressed.", "Because of the small contribution, through the sperm, of the paternal transcriptome to the fertilized zygote, and because of the stronger maternal contribution to child rearing in most model organisms, parental effects are typically thought of as synonymous with maternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading to differences in the birth weight of mice depending on the genotype of the mother (Cowley et al., 1989; Wolf et al., 2011). Therefore, the resulting phenotypic patterns lag a generation behind the genetic transmission of the causal variants. The most well-studied parental genetic effects are caused by deposition of maternal transcripts into the egg prior to fertilization, resulting in differences in early embryonic development depending on the genotype of the mother. Certain genes have also been shown to respond to maternal influence after birth through genetically defined maternal behaviors (Weaver et al., 2004).", "It was believed by many that for each trait variant we should expect to find a corresponding genetic change, or gene for that trait. Through historical happenstance the relationship between genes and traits was set up and treated as if it were one-to-one. But the production of a trait involves not only genes, but also their interactions with each other and the environment, and chance." 
], [ "One of the major drawbacks in NGS technologies is the relatively small reads obtained, which are up to 500 bp in length in pyrosequencing (Roche 454 genome sequencer) and roughly 50 bp for Illumina Gene Analyzer. In order to generate new sequencing technologies that boost the size of sequencing reads, great efforts have been made to develop single-molecule sequencing (SMS) platforms. In SMS sequencing, synthesis is accomplished by single DNA molecule arrays without the amplification step performed in NGS sequencing platforms. This no-amplification procedure allows an increase in the number of DNA molecules tested to improve the throughput, the absence of PCR reduces the price of sequencing, and this technology also increases the length of sequencing reads [12][13][14].", "However, the quality and completeness of sequences should be considered when undertaking an analysis. For a finished sequence, the accuracy is expected to be high; with less than one nucleotide error per 100 000 nucleotides and fewer than one insertion/deletion error per 200 000 nucleotides, the vast majority of which are located in tandemly repetitive sequence (IHGSC, 2004), and there should be no gaps in sequence coverage. The quality of draft sequences depends to a large degree on the depth of coverage.", "The significantly longer error-corrected SMRT reads as compared to other available technologies are expected to enable sequencing through complex repeats, close genomic gaps, allow haplotype phasing, and catalog alternative splicing isoforms that are generally out of reach for second-generation sequencing technologies. There are still many challenges ahead for this third-generation sequencing technology particularly in increasing sequencing accuracy and throughput. Lee Hood focused the challenge of high throughput data centers on signal to noise issues as he addressed the advances that have occurred in dealing with analyzing the complexity of biological systems (Tian et al. 
2012). Michael Schatz and Tim Smith reported on the use of SMRT DNA along with other second-generation technologies to sequence and assemble various small genomes including Escherichia coli strains and budding yeast. SMRT long reads have a mean length of 12 kb for most genomes tested with the longest reads reaching up to 15 kb. The read accuracy is about 85% with the vast majority of the errors corresponding to base insertions, ranging from one to several bases, followed by deletions and mismatches.", "The improvements associated with 3rdGS technologies have led to advances in read quality but significant advances are required to overcome some limitations, such as fresh starting material or intact cells and, moreover, issues associated with high sequencing error rate. Long-read sequencing technologies are particularly efficient in de novo genome assembly [62] or complex structural genomic sequencing. SMRT, for example, allowed the sequencing of long stretches of CGG as a Mycobacterium tuberculosis genome, characterized by high redundancy (65.6% of GC bases) or short tandem repeats (STPs) implicated in X fragile syndrome (FXS) [63,64].", "On the other hand, short-insert paired-end capability enables high-resolution re-sequencing, transcriptome analysis (RNA-Seq) and miRNA discovery. Comparison of Illumina Sequencing and 454 Pyrosequencing. Both Illumina and 454 Roche/Pyrosequencing have been used in different rainbow trout genome projects, which showed the pros and cons of each technology. The main differences are read length and throughput: Illumina sequencing generates relatively short sequence reads of approximately 100 bp, but with enormously high throughput of approximately 20 Gb; in contrast, 454 sequencing generates relatively long reads of approximately 400 bp, but its throughput is only approximately 0.4 Gb. Based upon different choices of sequencing technology, the sequencing yields and read lengths vary. 
Currently, three main next-generation sequencing platforms are widely used in the RNA-Seq, the 454, Illumina and ABI SOLiD. Among these platforms, the throughput varies from hundreds of thousands of reads for the 454 system to hundreds of millions of reads for the Illumina and ABI SOLiD systems (Marguerat and Bahler, 2010). The read lengths typically range from 30-100 bp for Illumina and SOLiD to 200-500 bp for 454. In general, Illumina and SOLiD platforms are relatively inexpensive, while the 454 technology offers longer reads, but is more expensive per run. Illumina, SOLiD and 454 technologies can be combined in a hybrid assembly strategy: short reads that are sequenced at a greater depth are assembled into contigs, and long reads are subsequently used to scaffold the contigs and resolve variants (Martin and Wang, 2011). Two main approaches can be used for RNA-Seq data analysis. One way is to map the resulting reads to a reference genome or reference transcriptome. This is usually taken in well-studied species with sequenced genome. The most problematic compromise of NGS is the relatively shorter read length of 35-400 bp as compared with 750-1000 bp of the capillary-based sequencing. Discussions on this chapter will mainly focus on the 454 Roche/pyrosequencing and the Illumina sequencing technologies that have been used in rainbow trout studies. Detailed description of NGS technologies can be found in the literature (Mardis, 2008b, 2008a; Shendure and Ji, 2008; Zhou et al., 2010; Kircher and Kelso, 2010). 454 Roche/Pyrosequencing. Sequencing workflow of the 454 Roche GS FLX and GS Junior platforms consists of the following basic steps (Figure 11.1A). These technologies allow sequencing of millions of reads in a relatively short time (Voelkerding et al., 2009). The assembly and sequencing analysis, as well as the whole process, require significant computational and economic investment, but the high volume of data allows a full genome to be sequenced in a relatively short time. 
Severaldifferent platforms are available, including 454Life Sciences (Roche), Solexa (Illumina), Solid(Applied Biosystems), and single molecule sequencing (Helicos Biosciences). All platformsshare a common approach, involving massiveparallel sequencing of DNA molecules thatare spatially separated. (Brenner et al.Technologically, 454 Roche/Pyrosequencing ismore suitable for de novo characterization ofnonmodel genomes/transcriptomes, and Illumina is more suitable for re-sequencing knowngenomes, digital gene expression, and miRNAdiscovery. However, such technological capability difference is complicated by the cost associated with the sequencing platforms. Forinstance, Illumina HiSeq 2000 instrument cangenerate 20 Gb sequence per sequencing lanethat cost only $30004000, while one run of 454sequencing can only generate 400 million bpthat cost over $10,000.With theseNGS technologies, libraries are constructedand sequenced from amplied fragments ofsingle-stranded DNA. The most signicantadvantage of NGS technology is saving thetime-consuming, laborious, and equipmentassociated steps of cloning and colony pickingof the Sanger-based capillary sequencing. Thesecond most important advantage of NGS isthe tremendous increase in sequencing outputin terms of the number of reads (14800million reads) and total bases per run (0.4300 Gbp) compared with 96 reads of 7501000 bp each in the capillary-based sequencing.Although shorter reads produced by Illumina or SOLiD compared with the 454 technology may be more challenging for de novosequence assembly, the preexisting ESTs produced by Sanger sequencing can be used tofacilitate the assembly (Liu et al. , 2011), andthe algorithms for short reads de novo assembly are being developed (e.g. , Grabherret al. , 2011). Xiang et al. assembled the shortreads from Illumina RNA-Seq deep sequencing to generate the nonredundant consensuswhich is subsequently used as references forDGE prole analysis (Xiang et al. 
, 2010).", "There are currently three general WGS strategies (Figure 2): (1) short-read WGS using the Illumina technology, which currently yields paired-end 150 bp reads with low error rates in the range of 0.1%-0.5%;(2) long-read WGS using singlemolecule technologies from Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT), which yield 10-100 kb reads-and occasionally much longer-with high error rates in the range of 10%-15%; and (3) linked-read WGS using the technology from 10X Genomics, which generates barcoded Illumina short-reads from longer molecules (e.g., 50 kb).Due to considerations of cost, ease of use, and accuracy, the overwhelming majority of human genetics studies employ short-read WGS using the Illumina HiSeq or NovaSeq platform, and we therefore focus primarily on analysis of this data type.", "Currently, there are two main types of long-read technologies: single-molecule real-time sequencing approaches and synthetic approaches that rely on existing shortread technologies to construct long reads in silico.The single-molecule approaches differ from short-read approaches in that they do not rely on a clonal population of amplified DNA fragments to generate detectable Following cluster generation or bead deposition onto a slide, fragments are sequenced by ligation, in which a fluorophore-labelled two-base-encoded probe, which is composed of known nucleotides in the first and second positions (dark blue), followed by degenerate or universal bases (pink), is added to the DNA library.The two-base probe is ligated onto an anchor (light purple) that is complementary to an adapter (red), and the slide is imaged to identify the first two bases in each fragment.Unextended strands are capped by unlabelled probes or phosphatase to maintain cycle synchronization.Finally, the terminal degenerate bases and the fluorophore are cleaved off the probe, leaving a 5 bp extended fragment.The process is repeated ten times until two out of every five bases 
are identified.At this point, the entire strand is reset by removing all of the ligated probes and the process of probe binding, ligation, imaging and cleavage is repeated four times, each with an n + 1, n + 2, n + 3 or n + 4 offset anchor.b | Complete Genomics.DNA is sequenced using the combinatorial probe -anchor ligation (cPAL) approach.After DNA nanoball deposition, an anchor complementary to one of four adapter sequences and a fluorophore-labelled probe are bound to each nanoball.The probe is degenerate at all but the first position.The anchor and probe are then ligated into position and imaged to identify the first base on either the 3 or the 5 side of the anchor.Next, the probe-anchor complex is removed and the process begins again with the same anchor but a different probe with the known base at the n + 1 position.This is repeated until five bases from the 3 end of the anchor and five bases from the 5 end of the anchor are identified.Another round of hybridization occurs, this time using anchors with a five-base offset identifying an additional five bases on either side of the anchor.Finally, this whole process is repeated for each of the remaining three adapter sequences in the nanoball, generating 100 bp paired-end reads.Figure 5 | Real-time and synthetic long-read sequencing approaches.A | Real-time long-read sequencing platforms.Aa | Single-molecule real-time (SMRT) sequencing from Pacific Biosciences (PacBio).Template fragments are processed and ligated to hairpin adapters at each end, resulting in a circular DNA molecule with constant single-stranded DNA (ssDNA) regions at each end with the double-stranded DNA (dsDNA) template in the middle.The resulting 'SMRTbell' template undergoes a size-selection protocol in which fragments that are too large or too small are removed to ensure efficient sequencing.Primers and an efficient 29 DNA polymerase are attached to the ssDNA regions of the SMRTbell.The prepared library is then added to the zero-mode 
waveguide (ZMW) SMRT cell, where sequencing can take place.To visualize sequencing, a mixture of labelled nucleotides is added; as the polymerase-bound DNA library sits in one of the wells in the SMRT cell, the polymerase incorporates a fluorophore-labelled nucleotide into an elongating DNA strand.During incorporation, the nucleotide momentarily pauses through the activity of the polymerase at the bottom of the ZMW, which is being monitored by a camera.Ab | Oxford Nanopore Technologies (ONT).DNA is initially fragmented to 8-10 kb.Two different adapters, a leader and a hairpin, are ligated to either end of the fragmented dsDNA.Currently, there is no method to direct the adapters to a particular end of the DNA molecule, so there are three possible library conformations: leader -leader, leaderhairpin and hairpin-hairpin.The leader adapter is a double-stranded adapter containing a sequence required to direct the DNA into the pore and a tether sequence to help direct the DNA to the membrane surface.Without this leader adapter, there is minimal interaction of the DNA with the pore, which prevents any hairpin-hairpin fragments from being sequenced.The ideal library conformation is the leader-hairpin.In this conformation the leader sequence directs the DNA fragment to the pore with current passing through.As the DNA translocates through the pore, a characteristic shift in voltage through the pore is observed.Various parameters, including the magnitude and duration of the shift, are recorded and can be interpreted as a particular k-mer sequence.As the next base passes into the pore, a new k-mer modulates the voltage and is identified.At the hairpin, the DNA continues to be translocated through the pore adapter and onto the complement strand.This allows the forward and reverse strands to be used to create a consensus sequence called a '2D' read.B | Synthetic long-read sequencing platforms.Ba | Illumina.Genomic DNA templates are fragmented to 8-10 kb pieces.They are then 
partitioned into a microtitre plate such that there are around 3,000 templates in a single well.Within the plate, each fragment is sheared to around 350 bp and barcoded with a single barcode per well.The DNA can then be pooled and sent through standard short-read pipelines.Bb | 10X Genomics' emulsion-based sequencing.With as little as 1 ng of starting material, the GemCode can partition arbitrarily large DNA fragments, up to ~100 kb, into micelles (also called 'GEMs') along with gel beads containing adapter and barcode sequences.The GEMs typically contain ~0.3 copies of the genome and 1 unique barcode out of 750,000.Within each GEM, the gel bead dissolves and smaller fragments of DNA are amplified from the original large fragments, each with a barcode identifying the source GEM.After sequencing, the reads are aligned and linked together to form a series of anchored fragments across a span of ~50 kb.Unlike the Illumina system, this approach does not attempt to get full end-to-end coverage of a single DNA fragment.Instead, the reads from a single GEM are dispersed across the original DNA fragment and the cumulative coverage is derived from multiple GEMs with dispersed -but linked -reads.Part Aa is adapted from REF.18, Nature Publishing Group.Part Ba is adapted from REF.62.Synthetic long-reads.Unlike true sequencing platforms, synthetic long-read technology relies on a system of barcoding to associate fragments that are sequenced on existing short-read sequencers 61 .These approaches partition large DNA fragments into either microtitre wells or an emulsion such that very few molecules exist in each partition.Within each partition the template fragments are sheared and barcoded.This approach allows for sequencing on existing short-read instrumentation, after which data are split by barcode and reassembled with the knowledge that fragments sharing barcodes Barcodes A series of known bases added to a template molecule either through ligation or amplification.After 
sequencing, these barcodes can be used to identify which sample a particular read is derived from. There are currently two systems available for generating synthetic long-reads: the Illumina synthetic long-read sequencing platform (FIG. 5c) and the 10X Genomics emulsion-based system (FIG. 5d). The Illumina system (formerly Moleculo) partitions DNA into a microtitre plate and does not require specialized instrumentation. However, the 10X Genomics instruments (GemCode and Chromium) use emulsion to partition DNA and require the use of a microfluidic instrument to perform pre-sequencing reactions. With as little as 1 ng of starting material, the 10X Genomics instruments can partition arbitrarily large DNA fragments, up to ~100 kb, into micelles called 'GEMs', which typically contain 0.3 copies of the genome and one unique barcode. Within each GEM, a gel bead dissolves and smaller fragments of DNA are amplified from the original large fragments, each with a barcode identifying the source GEM. After sequencing, the reads are aligned and linked together to form a series of anchored fragments across the span of the original fragment. Unlike the Illumina system, this approach does not attempt gapless, end-to-end coverage of a single DNA fragment. Instead it relies on linked reads, in which dispersed, small fragments that are derived from a single long molecule share a communal barcode. Although these fragments leave segments of the original large molecule without any coverage, the gaps are overcome by ensuring that there are many long fragments from the same genomic region in the initial preparation, thus generating a read cloud wherein linked reads from each long fragment can be stacked, combining their individual coverage into an overall map (FIG. 5d). Comparison of single-molecule and synthetic long-read sequencing. 
There is growing interest in the field of long-read sequencing, and each system has its own advantages and drawbacks (TABLE 1). Currently, the most widely used instrument in long-read sequencing is the PacBio RS II instrument. This device is capable of generating single polymerase reads in excess of 50 kb with average read lengths of 10-15 kb for a long-insert library. Such properties are ideal for de novo genome assembly applications [63], for revealing complex long-range genomic structures [64] and for full-length transcript sequencing. There are, however, several notable limitations. The single-pass error rate for long reads is as high as 15% with indel errors dominating [65], raising concerns about the utility of the instrument [66]. Fortunately, these errors are randomly distributed within each read and hence sufficiently high coverage can overcome the high error rate [67]. The use of a circular template by PacBio also provides a level of error correction. The more frequently a single molecule is sequenced, the higher the resulting accuracy, up to ~99.999% for insert sequences derived from at least 10 subreads [59,68]. This high accuracy rivals that of Sanger sequencing, leading researchers to speculate that this technology can be used in a manner analogous to Sanger-based SNP validation [65]. The runtimes and throughput of this instrument can be tuned by controlling the length of time for which the sensor monitors the ZMW; longer templates require longer times. For example, a 1 kb library that is run for 1 hour will generate around 7,500 bases of sequence per molecule, with an average of 8 passes, whereas a 4-hour run will generate around 30,000 bases per molecule and ~30 passes. Conversely, a 10 kb library requires a 4-hour run to generate ~30,000 bases with ~3 passes. The limited throughput and high costs of PacBio RS II (around $1,000 per Gb), in addition to the need for high coverage, place this instrument out of reach of many small laboratories. However, in an attempt to 
ameliorate these concerns, PacBio has launched the Sequel System, which reportedly has a throughput 7× that of the RS II, thus halving the cost of sequencing a human genome at 30× coverage [69]. Single-end and paired-end sequencing: In single-end sequencing, a DNA template is sequenced only in one direction. In paired-end sequencing, a DNA template is sequenced from both sides; the forward and reverse reads may or may not overlap. A deviation in the expected genome alignment between two ends of a paired-end read can indicate a structural variation." ], [ "A good starting point is with the databases called Integrated Microbial Genomes (IMG; http://img.jgi.doe.gov/) or National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/). Many genomes identified in IMG as belonging to the Genome Encyclopedia of Bacteria and Archaea (GEBA; http://jgi.doe.gov/our-science/science-programs/microbial-genomics/phylogenetic-diversity/) project are relatively unstudied. In Step 2, either the instructor or the student chooses an annotation approach. A few of the annotation questions that one might ask using GENI-ACT are these: An initial automated gene call is made following sequencing. 
Is the automated gene call accurate?", "The GO Consortium coordinates an effort to maximize the utility of a large and representative set of key genomes, which we refer to as reference genomes. The Reference Genome project has two aspects: (i) to encourage complete and precise annotations of the proteins for the species widely used as model organisms; and (ii) to provide inferred annotations for proteins for which no experimental data are available [4]. We describe here the homology-based method and software we have developed to achieve those goals.", "Although comparative immunogenomic analyses clearly benefit from high-quality WG assemblies, costs may still prevent large-scale analyses involving many species. Nevertheless, the availability of high-quality reference genomes for select species within target clades can enable much more accurate assembly and annotation of other species using SRseq, providing a way forward whilst minimising costs (e.g., [65], Figure 2, Key figure).", "Each of these has advantages and disadvantages, and, although the main genome portals are generally consistent, they may not give the same answers in every case. Bearing in mind that only the first two of these gene location methods are based on stable (almost) unambiguous information, it is better to use at least two ways to define and store the results: for example, a section of raw sequence and a gene name, or a primary accession number and a set of genome coordinates. The BACE1 gene will be used as an example of a known gene to locate. This problem can be overcome by generating several multiple sequence alignments, one with each of the sequences under study as the reference. This solution is time-consuming, raises the additional problem of integrating results between alignments, and exposes the second major drawback to the reference sequence approach; that is, the potential for inconsistencies when using alternate sequences as the reference. 
A solution to the problems presented by reference sequence-based alignment and analysis has been proposed in the form of a threaded blockset (Blanchette et al., 2004).", "To achieve this goal, we integrated and make available big reference data in chapters 2 and 3, bridged model organism to human data in chapter 4, translated generic methods into clinical applications in chapters 5 and 6, and developed a platform to bring innovations into practice in chapter 7. The resources currently available are already plentiful, and both the amount and types of molecular life science data is growing at a tremendous pace. Interpretation can be sped up using the huge amount of useful information collected by laboratories, public databases and biobanks. Unfortunately, for now, all these sources of useful data cannot be easily integrated and explored in unison. Further, while many innovative analysis methods emerge from research on a regular basis, a lack of standardization makes it difficult to adopt, share, compare and validate them in practice. Here we report a lightweight framework for genome interpretation pipelines that aims to enable rapid implementation and adaptation of analysis protocols that integrate reference annotation data (e.g. ClinVar, ExAC, GoNL), run best-practice analysis tools (e.g.", "Links to external resources, including genome browsers, sequence databases, protein structure databases, and many other web resources, make it possible to place results in a rich bioinformatic context. It is practical to integrate data across all of these domains and scales for the simple reason that data have been collected from a common genetic reference panel of strains. The integration of data across domains in the RI strains also allows us to easily study pleiotropic effects of variability in gene expression.", ", 2012), or the 1001 Genomes Project for A. thaliana (1001 Genomes Consortium, 2016), may even be better suited for similar analyses. 
In the case of humans, however, it is almost impossible to simultaneously phenotype individuals and sample multi-tissue and multi-omic data, while controlling the environmental sources of variation. Assessing the use of these tools may require cohorts that have extensive multi-omics datasets available or have relevant samples biobanked, e.g., the Framingham Heart Study (Mahmood et al., 2014). Imputation of gene expression in deep tissues from either reference transcriptome datasets (Gamazon et al.", "The use of true strain-specific genomes for read alignment, rather than the reference genome or imputed pseudogenomes, will pose new analytical challenges. It will also offer the opportunity to capture biological signals which are not apparent in the present framework. One remaining gap in the CC infrastructure is the lack of a centralized, public platform for sharing and integrating phenotype data on CC lines. The Mouse Phenome Database (http://phenome.jax.org/) (Grubb et al. The genome assembly and annotations are made available to the community via many online genome browsers, the most popular of which are hosted by the University of California at Santa Cruz [UCSC, (Karolchik et al. 2014)] and Ensembl (Flicek et al. 2013). Use of a single haploid reference sequence as an anchor for all studies of genetic variation in mouse offers many practical advantages. But the dependency on a reference genome requires several assumptions about the nature of genetic variation which may be violated in practice, the strongest of which is that of genomic collinearity (i.e.", "Each of these has advantages and disadvantages, and, although the main genome portals are generally consistent, they may not give the same answers in every case. 
Bearing in mind that only the first two of these gene location methods are based on stable (almost) unambiguous information, it is better to use at least two ways to define and store the results: for example, a section of raw sequence and a gene name, or a primary accession number and a set of genome coordinates. The BACE1 gene will be used as an example of a known gene to locate. This problem can be overcome by generating several multiple sequence alignments, one with each of the sequences under study as the reference. This solution is time-consuming, raises the additional problem of integrating results between alignments, and exposes the second major drawback to the reference sequence approach; that is, the potential for inconsistencies when using alternate sequences as the reference. A solution to the problems presented by reference sequence-based alignment and analysis has been proposed in the form of a threaded blockset (Blanchette et al., 2004).", "The large number of bioinformatic tools that have been made available to scientists during the last few years has presented the problem of which to use and how best to obtain scientifically valid answers (3). In this chapter, we will provide a guide for the most efficient way to analyze a given sequence or to collect information regarding a gene, protein, structure, or interaction of interest by applying current publicly available software and databases that mainly use the World Wide Web.", "The genome assembly and annotations are made available to the community via many online genome browsers, the most popular of which are hosted by the University of California at Santa Cruz [UCSC, (Karolchik et al. 2014)] and Ensembl (Flicek et al. 2013). Use of a single haploid reference sequence as an anchor for all studies of genetic variation in mouse offers many practical advantages. 
But the dependency on a reference genome requires several assumptions about the nature of genetic variation which may be violated in practice, the strongest of which is that of genomic collinearity (i.e. Detailed knowledge of the subspecies contributions to CC genomes, obtained by integrating CC lines haplotype mosaics with data from the Mouse Phylogeny Viewer, will be critical to this effort. Most of the resources discussed in this review ultimately depend on the mouse reference genome. A high-quality, well-annotated reference assembly for any model organism is extremely valuable for the research community. In addition to the genomic sequence itself, a reference genome provides a backbone for annotation and a common coordinate system to anchor genetic maps.", "Each of these has advantages and disadvantages, and, although the main genome portals are generally consistent, they may not give the same answers in every case. Bearing in mind that only the first two of these gene location methods are based on stable (almost) unambiguous information, it is better to use at least two ways to define and store the results: for example, a section of raw sequence and a gene name, or a primary accession number and a set of genome coordinates. The BACE1 gene will be used as an example of a known gene to locate. This problem can be overcome by generating several multiple sequence alignments, one with each of the sequences under study as the reference. This solution is time-consuming, raises the additional problem of integrating results between alignments, and exposes the second major drawback to the reference sequence approach; that is, the potential for inconsistencies when using alternate sequences as the reference. A solution to the problems presented by reference sequence-based alignment and analysis has been proposed in the form of a threaded blockset (Blanchette et al. 
, 2004).", "The general conclusion of this comparison is thus that co-assembly and co-binning approaches would be useful for retrieving substantially more genomes in relatively long (>5) subject-specific time series, whereas the potential advantage of retrieving more low-abundance species in a cross-sectional co-assembly is overcome by the disadvantage of having to use more complex approaches such as DESMAN to resolve the strain variation. That is perhaps more appropriate where the aim is to extract as much information as possible from a single study rather than to produce a single comprehensive high fidelity strain catalog. Because time series comprising more than 5 samples from the same subject and body site are very rare in the available cohorts (only 70 individuals, i.e., 1.0%, in our database), co-assembly is not considered in the present work as it would not provide advantages.", "Each of these has advantages and disadvantages, and, although the main genome portals are generally consistent, they may not give the same answers in every case. Bearing in mind that only the first two of these gene location methods are based on stable (almost) unambiguous information, it is better to use at least two ways to define and store the results: for example, a section of raw sequence and a gene name, or a primary accession number and a set of genome coordinates. The BACE1 gene will be used as an example of a known gene to locate." ] ], "task_id": [ "847F1E1599EECDE92F99B7581728FFE8", "037BAB6CB2DE7A42AAAA73CE5DA8DB73", "B2AA6DE557D652A0A660C4E0FAC1124D", "7EC697DE62C0C57E601EC3F5B295DF61", "0A6673A0B69F0FF9C9657FB797DD1FE2" ] }