author | Nyeusi D. Shebes | 2025-02-27 22:04:47 -0600 |
---|---|---|
committer | Nyeusi D. Shebes | 2025-02-27 22:04:47 -0600 |
commit | 835e229909e9bdb6e084c5112672065886517adb (patch) | |
tree | 7b99357246dd46da8cd77c9bb98f58a4cfd16834 /gnqa/data/study1/datasets/human | |
parent | 997742a6f0da09418b727f08318719182ac703f1 (diff) | |
refactoring codebase
Diffstat (limited to 'gnqa/data/study1/datasets/human')
26 files changed, 2422 insertions, 0 deletions
diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_1.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_1.json
new file mode 100644
index 0000000..e180eee
--- /dev/null
+++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_1.json
@@ -0,0 +1,106 @@
+{
+ "question": [
+ "List as many studies as you can that include rapamycin.",
+ "Why is it so diffuclut to map gene loci that control aging in humans?",
+ "What is apoptosis?",
+ "which genes are involved in the aging process",
+ "what causes the aging process"
+ ],
+ "answer": [
+ "1. A study by Kaeberlein in 2013b on the use of rapamycin to prevent organ transplant rejection, some forms of cancer, and restenosis in cardiac stents.\n2. A study by Mannick et al. in 2014 on the use of rapamycin derivative RAD001 to improve age-associated decline in immune function in healthy elderly people.\n3. A study by Yi et al. in 2014 on the use of rapamycin in dogs to improve outcomes in a glycogen storage disease model.\n4. A study by Paoloni et al. in 2010 on the use of rapamycin in veterinary clinical trials as a treatment for osteosarcoma.\n5. A study by Kaeberlein in 2015 on the use of rapamycin in a veterinary clinical trial to assess side effects and effects on age-associated cardiac function in healthy elderly dogs.\n6. A study by Johnson et al. in 2013 on the use of rapamycin as a pharmacological intervention for extending lifespan and delaying age-related functional declines in rodents.\n7. A study by Augustine et al. in 2007 and de Oliveira et al. in 2011 on the side effects of rapamycin.\n8. A study by Lamming et al. in 2012 on the possible exception of impaired glucose homeostasis as a side effect of rapamycin.\n9. A study by Larson et al. in 2016 on the pharmacokinetic analysis of rapamycin treatment in healthy dogs.\n10. A study by Dai et al. in 2014 and Flynn et al.
in 2013 on the improvements in cardiac function in aged dogs and mice after rapamycin treatment.\n11. A study by Johnson et al. in 2015 on the beneficial impacts of rapamycin on multiple age-related phenotypes in aging mice.\n12. A study by Chen et al. in 2009 on the effects of rapamycin on the aged immune system in elderly mice.\n13. A study by Mannick et al. in 2014 on the use of rapamycin derivative RAD001 to enhance the function of the aged immune system in humans.", + "Mapping gene loci that control aging in humans is difficult due to several reasons. Firstly, aging is a complex phenotype influenced by multiple genes and numerous different pathophysiological processes and diseases. Secondly, any common variation in genes associated with aging probably has a small effect, requiring large studies for identification. Thirdly, human studies face issues like environmental heterogeneity, genetic diversity, and lack of birth matched controls. Lastly, the influence of environmental factors and human-specific confounders like psychosocial, economic, and cultural factors can potentially mask purely biological aging mechanisms, making the analysis more difficult.", + "Apoptosis, also known as cell suicide or programmed cell death, is a biological process in multicellular organisms that allows specific cells to be removed during the development of complex tissues, or potentially dangerous damaged cells to be destroyed for the benefit of the whole organism. It is characterized by a sequence of well-defined events resulting in cell destruction and is necessary for normal cell turnover. 
It is also essential to various other biological processes.", + "The genes involved in the aging process include daf-16, daf-2, BAZ2B, HMGB4, NOC2L, RAI1, SIK1, SMARCA2, SPZ1, TBP, TRIP13, ZKSCAN1, DBH, TPO, LSS, GPER, HCRTR2, ATG2A, NEDD4L, PSMB1, UBXN4, USP6, EEF1A2, ITGB2, TUBB2C, WRN, ABCA7, AZGP1, CD36, DEGS2, PI4KA, SOAT2, APOE, LDLR, CDKN2B, RBM38, IGF1R, FOXO3, SNCA, NAP1L4, GAB2, QKI, and many others.", + "The aging process is caused by a combination of factors including genetic influences, environmental conditions, and stochastic processes. It involves the accumulation of molecular damage, mutations, incomplete repair, and genetic programs. Other factors include wear and tear on cells, decreased immune defenses, oxidation and inefficient mitochondria, toxins and radiation, glycosylation, caloric intake and sirtuin production, neurotransmitter imbalance, hormone mechanisms, reduced nitric oxide, and stem cell slowdown. Aging is also associated with changes in dynamic biological, physiological, environmental, psychological, behavioral, and social processes." + ], + "contexts": [ + [ + "Intervention trials and cell-based monotherapy", + "Rapamycin is used clinically to prevent organ transplant rejection, for some forms of cancer, and to prevent restenosis in cardiac stents (Kaeberlein 2013b).Shortterm treatment with the rapamycin derivative RAD001 improves ageassociated decline in immune function, as measured by antibody response to an influenza vaccine, in healthy elderly people (Mannick et al. 2014).Rapamycin has been shown to be well tolerated in dogs, improves outcome in a glycogen storage disease model (Yi et al. 2014), and is currently being tested in veterinary clinical trials as a treatment for osteosarcoma (Paoloni et al. 
2010).A veterinary clinical trial is underway to assess the side effect profile and effects on age-associated cardiac function following 10 weeks of rapamycin treatment in healthy elderly dogs (Kaeberlein 2015).Rapamycin is currently the most effective pharmacological intervention for extending lifespan and delaying a broad range of age-related functional declines in rodents (Johnson et al. 2013).However, the doses used clinically to prevent organ transplant rejection are associated with side effects, such as impaired wound healing, edema, elevated circulating triglycerides, impaired glucose homeostasis, gastrointestinal discomfort, and mouth ulcers (Augustine et al. 2007;de Oliveira et al. 2011).These adverse side effects would likely preclude long-term use of rapamycin at these levels in otherwise healthy people.With the possible exception of impaired glucose homeostasis (Lamming et al. 2012), these side effects have not been observed at doses that are associated with increased lifespan and healthspan in mice, however, raising the possibility that lower doses of this drug could promote healthy aging with minimal adverse effects.This trial is designed to determine whether treatment with the drug rapamycin (see Table 1) can significantly reduce age-related disease and disability as well as mortality in middle-aged large dogs.The initial phase of this trial, which is in progress at the time of this writing, is intended to enroll at least 32 dogs 6 years of age or older and 40 lb in weight or greater.Each animal receives an initial veterinary exam and comprehensive blood work along with a cardiac exam including echocardiography (Fig. 
3).Those dogs that do not present with any abnormalities or significant pre-existing health conditions are randomized into either placebo or rapamycin treatment groups for a 10-week treatment period.Initial rapamycin dosing regimens were determined, in part, based on pharmacokinetic analysis of rapamycin treatment in healthy dogs (Larson et al. 2016).After 10 weeks in the study, each dog receives another full exam and blood chemistry panel as well as repeat cardiac exam.The primary goals of this first phase are to establish appropriate dosing of rapamycin in the absence of significant adverse events and to determine whether similar improvements in cardiac function are achieved in aged dogs after 10 weeks of rapamycin treatment, as has been observed in laboratory mice (Dai et al. 2014;Flynn et al. 2013).To date, only one study has been performed assessing the impact of a rapamycin derivative on healthy aging in people.In this trial, it was observed that 6 weeks of treatment with the rapamycin derivative RAD001 (everolimus) was sufficient to enhance function of the aged immune system, as assessed by response to an influenza vaccine (Mannick et al. 2014).This recapitulates what was observed in elderly mice (Chen et al. 
2009), and suggests that at least some of the mechanisms by which rapamycin delays aging in mice work similarly in humans.Although both compounds have essentially identical biological activities, RAD001 was used in this study instead of rapamycin because the study was funded by Novartis, who holds the patent rights for RAD001 (rapamycin is now off patent and sold as a generic drug).The doses of RAD001 used in the human immune aging study were lower than those typically used to prevent organ transplant rejection and showed improved side effect profiles, although some adverse effects, including the presence of mouth sores in a subset of the patients, were noted.Pending the outcome of phase 1, we anticipate enrolling several hundred additional dogs with similar entry criteria into a longer-term, 3-5 year study, to carefully assess the extent to which rapamycin improves health and reduces mortality in middle-age companion dogs.In addition to cardiac function, assessments of multiple age-related phenotypes will be performed including measures of cognitive function, muscle function, kidney function, glucose homeostasis, and cancer incidence.Many of these parameters are beneficially impacted by rapamycin in aging mice (Johnson et al. 2015), and we predict that rapamycin will induce similar improvements in aging dogs.Fig. 
3 Design of the current short-term rapamycin intervention trial.Dogs must weigh at least 40 pounds and be at least 6 years old at time of entry into the study.If no significant pre-existing health conditions are detected at the first exam, dogs are randomized into either placebo or one of the rapamycin treatment groups.Red indicates the 10-week period during which the dogs receive either rapamycin or placebo.Dogs receive the same generic rapamycin (sirolimus) pill that is provided to human patients.Asterisk Serum and feces are collected at each appointment for future metabolomic and microbiome analyses and for quantitation of circulating rapamycin levels", + "All patients included in the study were receiving medications such as anti-hypertensive drugs (mainly angiotensin-converting enzyme inhibitors), blood sugar lowering agents and diuretics.In addition, infusions of human recombinant erythropoietin and iron hydroxide were administered.Just one patient reported a history of gastric cancer.Patients with chronic viral diseases (hepatitis, HIV) were excluded from the study.", + "RapamycinRapamycin is a macrolide isolated from Streptomyces hygroscopicus, a bacteria from Pascua Island (Rapa Nui).It has functions as an antibiotic, an immune suppressant drug, and it is also proposed as a CRM.After the first studies, it was found that rapamycin could induce the extension of the replicative life of yeast through the inhibition of TOR signaling [57].This compound could extend the lifetime useful in 20-month-old mice in correlation with TOR activity [58].These studies were the basis of the research to determine the function of rapamycin as a CRM, due to its modulating properties over proteostasis.In addition, studies suggest that rapamycin can be combined with other compounds (metformin, losartan, statins, propranolol, and aspirin among others) to potentiate their anti-aging activity [59].", + "One out of the 25 FDA approved Breast cancerdrugs (Gemcitabine), was found in the 
top 20 drug list from LINCS from breast cancer stage I (dark magenta). As shown in Fig. 12, one drug out of 25 FDA approved Breast cancer drugs, Gemcitabine, was found asrepurposed drug from LINCS for breast cancer stage III. Letrozole (Breast cancer drug) has similar structure(greater than 60%) with Ruxolitinib (repurposed drug from LINCS) a drug for the treatment of intermediate orhigh-risk myelofibrosis (Fig. 13).One out of the 25 FDA approved Breastcancer drugs (Palbociclib), was found in the top 20 drug list from LINCS from breast cancer stage II (deep pink). Scientific Reports | 6:20518 | DOI: 10.1038/srep2051813www.nature.com/scientificreports/Figure 11. Highlighted target genes that physically interact with genes from the breast cancer stageII common network pattern and their corresponding repurposed drugs from LINCS, along with theirstructurally similar Breast cancer drugs. As shown in Figs 1617 two target genes (TOP2A and TYMS) are also involved in the Triple Negative pattern.Two of them (Gemcitabine and Palbociclib) are included in the list of the 25 knownFDA-approved Breast cancer therapeutic drugs. We performed a Hypergeometric distribution test in order tofind the statistical significance of this drug overlapping. More precisely, LINCS_L1000 database is comprisedfrom 20,413 chemical reagents. Twenty two out of twenty five breast cancer drugs are also included in LINCSdatabase. Finally, from the 105 drugs that were found from our analysis, the probability of finding two drugs tooverlap with the Breast Cancer drugs in LINCS is 0.005471157, pointing out that there is statistical significancein their selection.Two from the 25 FDAapproved Breast cancer drugs (Gemcitabine and Palbociclib), was found in the top 20 drug list from LINCSfrom Luminal A breast cancer (dark magenta and deep pink respectively).One out of the 25 FDAapproved Breast cancer drugs (Gemcitabine), was found in the top 20 drug list from LINCS from breast cancerstage III (dark magenta). 
that was found from the drug repurposing analysis of HER2 pattern. It has similar structure - 75% withWZ-4002 repurposed drug, which is a novel mutant-selective inhibitor of EGFR. Finally, both Palbocicliband WZ-4002 are structurally similar to Dasatinib (more than 60%), which is a cancer drug used to treatacute lymphoblastic leukemia.18 two drugs out of 25 FDA approved Breast cancer drugs Gemcitabine and Palbociclib were also found as repurposed drugs from LINCS for breast cancer Luminal A (Fig. 18). Two genes from theLuminal A network pattern physically interact with four genes that involved in Histone deacetylases class(HDAC1, HDAC2, HDAC3 and HDAC8), which are target genes of Vorinostat (repurposed drug from LINCS). Vorinostat is a member of a larger class of compounds that inhibit histone deacetylases (HDAC) and it is usedto treat cutaneous T cell lymphoma (CTCL).Network pattern for each breast cancer subtype and the common interactions across Luminal Aand Luminal B. As shown in Fig. 8, one drug out of 25 FDA approved Breast cancer drugs, Gemcitabine, was proposed asrepurposed drug by the LINCS for breast cancer stage I. Furthermore, Gemcitabine is quite similar (tanimoto31similarity greater than 80%) with Clofarabine and Kinetin-riboside (repurposed drugs from LINCS). Clofarabineis also an anti-cancer, antineoplastic chemotherapy drug and is classified as an antimetabolite.One from the 25 FDAapproved Breast cancer drugs (Gemcitabine), was found in the top 20 drug list from LINCS from breast cancerstage IV (dark magenta). at the G2/M phase. Coconut milk contains kinetin riboside and is thought to have the potential to inhibit the progression of many cancers, including prostate, colon and breast cancer. One study found that carcinogen-inducedmammary tumors in mice were reduced by coconut oil too (http://foodforbreastcancer.com/). 
Moreover, in StageI, Sepantronium bromide (repurposed drug from LINCS) has been found similar with Vinblastine Breast cancerdrug and Idarubicin with Doxorubicin and Epirubicin respectively.", + "Rapamycin has serious side effects, particularly as an immunosuppressor, and thus it is not suitable as an antiaging drug.As in sirtuins, however, these studies highlight the road from basic discovery on the biology of aging to antiaging interventions.Further studies of the TOR pathway and of repressors more specific of its downstream signaling pathway are ongoing.Whether rapamycin produces a change in another parameter related to energy uptake or utilization is unknown, and determining which of its effects modulate lifespan is an important unsolved question.Like resveratrol, TOR has attracted considerable attention from the pharmaceutical industry, particularly in the context of cancer (Meric-Bernstam and Gonzalez-Angulo, 2009).", + "Hayes DF, Stearns V, Rae J, Flockhart D; 32 Consortium on Breast Cancer Pharmacogenomics.A model citizen?Is tamoxifen more effective than aromatase inhibitors if we pick the right patients?J. Natl Cancer Inst.100(9), 610-613 (2008)." 
+ ], + [ + "Recent developments on the genetics of aging can be seen as several streams of effort.In general, humans show a relatively modest (<50%) heritability of life spans (results obtained from twin studies discussed below).The apoE polymorphisms are remarkable for their influence on both cardiovascular disease and Alzheimer disease.In contrast, rare mutant genes with high penetrance cause these same diseases but with early onset and a major shortening of the life span.Shortlived laboratory models (fruit flies, nematodes, mice) are yielding rapid advances, with the discovery of mutants that increase life spans in association with altered metabolism, which leads to questions on the physiological organization of aging processes.Although these early findings do not show that a conserved genetic program actually controls aging processes across animal phylogeny, it is striking how frequently findings of metabolic rate, insulin signaling, and free radicals have emerged from very different approaches to aging in nematodes and mammals, for example.These findings hint that the genetic control of life span was already developed in the common ancestor of modern animals so that subsequent evolution of life spans was mediated by quantitative changes in the control of metabolism through insulin and the production of free radicals.", + "FUTURE DIRECTIONS: HIGHER RESOLUTION DATA VIA HIGHER THROUGHPUT ASSAYSOne inescapable conclusion of the aggregate results of genome-wide studies of aging to date (see summary Table 1) is that we have not come close to saturating the number of potentially lifespan-altering genes in any organism.This is in no small part because directly generating survival curves is a relatively time-consuming process in most model organisms using current methods.There are several possible ways to address this.One way that has been tried is by attempting to find surrogate phenotypes [72,73,126] that can be screened more rapidly, or even scored under 
selection.Another is mining candidates from the many whole-genome expression profiles.Results to date with these have been very fruitful, but have not suggested that these methods alone will rapidly saturate our search for lifespan-and healthspan-altering genes in tractable model organisms.", + "Chromosome mapping of genes that were differentially expressed in mice of different ages and/or in response to CR revealed a wide distribution of genes with some physical clustering of responsive genes within the genome.The latter findings are consistent with the concept that aging is a complex process and that evolutionary adaptations to aging, if they exist, may or may not involve geographic clustering of functionally related genes.", + "Geneticlinkage studies of long-lived human families identified alongevity locus while candidate gene approaches have beenused to identify and confirm the association betweenspecific variants in the FOXO3A gene and humanlongevity [37]. Genome-wide association studies havealso been used to identify the association of APOE with life123Aging Clin Exp Resspan and have yielded insights into potential biologicalpathways and processes related to aging. 
Despite thesesuccesses, several problems are inherent in humanlongevity studies including potentially high degrees ofenvironmental heterogeneity, genetic diversity, and lack ofbirth matched controls, among others [8].", + "The aging process most certainly is under highly polygenic controls This should not discourage us from pursuing a search for those loci which may be of profound importance to human aging as it ordinarily occurs in most human beings.", + "In most experimentally modified animal model systems, single-gene mutations in many different genes have major life extension effects (Fontana et al., 2010;Kenyon, 2010).However, natural human and animal longevity is presumed to be a complex trait (Finch & Tanzi, 1997).In humans, both candidate gene and genome-wide genetic association approaches have been applied in an attempt to identify longevity loci.The frequency of genetic variants has been typically compared between nonagenarian cases and young controls, revealing loci at which genetic variants may contribute to a higher or lower probability of survival into old age.The initial candidate gene studies aimed at finding human longevity genes were dominated by contradictory results (Christensen et al., 2006).The more consistent evidence obtained by repeated observation in independent cohort studies for association with longevity has so far only been observed for three loci, the apolipoprotein E (APOE) locus (Schachter et al., 1994;Christensen et al., 2006), the FOXO3A locus (Willcox et al., 2008;Flachsbart et al., 2009;Pawlikowska et al., 2009;Soerensen et al., 2010), and the AKT1 locus (Pawlikowska et al., 2009).Thus, despite the expectation that longevity would be influenced by many genetic variants with small effect sizes, the effect of variants has consistently been shown in only three genes.", + "1993), andgene expression microarrays (Pletcher et al. 2002). 
Given the ambiguities and limitations of large-effect mutant studies of aging, discussed earlier, those publications do notprovide very useful evidence with respect to the question of the number of loci thataffect aging. At present, the best answer to the question of the number of genes controlling aging is many (Rose and Long 2002), in keeping with the original expectations ofevolutionary biologists. However, studies of the genetics of the experimental evolution of aging are nowamenable to the application of genomic methods.", + "Accepted Article 2013 The Authors Aging Cell 2013 Blackwell Publishing Ltd/Anatomical Society of Great Britain and Ireland over 90 years and 1,955 controls between 55 and 80 years did not reveal genome-wide significant loci (Newman et al., 2010) and neither did the analyses of all-cause mortality and survival free of major disease in this cohort (Walter et al., 2011).A smaller Dutch study of 403 nonagenarians and 1,670 controls younger than 65 years identified the APOE gene as a mortality locus (Deelen et al., 2011), which was confirmed in a German study of 763 long-lived individuals and 1,085 younger controls (Nebel et al., 2011) and a longitudinal study of 1,606 Danes showed that the effect size of this association increases at the highest ages (Jacobsen et al., 2010).Apparently, the influence of the common genetic variation on longevity is small which requires large meta-GWA studies for identification.Alternatively, rare genetic variants may play a more important role in longevity.Since the previous linkage studies showed contradictory results potentially due to heterogeneity in the longevity phenotype, it is expected that longevity is influenced by many private rare variants.", + "Ageing is complex and takes a long time to study -a lifetime in fact.This makes it difficult to discern its causes, among the countless possibilities based on an individual's genes, behaviour or environment.While thousands of regions in an individual's 
genetic makeup are known to influence their risk of different diseases, those that affect how long they will live have proved harder to disentangle.Timmers et al. sought to pinpoint such regions, and then use this information to predict, based on their DNA, whether someone had a better or worse chance of living longer than average.", + "Several explanations are possible for the lack of genomewide significant findings.First, mortality is arguably 1 of the most complex phenotypes, and several trajectories toward extreme old age have been identified (Evert et al., 2003).Multiple genes could mediate the aging process but would have their effects through numerous different patho-physiological processes and diseases that act as intermediate factors on the pathway to death (de Magalhaes et al., 2010).Therefore, any common variation in genes associated with aging probably has a small effect.Second, the largely negative findings of this and other studies contrast with the intriguing animal studies of longevity.Very large effects of single genes on lifespan have indeed been observed in laboratory animals, but humans often have several homologues of these genes which might significantly differ in function or compensate for mutated genes through redundant mechanisms (Kuningas et al., 2008).This could explain why our top findings did not include genes in these pathways found in animal models.Animal models also represent genetically homogenous populations and are exposed to controlled environmental influences.The lack of replication of animal model findings in humans suggests that the use of knockout animals may not provide the optimal approach to understanding the variation in survival in humans as interactions with environmental factors may obscure the associations and prevent the identification of loci in humans.The lack of success in the identification of genes related to aging in humans may be due to the complexity of the phenotype.One approach to investigate aging and 
longevity is to compare frequencies of genetic variants between nonagenarians or centenarians and the general population.This approach led to the discovery of an association between APOE (Deelen et al., 2011;Ewbank, 2007;Gerdes et al., 2000) and more recently FOXO3A (Anselmi et al., 2009;Flachsbart et al., 2009;Li et al., 2009a;Pawlikowska et al., 2009;Willcox et al., 2008) and human aging and longevity.However, a recent genome-wide association study (GWAS) of individuals reaching the age of 90 or older failed to identify genome-wide significant variants (Newman et al., 2010).", + "In addition to timing differences, a small proportion of genes (10%-15%) exhibit opposite trends of expression changes with age in humans and macaques (Supplemental Fig. S13).Interestingly, such differences are ;1.5 times more common in aging than in development, an observation consistent with the lower strength of purifying selection on the gene regulation at old age (discussed below).These differences could also reflect extreme shifts in developmental timing between species, as well as technical artifacts.Future studies, using additional species and alternative methodology, are needed to address this issue.", + "1993), andgene expression microarrays (Pletcher et al. 2002). Given the ambiguities and limitations of large-effect mutant studies of aging, discussed earlier, those publications do notprovide very useful evidence with respect to the question of the number of loci thataffect aging. At present, the best answer to the question of the number of genes controlling aging is many (Rose and Long 2002), in keeping with the original expectations ofevolutionary biologists. 
However, studies of the genetics of the experimental evolution of aging are nowamenable to the application of genomic methods.", + "The remarkable discoveries of the past 2 decades showing that single genes can regulate aging in model organisms demonstrate that aging can be genetically manipulated (Finch and Ruvkun, 2001;Kenyon, 2010).Hundreds of genes that modulate longevity have now been identified in model organisms (de Magalha es et al., 2009a).In some cases (e.g., in worms), mutations in single genes can extend lifespan by almost 10-fold (Ayyadevara et al., 2008).Nonetheless, aging is a complex process that derives not from single genes but from the interactions of multiple genes with each other and with the environment.Evidence from animal systems shows a major impact of the environment on aging, yet environmental manipulations of aging act through genes and proteins, usually by triggering signaling pathways and modulating gene expression.In fact, some genes have been shown in model organisms to have varying effects on lifespan depending on diet (Heikkinen et al., 2009).Genes that can regulate aging in model organisms cannot be directly applied to humans through genetic manipulations for numerous legal, ethical, and technical reasons.If we could understand how the environment modulates these aging-related genes, we might be able to create antiaging therapies applicable to humans, potentially through diet, lifestyle, and even pharmacological interventions.Therefore, understanding genome-environment interactions in the context of aging can be a powerful approach to identify attractive targets for drug design.", + "TRANSLATION OF LONGEVITY MODEL ORGANISMS AND CORE AGING PATHWAYSGenetic studies on lifespan have proven to be challenging.While longevity is a defining trait for a given species, the lifespan of individuals is of limited heritability, making analyses more difficult.Exceptional human life span, although a rare phenotype, is likely multifactorial; 
refined analyses are required to obtain statistically robust genomic signatures of longevity (Zhang et al., 2020) and these have proven elusive.Unlike laboratory models, the effect of environmental variance cannot be controlled in human studies, potentially masking purely biological aging mechanisms.Even laboratory models cannot replicate the complex \"environment\" of humans; it includes psychosocial, economic, and cultural factors, rather than strictly biological.These human-specific confounders are difficult or impossible to target in traditional model organisms.Despite these limitations, experimentally tractable model organisms have proven invaluable in deciphering the purely genetic contribution to lifespan, including genes and pathways conserved across the tree of life.ANALYSIS OF HUMAN VARIATION IN THE GENETIC CONTROL OF LONGEVITYHeritability studies have convincingly demonstrated that at least some fraction of human lifespan is heritable.In tandem, large-scale genome-wide association studies (GWAS) have identified numerous loci associated with age-related traits (Buniello et al., 2019).While genetic studies have functionally shown an inverse effect of multiple age-related, diseaseassociated variants on lifespan regulation, the number of well-replicated longevity-conferring variants remains limited to variants in APOE (ApoE 2), and more recently, CDKN2A/B and IL6 (see Table 1).To date, studies in humans have been hampered by the specific phenotype definitions used, sample sizes of the extreme phenotypes, and modest heritability of the longevity-related traits (Breitbach et al., 2019).This is due to the complex interplay of biological and social factors involved in human aging, as well as the limited power of GWAS, which require sampling thousands of subjects to achieve statistical significance (Breitbach et al., 2019).Genetic studies of aging have also been hindered by an inconsistent use of definitions of aging (reviewed in Baghdadi et al., 2020).The two 
main ways of conducting research on the genetics of longevity in human populations are by studying (i) the lifespan (continuous trait, years lived) and (ii) the longevity (dichotomous trait, i.e., being among the longest-lived individuals within a specific population).These complexities have limited the resolution and capability of broad association studies of human longevity.Importantly, these genomic analyses focus on a shift of survival in a population; these variables may be genetically distinct from the mechanisms establishing potential for longevity overall (Figure 1A).We argue that an understanding of this shift in lifespan as well as genetic mechanisms of regulating a species specific 'set points' (Figure 1B) will aid in the conceptual distinction of aging and longevity in humans.", + "With modern genomic technologies and largescale data analysis methods, it is possible to sift through the genes of populations to find the loci that act to postpone aging. [3]There are uncertainties with the comparison of populations with different rates of aging.However, it is superior to experimental designs that only consider age-dependence or dietary-response, without determining causal mechanisms.", + "Most of the human candidate gene studies were performed in cross-sectional designs (Box 1 and Fig. 1), comparing allele frequencies of potential longevity loci between highly aged individuals and young controls.The candidate gene studies based on single genes have pointed a role for genes involved in, e.g., GH/insulin/IGF-1 signaling, immune regulation, and lipoprotein metabolism (Supporting Information Table S1), although most of these results have not (yet) been confirmed in sufficient independent studies.The most convincing human longevity loci today are APOE and FOXO3A which have frequently been associated with longevity in cross-sectional studies (see for a review [26]) and survival in prospective studies [27][28][29] (Fig. 
3). APOE encodes the protein apolipoprotein E which seems to play a role in e.g., lipoprotein metabolism, cognitive function, and immune regulation [30]. FOXO3A encodes the protein forkhead box O3 which acts as a transcription factor for many different genes involved in processes like apoptosis and oxidative stress [31]. Conclusions and prospects. Over the past two decades the human aging field has built up the necessary resources to study the biology of aging and longevity by establishing human populations with a diversity of designs. Meta-analyses integrating genetic and phenotypic datasets have successfully identified variants associated with a range of age-related traits and diseases. Despite these accomplishments, the number of novel leads contributing to human lifespan regulation is limited. Although positive regions of linkage and suggestive GWAS hits have been reported, the field has not yet identified the loci that explain the clustering of longevity in families and the variation in biological aging rate in the population. As for animal models, down-signaling of the IIS and mTOR pathway appeared to be relevant in humans. These findings are being followed up by molecular and physiological profiling using skin, fat and muscle tissue of long-lived family members and controls. Human studies now also include the response of nutrient sensing systems to the application of dietary and physical challenges." + ], + [ + "Apoptosis, or controlled cell death [62], is another major stressed-cell response, and was also represented in our results (Fig.
9e).A large body of direct evidence points to apoptosis as one of the main routes of RPE degeneration in AMD [63].Induction of apoptosis upon stress is dictated by the action of master regulator p53, and it was recently shown that aging increases the activity of p53 in RPE cells and the likelihood for apoptotic cell death [64].Consistent with this evidence, we found association with pathways in Transcriptional regulation by TP53 group (Fig. 9d).In particular, Regulation of TP53 activity through methylation was among the top pathway in our association analysis (Table 1), suggesting that p53 modification by methylation and the closely related histone modifications [Protein lysine methyltransferases (PKMTs) methylate histone lysine in Fig. 9e] play important roles in RPE apoptosis regulation.In the intrinsic apoptotic pathway induced by oxidative stress, cytochrome c is released from mitochondria into the cytosol, binding and activating caspases, the main proteases central to apoptotic action.We found association in pathways involving 'inhibitor of apoptosis' (IAP) and its negative regulator 'second mitochondrial activator of caspases' (SMAC) [65], which suggests that disruption to regulatory mechanisms preventing apoptosis in RPE cells may play roles in AMD.", + "ApoptosisPersistent DNA damage", + "42ABSTRACT 18A MODULARIZED MODEL OF APOPTOSISHA Harrington, KHo, Sk Ghosh, KC Tung , CY Kao, and B AgudaImperial College London, Courant Institute of Mathematical Sciences New YorkUniversity, University of Texas at Arlington, University of Texas SouthwesternMedical Center, Mathematical Biosciences Institute, and Department ofMathematics, The Ohio State University Columbus, OH, USABackground: One of the key physiological mechanisms employed by the cell(during development and for maintenance of homeostasis) in multi-cellularorganism is apoptosis, which is characterized by a sequence of well-definedevents resulting in cell destruction.", + "14Apoptosis is caused by the 
activation of the caspase cascade, which is initiated by two signaling routes (stress-induced death and death-domain receptor-induced death) (Domen 2001). This process can be prevented by anti-apoptotic molecules, such as Bcl-2 (Domen and Weissman 2000). Direct evidence for the involvement of apoptosis in HSC number regulation came from the findings that overexpression of the anti-apoptotic gene bcl-2 led to increased numbers of Thy-1.1low, Sca-1+, c-kit+, Lin- cells, a population with long-term multi-lineage repopulation potential (Domen et al. 2000). Several lines of evidence have indicated that apoptosis acts as an important regulator of stem cells. First of all, expression of some apoptosis-related genes were detected in human and/or murine HSCs (Domen 2001). Secondly, targeted disruption of some of these genes in null and dominant-negative mutant mice interfered with normal apoptotic processes in HSCs. For example, overexpression of Bcl-2, a negative regulator of apoptosis, increased not only the numbers and competitive repopulation capabilities of HSCs, but also the resistance of HSCs to apoptosis induced by ionizing radiation (Domen and Weissman 2003).", + "Apoptosis. Cell suicide, or apoptosis, is a well-studied biological phenomenon in multicellular organisms that allows specific cells to be removed during the development of complex tissues, or potentially dangerous damaged cells to be destroyed for the benefit of the whole organism. The lack of an apparent evolutionary benefit for such a process in a single-celled organism initially caused controversy about the presence of an apoptotic pathway in yeast. Today, however, a number of yeast orthologues to mammalian apoptosis genes have been discovered and apoptotic-like cell death has been linked to mating, colony formation, and aging (Buttner et al. 2006; Eisenberg et al. 2007; Frohlich et al.
2007). With respect to aging, both replicatively and chronologically aged cells that die have increased ROS and display apoptotic phenotypes (Fabrizio et al. 2004a; Herker et al. 2004; Laun et al. 2001). The importance of apoptosis in yeast aging has yet to be fully characterized. At the very least, yeast apoptosis provides a useful pathway for studying genetic interactions for age-related diseases that affect humans, such as cancer. Readers interested in further information related to yeast apoptosis are referred to several in-depth reviews (Buttner et al. 2006; Eisenberg et al. 2007; Frohlich et al. 2007).", + "Early redistribution of plasma membrane phosphatidylserine is a general feature of apoptosis regardless of the initiating stimulus: inhibition by overexpression of Bcl-2 and Abl. J Exp Med 182: 1545-56. Mathew CG (2006). Fanconi anaemia genes and susceptibility to cancer. Oncogene 25:5875-84. McBride MW, Carr FJ, Graham D, Anderson NH, Clark JS, Lee WK et al (2003). Microarray analysis of rat chromosome 2 congenic strains. Hypertension 41: 847-53.
Merino-Trigo A, Kerr MC, Houghton F, Lindberg A, Mitchell C, Teasdale RD et al(2004).", + "When a cell harbors such severe DNA damage that it is beyond repair, it is disposed of through apoptosis.Alternatively, DNA damage can induce cellular senescence, the irreversible cessation of mitosis.Both processes are critically dependent on p53, which is known as the guardian of the genome [3] .DNA damage may also trigger autophagy, a cellular catabolic process that maintains homeostasis [4] .It should be noted that under normal conditions cells are rarely exposed to very high doses of DNAdamaging agents, which may be the explanation why we do not age and die because we run out of cells.However, aging is associated with some atrophy [1] and it is conceivable that at older ages bursts of DNA damage, for example from free radical reactions associated with inflammation, do occur and give rise to an increasingly high rate of apoptosis or cellular senescence.While there is some evidence for increased apoptosis and cellular senescence at old age, it is doubtful that under normal conditions this would lead to a significant loss of functional cells.", + "Apoptosis, or programmed cell death, literally eliminates cells at risk for neoplastic transformation.Senescence, by contrast, permanently arrests their growth.Both processes are controlled by the p53 tumor suppressor protein (Amundson, Myers, & Fornace, 1998;Bringold & Serrano, 2000;Hickman, Moroni, & Helin, 2002;Itahana, Dimri, & Campisi, 2001).p53 is a transcriptional regulator that both transactivates and transrepresses target genes in response to stress (Prives & Hall, 1999;Ryan, Phillips, & Voudsen, 2001).These target genes, in turn, stimulate DNA repair, transient cell cycle arrest, permanent cell cycle arrest (senescence) or cell death (apoptosis), depending on cell type, degree and type of damage, and other variables.In contrast, cells that lack normal p53 regulation or function -for example, tumor cells -tend to die in 
response to telomere dysfunction. Some normal human cells, on the other hand, undergo a senescence growth arrest. In either case, when present, p53 is crucial for mediating the cellular response to telomere dysfunction (Yaswen & Stampfer, 2002) (Fig. 4).", + "Cell death, and in particular apoptosis, can be caused by a number of mechanisms including loss of growth factors and excitotoxicity (e.g., Bhutta and Anand, 2002; Nikolic et al., 2013). It is of interest therefore, that proximal to the region of the QTL there are several genes that are related to growth factors including the latent transforming growth factor protein 2 (ltbp2), placental growth factor (pgf), and transforming growth factor beta (Tgf beta).", + "Apoptosis-related gene expression profiles", + "Apoptosis. Programmed death of cells during embryogenesis and metamorphosis or during cell turnover in adult tissues.", + "14Apoptosis is caused by the activation of the caspase cascade, which is initiated by two signaling routes (stress-induced death and death-domain receptor-induced death) (Domen 2001). This process can be prevented by anti-apoptotic molecules, such as Bcl-2 (Domen and Weissman 2000). Direct evidence for the involvement of apoptosis in HSC number regulation came from the findings that overexpression of the anti-apoptotic gene bcl-2 led to increased numbers of Thy-1.1low, Sca-1+, c-kit+, Lin- cells, a population with long-term multi-lineage repopulation potential (Domen et al. 2000). Several lines of evidence have indicated that apoptosis acts as an important regulator of stem cells. First of all, expression of some apoptosis-related genes were detected in human and/or murine HSCs (Domen 2001). Secondly, targeted disruption of some of these genes in null and dominant-negative mutant mice interfered with normal apoptotic processes in HSCs.
For example, overexpression of Bcl-2, a negative regulator of apoptosis, increased not only the numbers and competitive repopulation capabilities of HSCs, but also the resistance of HSCs to apoptosis induced by ionizing radiation (Domen and Weissman 2003).", + "Fraction of cells displaying apoptosis", + "It has been known that mitochondria play a central role in the life and death of cells (Kroemer & Reed, 2000). Apoptosis was observed in developmentally arrested embryos by 72 h, but not at 24 h after FCCP treatment, despite considerable telomere attrition at this early stage, suggesting that telomere attrition occurs prior to apoptosis and may serve as an intermediate step between mitochondrial dysfunction and apoptosis. These results also suggest that telomere shortening may signal apoptosis (Lee et al., 1998; Karlseder et al., 1999).", + "Cell Death. A form of programmed cell death, apoptosis is necessary for normal cell turnover and is essential to a plethora of other biological processes. Apoptosis can be executed via Bcl-2 activation of caspases, via signals from the death receptor on the plasma membrane, or via induction by granzyme B secreted from cytotoxic T cells (Tc cells) [35]. Endonucleases and proteases are activated by active caspases, eventually leading to the death of the cell. With age, however, apoptotic activity changes. In heart [36], kidney [37], skeletal muscle [38], and Tc cells [39], increased apoptosis has been reported, perhaps contributing to loss of cellularity in these tissues. This escalation across various tissues may be attributed to the increased production of free radicals [40] and furthermore exacerbated by the accumulation of DNA damage in the aged cells [41]. As the risk increases for cells to turn cancerous and dysfunctional with advancing age, increased apoptosis in aged cells is argued to be a defense strategy. In other tissues, such as the colon, apoptosis appears to decrease with age perhaps contributing to the accumulation of senescent
cells and age-associated carcinogenesis [42].", + "The regulation and execution of apoptosis in endothelial cells is a complex process involving paracrine factors, membrane receptors, interaction of pro- and anti-apoptotic factors and cysteinyl aspartate-specific proteases (caspases). Recent studies suggest that in aging there is an imbalance in the expression of pro- and anti-apoptotic genes resulting in an enhanced apoptosis in the myocardium (19), central nervous system (24), skeletal muscle (10), lung (33), and liver (2,33). Yet, age-related alterations in the expression of pro- and anti-apoptotic genes in coronary arteries have not been elucidated.", + "Apoptosis modulating genes. Apoptosis or programmed cell death is associated with alterations in cell morphology, particularly the nucleus, with endonucleolytic cleavage of DNA into nucleosomal length fragments. Apoptosis may result from withdrawal of growth signals." + ], + [ + "Indicative biological pathways associated with the candidate aging genes. Fig.
2 Significant biological processes associated with the candidate aging genesFollowing are examples of the identified genes and experimental or GWAS link between these genes and aging.On the list of the 25 top genes, NAP1L4 encodes a member of the nucleosome assembly protein (NAP) family, which interacts with both core and linker histones, and shuttles between the cytoplasm and nucleus, suggesting a role as histone chaperone.Histone protein levels decline during aging, and dramatically affect chromatin structure.Remarkably, the lifespan can be extended by manipulations that reverse the age-dependent changes to chromatin structure, indicating the pivotal role of chromatin structure in aging [32].In another example, gene expression of NAP1L4 increases with age in the skin tissue [33].Findings of GWAS link a number of the identified genes to age-related disorders, such as GAB2 and late onset Alzheimer's disease [86], and QKI and coronary heart disease/myocardial infarction [79].Interestingly, GWAS reports also link QKI to successful aging [87].", + "Examples of biological candidate genes with pleiotropic functions, which are involved in aging in general and in musculoskeletal aging in particular, are numerous: (a) in addition to the IGF-1 and vitamin D genes, estrogen metabolism pathway genes, including estrogen receptors and aromatase (CYP19), are associated with fat-free mass (Walsh et al. 2005) and BMD (Shearman et al. 2004), prostate and breast cancer (Gallicchio et al. 2006), and cardiovascular disease risk (Shearman et al. 
2003).", + "In-depth analysis of the age-regulated genes revealed that multiple genes in the DNA damage response pathway were upregulated with age including those that function in non-homologous end-joining repair (mre11, rad50, Ku80 and mus308) and in translesion DNA synthesis (mus205 and DNApol-eta) [44][45][46].Genes that encoded enzymes with antioxidant properties, such as the thioredoxin reductase Trxr-1, and antioxidant genes involved in glutamate metabolism, such as GlnRS, isoQC and QC, were also upregulated with age [47][48][49][50].We also observed increased age-associated expression of chaperone genes (Cct1, Cct4, Cct5, Cct6, Hsc70-4) and the unfolded protein response transcription factor Xbp1, consistent with an induction of the unfolded protein response [51][52][53].Under stress conditions, there is a translational switch that favors production of stressrelated proteins while decreasing translation of other proteins [54].Paralogs of canonical translation factors such as NAT1 and Rack1, which were both upregulated, promote this switch to cap-independent translation [55,56].Notably, Rheb, which is downregulated with age, positively regulates ribosome production and capdependent translation by activating the mechanistic target of rapamycin (mTOR) kinase pathway [57].Thus, decreased Rheb levels during aging could decrease mTOR pathway activity, which extends lifespan and is protective against age-related pathology [58].Together, these data suggest that multiple genes are induced in aging photoreceptors to mitigate the effects of oxidative stress, protein misfolding and DNA damage.", + "CellAge vs human orthologues of longevity-associated model organism genesTo understand how senescence is linked to the genetics of aging processes, we looked at the intersection of CellAge genes and the 869 genes in the human orthologues of model organisms' longevity-associated genes (LAGs) dataset, collected based on quantitative changes in lifespan [34].Like CellAge, where 
genes are classified based on whether their upregulation induces, inhibits, or has an unknown impact on CS, the longevity orthologues dataset also provides information on the effect of upregulation of its genes, namely whether it promotes (pro, 421) or inhibits (anti, 448) longevity (Additional file 1: Table S7; Additional file 2: Fig. S2).Using network biology, we implicated the CellAge genes in various processes, particularly cell division and immune system processes.We used network topology to identify potential regulators of CS and bottlenecks that could impact various downstream processes if deregulated.Indeed, we identified 11 genes that have already been shown to contribute towards CS, which will be added to future versions of CellAge.Finally, we experimentally verified 26 genes that induce CS morphology or biomarkers when knocked down in human mammary fibroblasts.Of these, 13 genes (C9orf40, CDC25A, CDCA4, CKAP2, GTF3C4, HAUS4, IMMT, MCM7, MTHFD2, MYBL2, NEK2, NIPA2, and TCEB3) were strong hits in inducing a senescent phenotype.Results: We develop CellAge (http://genomics.senescence.info/cells),a manually curated database of 279 human genes driving cellular senescence, and perform various integrative analyses.Genes inducing cellular senescence tend to be overexpressed with age in human tissues and are significantly overrepresented in anti-longevity and tumor-suppressor genes, while genes inhibiting cellular senescence overlap with pro-longevity and oncogenes.Furthermore, cellular senescence genes are strongly conserved in mammals but not in invertebrates.We also build cellular senescence protein-protein interaction and co-expression networks.Clusters in the networks are enriched for cell cycle and immunological processes.Network topological parameters also reveal novel potential cellular senescence regulators.Using siRNAs, we observe that all 26 candidates tested induce at least one marker of senescence with 13 genes (C9orf40, CDC25A, CDCA4, CKAP2, GTF3C4, 
HAUS4, IMMT, MCM7, MTHFD2, MYBL2, NEK2, NIPA2, and TCEB3) decreasing cell number, activating p16/p21, and undergoing morphological changes that resemble cellular senescence.Conclusions: Overall, our work provides a benchmark resource for researchers to study cellular senescence, and our systems biology analyses reveal new insights and gene regulators of cellular senescence.", + "Genomics-a fundamental basis for understanding skin agingIn the last decade, genomic tools such as gene chips have been widely developed.This accomplishment has provided us with deeper insights into the molecular events underlying skin aging. 137Gene expression profiling has led to identification of pathways affected by aging, and this information has led to the development of new strategies to enable better skin repair and antiaging benefits. 138ene expression patterns were examined in sun-protected (buttocks) and sun-exposed skin (extensor forearm) from 10 young (age 19 to 20 years) and 10 older women (age 63 to 67 years) to examine gene expression profiles associated with chronologic skin aging and photoaging.Chronologic and photoaging were both associated with downregulation of the biologic process of lipid synthesis.In particular, genes involved in cholesterol and fatty acid synthesis were downregulated, as were genes associated with epidermal differentiation, including keratin filaments and cornified envelope components.An upregulation of the biologic processes of inflammatory response and wound healing, the molecular functions of cytokine activity and protease activity and the cellular component theme of extracellular matrix was also observed in both skin aging types.Elastin gene expression was upregulated with aging only in the photodamaged arm and remained unchanged in the sunprotected buttock.This finding corresponds to the histopathologic findings that show typical elastotic changes, the \"solar elastosis,\" in photoaged skin. 
139urther studies conducted to investigate changes in gene expression during skin aging have been performed on naturally aged human foreskin obtained from children and elderly men.Some of the mechanisms proposed to be involved in the induction of aging comprise disturbed lipid metabolism, altered insulin and STAT3 signalling, upregulation of apoptotic genes partly due to the deregulation of FOXO1, downregulation of members of the jun and fos family, differential expression of cytoskeletal proteins (eg, keratin 2A, 6A, and 16A), extracellular matrix components (eg, PI3, S100A2, A7, A9, SPRR2B), and proteins involved in cell-cycle control (eg, CDKs, GOS2). 140Similar results have been presented by a study related to aging of skeletal muscle. 141n a previous study, we proposed that one of the factors significantly involved in the initiation of aging might be the physiologic decline of hormones occurring with age.Human SZ95 sebocytes in vitro treated with hormone levels that can be found in 60 year-old women produce less lipids than sebocytes treated with a hormone mixture representing that found in the serum of 20 year-old women. 6A differential gene expression between SZ95 sebocytes under the 20 and 60 year-old hormone mixture detected differentially expressed genes that are involved in biologic processes such as DNA repair and stability, mitochondrial function, oxidative stress, cell cycle and apoptosis, ubiquitin-induced proteolysis, and transcriptional regulation. 
139,140A comparison of these results with data obtained from the aged kidney 142 identified key genes that may be of great importance for global aging.The most significantly altered signalling pathway was that of TGF-.A disturbed function of this cascade has been also c-Fos, which heterodimerize to form the activator protein 1 (AP-1) complex.AP-1 is a key regulator of skin aging, because it induces the expression of the MMP family and inhibits type I procollagen gene expression through interference with TGF- signalling pathway.It has been postulated that MAP kinases may be activated by excess production of reactive oxygen species (ROS) that occurs with advanced age and may be superimposed by extrinsic factors such as ultraviolet irradiation.Excess ROS production also leads to accumulation of cellular damage, which includes oxidation of DNA resulting in mutations, oxidation of proteins leading to reduced function, and oxidation of membrane lipids resulting in reduced transport efficiency and altered transmembrane signalling.IL, interleukin; NF-B, nuclear factor-B; TGF-, transforming growth factor-; TSP-1, thrombospondin-1; TSP-2, thrombospondin-2; VEGF, vascular endothelial growth factor.associated with tumorigenesis, such as in pancreatic, prostate, intestine, breast, and uterine cancer.", + "Analysis of prior research (Online Resource 5) shows that the revealed genes can be explicitly involved in other key biological processes in an organism whose role is known to be changing with aging.Specifically, ten genes (BAZ2B, HMGB4, NOC2L, RAI1, SIK1, SMARCA2, SPZ1, TBP, TRIP13, and ZKSCAN1) regulate transcription which is believed to be disrupted when an organism is getting older (Roy et al. 
2002).The DBH, TPO, and LSS genes are involved in synthesis of catecholamine, thyroid, and vitamin D hormones, respectively.The GPER binds estrogen and HCRTR2 binds orexin-A and orexin-B neuropeptid hormones.Hormonal deregulation with aging is considered to be one of the major components of senescent processes in an organism (Barzilai and Gabriely 2010).Five genes (ATG2A, NEDD4L, PSMB1, UBXN4, and USP6) are involved in degradation of proteins through ubiquitin-proteasome and the lysosomal/autophagic system.Dysfunction of this system leads to accumulation of damaged proteins in an organism that is associated with aging (Koga et al. 2011).Protein degradation through ubiquitin-mediated proteolysis plays an important role in cell-cycle regulation (Reed 2003).The PSMB1, SIK1, TRIP13, and TTN genes in the revealed set coordinate cell cycle.Cell cycle is linked with the aging-related processes in humans through a gradual increase in cell division errors in all tissues in an organism (Ly et al. 2000).Five genes (EEF1A2, DBH, ITGB2, TUBB2C, and WRN) take part in regulation of apoptosis which plays an important role in the aging process and tumorigenesis (Salvioli et al. 2008).Seven genes (ABCA7, AZGP1, CD36, DEGS2, LSS, PI4KA, and SOAT2) are involved in lipid metabolism which plays one of the key roles in human longevity and healthy aging (Barzilai et al. 2003).", + "Genes that are age-regulated in all tissues would reveal genes involved in core mechanisms that underlie cellular ageing.Zahn et al. 
[63] discovered genetic pathways that show common age regulation in human kidney, brain and muscle.They used microarrays to analyse expression in 81 skeletal muscle samples from patients aged 16 -86 years and found 250 age-regulated muscle genes [63].Similar to the ageing expression profile for the kidney, the overall expression behaviour of this set of age-regulated muscle genes correlated with the physiological as well as chronological age of the muscle sample.Next, they compared their muscle-ageing results to previously published data on kidney and brain ageing of similarly large sample size [56,60].Although most of the age-related changes were tissue specific, they found evidence for common age regulation of six genetic pathways in all three tissues.Specifically, there is an overall increase in expression of the extracellular matrix genes, the ribosomal genes, the cell growth genes and the complement activation genes in all three tissues.Increased overall expression of the extracellular matrix and complement activation gene sets with advancing age may contribute to widespread fibrosis and inflammation in the elderly.There is an overall decrease in expression of the chloride transport genes and the electron transport genes in all three tissues.Decreased overall expression of electron transport chain genes with age might support the mitochondrial free-radical theory of ageing [67], as free-radical generation by mitochondria would preferentially damage the electron transport chain protein complexes.Decreased expression of the electron transport genes (encoded in the nucleus) might be caused by feedback regulation from damage to the electron transport chain protein complexes [63].However, it is also possible that increased oxidative damage occurs as a consequence of the decreased expression of the electron transport chain genes.In addition, an increasing number of studies in model organisms have critically challenged the mitochondrial free-radical theory of ageing 
[68].", + "DiscussionAging studies from model organisms such as yeast, worms, and flies have repeatedly shown that changes in the expression of certain genes have an effect upon longevity.Although similar aging processes are likely to operate across multiple species [30], it has been much more difficult to identify longevity candidate genes in human studies [30].A key question in human aging is to what extent a signature of aging may be detectable across tissues.Until now there has been a lack of large transcriptional profiles from the same human individuals in multiple tissues.The MuTHER study provides insight into the human aging process by interrogating the largest multiple human tissue gene expression resource to identify genes in which expression was affected by chronological age.The analysis of the skin and adipose tissues samples identified several hundred genes responsive to changes in chronological age.However, the 43 shared genes in skin and adipose tissue showed a single common identifiable pathway related to the stress response.From over 1,800 transcripts that have altered expression with age in skin and adipose tissues, 14 also had age-related differential expression in brain.The limited overlap in these two experiments may partly reflect the smaller sample size of the brain expression dataset, the differences in age range between the studies (16 to 83 years for brain samples; 39 to 85 years for MUTHER samples), or the inclusion of males in the brain samples.But it may also imply, as other studies have suggested, that the effects of age on gene transcription are tissue specific [6,31,32].This hypothesis was supported by the comparison with known related aging genes from the GenAge database, which identified an overlap for a small number of aging-related genes with our data.The GenAge database was the result of a meta-analysis using age-related expression profiles from human brain, kidney, and skeletal muscle, and several expression profiles from mouse 
and rat; no adipose tissue or skin samples were included (Additional file, Table 1 in [7]).The limited overlap between these datasets supports the idea that molecular signatures of aging reflect predominantly a tissue-specific transcriptional response.The lack of age-related genes in transformed LCLs, suggest that the transformation to immortalize a cell line may mask or even remove the age-related signatures in gene expression.The transformation of primary B lymphocytes into LCLs requires infection by the Epstein-Barr virus which has the effect of disrupting the p53 signaling pathway in order to induce growth and survival [33].Joehanes et al. [15] identified only five genes with age-associated expression in LCLs, including p53 itself (TP53).Although the authors attribute the lack of age-affected genes to their small sample size (n=50) and narrow age range, our analysis with a much larger sample size found even fewer age-related changes, suggesting a lack of detectable aging signature in LCLs.The analysis in the subset of fresh lymphocytes suggested an age influence in fresh lymphocytes may potentially be detectable with a larger sample size.", + "Genes Whose Expression Decreased with Age.Of the 26 genes that decreased expression with age in control mice, 23% are involved in DNA replication and the cell cycle (Table 2).Most of these have a negative effect on cell growth and division.Among these, the product of phosphatase and tensin homolog (Pten) gene is a tumor suppressor that induces cell-cycle arrest through inhibition of the phosphoinositide 3-kinase pathway (28).B cell translocation gene 2 (Btg2) is a tumor suppressor that increases expression in response to DNA damage (29).The murine gene product of the amino-terminal enhancer of split (Aes) is a potent corepressor of gene expression and cellular proliferation (30).Calcium-binding protein A11 (S100a10) binds to and regulates the activity of annexin II, which is involved in the transduction of calcium-related 
mitogenic signals (31).Insulin-like growth factor (IGF) binding protein 1 (Igfbp1) plays an important role in the negative regulation of the IGF-1 system, a stimulator of mitogenesis (32).", + "daf-16 dependent genesAmong the 52 genes that we have tested, 29 genes act almost completely in a daf-16 dependent manner, to regulate lifespan (Table 2).One of the genes identified was daf-2 (Y55D5A_391.b).This serves as a proof of principle that our screen is effective in identification of aging genes.", + "Several of the genes we identify have previously been shown to influence lifespan in experiments on model organisms.For example, knockouts of the orthologs of APOE, LDLR, CDKN2B, and RBM38 in mice shortens their lifespan [24][25][26][27] , while knockout of IGF1R has the opposite effect 28 .Similarly, overexpression of the FOXO3 orthologue in Drosophila melanogaster 29 and the SNCA orthologue in Caenorhabditis elegans 30 have shown to extend their respective lifespans.Many of our genes are also enriched for pathways previously related to ageing in eukaryotic model organisms, including genomic stability, cellular senescence, and nutrient sensing 31 .For example, FOXO3 and IGF1R are well-known players modulating survival in response to dietary restriction 32 , but we also highlight genes involved in the response to DNA damage and apoptosis, such as CDKN2B, USP28, E2F2, and BCL3.In addition to hallmarks discovered in model organisms, our results suggest that haem metabolism may play a role in human ageing.This pathway includes genes involved in processing haem and differentiation of erythroblasts 33 .Although the enrichment is largely driven by genes linked to the LDLR locus, genes linked to other loci of interest (such as FOXO3, CDKN2B, LINC02513) are involved in similar biological pathways: myeloid differentiation, erythrocyte homeostasis, and chemical homeostasis.", + "Hundreds of genes in several pathways act as regulators of ageing (1,32).However, analysis of DrugAge 
and other HAGR databases has revealed that the overlap between the targets of lifespan-extending drugs and known ageing-related genes is modest (31). This indicates that most ageing-related pathways have yet to be targeted pharmacologically; DrugAge may aid in guiding further assays. This was recently demonstrated in one study where machine learning was used to predict whether a compound would increase lifespan in worms using data from DrugAge. The best model had 80% prediction accuracy and the top hit compounds could broadly be divided into compounds affecting mitochondria, inflammation, cancer, and gonadotropin-releasing hormone (33).", + "Top 25 genes co-expressed with aging related genes. Aging-related gene prediction and putative transcriptional mechanisms. GeneFriends was used to identify genes related to aging. A seed list of genes known to be consistently overexpressed with age in mammals was used [18]. In total, 1119 genes were co-expressed with the aging seed list at p < 10^-6; Table 1 shows the top 25 genes. Many of these genes have been associated with age-related diseases. Several other genes that have been shown to play a role in aging such as lysosomal-associated membrane protein-2 Lamp2 [19] (p = 5.68 -30 ), Fas [20] (p = 2.70 -31 ) and growth hormone receptor Ghr [21] (p = 1.34 -19 ) also showed a significant co-expression. Anxa2, Anxa3 and Anxa4 also show a low p-value (p < 10^-25) as well as several S100 calcium binding proteins which have been shown to interact with annexins [22].", + "Genetic studies have shown that aging can be slowed in mutants that are defective in a wide range of cellular processes (such as mitochondrial function, chromatin regulation, insulin signaling, transcriptional regulation, and genome stability). This indicates that aging is a complex process driven by diverse molecular pathways and biochemical events. As such, a powerful approach to study aging is to use systems biology, which allows a multitude of factors affecting aging to be 
analyzed in parallel. For example, DNA microarrays and gene expression chips have been used to perform a genome-wide analysis of changes in gene expression in old age. Extensive studies in Caenorhabditis elegans and Drosophila melanogaster have identified hundreds of age-regulated genes (Hill et al. 2000; Zou et al. 2000; Lund et al. 2002; Pletcher et al. 2002; Murphy et al. 2003). Several studies have described age-regulated genes in the muscle and brain of mice (Lee et al. 1999, 2000) and the retina and muscle of humans (Yoshida et al. 2002; Welle et al. 2003, 2004). These age-regulated genes may serve as markers of aging, enabling one to assess physiological age independently of chronological age. Analysis of the functions of these age-regulated genes has identified specific biochemical mechanisms that change toward the end of life.", + "Age-Regulated Genes Involved in Reproductive Capacity. Decline in reproductive capacity is an age-related phenotype, and the reproductive system seems to play an important role in longevity (22). For example, signals from germ cells can affect lifespan in C. elegans (23). In our study, we observed decreased RNA levels for several genes involved in reproduction (Fig. 3). These include two genes that encode members of the Acp family. The Acp from male flies stimulates female egg-laying and facilitates storage of sperm in the female genital tract (24). [Table note: In Northern analysis, the ratios were calculated by dividing mRNA levels at 25-, 40-, and 50-day time points by those at 3-day time points after normalization with mRNA levels of the control gene rp49. Ratios in microarray analysis are provided from each of the duplicate experiments for comparison.] In addition, two ESTs showing age-regulated decrease of transcript levels represent different genes with homology to Arabidopsis MALE STERILITY 2 (MS2; ref. 
25), a gene involved in gametogenesis.Furthermore, an EST with homology to peanut, a member of the septin family (26), is down-regulated in older flies.This downregulation may reflect a decrease in spermatogenesis." + ], + [ + "There are multiple definitions of the aging process.Aging may be perceived as the random, systemic loss of molecular fidelity that, after reproductive maturity, accumulates to levels that eventually exceed tissue repair, turnover, or maintenance capacity (Hayflick 2004).The underlying molecular mechanisms of aging remain a subject of debates (de Magalhaes et al. 2009): tissue deterioration might not be programmed, being just a function of increase in entropy (Hayflick 2004).No genes are necessary to drive a stochastic process; however, there are genes that act to prevent an organism from destruction and disorganization.It may be due to the absence of specific disease-causing alleles or due to the presence of favorable alleles (Halaschek-Wiener et al. 2009).These genes may inhibit entropy, regulate inflammation, maintain DNA repair (such as telomere maintenance factors), or provide antioxidant functions (e.g., antagonists of reactive oxygen species).As healthy cells adapt to degeneration, differential expression of genes with age may indicate a transcriptional response to aging rather than a deleterious mechanism of aging per se (de Magalhaes et al. 2009).It might be postulated that there exist alleles that confer a pleiotropic effect on structure and function during aging (Lunetta et al. 2007).These alleles should regulate the ability of an organism to withstand challenging endogenous and exogenous influences.", + "Why does ageing evolve? 
The intrinsic decline in function that occurs during ageing appears to be caused by the accumulation of damage, particularly at the molecular level.As far as we know, no genes have evolved specifically because they cause damage to accumulate, and the evolution of ageing can therefore be understood only as a side-effect of other causes of evolutionary change.The mechanisms by which ageing can evolve were first elucidated by J.B.S. Haldane [14], P.B. Medawar [15] and G.C. Williams [16].Extrinsic hazards from disease, predation and accidents mean that even potentially immortal organisms will die.Genetic effects that become apparent only later in life encounter a reduced force of natural selection, because not all their bearers will survive to express them.Haldane pointed out that late-onset genetic diseases in humans, such as Huntington's disease, encounter only weak selection, because most reproduction is complete by the age of onset [14].Ageing could therefore result from the accumulation under mutation pressure of age-specific, deleterious mutations.In addition, if some mutations have pleiotropic effects, with beneficial effects in youth, such as high fecundity, but also with a higher subsequent rate of ageing, then they could be incorporated into the population by natural selection, which will act more strongly on the early, beneficial effect.Thus, variation in the rate of ageing would result from the readjustment of a tradeoff between youthful benefits and the subsequent rate of ageing.Both processes imply that faster ageing will evolve where the extrinsic hazard to adults is greatest, a hypothesis in general supported by the data [1,2,17].", + "A. 
Theories. In looking back at the development of aging studies, we can see that it did not follow a straight or logical course. On the contrary, it can be compared with the flow of several convergent streams winding in their course. To date, numerous proposals have been made for the paradigm of aging. These include Hayflick's contributions (153) on programmed cellular incapacitation derived from fibroblast studies, a decrease in immunologic response, deleterious endocrinological changes, nuclear somatic gene mutation, mitochondrial somatic gene mutation, oxygen free radical damage to proteins and nucleic acids, molecular instabilities, molecular cross-linking, glycation reactions, and so on. There is little doubt that many of these factors contribute to the overall aging, but what are primary causes, and what are secondary outcomes?", + "Ageing Is Adjusted by Genetic, Environmental, and Stochastic Processes. Enough evidence suggests that ageing is the result of different events such as molecular damage, mutations, incomplete repair, genetic programs, and continued development, among others [16]. These events, in turn, are caused by genetic factors, environmental conditions, and even stochastic factors, which are mentioned below in this chapter. Different stochastic theories of ageing focus on specific mechanisms that may lead to ageing. The catastrophic error theory poses that the accumulation of errors in protein synthesis causes damage in cell function. The theory of cross-linking holds this process between proteins and other macromolecules responsible for ageing, while the theory of free radicals suggests that ageing is the result of inadequate protection against cell and tissue damage by free radicals and oxidative stress throughout life. Finally, the wear-and-tear theory poses that the cumulative damage that eventually leads to ageing and death is, in fact, the result of the continuous functioning of vital processes, during which stochastic errors gradually 
arise.IntroductionAging is a natural and irreversible process characterized by a progressive decay in physiological, biochemical, and structural functions of individuals.Aging is a multifactorial process that can be affected by two main factors: environmental and genetic.Environmental factors are nutrition, pathologies, pollution exposure, physical activity, and microbiota, while genetic factors are issues that have been associated with antioxidant and DNA damage responses, the fidelity of genetic information transfer, the efficiency of protein degradation, the extent of cellular responsiveness to stress, the mechanisms of epigenetic regulation, and the ability to elongate telomeres.All of them can determine how fast we age.Traditionally, aging studies had used several model organisms, from yeast to mammals, especially rodents (rats and mice).Most of the studies are made under controlled conditions, where only a few variables are observed, and the subjects are members of the same strain with the same genetic backgrounds or the same mutations.The information that so far has been obtained about aging has helped us to describe different factors that influence this process and that are the fundamental concepts of the various theories of aging.However, these theories do not fully explain the aging process in the different models of aging study.This is the case of the study of aging in humans, where it is very difficult to control the environmental and genetic variables.That is why issues haven't been solved such as the following: How does time influence aging?When do we start to age?How do we know we are old?Is it possible to delay aging?Those and more questions are the cornerstones for aging studies.Biological aging has been associated with the decrease in the repair and regeneration capacity of tissues and organs; it is a time-dependent process.This reduction can be observed by an increase in the acquisition of diseases and functional and reproductive disability, 
which eventually lead to death.On the other hand, it has been observed that in humans, people with the same chronological age exhibit different trajectories in the decrease of physiological functions associated with biological aging and what complicates the understanding of the molecular and physiological phenomena that drive the complex and multifactorial processes that underlie biological aging in humans.", + "The underlying cause of aging remains one of the central mysteries of biology.Recent studies in several different systems suggest that not only may the rate of aging be modified by environmental and genetic factors, but also that the aging clock can be reversed, restoring characteristics of youthfulness to aged cells and tissues.This Review focuses on the emerging biology of rejuvenation through the lens of epigenetic reprogramming.By defining youthfulness and senescence as epigenetic states, a framework for asking new questions about the aging process emerges.", + "Aging does not happen in a vacuum.Aging must be the result of changes that occur in molecules that have existed at one time with no age changes.It is the state of these pre-existing molecules that governs longevity determination.The pre-existing state is, as I have already described, maintained by repair and turnover systems that themselves eventually succumb to irreparable age changes.Longevity determination is the state of all molecules prior to succumbing to irreparable loss of molecular structure.Biological aging is more than simply the occurrence of random changes in molecules.It also includes the role of the many repair systems found within cells.Thus, a more complete, but less concise, explanation of the first causes of aging in biological systems is the following:", + "Understanding the deleterious processes that cause aging has been a human endeavor ever since we figured out that we grew old and that we didn't like it.Many hypotheses have been proposed to explain the root cause of aging 
(1).One broad-based hypothesis is that generalized homeostatic failure leads to age-related decline.Although notions of time-and use-related deterioration may be applicable to mechanical objects, they fall short as analogies to biological systems because energy input should theoretically maintain living systems indefinitely.Yet, despite the regenerative potential of biological organisms, progressive deterioration accompanies postmaturational aging.That the organism's repair capabilities cannot keep up with wear and tear is, according to evolutionary theory, explained by the inevitable declining force of natural selection with age.According to this reasoning, there is no selective advantage to maintaining somatic cells in perfect order much beyond reproductive maturation (1).Hence, a long life depends on the timing of maturation and the quality of somatic cell maintenance.Wear and tear on the DNA often has been touted as a possible basis for our progressive age-related decline.Supporting this notion is the work of de Boer et al. 
(2) reported on page 1276 of this week's issue.They reveal important evidence for imperfect genome maintenance of DNA damage as a possible causal factor in aging.Harman, with his \"free radical theory of aging\" (3), was the first to propose that metabolic by-products called reactive oxygen species (ROS) continually damage cellular macromolecules, including DNA.Incomplete repair of such damage would lead to its accumulation over time and eventually result in age-related deterioration.A number of observations support the free radical theory, including the discovery that dietary restriction delays aging and extends life-span in a wide range of rodents and other species, possibly by reducing free radical damage.The notion that genomic DNA could be a major target of continual free radical attack over time is supported by the recent observation that genetic lesions accumulate with age and that dietary restriction reduces this accumulation in rodents (4).In addition, deletion of p66 shc , a signaling protein that maintains oxidant levels, increases resistance to oxidative damage and extends the life-span of mice (5).", + "Instead, aging is expected tobe a pervasive failure of adaptation across most, if not all, of the physiological mechanismsthat sustain survival and reproduction among young individuals. For this reason, evolutionary biologists have generally been skeptical of proposals that attribute the cause ofaging to any one physiological mechanism or gene for aging or programmed death. 
Although common genetic pathways might be identified that contribute to aging among a variety of organisms (cf.", + "Background. Aging is a complex process characterized by the progressive degeneration of a healthy phenotype and correlated with a decline in the ability to withstand cellular stress and damage. The subject of investigation for decades, the underlying molecular genetic causes of and responses to aging remain an area of active study. Research from model systems has characterized a range of physiological and molecular phenotypes associated with aging. These include genomic instability caused by accumulation of DNA damage, dysregulation of repair mechanisms, and telomere attrition; epigenetic alterations; dysregulation of transcription; loss of proteostasis; cellular senescence; and deregulated nutrient sensing, metabolic pathways, and energy use (reviewed in [1]). Separating causation from correlation between these phenotypes and aging remains a challenge, however.", + "Introduction. Understanding what actually causes ageing remains admittedly a fundamental and fascinating problem in biology [1]. Experimental data accumulated in the last three decades have led to the identification of various environmental and genetic factors, as well as chemical substances that influence lifespan in divergent eukaryotic species [1,2]. Organisms normally age faster and hence live shorter under stress conditions that can lead to the generation of DNA mutations and, often as a consequence of mutations, damaged cytoplasmic constituents (including injured proteins, lipids, carbohydrates and organelles). Such types of damage can interfere with cellular functioning; thereby, they should be eliminated by effective repair and self-cleaning mechanisms to maintain cellular homeostasis. These mechanisms include DNA repair pathways, molecular chaperones, as well as the proteasome-ubiquitin system and lysosome-mediated autophagy, the main forms of cellular self-degradation [3]. This has led to the 
attractive model that the gradual, lifelong accumulation of unrepaired cellular damage drives the ageing process and determines the incidence of age-related fatal diseases [4,5].", + "In conclusion, aging may not be primarily due to damage accumulating from the basic biochemical reactions that make up life but rather the result of the developmental program or of changes brought about by it. Our hypothesis is that the timing of development regulates the rate of aging among mammals, with a subset of developmental mechanisms determining the pace and causing most age-related changes. Maybe people change as they grow old due to the same mechanisms that drive changes throughout the earlier stages in life.", + "Instead, aging is expected to be a pervasive failure of adaptation across most, if not all, of the physiological mechanisms that sustain survival and reproduction among young individuals. For this reason, evolutionary biologists have generally been skeptical of proposals that attribute the cause of aging to any one physiological mechanism or gene for aging or programmed death. Although common genetic pathways might be identified that contribute to aging among a variety of organisms (cf.", + "In 2021, Science published a special issue entitled \"125 Questions: Exploration and Discovery.\" One of these 125 questions was \"Can we stop ourselves from aging?\" The U.S. 
National Institute on Aging (NIA) at the National Institutes of Health (NIH) states that \"aging is associated with changes in dynamic biological, physiological, environmental, psychological, behavioral, and social processes.\" Although geneticists and epidemiologists have long debated the relative importance of the role played by genotype or the environment in the development of age-related diseases, it is apparent that both can play substantial roles in this process [6,7].However, most etiological studies have concentrated on the role of genotype and have considered the environment to play a secondary role.Nevertheless, an analysis of GBD data showed that nearly 50% of deaths worldwide are attributable to environmental exposure, primarily exposure to airborne particulates (including household air pollution and occupational exposure; 14% of all deaths), smoking and secondhand smoke (13%), plasma sodium concentrations (6%), and alcohol consumption (5%) [8].In contrast, a recent analysis of 28 chronic diseases in identical twins showed that the genetic-related risks of developing one of five age-related diseases were 33.3%, 10.6%, 36.3%, 19.5%, and 33.9% for AD, PD, CAD, COPD, and T2DM, respectively, with a mean of only 26% [9].The results of over 400 genome-wide association studies (GWASs) have also elucidated that the heritability of degenerative diseases is only approximately 10% [10,11].Consequently, nongenetic drivers, such as environmental factors, are now recognized as major risk factors for age-related diseases.The contributions of environmental factors to the development of age-related diseases can be revealed by analyses of all of the factors to which individuals are exposed in their life and the relationships between these exposures and age-related diseases [12,13].", + "IntroductionThe fundamental manifestation of the aging process is a progressive decline in the functional maintenance of tissue homeostasis and an increasing propensity to degenerative 
diseases and death [1]. It has attracted significant interest to study the underlying mechanisms of aging, and many theories have been put forward to explain the phenomenon of aging. There is an emerging consensus that aging is a multifactorial process, which is genetically determined and influenced epigenetically by environment [2]. Most aging theories postulate a single physiological cause of aging, and likely these theories are correct to a certain degree and in certain aspects of aging.", + "Many factors contribute to aging, including genes. This is the first article in a 10-part series that highlights some of what is known about the influence of genes on aging and emerging treatment options that may slow down or potentially reverse the aging process. The series will address genes, adducts, and telomeres, decreased immune defenses, oxidation and inefficient mitochondria, toxins and radiation, glycosylation, caloric intake and sirtuin production, neurotransmitter imbalance, hormone mechanisms, reduced nitric oxide, and stem cell slowdown. Underpinning these factors are wear and tear on cells and aging as a result of inability to repair or replace these affected cells. These topics have been addressed in research, health magazines, and even by talk show hosts. There is even a LongevityMap website addressing significant and nonsignificant genetic association studies in aging across the human genome (http://genomics.senescence.info/longevity/). The series will address a scientific and clinical approach to genome-related aging topics.", + "Trying to explain aging in terms of a singular process would be in conflict with evolutionary theory. Even if loss of genome sequence integrity was the most conserved cause of aging, already active in the first replicators (Vijg, 2007), natural selection would allow a multitude of mutations with late adverse effects to accumulate in the germline, many of which would be positively selected for because of their beneficial effects early in 
life (Williams, 1957), In this respect, somatic mutation accumulation could be a conserved, inevitable cause of aging but superposed on multiple other processes that usually cause the earlier demise of an individual." + ] + ], + "task_id": [ + "2C477A3C76794C27A1FBBF437CFF75EE", + "CAD6C6C2AB42AA66BFDD65F0F11932B2", + "78A0CD7E12AFEF6865583142603EE039", + "DA98AC2EA5D1F776D3F04FCBC7F01339", + "117299AD06C2B147F49E9C9BC036CEA4" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_2.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_2.json new file mode 100644 index 0000000..e2ed5ed --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_2.json @@ -0,0 +1,110 @@ +{ + "question": [ + "which genes are involved in aging", + "what genes are involved in the aging process", + "Describe the genotypes related to Alzheimers and dementia which have commonalities with those for aging.", + "Describe the genotypes related to Alzheimer's and dementia which have commonalities with those for aging.", + "What genetic factors influence aging in humans? " + ], + "answer": [ + "Several genes are involved in aging, including BAZ2B, HMGB4, NOC2L, RAI1, SIK1, SMARCA2, SPZ1, TBP, TRIP13, ZKSCAN1, DBH, TPO, LSS, GPER, HCRTR2, ATG2A, NEDD4L, PSMB1, UBXN4, USP6, EEF1A2, DBH, ITGB2, TUBB2C, WRN, ABCA7, AZGP1, CD36, DEGS2, LSS, PI4KA, SOAT2, APOE, LDLR, CDKN2B, RBM38, IGF1R, FOXO3, SNCA, CDKN2B, USP28, E2F2, BCL3, NAP1L4, GAB2, QKI, Lamp2, Fas, Ghr, Anxa2, Anxa3, Anxa4, and several S100 calcium binding proteins.", + "Several genes are involved in the aging process. 
These include BAZ2B, HMGB4, NOC2L, RAI1, SIK1, SMARCA2, SPZ1, TBP, TRIP13, ZKSCAN1, DBH, TPO, LSS, GPER, HCRTR2, ATG2A, NEDD4L, PSMB1, UBXN4, USP6, EEF1A2, DBH, ITGB2, TUBB2C, WRN, ABCA7, AZGP1, CD36, DEGS2, LSS, PI4KA, SOAT2, IGF-1, CYP19, NAP1L4, GAB2, QKI, APOE, LDLR, CDKN2B, RBM38, IGF1R, FOXO3, SNCA, USP28, E2F2, BCL3, mre11, rad50, Ku80, mus308, mus205, DNApol-eta, Trxr-1, GlnRS, isoQC, QC, Cct1, Cct4, Cct5, Cct6, Hsc70-4, Xbp1, NAT1, Rack1, Rheb, Lamp2, Fas, Ghr, Anxa2, Anxa3, Anxa4, SIR2, DAF-2/IGF1R, DAF-16/FOXO, AAK-2/AMPK, LET-363/mTOR, SIRT1/SIR2, p53, Werner's syndrome gene, iff-1, SIR2, TERT, ZBTB16, CA9, HEY2, Smarca4, Amarcb1, Hdac1, -5, -6, Dnmt3b, and genes in the insulin/ insulin-like growth factor (IGF12 )/growth hormone (GH) pathway.", + "The genotypes related to Alzheimer's and dementia that have commonalities with those for aging include a locus on chromosome 17 associated with the age at onset of Alzheimer's. A specific variant in the CCL11 gene is likely responsible for this association. Additionally, the APOE gene is a strong genetic risk factor for late-onset Alzheimer's, which typically occurs in individuals aged 65 and older. Other genes associated with Alzheimer's include APP, PSEN1, and PSEN2 for early-onset Alzheimer's, and CR1, BIN1, CLU for late-onset Alzheimer's. These genetic factors are believed to interact with environmental components and contribute to the complex etiology of these aging-associated neurodegenerative diseases.", + "The genotypes related to Alzheimer's and dementia that have commonalities with those for aging include a locus on chromosome 17 associated with the age at onset of Alzheimer's. A specific variant in CCL11 is probably responsible for this association. Other genes associated with Alzheimer's include APP, PSEN1, PSEN2, and APOE. These genes are also associated with early-onset and late-onset forms of Alzheimer's, which are more common in older individuals. 
The APOE gene, specifically the ε4 allele, is a significant genetic risk factor for late-onset Alzheimer's. Genome-wide studies have also identified several genetic susceptibility factors for age-related macular degeneration (AMD), another aging-associated neurodegenerative disease.", + "Several genetic factors influence aging in humans. These include genes such as the apolipoprotein E gene (APOE), the forkhead box O3A (FOXO3A) gene, and the exonuclease 1 (EXO1) gene. Other factors include the insulin-IGF1 signaling pathway, the growth hormone or insulin-like growth factor and insulin (GH, IGF, INS) axis, and the heat-shock proteins and heat-shock factors. Additionally, genetic variants within genes involved in pathways associated with aging identified in animal models, such as insulin-like growth factor (IGF)-insulin signaling, regulation of lipoprotein metabolism, the mTOR pathway, and the oxidative stress response may also influence survival to old age in humans." + ], + "contexts": [ + [ + "Following are examples of the identified genes and experimental or GWAS link between these genes and aging. On the list of the 25 top genes, NAP1L4 encodes a member of the nucleosome assembly protein (NAP) family, which interacts with both core and linker histones, and shuttles between the cytoplasm and nucleus, suggesting a role as histone chaperone. Histone protein levels decline during aging, and dramatically affect chromatin structure. Remarkably, the lifespan can be extended by manipulations that reverse the age-dependent changes to chromatin structure, indicating the pivotal role of chromatin structure in aging [32]. In another example, gene expression of NAP1L4 increases with age in the skin tissue [33]. Findings of GWAS link a number of the identified genes to age-related disorders, such as GAB2 and late onset Alzheimer's disease [86], and QKI and coronary heart disease/myocardial infarction [79]. Interestingly, GWAS reports also link QKI to successful aging [87]. Indicative 
biological pathways associated with the candidate aging genes", + "Examples of biological candidate genes with pleiotropic functions, which are involved in aging in general and in musculoskeletal aging in particular, are numerous: (a) in addition to the IGF-1 and vitamin D genes, estrogen metabolism pathway genes, including estrogen receptors and aromatase (CYP19), are associated with fat-free mass (Walsh et al. 2005) and BMD (Shearman et al. 2004), prostate and breast cancer (Gallicchio et al. 2006), and cardiovascular disease risk (Shearman et al. 2003).", + "In-depth analysis of the age-regulated genes revealed that multiple genes in the DNA damage response pathway were upregulated with age including those that function in non-homologous end-joining repair (mre11, rad50, Ku80 and mus308) and in translesion DNA synthesis (mus205 and DNApol-eta) [44][45][46].Genes that encoded enzymes with antioxidant properties, such as the thioredoxin reductase Trxr-1, and antioxidant genes involved in glutamate metabolism, such as GlnRS, isoQC and QC, were also upregulated with age [47][48][49][50].We also observed increased age-associated expression of chaperone genes (Cct1, Cct4, Cct5, Cct6, Hsc70-4) and the unfolded protein response transcription factor Xbp1, consistent with an induction of the unfolded protein response [51][52][53].Under stress conditions, there is a translational switch that favors production of stressrelated proteins while decreasing translation of other proteins [54].Paralogs of canonical translation factors such as NAT1 and Rack1, which were both upregulated, promote this switch to cap-independent translation [55,56].Notably, Rheb, which is downregulated with age, positively regulates ribosome production and capdependent translation by activating the mechanistic target of rapamycin (mTOR) kinase pathway [57].Thus, decreased Rheb levels during aging could decrease mTOR pathway activity, which extends lifespan and is protective against age-related 
pathology [58].Together, these data suggest that multiple genes are induced in aging photoreceptors to mitigate the effects of oxidative stress, protein misfolding and DNA damage.", + "CellAge vs human orthologues of longevity-associated model organism genesTo understand how senescence is linked to the genetics of aging processes, we looked at the intersection of CellAge genes and the 869 genes in the human orthologues of model organisms' longevity-associated genes (LAGs) dataset, collected based on quantitative changes in lifespan [34].Like CellAge, where genes are classified based on whether their upregulation induces, inhibits, or has an unknown impact on CS, the longevity orthologues dataset also provides information on the effect of upregulation of its genes, namely whether it promotes (pro, 421) or inhibits (anti, 448) longevity (Additional file 1: Table S7; Additional file 2: Fig. S2).", + "Analysis of prior research (Online Resource 5) shows that the revealed genes can be explicitly involved in other key biological processes in an organism whose role is known to be changing with aging.Specifically, ten genes (BAZ2B, HMGB4, NOC2L, RAI1, SIK1, SMARCA2, SPZ1, TBP, TRIP13, and ZKSCAN1) regulate transcription which is believed to be disrupted when an organism is getting older (Roy et al. 2002).The DBH, TPO, and LSS genes are involved in synthesis of catecholamine, thyroid, and vitamin D hormones, respectively.The GPER binds estrogen and HCRTR2 binds orexin-A and orexin-B neuropeptid hormones.Hormonal deregulation with aging is considered to be one of the major components of senescent processes in an organism (Barzilai and Gabriely 2010).Five genes (ATG2A, NEDD4L, PSMB1, UBXN4, and USP6) are involved in degradation of proteins through ubiquitin-proteasome and the lysosomal/autophagic system.Dysfunction of this system leads to accumulation of damaged proteins in an organism that is associated with aging (Koga et al. 
2011).Protein degradation through ubiquitin-mediated proteolysis plays an important role in cell-cycle regulation (Reed 2003).The PSMB1, SIK1, TRIP13, and TTN genes in the revealed set coordinate cell cycle.Cell cycle is linked with the aging-related processes in humans through a gradual increase in cell division errors in all tissues in an organism (Ly et al. 2000).Five genes (EEF1A2, DBH, ITGB2, TUBB2C, and WRN) take part in regulation of apoptosis which plays an important role in the aging process and tumorigenesis (Salvioli et al. 2008).Seven genes (ABCA7, AZGP1, CD36, DEGS2, LSS, PI4KA, and SOAT2) are involved in lipid metabolism which plays one of the key roles in human longevity and healthy aging (Barzilai et al. 2003).", + "In addition to testing genes known to be associated with age-related diseases and phenotypes for association with longevity, genes known to promote longevity in model organisms have been examined in human populations.Mutations in insulin or insulinlike signalling pathway genes have been shown to extend lifespan in Caenorhabditis elegans [20], Drosophila melanogaster [21,22] and mice [23,24].The insulin-signalling pathway negatively regulates the forkhead (FOXO) transcription factor [25].When insulin or insulin-like growth factor signalling is low, FOXO is activated and lifespan extension occurs [26].An overrepresentation of rare insulin-like growth factor I receptor (IGFIR) mutations has been observed in centenarians [27].These mutations are associated with reduced activity of IGFIR as measured in transformed lymphocytes [27].", + "Genes Whose Expression Decreased with Age.Of the 26 genes that decreased expression with age in control mice, 23% are involved in DNA replication and the cell cycle (Table 2).Most of these have a negative effect on cell growth and division.Among these, the product of phosphatase and tensin homolog (Pten) gene is a tumor suppressor that induces cell-cycle arrest through inhibition of the phosphoinositide 
3-kinase pathway (28).B cell translocation gene 2 (Btg2) is a tumor suppressor that increases expression in response to DNA damage (29).The murine gene product of the amino-terminal enhancer of split (Aes) is a potent corepressor of gene expression and cellular proliferation (30).Calcium-binding protein A11 (S100a10) binds to and regulates the activity of annexin II, which is involved in the transduction of calcium-related mitogenic signals (31).Insulin-like growth factor (IGF) binding protein 1 (Igfbp1) plays an important role in the negative regulation of the IGF-1 system, a stimulator of mitogenesis (32).", + "daf-16 dependent genesAmong the 52 genes that we have tested, 29 genes act almost completely in a daf-16 dependent manner, to regulate lifespan (Table 2).One of the genes identified was daf-2 (Y55D5A_391.b).This serves as a proof of principle that our screen is effective in identification of aging genes.", + "Several of the genes we identify have previously been shown to influence lifespan in experiments on model organisms.For example, knockouts of the orthologs of APOE, LDLR, CDKN2B, and RBM38 in mice shortens their lifespan [24][25][26][27] , while knockout of IGF1R has the opposite effect 28 .Similarly, overexpression of the FOXO3 orthologue in Drosophila melanogaster 29 and the SNCA orthologue in Caenorhabditis elegans 30 have shown to extend their respective lifespans.Many of our genes are also enriched for pathways previously related to ageing in eukaryotic model organisms, including genomic stability, cellular senescence, and nutrient sensing 31 .For example, FOXO3 and IGF1R are well-known players modulating survival in response to dietary restriction 32 , but we also highlight genes involved in the response to DNA damage and apoptosis, such as CDKN2B, USP28, E2F2, and BCL3.In addition to hallmarks discovered in model organisms, our results suggest that haem metabolism may play a role in human ageing.This pathway includes genes involved in 
processing haem and differentiation of erythroblasts 33 .Although the enrichment is largely driven by genes linked to the LDLR locus, genes linked to other loci of interest (such as FOXO3, CDKN2B, LINC02513) are involved in similar biological pathways: myeloid differentiation, erythrocyte homeostasis, and chemical homeostasis.", + "In recent years, some of the aging-related genes identified in worms have been shown to have mammalian homologs that modulate longevity and delay age-related diseases in mice, in particular as part of the insulin/ insulin-like growth factor (IGF12 )/growth hormone (GH) pathway (Bartke, 2005), and variants in these genes have even been associated with human longevity, such as the daf-2 homolog IGF1R (Suh et al., 2008).Therefore, there is great potential for human homologs of genes shown to modulate aging in model organisms to represent pharmaceutical targets with human applications.", + "Hundreds of genes in several pathways act as regulators of ageing (1,32).However, analysis of DrugAge and other HAGR databases has revealed that the overlap between the targets of lifespan-extending drugs and known ageing related genes is modest (31).This indicates that most ageing-related pathways have yet to be targeted pharmacologically; DrugAge may aid in guiding further assays.This was recently demonstrated in one study where machine learning was used to predict whether a compound would increase lifespan in worms using data from Dru-gAge.The best model had 80% prediction accuracy and the top hit compounds could broadly be divided into compounds affecting mitochondria, inflammation, cancer, and gonadotropin-releasing hormone (33).", + "Aging-related gene prediction and putative transcriptional mechanismsGeneFriends was used to identify genes related to aging.A seed list of genes known to be consistently overexpressed with age in mammals was used [18].In total, 1119 genes were co-expressed with the aging seed list at p <10 -6 ; Table 1 shows the top 25 
genes. Many of these genes have been associated with age-related diseases. Several other genes that have been shown to play a role in aging such as lysosomal-associated membrane protein-2 Lamp2 [19] (p = 5.68 -30 ), Fas [20] (p = 2.70 -31 ) and growth hormone receptor Ghr [21] (p = 1.34 -19 ) also showed a significant co-expression. Anxa2, Anxa3 and Anxa4 also show a low p-value (p < 10 -25 ) as well as several S100 calcium binding proteins which have been shown to interact with annexins [22]. Top 25 genes co-expressed with aging related genes", + "Fig. 7 Functional relationships of genes implicated in longevity. The genes in red/blue boxes represent genes with increased/decreased mRNA expression in ageing Drosophila (color figure online)", + "The genome-wide RNAi study conducted by the Ruvkun lab, authored by Hamilton et al. [88], identified a total of 89 additional aging genes with disparate functions including cell structure, cell surface proteins, cell signaling, cellular metabolism, and protein turnover. Of the 66 genes with previously known functions, 17 corresponded to various aspects of carbon metabolism, including citric acid cycle enzymes and subunits of complexes I, IV, and V of the ETC. Researchers also speculated that protein translation might play a role in lifespan regulation, based on the identification of iff-1 (T05G5.10), a gene that has homology to the translation initiation factor eIF5A. Other hits from this screen included two genes containing PH domains known to interact with phosphatidylinositol lipids, multiple G protein-coupled receptors, protein processing and degradation genes such as proteases and ubiquitin ligases/hydrolases, and chromatin modifying factors.", + "INTRODUCTION: Aging has fascinated researchers since ancient times. The hugely complicated process that has been revealed may be interpreted from different aspects, such as the accumulation of oxidative damage, shortening of telomeres, the costs of reproduction, metabolic rates, cellular 
senescence, etc., and these have in turn given rise to diverse theories of aging [1].However, thanks to forward and reverse genetic technologies, researchers in the recent decades have established that despite its complexity, a single or a few key genes in a few key pathways can modulate the aging rate.The most important players would appear to be those in nutrient sensing pathways or stress response pathways, such as DAF-2/IGF1R and DAF-16/FOXO in the Insulin/IGF like signaling pathway, AAK-2/AMPK in another nutrient sensing pathway, JNK in the stress response pathway, LET-363/mTOR as an inhibitor of autophagy and activator of translation and SIRT1/SIR2 in genome stability maintenance, to name a few [2,3].In addition to genetic perturbations, dietary perturbations, such as diet restriction (DR) are known to significantly extend lifespan in most organisms examined from yeasts to primates, although different pathways may act under different DR conditions, and alternative DR strategies also effect C.elegans lifespan in different ways [3,4].The main pathways revealed under different DR regimens are summarized in Fig. 
(1). In this small, convoluted DR response network, DAF-16 and ceTOR/LET-363", + "Introduction: Aging affects nearly all organisms and is a major risk factor in most human diseases. Recent work has begun to uncover molecular mechanisms that specify lifespan and to identify alterations in cellular physiology that occur at the end of life (Tissenbaum and Guarente 2002). For example, oxidative damage caused by the generation of free radicals in the mitochondria has been found to hasten aging by causing an accumulation of damaged cellular components (Droge 2003). Telomere shortening may also play a role in aging by preventing DNA replication and cell division in later years (Hasty et al. 2003). Genetic studies have identified many genes that play a role in specifying lifespan. For example, mutations in yeast sir2 (chromatin regulator), worm daf-2 (insulin-like growth factor receptor), fly methuselah (tyrosine kinase receptor), mouse p53, and the human Werner's syndrome gene (DNA helicase) cause dramatic changes in lifespan (Guarente and Kenyon 2000). Several aging mechanisms alter longevity in multiple organisms. For example, mutations in the gene encoding insulin-like growth factor receptor alter lifespan in worms, flies, and mice, indicating that an endocrine signaling pathway has a conserved role in aging (Hekimi and Guarente 2003). Genetic studies have shown that aging can be slowed in mutants that are defective in a wide range of cellular processes (such as mitochondrial function, chromatin regulation, insulin signaling, transcriptional regulation, and genome stability). This indicates that aging is a complex process driven by diverse molecular pathways and biochemical events. As such, a powerful approach to study aging is to use systems biology, 
which allows a multitude of factors affecting aging to be analyzed in parallel. For example, DNA microarrays and gene expression chips have been used to perform a genome-wide analysis of changes in gene expression in old age. Extensive studies in Caenorhabditis elegans and Drosophila melanogaster have identified hundreds of age-regulated genes (Hill et al. 2000; Zou et al. 2000; Lund et al. 2002; Pletcher et al. 2002; Murphy et al. 2003). Several studies have described age-regulated genes in the muscle and brain of mice (Lee et al. 1999, 2000) and the retina and muscle of humans (Yoshida et al. 2002; Welle et al. 2003, 2004). These age-regulated genes may serve as markers of aging, enabling one to assess physiological age independently of chronological age. Analysis of the functions of these age-regulated genes has identified specific biochemical mechanisms that change toward the end of life.", + "In addition to testing genes known to be associated with age-related diseases and phenotypes for association with longevity, genes known to promote longevity in model organisms have been examined in human populations. Mutations in insulin or insulin-like signalling pathway genes have been shown to extend lifespan in Caenorhabditis elegans [20], Drosophila melanogaster [21,22] and mice [23,24]. The insulin-signalling pathway negatively regulates the forkhead (FOXO) transcription factor [25]. When insulin or insulin-like growth factor signalling is low, FOXO is activated and lifespan extension occurs [26]. An overrepresentation of rare insulin-like growth factor I receptor (IGFIR) mutations has been observed in centenarians [27]. These mutations are associated with reduced activity of IGFIR as measured in transformed lymphocytes [27]." 
+ ], + [ + "Following are examples of the identified genes and experimental or GWAS link between these genes and aging.On the list of the 25 top genes, NAP1L4 encodes a member of the nucleosome assembly protein (NAP) family, which interacts with both core and linker histones, and shuttles between the cytoplasm and nucleus, suggesting a role as histone chaperone.Histone protein levels decline during aging, and dramatically affect chromatin structure.Remarkably, the lifespan can be extended by manipulations that reverse the age-dependent changes to chromatin structure, indicating the pivotal role of chromatin structure in aging [32].In another example, gene expression of NAP1L4 increases with age in the skin tissue [33].Findings of GWAS link a number of the identified genes to age-related disorders, such as GAB2 and late onset Alzheimer's disease [86], and QKI and coronary heart disease/myocardial infarction [79].Interestingly, GWAS reports also link QKI to successful aging [87].", + "Examples of biological candidate genes with pleiotropic functions, which are involved in aging in general and in musculoskeletal aging in particular, are numerous: (a) in addition to the IGF-1 and vitamin D genes, estrogen metabolism pathway genes, including estrogen receptors and aromatase (CYP19), are associated with fat-free mass (Walsh et al. 2005) and BMD (Shearman et al. 2004), prostate and breast cancer (Gallicchio et al. 2006), and cardiovascular disease risk (Shearman et al. 
2003).", + "In-depth analysis of the age-regulated genes revealed that multiple genes in the DNA damage response pathway were upregulated with age including those that function in non-homologous end-joining repair (mre11, rad50, Ku80 and mus308) and in translesion DNA synthesis (mus205 and DNApol-eta) [44][45][46].Genes that encoded enzymes with antioxidant properties, such as the thioredoxin reductase Trxr-1, and antioxidant genes involved in glutamate metabolism, such as GlnRS, isoQC and QC, were also upregulated with age [47][48][49][50].We also observed increased age-associated expression of chaperone genes (Cct1, Cct4, Cct5, Cct6, Hsc70-4) and the unfolded protein response transcription factor Xbp1, consistent with an induction of the unfolded protein response [51][52][53].Under stress conditions, there is a translational switch that favors production of stressrelated proteins while decreasing translation of other proteins [54].Paralogs of canonical translation factors such as NAT1 and Rack1, which were both upregulated, promote this switch to cap-independent translation [55,56].Notably, Rheb, which is downregulated with age, positively regulates ribosome production and capdependent translation by activating the mechanistic target of rapamycin (mTOR) kinase pathway [57].Thus, decreased Rheb levels during aging could decrease mTOR pathway activity, which extends lifespan and is protective against age-related pathology [58].Together, these data suggest that multiple genes are induced in aging photoreceptors to mitigate the effects of oxidative stress, protein misfolding and DNA damage.", + "Results: We develop CellAge (http://genomics.senescence.info/cells),a manually curated database of 279 human genes driving cellular senescence, and perform various integrative analyses.Genes inducing cellular senescence tend to be overexpressed with age in human tissues and are significantly overrepresented in anti-longevity and tumor-suppressor genes, while genes inhibiting 
cellular senescence overlap with pro-longevity and oncogenes. Furthermore, cellular senescence genes are strongly conserved in mammals but not in invertebrates. We also build cellular senescence protein-protein interaction and co-expression networks. Clusters in the networks are enriched for cell cycle and immunological processes. Network topological parameters also reveal novel potential cellular senescence regulators. Using siRNAs, we observe that all 26 candidates tested induce at least one marker of senescence with 13 genes (C9orf40, CDC25A, CDCA4, CKAP2, GTF3C4, HAUS4, IMMT, MCM7, MTHFD2, MYBL2, NEK2, NIPA2, and TCEB3) decreasing cell number, activating p16/p21, and undergoing morphological changes that resemble cellular senescence. Conclusions: Overall, our work provides a benchmark resource for researchers to study cellular senescence, and our systems biology analyses reveal new insights and gene regulators of cellular senescence.", + "Genes involved in transcriptional silencing via chromatin remodeling (Smarca4 and Amarcb1) as well as histone deacetylases (Hdac1, -5, and -6) and a DNA methyltransferase (Dnmt3b) were downregulated in aged cells. They also showed that several chromosomal regions changed with age in a coordinated manner resulting in an overall increase in transcriptional activity. They propose that chromatin dysregulation and epigenetic changes drive the loss of cellular function and ultimately drive the aging process in HSCs.", + "Analysis of prior research (Online Resource 5) shows that the revealed genes can be explicitly involved in other key biological processes in an organism whose role is known to be changing with aging. Specifically, ten genes (BAZ2B, HMGB4, NOC2L, RAI1, SIK1, SMARCA2, SPZ1, TBP, TRIP13, and ZKSCAN1) regulate transcription which is believed to be disrupted when an organism is getting older (Roy et al. 
2002).The DBH, TPO, and LSS genes are involved in synthesis of catecholamine, thyroid, and vitamin D hormones, respectively.The GPER binds estrogen and HCRTR2 binds orexin-A and orexin-B neuropeptid hormones.Hormonal deregulation with aging is considered to be one of the major components of senescent processes in an organism (Barzilai and Gabriely 2010).Five genes (ATG2A, NEDD4L, PSMB1, UBXN4, and USP6) are involved in degradation of proteins through ubiquitin-proteasome and the lysosomal/autophagic system.Dysfunction of this system leads to accumulation of damaged proteins in an organism that is associated with aging (Koga et al. 2011).Protein degradation through ubiquitin-mediated proteolysis plays an important role in cell-cycle regulation (Reed 2003).The PSMB1, SIK1, TRIP13, and TTN genes in the revealed set coordinate cell cycle.Cell cycle is linked with the aging-related processes in humans through a gradual increase in cell division errors in all tissues in an organism (Ly et al. 2000).Five genes (EEF1A2, DBH, ITGB2, TUBB2C, and WRN) take part in regulation of apoptosis which plays an important role in the aging process and tumorigenesis (Salvioli et al. 2008).Seven genes (ABCA7, AZGP1, CD36, DEGS2, LSS, PI4KA, and SOAT2) are involved in lipid metabolism which plays one of the key roles in human longevity and healthy aging (Barzilai et al. 2003).", + "Genes that are age-regulated in all tissues would reveal genes involved in core mechanisms that underlie cellular ageing.Zahn et al. 
[63] discovered genetic pathways that show common age regulation in human kidney, brain and muscle.They used microarrays to analyse expression in 81 skeletal muscle samples from patients aged 16 -86 years and found 250 age-regulated muscle genes [63].Similar to the ageing expression profile for the kidney, the overall expression behaviour of this set of age-regulated muscle genes correlated with the physiological as well as chronological age of the muscle sample.Next, they compared their muscle-ageing results to previously published data on kidney and brain ageing of similarly large sample size [56,60].Although most of the age-related changes were tissue specific, they found evidence for common age regulation of six genetic pathways in all three tissues.Specifically, there is an overall increase in expression of the extracellular matrix genes, the ribosomal genes, the cell growth genes and the complement activation genes in all three tissues.Increased overall expression of the extracellular matrix and complement activation gene sets with advancing age may contribute to widespread fibrosis and inflammation in the elderly.There is an overall decrease in expression of the chloride transport genes and the electron transport genes in all three tissues.Decreased overall expression of electron transport chain genes with age might support the mitochondrial free-radical theory of ageing [67], as free-radical generation by mitochondria would preferentially damage the electron transport chain protein complexes.Decreased expression of the electron transport genes (encoded in the nucleus) might be caused by feedback regulation from damage to the electron transport chain protein complexes [63].However, it is also possible that increased oxidative damage occurs as a consequence of the decreased expression of the electron transport chain genes.In addition, an increasing number of studies in model organisms have critically challenged the mitochondrial free-radical theory of ageing 
[68].In addition to testing genes known to be associated with age-related diseases and phenotypes for association with longevity, genes known to promote longevity in model organisms have been examined in human populations.Mutations in insulin or insulinlike signalling pathway genes have been shown to extend lifespan in Caenorhabditis elegans [20], Drosophila melanogaster [21,22] and mice [23,24].The insulin-signalling pathway negatively regulates the forkhead (FOXO) transcription factor [25].When insulin or insulin-like growth factor signalling is low, FOXO is activated and lifespan extension occurs [26].An overrepresentation of rare insulin-like growth factor I receptor (IGFIR) mutations has been observed in centenarians [27].These mutations are associated with reduced activity of IGFIR as measured in transformed lymphocytes [27].", + "Aging can be viewed as a lethal by-product of activities, such as reproduction and food intake, that are controlled by genes [1].Since most of these genes are evolutionarily conserved, distant species may share common pathways of aging [2].The insulin/insulin-like growth factor 1 (IGF1) signaling pathway could be one such common pathway, as it modulates aging in many species, including Caenorhabditis elegans, Drosophila, mice [3], and possibly humans [4].An elegant study carried out in C. 
elegans by applying microarray techniques showed that a member of the SIR2like protein family is regulated downstream of DAF-16, a FOXO-family transcription factor that affects the rate of aging in response to the insulin/IGF1 pathway [5].SIR2 proteins constitute an evolutionarily conserved family of NAD-dependent deacetylases called sirtuins [6][7][8].In model organisms the expression levels of SIR2 modulate life span [9][10][11].Since sirtuins are NAD + dependent these proteins through different routes may link energy metabolism, genome maintenance, and aging [11,12].Thus SIR2 genes may play a crucial role in conserved pathways of aging and longevity.", + "Regarding cancer and aging, Serrano and Blasco (2007) suggested that an equilibrium between mechanisms diminishing cellular damage and mechanisms preventing excessive cellular proliferation is required between both processes [43].The authors argue that the p53 pathway may be seen as an anti-aging mechanism as it is a key defense mechanism against cellular damage protecting from both aging and cancer.One effect of aging at the cellular level is reduced telomerase activity and progressive shorter telomeres in somatic cells [45].Shortened telomeres are highly recombinogenic, leading to a genome-susceptible cancer development [46,47].Genomic instability driven by dysfunctional telomeres is also associated with the transition from benign to malignant tumors [48].Conversely, telomere dysfunction also acts to induce the p53 gene to suppress tumor development by initiating cell-cycle arrest, cellular senescence or, apoptosis.Our analysis has identified several genes involved in the regulation and activity of the p53 pathway as being affected by age.In skin, the telomerase reverse transcriptase (TERT) showed an age-related expression in association with a genetic variant (rs10866530).In addition p21, a gene directly regulated by p53 and also involved in telomere-driven aging, was shown to be differentially expressed 
with age [49]. In brain, the ZBTB16, CA9, and HEY2 genes, associated with the p53 pathway directly or via SIRT1, all showed age-related expression. The activity of p53 has been shown to enhance the transcription of inhibitors of the insulin receptor pathway, preventing cell growth and division after stress signaling [50,51] and many genes from the insulin signaling pathway have been extensively associated with longevity in multiple studies and organisms. Our results suggest that the link between aging and cancer is evident in multiple tissues through differential expression of genes with age.", + "Several of the genes we identify have previously been shown to influence lifespan in experiments on model organisms. For example, knockouts of the orthologs of APOE, LDLR, CDKN2B, and RBM38 in mice shorten their lifespan [24][25][26][27], while knockout of IGF1R has the opposite effect [28]. Similarly, overexpression of the FOXO3 orthologue in Drosophila melanogaster [29] and the SNCA orthologue in Caenorhabditis elegans [30] have been shown to extend their respective lifespans. Many of our genes are also enriched for pathways previously related to ageing in eukaryotic model organisms, including genomic stability, cellular senescence, and nutrient sensing [31]. For example, FOXO3 and IGF1R are well-known players modulating survival in response to dietary restriction [32], but we also highlight genes involved in the response to DNA damage and apoptosis, such as CDKN2B, USP28, E2F2, and BCL3. In addition to hallmarks discovered in model organisms, our results suggest that haem metabolism may play a role in human ageing. This pathway includes genes involved in processing haem and differentiation of erythroblasts [33]. Although the enrichment is largely driven by genes linked to the LDLR locus, genes linked to other loci of interest (such as FOXO3, CDKN2B, LINC02513) are involved in similar biological pathways: myeloid differentiation, erythrocyte homeostasis, and chemical homeostasis.", + "In recent 
years, some of the aging-related genes identified in worms have been shown to have mammalian homologs that modulate longevity and delay age-related diseases in mice, in particular as part of the insulin/ insulin-like growth factor (IGF12 )/growth hormone (GH) pathway (Bartke, 2005), and variants in these genes have even been associated with human longevity, such as the daf-2 homolog IGF1R (Suh et al., 2008).Therefore, there is great potential for human homologs of genes shown to modulate aging in model organisms to represent pharmaceutical targets with human applications.", + "Aging-related gene prediction and putative transcriptional mechanismsGeneFriends was used to identify genes related to aging.A seed list of genes known to be consistently overexpressed with age in mammals was used [18].In total, 1119 genes were co-expressed with the aging seed list at p <10 -6 ; Table 1 shows the top 25 genes.Many of these genes have been associated with age-related diseases.Several other genes that have been shown to play a role in aging such as lysosomal-associated membrane protein-2 Lamp2 [19] (p = 5.68 -30 ), Fas [20] (p = 2.70 -31 ) and growth hormone receptor Ghr [21] (p = 1.34 -19 ) also showed a significant co-expression.Anxa2, Anxa3 and Anxa4 also show a low p-value (p < 10 -25 ) as well as several S100 calcium binding proteins which have been shown to interact with annexins [22].", + "The genome-wide RNAi study conducted by the Ruvkun lab, authored by Hamilton et al. 
[88], identified a total of 89 additional aging genes with disparate functions including cell structure, cell surface proteins, cell signaling, cellular metabolism, and protein turnover.Of the 66 genes with previously known functions, 17 corresponded to various aspects of carbon metabolism, including citric acid cycle enzymes and subunits of complexes I, IV, and V of the ETC.Researchers also speculated that protein translation might play a role in lifespan regulation, based on the identification of iff-1 (T05G5.10),a gene that has homology to the translation initiation factor eIF5A.Other hits from this screen included two genes containing PH domains known to interact with phosphatidylinositol lipids, multiple G protein-coupled receptors, protein processing and degradation genes such as proteases and ubiquitin ligases/hydrolases, and chromatin modifying factors.", + "INTRODUCTIONAging has fascinated researchers since ancient times.The hugely complicated process that has been revealed may be interpreted from different aspects, such as the accumulation of oxidative damage, shortening of telomeres, the costs of reproduction, metabolic rates, cellular senescence, etc., and these have in turn given rise to diverse theories of aging [1].However, thanks to forward and reverse genetic technologies, researchers in the recent decades have established that despite its complexity, a single or a few key genes in a few key pathways can modulate the aging rate.The most important players would appear to be those in nutrient sensing pathways or stress response pathways, such as DAF-2/IGF1R and DAF-16/FOXO in the Insulin/IGF like signaling pathway, AAK-2/AMPK in another nutrient sensing pathway, JNK in the stress response pathway, LET-363/mTOR as an inhibitor of autophagy and activator of translation and SIRT1/SIR2 in genome stability maintenance, to name a few [2,3].In addition to genetic perturbations, dietary perturbations, such as diet restriction (DR) are known to significantly 
extend lifespan in most organisms examined from yeasts to primates, although different pathways may act under different DR conditions, and alternative DR strategies also affect C. elegans lifespan in different ways [3,4]. The main pathways revealed under different DR regimens are summarized in Fig. (1). In this small, convoluted DR response network, DAF-16 and ceTOR/LET-363", + "Introduction: Aging affects nearly all organisms and is a major risk factor in most human diseases. Recent work has begun to uncover molecular mechanisms that specify lifespan and to identify alterations in cellular physiology that occur at the end of life (Tissenbaum and Guarente 2002). For example, oxidative damage caused by the generation of free radicals in the mitochondria has been found to hasten aging by causing an accumulation of damaged cellular components (Droge 2003). Telomere shortening may also play a role in aging by preventing DNA replication and cell division in later years (Hasty et al. 
2003). Genetic studies have identified many genes that play a role in specifying lifespan. For example, mutations in yeast sir2 (chromatin regulator), worm daf-2 (insulin-like growth factor receptor), fly methuselah (tyrosine kinase receptor), mouse p53, and the human Werner's syndrome gene (DNA helicase) cause dramatic changes in lifespan (Guarente and Kenyon 2000). Several aging mechanisms alter longevity in multiple organisms. For example, mutations in the gene encoding insulin-like growth factor receptor alter lifespan in worms, flies, and mice, indicating that an endocrine signaling pathway has a conserved role in aging (Hekimi and Guarente 2003). Genetic studies have shown that aging can be slowed in mutants that are defective in a wide range of cellular processes (such as mitochondrial function, chromatin regulation, insulin signaling, transcriptional regulation, and genome stability). This indicates that aging is a complex process driven by diverse molecular pathways and biochemical events. As such, a powerful approach to study aging is to use systems biology, which allows a multitude of factors affecting aging to be analyzed in parallel. For example, DNA microarrays and gene expression chips have been used to perform a genome-wide analysis of changes in gene expression in old age. Extensive studies in Caenorhabditis elegans and Drosophila melanogaster have identified hundreds of age-regulated genes (Hill et al. 2000; Zou et al. 2000; Lund et al. 2002; Pletcher et al. 2002; Murphy et al. 2003). Several studies have described age-regulated genes in the muscle and brain of mice (Lee et al. 1999, 2000) and the retina and muscle of humans (Yoshida et al. 2002; Welle et al. 2003, 2004). These age-regulated genes may serve as markers of aging, enabling one to assess physiological age independently of chronological age. Analysis of the functions of these age-regulated genes has identified specific biochemical mechanisms that change toward the end of life.", + "Age-Regulated Genes Involved in Reproductive Capacity. Decline in reproductive capacity is an age-related phenotype, and the reproductive system seems to play an important role in longevity (22). For example, signals from germ cells can affect lifespan in C. elegans (23). In our study, we observed decreased RNA levels for several genes involved in reproduction (Fig. 3). These include two genes that encode members of the Acp family. The Acp from male flies stimulates female egg-laying and facilitates storage of sperm in the female genital tract (24). In addition, two ESTs showing age-regulated decrease of transcript levels represent different genes with homology to Arabidopsis MALE STERILITY 2 (MS2; ref. 25), a gene involved in gametogenesis. Furthermore, an EST with homology to peanut, a member of the septin family (26), is down-regulated in older flies. This downregulation may reflect a decrease in spermatogenesis. (Displaced note: In Northern analysis, the ratios were calculated by dividing mRNA levels at 25-, 40-, and 50-day time points by those at 3-day time points after normalization with mRNA levels of the control gene rp49. Ratios in microarray analysis are provided from each of the duplicate experiments for comparison.)", + "Genes that are age-regulated in all tissues would reveal genes involved in core mechanisms that underlie cellular ageing. Zahn et al. 
[63] discovered genetic pathways that show common age regulation in human kidney, brain and muscle.They used microarrays to analyse expression in 81 skeletal muscle samples from patients aged 16 -86 years and found 250 age-regulated muscle genes [63].Similar to the ageing expression profile for the kidney, the overall expression behaviour of this set of age-regulated muscle genes correlated with the physiological as well as chronological age of the muscle sample.Next, they compared their muscle-ageing results to previously published data on kidney and brain ageing of similarly large sample size [56,60].Although most of the age-related changes were tissue specific, they found evidence for common age regulation of six genetic pathways in all three tissues.Specifically, there is an overall increase in expression of the extracellular matrix genes, the ribosomal genes, the cell growth genes and the complement activation genes in all three tissues.Increased overall expression of the extracellular matrix and complement activation gene sets with advancing age may contribute to widespread fibrosis and inflammation in the elderly.There is an overall decrease in expression of the chloride transport genes and the electron transport genes in all three tissues.Decreased overall expression of electron transport chain genes with age might support the mitochondrial free-radical theory of ageing [67], as free-radical generation by mitochondria would preferentially damage the electron transport chain protein complexes.Decreased expression of the electron transport genes (encoded in the nucleus) might be caused by feedback regulation from damage to the electron transport chain protein complexes [63].However, it is also possible that increased oxidative damage occurs as a consequence of the decreased expression of the electron transport chain genes.In addition, an increasing number of studies in model organisms have critically challenged the mitochondrial free-radical theory of ageing 
[68].In addition to testing genes known to be associated with age-related diseases and phenotypes for association with longevity, genes known to promote longevity in model organisms have been examined in human populations.Mutations in insulin or insulinlike signalling pathway genes have been shown to extend lifespan in Caenorhabditis elegans [20], Drosophila melanogaster [21,22] and mice [23,24].The insulin-signalling pathway negatively regulates the forkhead (FOXO) transcription factor [25].When insulin or insulin-like growth factor signalling is low, FOXO is activated and lifespan extension occurs [26].An overrepresentation of rare insulin-like growth factor I receptor (IGFIR) mutations has been observed in centenarians [27].These mutations are associated with reduced activity of IGFIR as measured in transformed lymphocytes [27]." + ], + [ + "We briefly comment on rare mutations that shorten life span through the early onset of diseases that are increasingly common during aging in the general population, e.g., familial forms of Alzheimer, breast cancer, coronary artery disease, type II diabetes, etc.The later onset forms of these diseases are associated with causes of death at later ages.A major question is what role the more common allelic variants of these same genes have in \"normal aging\".Although examination of this huge emerging topic goes beyond the present discussion, we may consider the example of Werner's syndrome, a rare autosomal recessive that causes adult onset progeria with a high incidence of cancer and atherosclerosis (70).The absence of Alzheimer-type dementia in Werner's syndrome illustrates the \"segmental\" nature of this and other progerias (70).Thus, heritable shortening of life span should not be considered as a simple acceleration of general aging processes.The Werner's lesion maps to a defective gene encoding a helicase and exonuclease, which also has several polymorphisms.In Japan, 1367Arg was associated with a lower risk of myocardial 
infarction (70), although it was not associated with longevity in Finland (14).In general, we know little of the genetic factors involved in frailty and morbidity at later ages, which are important to the geneenvironment interactions implied in the major longevity increase seen during the twentieth century.", + "Indicative diseases associated with the candidate aging genes", + "Dementia has an age-and sex-standardized prevalence of ~7.1% in Europeans 1 , with Alzheimer's disease (AD) being the most common form of dementia (50-70% of cases) 2 .AD is pathologically characterized by the presence of amyloid-beta plaques and tau neurofibrillary tangles in the brain 3 .Most patients are diagnosed with AD after the age of 65, termed late-onset AD (LOAD), while only 1% of AD cases have an early onset (before the age of 65) 3 .On the basis of twin studies, the heritability of LOAD is estimated to be ~60-80% (refs. 4,5 ), suggesting that a large proportion of individual differences in LOAD risk is driven by genetics.The heritability of LOAD is spread across many genetic variants; however, Zhang et al. 6 suggested that LOAD is more of an oligogenic than a polygenic disorder due to the large effects of APOE variants.Zhang et al. 6 and Holland et al. 
7 predicted there to be ~100-10,000 causal variants contributing to LOAD; however, only a fraction have been identified.Increasing the sample size of genome-wide association studies (GWAS) will improve the statistical power to identify the missing causal variants and may highlight additional disease mechanisms.In combination with increasing the number of samples, it is beneficial to use different approaches to identify rare and private variation to help identify additional causal variants and increase understanding of disease mechanisms; however, we deem this to be out of the scope of the current analysis.", + "Alzheimer's disease (AD) is a devastating neurodegenerative disorder of late life with complex inheritance.Mutations in three known genes lead to the rare early-onset autosomal dominant form of AD, while a common polymorphism (e4) in the gene encoding apolipoprotein E (APOE ) is a risk factor for more typical late-onset (>60 years) AD.A recent study concluded that there are up to four additional genes with an equal or greater contribution to the disease.We performed a 9 cM genome screen of 437 families with AD, the full National Institute of Mental Health (NIMH) sample, which has been carefully ascertained, evaluated and followed by our group over the last decade.Performing standard parametric and non-parametric linkage analyses, we observed a 'highly significant' linkage peak by Lander and Kruglyak criteria on chromosome 19q13, which probably represents APOE.Twelve additional locations-on 1q23, 3p26, 4q32, 5p14, 6p21, 6q27, 9q22, 10q24, 11q25, 14q22, 15q26 and 21q22-met criteria for 'suggestive' linkage [i.e.two-point lod score (TLS) !1.9 and/ or multipoint lod score (MLS) !2.2] in at least one of our analyses.Although some of these will surely prove to be false positives, these linkage signals should provide a valuable framework for future studies aimed at identifying additional susceptibility genes for late-onset AD.Alzheimer's disease (AD) is a 
devastating neurodegenerative disorder of late life with complex inheritance.Mutations in three known genes lead to the rare early-onset autosomal dominant form of AD, while a common polymorphism (e4) in the gene encoding apolipoprotein E (APOE ) is a risk factor for more typical late-onset (>60 years) AD.A recent study concluded that there are up to four additional genes with an equal or greater contribution to the disease.We performed a 9 cM genome screen of 437 families with AD, the full National Institute of Mental Health (NIMH) sample, which has been carefully ascertained, evaluated and followed by our group over the last decade.Performing standard parametric and non-parametric linkage analyses, we observed a 'highly significant' linkage peak by Lander and Kruglyak criteria on chromosome 19q13, which probably represents APOE.Twelve additional locations-on 1q23, 3p26, 4q32, 5p14, 6p21, 6q27, 9q22, 10q24, 11q25, 14q22, 15q26 and 21q22-met criteria for 'suggestive' linkage [i.e.two-point lod score (TLS) !1.9 and/ or multipoint lod score (MLS) !2.2] in at least one of our analyses.Although some of these will surely prove to be false positives, these linkage signals should provide a valuable framework for future studies aimed at identifying additional susceptibility genes for late-onset AD.", + "IntroductionAlzheimer's disease (AD) is a complex disorder and is the most common form of dementia [1].After age, family history is the single greatest risk factor for AD.AD can be classified into early and late onset forms.Mutations in three genes: PSEN1/2 and APP are known to cause early onset AD in an autosomal dominant manner [2,3].The majority of AD cases, however, are late onset (LOAD) and the APOE e4 allele is the strongest known genetic risk factor.Many additional genetic polymorphisms have been identified, though with substantially lower risk estimates [1,4,5,6,7,8,9,10].LOAD appears to be inherited and/or sporadic and there is evidence of a maternal inheritance 
pattern [11].Current estimates suggest that more than 20% of inherited LOAD cases are maternally inherited [12].", + "INTRODUCTIONMany common noninfectious diseases exhibit a more severe clinical presentation in older individuals.These diseases often exhibit complex etiology and can affect different tissues and cell types, with a wide spectrum of clinical outcomes.Prominent aging-associated neurodegenerative diseases are Alzheimer's disease (AD), Parkinson's disease (PD), and age-related macular degeneration (AMD), all of which can severely compromise the quality of life and have serious repercussions on both the individual and society at large.These late-onset diseases generally result from the interplay between multiple genetic susceptibility factors and environmental components.Sequencing of the human genome, cataloging of millions of single nucleotide polymorphisms (SNPs) together with the development of a map of common haplotypes, and technological innovations in genotyping are among the major milestones that are facilitating exploration of the genetic basis of common diseases (1,7,50).In the field of AMD genetics, these advances have led to the identification of several genetic susceptibility factors and enabled us to start dissecting the relationship between environmental risk factors and the genetic constitution of each individual (66,118,148).As a result, new opportunities are emerging for improved understanding of disease pathogenesis that may lead to better management and treatment of AMD.Clinical aspects of AMD are discussed only briefly (for a more in-depth discussion, see Reference 79).Aging-associated neurodegenerative diseases significantly influence the quality of life of affected individuals.Genetic approaches, combined with genomic technology, have provided powerful insights into common late-onset diseases, such as age-related macular degeneration (AMD).Here, we discuss current findings on the genetics of AMD to highlight areas of rapid progress 
and new challenges.We also attempt to integrate available genetic and biochemical data with cellular pathways involved in aging to formulate an integrated model of AMD pathogenesis.Aging-associated neurodegenerative diseases significantly influence the quality of life of affected individuals.Genetic approaches, combined with genomic technology, have provided powerful insights into common late-onset diseases, such as age-related macular degeneration (AMD).Here, we discuss current findings on the genetics of AMD to highlight areas of rapid progress and new challenges.We also attempt to integrate available genetic and biochemical data with cellular pathways involved in aging to formulate an integrated model of AMD pathogenesis.", + "Alzheimer's disease (AD) (MIM: 104300) is a highly heritable disease with great complexity in its genetic contributors, and represents the most common form of dementia.With the gradual aging of the world's population, leading to increased prevalence of AD, and the substantial cost of care for those afflicted, identifying the genetic causes of disease represents a critical effort in identifying therapeutic targets.Here we provide a comprehensive review of genomic studies of AD, from the earliest linkage studies identifying monogenic contributors to early-onset forms of AD to the genome-wide and rare variant association studies of recent years that are being used to characterize the mosaic of genetic contributors to late-onset AD (LOAD), and which have identified approximately $20 genes with common variants contributing to LOAD risk.In addition, we explore studies employing alternative approaches to identify genetic contributors to AD, including studies of AD-related phenotypes and multi-variant association studies such as pathway analyses.Finally, we introduce studies of next-generation sequencing, which have recently helped identify multiple lowfrequency and rare variant contributors to AD, and discuss ongoing efforts with next-generation 
sequencing studies to develop statistically well-powered and comprehensive genomic studies of AD.Through this review, we help uncover the many insights the genetics of AD have provided into the pathways and pathophysiology of AD.Genetics of Alzheimer Disease: Early-Onset ADIn the early to mid-1990s, genetic studies of AD focused on extended families with high burden of disease (two or more cases among first-degree relatives), and used linkage analysis of highly polymorphic genetic markers called short tandem repeats (STRs, or microsattelites) in order to identify genomic regions co-transmitting with disease in affected family members.This strategy, followed by \"fine mapping\"-the positional cloning of candidate genes-was used to identify genes and genetic variants contributing to AD risk.The first three genes known to cause AD were identified among families with multiple early-onset cases (age-at-onset <60 years): APP, encoding amyloid precursor protein [Goate et al., 1991], and PS1 and PS2, encoding presenilins I and II respectively [Levy-Lahad et al., 1995;Rogaev et al., 1995;Sherrington et al., 1995], each transmitting disease-causing variants in the predicted autosomal-dominant fashion.Alzheimer's disease (AD) (MIM: 104300) is a highly heritable disease with great complexity in its genetic contributors, and represents the most common form of dementia.With the gradual aging of the world's population, leading to increased prevalence of AD, and the substantial cost of care for those afflicted, identifying the genetic causes of disease represents a critical effort in identifying therapeutic targets.Here we provide a comprehensive review of genomic studies of AD, from the earliest linkage studies identifying monogenic contributors to early-onset forms of AD to the genome-wide and rare variant association studies of recent years that are being used to characterize the mosaic of genetic contributors to late-onset AD (LOAD), and which have identified approximately $20 
genes with common variants contributing to LOAD risk.In addition, we explore studies employing alternative approaches to identify genetic contributors to AD, including studies of AD-related phenotypes and multi-variant association studies such as pathway analyses.Finally, we introduce studies of next-generation sequencing, which have recently helped identify multiple lowfrequency and rare variant contributors to AD, and discuss ongoing efforts with next-generation sequencing studies to develop statistically well-powered and comprehensive genomic studies of AD.Through this review, we help uncover the many insights the genetics of AD have provided into the pathways and pathophysiology of AD.", + "The genetics of Alzheimer disease (AD) to date support an age-dependent dichotomous model whereby earlier age of disease onset (60 years) is explained by 3 fully penetrant genes (APP [NCBI Entrez gene 351], PSEN1 [NCBI Entrez gene 5663], and PSEN2 [NCBI Entrez gene 5664]), whereas later age of disease onset (65 years) representing most cases of AD has yet to be explained by a purely genetic model.The APOE gene (NCBI Entrez gene 348) is the strongest genetic risk factor for later onset, although it is neither sufficient nor necessary to explain all occurrences of disease.Numerous putative genetic risk alleles and genetic variants have been reported.Although all have relevance to biological mechanisms that may be associated with AD pathogenesis, they await replication in large representative populations.Genome-wide association studies have emerged as an increasingly effective tool for identifying genetic contributions to complex diseases and represent the next frontier for furthering our understanding of the underlying etiologic, biological, and pathologic mechanisms associated with chronic complex disorders.There have already been success stories for diseases such as macular degeneration and diabetes mellitus.Whether this will hold true for a genetically complex and 
heterogeneous disease such as AD is not known, although early reports are encouraging.This review considers recent publications from studies that have successfully applied genome-wide association methods to investigations of AD by taking advantage of the currently available high-throughput arrays, bioinformatics, and software advances.The inherent strengths, limitations, and challenges associated with study design issues in the context of AD are presented herein.Arch Neurol.2008;65(3): 329-334 Alzheimer disease (AD) is the most common cause of dementia and the most prevalent neurodegenerative disorder associated with aging. 1 Alzheimer disease is a heterogeneous disorder with a complex etiology owing to genetic and environmental influences as causal or risk modifiers.The neuropathologic hallmarks of disease are extracellular amyloid plaques and intracellular neurofibrillary tangles of hyperphosphorylated tau protein. 2 Only 10% of AD cases occurring before 60 years of age (early-onset AD) are due to rare, fully penetrant (autosomal dominant) mutations in 3 genes: A precursor protein (APP) on chromosome 21, 3 presenilin 1 (PSEN1) on chromosome 14, 4 and presenilin 2 (PSEN2) on chromosome 1. 5,6In contrast, most cases of AD are later in onset ( 65 years of age) (late-onset AD), are nonfamilial, and are likely the result of highly prevalent genetic variants with low penetrance. 7To date, the only genetic risk factor for lateonset AD remains the apolipoprotein E gene (APOE), specifically the 4 allele, which is moderately penetrant, accounting for up to 50% of cases. 
8owever, a robust literature reports numerous putative genetic risk alleles and promising genetic variants.Recent reports from individual studies reveal significant associations with the sortilin-related receptor (SORL1 [NCBI Entrez gene 6653]) 9,10 and glycine-rich protein 2-associated binding protein 2 (GAB2 [NCBI Entrez gene 9846]) 11 on chromosome 11; death-associated protein kinase 1 (DAPK1 [NCBI Entrez gene 1612]), 12 ubiquilin 1 (UBQLN1 [NCBI Entrez gene 299798]), 13 and adenosine triphosphate-binding cassette transporter 1, subfamily A (ABCA1 [NCBI Entrez gene 19]), on chromosome 9 14 ; and low-density lipoprotein receptor-related protein 6 (LRP6 [NCBI Entrez gene 4040]) on chromosome 12. 15 All of these putative variants still lack replication in large representative populations but have relevance to neuropathologic mechanisms and pathways that may be associated with AD pathogenesis ( A large meta-analysis from the AlzGene database 16 17 All are associated with relevant biological mechanisms and pathways but await replication to further elucidate their utility as significant markers for AD.", + "BackgroundAlzheimer's disease (AD) is the most common neurodegenerative disorder and the leading cause of dementia in the elderly [1].Diagnosis of AD is based on the presence of neurofibrillary tangles and amyloid plaques [2], and symptoms typically include memory loss and impaired cognitive ability.Although the pathological hallmarks associated with dementia-related symptoms in AD appear largely similar between both the early-onset and late-onset forms of the disease, their underlying etiologies contrast [3].Whereas early-onset AD is a familial autosomal dominant disorder caused by rare, highly penetrant mutations in one of a small set of genes (APP, PSEN1, and PSEN2), the more common late-onset form of the disease (accounting for 90-95 % of cases) occurs sporadically, and risk is determined by complex underlying mechanisms [3][4][5][6].Estimates based on twin 
concordance rates suggest heritability of late-onset AD is as high as 70 %, implicating major roles for genetic as well as non-genetic factors [6].Indeed, through candidate gene studies, as well as more recent genome-wide association studies (GWASs) and whole-exome sequencing, both common and rare variants associated with the late-onset form of AD have been identified [7][8][9][10][11].Collectively, however, common GWAS variants account for only a modest proportion (~30 %) of the underlying variance in disease susceptibility [12].Several environmental factors are also thought to play a role [5,6], yet exactly how these contribute to risk, onset, and progression remains poorly defined.", + "Alzheimer's disease is the most common type of dementia, and it is characterized by a decline in memory or other thinking skills.The greatest risk factor for Alzheimer's disease is advanced age.A recent genome-wide study identified a locus on chromosome 17 associated with the age at onset, and a specific variant in CCL11 is probably responsible for the association.The association of a protective haplotype with a 10-year delay in the onset of Alzheimer's disease and the identification of a CCL11 variant with possible functional roles in this association might allow the future development of immunomodulators with the potential to halve disease incidence.Alzheimer's disease is the most common type of dementia, and it is characterized by a decline in memory or other thinking skills.The greatest risk factor for Alzheimer's disease is advanced age.A recent genome-wide study identified a locus on chromosome 17 associated with the age at onset, and a specific variant in CCL11 is probably responsible for the association.The association of a protective haplotype with a 10-year delay in the onset of Alzheimer's disease and the identification of a CCL11 variant with possible functional roles in this association might allow the future development of immunomodulators with the potential to 
halve disease incidence.", + "INTRODUCTIONAlzheimer's disease (AD) is a common debilitating disorder with a prevalence that rises steeply with age from below 1% at 65 years to as high as 40% after the age of 90 [Bachman et al., 1992].Genes are known to play a role in the development of AD.Twin studies show heritabilities of around 60% [Bergem et al., 1997;Gatz et al., 1997].Indeed, variation in four genes has already been shown to cause rare forms of early-onset AD [the Amyloid Precursor Protein Gene (APP); Goate et al., 1991; Presenilin 1 (PS1); Sherrington et al., 1995; Presenilin 2 (PS2); Levy Lahad et al., 1995, Rogaev et al., 1995] or increase the general risk of disease development [Apolipoprotein E (APOE), Corder et al., 1993].As well as increasing disease susceptibility, APOE e4 alleles are associated with reduced age at onset (AAO) and appear to show their strongest effect below 70 years [Farrer et al., 1997].There is also evidence from both twin [Pedersen et al., 2001] and family studies [Tunstall et al., 2000;Li et al., 2002] that AAO in AD is heritable.Daw et al. [2000] have estimated that in addition to APOE, there are at least four loci with similar effect sizes, which contribute to AAO in AD.", + "IntroductionAlzheimer's disease (AD), a devastating neurodegenerative disease, is the most common form of dementia among the elderly.Genetically, AD is a complex and multifactorial disease with the possible involvement of multiple genes.The rare early-onset form of the disease usually follows an autosomal-dominant inheritance pattern and to date three genes have been identified: amyloid precursor protein (APP) and presenilin 1 and 2 (PSEN1 and PSEN2).The common late-onset form of the disease is much more complex than the early-onset form and until recently the apolipoprotein E (APOE) gene was the only major genetic factor accounting for 20-29% of the risk for late-onset AD. 
1,2 Recent large genome-wide association studies (GWAS) have identi-fied nine additional genes for late-onset AD, including CR1, BIN1, CLU (a.k.a.4][5][6][7] There is high heritability for AD risk (up to 80%), 8 but the total risk attributable to all confirmed loci is about 50%, indicating the presence of additional risk genes for late-onset AD.", + "NE OF EVERY 5 PERSONS AGED 65 years is predicted to develop Alzheimer disease (AD) in their lifetime, and genetic variants may play an important part in the development of the disease. 1 The apparent substantial heritability of late-onset AD 2 is inadequately explained by genetic variation within the well-replicated genes (apolipoprotein E [APOE; RefSeq NG_007084], presenilin-1 [PSEN1; RefSeq NG_007386], presenilin-2 [PSEN2; RefSeq NG_007381], and amyloid beta precursor protein [APP;" + ], + [ + "We briefly comment on rare mutations that shorten life span through the early onset of diseases that are increasingly common during aging in the general population, e.g., familial forms of Alzheimer, breast cancer, coronary artery disease, type II diabetes, etc.The later onset forms of these diseases are associated with causes of death at later ages.A major question is what role the more common allelic variants of these same genes have in \"normal aging\".Although examination of this huge emerging topic goes beyond the present discussion, we may consider the example of Werner's syndrome, a rare autosomal recessive that causes adult onset progeria with a high incidence of cancer and atherosclerosis (70).The absence of Alzheimer-type dementia in Werner's syndrome illustrates the \"segmental\" nature of this and other progerias (70).Thus, heritable shortening of life span should not be considered as a simple acceleration of general aging processes.The Werner's lesion maps to a defective gene encoding a helicase and exonuclease, which also has several polymorphisms.In Japan, 1367Arg was associated with a lower risk of myocardial 
infarction (70), although it was not associated with longevity in Finland (14).In general, we know little of the genetic factors involved in frailty and morbidity at later ages, which are important to the geneenvironment interactions implied in the major longevity increase seen during the twentieth century.", + "Indicative diseases associated with the candidate aging genes", + "Dementia has an age-and sex-standardized prevalence of ~7.1% in Europeans 1 , with Alzheimer's disease (AD) being the most common form of dementia (50-70% of cases) 2 .AD is pathologically characterized by the presence of amyloid-beta plaques and tau neurofibrillary tangles in the brain 3 .Most patients are diagnosed with AD after the age of 65, termed late-onset AD (LOAD), while only 1% of AD cases have an early onset (before the age of 65) 3 .On the basis of twin studies, the heritability of LOAD is estimated to be ~60-80% (refs. 4,5 ), suggesting that a large proportion of individual differences in LOAD risk is driven by genetics.The heritability of LOAD is spread across many genetic variants; however, Zhang et al. 6 suggested that LOAD is more of an oligogenic than a polygenic disorder due to the large effects of APOE variants.Zhang et al. 6 and Holland et al. 
7 predicted there to be ~100-10,000 causal variants contributing to LOAD; however, only a fraction have been identified.Increasing the sample size of genome-wide association studies (GWAS) will improve the statistical power to identify the missing causal variants and may highlight additional disease mechanisms.In combination with increasing the number of samples, it is beneficial to use different approaches to identify rare and private variation to help identify additional causal variants and increase understanding of disease mechanisms; however, we deem this to be out of the scope of the current analysis.Dementia has an age-and sex-standardized prevalence of ~7.1% in Europeans 1 , with Alzheimer's disease (AD) being the most common form of dementia (50-70% of cases) 2 .AD is pathologically characterized by the presence of amyloid-beta plaques and tau neurofibrillary tangles in the brain 3 .Most patients are diagnosed with AD after the age of 65, termed late-onset AD (LOAD), while only 1% of AD cases have an early onset (before the age of 65) 3 .On the basis of twin studies, the heritability of LOAD is estimated to be ~60-80% (refs. 4,5 ), suggesting that a large proportion of individual differences in LOAD risk is driven by genetics.The heritability of LOAD is spread across many genetic variants; however, Zhang et al. 6 suggested that LOAD is more of an oligogenic than a polygenic disorder due to the large effects of APOE variants.Zhang et al. 6 and Holland et al. 
7 predicted there to be ~100-10,000 causal variants contributing to LOAD; however, only a fraction have been identified.Increasing the sample size of genome-wide association studies (GWAS) will improve the statistical power to identify the missing causal variants and may highlight additional disease mechanisms.In combination with increasing the number of samples, it is beneficial to use different approaches to identify rare and private variation to help identify additional causal variants and increase understanding of disease mechanisms; however, we deem this to be out of the scope of the current analysis.The largest previous GWAS of LOAD, identified 29 risk loci from 71,880 (46,613 proxy) cases and 383,378 (318,246 proxy) controls 8 .Our current study expands this to include 90,338 (46,613 proxy) cases and 1,036,225 (318,246 proxy) controls.The recruitment of LOAD cases can be difficult due to the late age of onset, so proxy cases can allow for the inclusion of younger individuals by estimating their risk of LOAD using parental status.Proxy cases and controls were defined on the basis of known parental LOAD status weighted by parental age (Supplementary Information).In the current study, we identified 38 loci, including seven loci that have not been reported previously.Functional follow-up analyses implicated tissues, cell types and genes of interest through tissue and cell type enrichment, colocalization and statistical fine-mapping.This study highlights microglia, immune cells and protein catabolism as relevant to LOAD, while identifying previously unidentified genes of potential interest. 
ResultsGenome-wide inferences.We performed meta-analysis on data from 13 cohorts, totaling 1,126,563 individuals (Supplementary", + "Alzheimer's disease (AD) is a devastating neurodegenerative disorder of late life with complex inheritance.Mutations in three known genes lead to the rare early-onset autosomal dominant form of AD, while a common polymorphism (e4) in the gene encoding apolipoprotein E (APOE ) is a risk factor for more typical late-onset (>60 years) AD.A recent study concluded that there are up to four additional genes with an equal or greater contribution to the disease.We performed a 9 cM genome screen of 437 families with AD, the full National Institute of Mental Health (NIMH) sample, which has been carefully ascertained, evaluated and followed by our group over the last decade.Performing standard parametric and non-parametric linkage analyses, we observed a 'highly significant' linkage peak by Lander and Kruglyak criteria on chromosome 19q13, which probably represents APOE.Twelve additional locations-on 1q23, 3p26, 4q32, 5p14, 6p21, 6q27, 9q22, 10q24, 11q25, 14q22, 15q26 and 21q22-met criteria for 'suggestive' linkage [i.e.two-point lod score (TLS) !1.9 and/ or multipoint lod score (MLS) !2.2] in at least one of our analyses.Although some of these will surely prove to be false positives, these linkage signals should provide a valuable framework for future studies aimed at identifying additional susceptibility genes for late-onset AD.", + "IntroductionAlzheimer's disease (AD) is a complex disorder and is the most common form of dementia [1].After age, family history is the single greatest risk factor for AD.AD can be classified into early and late onset forms.Mutations in three genes: PSEN1/2 and APP are known to cause early onset AD in an autosomal dominant manner [2,3].The majority of AD cases, however, are late onset (LOAD) and the APOE e4 allele is the strongest known genetic risk factor.Many additional genetic polymorphisms have been 
identified, though with substantially lower risk estimates [1,4,5,6,7,8,9,10]. LOAD appears to be inherited and/or sporadic and there is evidence of a maternal inheritance pattern [11]. Current estimates suggest that more than 20% of inherited LOAD cases are maternally inherited [12].", + "INTRODUCTION: Many common noninfectious diseases exhibit a more severe clinical presentation in older individuals. These diseases often exhibit complex etiology and can affect different tissues and cell types, with a wide spectrum of clinical outcomes. Prominent aging-associated neurodegenerative diseases are Alzheimer's disease (AD), Parkinson's disease (PD), and age-related macular degeneration (AMD), all of which can severely compromise the quality of life and have serious repercussions on both the individual and society at large. These late-onset diseases generally result from the interplay between multiple genetic susceptibility factors and environmental components. Sequencing of the human genome, cataloging of millions of single nucleotide polymorphisms (SNPs) together with the development of a map of common haplotypes, and technological innovations in genotyping are among the major milestones that are facilitating exploration of the genetic basis of common diseases (1,7,50). In the field of AMD genetics, these advances have led to the identification of several genetic susceptibility factors and enabled us to start dissecting the relationship between environmental risk factors and the genetic constitution of each individual (66,118,148). As a result, new opportunities are emerging for improved understanding of disease pathogenesis that may lead to better management and treatment of AMD. Clinical aspects of AMD are discussed only briefly (for a more in-depth discussion, see Reference 79). Aging-associated neurodegenerative diseases significantly influence the quality of life of affected individuals. Genetic approaches, combined with genomic technology, have provided powerful insights 
into common late-onset diseases, such as age-related macular degeneration (AMD).Here, we discuss current findings on the genetics of AMD to highlight areas of rapid progress and new challenges.We also attempt to integrate available genetic and biochemical data with cellular pathways involved in aging to formulate an integrated model of AMD pathogenesis.Aging-associated neurodegenerative diseases significantly influence the quality of life of affected individuals.Genetic approaches, combined with genomic technology, have provided powerful insights into common late-onset diseases, such as age-related macular degeneration (AMD).Here, we discuss current findings on the genetics of AMD to highlight areas of rapid progress and new challenges.We also attempt to integrate available genetic and biochemical data with cellular pathways involved in aging to formulate an integrated model of AMD pathogenesis.", + "Alzheimer's disease (AD) (MIM: 104300) is a highly heritable disease with great complexity in its genetic contributors, and represents the most common form of dementia.With the gradual aging of the world's population, leading to increased prevalence of AD, and the substantial cost of care for those afflicted, identifying the genetic causes of disease represents a critical effort in identifying therapeutic targets.Here we provide a comprehensive review of genomic studies of AD, from the earliest linkage studies identifying monogenic contributors to early-onset forms of AD to the genome-wide and rare variant association studies of recent years that are being used to characterize the mosaic of genetic contributors to late-onset AD (LOAD), and which have identified approximately $20 genes with common variants contributing to LOAD risk.In addition, we explore studies employing alternative approaches to identify genetic contributors to AD, including studies of AD-related phenotypes and multi-variant association studies such as pathway analyses.Finally, we introduce studies 
of next-generation sequencing, which have recently helped identify multiple lowfrequency and rare variant contributors to AD, and discuss ongoing efforts with next-generation sequencing studies to develop statistically well-powered and comprehensive genomic studies of AD.Through this review, we help uncover the many insights the genetics of AD have provided into the pathways and pathophysiology of AD.Alzheimer's disease (AD) (MIM: 104300) is a highly heritable disease with great complexity in its genetic contributors, and represents the most common form of dementia.With the gradual aging of the world's population, leading to increased prevalence of AD, and the substantial cost of care for those afflicted, identifying the genetic causes of disease represents a critical effort in identifying therapeutic targets.Here we provide a comprehensive review of genomic studies of AD, from the earliest linkage studies identifying monogenic contributors to early-onset forms of AD to the genome-wide and rare variant association studies of recent years that are being used to characterize the mosaic of genetic contributors to late-onset AD (LOAD), and which have identified approximately $20 genes with common variants contributing to LOAD risk.In addition, we explore studies employing alternative approaches to identify genetic contributors to AD, including studies of AD-related phenotypes and multi-variant association studies such as pathway analyses.Finally, we introduce studies of next-generation sequencing, which have recently helped identify multiple lowfrequency and rare variant contributors to AD, and discuss ongoing efforts with next-generation sequencing studies to develop statistically well-powered and comprehensive genomic studies of AD.Through this review, we help uncover the many insights the genetics of AD have provided into the pathways and pathophysiology of AD.Genetics of Alzheimer Disease: Early-Onset ADIn the early to mid-1990s, genetic studies of AD focused on 
extended families with high burden of disease (two or more cases among first-degree relatives), and used linkage analysis of highly polymorphic genetic markers called short tandem repeats (STRs, or microsattelites) in order to identify genomic regions co-transmitting with disease in affected family members.This strategy, followed by \"fine mapping\"-the positional cloning of candidate genes-was used to identify genes and genetic variants contributing to AD risk.The first three genes known to cause AD were identified among families with multiple early-onset cases (age-at-onset <60 years): APP, encoding amyloid precursor protein [Goate et al., 1991], and PS1 and PS2, encoding presenilins I and II respectively [Levy-Lahad et al., 1995;Rogaev et al., 1995;Sherrington et al., 1995], each transmitting disease-causing variants in the predicted autosomal-dominant fashion.", + "Arch Neurol.2008;65(3): 329-334 Alzheimer disease (AD) is the most common cause of dementia and the most prevalent neurodegenerative disorder associated with aging. 1 Alzheimer disease is a heterogeneous disorder with a complex etiology owing to genetic and environmental influences as causal or risk modifiers.The neuropathologic hallmarks of disease are extracellular amyloid plaques and intracellular neurofibrillary tangles of hyperphosphorylated tau protein. 2 Only 10% of AD cases occurring before 60 years of age (early-onset AD) are due to rare, fully penetrant (autosomal dominant) mutations in 3 genes: A precursor protein (APP) on chromosome 21, 3 presenilin 1 (PSEN1) on chromosome 14, 4 and presenilin 2 (PSEN2) on chromosome 1. 5,6In contrast, most cases of AD are later in onset ( 65 years of age) (late-onset AD), are nonfamilial, and are likely the result of highly prevalent genetic variants with low penetrance. 
7To date, the only genetic risk factor for lateonset AD remains the apolipoprotein E gene (APOE), specifically the 4 allele, which is moderately penetrant, accounting for up to 50% of cases. 8owever, a robust literature reports numerous putative genetic risk alleles and promising genetic variants.Recent reports from individual studies reveal significant associations with the sortilin-related receptor (SORL1 [NCBI Entrez gene 6653]) 9,10 and glycine-rich protein 2-associated binding protein 2 (GAB2 [NCBI Entrez gene 9846]) 11 on chromosome 11; death-associated protein kinase 1 (DAPK1 [NCBI Entrez gene 1612]), 12 ubiquilin 1 (UBQLN1 [NCBI Entrez gene 299798]), 13 and adenosine triphosphate-binding cassette transporter 1, subfamily A (ABCA1 [NCBI Entrez gene 19]), on chromosome 9 14 ; and low-density lipoprotein receptor-related protein 6 (LRP6 [NCBI Entrez gene 4040]) on chromosome 12. 15 All of these putative variants still lack replication in large representative populations but have relevance to neuropathologic mechanisms and pathways that may be associated with AD pathogenesis ( A large meta-analysis from the AlzGene database 16 17 All are associated with relevant biological mechanisms and pathways but await replication to further elucidate their utility as significant markers for AD.The genetics of Alzheimer disease (AD) to date support an age-dependent dichotomous model whereby earlier age of disease onset (60 years) is explained by 3 fully penetrant genes (APP [NCBI Entrez gene 351], PSEN1 [NCBI Entrez gene 5663], and PSEN2 [NCBI Entrez gene 5664]), whereas later age of disease onset (65 years) representing most cases of AD has yet to be explained by a purely genetic model.The APOE gene (NCBI Entrez gene 348) is the strongest genetic risk factor for later onset, although it is neither sufficient nor necessary to explain all occurrences of disease.Numerous putative genetic risk alleles and genetic variants have been reported.Although all have relevance to 
biological mechanisms that may be associated with AD pathogenesis, they await replication in large representative populations.Genome-wide association studies have emerged as an increasingly effective tool for identifying genetic contributions to complex diseases and represent the next frontier for furthering our understanding of the underlying etiologic, biological, and pathologic mechanisms associated with chronic complex disorders.There have already been success stories for diseases such as macular degeneration and diabetes mellitus.Whether this will hold true for a genetically complex and heterogeneous disease such as AD is not known, although early reports are encouraging.This review considers recent publications from studies that have successfully applied genome-wide association methods to investigations of AD by taking advantage of the currently available high-throughput arrays, bioinformatics, and software advances.The inherent strengths, limitations, and challenges associated with study design issues in the context of AD are presented herein.", + "BackgroundAlzheimer's disease (AD) is the most common neurodegenerative disorder and the leading cause of dementia in the elderly [1].Diagnosis of AD is based on the presence of neurofibrillary tangles and amyloid plaques [2], and symptoms typically include memory loss and impaired cognitive ability.Although the pathological hallmarks associated with dementia-related symptoms in AD appear largely similar between both the early-onset and late-onset forms of the disease, their underlying etiologies contrast [3].Whereas early-onset AD is a familial autosomal dominant disorder caused by rare, highly penetrant mutations in one of a small set of genes (APP, PSEN1, and PSEN2), the more common late-onset form of the disease (accounting for 90-95 % of cases) occurs sporadically, and risk is determined by complex underlying mechanisms [3][4][5][6].Estimates based on twin concordance rates suggest heritability of late-onset 
AD is as high as 70 %, implicating major roles for genetic as well as non-genetic factors [6].Indeed, through candidate gene studies, as well as more recent genome-wide association studies (GWASs) and whole-exome sequencing, both common and rare variants associated with the late-onset form of AD have been identified [7][8][9][10][11].Collectively, however, common GWAS variants account for only a modest proportion (~30 %) of the underlying variance in disease susceptibility [12].Several environmental factors are also thought to play a role [5,6], yet exactly how these contribute to risk, onset, and progression remains poorly defined.", + "Alzheimer's disease is the most common type of dementia, and it is characterized by a decline in memory or other thinking skills.The greatest risk factor for Alzheimer's disease is advanced age.A recent genome-wide study identified a locus on chromosome 17 associated with the age at onset, and a specific variant in CCL11 is probably responsible for the association.The association of a protective haplotype with a 10-year delay in the onset of Alzheimer's disease and the identification of a CCL11 variant with possible functional roles in this association might allow the future development of immunomodulators with the potential to halve disease incidence.Alzheimer's disease is the most common type of dementia, and it is characterized by a decline in memory or other thinking skills.The greatest risk factor for Alzheimer's disease is advanced age.A recent genome-wide study identified a locus on chromosome 17 associated with the age at onset, and a specific variant in CCL11 is probably responsible for the association.The association of a protective haplotype with a 10-year delay in the onset of Alzheimer's disease and the identification of a CCL11 variant with possible functional roles in this association might allow the future development of immunomodulators with the potential to halve disease incidence.", + "INTRODUCTIONAlzheimer's 
disease (AD) is a common debilitating disorder with a prevalence that rises steeply with age from below 1% at 65 years to as high as 40% after the age of 90 [Bachman et al., 1992]. Genes are known to play a role in the development of AD. Twin studies show heritabilities of around 60% [Bergem et al., 1997; Gatz et al., 1997]. Indeed, variation in four genes has already been shown to cause rare forms of early-onset AD [the Amyloid Precursor Protein Gene (APP); Goate et al., 1991; Presenilin 1 (PS1); Sherrington et al., 1995; Presenilin 2 (PS2); Levy-Lahad et al., 1995; Rogaev et al., 1995] or increase the general risk of disease development [Apolipoprotein E (APOE), Corder et al., 1993]. As well as increasing disease susceptibility, APOE e4 alleles are associated with reduced age at onset (AAO) and appear to show their strongest effect below 70 years [Farrer et al., 1997]. There is also evidence from both twin [Pedersen et al., 2001] and family studies [Tunstall et al., 2000; Li et al., 2002] that AAO in AD is heritable. Daw et al. [2000] have estimated that in addition to APOE, there are at least four loci with similar effect sizes, which contribute to AAO in AD.", + "Introduction: Alzheimer's disease (AD), a devastating neurodegenerative disease, is the most common form of dementia among the elderly. Genetically, AD is a complex and multifactorial disease with the possible involvement of multiple genes. The rare early-onset form of the disease usually follows an autosomal-dominant inheritance pattern and to date three genes have been identified: amyloid precursor protein (APP) and presenilin 1 and 2 (PSEN1 and PSEN2). The common late-onset form of the disease is much more complex than the early-onset form and until recently the apolipoprotein E (APOE) gene was the only major genetic factor accounting for 20-29% of the risk for late-onset AD. 
1,2 Recent large genome-wide association studies (GWAS) have identified nine additional genes for late-onset AD, including CR1, BIN1, CLU (a.k.a.4][5][6][7] There is high heritability for AD risk (up to 80%), 8 but the total risk attributable to all confirmed loci is about 50%, indicating the presence of additional risk genes for late-onset AD.", + "ONE OF EVERY 5 PERSONS AGED 65 years is predicted to develop Alzheimer disease (AD) in their lifetime, and genetic variants may play an important part in the development of the disease. 1 The apparent substantial heritability of late-onset AD 2 is inadequately explained by genetic variation within the well-replicated genes (apolipoprotein E [APOE; RefSeq NG_007084], presenilin-1 [PSEN1; RefSeq NG_007386], presenilin-2 [PSEN2; RefSeq NG_007381], and amyloid beta precursor protein [APP;" + ], + [ + "Genomic analysis of longevity offers the potential to illuminate the biology of human aging. Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA). We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity. Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated. We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD. Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan. Genomic analysis of longevity offers the potential to illuminate the biology of human aging. Here, using genome-wide 
association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated.We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD.Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.", + "Recent developments on the genetics of aging can be seen as several streams of effort.In general, humans show a relatively modest (<50%) heritability of life spans (results obtained from twin studies discussed below).The apoE polymorphisms are remarkable for their influence on both cardiovascular disease and Alzheimer disease.In contrast, rare mutant genes with high penetrance cause these same diseases but with early onset and a major shortening of the life span.Shortlived laboratory models (fruit flies, nematodes, mice) are yielding rapid advances, with the discovery of mutants that increase life spans in association with altered metabolism, which leads to questions on the physiological organization of aging processes.Although these early findings do not show that a conserved genetic program actually controls aging processes across animal phylogeny, it is striking how frequently findings of metabolic rate, insulin signaling, and free radicals have emerged from very different approaches to aging in nematodes and mammals, for example.These findings hint that the 
genetic control of life span was already developed in the common ancestor of modern animals so that subsequent evolution of life spans was mediated by quantitative changes in the control of metabolism through insulin and the production of free radicals.", + "Background: Genetic research on longevity has provided important insights into the mechanism of aging and aging-related diseases.Pinpointing import genetic variants associated with aging could provide insights for aging research.Methods: We performed a whole-genome sequencing in 19 centenarians to establish the genetic basis of human longevity.Results: Using SKAT analysis, we found 41 significantly correlated genes in centenarians as compared to control genomes.Pathway enrichment analysis of these genes showed that immune-related pathways were enriched, suggesting that immune pathways might be critically involved in aging.HLA typing was next performed based on the whole-genome sequencing data obtained.We discovered that several HLA subtypes were significantly overrepresented.Conclusions: Our study indicated a new mechanism of longevity, suggesting potential genetic variants for further study.Background: Genetic research on longevity has provided important insights into the mechanism of aging and aging-related diseases.Pinpointing import genetic variants associated with aging could provide insights for aging research.", + "INTRODUCTIONHuman aging is affected by genes, life style, and environmental factors.The genetic contribution to average human aging can be modest with genes explaining 20-25% of the variability of human survival to the mid-eighties (Herskind et al., 1996;Fraser and Shavlik, 2001).By contrast, genetic factors may have greater impact on survival to the ninth through eleventh decades (Tan et al., 2008).Notably, exceptional longevity is rare and may involve biological mechanisms that differ from those implicated in usual human aging.", + "Somatic mutations with the inherited gene variations of 
each individual cumulatively or synergistically influence the health span and life span [11].Very few genetic variants have been associated with human longevity, but those found include the transcription factor FOXO3 gene, the APOE/TOMM40 and the CDKN2B/ ANRIL loci, which are associated with Alzheimer's disease and cellular senescence [12][13][14].In fact, the heritability for human longevity has been estimated to be approximately 20-30%, according to studies of twins, suggesting that external factors such as diet, environment, physical activity and microbiomes are important factors that influence the health span [14][15][16].The increase in the rate of retrotranscription reflects genome deregulation, creating additional mutations, DNA damage, and other forms of genome instability.For instance, the expression of several families of retrotransposable elements increases with age, as observed in mouse skeletal muscle and human fibroblasts, particularly the long interspersed nuclear element-1 (L1 LINE) [17,18].Influence of Genetic Factors in Ageing and LifespanAgeing is defined as the decline of physiological functions in several tissues and organs inducing an increasing probability of death [17].The understanding of genetic factors involved in ageing has been limited due to the complexity of this process and the heterogeneity among individuals and even among tissues [18][19][20].Tissue cells adopt a senescent phenotype as a consequence of multiple intrinsic, extrinsic, and stochastic factors [21].The combination of these genetic factors is related to longevity and healthy ageing [22].Although this decline is somewhat predictable, some individuals show a much slower decline and get to live past the age of 100.Studies in these individuals showed polymorphisms in some genes which are associated with long life, such as APOE and FOXO3.However, these associations have not been consistent across different populations, suggesting that ageing is rather polygenic [23].", + "On 
the other hand, the same evolutionary-motivated strategy suggesting to focus on more heterogeneous phenotypes (as opposite to more homogenous) can be highly beneficial for unraveling genetic predisposition to fundamental mechanisms of intrinsic biological aging and, consequently, to geriatric diseases.Indeed, aging is associated with systemic remodeling of an organism's functioning which increases chances of virtually all geriatric disorders (Franco et al. 2009;Franceschi et al. 2000;Martin et al. 2007;Cutler and Mattson 2006).Experiments with laboratory animals (Johnson 2006) and heritability estimates in humans (Christensen et al. 2006;Iachine et al. 1998) show that aging can be genetically regulated (Finch and Tanzi 1997;Martin et al. 2007;Vaupel 2010).Accordingly, yielding insights in genetic predisposition to aging-related processes in an organism could be a major breakthrough in preventing and/or ameliorating not one geriatric trait, but perhaps a major subset of such traits (Martin et al. 
2007) that can greatly advance progress in solving the problem of extending healthy lifespan in humans.", + "MOST genetic studies involved with aging have focused on identifying genes contributing to particular diseases. More recently, it has been recognized that it is also valuable to examine genetic factors related to disease-free or healthy aging (1,2). Utilizing twins from the National Academy of Sciences-National Research Council (NAS-NRC) twin panel, we have demonstrated that healthy physical aging is under a significant degree of genetic influence, with a heritability over 50% (3). Our definition of healthy aging focused principally on freedom from cardiovascular disease, and has received considerable support in the more recent literature. Brand and colleagues (4) reported that parental age at death was a significant predictor of coronary heart disease death in the Framingham offspring study and concluded that familial similarities for age at death may be mediated through shared coronary heart disease risk factors. Frederiksen and colleagues (5) reported that increased parental life was associated with a reduction in odds ratio for their children to have diabetes, ischemic heart disease, heart failure, stroke, and hypertension. We have found that better midlife lipid levels and blood pressures were associated with increased parental longevity in the National Heart, Lung, and Blood Institute twin study (6). Centenarian siblings and offspring, besides having increased longevity, have been shown to have better health and better cardiovascular risk factor profiles (7-10).", + "Introduction: The recent, remarkable extension of life expectancy is largely attributed to the postponement of mortality at old age (Vaupel, 1997, 2010). The years of life gained in the older population residing in developed nations are a success story of public health measures and improved health care. In addition to such external factors, longevity and healthy aging consistently 
show a modest heritability between 20% and 50% and aging-associated genetic research may provide further insights into the mechanisms of aging (Herskind et al., 1996;McGue et al., 1993;Reed and Dick, 2003).It has been postulated that genes involved in pathways associated with aging identified in animal models, such as insulin-like growth factor (IGF)-insulin signaling, regulation of lipoprotein metabolism, the mTOR pathway, and the oxidative stress response may also influence survival to old or even exceptionally old age in humans (Christensen et al., 2006;Kenyon, 2010;Vellai et al., 2003).However, in humans, common variants within genes involved in these pathways have not been consistently associated with lifespan (Chris-tensen et al., 2006;Kenyon, 2010;Kuningas et al., 2008;Vijg and Suh, 2005).", + "TranslationalA LTHOUGH there is much debate about the processes driving human aging, there is little doubt that genetic influences play a significant role (1).Humans clearly live very much longer than the currently favored laboratory models of aging, and such interspecies differences in reproductively 'fit' life span must have an inherited genetic foundation.Within human populations, environmental and behavioral exposures are important but at least a quarter of life expectancy variation in twin or family studies is attributable to inherited genetic or epigenetic factors (2).Age-related conditions such as type 2 diabetes, myocardial infarction, common cancers, and Alzheimer's disease (AD) typically have onsets after the fourth decade of life; \"successful\" agers delay these onsets until relatively late in life (3).Many aging traits and diseases show moderate heritability, including cardiovascular disease (CVD) (4) and impaired physical functioning (5), independent of known environmental risk factors.", + "Many factors contribute to aging, including genes.This is the first article in a 10-part series that highlight some of what is known about the influence of genes on 
aging and emerging treatment options that may slow down or potentially reverse the aging process.The series will address \\genes, adducts, and telomeres, decreased immune defenses, oxidation and inefficient mitochondria, toxins and radiation, glycosylation, caloric intake and sirtuin production, neurotransmitter imbalance, hormone mechanisms, reduced nitric oxide, and stem cell slowdown.Underpinning these factors are wear and tear on cells and aging as a result of inability to repair or replace these affected cells.These topics have been addressed in research, health magazines, and even by talk show hosts.There is even a LongevityMap website addressing significant and nonsignificant genetic association studies in aging across the human genome (http://genomics.senescence.info/longevity/).The series will address a scientific and clinical approach to genome-related aging topics.", + "The genetic basis of human longevity has so far been primarily investigated by association studies.Most results from these experiments have been difficult to confirm in independent samples, probably owing to the modest heritability, multifactorial nature, and heterogeneity of the phenotype (Christensen et al., 2006).To date, variation in only two genes has been identified, which has an effect on longevity in various populations: (i) the apolipoprotein E gene (APOE) (Scha chter et al., 1994;Christensen et al., 2006) and (ii) the forkhead box O3A (FOXO3A) gene in the insulin-IGF1 signaling (IIS) pathway (Willcox et al., 2008;Flachsbart et al., 2009).Given the apparent lack of susceptibility candidates, it is conceivable that other genetic factors influence the function or expression of genes relevant for human longevity.IntroductionHuman longevity is influenced by multiple genetic and environmental factors.Approximately 25-32% of the overall variation in adult lifespan is because of genetic variation that becomes particularly important for survival at advanced age (Hjelmborg et al., 
2006). Epidemiological studies have revealed that long-lived individuals (LLI), that is, people surviving to the 95th percentile of the respective birth cohort-specific age distributions (Gudmundsson et al., 2000), frequently show a favorable ('healthy') course of the aging process, with the absence or a delayed onset of age-related diseases (Hitt et al., 1999). Hence, the LLI offer the key to elucidate the molecular mechanisms underlying the 'healthy aging' phenotype (Perls, 2006).", + "Introduction. Approximately 25-30% of the variation in adult lifespan is attributable to genetic factors that become more important with increasing age and exert their strongest effects in nonagenarians and centenarians (Gögele et al., 2010; Hjelmborg et al., 2006). As yet, however, only a few genetic variants have been found consistently to influence longevity. The first to be discovered was the ε4 allele of the apolipoprotein E (APOE) gene, a mortality factor that predisposes to both Alzheimer's and cardiovascular diseases (Corder et al., 1993; Panza et al., 2004). APOE ε4 is the only variant with a reportedly large adverse effect upon survival at advanced age (Schächter et al., 1994), and this association has been replicated in several populations (Christensen et al., 2006). Variation in the human forkhead box O3A gene (FOXO3A), in contrast, has been found to be associated with the ability to live long, an effect corroborated by studies in Japanese, German, Italian, US-American, Jewish, Chinese and Danish populations (Anselmi et al., 2009; Flachsbart et al., 2009; Li et al., 2009; Pawlikowska et al., 2009; Soerensen et al., 2010; Willcox et al., 2008). More recently, we have identified exonuclease 1 (EXO1) as a potential novel longevity gene (Nebel et al., 2009). All three genes were detected through candidate-gene approaches.", + "GenAge: the aging gene database. Philosophy and overview of resources. It is undisputed that genetic factors influence aging. In a remarkable series of recent
breakthroughs, a number of genes capable of altering the aging process as a whole -or at least to a large degree -have been identified in animal models and even a few in humans (Finch & Ruvkun, 2001;de Magalhes, 2005;Kenyon, 2005).Furthermore, multiple alleles have been examined for their association with human exceptional longevity (Vijg & Suh, 2005).This is a fascinating and important area of research, yet there are now so many genes being associated with aging and longevity that keeping track of them all is becoming increasingly more difficult.Moreover, it is necessary now to study not only individual genes but their interactions with each other and with the environment, and how together genes give rise to a given phenotype: the so-called systems biology approach.To help researchers address these issues we created GenAge, a database of genes related to longevity and/or aging.", + "I NCREASES in longevity of the general population world- wide are an unprecedented phenomenon with significant health and social impact.Although environmental factors have led to an increase in life span, there is ample evidence that genetic factors are involved in extreme longevity both in humans (1-7) and in other organisms (8).The protective genetic factors that lead to longevity are likely to involve fundamental processes of aging that may be different from those associated with early mortality or premature onset of age-related diseases in younger individuals.The mechanisms of aging in humans are far from understood, but available evidence suggests that several pathways-inflammation, oxidative stress and stress responses, cellular senescence, DNA damage and repair, and the growth hormone or insulinlike growth factor and insulin (GH, IGF, INS) axis-may play key roles (9)(10)(11)(12).Model organisms suggest that inhibiting the GH, IGF, or INS axis, which is involved in regulating cell proliferation, cell death, wound repair, and metabolism, may promote longevity by reducing oxidative 
stress and slowing the rate of cell replication and the accumulation of somatic-cell DNA mutations (13).There is also evidence for other important pathways such as the heatshock proteins and heat-shock factors that are highly conserved across species and play a role in prolongevity transcription pathways.Clinical and epidemiological investigations, including candidate gene studies, have suggested that inflammation pathways may affect life span and risk of age-related conditions such as cardiovascular disease (CVD) and its risk factors (14)(15)(16)(17)(18)(19).A combination of multiple genetic variants may be required for an individual to achieve exceptional longevity, which may account in part for its rarity.", + "Studies of mono-and dizygous twins have revealed that the genetic contribution to the variation in human lifespan is about 25-30% [12,13], and is most prominent in families clustered for longevity [14,15].This genetic contribution is mainly apparent after the age of 60 years and seems to increase with age [13,16].Furthermore, human lifespan is a complex trait which is assumed to be determined by many genes with small individual effects [17], although the polygenic architecture still needs to be characterized [18,19].The diverse health features of long-lived families illustrate that different age-related diseases have common determinants and implicate that pathways can be identified that attenuate aging and delay age-related disease.From a genomic perspective, individuals from long-lived families are assumed to be characterized by a decreased prevalence of disease-promoting variants (referred to as disease-susceptibility alleles) and an increased prevalence of variants conferring maintenance of health and protection from disease, when compared to population controls.In the last 5 years, many diseasesusceptibility alleles have been identified (National Human Genome Research Institute (NHGRI) genome-wide association study (GWAS) Catalog; 
http://www.genome.gov/gwastudies/)[20].A first comparison between long-lived individuals, selected from both long-lived families (LLS) and the general population (Leiden 85-plus study), and young controls showed no difference in the distribution or frequency of disease-susceptibility alleles identified in cancer, coronary artery disease and type 2 diabetes [21].The search for lifespan regulating loci -contributing to longevity and population mortality -must therefore extend beyond a focus on disease-susceptibility alleles.We will first discuss the efforts to identify longevity loci by genetics approaches.Conclusions and prospectsOver the past two decades the human aging field has built up the necessary resources to study the biology of aging and longevity by establishing human populations with a diversity of designs.Meta-analyses integrating genetic and phenotypic datasets have successfully identified variants associated with a range of age-related traits and diseases.Despite these accomplishments, the number of novel leads contributing to human lifespan regulation is limited.Although positive regions of linkage and suggestive GWAS hits have been reported, the field has not yet identified the loci that explain the clustering of longevity in families and the variation in biological aging rate in the population.As for animal models, down-signaling of the IIS and mTOR pathway appeared to be relevant in humans.These findings are being followed up by molecular and physiological profiling using skin, fat and muscle tissue of long-lived family members and controls.Human studies now also include the response of nutrient sensing systems to the application of dietary and physical challenges." + ] + ], + "task_id": [ + "FE094A900BA5B3C48A3A67B18B2F12BD", + "8DCEF606839664C8B6C72CF1D181CEEA", + "96B873A13E3B386E38940EF6ECA90D4A", + "F4DA6C97230E89C0226B1433532A16D9", + "B2F5CB7BCD9A827D3A6E0152C030C4B4" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_3.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_3.json new file mode 100644 index 0000000..3247692 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_aging_3.json @@ -0,0 +1,64 @@ +{ + "question": [ + "what genes are associated with aging?", + "Which genes are associated with aging in human ", + "What is GeneNetwork and how does it relate to aging research?" + ], + "answer": [ + "Several genes are associated with aging. These include NAP1L4, which is involved in chromatin structure and increases with age in skin tissue. Other genes include GAB2, linked to late-onset Alzheimer's disease, and QKI, linked to coronary heart disease and successful aging. Genes such as Lamp2, Fas, and Ghr also show significant co-expression with aging. Other genes involved in aging include those in the IGF-1 and vitamin D pathways, estrogen metabolism pathway genes, and SIR2 genes. Genes like APOE, LDLR, CDKN2B, and RBM38 influence lifespan in model organisms. Genes involved in DNA damage response, antioxidant properties, and protein misfolding also show age-related changes. The gene Cd63 is highly connected in aging-associated gene sets. In muscle aging, genes involved in proteasomal and mitochondrial functions show altered expression. The insulin/insulin-like growth factor 1 (IGF1) signaling pathway also modulates aging.", + "The genes associated with aging in humans are APOE, FOXO3A, and to some extent, AKT1.", + "GeneNetwork is a collaborative web-based resource equipped with tools and features for studying gene/gene interactions and exploring genetic correlates to neurobehavioral phenotypes. It houses gene expression and phenotypic data from various species and brain regions, and offers correlation and mapping strategies for assessing associations among multiple genes and QTLs. 
In the context of aging research, GeneNetwork can be used to analyze large gene expression data sets, model causal networks linking DNA differences to traits, and identify genes common to cellular senescence and functional cognitive decline. It can also help in identifying potential druggable targets for investigation in longevity." + ], + "contexts": [ + [ + "Following are examples of the identified genes and experimental or GWAS link between these genes and aging.On the list of the 25 top genes, NAP1L4 encodes a member of the nucleosome assembly protein (NAP) family, which interacts with both core and linker histones, and shuttles between the cytoplasm and nucleus, suggesting a role as histone chaperone.Histone protein levels decline during aging, and dramatically affect chromatin structure.Remarkably, the lifespan can be extended by manipulations that reverse the age-dependent changes to chromatin structure, indicating the pivotal role of chromatin structure in aging [32].In another example, gene expression of NAP1L4 increases with age in the skin tissue [33].Findings of GWAS link a number of the identified genes to age-related disorders, such as GAB2 and late onset Alzheimer's disease [86], and QKI and coronary heart disease/myocardial infarction [79].Interestingly, GWAS reports also link QKI to successful aging [87].Indicative biological pathways associated with the candidate aging genes", + "Examples of biological candidate genes with pleiotropic functions, which are involved in aging in general and in musculoskeletal aging in particular, are numerous: (a) in addition to the IGF-1 and vitamin D genes, estrogen metabolism pathway genes, including estrogen receptors and aromatase (CYP19), are associated with fat-free mass (Walsh et al. 2005) and BMD (Shearman et al. 2004), prostate and breast cancer (Gallicchio et al. 2006), and cardiovascular disease risk (Shearman et al. 
2003).", + "In-depth analysis of the age-regulated genes revealed that multiple genes in the DNA damage response pathway were upregulated with age including those that function in non-homologous end-joining repair (mre11, rad50, Ku80 and mus308) and in translesion DNA synthesis (mus205 and DNApol-eta) [44][45][46]. Genes that encoded enzymes with antioxidant properties, such as the thioredoxin reductase Trxr-1, and antioxidant genes involved in glutamate metabolism, such as GlnRS, isoQC and QC, were also upregulated with age [47][48][49][50]. We also observed increased age-associated expression of chaperone genes (Cct1, Cct4, Cct5, Cct6, Hsc70-4) and the unfolded protein response transcription factor Xbp1, consistent with an induction of the unfolded protein response [51][52][53]. Under stress conditions, there is a translational switch that favors production of stress-related proteins while decreasing translation of other proteins [54]. Paralogs of canonical translation factors such as NAT1 and Rack1, which were both upregulated, promote this switch to cap-independent translation [55,56]. Notably, Rheb, which is downregulated with age, positively regulates ribosome production and cap-dependent translation by activating the mechanistic target of rapamycin (mTOR) kinase pathway [57]. Thus, decreased Rheb levels during aging could decrease mTOR pathway activity, which extends lifespan and is protective against age-related pathology [58]. Together, these data suggest that multiple genes are induced in aging photoreceptors to mitigate the effects of oxidative stress, protein misfolding and DNA damage.", + "Gene expression modules regulated by aging. Nearest-neighbor co-expression modules ranging in size from 2 to 40 genes were formed and the collective response of each module to aging across tissues was evaluated.
).The most significant 3-gene module included two proteasome subunit genes (Psmb8 and Psmb9), along with the MHC antigen H2-K1 (M = 10.0;P < 0.001; see Table 3).The three genes contained in this module exhibited highly correspondent patterns of differential expression, with decreased expression occurring in spleen with age, and an age-related up regulation of expression across 13 tissues (Additional File 11).A similar pattern was present with respect to other 3-gene co-expression modules, such as {Tyrobp, Mpeg1, Ctss} and {Sfi1, Pisd, 4933439C20Rik}, and with significant co-expression modules of larger size (Additional File 11).In each of these cases, genes belonging to the same module exhibited similar differential expression patterns in the same tissues, indicating that patterns of co-expression had considerable explanatory power in terms of age-related transcriptional effects.", + "Analysis of prior research (Online Resource 5) shows that the revealed genes can be explicitly involved in other key biological processes in an organism whose role is known to be changing with aging.Specifically, ten genes (BAZ2B, HMGB4, NOC2L, RAI1, SIK1, SMARCA2, SPZ1, TBP, TRIP13, and ZKSCAN1) regulate transcription which is believed to be disrupted when an organism is getting older (Roy et al. 2002).The DBH, TPO, and LSS genes are involved in synthesis of catecholamine, thyroid, and vitamin D hormones, respectively.The GPER binds estrogen and HCRTR2 binds orexin-A and orexin-B neuropeptid hormones.Hormonal deregulation with aging is considered to be one of the major components of senescent processes in an organism (Barzilai and Gabriely 2010).Five genes (ATG2A, NEDD4L, PSMB1, UBXN4, and USP6) are involved in degradation of proteins through ubiquitin-proteasome and the lysosomal/autophagic system.Dysfunction of this system leads to accumulation of damaged proteins in an organism that is associated with aging (Koga et al. 
2011).Protein degradation through ubiquitin-mediated proteolysis plays an important role in cell-cycle regulation (Reed 2003).The PSMB1, SIK1, TRIP13, and TTN genes in the revealed set coordinate cell cycle.Cell cycle is linked with the aging-related processes in humans through a gradual increase in cell division errors in all tissues in an organism (Ly et al. 2000).Five genes (EEF1A2, DBH, ITGB2, TUBB2C, and WRN) take part in regulation of apoptosis which plays an important role in the aging process and tumorigenesis (Salvioli et al. 2008).Seven genes (ABCA7, AZGP1, CD36, DEGS2, LSS, PI4KA, and SOAT2) are involved in lipid metabolism which plays one of the key roles in human longevity and healthy aging (Barzilai et al. 2003).", + "In addition to testing genes known to be associated with age-related diseases and phenotypes for association with longevity, genes known to promote longevity in model organisms have been examined in human populations.Mutations in insulin or insulinlike signalling pathway genes have been shown to extend lifespan in Caenorhabditis elegans [20], Drosophila melanogaster [21,22] and mice [23,24].The insulin-signalling pathway negatively regulates the forkhead (FOXO) transcription factor [25].When insulin or insulin-like growth factor signalling is low, FOXO is activated and lifespan extension occurs [26].An overrepresentation of rare insulin-like growth factor I receptor (IGFIR) mutations has been observed in centenarians [27].These mutations are associated with reduced activity of IGFIR as measured in transformed lymphocytes [27].", + "Aging can be viewed as a lethal by-product of activities, such as reproduction and food intake, that are controlled by genes [1].Since most of these genes are evolutionarily conserved, distant species may share common pathways of aging [2].The insulin/insulin-like growth factor 1 (IGF1) signaling pathway could be one such common pathway, as it modulates aging in many species, including Caenorhabditis elegans, 
Drosophila, mice [3], and possibly humans [4]. An elegant study carried out in C. elegans by applying microarray techniques showed that a member of the SIR2-like protein family is regulated downstream of DAF-16, a FOXO-family transcription factor that affects the rate of aging in response to the insulin/IGF1 pathway [5]. SIR2 proteins constitute an evolutionarily conserved family of NAD-dependent deacetylases called sirtuins [6][7][8]. In model organisms the expression levels of SIR2 modulate life span [9][10][11]. Since sirtuins are NAD+-dependent, these proteins through different routes may link energy metabolism, genome maintenance, and aging [11,12]. Thus SIR2 genes may play a crucial role in conserved pathways of aging and longevity.", + "Cross-species translation of age-related processes. To identify convergent evidence across species for genes involved in aging, we integrated data from a total of 73 aging-associated gene sets (S4 Table), derived from 31 publications across 6 species (yeast, worm, fly, rat, mouse, human), and from three web resources (GeneNetwork, GenAge [38], and GWAS Catalog (https://www.ebi.ac.uk/gwas/)). Using the \"GeneSet Graph tool\" in GeneWeaver, we identified Cd63 as the most highly connected gene (i.e. it was present in the largest number of sets of genes) (Fig 3). Cd63 was present in 12 gene sets from seven publications across four species (fly, rat, mouse, and human; Table 3). The probability of finding at least one gene in a 12-way intersection, given the observed set sizes and species, is p < 0.0005 (permutations n = 2000). To validate Cd63 as an aging gene, we knocked down the C. elegans ortholog, tsp-7, by feeding RNAi and observed a 10.5% extension of mean lifespan (19.0 ± 4.0, n = 312 for empty vector(RNAi) vs.
21.0 ± 6.5 days, n = 317 for tsp-7(RNAi) at 25 °C; p = 4.8e-7 by the log-rank test) (Fig 4, S5 Table). Manipulating tsp-7 is thus sufficient to influence lifespan in at least one environmental context.", + "Genes Whose Expression Decreased with Age. Of the 26 genes that decreased expression with age in control mice, 23% are involved in DNA replication and the cell cycle (Table 2). Most of these have a negative effect on cell growth and division. Among these, the product of phosphatase and tensin homolog (Pten) gene is a tumor suppressor that induces cell-cycle arrest through inhibition of the phosphoinositide 3-kinase pathway (28). B cell translocation gene 2 (Btg2) is a tumor suppressor that increases expression in response to DNA damage (29). The murine gene product of the amino-terminal enhancer of split (Aes) is a potent corepressor of gene expression and cellular proliferation (30). Calcium-binding protein A11 (S100a10) binds to and regulates the activity of annexin II, which is involved in the transduction of calcium-related mitogenic signals (31). Insulin-like growth factor (IGF) binding protein 1 (Igfbp1) plays an important role in the negative regulation of the IGF-1 system, a stimulator of mitogenesis (32).", + "daf-16 dependent genes. Among the 52 genes that we have tested, 29 genes act almost completely in a daf-16 dependent manner, to regulate lifespan (Table 2). One of the genes identified was daf-2 (Y55D5A_391.b). This serves as a proof of principle that our screen is effective in identification of aging genes.", + "Signatures of aging in muscle. For the muscular system, six clusters of age-related genes with significant enrichment of functional annotation were identified (Fig.
2B; Supplemental Table 9).Aging in muscle was associated with an increase of transcript levels of genes (Clusters 1, 2, and 3) involved in a number of biological processes, including antimicrobial humoral response, ubiquitin-dependent protein catabolism, autophagic cell death, prosthetic group metabolism, protein membrane targeting, secretion pathway, transmembrane receptor protein tyrosine kinase signaling pathway, cell motility, and response to toxin as represented by glutathione S transferase.On the other hand, aging in muscle was found to be associated with decreased transcript levels of genes (Clusters 4-6) involved in generation of energy derived by oxidation of organic compounds as represented by succinate dehydrogenase B (SdhB), in oxidative phosphorylation as represented by ATPase coupling factor 6, in protein kinase cascade as represented by Jun-related antigen, and in metal ion transport as represented by ferritin 1 heavy chain homolog and I'm not dead yet (Indy).It has been shown that SdhB, ATP synthase, ferritin, and aconitase in C. elegans (Hamilton et al. 2005;Hansen et al. 2005) and Indy and SdhB in D. melanogaster (Rogina et al. 2000;Walker et al. 
2006) modulate lifespan in these organisms, respectively.Overall, these findings suggest that a prominent feature of aging in muscle is the alteration of expression of genes involved in proteasomal and mitochondrial functions.", + "Several of the genes we identify have previously been shown to influence lifespan in experiments on model organisms.For example, knockouts of the orthologs of APOE, LDLR, CDKN2B, and RBM38 in mice shortens their lifespan [24][25][26][27] , while knockout of IGF1R has the opposite effect 28 .Similarly, overexpression of the FOXO3 orthologue in Drosophila melanogaster 29 and the SNCA orthologue in Caenorhabditis elegans 30 have shown to extend their respective lifespans.Many of our genes are also enriched for pathways previously related to ageing in eukaryotic model organisms, including genomic stability, cellular senescence, and nutrient sensing 31 .For example, FOXO3 and IGF1R are well-known players modulating survival in response to dietary restriction 32 , but we also highlight genes involved in the response to DNA damage and apoptosis, such as CDKN2B, USP28, E2F2, and BCL3.In addition to hallmarks discovered in model organisms, our results suggest that haem metabolism may play a role in human ageing.This pathway includes genes involved in processing haem and differentiation of erythroblasts 33 .Although the enrichment is largely driven by genes linked to the LDLR locus, genes linked to other loci of interest (such as FOXO3, CDKN2B, LINC02513) are involved in similar biological pathways: myeloid differentiation, erythrocyte homeostasis, and chemical homeostasis.To determine the age-related expression of the identified cisand trans-acting genes, we performed a look-up in the dataset of Peters et al. 
14 .This large dataset contains the associations of genes with age in whole blood, so we limited ourselves to the cis-and trans-acting genes identified in the whole-blood datasets.We found that FOXO3 expression is increased with age in this dataset, which is in line with the life-extending variant decreasing expression (Supplementary Data 6).Moreover, one cis-(ILF3) and two trans-acting genes (E2F2 and PDZK1IP1) in the LDLR locus show a similar effect (i.e.increased or decreased expression with age combined with the life-extending variant decreasing or increasing expression, respectively).The most interesting, however, seems to be the LINC02513 locus, which showed multiple trans-acting genes to be strongly downregulated with age, while the lead life-extending variant increases expression.LEF1, CCR7, and ABLIM1 even belong to the most significantly affected genes in the whole transcriptomic dataset.This indicates that this long intergenic non-protein coding RNA may serve as a master regulator of age-related transcription in whole blood.", + "94DE MAGALHES ET AL. 
lar signatures of mammalian aging.Some of the genes overexpressed with age seem to be a response to aging, in that they have been previously found to have protective functions (de Magalha es et al., 2009b).As such, these genes may help organisms manage aging and could be targets for manipulation.Likewise, gene expression analysis of CR has been conducted to identify associated genes (Lee et al., 1999(Lee et al., , 2000)).A number of molecular signatures have emerged from such studies that could be useful to identify candidate processes and pathways that affect aging, biomarkers (see below), and candidate regulators (Anderson and Weindruch, 2010;Hong et al., 2010).", + "Aging-related gene prediction and putative transcriptional mechanismsGeneFriends was used to identify genes related to aging.A seed list of genes known to be consistently overexpressed with age in mammals was used [18].In total, 1119 genes were co-expressed with the aging seed list at p <10 -6 ; Table 1 shows the top 25 genes.Many of these genes have been associated with age-related diseases.Several other genes that have been shown to play a role in aging such as lysosomal-associated membrane protein-2 Lamp2 [19] (p = 5.68 -30 ), Fas [20] (p = 2.70 -31 ) and growth hormone receptor Ghr [21] (p = 1.34 -19 ) also showed a significant co-expression.Anxa2, Anxa3 and Anxa4 also show a low p-value (p < 10 -25 ) as well as several S100 calcium binding proteins which have been shown to interact with annexins [22].Top 25genes co-expressed with aging related genes", + "Fig. 7 Functional relationships of genes implicated in longevity.The genes in red/blue boxes represent genes with increased/decreased mRNA expression in ageing Drosophila (color figure online)", + "The genome-wide RNAi study conducted by the Ruvkun lab, authored by Hamilton et al. 
[88], identified a total of 89 additional aging genes with disparate functions including cell structure, cell surface proteins, cell signaling, cellular metabolism, and protein turnover.Of the 66 genes with previously known functions, 17 corresponded to various aspects of carbon metabolism, including citric acid cycle enzymes and subunits of complexes I, IV, and V of the ETC.Researchers also speculated that protein translation might play a role in lifespan regulation, based on the identification of iff-1 (T05G5.10),a gene that has homology to the translation initiation factor eIF5A.Other hits from this screen included two genes containing PH domains known to interact with phosphatidylinositol lipids, multiple G protein-coupled receptors, protein processing and degradation genes such as proteases and ubiquitin ligases/hydrolases, and chromatin modifying factors.", + "Genetic studies have shown that aging can be slowed in mutants that are defective in a wide range of cellular processes (such as mitochondrial function, chromatin regulation, insulin signaling, transcriptional regulation, and genome stability).This indicates that aging is a complex process driven by diverse molecular pathways and biochemical events.As such, a powerful approach to study aging is to use systems biology, which allows a multitude of factors affecting aging to be analyzed in parallel.For example, DNA microarrays and gene expression chips have been used to perform a genome-wide analysis of changes in gene expres-sion in old age.Extensive studies in Caenorhabditis elegans and Drosophila melanogaster have identified hundreds of ageregulated genes (Hill et al. 2000;Zou et al. 2000;Lund et al. 2002;Pletcher et al. 2002;Murphy et al. 2003).Several studies have described age-regulated genes in the muscle and brain of mice (Lee et al. 1999(Lee et al. , 2000) ) and the retina and muscle of humans (Yoshida et al. 2002;Welle et al. 2003Welle et al. 
, 2004).These age-regulated genes may serve as markers of aging, enabling one to assess physiological age independently of chronological age.Analysis of the functions of these age-regulated genes has identified specific biochemical mechanisms that change toward the end of life." + ], + [ + "Genomic analysis of longevity offers the potential to illuminate the biology of human aging.Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated.We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD.Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.Genomic analysis of longevity offers the potential to illuminate the biology of human aging.Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat 
are most negatively correlated. We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD. Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.", + "Background: Genetic research on longevity has provided important insights into the mechanism of aging and aging-related diseases. Pinpointing import genetic variants associated with aging could provide insights for aging research. Methods: We performed a whole-genome sequencing in 19 centenarians to establish the genetic basis of human longevity. Results: Using SKAT analysis, we found 41 significantly correlated genes in centenarians as compared to control genomes. Pathway enrichment analysis of these genes showed that immune-related pathways were enriched, suggesting that immune pathways might be critically involved in aging. HLA typing was next performed based on the whole-genome sequencing data obtained. We discovered that several HLA subtypes were significantly overrepresented. Conclusions: Our study indicated a new mechanism of longevity, suggesting potential genetic variants for further study.", + "Genetic linkage studies of long-lived human families identified a longevity locus while candidate gene approaches have been used to identify and confirm the association between specific variants in the FOXO3A gene and human longevity [37]. Genome-wide association studies have also been used to identify the association of APOE with life span and have yielded insights into potential biological pathways and processes related to aging. 
Despite these successes, several problems are inherent in human longevity studies including potentially high degrees of environmental heterogeneity, genetic diversity, and lack of birth matched controls, among others [8].", + "Additional association studies with these families and replication of these results with an independent data set should facilitate the positional cloning of a gene that influences the ability to age well and achieve exceptional longevity. Identification of the genes in humans that allow certain individuals to live to extreme old age should lead to insights on cellular pathways that are important to the aging process.", + "Somatic mutations with the inherited gene variations of each individual cumulatively or synergistically influence the health span and life span [11]. Very few genetic variants have been associated with human longevity, but those found include the transcription factor FOXO3 gene, the APOE/TOMM40 and the CDKN2B/ANRIL loci, which are associated with Alzheimer's disease and cellular senescence [12][13][14]. In fact, the heritability for human longevity has been estimated to be approximately 20-30%, according to studies of twins, suggesting that external factors such as diet, environment, physical activity and microbiomes are important factors that influence the health span [14][15][16]. The increase in the rate of retrotranscription reflects genome deregulation, creating additional mutations, DNA damage, and other forms of genome instability. For instance, the expression of several families of retrotransposable elements increases with age, as observed in mouse skeletal muscle and human fibroblasts, particularly the long interspersed nuclear element-1 (L1 LINE) [17,18].", + "Ageing in humans is typified by the decline of physiological functions in various organs and tissues leading to an increased probability of death. Some individuals delay, escape or survive much of this age-related decline and live past age 100. Studies comparing 
centenarians to average-aged individuals have found polymorphisms in genes that are associated with long life, including APOE and FOXOA3, which have been replicated many times.However, the associations found in humans account for small percentages of the variance in lifespan and many other gene associations have not been replicated in additional populations.Therefore, ageing is probably a highly polygenic trait.In humans, it is important to also consider differences in age-related decline that occur within and among tissues.Longitudinal data of age-related traits can be used in association studies to test for polymorphisms that predict how an individual will change over time.Transcriptional and genetic association studies of different tissues have revealed common and unique pathways involved in human ageing.Genomic convergence is a method that combines multiple types of functional genomic information such as transcriptional profiling, expression quantitative trait mapping and gene association.The genomic convergence approach has been used to implicate the gene MMP20 in human kidney ageing.New human genetics technologies are continually in development and may lead to additional breakthroughs in human ageing in the near future.Ageing in humans is typified by the decline of physiological functions in various organs and tissues leading to an increased probability of death.Some individuals delay, escape or survive much of this age-related decline and live past age 100.Studies comparing centenarians to average-aged individuals have found polymorphisms in genes that are associated with long life, including APOE and FOXOA3, which have been replicated many times.However, the associations found in humans account for small percentages of the variance in lifespan and many other gene associations have not been replicated in additional populations.Therefore, ageing is probably a highly polygenic trait.In humans, it is important to also consider differences in age-related decline 
that occur within and among tissues.Longitudinal data of age-related traits can be used in association studies to test for polymorphisms that predict how an individual will change over time.Transcriptional and genetic association studies of different tissues have revealed common and unique pathways involved in human ageing.Genomic convergence is a method that combines multiple types of functional genomic information such as transcriptional profiling, expression quantitative trait mapping and gene association.The genomic convergence approach has been used to implicate the gene MMP20 in human kidney ageing.New human genetics technologies are continually in development and may lead to additional breakthroughs in human ageing in the near future.The only two genes associated with human longevity that have been replicated in multiple populations are FOXO3A and APOE [11,12,15,26,28 -31].The effect sizes of these two genes for longevity are small with odds ratios of 1.26 and 1.45 for survival to age 100 in replicate studies for FOXO3A and APOE, respectively [10,29].These genes account for only a small portion of the genetic contribution to longevity measured through family heritability studies [4,5].Therefore, much of the heritability of lifespan remains to be explained.", + "In most experimentally modified animal model systems, single-gene mutations in many different genes have major life extension effects (Fontana et al., 2010;Kenyon, 2010).However, natural human and animal longevity is presumed to be a complex trait (Finch & Tanzi, 1997).In humans, both candidate gene and genome-wide genetic association approaches have been applied in an attempt to identify longevity loci.The frequency of genetic variants has been typically compared between nonagenarian cases and young controls, revealing loci at which genetic variants may contribute to a higher or lower probability of survival into old age.The initial candidate gene studies aimed at finding human longevity genes were 
dominated by contradictory results (Christensen et al., 2006).The more consistent evidence obtained by repeated observation in independent cohort studies for association with longevity has so far only been observed for three loci, the apolipoprotein E (APOE) locus (Schachter et al., 1994;Christensen et al., 2006), the FOXO3A locus (Willcox et al., 2008;Flachsbart et al., 2009;Pawlikowska et al., 2009;Soerensen et al., 2010), and the AKT1 locus (Pawlikowska et al., 2009).Thus, despite the expectation that longevity would be influenced by many genetic variants with small effect sizes, the effect of variants has consistently been shown in only three genes.", + "The lack of success in the identification of genes related to aging in humans may be due to the complexity of the phenotype.One approach to investigate aging and longevity is to compare frequencies of genetic variants between nonagenarians or centenarians and the general population.This approach led to the discovery of an association between APOE (Deelen et al., 2011;Ewbank, 2007;Gerdes et al., 2000) and more recently FOXO3A (Anselmi et al., 2009;Flachsbart et al., 2009;Li et al., 2009a;Pawlikowska et al., 2009;Willcox et al., 2008) and human aging and longevity.However, a recent genome-wide association study (GWAS) of individuals reaching the age of 90 or older failed to identify genome-wide significant variants (Newman et al., 2010).Human longevity and healthy aging show moderate heritability (20%-50%).We conducted a meta-analysis of genome-wide association studies from 9 studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium for 2 outcomes: (1) all-cause mortality, and (2) survival free of major disease or death.No single nucleotide polymorphism (SNP) was a genome-wide significant predictor of either outcome (p 5 10 8 ).We found 14 independent SNPs that predicted risk of death, and 8 SNPs that predicted event-free survival (p 10 5 ).These SNPs are in or near genes that are 
highly expressed in the brain (HECW2, HIP1, BIN2, GRIA1), genes involved in neural development and function (KCNQ4, LMO4, GRIA1, NETO1) and autophagy (ATG4C), and genes that are associated with risk of various diseases including cancer and Alzheimer's disease. In addition to considerable overlap between the traits, pathway and network analysis corroborated these findings. These findings indicate that variation in genes involved in neurological processes may be an important factor in regulating aging free of major disease and achieving longevity.", + "In addition to aging- and CR-related genes, another source of candidate genes and pathways for drug design are human longevity-associated genes (Barzilai and Shuldiner, 2001; Browner et al., 2004; Kenyon, 2010). Dozens of genes have now been associated with human longevity (de Magalhães et al., 2009a), although 
only a handful of genes have been shown to have consistent effects across populations.", + "The genetic basis of human longevity has so far been primarily investigated by association studies. Most results from these experiments have been difficult to confirm in independent samples, probably owing to the modest heritability, multifactorial nature, and heterogeneity of the phenotype (Christensen et al., 2006). To date, variation in only two genes has been identified, which has an effect on longevity in various populations: (i) the apolipoprotein E gene (APOE) (Schächter et al., 1994; Christensen et al., 2006) and (ii) the forkhead box O3A (FOXO3A) gene in the insulin-IGF1 signaling (IIS) pathway (Willcox et al., 2008; Flachsbart et al., 2009). Given the apparent lack of susceptibility candidates, it is conceivable that other genetic factors influence the function or expression of genes relevant for human longevity.", + "Although the models data set comprises all genes (to our knowledge) shown by the time of the latest update to statistically increase longevity or alter the aging process in a noticeable way, in the human data set we try to evaluate whether a given intervention is affecting the aging process itself or not. For example, many mutations may increase longevity by decreasing the incidence of specific diseases, rather than by altering the basic process of aging (de Magalhães et al., 2005a, 2005b). Therefore, the human data set is not merely an extension of the work conducted in model organisms and of its bibliography, but a manually selected list of the most pertinent human aging candidate genes, each presented with a higher annotation level. We cite studies on whether the functions of aging-associated genes in model organisms are conserved in their human orthologues. Likewise, we cite flaws in previous studies based on new published observations, although we have a neutral stance on conflicting findings from different research groups. Our 
policy is to cite all conflicting reports and let visitors make their own decisions on how to interpret them.By contrast, each entry in GenAge model organisms has only one reference: the first publication reporting an association of the gene with longevity or aging.Moreover, one of the latest enhancements in the human data set was the inclusion of Gene Ontology annotation.Gene Ontology terms and annotation files were obtained from the Gene Ontology Consortium website (http://www.geneontology.org/ ) and provide an additional layer of description for the gene products in a cellular context (Ashburner et al ., 2000).", + "Ageing in humans is typified by the decline of physiological functions in various organs and tissues leading to an increased probability of death.Some individuals delay, escape or survive much of this age-related decline and live past age 100.Studies comparing centenarians to average-aged individuals have found polymorphisms in genes that are associated with long life, including APOE and FOXOA3, which have been replicated many times.However, the associations found in humans account for small percentages of the variance in lifespan and many other gene associations have not been replicated in additional populations.Therefore, ageing is probably a highly polygenic trait.In humans, it is important to also consider differences in age-related decline that occur within and among tissues.Longitudinal data of age-related traits can be used in association studies to test for polymorphisms that predict how an individual will change over time.Transcriptional and genetic association studies of different tissues have revealed common and unique pathways involved in human ageing.Genomic convergence is a method that combines multiple types of functional genomic information such as transcriptional profiling, expression quantitative trait mapping and gene association.The genomic convergence approach has been used to implicate the gene MMP20 in human kidney ageing.New 
human genetics technologies are continually in development and may lead to additional breakthroughs in human ageing in the near future.Ageing in humans is typified by the decline of physiological functions in various organs and tissues leading to an increased probability of death.Some individuals delay, escape or survive much of this age-related decline and live past age 100.Studies comparing centenarians to average-aged individuals have found polymorphisms in genes that are associated with long life, including APOE and FOXOA3, which have been replicated many times.However, the associations found in humans account for small percentages of the variance in lifespan and many other gene associations have not been replicated in additional populations.Therefore, ageing is probably a highly polygenic trait.In humans, it is important to also consider differences in age-related decline that occur within and among tissues.Longitudinal data of age-related traits can be used in association studies to test for polymorphisms that predict how an individual will change over time.Transcriptional and genetic association studies of different tissues have revealed common and unique pathways involved in human ageing.Genomic convergence is a method that combines multiple types of functional genomic information such as transcriptional profiling, expression quantitative trait mapping and gene association.The genomic convergence approach has been used to implicate the gene MMP20 in human kidney ageing.New human genetics technologies are continually in development and may lead to additional breakthroughs in human ageing in the near future.The only two genes associated with human longevity that have been replicated in multiple populations are FOXO3A and APOE [11,12,15,26,28 -31].The effect sizes of these two genes for longevity are small with odds ratios of 1.26 and 1.45 for survival to age 100 in replicate studies for FOXO3A and APOE, respectively [10,29].These genes account for only a 
small portion of the genetic contribution to longevity measured through family heritability studies [4,5].Therefore, much of the heritability of lifespan remains to be explained.", + "Most of the human candidate gene studies were performed in cross-sectional designs (Box 1 and Fig. 1), comparing allele frequencies of potential longevity loci between highly aged individuals and young controls.The candidate gene studies based on single genes have pointed a role for genes involved in, e.g., GH/insulin/IGF-1 signaling, immune regulation, and lipoprotein metabolism (Supporting Information Table S1), although most of these results have not (yet) been confirmed in sufficient independent studies.The most convincing human longevity loci today are APOE and FOXO3A which have frequently been associated with longevity in cross-sectional studies (see for a review [26]) and survival in prospective studies [27][28][29] (Fig. 3).APOE encodes the protein apolipoprotein E which seems to play a role in e.g., lipoprotein metabolism, cognitive function, and immune regulation [30].FOXO3A encodes the protein forkhead box O3 which acts as a transcription factor for many different genes involved in processes like apoptosis and oxidative stress [31]." 
+ ], + [ + "Our recent understanding of biological networks has led to new fields, like network medicine [29].Biological networks can be built using protein interaction and gene co-expression data.A previous paper used proteinprotein interactions to build genetic networks identifying potential longevity genes along with links between genes and aging-related diseases [30].Here, we present the network of proteins and genes co-expressed with the CellAge senescence genes.Assaying the networks, we find links between senescence and immune system functions and find genes highly connected to CellAge genes under the assumption that a guilt-by-association approach will reveal genes with similar functions [31].We next explored what information could be obtained by applying a network analysis to CellAge.From the list of CellAge genes, three networks of CS were generated: a PPI network and two co-expression networks, with the aim of identifying new senescence regulators based primarily on network centrality of the genes.We looked at the RNA-Seq co-expression network in detail, using the main connected component of 3198 genes to find highly central genes to the network as a whole, and those occupying subnetworks of interest.The RNA-Seq was a highly modular network, separated into some subnetworks of distinct functions (Fig. 
4).The two largest and more central networks contained a number of known senescence genes.We expanded the analysis of these networks in particular, identifying a number of bottleneck nodes.Cluster 1 was enriched for cell cycle processes, which is not overly surprising given that senescence involves changes in cell cycle progression.However, cluster 2 comprised of enriched terms relating to immune system function.One of the aims in biogerontology is to understand and reverse the effects of aging on the immune system.Additional file 1: Table S38 highlights the genes in both clusters that are potential CS bottlenecks within the network and may warrant further study.Unweighted RNA-Seq co-expression networkWe used CellAge genes that induce and inhibit CS and their co-expressing partners to build a cellular senescence co-expression network.The network consists of a main connected network with 3198 nodes, and a number of smaller \"islands\" that are not connected to the main network (Fig. 4a).In this study, we look at the broad context of CS genes-their association with aging and aging-related diseases, functional enrichment, evolutionary conservation, and topological parameters within biological networks-to further our understanding of the impact of CS in aging and diseases.Using our networks, we generate a list of potential novel CS regulators and experimentally validate 26 genes using siRNAs, identifying 13 new senescence inhibitors.Network analysesThe CellAge genes form both protein-protein and gene co-expression networks.The formation of a proteinprotein interaction (PPI) network is significant in itself given that only ~4% of the genes in a randomly chosen gene dataset of similar size are interconnected [53].In order to have a more holistic view of CS, we were interested in the topological parameters of the networks that CS genes form.For this, several types of networks were constructed using the CellAge genes as seeds: the CS PPI network, along with two CS gene 
co-expression networks built using RNA-seq and microarray data.Biological networks generally have a scale-free topology in which the majority of genes (nodes) have few interactions (edges), while some have many more interactions, resulting in a power law distribution of the node degree (the number of interactions per node) [31,54].As expected, the node-degree distribution of the above networks does confirm a scale-free structure (Additional file 2: Fig. S9).Additional file 1: Table S32 presents the network summary statistics for the resulting networks.", + "Here we have curated studies from the aging literature and utilized integrative functional genomics in GeneWeaver to address four questions related to aging by analyzing these largescale, complex sets of data: 1) to identify molecular relations between cellular senescence and functional cognitive decline, 2) to examine the intersection between comorbid disease states, 3) to identify new druggable targets for longevity, and 4) to examine cross-species translation of age-related processes.GeneSet GraphTo identify the most highly connected gene within a group of gene sets related to aging, the \"GeneSet Graph\" tool was used.This tool presents a bipartite graph visualization of genes and gene sets.Genes are represented by elliptical nodes, and gene sets are represented by boxes.The least-connected genes are displayed on the left, followed by the gene sets, then the moreconnected genes in increasing order to the right.Genes and gene sets are connected by colored lines to show what genes are in which gene sets.A degree threshold is applied on the gene partite set to reduce the graph size.DiscussionThe growing number of studies and data in many fields, including ageing, requires the development of integrative and computational approaches to analyze the data for consensus and shared biological findings across conditions.Using GeneWeaver's database and analysis tools to address questions in aging research we were able 
to identify genes common to cellular senescence and functional cognitive decline; to examine gene products at the intersection between obesity and dementia, to identify several potential druggable targets for investigation in longevity, and to identify and validate a cross-species age-related gene from convergent evidence. Our identification of the role for CD63 in aging would not have been made without the use of this large genomic analysis tool. CD63 in C. elegans is a member of the tetraspanin family of proteins [47]. Tetraspanins are transmembrane scaffolding proteins involved in motility, cell adhesion, proliferation and activation. Recently we showed that knockdown of another tetraspanin in C. elegans, tsp-3, extends lifespan by >20% as well [48], suggesting that this protein family may be of broader interest in aging.", + "GeneNetwork (www.genenetwork.org), described in Chapter 6, is a suite of data sets and bioinformatics tools that stores, analyzes, and displays phenotypes as well as large gene expression data sets for several species (human, monkey, mouse, rat, fly, barley, tomato, and Arabidopsis) (Durrant et al., 2012; Hoffman et al., 2011; Rosen et al., 2007). GeneNetwork users can take advantage of a systems genetics approach (Rosen et al., 2003, 2007).", + "Interaction network analysis. The increased accuracy and breadth of our RNA-seq data sets allowed us to generate networks of gene functional change in aging liver, above and beyond what was observed using DAVID or GOrilla. Using Ingenuity Pathway Analysis (IPA) we generated, from the differentially expressed protein-coding genes and ncRNAs, interaction networks of functional change. This resulted in multiple overlapping pro-aging networks from which we could distinguish three major molecular phenotypes: inflammation, proliferative homeostasis and lipid metabolism (Figs. 
4, 5 and 6).", + "As mentioned previously, GeneNetwork (www.genenetwork.org) is a collaborative Web-based resource equipped with tools and features for studying gene/gene and exploring genetic correlates to neurobehavioral phenotypes (Chesler et al., 2003, 2004). The Web site is home to a growing collection of gene expression and phenotypic data from a variety of species and brain regions, with a host of links to external resources for tracing the interrelationships of a gene among multiple Web-based resources. GeneNetwork also offers a number of correlation and mapping strategies for assessing associations among multiple genes and QTLs.", + "The aim of this work was to construct an online tool that can be used to derive novel candidate genes for further studies in aging and complex diseases, in a quick and intuitive manner. Aging is not considered a disease, yet older individuals are more susceptible to several diseases such as Alzheimer's, Parkinson's and cancer. This is one of the reasons why research in this field is rapidly expanding and several hundreds of genes have been linked to aging [16]. A major bottleneck in aging/complex disease research is that it is difficult to determine the causality of transcriptional alterations. It is also unclear if the altered expression profile observed with aging/complex disease consists of one particular biological module or whether it consists of genes that act separately from each other. To this end, GeneFriends outputs transcription factors co-expressed with the genes supplied by the user.", + "Network analyses additionally revealed systems level relationships between age-related diseases and the aging regulators. Miller et al. [42] used a weighted gene coexpression network to identify transcriptional networks in Alzheimer's disease (AD) and found a significant association between gene expression changes during the progression of AD and those during normal aging. Wang et al. 
[43] constructed a human disease-aging network to study the relationships between aging genes and genetic disease genes.This study showed that disease genes located close to aging genes have central positions in the PPI network.Second, although high-throughput data on different layers of the living system Fig. (2) can now be easily obtained, it remains obscure as to how information flows or exchanges across these layers to arrive at the alternative \"old/aging\" state of the molecular network from the young state, what events cause the state transition and what are the network circuitry and epigenetic events locking the network in the aging state. [62,63].Clusters or communities in the networks were extracted by the MCL algorithm [64] and only top clusters with more than 10 genes for each network are shown, and different clusters with similar functional enrichment are merged. (A) The network based on a protein functional interaction network [65]. (B) The edges in the network represent cocitation of the two genes together in at least 2 PubMed abstracts under the context of aging, i.e. also co-cited with \"aging\", \"ageing\", \"lifespan\", \"life span\" as calculated by Cociter (http:// www.picb.ac.cn/ hanlab/cociter).In both graphs, the enriched functions within the gene clusters are coded by the colors of the nodes: green -signaling pathways, red -DNA damage response, yellow -mitochondria function and oxidative stress response, blue -ribosome and translation related genes, and purple -protein localization, transport and autophagy.Fig. 
(4).Network communities among known aging regulators in human and model organisms based on two different interactome datasets.Nodes include human aging regulators and human homologs of aging regulators in worm, fly and mouse from GenAge[62,63].Clusters or communities in the networks were extracted by the MCL algorithm[64] and only top clusters with more than 10 genes for each network are shown, and different clusters with similar functional enrichment are merged. (A) The network based on a protein functional interaction network[65]. (B) The edges in the network represent cocitation of the two genes together in at least 2 PubMed abstracts under the context of aging, i.e. also co-cited with \"aging\", \"ageing\", \"lifespan\", \"life span\" as calculated by Cociter (http:// www.picb.ac.cn/ hanlab/cociter).In both graphs, the enriched functions within the gene clusters are coded by the colors of the nodes: green -signaling pathways, red -DNA damage response, yellow -mitochondria function and oxidative stress response, blue -ribosome and translation related genes, and purple -protein localization, transport and autophagy.Network approaches are instrumental in discerning global properties of aging/lifespan regulators, making computational predictions and inferring the modularity and relationships of various aging regulators.However, they should be applied with great caution as to avoid bias introduced by the literature, the lack of spatial and temporal information, or the limited coverage of the network [44].", + "GeneNetwork.org also offers a powerful statistical platform foronline network analyses and mapping, enabling numerous molecular questions to be probed in one centralized location(Chesler et al. , 2003, 2005; Li et al. , 2010; Mulligan et al. , 2012,2017, 2019). Most data are from groups of animals or humanswho have been fully genotyped or even sequenced. 
As a result, it can be used to model causal networks that link DNA differences to traits such as differences in expression, cell number, volumes, and behavior using real-time computation and graphing.", + "Another use of GenAge is for researchers to associate genes already under investigation with other, little-known genes, which can lead to new experimental designs. To do this, protein-protein interactions are one possible approach, and GenAge's human data set features 673 interactions, most of which manually curated obtained from the Human Protein Reference Database (HPRD) (Peri et al., 2003). In fact, one of our earliest applications of GenAge involved finding novel genes that may be linked to aging by way of an analysis of protein-protein interactions. The principle being that proteins not previously thought to be related to aging which interact with a large number of proteins directly linked to aging might too be involved in aging and are thus promising candidates for future studies (de Magalhães & Toussaint, 2004; Budovsky et al., 2007). Similar works are made easy with GenAge. Protein-protein interactions with one or more genes as query can be visualized (Fig. 
2), or they can be downloaded for use with more advanced biological pathway analysis software.By providing a list of candidate genes, the genes in GenAge can serve as basis for gene expression and genetic association longevity studies, including human studies, or even for clinical studies of interventions hypothesized to affect aging.In fact, recent gene expression studies have used GenAge to focus on aging-associated genes (Chen et al ., 2008;Hardman & Ashcroft, 2008).Because researchers may have disparate opinions regarding the relevance of different model systems to understand human aging, an important tool to investigate the human data set as a whole is GenAge's browser (http://genomics.senescence.info/genes/browser.php).The browser makes it possible to (among other things) retrieve only those entries that pass certain criteria related to the annotation in GenAge, such as selection process and gene function.For example, users can retrieve only genes associated with aging in non-mammalian model organisms.Several criteria can be set, all of which must be passed for genes to be retrieved.Afterwards, users can select all or a subgroup of the genes retrieved, which can then be further analysed using simple statistical tools (http://genomics.senescence.info/genes/analyse.php).Cross-links between the different sections of GenAge make it easy to navigate between the different tools and data sets." + ] + ], + "task_id": [ + "C6C7CEF19CE7C27CF4BC6906259CDDF9", + "3EC47C56606B02F00CF2449AB311365C", + "37A26345145679F7539EA8F512623F5E" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_1.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_1.json new file mode 100644 index 0000000..68b53cb --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_1.json @@ -0,0 +1,109 @@ +{ + "question": [ + "What are the genetic bases for the varying efficacy of diabetes treatments among individuals?", + "Explain Protective Genetic Factors Against Diabetes in Elderly Populations", + "Explain Effect of Lifestyle Modifications on Aging-Associated Diabetes Risk", + "Explain The Role of Longevity Genes in Protecting Against Diabetes", + "What are the types of diabetes" + ], + "answer": [ + "The varying efficacy of diabetes treatments among individuals can be attributed to genetic variants present in drug receptors or drug metabolizers, such as OCT genes, KCNJ11, ABCC8, and CYP2C9. These genetic variants can influence the metabolism, transportation, and therapeutic mechanisms of antidiabetic drugs, leading to differences in drug disposition, glycemic response, tolerability, and incidence of adverse effects. Additionally, gene-gene, gene-environment, and gene-treatment interactions may also contribute to the variation in disease progression and response to therapy.", + "Studies have shown that foreign genetic admixture can have a protective effect against diabetes. For instance, the prevalence of Type 2 Diabetes (T2D) in elderly Nauruans was reported to be 83% in full-blooded islanders but only 17% in those with foreign genetic admixture. This suggests that foreign genotypes can reduce the risk of diabetes. Similar findings have been reported in Pima Indians and other Native American populations.", + "Lifestyle modifications, including increased physical activity, dietary changes, and weight reduction, have been shown to significantly reduce the risk of Type 2 Diabetes (T2D), which is often associated with aging. 
Regular physical activity improves insulin sensitivity, reducing the stress on insulin-producing cells. Dietary changes, particularly those reducing the intake of processed carbohydrates, energy-rich foods, and animal fats, can also lower T2D risk. Weight loss can directly improve insulin secretion and sensitivity. However, these lifestyle changes require not only individual efforts but also changes in social, built, and food environments.", + "The background text does not provide information on the role of longevity genes in protecting against diabetes.", + "The types of diabetes include Type 1 diabetes, Type 2 diabetes, gestational diabetes, and diabetes due to specific causes such as genetic defects causing deficient insulin secretion or action, diseases of the pancreas, and use of certain drugs. There are also subtypes of adult-onset diabetes identified as severe autoimmune diabetes, severe insulin-deficient diabetes, severe insulin-resistant diabetes, mild obesity-related diabetes, and mild age-related diabetes." + ], + "contexts": [ + [ + "Genetics and pharmacogenomicsWe are at the dawn of the age of pharmacogenomics and personalized medicine and ever closer to achieving the \"$1,000 genome. \"What does this mean for diabetes?Forward genetic approaches (i.e., starting from phenotype and identifying the genetic cause) to dissecting mendelian forms of diabetes have been hugely successful in identifying a small subset of diabetic patients in whom rare, highly penetrant mutations of a single gene cause their diabetes (13).While common variants of these genes that make a small contribution to polygenic diabetes may also exist (13), the variants causing monogenic diabetes have limited utility in pharmacogenetics due to their low allele frequency.The vast majority of type 2 diabetes patients have polygenetic forms of the disease that typically also require a permissive environment (e.g., obesity, sedentary lifestyle, advancing age, etc.) 
to be penetrant.Each locus contributes a small amount of risk (odds ratios typically ranging from 1.1- to 1.5-fold), so large cohorts are needed to identify the at-risk alleles.Some of the loci identified to date include transcription factor 7-like 2 (TCF7L2) (14), calpain 10 (CAPN10) (15), peroxisome proliferator-activated receptor (PPARG) (16), and potassium inwardly rectifying channel, subfamily J, member 11 (KCNJ11) (17).However, the pace of gene identification is increasing due to the availability of large-scale databases of genetic variation and advances in genotyping technology.A recent genome-wide study identified solute carrier family 30, member 8 (SLC30A8), a cell Zn transporter, and two other genomic regions as additional diabetes risk loci (18).", + "With further progress in unravelling the pathogenic roles of genes and epigenomic phenomena in type 2 diabetes, pharmacogenomic and pharmacoepigenomic studies might eventually yield treatment choices that can be personalised for individual patients.", + "Pharmacogenomics of Type 2 DiabetesWith the advent of GWAS, studies on the roles of inherited and acquired genetic variations in drug response have undergone an evolution from pharmacogenetics into pharmacogenomics, with a shift from the focus on individual candidate genes to GWAS [147].Clinically, it is often observed that even patients who receive similar antidiabetic regimens demonstrate large variability in drug disposition, glycemic response, tolerability, and incidence of adverse effects [148].This interindividual variability can be attributed to specific gene polymorphisms involved in the metabolism, transportation, and therapeutic mechanisms of oral antidiabetic drugs.Pharmacogenomics is on the agenda to explore feasible genetic testing to predict treatment outcome, so that appropriate steps could be taken to treat type 2 diabetes more efficiently.", + "Future directionsDelays in identifying genetic variants that are robustly associated with 
differences in individual predisposition to the complications of diabetes have constrained progress towards a mechanistic understanding of these conditions. Some approaches to overcome these limitations are outlined in Figure 4.", + "Genomics of T2D: Diet, lifestyle, environment, and even genetic variation influence an individual's response to disease therapy. Like GWAS which identify genetic variants conferring risk for a disease, studies have been carried out for identifying genetic variants responsible for patient differences in drug response. Pharmacogenomics in diabetes focuses on the study of gene polymorphisms which influence an individual's response to antidiabetic drugs. Such genetic variants influence the pharmacodynamics and/or pharmacokinetics of the drug, thus affecting its efficacy or toxicity in an individual. The difference in response to treatments and therapies across individuals on account of these factors strengthens the case for personalized medicine in diabetes. Genetics & genomics of T2D: Genome-wide association studies (GWAS) have been helpful in identifying a large number of genetic variants conferring risk to T2D. However, only close to 10% heritability is explained by these variants. Other genetic variants, particularly those which are rare but with significant effects, need to be identified. 
Genetic variability is responsible for the difference in response to antidiabetic drugs seen across individuals.", + "The aim of this study was to summarize current knowledge and provide perspectives on the relationships between human genetic variants, type 2 diabetes, antidiabetic treatment, and disease progression.Type 2 diabetes is a complex disease with clear-cut diagnostic criteria and treatment guidelines.Yet, the interindividual response to therapy and slope of disease progression varies markedly among patients with type 2 diabetes.Gene-gene, gene-environment, and gene-treatment interactions may explain some of the variation in disease progression.Several genetic variants have been suggested to be associated with response to antidiabetic drugs.Some are present in drug receptors or drug metabolizers (OCT genes, KCNJ11, ABCC8, and CYP2C9).Numerous type 2 diabetes risk variants have been identified, but genetic risk score models applying these variants have failed to identify 'disease progressors' among patients with diabetes.Although genetic risk scores are based on a few known loci and only explain a fraction of the heritability of type 2 diabetes, it seems that the genes responsible for the development of diabetes may not be the same driving disease progression after the diagnosis has been made.Pharmacogenetic interactions explain some of the interindividual variation in responses to antidiabetic treatment and may provide the foundation for future genotype-based treatment standards.Pharmacogenetics and Genomics 25:475-484Diabetes progression is a multifactorial process; however, pharmacogenetics seems to play an important role in understanding the different phenotypes and progression rates among diabetic patients.Genetic variants associated with decreased effect of a certain drug might explain why some individuals are more likely to experience glycemic deterioration on a given treatment.In the following sections, different genetic variants and their impact 
on treatment efficacy and outcome will be addressed.The aim of this study was to summarize current knowledge and provide perspectives on the relationships between human genetic variants, type 2 diabetes, antidiabetic treatment, and disease progression.Type 2 diabetes is a complex disease with clear-cut diagnostic criteria and treatment guidelines.Yet, the interindividual response to therapy and slope of disease progression varies markedly among patients with type 2 diabetes.Gene-gene, gene-environment, and gene-treatment interactions may explain some of the variation in disease progression.Several genetic variants have been suggested to be associated with response to antidiabetic drugs.Some are present in drug receptors or drug metabolizers (OCT genes, KCNJ11, ABCC8, and CYP2C9).Numerous type 2 diabetes risk variants have been identified, but genetic risk score models applying these variants have failed to identify 'disease progressors' among patients with diabetes.Although genetic risk scores are based on a few known loci and only explain a fraction of the heritability of type 2 diabetes, it seems that the genes responsible for the development of diabetes may not be the same driving disease progression after the diagnosis has been made.Pharmacogenetic interactions explain some of the interindividual variation in responses to antidiabetic treatment and may provide the foundation for future genotype-based treatment standards.Pharmacogenetics and Genomics 25:475-484To date, a number of genetic variants have been identified to be associated with response to antidiabetic drugs.Of these, some variants are present in either drug receptors or drug metabolizers as for OCT genes, KCNJ11, ABCC8, and CYP2C9.Other variants are known T2D susceptibility variants such as TCF7L2.To identify variants of importance for antiglycemic drug response, GWAS in large cohorts of patients with diabetes with detailed measures of pharmacotherapy are lacking.The pharmacologic management of 
patients with diabetes often involves drug classes other than antidiabetics.Pharmacogenetic studies on statin and antihypertensive treatment have reported several genetic variants associated with treatment response and adverse drug reactions [101,102].It therefore seems natural to conclude that the future perspectives in pharmacogenetics is to conduct genetic studies in large cohorts with wellphenotyped individuals, thorough data collection on baseline treatment, concomitant treatment, adherence to therapy as well as data collection on comorbidity and additional disease diagnoses.These types of pharmacogenetic studies may provide unique opportunities for future genotype-based treatment standards and may help in delaying or changing the slope of disease progression among patients with T2D.", + "Genetic determinants of diabetes and metabolic syndromes.", + "Thus, specific answers are lacking as to the genetic basis for type 2 diabetes.Still, speculations can be made about what eventually will be found.It is almost certain the genetic basis for type 2 diabetes and other common metabolic diseases will be extremely complex-that a predisposition for the disease will require several genetic hits as opposed to just one.Also, it is generally assumed there will be many susceptibility genes for type 2 diabetes, with enormous variability in different families and ethnic groups.Not known is whether there will be a common form of type 2 diabetes, with any one or even a few susceptibility genes accounting for a sizeable percentage of affected persons.As such, identifying diabetes genes will be slow and difficult.", + "Ta rge ted T r e atmen t a nd Pr e v en t ion4][75] In monogenic forms of diabetes, at least, genetic testing already drives the choice of therapy.For example, in patients who have maturity-onset diabetes of the young due to mutations in the gene encoding glucokinase (GCK), the hyperglycemia is mild and stable, the risk of complications is low, and dietary 
management is often sufficient.In contrast, in patients who have maturity-onset diabetes of the young due to mutations in HNF1A, the disease follows a more aggressive course, with a greater risk of severe complications, but is particularly responsive to the hypoglycemic effects of sulfonylureas. 62,73Most children with neonatal diabetes have mutations in KCNJ11 or ABCC8, adjacent genes that jointly encode the beta-cell ATP-sensitive potassium channel that mediates glucose-stimulated insulin secretion and is the target of sulfonylureas.In such children, treatment with sulfonylureas has proved more effective and convenient than the lifelong insulin therapy previously considered the default option. 74,75n children with severe obesity due to profound leptin deficiency, exogenous leptin therapy is lifesaving. 76s yet, there are insufficient genetic data to support management decisions for common forms of type 2 diabetes and obesity. 77Although the TCF7L2 genotype is associated with variation in the response to sulfonylurea treatment, 78 the effect is too modest to guide the care of individual patients.For the time being, the contribution of genetic information to therapy is most likely to come through the drug-discovery pipeline.Information from genetic studies could be used to identify new targets for pharmaceutical intervention that have validated effects on physiological characteristics, to provide information about new and existing targets (e.g., clues about the long-term safety of pathway intervention), 32 and to characterize high-risk groups to enable more efficient clinical trials of agents designed to reduce the progression of type 2 diabetes or obesity or the risk of complications.", + "Type 2 DiabetesWhile a subset of genetic variants are linked to both type 1 and type 2 diabetes (42,43), the two diseases have a largely distinct genetic basis, which could be leveraged toward classification of diabetes (44).Genome-wide association studies have identified more 
than 130 genetic variants associated with type 2 diabetes, glucose levels, or insulin levels; however, these variants explain less than 15% of disease heritability (45)(46)(47).There are many possibilities for explaining the majority of type 2 diabetes heritability, including disease heterogeneity, gene-gene interactions, and epigenetics.Most type 2 variants are in noncoding genomic regions.Some variants, such as those in KCNQ1, show strong parent-of-origin effects (48).It is possible that children of mothers carrying KCNQ1 are born with a reduced functional b-cell mass and thereby are less able to increase their insulin secretion when exposed to insulin resistance (49).Another area of particular interest has been the search for rare variants protecting from type 2 diabetes, such as loss-of-function mutations in SLC30A8 (50), which could offer potential new drug targets for type 2 diabetes.Research GapsAfter consideration of the known genetic associations with diabetes risk, consensus developed that the field is not yet at a place where genetics has provided actionable information to guide treatment decisions, with a few notable exceptions, namely in MODY.The experts agreed there is a need to use the increasingly accessible and affordable technologies to further refine our understanding of how genetic variations affect the rate of progression of diabetes and its complications.The expert committee also highlighted the importance of determining categorical phenotypic subtypes of diabetes in order to link specific genetic associations to these phenotypic subtypes.These types of information are necessary to develop the tools to predict response to-and side effects of-therapeutic approaches for diabetes in patient populations.", + "Genome-wide association (GWAS) and sequencing studies are providing new insights into the genetic basis of type 2 diabetes (T2D) and the inter-individual variation in glycemic traits, including levels of glucose, insulin, proinsulin and 
hemoglobin A1c (HbA1c).At the end of 2011, established loci (P < 5 10 8 ) totaled 55 for T2D and 32 for glycemic traits.Since then, most new loci have been detected by analyzing common [minor allele frequency (MAF)>0.05]variants in increasingly large sample sizes from populations around the world, and in trans-ancestry studies that successfully combine data from diverse populations.Most recently, advances in sequencing have led to the discovery of four loci for T2D or glycemic traits based on low-frequency (0.005 < MAF 0.05) variants, and additional low-frequency, potentially functional variants have been identified at GWAS loci.Established published loci now total 88 for T2D and 83 for one or more glycemic traits, and many additional loci likely remain to be discovered.Future studies will build on these successes by identifying additional loci and by determining the pathogenic effects of the underlying variants and genes.", + "Together, the findings from these studies were among the first to demonstrate that the genetic etiology of hyperglycemia may modulate response to hypoglycemia agents.Such results yielded strong implications for patient management and paved the way toward elucidating additional genetic factors that might influence drug response in the treatment of T2D.", + "Anumber of studies have implicated a genetic basis for type 2 diabetes (1).The discovery of monogenic forms of the disease underscored the phenotypic and genotypic heterogeneity, although monogenic forms account for only a few percent of the disease (1).Defining the genetic basis of the far more common polygenic form of the disease presents more difficulties (2,3).Nevertheless, some interesting results have recently emerged.A genome scan of Hispanic-American families (330 affected sib-pairs [ASPs]) found linkage to chromosome 2q37 (logarithm of odds [LOD] 4.15) (4), and the causative gene has been recently reported (5).A number of other genome scans in various racial groups have identified 
other putative susceptibility loci (6-8). The largest genome-wide scan for type 2 diabetes loci reported to date studied 477 Finnish families (716 ASPs) and found evidence for linkage to chromosome 20q12-13.1 (LOD 2.06 at D20S107) (9). Interestingly, similar results have been reported by at least three other groups (10-12).", + "Because more than one genetic mutation contributes to T1D, the differences that occur between individuals of different backgrounds (for instance, race and locality) may need to be considered in the design of treatments. Personalized medicine is about the ability to classify individuals into subpopulations that differ in their susceptibility to a particular disease or in their response to a specific treatment (Blau and Liakopoulou, 2013; Timmeman, 2013). This will allow for a more accurate diagnosis per individual, and design of specific treatment plans including gene therapy.", + "Genetic predisposition to diabetes mellitus type 2: will large collaborative efforts be able to overcome the geneticist's nightmare?" + ], + [ + "If an environmental contributor is near ubiquitous and the genetic predisposition common as well, interventions are most sensibly weighted towards environmental risk factor modification. Even here, though, there is room for further research, since the etiopathogenesis of type 2 diabetes may not be as well understood as some suggest. Specifically, Chaufan implies that dietary intervention to prevent prenatal programming leading to susceptibility to develop type 2 diabetes (the fetal origins of adult onset disease hypothesis) is as evidence-based as dietary management of the adult diabetic state. 
However, many questions remain in this area.", + "In the bivariate analysis, there was a tendency of probands to be protected from diabetes when the father lived through a year or years of a surfeit of food during his SGP (OR 0.14, P=0.06). The same experience for the grandfather tended to be followed by a higher risk for the probands dying from diabetes, according to the bivariate analysis (OR 2.34, P=0.09). In the multivariable analysis, a father's exposure to a surfeit of food during his SGP tended to protect the proband from diabetes (OR 0.13, c.i. 0.02-1.07, P=0.06). However, if the paternal grandfather was exposed to a surfeit of food during his SGP, then the proband had an over-mortality in diabetes (OR 4.1, c.i. 1.33-12.93, P=0.01) when age at death and the effects of possible overeating among parents and grandparents during their respective SGP were taken into account.", + "Environment: The second factor in Figure 1 is environmental aspects. An important concept is the diabetes genotype typically causes only a predisposition for glucose intolerance (note the terminology susceptibility gene was used in the preceding paragraphs). Whether one develops the diabetes phenotype depends on environmental factors, some obvious in how they act, others less so. For instance, the Nurses Health Survey showed positive associations between obesity and lack of physical activity in the development of type 2 diabetes (as expected), but also protection by not smoking and moderate alcohol intake (14). Already discussed, many studies have shown an association between TV watching, high calorie diets, and lack of physical activity with risk of diabetes, i.e., our modern lifestyle, so it is not surprising that there is an explosion in the incidence of diabetes worldwide.", + "Evidence from genetic admixture: Some of the clearest data come from studies of genetic admixture. The prevalence of T2D in elderly Nauruans was reported to be 83% in full-blooded islanders but only 17% in those with 
(unsuspected) foreign genetic admixture. 15 Since there were no apparent cultural differences between the groups, this indicated a protective effect of foreign genotypes on diabetes risk. Similar findings have been reported in Pima Indians 16 and other Native American populations. 17", + "Introduction: Clustering in families implicates a genetic component of diabetic nephropathy, but so far the specific genes underlying diabetic nephropathy remain largely unknown [1,2]. Family studies have furthermore revealed that parental type 2 diabetes mellitus is associated with diabetic nephropathy in offspring with type 1 diabetes mellitus [3,4]. A positive family history of type 2 diabetes mellitus has also been associated with cardiovascular disease [5] as well as markers of cardiovascular disease [6] in offspring with type 1 diabetes mellitus. Genetic variants or single-nucleotide polymorphisms (SNPs) predisposing to type 2 diabetes mellitus in the Finnish population have recently been identified in large-scale, genome-wide association studies [7,8]. The question thus arises of whether these SNPs, which predispose to type 2 diabetes mellitus, also predispose to diabetic nephropathy and related complications in patients with type 1 diabetes mellitus. We therefore assessed the impact of a set of SNPs known to influence susceptibility to type 2 diabetes mellitus on diabetic nephropathy as well as diabetic retinopathy and cardiovascular disease in patients with type 1 diabetes mellitus.", + "Family and twin studies indicate that a substantial fraction of susceptibility to type 1 diabetes is attributable to genetic factors. These and other epidemiologic studies also implicate environmental factors as important triggers. Although the specific environmental factors that contribute to immune-mediated diabetes remain unknown, several of the relevant genetic factors have been identified using two main approaches: genome-wide linkage analysis and candidate gene association studies. This article 
reviews the epidemiology of type 1 diabetes, the relative merits of linkage and association studies, and the results achieved so far using these two approaches.Prospects for the future of type 1 diabetes genetics research are considered.Type 1 diabetes has unusual epidemiological features related to genderType 1 diabetes also displays unusual patterns of inheritance that may yield insights into etiology and provide clues to the best methods for analyzing genetic studies.The risk to the offspring is generally greater from a mother or father who was diagnosed at an early age (again suggesting that early-onset cases are more heavily genetically 'loaded').However, the risk of diabetes is approximately two to four times higher for a child whose father has type 1 diabetes than one whose mother is affected [see (52,53) and references therein].This parental difference is largely due to a low risk for offspring of mothers who were diagnosed at a later age (53).The difference could be explained by at least three different factors.First, the risk alleles could only be active when transmitted by the father (such as is seen in imprinting, where only one of the parental alleles is expressed).Alternatively, a maternal environmental factor during pregnancy could be protective.However, it is difficult to see how this protective effect would be restricted to mothers diagnosed at a later age, especially since the protective effect was unrelated to the mother's duration of diabetes or even diabetic status at delivery (53).Finally, mothers who are diagnosed at a later age could represent more 'environmental' cases of diabetes, and thus be less likely to pass on risk genes to their offspring.Family and twin studies indicate that a substantial fraction of susceptibility to type 1 diabetes is attributable to genetic factors.These and other epidemiologic studies also implicate environmental factors as important triggers.Although the specific environmental factors that contribute to 
immune-mediated diabetes remain unknown, several of the relevant genetic factors have been identified using two main approaches: genome-wide linkage analysis and candidate gene association studies.This article reviews the epidemiology of type 1 diabetes, the relative merits of linkage and association studies, and the results achieved so far using these two approaches.Prospects for the future of type 1 diabetes genetics research are considered.", + "CONCLUSIONThe greatest genetic risk (both increased risk, susceptible, and decreased risk, protective) for type 1 diabetes is conferred by specific alleles, genotypes, and haplotypes of the HLA class II (and class I) genes.There are currently about 50 non-HLA region loci that also affect the type 1 diabetes risk.Many of the assumed functions of the non-HLA genes of interest suggest that variants at these loci act in concert on the adaptive and innate immune systems to initiate, magnify, and perpetuate -cell destruction.The clues that genetic studies provide will eventually help lead us to identify how -cell destruction is influenced by environmental factors.While there is extensive overlap between type 1 diabetes and other immune-mediated diseases, it appears that type 1 and type 2 diabetes are genetically distinct entities.These observations may suggest ways to help identify causal gene(s) and, ultimately, a set of disease-associated variants defined on specific haplotypes.Unlike other complex human diseases, relatively little familial clustering remains to be explained for type 1 diabetes.The remaining missing heritability for type 1 diabetes is likely to be explained by as yet unmapped common variants, rare variants, structural polymorphisms, and gene-gene and/or gene-environmental interactions, in which we can expect epigenetic effects to play a role.The examination of the type 1 diabetes genes and their pathways may reveal the earliest pathogenic mechanisms that result in the engagement of the innate and adaptive 
immune systems to produce massive -cell destruction and clinical disease.The resources established by the international T1DGC are available to the research community and provide a basis for future discovery of genes that regulate the earliest events in type 1 diabetes etiology-potential targets for intervention or biomarkers for monitoring the effects and outcomes of potential therapeutic agents.", + "Genome-wide search for genes affecting the age at diagnosis of type 1 diabetes.Genes affecting type 1 diabetes diagnosis age / A. Syreeni et al.Genome-wide search for genes affecting the age at diagnosis of type 1 diabetes.", + "Type 1 DiabetesThe higher type 1 diabetes prevalence observed in relatives implies a genetic risk, and the degree of genetic identity with the proband correlates with risk (22)(23)(24)(25)(26). Gene variants in one major locus, human leukocyte antigen (HLA) (27), confer 50-60% of the genetic risk by affecting HLA protein binding to antigenic peptides and antigen presentation to T cells (28).Approximately 50 additional genes individually contribute smaller effects (25,29).These contributors include gene variants that modulate immune regulation and tolerance (30)(31)(32)(33), variants that modify viral responses (34,35), and variants that influence responses to environmental signals and endocrine function (36), as well as some that are expressed in pancreatic b-cells (37).Genetic influences on the triggering of islet autoimmunity and disease progression are being defined in relatives (38,39).Together, these gene variants explain ;80% of type 1 diabetes heritability.Epigenetic (40), gene expression, and regulatory RNA profiles (36) may vary over time and reflect disease activity, providing a dynamic readout of risk.", + "Genetic factors have an important role in the development of diabetes, with some forms of the disease resulting from mutations in a single gene.Others are multifactorial in origin.The monogenic forms of diabetes account for 
approximately 5% of cases and are caused by mutations in genes encoding insulin 3, the insulin receptor 4, the glycolytic enzyme glucokinase 5, and the transcription factors hepatocyte nuclear factor-1α (HNF-1α), HNF-1β, HNF-4α, insulin promoter factor-1 and NeuroD1/BETA2 (refs 6-10). Mutations in maternally inherited mitochondrial genes can also cause diabetes, often in association with hearing loss 11.", + "Genetics of Diabetic Complications in Humans: Epidemiologic studies have clearly established that only a subgroup of individuals with diabetes are at risk of nephropathy (2). To identify genetic determinants and candidate genes that confer susceptibility or progression for DNP in individuals with type 1 and type 2 diabetes, the National Institutes of Health established the ongoing Family Investigation of Nephropathy and Diabetes study consortium. The Family Investigation of Nephropathy and Diabetes is using Mapping by Admixture Linkage Disequilibrium and traditional affected and discordant sibling pair and relative pair analyses. Previous linkage analysis studies led to the mapping of several susceptibility loci for DNP on specific regions on chromosomes 3, 7, 9, 12, and 20 (14,15).", + "However, these efforts to sift individuals into discrete subtypes of disease would appear to run counter to the evidence that points to a complex, graded, architecture of risk, one that is consistent with a multifactorial etiology, composed of genetic predisposition dominated by multiple common variants of modest effect, and pervasive exposures contributing to risk. In one recent study, Ahlqvist et al. 
used basic clinical information from patients with newly diagnosed adult-onset diabetes to define five subtypes of late-onset diabetes: an autoimmune form (covering T1D and other related clinical entities), two severe forms (one dominated by insulin deficiency, the other by insulin resistance), and two milder forms (termed \"obesity-related\" and \"age-related\" diabetes). Whereas the genetic clusters that form the basis of pPS are defined at the level of the variants, these clinical subtypes are defined at the level of the individual and based on biomarkers and clinical data gathered at a specific point in the progression of an individual from health to disease. The latter is likely to limit their relevance to those who have not yet developed disease, and/or those who are on treatment.", + "Studies [71][72][73][74] in Mexican and Asian populations have identified several mutations associated with type 2 diabetes in young people. The high prevalence of type 2 diabetes in the parents of young people diagnosed with type 2 diabetes could reflect a stronger genetic predisposition, even when monogenic diabetes is excluded. This hypothesis suggests that efforts to define genes that cause type 2 diabetes by linkage might be more powerful if focused on young adults with diabetes, raising the question of whether type 2 diabetes in older populations has a relatively smaller genetic contribution and a stronger environmental contribution. 66", + "We found that the presence or absence of parental diabetes and the genotype score were independently associated with the risk of diabetes. This suggests that family history as a risk factor for diabetes conveys more than heritable genetic information; it probably includes nongenetic familial behaviors and norms. The lower relative risks for diabetes associated with observed parental diabetes as compared with those associated with self-reported family history (approximately 1.8 vs. 
approximately 2.2) support the contention that family history contains more risk information than is implied by inheritance of the diabetes phenotype alone.One of the limitations of our study is that the 18 SNPs we included are probably insufficient to account for the familial risk of diabetes.They account for a minority of diabetes heritability, and the SNP array platforms from which they were chosen capture only approximately 80% of common variants in Europeans.In addition, we have not considered structural variants that might confer a risk of diabetes.It is possible that the addition of rare risk alleles with large effects, or a much larger number of common risk alleles with small individual effects, could improve discrimination. 36Indeed, as many as 500 loci may underlie the genetic risk of type 2 diabetes. 16Also, we did not study interactions among genes or between genes and the environment that might alter the genetic risk in exposed persons.As more diabetes risk variants become known, their incorporation into the genotype score may explain more of the genetic risk implied by parental diabetes.Phenotypic Risk Factors and Definitions of DiabetesEach examination consisted of a medical history taking, physical examination, and collection of a fasting blood sample. 21In the sixth examination cycle (1995 through 1998), participants completed a self-administered questionnaire that asked about family history of disease.We defined a positive selfreported family history of diabetes as a report that one or both parents had diabetes; this definition is more than 56% sensitive and 97% specific for confirmed parental diabetes. 
22Parental diabetes was confirmed by means of direct observation of the original cohort, over the course of 46 years of observation after their enrollment in the Framingham Heart Study, at the end of which time the mean age of surviving parents was 83 years.We considered diabetes to be present in a parent when medication was prescribed to control the diabetes or when the casual plasma glucose level was 11.1 mmol per liter or higher or 200.0 mg per deciliter or higher at any examination.We defined diabetes to be present in an offspring when treatment was prescribed to control the diabetes or when the fasting plasma glucose level was 7.0 mmol per liter or higher or 126.0 mg per deciliter or higher at any examination.More than 99% of the cases of diabetes among the participants in the Framingham Offspring Study are type 2 diabetes. 6", + "Genetics is one example of the 'other risk factors' involved in the pathogenesis of DR.Twin and epidemiological studies have strongly suggested a genetic component in the etiology of DR (6 -10), with heritability scores ranging from 27 to 52% in both type 1 and type 2 diabetes (7 -10).There is an increased risk of severe DR among family members of DR subjects (8,9) and in siblings of affected subjects (8,9).Furthermore, several studies have also shown a discrepant rate of the prevalence of DR among different racial ethnic groups in the US population, with a significantly higher prevalence observed among Hispanic, African-American and Chinese-American when compared with Caucasian populations (11).While these differences may partially be attributed to lifestyle factors, evidence from familial aggregation, ethnic differences and heritability clearly supports a genetic contribution in the etiology of DR." 
+ ], + [ + "Together, these clinical trials demonstrate that diet and lifestyle modification is highly effective in preventing type 2 diabetes in different ethnic and racial groups.There is an urgent need to translate the findings from these trials into clinical and public health practice.Emphasis should be placed on early adoption of healthy habits in pediatric populations because these practices track through to adulthood.Physical activityNumerous epidemiologic studies show that increased physical activity reduces risk of diabetes, whereas sedentary behaviors increase risk.In the NHS (26), each 2-h/day increment of time spent watching television (TV) was associated with a 14% increase in diabetes risk.Each 2-h/day increment of standing or walking around at home was associated with a 12% reduction in risk.Each 1-h/day increment of brisk walking was associated with a 34% reduction in risk (Fig. 3).These results indicate a continuum in the relationship between physical activity levels and diabetes risk.Among sedentary behaviors (TV watching, sitting at work, and other sitting), prolonged TV watching was associated with the highest risk.Accumulating evidence strongly demonstrates that the majority of type 2 diabetes cases can be prevented through diet and lifestyle modification.However, the adoption of a healthy diet and lifestyle requires not only individual behavioral changes, but also changes in our food, built, and social environments.Public health strategies that target the obesogenic environment are critical.Translating clinical and epidemiologic findings into practice requires fundamental shifts in public policies and health systems.To curb the diabetes epidemic, primary prevention through the promotion of a healthy diet and lifestyle should be a global public policy priority.", + "An obvious conclusion is a manipulation of lifestyle provides an opportunity to reverse the diabetes trend.Stated another way, we cannot change our genetic make-up, but we can alter 
environmental factors. Indeed, many studies have shown that diet and exercise slow the onset of diabetes in persons with IGT (2,17,18). Also, low glycemic index diets have been shown to promote weight loss along with having metabolic benefits in persons with type 2 diabetes (19). The difficulty, of course, is trying to get people to change their habits.", + "In conclusion, evidence from epidemiological studies and RCTs demonstrate that lifestyle modification comprising higher levels of PA and prudent food consumption may be effective in obesity and T2DM prevention. The positive effect of lifestyle on body weight seems somewhat transient, whereas the effect on T2DM is sustained for longer periods. Furthermore, lifestyle modification appears to have an effect on diabetes risk independently of body weight and even of weight loss. As already pointed out in several of the T2DM prevention studies the reduction in diabetes risk has been paralleled by substantial weight loss and weight reduction has been considered to have major importance for diabetes prevention ( Knowler et 1998 ). Hence, lifestyle modification seems to have an effect on T2DM not only through reduction in body weight, but also through improvement in insulin sensitivity, blood glucose control and lipid profile. Whereas there is convincing evidence that lifestyle changes can prevent T2DM in randomized controlled studies, so far little is known whether a lifestyle intervention could also modify cardiovascular morbidity and mortality. The 20-year follow-up results from the Chinese Da Qing diabetes prevention study showed a non-significant 17 % reduction in cardiovascular mortality in the combined (diet and/or PA) intervention group vs. 
controls ( Li et al., 2008 ). Similarly, lifestyle intervention in the Finnish DPS was not found to reduce significantly cardiovascular mortality during the first 10 years of follow-up ( Uusitupa et al., 2009 ). However, this study was not initially designed to examine the effect of lifestyle intervention on total mortality or cardiovascular morbidity, and therefore the statistical power may not have been sufficient to detect small differences in cardiovascular events between the 2 groups. Besides, a longer follow-up period might be needed to answer this question. In the Malmö Preventive trial with a 12-year follow-up of men with IGT total and cardiovascular mortality were lower among participants in the lifestyle intervention group, however, these results should be considered with caution due to the non-randomized design of the study ( Eriksson and Lindgärde, 1998 ). Recent findings of bariatric surgery treatment of very obese subjects showed that weight loss indeed may reduce not only T2DM risk but also total mortality ( Sjöström et al., 2007 ). Further investigations are needed to clarify whether prevention of T2DM by lifestyle modification is associated with cardiovascular disease prevention; until then decisions have to be made on the basis of the best available information. Evidence from randomized controlled trials. The efficacy of lifestyle changes in obesity and T2DM prevention has been established in numerous randomized controlled trials (RCTs). Several of them may, however, be considered of major importance due to their large sample sizes (i.e., 458-3234 individuals) and long-term duration (i.e., 3-6 years). The Chinese Da Qing diabetes prevention study was the first to investigate the effect of 6-year lifestyle change on body weight and diabetes incidence in individuals with impaired glucose tolerance (IGT) ( Pan et al., 1997 ). Pan and co-workers (1997) reported 42 % reduction in diabetes incidence, although no significant difference in body weight was 
present. Similar results were found in the Finnish Diabetes Prevention Study (DPS) and the US Diabetes Prevention Program (DPP). DPS and DPP independently reported reduction in diabetes incidence of 58 % accompanied by significant reduction in body weight (5-7 %) as a result of the lifestyle modification ( Knowler et al., 2002 ; Tuomilehto et al., 2001 ). These findings were also confirmed in Japanese and Indian populations, reporting 67.4 % and 28.5 % reduction in diabetes incidence, respectively ( Kosaka et 2011) reported significant reduction in body weight and diabetes incidence at 1, as well as, at 3 years during a lifestyle modification program carried out in a primary healthcare setting among subjects with IGT. All large-scale interventions have been successful in preventing T2DM during the active intervention period. Remarkably when the effectiveness of the lifestyle modification programs was assessed on the long-term after discontinuation of the intervention, diabetes risk still remained substantially reduced. In the Finnish DPS, for instance, at extended follow-up 3 years after the 4-year intervention period a substantial reduction in body weight and T2DM incidence was still present ( Lindstrom et al., This document was downloaded for personal use only. Unauthorized distribution is strictly prohibited.", + "Because lifestyle changes to reduce bodyweight have always been an important therapy for type 2 diabetes, investigators of Look AHEAD trial 156 examined the effect of weight reduction (achieved by an intensive lifestyle intervention) on cardiovascular events. Despite differential weight loss for more than 10 years and improvements in many cardiovascular risk factors (including blood pressure and lipids), lifestyle change did not reduce cardiovascular events compared with diabetes support and education (control group). This finding might have been because large proportions of participants in both groups received medical treatment for these risk 
factors. However, participants in the group receiving intensive lifestyle intervention who had a history of a cardiovascular event at baseline had a tendency for an increased risk of a subsequent cardiovascular event; 156 a similar finding was reported in ACCORD. 144 [Table 1: Selected therapeutic targets of largely untested mechanisms for type 2 diabetes. Recoverable entry: Glucokinase, reduce hepatic production of glucose.] Several other findings from Look AHEAD are worthy of comment. First, participants in the weight-loss group were more likely to achieve either partial or complete remission of diabetes, 157 had better glucose control needing fewer glucose-lowering drugs (including insulin), and were more likely to achieve a glycated haemoglobin A1c measurement of less than 7% (53 mmol/mol) than were those in the control group. 158 However, despite weight loss and addition of drugs, patients in the treatment group had similar progression of diabetes to that of the control group, ie, with continuous increases in glycated haemoglobin A1c . 156 Second, lifestyle change slowed progression of nephropathy. Third, other health outcomes associated with better quality of life (eg, sleep apnoea 159 and mobility 160 ) improved. Thus, intensive lifestyle change in patients with type 2 diabetes has benefits, but unfortunately not for cardiovascular outcomes, which remain the major cause of premature mortality in type 2 diabetes.", + "INTRODUCTION. Intensive lifestyle interventions (eg, promoting increased physical activity and weight loss) can be effective in decreasing the incidence of type 2 diabetes mellitus (T2DM). 
1 However, healthcare resources are limited, and participants in interventions to prevent diabetes should be prioritized.Identification of individuals at high risk of T2DM could facilitate the targeting of prevention efforts to those who could benefit from them and reduce the cost of preventing T2DM.", + "Epidemiological studies examining the associations between lifestyle behaviors and diabetes risk have reached similar conclusions as the clinical trials described above.For example, the 14-year follow-up University of Pennsylvania Alumni Health Study [52] (n = 5,990 men aged 39-68 years) showed PA (leisure time physical activity [LTPA] expressed in kcal expended per week through walking, stair climbing, and sports) was inversely associated with the incidence of T2D.Incidence rates declined as energy expenditure rose from 500 through 3,500 kcal/week.The age-adjusted relative risk ratio (RR) of T2D was reduced by about 6% for each 500 kcal increment increase in PA energy expenditure.There are two major factors that underlie these alarming projections.The first is T2D is associated with age, and Western populations are aging rapidly.The second major explanation is our lifestyles have changed dramatically in recent years.Epidemiological studies have identified strong T2D risk relationships for obesity, sedentary behavior [2][3][4], and diets rich in energy [5], processed carbohydrates [6], and animal fats [7].Collectively, these lifestyle factors impede the actions of insulin and raise hepatic glucose production, which can result in the diminution of endogenous insulin production and T2D.The strongest evidence for a causal relationship between adverse lifestyle behaviors and T2D comes from randomized controlled trials that show intensive lifestyle interventions involving structured exercise regimes which promote habitual physical activity (PA) and have a major beneficial impact on diabetes incidence in high-risk individuals [8,9].Practical ApplicationsAs we have 
described in this chapter, an abundance of evidence supports the protective effects of health lifestyle behaviors on type 2 diabetes risk.However, populationwide programs of intensive lifestyle interventions are probably unrealistic owing to the costs involved in running such trials and the difficulties in recruiting participants and motivating them to adhere to the interventions.It may be more feasible to identify individuals at high risk of diabetes who, because of their genetic characteristics, are likely to respond well to exercise interventions, as an example, and target these persons.This does not of course mean that healthy lifestyle behaviors would be discouraged in the remainder of the population, but one might prioritize other, more effective, preventive strategies in these individuals while continuing to promote the virtues of active lifestyles.The benefits to this approach might include reduced overall costs and greater preventive success.Moreover, because those who receive the intervention are likely to respond well and by consequence maintain motivation, attrition rates may diminish and adherence improve.The actualization of this perspective will first require robust empirical evidence, most likely emerging from the combination of epidemiology for hypothesis generation and clinical trials to test those hypotheses and provide evidence of causality.", + "Type 2 diabetes can be prevented or delayed by lifestyle modification, including increased physical activity, beneficial dietary changes, and weight reduction (22,44).However, only Model adjusted for age, gender, group, baseline value of moderate-to-vigorous physical activity, and baseline values and changes in body weight and in intakes of energy and energy-adjusted saturated fat and fiber. 
*The median (range) of each tertile of change in moderate-to-vigorous physical activity is shown. Adjusted interaction between moderate-to-vigorous physical activity (3 groups) and the polymorphism (2 groups) on the risk of developing type 2 diabetes. a few studies have investigated the effects of such lifestyle interventions on insulin sensitivity and insulin secretion in persons with IGT (21,46). On the basis of the 4-yr follow-up study of the DPS with repeated frequently sampled intravenous glucose tolerance test (FSIGT), insulin sensitivity improved along with lifestyle changes, while insulin secretion remained virtually unchanged (46). Most other data also indicate that physical activity, diet, and weight loss primarily increase insulin sensitivity. Insulin resistance and the associated glycemic stress may exhaust β-cells and impair their function. Regular physical activity may diminish glycemic stress by improving insulin sensitivity of target tissues (18). While the mechanisms of improved β-cell function in response to lifestyle interventions are still largely unknown, several studies suggest that physical activity (5,11), diet (19,26), weight loss (45), or their combination (21) may directly improve the first-phase insulin secretion that is an indicator of the β-cell function. GENETIC FACTORS AND LIFESTYLE interact in the development of type 2 diabetes. Physical activity, favorable dietary changes, and weight reduction were essential components of a successful lifestyle intervention in two large randomized controlled trials on the prevention of type 2 diabetes in high-risk individuals with impaired glucose tolerance (IGT), including the Finnish Diabetes Prevention Study (DPS) (44) and the Diabetes Prevention Program (DPP) (22). In the DPS, increased physical activity was associated with a decreased risk of type 2 diabetes independently of changes in diet and body weight. The individuals who increased their physical activity most (i.e., were in the upper third of the 
change) were 66% less likely to develop type 2 diabetes than those in the lower third (24).", + "The worldwide explosion of the rates of diabetes and other metabolic diseases in the last few decades cannot be fully explained only by changes in the prevalence of classical lifestyle-related risk factors, such as physical inactivity and poor diet.For this reason, it has been recently proposed that other \"nontraditional\" risk factors could contribute to the diabetes epidemics.In particular, an increasing number of reports indicate that chronic exposure to and accumulation of a low concentration of environmental pollutants (especially the so-called persistent organic pollutants (POPs)) within the body might be associated with diabetogenesis.In this review, the epidemiological evidence suggesting a relationship between dioxin and other POPs exposure and diabetes incidence will be summarized, and some recent developments on the possible underlying mechanisms, with particular reference to dioxin, will be presented and discussed.", + "Lifestyle 
modification including exercise, nutrition and behavioral changes is the cornerstone to prevent and treat type 2 diabetes. Oral antidiabetic medication, either as single agent or combination therapy, is frequently required to maintain metabolic control, as assessed by monitoring of glycated hemoglobin A1C (HbA1C) levels. Eventually, a significant proportion of patients with type 2 diabetes require the exogenous administration of insulin [40].", + "Diet and lifestyle factors. Diet and lifestyle modification is an important aspect of T2DM prevention. Major clinical trials have demonstrated that intensive lifestyle interventions can lower the incidence of diabetes mellitus by 58% compared with control groups 55 . Trials have also shown that these interventions are more effective than pharmacological interventions 55 . Landmark clinical trials, such as the Diabetes Prevention Program in multi-ethnic Americans 55 , the Finnish Diabetes Prevention Study 56 and the Da Qing IGT and Diabetes Study in China 57 , have demonstrated that many cases of T2DM could be prevented through lifestyle interventions focused on increasing physical activity and adopting a healthy diet. Nevertheless, when lifestyle interventions are not feasible, pharmacological therapy can be considered as a strategy to prevent the development of T2DM. For example, metformin reduced the incidence of T2DM by 31% over an average follow-up period of 2.8 years among high-risk individuals from the USA who did not have diabetes mellitus 55 . Similarly, metformin reduced T2DM risk in clinical trials in India and China 58 .", + "Multiple interventions in adults with T2D have been evaluated for risk reduction and prevention, both in the short and the long term. A recent systematic review (69) reported that after active interventions lasting from 6 months to >6 years, relative risk reduction achieved from lifestyle interventions (39%) was similar to that attained from use of drugs (36%); however, only lifestyle interventions 
had a sustained reduction in risk once the intervention period had ended. Analysis of the postintervention follow-up period (~7 years) revealed a risk reduction of 28% with lifestyle modification compared with a nonsignificant risk reduction of 5% from drug interventions." + ], + [ + "Researchers are expanding our understanding of genetic risk factors for diabetes through ongoing discoveries. Genetic variants associated with increased susceptibility to type 2 diabetes, a disease that affects more than 200 million people worldwide, have been identified (NHGRI & NIDDK, 2007). Such discoveries accelerate efforts to understand genetic contributions to chronic illness, as well as facilitate greater investigation of how these genetic factors interact with each other and with lifestyle factors. Ultimately, once the association of these variants with diabetes are confirmed, genetic tests may be utilized to identify (even before escalating blood sugars) those individuals, like Vanessa, who may be able to delay or prevent diabetes with healthy lifestyle decisions and behaviors. Information to assist nurses in this challenge is available in a toolkit \"Your Game Plan for Preventing Type 2 Diabetes\" (Your Game Plan, n.d.). Would you have known whether or not genetic testing was available for Vanessa? If you had said no to this question but could have explained the progress currently being made in understanding diabetes, Vanessa would have had access to the best care possible today.", + "Recent gene discovery efforts have provided further evidence to support such assertions. Though, at this point, the identity of some of the genes mechanistically responsible for the association signals uncovered remains uncertain, it remains possible to determine, through studies of healthy populations, whether the type 2 diabetes-susceptibility variants themselves are mediating their effects through disruption of β-cell function or insulin action. With the exception of FTO (known to influence type 2 
diabetes risk through a primary effect on adiposity) and PPARG (long implicated in insulin action), all confirmed susceptibility alleles appear to exert their predominant effect on diabetes pathogenesis through abrogation of β-cell function (or mass) (62,74-77). It would be wrong to extrapolate too far: the known variants account for only a small proportion of overall genetic risk, and the focus on lean type 2 diabetes cases, which has characterized several of the genome-wide association (GWA) studies (58,59), may have generated a bias toward detection of variants detrimental to β-cell performance. Nonetheless, the picture that emerges is one where alterations of β-cell function seem to be playing the predominant role with respect to the inherited component of disease predisposition.", + "In brief: Gardner et al. queried the genomes of over 400,000 individuals and identified novel genes associated with type 2 diabetes risk. The biological function of these genes highlights potentially new therapeutic avenues for treatment of type 2 diabetes.", + "Targeted Treatment and Prevention 4][75] In monogenic forms of diabetes, at least, genetic testing already drives the choice of therapy. For example, in patients who have maturity-onset diabetes of the young due to mutations in the gene encoding glucokinase (GCK), the hyperglycemia is mild and stable, the risk of complications is low, and dietary management is often sufficient. In contrast, in patients who have maturity-onset diabetes of the young due to mutations in HNF1A, the disease follows a more aggressive course, with a greater risk of severe complications, but is particularly responsive to the hypoglycemic effects of sulfonylureas. 
62,73 Most children with neonatal diabetes have mutations in KCNJ11 or ABCC8, adjacent genes that jointly encode the beta-cell ATP-sensitive potassium channel that mediates glucose-stimulated insulin secretion and is the target of sulfonylureas. In such children, treatment with sulfonylureas has proved more effective and convenient than the lifelong insulin therapy previously considered the default option. 74,75 In children with severe obesity due to profound leptin deficiency, exogenous leptin therapy is lifesaving. 76 As yet, there are insufficient genetic data to support management decisions for common forms of type 2 diabetes and obesity. 77 Although the TCF7L2 genotype is associated with variation in the response to sulfonylurea treatment, 78 the effect is too modest to guide the care of individual patients. For the time being, the contribution of genetic information to therapy is most likely to come through the drug-discovery pipeline. Information from genetic studies could be used to identify new targets for pharmaceutical intervention that have validated effects on physiological characteristics, to provide information about new and existing targets (e.g., clues about the long-term safety of pathway intervention), 32 and to characterize high-risk groups to enable more efficient clinical trials of agents designed to reduce the progression of type 2 diabetes or obesity or the risk of complications. From Genetics to Biology. An improved understanding of pathophysiology achieved through genetic discovery provides new opportunities for treatment, diagnosis, and monitoring. Studies of risk variants for type 2 diabetes in healthy populations have shown that most variants act through perturbation of insulin secretion rather than insulin action, establishing inherited abnormalities of beta-cell function or mass (or both) as critical components of the progression to type 2 diabetes (Fig. 3). 
22,50 Type 2 diabetes results when pancreatic beta cells are unable to secrete sufficient insulin to maintain normoglycemia, typically in the context of increasing peripheral insulin resistance.The beta-cell abnormalities fundamental to type 2 diabetes are thought to include both reduced beta-cell mass and disruptions of beta-cell function.Insulin resistance can be the consequence of obesity or of obesity-independent abnormalities in the responses of muscle, fat, or liver to insulin.Examples of susceptibility variants that, given current evidence, are likely to influence predisposition to type 2 diabetes by means of each of these mechanisms are shown.For type 2 diabetes and obesity, the discovery of causal genes (Fig. 1 and 2) has followed three main waves.The first wave consisted of family-based linkage analyses (see the Glossary) and focused candidate-gene studies.These proved effective in identifying genes responsible for extreme forms of early-onset disease segregating as single-gene (mendelian) disorders.Genes underlying several distinct, familial forms of nonautoimmune diabetes -including maturity-onset diabetes of the young, mitochondrial diabetes with deafness, and neonatal diabetes -were characterized (see the review by Waterfield and Gloyn 3 ).Similar approaches revealed mutations in genes responsible for rare forms of severe childhood obesity, including the genes encoding leptin, the leptin receptor, and proopiomelanocortin (see the review by O'Rahilly 4 ).These discoveries have provided insights into processes critical for the maintenance of normal glucose homeostasis and energy balance and clues to the inner workings of the pancreatic beta cell and hypothalamus.For many families, this information has led to improved diagnostic and therapeutic options (described in more detail below).", + "Gene-lifestyle interaction studies supporting the protective role of diet, exercise or combined lifestyle interventions in individuals genetically susceptible to 
obesity and type 2 diabetes.", + "INTRODUCTION Diabetes is a common, chronic disease that profoundly impacts health and longevity. Susceptibility is influenced by inheritance, and there has been substantial progress in identifying genes which, when mutated, influence individual risk of disease. Through study of common and rare forms, both polygenic and monogenic, diabetes genetics encompasses many pressing issues in human genetic research.", + "Advances in technology and analytical approaches have identified genes linked with type 2 diabetes. With use of candidate-gene approaches, PPARG was the first gene identified. 18 Subsequently, mostly with use of genomewide association studies, more than 50 gene loci have been linked with type 2 diabetes. 19 Furthermore, 53 loci have been linked with concentrations of insulin and glucose (however, not always with both fasting and 2 h concentrations of glucose), of which 33 are also associated with type 2 diabetes. 19,20 Although some loci are associated with obesity and insulin resistance, most are linked with β-cell function. 21 Gene products for most of these loci have not been definitively identified. Together, these genes do not explain much of the genetic basis of type 2 diabetes; the use of genotype risk scores only slightly improves prediction of subsequent diabetes compared with more frequently used clinical risk factors. 22,23 Aside from obvious increases in caloric intake and decreased energy expenditure, other environmental factors seem to be important. Nutrient composition, specifically increased amounts of dietary fat (particularly saturated fat), are important to development of obesity, insulin resistance, β-cell dysfunction, and glucose intolerance. 24 Furthermore, an ageing-associated reduction in the responsiveness of β cells to carbohydrate partly underlies the fall in glucose tolerance with ageing. 
25 The in-utero environment, established partly by the mother's body size, could produce epigenetic and gene-expression changes that affect the risk of development of obesity and type 2 diabetes for the offspring. 26 Recent Figure 1: Feedback loop between islet β cells and insulin-sensitive tissues (A) Insulin interacts in the liver to suppress glucose production, and in muscle and adipose tissue to stimulate uptake of glucose, amino acids, and fatty acids. The amount of insulin released to maintain normal glucose homoeostasis is established by prevailing insulin sensitivity. This feedback is probably mediated through neuronal and humoral mechanisms, but exact mediators are still not known. (B) When insulin resistance develops in insulin-sensitive tissues, feedback to β cells ensures that the β cells increase insulin output to maintain normal glucose tolerance. (C) When β cells are incapable of increasing insulin output in the presence of insulin resistance, the result is development of increased glucose concentrations, which initially manifests as impaired glucose tolerance. Because β-cell dysfunction progresses, further elevations in glycaemia occur and diabetes is the eventual result.", + "The availability of detailed information on gene-environment interactions may enhance our understanding of the molecular basis of T2D, elucidate the mechanisms through which lifestyle exposures influence diabetes risk, and possibly help to refine strategies for diabetes prevention or treatment. The ultimate hope is genetics might one day be used in primary care to inform the targeting of interventions that comprise exercise regimes and other lifestyle therapies for individuals most likely to respond well to them.", + "Although the number of disease conditions for which the biomedical literature reports positive indications of genetic contributions increases weekly, diabetes has enjoyed a relatively long history of geneticized explanations. Medical geneticist James Neel's (1962) famous thrifty 
genotype hypothesis, for example, postulated that in the early stages of evolution those people who had a \"quick insulin trigger\" could rapidly convert sugar to fat in times of famine. Accordingly, peoples who have recently undergone a shift from hunter-gathering to a modern sedentary lifestyle (with concomitant energy dense food intake) are at increased risk of diabetes because they still carry genes that conferred this selective advantage. \"The Coca Colonization\" hypothesis (Zimmet 1997), as the thrifty genotype hypothesis is sometimes called, posits that recently \"primitive\" groups have undergone a \"domestication of lifestyle\" as they have moved to urban areas or lost their old way of life (Neel 1962, 1982; Zimmet 1982). According to this hypothesis, these populations have, over time, evolved genetic traits that could metabolically compensate for periods of food scarcity. Because such scarcity is no longer the norm, the theory contends, the phenotypic consequence of thrifty genes in combination with the abundance of food and sedentary lifestyle typical of contemporary urban living make for impaired metabolic regulation of glucose. In other words, diabetes is thought to result from a genetic anachronism.", + "In support of our focus on developmental genes, pathway analysis of recent genome-wide association studies, which so far have yielded few T2D candidate genes, provided an integrated interpretation of the highest ranked risk genes for T2D [97]. This analysis found that lipid metabolism and developmental genes were significantly over-represented in the upper ranked genes of the T2D genome-wide association studies, an observation based on thousands of samples, and one strongly consistent with the present independent analysis. Combined, we believe this presents strong evidence that developmental genes may play a role in setting or regulating the long-term responses of skeletal muscle to diabetes.", + "It has long been understood that genetics play a role 
in predisposition to type 2 diabetes (1). Mutations giving rise to several rare monogenic forms of this disorder have been cloned, including mutations in the insulin gene and in a number of genes conferring lean early-onset type 2 diabetes (maturity-onset diabetes of the young [MODY]); however, no gene predisposing to the common obese adult-onset phenotype has been identified. One important reason for this is the substantial locus heterogeneity associated with diabetes risk. Despite identification of at least five MODY loci to date, there remain pedigrees that segregate autosomal dominant type 2 diabetes not attributable to detectable mutations in any of these genes (2). Similarly, it has been recognized that as many as 10% of patients diagnosed with type 2 diabetes may instead suffer from a disease etiologically (and presumably genetically) more akin to type 1 diabetes (3).", + "The future will see intensified research and improvement in such methodologies to identify and characterise the multiple genes underlying complex diseases. One of the most important goals of genetic studies of diabetes is to determine which multilocus genotypes (across all susceptibility loci) create the highest risk for development of diabetes. Individuals with those genotypes would be targeted for treatment to prevent diabetes when safe and effective prophylactic therapies become available. It is possible that several prophylactic options could be available, with effectiveness depending on the exact set of predisposing genes carried by the at-risk person. Thus, the next generation of genetic studies of Type I diabetes (and other complex disorders) will involve dissection of gene-gene interactions in order to clarify which persons, by virtue of their multilocus genotype, are most susceptible to diabetes. This research will be accompanied by studies of gene-environment interaction, when the relevant non-genetic factors are more clearly understood (e.g., do differences in diabetes susceptibility via 
antiviral defence genes relate primarily to certain types of virus? ).", + "The molecular mechanisms involved in the development of type 2 diabetes are poorly understood.Starting from genome-wide genotype data for 1924 diabetic cases and 2938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls and by integration of our findings with equivalent data from other international consortia.We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8.Our findings provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect.The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.", + "The molecular mechanisms involved in the development of type 2 diabetes are poorly understood.Starting from genome-wide genotype data for 1924 diabetic cases and 2938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls and by integration of our findings with equivalent data from other international consortia.We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8.Our findings provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect.The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.", + "Genetic factors appear to play a role in 
determining an individual's risk of developing diabetes.It is hoped that genetic studies will ultimately identify key genetic elements that help determine susceptibility to diabetes, disease progression, and responsiveness to specific therapies, as well as help identify novel targets for future intervention.A substantial number of genetic loci, gene polymorphisms, and mutations have already been reported as having variable degrees of association with one or other type of diabetes (type 1, type 2, maturity onset diabetes of the young [MODY]), while others appear to be involved in response to antihyperglycemic agents.We have compiled the following glossary of genetic and genomic terms relating to diabetes, which we hope will prove a useful reference to researchers and clinicians with an interest in this disease.This is by no means an exhaustive list, but includes many of the genetic loci and variants that have been studied in association with diabetes.Gene encoding insulin-like growth factor 2 mRNA binding protein 2 (also known as IMP-2).SNPs in the gene have been associated with type 2 diabetes IFIH1", + "More than 65 loci, encoding up to 500 different genes, have been implicated by genome-wide association studies (GWAS) as conferring an increased risk of developing type 2 diabetes (T2D).Whilst mouse models have in the past been central to understanding the mechanisms through which more penetrant risk genes for T2D, for example, those responsible for neonatal or maturity-onset diabetes of the young, only a few of those identified by GWAS, notably TCF7L2 and ZnT8/SLC30A8, have to date been examined in mouse models.We discuss here the animal models available for the latter genes and provide perspectives for future, higher throughput approaches towards efficiently mining the information provided by human genetics.More than 65 loci, encoding up to 500 different genes, have been implicated by genome-wide association studies (GWAS) as conferring an increased risk of 
developing type 2 diabetes (T2D).Whilst mouse models have in the past been central to understanding the mechanisms through which more penetrant risk genes for T2D, for example, those responsible for neonatal or maturity-onset diabetes of the young, only a few of those identified by GWAS, notably TCF7L2 and ZnT8/SLC30A8, have to date been examined in mouse models.We discuss here the animal models available for the latter genes and provide perspectives for future, higher throughput approaches towards efficiently mining the information provided by human genetics.", + "BackgroundMultiple genetic loci have been convincingly associated with the risk of type 2 diabetes mellitus.We tested the hypothesis that knowledge of these loci allows better prediction of risk than knowledge of common phenotypic risk factors alone." + ], + [ + "IntroductionDiabetes is one of the most common metabolic disorders.It is estimated that the number of diabetes patients worldwide has already exceeded 200 million [92].This creates a need to understand the etiology of the disease, genetic and enviromental factors influencing development of diabetes.Diabetes is a group of metabolic diseases that are characterized by elevated glucose level.Poorly controlled or undiagnosed disease may be associated with so called late complications of diabetes such as accelerated atherosclerosis, blindness, renal insufficiency, stroke, and amputation of extremities.Diabetes is also associated with a decrease in life expectancy.These facts make diabetes a major health problem.There are two main forms of diabetes: type 1 and type 2. 
It is type 2 diabetes (T2DM), previously known as non-insulin dependent, that is the much more prevalent form, responsible for 90% of the disease prevalence [92,125].In the majority of the industrialised world societies this disease affects a few percent of the entire population [125].Recent publications indicate an increase in the prevalence of diabetes world-wide, especially in younger people [80] affecting a substantial percentage of the pediatric age group in some populations [30].T2DM is characterised by the presence of two basic abnormalities: impairment of insulin secretion and decrease in insulin sensitivity [52].The disease creates a large pathophysiological spectrum from a predominantly secretory defect with moderate, if any, degree of insulin resistance to a predominantly insulin resistant disease with relative insulin deficiency.Whereas insulin resistance can be demonstrated early in life, many years before the diagnosis of diabetes, impairment of insulin secretion develops later in life, usually along with the onset of impaired glucose tolerance [52].", + "The ADA lists four subtypes of diabetes based on the clinical symptoms at time of presentation, [4] namely, Type 1 diabetes, Type 2 diabetes (T2D), gestational diabetes, and diabetes due to specific causes (genetic defects causing deficient insulin secretion or action, diseases of pancreas, use of certain drugs such as steroids, thiazides among others).Of these, T2D is the most prevalent (close to 90% of all cases) and is the major cause of morbidity and mortality in both developed and developing nations [1].At times it is difficult to assign a patient to a particular subtype due to the difference in conditions associated with hyperglycemia at the time of diagnosis [4,7].For example, a lady diagnosed with gestational diabetes mellitus during pregnancy is highly susceptible to develop T2D later.Therefore, other than proper treatment during and post pregnancy, a regular follow-up is required for 
stratifying disease risk, and for timely management before progression to another subtype.It is clear that the classification of diabetes may not be as simple as just categorizing it into any one of the four given subtypes due to its miscellaneous nature.Every case needs to be considered at the time of presentation, on the basis of the risk factors or underlying cause of hyperglycemia, the clinical symptoms, and disease prognosis.Table 1 lists the various subtypes of diabetes based on the classification suggested by the ADA [4].", + "Type 2 diabetes is the most common type of diabetes with prevalence in the United Kingdom of around 4%.It is most commonly diagnosed in middle-aged adults, although more recently the age of onset is decreasing with increasing levels of obesity (Pinhas-Hamiel and Zeitler, 2005).Indeed, although development of the disease shows high hereditability, the risk increases proportionally with body mass index (Lehtovirta et al., 2010).Type 2 diabetes is associated with insulin resistance, and a lack of appropriate compensation by the beta cells leads to a relative insulin deficiency.Insulin resistance can be improved by weight reduction and exercise (Solomon et al., 2008).If lifestyle intervention fails, there are a variety of drugs available to treat type 2 diabetes (Krentz et al., 2008), which can be divided into five main classes: drugs that stimulate insulin production from the beta cells (e.g.sulphonylureas), drugs that reduce hepatic glucose production (e.g.biguanides), drugs that delay carbohydrate uptake in the gut (e.g.a-glucosidase inhibitors), drugs that improve insulin action (e.g.thiazolidinediones) or drugs targeting the GLP-1 axis (e.g.GLP-1 receptor agonists or DPP-4 inhibitors).", + "RACIALIZED ETIOLOGIES OF DIABETESDiabetes is not one disease but many.More than 90 percent of all diabetics have type 2 diabetes, which is characterized by elevated blood glucose triggered by a combination of poor insulin production, insulin 
resistance in skeletal muscle and lipid tissue, or both.Type 2 diabetes is also known as Non-Insulin-Dependent Diabetes because, unlike the rarer form of the disease, people with type 2 diabetes produce insulin and therefore seldom need therapeutic insulin at the initial onset of disease.Type 2 diabetes (hereafter, \"diabetes\"), like heart disease, hypertension and asthma, is referred to as a complex disease because its putative determinants lay in both environmental and biological domains.That is, diabetes is caused by a still-unknown combination of factors that include lifestyle, diet, physical activity, and an array of physiological triggers.", + "IntroductionDiabetes mellitus (DM) is a group of metabolic diseases characterized by hyperglycemia, which results from defects in insulin secretion, insulin activity or both.DM is associated with the dysfunction and failure of different organs, such as the blood vessels, heart and kidneys [1], and this disease is considered a global burden [2].The International Diabetes Federation's most recent estimates indicate that 8.3% of adults (382 million individuals) have diabetes, and the number of individuals with this disease is expected to rise beyond 592 million in less than 25 years [2].The vast majority of cases of DM fall into two broad etiopathogenetic categories: type 1 and type 2 DM (T1DM and T2DM, respectively).T1DM, previously named insulin-dependent diabetes or juvenile-onset diabetes, results from cellular-mediated autoimmune destruction of pancreatic cells; therefore, patients are dependent on exogenous insulin.Individuals with T1DM are considered to have a genetic predisposition, although environmental factors, such as dietary components, also contribute to T1DM development [3].Thus, T1DM is the result of a complex interrelation among cells, the immune system and environmental factors in genetically susceptible individuals [3].T1DM appears predominately in children and young adults and affects 5%-10% of 
diabetic patients [2]. T2DM is a chronic disorder caused by insulin secretion deficiency and insulin resistance. T2DM is a complex trait that results from the contribution of many genes [4], many environmental factors, including diet [5], and the interactions among these genes and environmental factors. T2DM is more common among individuals aged 40 to 60 years and accounts for most cases of DM (more than 90%) [2].", + "most common form of diabetes (90% of all diabetic patients), mainly characterized by insulin resistance. The main causes of T2D include lifestyle, physical activity, dietary habits and heredity, whereas T1D is thought to be due to autoimmunological destruction of the Langerhans islets hosting pancreatic β-cells. T1D affects almost 10% of all diabetic patients worldwide, with 10% of them ultimately developing idiopathic diabetes. Other forms of DM, classified on the basis of insulin secretion profile and/or onset, include Gestational Diabetes, endocrinopathies, MODY (Maturity Onset Diabetes of the Young), neonatal, mitochondrial, and pregnancy diabetes. The symptoms of DM include polyuria, polydipsia, and significant weight loss among others. Diagnosis depends on blood glucose levels (Fasting plasma glucose = 7.0 mmol/L) [15]. From the perspective of DM, although there are several types of diabetes, the overall results suggest that the articles reviewed refer to T1D and T2D, with T2D representing the majority of the articles. A few articles refer to prediabetes and only one pertains to the metabolic syndrome, which is a term for metabolism-related pathophysiology. The types of data used in each case of the present collection were either clinical, genetic, electrochemical, chemical or medical. Only a few articles used clinical data in combination with genetic data. In addition, it is worth mentioning that the vast majority of the articles reviewed handled only clinical datasets. When it comes to prediction, the main biomarkers used involve 
anthropometric parameters, demographic characteristics, known risk factors, medical and drug history data, laboratory measurements, and epidemiological data.The most", + "Classification of DiabetesOn the basis of insulin deficiency, diabetes can be classified into the following types as follows.", + "| INTRODUCTIONToday, more than 265 million people are affected across the world.It is estimated that by the year 2030 this number will reach 366 million people (about 4/4 percent of the world's population), and now the cause of death is more than 1.1 million per year (including 50% of the population under-70 years of age and 55% of women).On the other hand, given its negative effect on the economic growth of developing countries, it calls for universal mobilization to combat this disease (Bhattacharya, Dey, & Roy, 2007).Diabetes or diabetes mellitus is referred to as a heterogeneous group of metabolic disorders characterized by chronic hyperglycemia and carbohydrate, fat and protein metabolism disorders that result from a defect in the secretion of insulin, or impairment in its function, or both.Types of diabetes mellitus include type 1, type 2 diabetes and other kind of diabetes, but the two most common types of diabetes mellitus are type 1 and type 2, which are different in several aspects (Meshkani, Taghikhani, Mosapour et al., 2007).Type 1 diabetes has been identified with autoimmune destruction of pancreatic beta cells (insulin secreting cells) and accounts for about 5% of all diabetic people, while type 2 diabetes is a predominant disorder characterized by insulin resistance or a relative decline in insulin production, and accounts for about 90% of all types of diabetes mellitus (Meshkani, Taghikhani, Al-Kateb et al., 2007).Important factors that predispose a person to type 2 diabetes are multifactorial, including genetic factors and environments.However, its inheritance has certainly not been proven, but it is believed that first-degree relatives of diabetic 
patients have a higher chance to develop the disease.In this regard, recognizing gene polymorphisms of this disease seems to be necessary (Hring et al., 2014).Multiple genes have been studied in the pathogenesis of type 2 diabetes.", + "CONCLUSIONSDiabetes is currently broadly classified as type 1, type 2, gestational, and a group of \"other specific syndromes. \"However, increasing evidence suggests that there are populations of individuals within these broad categories that have subtypes of disease with a well-defined etiology that may be clinically characterized (e.g., LADA, MODY).These developments suggest that perhaps, with more focused research in critical areas, we are approaching a point where it would be possible to categorize diabetes in a more precise manner that can inform individual treatment decisions.Type 2 DiabetesIn the U.S., an estimated 95% of the nearly 30 million people living with diabetes have type 2 diabetes.An additional 86 million have prediabetes, putting them at high risk for developing type 2 diabetes (9).Among the demographic associations for type 2 diabetes are older age, race/ ethnicity, male sex, and socioeconomic status (9).", + "Type 2 diabetes mellitus (T2DM) is characterized by dysregulation of carbohydrate, lipid and protein metabolism, and results from impaired insulin secretion, insulin resistance or a combination of both.Of the three major types of diabetes, T2DM is far more common (accounting for more than 90% of all cases) than either type 1 diabetes mellitus (T1DM) or gestational diabetes.Over the past few decades, our understanding of the development and progression of T2DM has evolved rapidly.Its main cause is progressively impaired insulin secretion by pancreatic -cells, usually upon a background of pre-existing insulin resistance in skeletal muscle, liver and adipose tissue 1", + "Background Diabetes is presently classified into two main forms, type 1 and type 2 diabetes, but type 2 diabetes in particular is highly 
heterogeneous.A refined classification could provide a powerful tool to individualise treatment regimens and identify individuals with increased risk of complications at diagnosis.", + "IntroductionIn 2018, a ground-breaking study identified five novel subtypes of adult-onset diabetes: severe autoimmune diabetes (SAID, including type 1 diabetes and latent autoimmune diabetes in adults [LADA]) and four subtypes of type 2 diabetes (severe insulin-deficient diabetes [SIDD], severe insulin-resistant diabetes [SIRD], mild obesity-related diabetes [MOD] and mild agerelated diabetes [MARD]) [1].These subtypes differ in their clinical characteristics, complications and genetic backgrounds [1,2].It is unclear if they also differ in modifiable risk factors.", + "Animal models of Type 2 diabetesType 2 diabetes represents a heterogeneous group of disorders characterized by insulin resistance and impaired insulin secretion and defined by a raised fasting or post-challenge blood glucose.Some subtypes of diabetes are now recognized as being because of specific single gene defects [e.g. 
the maturityonset diabetes of the young (MODY) syndromes [94], syndromes of severe insulin resistance [95] and mitochondrial diabetes [96]].However, for most patients with diabetes, several (if not many) genetic and environmental factors contribute to the causation and progression of the disease and also the late complications.", + "The disease burden related to diabetes is high and rising in every country, fuelled by the global rise in the prevalence of obesity and unhealthy lifestyles.The latest estimates show a global prevalence of 382 million people with diabetes in 2013, expected to rise to 592 million by 2035.The aetiological classification of diabetes has now been widely accepted.Type 1 and type 2 diabetes are the two main types, with type 2 diabetes accounting for the majority (>85%) of total diabetes prevalence.Both forms of diabetes can lead to multisystem complications of microvascular endpoints, including retinopathy, nephropathy and neuropathy, and macrovascular endpoints including ischaemic heart disease, stroke and peripheral vascular disease.The premature morbidity, mortality, reduced life expectancy and financial and other costs of diabetes make it an important public health condition.The disease burden related to diabetes is high and rising in every country, fuelled by the global rise in the prevalence of obesity and unhealthy lifestyles.The latest estimates show a global prevalence of 382 million people with diabetes in 2013, expected to rise to 592 million by 2035.The aetiological classification of diabetes has now been widely accepted.Type 1 and type 2 diabetes are the two main types, with type 2 diabetes accounting for the majority (>85%) of total diabetes prevalence.Both forms of diabetes can lead to multisystem complications of microvascular endpoints, including retinopathy, nephropathy and neuropathy, and macrovascular endpoints including ischaemic heart disease, stroke and peripheral vascular disease.The premature morbidity, mortality, 
reduced life expectancy and financial and other costs of diabetes make it an important public health condition.", + "IntroductionDiabetes mellitus, also known as simply diabetes, is the most prevalent disease in Westernized, developed countries, and the prevalence of this disease increases with age, accounting for 8.4% of all deaths worldwide [1].Diabetes is a well-recognized multifactorial endocrine metabolic disorder characterized by hyperglycemia (high blood sugar levels over a prolonged period) triggered by insulin secretion deficiencies, insulin action or both [2].The chronic hyperglycemia of diabetes is associated with dysfunction, long-term damage and failure of different organs, particularly the kidneys, heart, blood vessels, nerves and eyes.The development of diabetes involves various pathogenic processes including autoimmune destruction of the pancreatic -cells with subsequent insulin insufficiency which causes insulin resistance [3].The reason for the carbohydrate, fat and protein metabolism disorders in diabetes is insulin deficient activity on target tissues.Insulin deficient action results from insufficient insulin secretion and/or diminished tissue response [4].The great majority of diabetes cases fall into two broad categories of etiopathogenetics.Type 1 diabetes (T1D), falls in one category, is caused due to an absolute deficiency in insulin secretion from pancreatic beta cells.Genetic marker tests and serological evidences of an autoimmune pathological process in pancreatic islets can often be utilized for identification of individuals with increased risk of developing T1D [5].The more prevalent form of diabetes is type 2 diabetes mellitus (T2DM), which falls in the second category and is caused by a combination of insulin resistance and an inadequate compensatory insulin secretory response [6].Consequently, a degree of hyperglycemia occurs that might cause pathological and functional changes in different target tissues but without clinical 
symptoms and the condition may persist for a long time before T2DM is detected.There are other specific types of diabetes, such as exocrine pancreatic diseases, endocrinopathies, diabetes induced by drugs or chemicals, infection, uncommon forms of immune-mediated diabetes, other genetic syndromes, such as Down syndrome chromosomal abnormalities, Klinefelter syndrome, and sometimes diabetes-related Turner syndrome.Depending on the severity of the underlying disease, the degree of hyperglycemia can change over time [7].", + "TYPE 1 DIABETESDiabetes (or Diabetes mellitus) is a set of disorders characterized by either an absolute or a relative deficiency of insulin and/or insulin resistance.T1D accounts for about 10% of all diabetes cases (Maahs et al., 2010).It has been reported to be the second most prevalent chronic disease of childhood, with a peak onset at about twelve years (Imkampe and Gulliford, 2011)." + ] + ], + "task_id": [ + "849E78D8214245F8E8167E78C01BEE60", + "CB93CE86DA18F287DBEF22CB29C560CF", + "FEE16F5E4D12AF7E7B0DDBF6F047EB76", + "DF05AACA4A1466AC1753DE13631A6ACD", + "57CB850E74BC7A26A645CAAB823D35CD" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_2.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_2.json new file mode 100644 index 0000000..974e503 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_2.json @@ -0,0 +1,110 @@ +{ + "question": [ + "How many types of diabetes exist?", + "Is there a direct association between aging and susceptibility to having diabetes?", + "How does genetics influence the emergency of diabetes? ", + "what genes are associated with diabetes?", + "What causes diabetes?" + ], + "answer": [ + "There are primarily four types of diabetes: Type 1, Type 2, gestational diabetes, and diabetes due to specific causes. However, there are also subtypes within these broad categories, such as latent autoimmune diabetes of adults (LADA) and maturity onset diabetes of the young (MODY).", + "Yes, there is a direct association between aging and susceptibility to having diabetes. The risk of onset of type 2 diabetes increases with age, and most diabetic patients in certain regions are 40 years old or more. Additionally, aging is associated with changes in body composition and glucose tolerance, which can contribute to the development of diabetes.", + "Genetics plays a significant role in the emergence of diabetes. Certain forms of the disease result from mutations in a single gene, while others are multifactorial in origin. For example, monogenic forms of diabetes, which account for approximately 5% of cases, are caused by mutations in genes encoding insulin, the insulin receptor, and other factors. In type 1 diabetes, gene variants in the human leukocyte antigen (HLA) locus and about 50 other genes contribute to the genetic risk. These genes modulate immune regulation, viral responses, and responses to environmental signals. Genetic susceptibility to type 1 diabetes is also determined by genes related to immune function. 
Both type 1 and type 2 diabetes are polygenic diseases where many common variants contribute to overall disease risk.", + "The genes associated with diabetes include PPARG, KCNJ11, TCF7L2, SLC30A8, HHEX, CDKN2A, IGF2BP2, CDKAL1, IRS1, CCR5, FTO, NOTCH2, WFS1, JAZF1, ADIPOQ, AHSG, CAPN10, ENPP1, PPARGC1A, SREBF1, PDX1, PFAS, GCK, GIGYF1, HNF1A, TNRC6B, and G6PC2.", + "Diabetes is caused by a combination of genetic and environmental factors. This includes a family history of diabetes, increased age, hypertension, lack of physical exercise, obesity, and certain dietary habits. In type 2 diabetes, the body develops resistance to insulin due to the malfunction of insulin-producing -cells. Some cases of diabetes are also linked to single gene defects. Additionally, exposure to certain environmental pollutants has been associated with the development of diabetes." + ], + "contexts": [ + [ + "The prevalence of diabetes mellitus worldwide is sobering; the International Diabetes Federation estimates that 415 million people have diabetes mellitus, with 90% of these individuals having type 2 diabetes mellitus (T2DM) 1 .T2DM occurs when pancreatic -cells fail to release enough insulin to meet the demands of insulin-responsive tissues, which safely store and metabolize glucose.Driven by both genetic and environmental risk factors, T2DM is a complex, multifactorial disorder.Although the increasing prevalence of T2DM is driven by changes in our environment and mirrors the increase in obesity, the greater concordance for T2DM found in monozygotic compared with dizygotic twins has long provided evidence for a genetic component in T2DM risk 2 .", + "In the UK alone, nearly 1.8 million people are already recognized to have this disorder (consuming w5% of the total National Health Service budget), and the search is on to find the 'missing million' who are living with the condition but in whom the diagnosis has yet to be made. 
3In the USA, the situation appears to be even more serious with some commentators predicting that one in every three Americans born in the year 2000 will go on to develop diabetes during their lifetime, bringing unprecedented costs in terms of healthcare dollars as well as human morbidity and mortality. 4The majority (w90%) of these cases will be type 2 in origin, reflecting a trend towards obesity and more sedentary lifestyles as the 'norm' rather than the exception in 'developed' societies.Indeed, the face of T2DM is changing, as a condition that was once considered the preserve of middle/old age is increasingly diagnosed in young adults and even children, reflecting the high rates of obesity (and, in particular, visceral adiposity) in these populations.", + "Table 1 lists the various subtypes of diabetes based on the classification suggested by the ADA [4].The ADA lists four subtypes of diabetes based on the clinical symptoms at time of presentation, [4] namely, Type 1 diabetes, Type 2 diabetes (T2D), gestational diabetes, and diabetes due to specific causes (genetic defects causing deficient insulin secretion or action, diseases of pancreas, use of certain drugs such as steroids, thiazides among others).Of these, T2D is the most prevalent (close to 90% of all cases) and is the major cause of morbidity and mortality in both developed and developing nations [1].At times it is difficult to assign a patient to a particular subtype due to the difference in conditions associated with hyperglycemia at the time of diagnosis [4,7].For example, a lady diagnosed with gestational diabetes mellitus during pregnancy is highly susceptible to develop T2D later.Therefore, other than proper treatment during and post pregnancy, a regular follow-up is required for stratifying disease risk, and for timely management before progression to another subtype.It is clear that the classification of diabetes may not be as simple as just categorizing it into any one of the four given 
subtypes due to its miscellaneous nature.Every case needs to be considered at the time of presentation, on the basis of the risk factors or underlying cause of hyperglycemia, the clinical symptoms, and disease prognosis.", + "IntroductionGlobally, diabetes affects more than 400 million people (World Health Organization, 2016), with Type 1 (insulin-dependent) diabetes (T1D) accounting for up to 10 percent of cases (American Diabetes Association, 2009).In the United States, T1D occurs at a rate of 15-30 cases per 100,000 children aged 0-14 years annually (International Diabetes Foundation, 2017;Maahs et al., 2010), with similar prevalence in Canada, Europe, Australia, and New Zealand (Fig. 1) (Derraik et al., 2012;International Diabetes Foundation, 2017;Maahs et al., 2010).By contrast, the estimated incidence rate of T1D among Asians, South Americans, and Africans is below 15 cases per 100,000 children (Fig. 1) (International Diabetes Foundation, 2017;Maahs et al., 2010).The global incidence of T1D has been rising by 3-5% per annum over the past two decades, with a notable increase in children below 10 years of age (Diamond Project, 2006;Patterson et al., 2009).", + "Animal Models9.2% in women and 9.8% in men, with approximately 347 million people suffering from the disease worldwide in 2008 (Danaei et al., 2011).There are several different classifications of diabetes, the most common being type 1 and type 2 diabetes.Type 2 diabetes is the most common type of diabetes with prevalence in the United Kingdom of around 4%.It is most commonly diagnosed in middle-aged adults, although more recently the age of onset is decreasing with increasing levels of obesity (Pinhas-Hamiel and Zeitler, 2005).Indeed, although development of the disease shows high hereditability, the risk increases proportionally with body mass index (Lehtovirta et al., 2010).Type 2 diabetes is associated with insulin resistance, and a lack of appropriate compensation by the beta cells leads to a 
relative insulin deficiency.Insulin resistance can be improved by weight reduction and exercise (Solomon et al., 2008).If lifestyle intervention fails, there are a variety of drugs available to treat type 2 diabetes (Krentz et al., 2008), which can be divided into five main classes: drugs that stimulate insulin production from the beta cells (e.g.sulphonylureas), drugs that reduce hepatic glucose production (e.g.biguanides), drugs that delay carbohydrate uptake in the gut (e.g.a-glucosidase inhibitors), drugs that improve insulin action (e.g.thiazolidinediones) or drugs targeting the GLP-1 axis (e.g.GLP-1 receptor agonists or DPP-4 inhibitors).", + "IntroductionDiabetes impacts the lives of approximately 200 million people worldwide [1], with chronic complications including accelerated development of cardiovascular disease.Over 90% of cases are of type 2 diabetes (T2D), with the bulk of the remainder presenting with type 1 diabetes (T1D).", + "Classification of DiabetesOn the basis of insulin deficiency, diabetes can be classified into the following types as follows.", + "| INTRODUCTIONToday, more than 265 million people are affected across the world.It is estimated that by the year 2030 this number will reach 366 million people (about 4/4 percent of the world's population), and now the cause of death is more than 1.1 million per year (including 50% of the population under-70 years of age and 55% of women).On the other hand, given its negative effect on the economic growth of developing countries, it calls for universal mobilization to combat this disease (Bhattacharya, Dey, & Roy, 2007).Diabetes or diabetes mellitus is referred to as a heterogeneous group of metabolic disorders characterized by chronic hyperglycemia and carbohydrate, fat and protein metabolism disorders that result from a defect in the secretion of insulin, or impairment in its function, or both.Types of diabetes mellitus include type 1, type 2 diabetes and other kind of diabetes, but the two most 
common types of diabetes mellitus are type 1 and type 2, which are different in several aspects (Meshkani, Taghikhani, Mosapour et al., 2007).Type 1 diabetes has been identified with autoimmune destruction of pancreatic beta cells (insulin secreting cells) and accounts for about 5% of all diabetic people, while type 2 diabetes is a predominant disorder characterized by insulin resistance or a relative decline in insulin production, and accounts for about 90% of all types of diabetes mellitus (Meshkani, Taghikhani, Al-Kateb et al., 2007).Important factors that predispose a person to type 2 diabetes are multifactorial, including genetic factors and environments.However, its inheritance has certainly not been proven, but it is believed that first-degree relatives of diabetic patients have a higher chance to develop the disease.In this regard, recognizing gene polymorphisms of this disease seems to be necessary (Hring et al., 2014).Multiple genes have been studied in the pathogenesis of type 2 diabetes.", + "CONCLUSIONSDiabetes is currently broadly classified as type 1, type 2, gestational, and a group of \"other specific syndromes. 
\"However, increasing evidence suggests that there are populations of individuals within these broad categories that have subtypes of disease with a well-defined etiology that may be clinically characterized (e.g., LADA, MODY).These developments suggest that perhaps, with more focused research in critical areas, we are approaching a point where it would be possible to categorize diabetes in a more precise manner that can inform individual treatment decisions.Type 2 DiabetesIn the U.S., an estimated 95% of the nearly 30 million people living with diabetes have type 2 diabetes.An additional 86 million have prediabetes, putting them at high risk for developing type 2 diabetes (9).Among the demographic associations for type 2 diabetes are older age, race/ ethnicity, male sex, and socioeconomic status (9).Type 1 DiabetesBetween 2001 and 2009, there was a 21% increase in the number of youth with type 1 diabetes in the U.S. (7).Its prevalence is increasing at a rate of ;3% per year globally (8).Though diagnosis of type 1 diabetes frequently occurs in childhood, 84% of people living with type 1 diabetes are adults (9).Type 1 diabetes affects males and females equally (10) and decreases life expectancy by an estimated 13 years (11).An estimated 5-15% of adults diagnosed with type 2 diabetes actually have type 1 diabetes or latent autoimmune diabetes of adults (LADA) (12).", + "Background Diabetes is presently classified into two main forms, type 1 and type 2 diabetes, but type 2 diabetes in particular is highly heterogeneous.A refined classification could provide a powerful tool to individualise treatment regimens and identify individuals with increased risk of complications at diagnosis.", + "Diabetes mellitus now affects ~8% of the world's adult population [1], including ~3 000 000 individuals in the UK (with a further 600 000 people affected but presently undiagnosed) [2].Of these cases, > 90% have Type 2 diabetes.Treatments of the complications of the disease, which 
range from stroke, blindness and kidney failure to lower limb amputations and cancer, presently consume ~10% of the National Health Service budget, some 14 bn per year [3].These figures are anticipated to increase further in the next 10 years, driven by increasingly sedentary lifestyles and increases in obesity; the collision between these 'environmental' factors and genetic susceptibility (see below) being the key underlying driver.Whilst existing treatments ameliorate the symptoms of the disease, notably hyperglyca-emia, none target the underlying molecular aetiology.In particular, no available treatments tackle the progressive and largely irreversible loss of insulin production [4] which, in the face of insulin resistance, underlies the progressive deterioration in glucose control.Reductions in b-cell mass [5,6] and dysfunction [7] both contribute to this gradual impairment in insulin release.Recent years have seen an increase in the view that the former may play a less important role than the latter, with a 2008 study by Rahier et al. 
[6] reporting that b-cell mass (and insulin content) in people with Type 2 diabetes was on average ~35% lower than that of healthy control subjects.However, this difference was only ~24% within 5 years of diagnosis, far below levels likely to lead to the symptoms of diabetes.Indeed, given our present inability to monitor b-cell mass prospectively over the course of the disease, it is conceivable that the differences observed post mortem between healthy individuals and those with Type 2 diabetes [5,6] may reflect an increased predisposition to diabetes in those born with a lower than average b-cell mass.", + "INTRODUCTIONType 2 diabetes (T2D) affects an estimated 425 million people worldwide, a number predicted to rise to 629 million by 2045 (1).The disease usually involves insulin resistance but is ultimately the result of pancreatic b cell failure, a sine qua non for disease development (2).In contrast, Type 1 diabetes (T1D) affects a smaller proportion of people with diabetes and is chiefly the result of pancreatic b cell destruction mediated by immune cells (3).", + "IntroductionDiabetes is a complex and heterogeneous disease with a staggering global impact and the most recent estimates indicate 346 million people worldwide suffer from this disease (WHO Diabetes Fact sheet No. 
312, 2011).Type 2 diabetes mellitus (T2DM) is the most common form of diabetes, accounting for >90% of cases, and occurs when peripheral tissue insulin resistance accompanies insufficient b-cell insulin production.While >80% of diabetes deaths occur in low-and middle-income countries [1].India and China have the highest reported prevalence of diabetes with 65 and 98 million in 2013, respectively [2].", + "The disease burden related to diabetes is high and rising in every country, fuelled by the global rise in the prevalence of obesity and unhealthy lifestyles.The latest estimates show a global prevalence of 382 million people with diabetes in 2013, expected to rise to 592 million by 2035.The aetiological classification of diabetes has now been widely accepted.Type 1 and type 2 diabetes are the two main types, with type 2 diabetes accounting for the majority (>85%) of total diabetes prevalence.Both forms of diabetes can lead to multisystem complications of microvascular endpoints, including retinopathy, nephropathy and neuropathy, and macrovascular endpoints including ischaemic heart disease, stroke and peripheral vascular disease.The premature morbidity, mortality, reduced life expectancy and financial and other costs of diabetes make it an important public health condition.The disease burden related to diabetes is high and rising in every country, fuelled by the global rise in the prevalence of obesity and unhealthy lifestyles.The latest estimates show a global prevalence of 382 million people with diabetes in 2013, expected to rise to 592 million by 2035.The aetiological classification of diabetes has now been widely accepted.Type 1 and type 2 diabetes are the two main types, with type 2 diabetes accounting for the majority (>85%) of total diabetes prevalence.Both forms of diabetes can lead to multisystem complications of microvascular endpoints, including retinopathy, nephropathy and neuropathy, and macrovascular endpoints including ischaemic heart disease, 
stroke and peripheral vascular disease.The premature morbidity, mortality, reduced life expectancy and financial and other costs of diabetes make it an important public health condition.", + "Introduction: Is Type 2 Diabetes a Genetic Disorder?According to the World Health Organization (WHO), approximately 350 million people worldwide have diabetes, and this disorder is likely to be the seventh leading cause of death in 2030.Diabetes is an economic burden on healthcare systems, especially in developing countries (World Health Organization, 2013)." + ], + [ + "Our result provides a novel hypothesis on the mechanism for the connection between two aging-related diseases: Alzheimer's disease and type 2 diabetes.", + "There are two major factors that underlie these alarming projections.The first is T2D is associated with age, and Western populations are aging rapidly.The second major explanation is our lifestyles have changed dramatically in recent years.Epidemiological studies have identified strong T2D risk relationships for obesity, sedentary behavior [2][3][4], and diets rich in energy [5], processed carbohydrates [6], and animal fats [7].Collectively, these lifestyle factors impede the actions of insulin and raise hepatic glucose production, which can result in the diminution of endogenous insulin production and T2D.The strongest evidence for a causal relationship between adverse lifestyle behaviors and T2D comes from randomized controlled trials that show intensive lifestyle interventions involving structured exercise regimes which promote habitual physical activity (PA) and have a major beneficial impact on diabetes incidence in high-risk individuals [8,9].Epidemiological studies examining the associations between lifestyle behaviors and diabetes risk have reached similar conclusions as the clinical trials described above.For example, the 14-year follow-up University of Pennsylvania Alumni Health Study [52] (n = 5,990 men aged 39-68 years) showed PA (leisure time 
physical activity [LTPA] expressed in kcal expended per week through walking, stair climbing, and sports) was inversely associated with the incidence of T2D.Incidence rates declined as energy expenditure rose from 500 through 3,500 kcal/week.The age-adjusted relative risk ratio (RR) of T2D was reduced by about 6% for each 500 kcal increment increase in PA energy expenditure.", + "Overall, results were similar in analyses restricted to diabetes mellitus identified at baseline only, although the confidence interval included 1.These results suggest that diabetes mellitus is related to risk of AD in old age.These findings are consistent with the results of 2 large longitudinal cohort studies. 5,6In one study, 5 diabetes mellitus doubled the risk of AD during 2 years of follow-up in a sample of more than 6000 older persons from a defined cohort.The other study, 6 using data from about 2500 Japanese American men, found a similar result: diabetes mellitus approximately doubled the risk of AD.In contrast, 2 other longitudinal studies 7,8 did not demonstrate a significant association between diabetes mellitus and incident AD, but in both, the results were in the direction of increased risk.Some, [9][10][11] but not all, 12 previous studies found that diabetes mellitus was related to change in cognitive function.One factor that may contribute to variability from study to study is that diabetes mellitus may be related to decline in some cognitive systems but not others.4][15] Although diabetes mellitus was related to level of global cognition and multiple cognitive domains at baseline, we found that diabetes mellitus was only related to decline in perceptual speed.The one study 12 that did not find a relation between diabetes mellitus and cognitive decline did not include a measure of perceptual speed.COMMENTIn a cohort of more than 800 older persons, we found that diabetes mellitus sometime in the study was associated with an increased risk of developing AD during a mean of 
5.5 years of observation.The risk of incident AD was 65% higher in those with diabetes mellitus than in those without it.In summary, these findings suggest that diabetes mellitus is associated with AD and decline in cognitive function in older persons.December 12, 2003.DIABETES MELLITUS AND RISK OF ADDuring the follow-up evaluations, 151 persons developed AD, of whom 31 had diabetes mellitus.In a proportional hazards model adjusted for age, sex, and educational level, there was a 65% increase in the risk of developing AD in those with diabetes mellitus compared with those without diabetes mellitus (hazard ratio, 1.65; 95% confidence interval, 1.10-2.47).The cumulative hazard of AD over time, adjusted for age, sex, and educational level, is shown graphically in Figure 1 for typical participants with and without diabetes mellitus.Similar results were found in analyses with diabetes mellitus identified at baseline only (hazard ratio, 1.53; 95% confidence interval, 0.96-2.45).", + "Age. Age is another factor that has a considerable effect on outcomes in obesity and T2DM research.In humans, body weight increases with age and peaks at ~55 years in both men and women.Ageing per se is associated with a redistribution of both the fat-free mass and the fat mass, with the latter increase starting at ~30 years of age 129 .Intramuscular and intrahepatic fat are particularly increased in older persons, and this increase has been linked to insulin resistance 130 .Partially on the basis of these changes, ageing has been proposed to be an independent determinant of glucose tolerance, which progressively worsens with age 131,132 .", + "Age also plays a vital role in the onset of diabetes (Cowie & Eberhardt, 1995).In south-east Asia almost 97% diabetic patients are 40 years old or more (IDF Atlas, 2017).In Bangladesh, the reported age of diabetes is 40 years in 71% urban and 85% rural female, while in the case of male the proportion is 85.5% urban and 86.5% in rural population (IDF 
Atlas, 2017).The current study also pinpointed an exponential increase in the risk of onset of T2DM with the increase of age when 40 years was chosen as the reference (Table S4).Whether age and stress variables are risk factors for type 2 diabetes incidence was assessed by multivariate logistic regression (Table S4).Subjects in the age groups of (40-60) and >60 years had 1.78 (p = .005)and 3.19 (p = .006)greater risk for type 2 diabetes respectively than group of <40 years.Overall, patients under stressful condition are more likely to develop T2DM than that of nonstressed respondent (p = .000).Moreover, when stress is divided into two groups-low stress and high stress, we found that both males (p = .000)and females (p = .000)with high stress were at high risk of diabetes mellitus, whereas the association between low stress and T2DM incidence was significant only among males (Male: p = .002;Female: p = .115).The distribution and association of the genotypes, age, and stress with T2DM have been summarized in Table 3 and Figure 3.There was no difference in T2DM incidence between CT (p = .030)and TT/CC (p = .034)genotype containing people who were in age group of 40-60 years (Table 3).In contrast, people who were more than 60 years old with CT genotype (OR = 4.636, p = .029)were more prone to T2DM than that of TT/CC genotype (OR = 3.714, p = .007)subjects (Table 3).", + "Research GapsThere is a clear correlation of environmental influences to diabetes risk.Yet, the assembled experts agreed that hypothesis-driven research is needed to define direct causal relationships between specific environmental factors and pathophysiologies leading to diabetes.Research efforts need to address environmental etiologies of type 1 diabetes and determine their relative contribution to onset of autoimmunity and progression to symptomatic disease.Whether there is a direct causal role of the intestinal microbiota in pathogenesis of type 1 and type 2 diabetes and response to therapies needs 
to be determined.Public health interventions that successfully reduce the levels of consumption of energy-dense foods and/or reduce sedentary time and increase time spent in physical activity need to be evaluated to determine whether they can reduce type 2 diabetes incidence at a population level.", + "In sum, it is clear that multiple risk factors are involved in diabetes-associated cognitive decrements as well as in dementia in relation to diabetes 38 .On the basis of our assessment of the literature, it is also clear that there are still substantial knowledge gaps on how the risk factors interconnect, how the risk factors translate to potentially modifiable mechanisms and which genetic factors are involved.", + "The aim of this study was to investigate the association between age at natural menopause and risk of developing type 2 diabetes, and to assess whether this association is independent of potential intermediate risk factors for type 2 diabetes.Furthermore, we examined the role of endogenous sex hormone levels in the association between age at natural menopause and type 2 diabetes.Aims/hypothesis In this study, we aimed to examine the association between age at natural menopause and risk of type 2 diabetes, and to assess whether this association is independent of potential mediators.Methods We included 3639 postmenopausal women from the prospective, population-based Rotterdam Study.Age at natural menopause was self-reported retrospectively and was treated as a continuous variable and in categories (premature, <40 years; early, 40-44 years; normal, 45-55 years; and late menopause, >55 years [reference]).Type 2 diabetes events were diagnosed on the basis of medical records and glucose measurements from Rotterdam Study visits.HRs and 95% CIs were calculated using Cox proportional hazards models, adjusted for confounding factors; in another model, they were additionally adjusted for potential mediators, including obesity, C-reactive protein, glucose and 
insulin, as well as for levels of total oestradiol and androgens.Aims/hypothesis In this study, we aimed to examine the association between age at natural menopause and risk of type 2 diabetes, and to assess whether this association is independent of potential mediators.Methods We included 3639 postmenopausal women from the prospective, population-based Rotterdam Study.Age at natural menopause was self-reported retrospectively and was treated as a continuous variable and in categories (premature, <40 years; early, 40-44 years; normal, 45-55 years; and late menopause, >55 years [reference]).Type 2 diabetes events were diagnosed on the basis of medical records and glucose measurements from Rotterdam Study visits.HRs and 95% CIs were calculated using Cox proportional hazards models, adjusted for confounding factors; in another model, they were additionally adjusted for potential mediators, including obesity, C-reactive protein, glucose and insulin, as well as for levels of total oestradiol and androgens.Results During a median follow-up of 9.2 years, we identified 348 individuals with incident type 2 diabetes.After adjustment for confounders, HRs for type 2 diabetes were 3.7 (95% CI 1.8, 7.5), 2.4 (95% CI 1.3, 4.3) and 1.60 (95% CI 1.0, 2.8) for women with premature, early and normal menopause, respectively, relative to those with late menopause (ptrend <0.001).The HR for type 2 diabetes per 1 year older at menopause was 0.96 (95% CI 0.94, 0.98).Further adjustment for BMI, glycaemic traits, metabolic risk factors, C-reactive protein, endogenous sex hormone levels or shared genetic factors did not affect this association.Conclusions/interpretation Early onset of natural menopause is an independent marker for type 2 diabetes in postmenopausal women.association and explore whether the timing of natural menopause can add value to diabetes prediction and prevention.", + "Although drawing of definitive conclusions is difficult from these observational studies, their results 
suggest that young-onset type 2 diabetes is associated with a much more frequent occurrence of adverse macrovascular and microvascular outcomes and a more rapidly progressing severity of complications than is seen in type 1 diabetes or later-onset type 2 diabetes.In a study of the age-specific incidence of type 2 diabetes in the UK (a retrospective cohort study of patients with newly diagnosed type 2 diabetes between 1990 and 2010), the investigators reported a substantial increase in the proportion of people aged 40 years or younger at diagnosisThe prevalence of type 2 diabetes in adolescents and young adults is dramatically increasing.Similar to older-onset type 2 diabetes, the major predisposing risk factors are obesity, family history, and sedentary lifestyle.Onset of diabetes at a younger age (defined here as up to age 40 years) is associated with longer disease exposure and increased risk for chronic complications.Young-onset type 2 diabetes also affects more individuals of working age, accentuating the adverse societal effects of the disease.Furthermore, evidence is accumulating that young-onset type 2 diabetes has a more aggressive disease phenotype, leading to premature development of complications, with adverse effects on quality of life and unfavourable effects on long-term outcomes, raising the possibility of a future public health catastrophe.In this Review, we describe the epidemiology and existing knowledge regarding pathophysiology, risk factors, complications, and management of type 2 diabetes in adolescents and young adults.", + "The biological processes linking aging and disease risk are poorly understood.Still, aging is considered to date as one of the main factors responsible for several complex diseases including cancer, cardiovascular diseases, and diabetes." + ], + [ + "A. 
Genetic ScreeningWe have discussed above the genetic component of T1D.The genetic susceptibility to T1D is determined by genes related to immune function with the potential exception of the insulin gene (434).The genetic susceptibility component of T1D allows some targeting of primary preventive care to family members of diagnosed T1D patients, but there is no complete inheritance of the disease.Nevertheless, the risk for developing T1D compared with people with no family history is 10 -15 times greater.Although 70% of individuals with T1D carry defined risk-associated genotypes at the HLA locus, only 3-7% of the carriers of such genetic risk markers develop diabetes (3).II. THE GENETICS OF TYPE 1 DIABETESA comprehensive overview of genetic data in mouse and human is beyond the scope of this article.Instead, we will focus on how the various susceptibility genes and environmental triggers can fit in a mechanistic model for T1D etiology.", + "If an environmental contributor is near ubiquitous and the geneticpredisposition common as well, interventions are most sensibly weighted towardsenvironmental risk factor modification. Even here, though, there is room for further research, since the etiopathogenesisof type 2 diabetes may not be as well understood as some suggest. Specifically,Chaufan implies that dietary intervention to prevent prenatal programmingleading to susceptibility to develop type 2 diabetes (the fetal origins of adult onsetdisease hypothesis) is as evidence-based as dietary management of the adult diabetic state. 
However, many questions remain in this area.", + "In 1976, the noted human geneticist James Neel titled a book chapter \"Diabetes Mellitus: A Geneticist's Nightmare.\" 1 Over the past 30 years, however, the phenotypic and genetic heterogeneity of diabetes has been painstakingly teased apart to reveal a family of disorders that are all characterized by the disruption of glucose homeostasis but that have fundamentally different causes.Recently, the availability of detailed information on the structure and variation of the human genome and of new high-throughput techniques for exploiting these data has geneticists dreaming of unraveling the genetic complexity that underlies these disorders.This review focuses on type 1 diabetes mellitus and includes an update on recent progress in understanding genetic factors that contribute to the disease and how this information may contribute to new approaches for prediction and therapeutic intervention.Type 1 diabetes becomes clinically apparent after a preclinical period of varying length, during which autoimmune destruction reduces the mass of beta cells in the pancreatic islets to a level at which blood glucose levels can no longer be maintained in a physiologic range.The disease has two subtypes: 1A, which includes the common, immune-mediated forms of the disease; and 1B, which includes nonimmune forms.In this review, we focus on subtype 1A, which for simplicity will be referred to as type 1 diabetes.Although there are rare monogenic, immune-mediated forms of type 1 diabetes, 2,3 the common form is thought to be determined by the actions, and possible interactions, of multiple genetic and environmental factors.The concordance for type 1 diabetes in monozygotic twins is less than 100%, and although type 1 diabetes aggregates in some families, it does not segregate with any clear mode of inheritance. 
4-7 Despite these complexities, knowledge of genetic factors that modify the risk of type 1 diabetes offers the potential for improved prediction, stratification of patients according to risk, and selection of possible therapeutic targets. As germ-line factors, genetic risk variants are present and amenable to study at all times: before, during, and after the development of diabetes. Thus, genetic information can serve as a potential predictive tool and provide insights into pathogenetic factors occurring during the preclinical phase of the disease, when preventive measures might be applied. Genetic Studies: Because of the uncertainty regarding the number and action of genes involved in type 1 diabetes, genetic studies have tended to focus on approaches that require few assumptions about the underlying model of disease risk. The two primary approaches have been linkage studies (using pairs of affected relatives, typically siblings) and association studies (using either case-control or family-based designs). Linkage studies using affected sibling pairs seek to identify regions of the genome that are shared", + "Environment: The second factor in Figure 1 is environmental aspects. An important concept is that the diabetes genotype typically causes only a predisposition for glucose intolerance (note the terminology susceptibility gene was used in the preceding paragraphs). Whether one develops the diabetes phenotype depends on environmental factors, some obvious in how they act, others less so. For instance, the Nurses Health Survey showed positive associations between obesity and lack of physical activity in the development of type 2 diabetes (as expected), but also protection by not smoking and moderate alcohol intake (14). Already discussed, many studies have shown an association between TV watching, high calorie diets, and lack of physical activity with risk of diabetes, i.e., our modern lifestyle, so it is not surprising that there is an explosion in the incidence of diabetes 
worldwide.", + "The genetics of type 1 diabetes: There is a strong genetic risk to T1D. This is exemplified by (Redondo et al., 2001) who demonstrated a strong concordance of genetic inheritance (65%) and T1D susceptibility in monozygotic twin pairs. That is, when one sibling is afflicted, there is a high probability that the other twin will develop T1D by the age of 60 years. Additionally, autoantibody positivity and islet destruction was observed after a prospective long-term follow-up of monozygotic twins of patients with T1D, despite initial disease-discordance among the twins (Redondo et al., 2008).", + "Type 1 diabetes is a genetic disease: Family studies have indicated that genetic factors are important determinants of type 1 diabetes risk. First, the risk to a sibling of an affected individual is approximately 6%, as compared with an average risk of 0.4% (depending on the population), or a relative increased risk of 15-fold (17). The increased risk to siblings is referred to as λs (18) and is one measure of the degree of familial clustering of the disease. Family and twin studies indicate that a substantial fraction of susceptibility to type 1 diabetes is attributable to genetic factors. These and other epidemiologic studies also implicate environmental factors as important triggers. Although the specific environmental factors that contribute to immune-mediated diabetes remain unknown, several of the relevant genetic factors have been identified using two main approaches: genome-wide linkage analysis and candidate gene association studies. This article reviews the epidemiology of type 1 diabetes, the relative merits of linkage and association studies, and the results achieved so far using these two approaches. Prospects for the future of type 1 diabetes genetics research are considered.
", + "Genes affecting type 1 diabetes diagnosis age / A. Syreeni et al. Genome-wide search for genes affecting the age at diagnosis of type 1 diabetes.", + "Thus, the most likely scenario is that these genes are more poised for activation in the case group compared with the control group, contributing to various diabetes complications in the long term. This could be a consequence of the early exposure to hyperglycemia (measured by HbA1c level), which is known to be associated with increased rates of long-term diabetes complications.", + "Genetic Background and Environment: Both type 1 and 2 diabetes as well as other rare forms of diabetes that are directly inherited, including MODY and diabetes due to mutations in mitochondrial DNA, are caused by a combination of genetic and environmental risk factors. Unlike some traits, diabetes does not seem to be inherited in a simple pattern. Undoubtedly, however, some people are born prone to developing diabetes more so than others. Several epidemiological patterns suggest that environmental factors contribute to the etiology of T1D. Interestingly, the recent elevated number of T1D incidents projects a changing global environment, which acts either as initiator and/or accelerator of beta cell autoimmunity rather than variation in the gene pool. Several genetic factors are involved in the development of the disease [127]. There is evidence that more than twenty regions of the genome are 
involved in the genetic susceptibility to T1D.", + "Type 1 Diabetes: The higher type 1 diabetes prevalence observed in relatives implies a genetic risk, and the degree of genetic identity with the proband correlates with risk (22-26). Gene variants in one major locus, human leukocyte antigen (HLA) (27), confer 50-60% of the genetic risk by affecting HLA protein binding to antigenic peptides and antigen presentation to T cells (28). Approximately 50 additional genes individually contribute smaller effects (25,29). These contributors include gene variants that modulate immune regulation and tolerance (30-33), variants that modify viral responses (34,35), and variants that influence responses to environmental signals and endocrine function (36), as well as some that are expressed in pancreatic β-cells (37). Genetic influences on the triggering of islet autoimmunity and disease progression are being defined in relatives (38,39). Together, these gene variants explain ~80% of type 1 diabetes heritability. Epigenetic (40), gene expression, and regulatory RNA profiles (36) may vary over time and reflect disease activity, providing a dynamic readout of risk. Genetics: Both type 1 and type 2 diabetes are polygenic diseases where many common variants, largely with small effect size, contribute to overall disease risk. Disease heritability (h²), defined as sibling-relative risk, is 3 for type 2 diabetes and 15 for type 1 diabetes (17). The lifetime risk of developing type 2 diabetes is ~40% if one parent has type 2 diabetes and higher if the mother has the disease (18). The risk for type 1 diabetes is ~5% if a parent has type 1 diabetes and higher if the father has the disease (19). Maturity-onset diabetes of the young (MODY) is a monogenic disease and has a high h² of ~50 (20). Mutations in any 1 of 13 different individual genes have been identified to cause MODY (21), and a genetic diagnosis can be critical for selecting the most appropriate therapy. For example, 
children with mutations in KCNJ11 causing MODY should be treated with sulfonylureas rather than insulin.", + "Type 1 diabetes as well as type 2 diabetes shows a genetic predisposition, although only type 1 diabetes is HLA dependent [32,33,36,40].", + "Genetic factors have an important role in the development of diabetes, with some forms of the disease resulting from mutations in a single gene. Others are multifactorial in origin. The monogenic forms of diabetes account for approximately 5% of cases and are caused by mutations in genes encoding insulin (ref. 3), the insulin receptor (ref. 4), the glycolytic enzyme glucokinase (ref. 5), and the transcription factors hepatocyte nuclear factor-1α (HNF-1α), HNF-1β, HNF-4α, insulin promoter factor-1 and NeuroD1/BETA2 (refs 6-10). Mutations in maternally inherited mitochondrial genes can also cause diabetes, often in association with hearing loss (ref. 11).", + "The proportion of diabetics that will result from mating between genetic types can be predicted with certainty, since the inheritance is known to be under the control of a recessive gene with complete penetrance. Offspring that will exhibit the diabetic syndrome can be distinguished from those that will not, as early as 3 weeks after birth. Some disadvantages are equally apparent. 
Diabetic homozygotes do not breed, and heterozygotes cannot be distinguished from normals except by progeny testing.", + "Studies [71-74] in Mexican and Asian populations have identified several mutations associated with type 2 diabetes in young people. The high prevalence of type 2 diabetes in the parents of young people diagnosed with type 2 diabetes could reflect a stronger genetic predisposition, even when monogenic diabetes is excluded. This hypothesis suggests that efforts to define genes that cause type 2 diabetes by linkage might be more powerful if focused on young adults with diabetes, raising the question of whether type 2 diabetes in older populations has a relatively smaller genetic contribution and a stronger environmental contribution. 66", + "Genetics is one example of the 'other risk factors' involved in the pathogenesis of DR. Twin and epidemiological studies have strongly suggested a genetic component in the etiology of DR (6-10), with heritability scores ranging from 27 to 52% in both type 1 and type 2 diabetes (7-10). There is an increased risk of severe DR among family members of DR subjects (8,9) and in siblings of affected subjects (8,9). Furthermore, several studies have also shown a discrepant rate of the prevalence of DR among different racial ethnic groups in the US population, with a significantly higher prevalence observed among Hispanic, African-American and Chinese-American when compared with Caucasian populations (11). While these differences may partially be attributed to lifestyle factors, evidence from familial aggregation, ethnic differences and heritability clearly supports a genetic contribution in the etiology of DR." 
+ ], + [ + "To see which other significant genes were likely to have a role in diabetes we looked at all variant sets with a significant glucose, HbA1c, or T2D association and examined whether they had associations with additional diabetes traits (p 0.0016, correcting for 32 sets tested).Damaging missense variants in PDX1 and PFAS, which significantly associated with HbA1c levels in our primary analysis, associated with T2D diagnosis using this threshold (Table 3 and Supplementary Table 14).Identification of genes with a biological role in diabetes. Variants in two genes, GCK and GIGYF1, significantly associated with glucose, HbA1c and T2D diagnosis, strongly suggesting a biological role in diabetes; GCK is involved in Mendelian forms of diabetes while GIGYF1 has not previously been implicated by genetics in the disease.Both GCK and GIGYF1 are located on chromosome 7 but are 56 Mb apart, strongly suggesting that these signals are independent; this independence was confirmed by conditional analysis (Supplementary Table 13).Two additional variant sets, HNF1A pLOF and TNRC6B pLOF, had genome-wide associations with both T2D diagnosis and HbA1c levels while G6PC2 damaging missense variants associated with decreased levels of both glucose and HbA1c but not T2D diagnosis (Table 3).", + "One obvious locus to consider is TCF7L2 in the context of type 2 diabetes.Common genetic variation located within the gene encoding transcription factor 7 like 2 (TCF7L2) has been consistently reported to be strongly associated with the disease.Such reports range from 2006, when we first published the association [3], to the recent transethnic meta-analysis GWAS of type 2 diabetes [4].", + "Testing of these loci for association with T2D as a dichotomous trait in up to 40,655 cases and 87,022 nondiabetic controls demonstrated that the fasting glucose-raising alleles at seven loci (in or near ADCY5, PROX1, GCK, GCKR and DGKB-TMEM195 and the known T2D genes TCF7L2 and SLC30A8) are robustly 
associated (P < 5 × 10⁻⁸) with increased risk of T2D (Table 2). The association of a highly correlated SNP in ADCY5 with T2D in partially overlapping samples is reported by our companion manuscript (ref. 29). We found less significant T2D associations (P < 5 × 10⁻³) for variants in or near CRY2, FADS1, GLIS3 and C2CD4B (Table 2). These data clearly show that loci with similar fasting glucose effect sizes may have very different T2D risk effects (see, for example, ADCY5 and MADD in Table 2).", + "Among the confirmed and potential type 2 diabetes risk genes described in Tables 1 and 2, eight genes influence whole-body or peripheral insulin sensitivity: ADIPOQ (47, 52, 250-257), AHSG (75, 258), CAPN10 (259-264), ENPP1 (265-271), PPARG (272-283), PPARGC1A (284,285), SREBF1 (65), and TCF7L2 (133,151,286,287).", + "Despite identification of many putative causative genetic variants, few have generated credible susceptibility variants for type 2 diabetes. Indeed, the most important finding using linkage studies is the discovery that the alteration of TCF7L2 (TCF-4) gene expression or function (33) disrupts pancreatic islet function and results in enhanced risk of type 2 diabetes. Candidate gene studies have also reported many type 2 diabetes-associated loci and the coding variants in the nuclear receptor peroxisome proliferator-activated receptor-γ (34), the potassium channel KCNJ11 (34), WFS1 (35), and HNF1B (TCF2) (36) are among the few that have been replicated (Table 2). Recently, there have been great advances in the analysis of associated variants in GWA and replication studies due to high-throughput genotyping technologies, the International HapMap Project, and the Human Genome Project. Type 2 susceptibility loci such as JAZF1, CDC123-CAMK1D, TSPAN8-LGR5, THADA, ADAMTS9, NOTCH2, and ADCY5 (37,38) are among some of the established loci (Table 2). CDKN2A/B, CDKAL1, SLC30A8, IGF2BP2, HHEX/IDE, and FTO are 
other established susceptibility loci for diabetes (Table 2) (34,39,40). GWA studies have also identified the potassium voltage-gated channel KCNQ1 (32) as an associated gene variant for diabetes. A recent GWA study reporting a genetic variant with a strong association with insulin resistance, hyperinsulinemia, and type 2 diabetes, located adjacent to the insulin receptor substrate 1 (IRS1) gene, is the C allele of rs2943641 (41). Interestingly, the parental origin of the single nucleotide polymorphism is of importance because the allele that confers risk when paternally inherited is protected when maternally transmitted. GWA studies for glycemic traits have identified loci such as MTNR1B (42), GCK (glucokinase) (42), and GCKR (glucokinase receptor) (42); however, further investigation of genetic loci on glucose homeostasis and their impact on type 2 diabetes is needed. Indeed, a recent study by Soranzo et al. (42) using GWA studies identified ten genetic loci associated with HbA1c. Genetic factors affecting expression, turnover, and abnormal glycation of hemoglobin may be associated with changes in levels of HbA1c.", + "Genome-wide association studies (GWAS) have identified several type 2 diabetes mellitus (T2DM) susceptibility loci including CDKAL1, CDKN2B, IGF2BP2, HHEX, SLC30A8, PKN2, LOC387761 (1-5), and KCNQ1, which was recently identified by similar GWAS approach in two independent Japanese samples (6,7). Although these associations have been well replicated in Japanese populations (8), the role of these loci in other East Asian populations remains less clear. For example, a study in China by Wu et al. 
(9) did not find significant associations between single-nucleotide polymorphisms (SNPs) in IGF2BP2 and SLC30A8 with T2DM, whereas an association between SNPs at the HHEX locus and T2DM was reported among Chinese living in Shanghai, but not among Chinese in Beijing. Another study in Hong Kong Chinese (10) also did not find an association with SNPs at the IGF2BP2 locus; however, they reported an association between T2DM with SNPs at the HHEX and SLC30A8 loci.", + "In studies where overt T2D has been the phenotype the majority of associated polymorphisms have encoded proteins known to be involved in β-cell metabolism; for example TCF7L2, KCNJ11 and HHEX have shown robust association [170,171]. This suggests that these genes could prove useful in predicting β-cell preservation during the course of T2D. The glucokinase gene (GCK) coding for the initial glucose-sensing step in the β-cell can have activating mutations causing hypoglycemia that might provide structural and functional models leading to drug targets for treating T2D [172]. In the GoDARTs study, investigators examined the medication response of metformin and sulphonylurea based on the TCF7L2 variants mainly affecting the β-cell. The carriers of the at risk 'T' allele responded less well to sulphonylurea therapy than metformin [173]. Also it is of significant public health interest that in the Diabetes Prevention Program, lifestyle modifications were shown to reduce the risk of diabetes conferred by risk variants of TCF7L2 at rs7093146, and in placebo participants who carried the homozygous risk genotype (TT), there was 80% higher risk for developing diabetes compared to the lifestyle intervention group carrying the same risk genotypes [35]. These findings could herald significant future progress in the field of T2D pharmacogenomics, possibly leading to the development and use of agents tailored on the basis of genotype.", + "Despite sharing only 9 loci (among 26 and 17 total in the two analyses, respectively), the 
separate analyses both identified genes involved in diabetes-related biological functions, including \"glucose homeostasis,\" \"pancreas development\" and \"insulin secretion\" (Supplementary Tables 3 and 5). Three of the top eleven scoring genes in our independent replication analysis have verified causal links to T2D, as annotated in the OMIM (ref. 41). These include genes encoding transcription factors TCF7L2 (TCF4), which has extensive evidence of being causal in T2D (refs 61,62), and HNF1B, which is a known cause of maturity onset diabetes of the young (ref. 63). Other high-ranking candidate genes have been identified as therapeutic targets in T2D (for example, CTBP1 (ref. 64) and LEP (ref. 65)), and the high-scoring gene HHEX has recently been shown to play a key role in islet function (ref. 66).", + "Similar findings to AMD are now unfolding with type 2 DM. Grant et al. (24) first reported on a variant of the gene TCF7L2, which has been linked to reduced beta cell function and poor insulin response to oral glucose loads (51). Since its first discovery, this gene has been widely confirmed in independent studies as a pivotal susceptibility marker for type 2 DM (23,25-28,40). Recently, 6 genome-wide SNP association studies have identified and replicated in separate stages several additional novel genes conferring susceptibility to type 2 DM (23,25-28,40) (Table 2). Interestingly, these loci primarily include genes involved in pancreatic beta cell development and function as opposed to insulin resistance, the current accepted mechanism for type 2 DM. This development casts doubt on our traditional pathophysiological modeling of the type 2 diabetic patient and underscores the need for genomic studies to further define pathobiological processes of complex traits.", + "Of the 16 loci that have been associated with type 2 diabetes previously, [8-15] we showed that 11 (TCF7L2, PPARG, FTO, KCNJ11, NOTCH2, WFS1, CDKAL1, IGF2BP2, SLC30A8, JAZF1, and HHEX) were 
associated with an enhanced risk of future diabetes. Many of the variants that we genotyped appear to influence beta-cell function, possibly through effects on proliferation, regeneration, and apoptosis. There was a time-dependent increase in the BMI and a decrease in insulin sensitivity in the subjects from the Botnia study, an increase in insulin resistance that was reflected by an increase in insulin secretion. However, this increase was inadequate to compensate for the increase in insulin resistance in carriers with a high genetic risk, which resulted in a markedly impaired disposition index. Only variants in FTO were associated with an increased BMI. Both FTO and PPARG together with TCF7L2 and KCNJ11 predicted transition from impaired fasting glucose levels or impaired glucose tolerance to manifest diabetes, which suggests that a combination of increased obesity and insulin resistance with a deterioration in beta-cell function contribute to the manifestation of diabetes in these subjects. Collectively, our findings emphasize the critical role of inherited defects in beta-cell function for the development of type 2 diabetes. Type 2 Diabetes: Common variants in 11 genes were significantly associated with the risk of future type 2 diabetes in the MPP cohort, including TCF7L2 (odds ratio, 1.30; P = 9.5 × 10⁻¹³), PPARG (odds ratio, 1.20; P = 4.0 × 10⁻⁴), FTO (odds ratio, 1.14; P = 9.2 × 10⁻⁵), KCNJ11 (odds ratio, 1.13; P = 3.6 × 10⁻⁴), NOTCH2 (odds ratio, 1.13; P = 0.02), WFS1 (odds ratio, 1.12; P = 0.001), CDKAL1 (odds ratio, 1.11; P = 0.004), IGF2BP2 (odds ratio, 1.10; P = 0.008), SLC30A8 (odds ratio, 1.10; P = 0.008), JAZF1 (odds ratio, 1.08; P = 0.03), and HHEX (odds ratio, 1.07; P = 0.03) (Table 2). Although these findings could not be fully replicated in the smaller Botnia study, there was little heterogeneity between the studies with respect to the risk conferred by different genotypes.", + "To date, more than 70 genes have been identified as involved in T2DM, primarily by 
association analysis [34].In addition, via GWAS arrays, more than 100 SNPs have been identified for T2DM [35].From the 50 novel loci associated with T2DM previously identified, more than 40 loci have been associated with T2DM-related traits, including fasting proinsulin, insulin and glucose (Table 1) [36][37][38][39].However, for T2DM-related traits, such as the HOMA index or pancreatic cell function, there are virtually no published data examining the relationship between these traits or the genotype and environment interactions.Clinical investigations of some loci have suggested that the genetic components of T2DM risk act preferentially through cell function [40].Among all 40 loci associated with T2DM-related traits, only transcription factor-7-like 2 (TCF7L2) was shown to clearly contribute to T2DM risk [41].Several studies in white European [42], Indian [43], Japanese [44], Mexican American [45] and West African [46] individuals have shown a strong association between TCF7L2 and T2DM.It is also noteworthy that these populations represent the major racial groups with a high prevalence of T2DM.In all populations, TCF7L2 showed a strong association, with the odds of developing T2DM increased by 30%-50% for each allele inherited.This finding indicates an approximately double odds ratio compared to most other diabetes susceptibility polymorphisms.TCF7L2 is a transcription factor involved in the Wnt signaling pathway that is ubiquitously expressed, and it has been observed that TCF7L2 risk alleles result in the overexpression of TCF7L2 in pancreatic cells.This overexpression causes reduced nutrient-induced insulin secretion, which results in a direct predisposition to T2DM as well as an indirect predisposition via an increase in hepatic glucose production [47].Most Relevant T2DM Susceptibility GenesGene and environment interaction studies have shown a nice association between variants in peroxisome proliferator-activated receptor gamma (PPARG), TCF7L2 and fat mass 
and obesity-associated protein (FTO) genes, a Western dietary pattern and T2DM.", + "One of these genes associated with type 2 diabetes is the insulin receptor substrate 1 (IRS1, OMIM association number, 147545) (Alharbi, Khan, Abotalib, & Al-Hakeem, 2014; Alharbi, Khan, Munshi et al., 2014; Brender et al., 2013; Brunetti, Chiefari, & Foti, 2014) and another is the C-C motif chemokine receptor 5 (CCR5, OMIM association number, 601373) (Balistreri et al., 2007; Mokubo et al., 2006; Muntinghe et al., 2009).", + "Genes boosted in type 2 diabetes: Before the Wellcome Trust study, PPARG, KCNJ11, and TCF7L2 had all been identified as genes involved in type 2 diabetes through genome-wide association studies and replicated in follow-up studies (for review, see Bonnefond et al. 2010). The strongest candidate gene for type 2 diabetes, TCF7L2, was also the strongest signal seen in the Wellcome Trust study, although the others were not so strong. However, the exact mechanism by which TCF7L2 acts was not entirely clear. In our analysis (Fig. 5), we find it directly connected to the β-catenin/WNT signaling pathway by its functional connection to CTNNB1, as well as to BACH2, a gene that has been repeatedly implicated in type 1 diabetes (e.g., Cooper et al. 2008; Madu et al. 2009), but which has not yet been linked to type 2 diabetes. BACH2 is among the genes most strongly boosted by network linkages, deriving additional signal from CREB5 and PARD3B, which both score highly in the GWAS data. PARD6G, PARD3B, and CDC42 are also emphasized by the method. Notably, these genes form a complex with PRKCZ (Koh et al. 2008), a variant of which correlates with type 2 diabetes in Han Chinese (Qin et al. 2008). EBF1, a known regulator of adipocyte differentiation (Akerblad et al. 
2005) is also strongly boosted by the network, supporting a possible role in type 2 diabetes.", + "OBJECTIVE-Recent genome-wide association studies have identified six novel genes for type 2 diabetes and obesity and confirmed TCF7L2 as the major type 2 diabetes gene to date in Europeans.However, the implications of these genes in Asians are unclear.RESEARCH DESIGN AND METHODS-We studied 13 associated single nucleotide polymorphisms from these genes in 3,041 patients with type 2 diabetes and 3,678 control subjects of Asian ancestry from Hong Kong and Korea. 
RESULTS-We confirmed the associations of TCF7L2, SLC30A8, HHEX, CDKAL1, CDKN2A/CDKN2B, IGF2BP2, and FTO with risk for type 2 diabetes, with odds ratios ranging from 1.13 to 1.35 (1.3 × 10⁻¹² < P unadjusted < 0.016). In addition, the A allele of rs8050136 at FTO was associated with increased BMI in the control subjects (P unadjusted = 0.008). However, we did not observe significant association of any genetic variants with surrogate measures of insulin secretion or insulin sensitivity indexes in a subset of 2,662 control subjects. Compared with subjects carrying zero, one, or two risk alleles, each additional risk allele was associated with 17% increased risk, and there was an up to 3.3-fold increased risk for type 2 diabetes in those carrying eight or more risk alleles. Despite most of the effect sizes being similar between Asians and Europeans in the meta-analyses, the ethnic differences in risk allele frequencies in most of these genes lead to variable attributable risks in these two populations. CONCLUSIONS-Our findings support the important but differential contribution of these genetic variants to type 2 diabetes and obesity in Asians compared with Europeans. Diabetes 57: 2226-2233, 2008. Type 2 diabetes is a major health problem affecting more than 170 million people worldwide. In the next 20 years, Asia will be hit hardest, with the diabetic populations in India and China more than doubling (1). Type 2 diabetes is characterized by the presence of insulin resistance and pancreatic β-cell dysfunction, resulting from the interaction of genetic and environmental factors. Until recently, few genes identified through linkage scans or the candidate gene approach have been confirmed to be associated with type 2 diabetes (e.g., PPARG, KCNJ11, CAPN10, and TCF7L2). Under the common variant-common disease hypothesis, several genome-wide association (GWA) studies on type 2 diabetes have been conducted in large-scale case-control samples. Six novel genes (SLC30A8, HHEX, CDKAL1, CDKN2A and 
CDKN2B, IGF2BP2, and FTO) with modest effect for type 2 diabetes (odds ratio [OR] 1.14 -1.20) had been reproducibly demonstrated in multiple populations of European ancestry.Moreover, TCF7L2 was shown to have the largest effect for type 2 diabetes (1.37) in the European populations to date (2-8).Although many of these genes may be implicated in the insulin production/secretion pathway (TCF7L2, SLC30A8, HHEX, CDKAL1, CDKN2A/B, and IGF2BP2) (6,9 -11), FTO is associated with type 2 diabetes through its regulation of adiposity (8,12,13).Moreover, two adjacent regions near CDKN2A/B are associated with type 2 diabetes and cardiovascular diseases risks, respectively (7,14 -16).Despite the consistent associations among Europeans, the contributions of these genetic variants in other ethnic groups are less clear.Given the differences in environmental factors (e.g., lifestyle), risk factor profiles (body composition and insulin secretion/resistance patterns), and genetic background (linkage disequilibrium pattern and risk allele frequencies) between Europeans and Asians, it is important to understand the role of these genes in Asians.A recent case-control study in 1,728 Japanese subjects revealed nominal association to type 2 diabetes for variants at the SLC30A8, HHEX, CDKAL1, CDKN2B, and FTO genes but not IGF2BP2 (17).In the present large-scale case-control replication study of 6,719 Asians, we aimed to test for the association of six novel genes from GWA studies and TCF7L2, which had the largest effect in Europeans, and their joint effects on type 2 diabetes risk and metabolic traits. 
RESEARCH DESIGN AND METHODS: All subjects were recruited from Hong Kong and Korea and of Asian ancestry. The subjects in the Hong Kong case-control study were of southern Han Chinese ancestry residing in Hong Kong. Participants for the case cohort consisting of 1,481 subjects with type 2 diabetes were selected from two", + "OBJECTIVE-Common variants in PPARG, KCNJ11, TCF7L2, SLC30A8, HHEX, CDKN2A, IGF2BP2, and CDKAL1 genes have been shown to be associated with type 2 diabetes in European populations by genome-wide association studies. We have studied the association of common variants in these eight genes with type 2 diabetes and related traits in Indians by combining the data from two independent case-control studies.", + "Introduction: Many genes have been evaluated as candidates for T2D susceptibility. However, only variants in the TCF7L2, PPARG, KCNJ11 and HNF4A genes have been extensively replicated in populations around the world, showing their indisputable association with T2D risk (Zeggini 2007). In the particular case of the HNF4A gene, it has been implicated in maturity-onset diabetes of the young type 1 (MODY 1) (Mitchell and Frayling 2002; Zhu et al. 2003). HNF4A is a member of the nuclear receptor super-family that plays a critical role in embryogenesis and metabolism, by regulating gene expression in pancreatic beta cells, liver and other tissues. The HNF4A gene is localized to chromosome 20q13, a region that has demonstrated evidence for linkage with T2D (Sladek et al. 1990; Ghosh et al. 1999). Several genetic studies, mainly in Caucasian and Asian populations, have provided evidence for the association of the variants in HNF4A with T2D (Ghosh et al. 1999; Silander et al. 2004; Winckler et al. 2005)." 
+ ], + [ + "A wide array of other dietary compounds and environmental triggers have been shown to affect diabetes development in animal models, and for some of these such as omega-3 fatty acids (312), there is limited proof in human patients.", + "Type 2 diabetes is now a pandemic and shows no signs of abatement. In this Seminar we review the pathophysiology of this disorder, with particular attention to epidemiology, genetics, epigenetics, and molecular cell biology. Evidence is emerging that a substantial part of diabetes susceptibility is acquired early in life, probably owing to fetal or neonatal programming via epigenetic phenomena. Maternal and early childhood health might, therefore, be crucial to the development of effective prevention strategies. Diabetes develops because of inadequate islet β-cell and adipose-tissue responses to chronic fuel excess, which results in so-called nutrient spillover, insulin resistance, and metabolic stress. The latter damages multiple organs. Insulin resistance, while forcing β cells to work harder, might also have an important defensive role against nutrient-related toxic effects in tissues such as the heart. Reversal of overnutrition, healing of the β cells, and lessening of adipose tissue defects should be treatment priorities.", + "Type 2 diabetes (T2D) is a result of complex gene-environment interactions, and several risk factors have been identified, including age, family history, diet, sedentary lifestyle and obesity. Statistical models that combine known risk factors for T2D can partly identify individuals at high risk of developing the disease. However, these studies have so far indicated that human genetics contributes little to the models, whereas socio-demographic and environmental factors have greater influence 1. Recent evidence suggests the importance of the gut microbiota as an environmental factor, and an altered gut microbiota has been linked to metabolic diseases including obesity 2,3, diabetes 4 and cardiovascular disease 5.", + "The prevalence of diabetes mellitus worldwide is sobering; the International Diabetes Federation estimates that 415 million people have diabetes mellitus, with 90% of these individuals having type 2 diabetes mellitus (T2DM) 1. T2DM occurs when pancreatic β-cells fail to release enough insulin to meet the demands of insulin-responsive tissues, which safely store and metabolize glucose. Driven by both genetic and environmental risk factors, T2DM is a complex, multifactorial disorder. Although the increasing prevalence of T2DM is driven by changes in our environment and mirrors the increase in obesity, the greater concordance for T2DM found in monozygotic compared with dizygotic twins has long provided evidence for a genetic component in T2DM risk 2.", + "Diet: Excessive caloric intake is a major driving force behind escalating obesity and type 2 diabetes epidemics worldwide, but diet quality also has independent effects. In the Nurses' Health 
Study (NHS), we found that the quality of fats and carbohydrates play an important role in the development of diabetes, independent of BMI and other risk factors (11).In particular, higher dietary glycemic load (GL) and trans fat are associated with increased diabetes risk, whereas greater consumption of cereal fiber and polyunsaturated fat is associated with decreased risk (Fig. 2).In a meta-analysis, we found that a 2 serving/day increment in whole-grain intake was associated with a 21% lower risk of diabetes (12).", + "IntroductionThe aetiology of type 2 diabetes is poorly defined: several studies indicate that the disease results from a combination of genetic susceptibility and external risk factors [1].According to this multifactorial model, genetically predisposed subjects will not necessarily develop overt disease unless they are also exposed to particular environmental factors [2].Important risk factors for the development of type 2 diabetes include a family history of diabetes, increased age, hypertension, lack of physical exercise, and obesity [1].", + "BackgroundNearly 350 million people world-wide are currently affected by diabetes, and the number of people with type 2 diabetes mellitus is increasing at an alarming rate [1].Type 2 diabetes results when the -cells of the pancreas are no longer capable of producing sufficient insulin to meet the body's demands.Thus -cell dysfunction is a key component of type 2 diabetes pathology.Although the increased prevalence of obesity and resulting insulin resistance is contributing to the increased prevalence of type 2 diabetes, many obese individuals are insulin resistant but do not develop diabetes [2].Genetic factors, many of which have been proposed to affect -cell function, play an important role in determining an individual's risk within this context [3][4][5][6].In a small number of individuals, type 2 diabetes is caused by rare single gene mutations, but for most individuals type 2 diabetes results from the 
combined effects of many common single-nucleotide polymorphisms (SNPs), each of which have a small effect on risk and likely interact with each other and with environmental and lifestyle factors [7].", + "Type I Diabetes is a disease that results when cells, such as fat and muscle cells, do not properly take up sugar from the blood.There are many symptoms of diabetes; however, one common symptom is a large increase of glucose levels in the blood, called hyperglycemia, because glucose cannot enter the other cells of the body.Hyperglycemia can cause blurred vision and can make one feel extremely hungry and very tired.In extreme cases it can 10 21 cause loss of consciousness.Type I diabetes is a genetic disease.", + "What these predisposing factors share is an ability to negatively impact the glucose homeostasis system through worsening of insulin resistance or to impair b-cell function.Superimposing these factors onto a genetically compromised glucose homeostasis system raises the risk of progressing to hyperglycemia.It is the rapid emergence of these disadvantageous environmental factors that is causing the worldwide diabetes epidemic.This concept of environmental changes promoting diabetes was highlighted many years ago by populations that rarely experienced type 2 diabetes, but then moved from a nomadic or farm existence to urban environments followed by an explosion of diabetes, typically with profound obesity: Pima Indians in the Southwest U.S., Saharan nomadic tribes, Australian Aborigines, and many others.Particularly dramatic were studies that showed reversal of the diabetes when they returned to their prior way of life (15).A recent example of this is the rapidly rising incidence of type 2 diabetes in China and India as people move from the country to cities-there is a 0.1-0.2%incidence of diabetes for rural farmers in China as opposed to well more than 5% for city dwellers.Perhaps the scariest example of this is children in the U.S. 
where the obesity statistics worsen yearly.As many as 20% of U.S. children are now obese, and they are developing all of the elements of the metabolic syndrome-insulin resistance, hypertension, hyperlipidemia, and glucose intolerance (16).", + "BackgroundType 2 diabetes (T2D) is a common, chronic disease caused by both genetic and environmental risk factors and their interactions [1], which has significantly increased prevalence in the past 20 years [2] and disproportionately afflicts communities of color [3][4][5].The current screening of T2D focuses on individuals with demographic and clinical risk factors, including overweight or obesity, age >35 years, and a family history of diabetes [6].However, despite preventative strategies and public health efforts to improve nutrition and physical activity, facilitate access to care, and limit tobacco and alcohol use, the morbidity and mortality associated with T2D remain unaltered [5], likely because most interventions are adopted too late in the course of disease trajectory.", + "BackgroundType 2 diabetes is a cause of poor health and early death that is spreading worldwide and exerting a fearsome human and economic toll [1,2].Prevention and control of diabetes requires a better understanding of its basic molecular causes.Type 2 diabetes is a heterogeneous disease arising from physiological dysfunction in the pancreas, skeletal muscle, liver, adipose and vascular tissue.Much of the heterogeneity of type 2 diabetes has a genetic basis.A full picture of the complex genetic architecture of diabetes has been elusive [3][4][5][6][7].", + "RACIALIZED ETIOLOGIES OF DIABETESDiabetes is not one disease but many.More than 90 percent of all diabetics have type 2 diabetes, which is characterized by elevated blood glucose triggered by a combination of poor insulin production, insulin resistance in skeletal muscle and lipid tissue, or both.Type 2 diabetes is also known as Non-Insulin-Dependent Diabetes because, unlike the rarer form 
of the disease, people with type 2 diabetes produce insulin and therefore seldom need therapeutic insulin at the initial onset of disease.Type 2 diabetes (hereafter, \"diabetes\"), like heart disease, hypertension and asthma, is referred to as a complex disease because its putative determinants lay in both environmental and biological domains.That is, diabetes is caused by a still-unknown combination of factors that include lifestyle, diet, physical activity, and an array of physiological triggers.", + "IntroductionType 2 diabetes (T2D) affects at least 6% of the world's population; the worldwide prevalence is expected to double by 2025 [1].T2D is a complex disorder that is characterized by hyperglycemia, which results from impaired pancreatic b cell function, decreased insulin action at target tissues, and increased glucose output by the liver [2].Both genetic and environmental factors contribute to the pathogenesis of T2D.The disease is considered to be a polygenic disorder in which each genetic variant confers a partial and additive effect.Only 5%-10% of T2D cases are due to single gene defects; these include maturity-onset diabetes of the young (MODY), insulin resistance syndromes, mitochondrial diabetes, and neonatal diabetes [3][4][5].Inherited variations have been identified from studies of monogenic diabetes, and have provided insights into b cell physiology, insulin release, and the action of insulin on target cells [6].", + "The worldwide explosion of the rates of diabetes and other metabolic diseases in the last few decades cannot be fully explained only by changes in the prevalence of classical lifestyle-related risk factors, such as physical inactivity and poor diet.For this reason, it has been recently proposed that other \"nontraditional\" risk factors could contribute to the diabetes epidemics.In particular, an increasing number of reports indicate that chronic exposure to and accumulation of a low concentration of environmental pollutants 
(especially the so-called persistent organic pollutants (POPs)) within the body might be associated with diabetogenesis.In this review, the epidemiological evidence suggesting a relationship between dioxin and other POPs exposure and diabetes incidence will be summarized, and some recent developments on the possible underlying mechanisms, with particular reference to dioxin, will be presented and discussed.", + "| DISCUSSIONThe rapidly increasing number of diabetic patients becomes a global burden especially for health sector in low-and middleincome countries including Bangladesh (Bleich, Koehlmoos, Rashid, Peters, & Anderson, 2011).Many reasons such as obesity, lack of physical activity, food habit, sedentary job nature and genetic makeup are factors accounting for developing diabetes (Lyssenko & Laakso, 2013;Vilchis-Gil, Galvn-Portillo, Klnder-Klnder, Cruz, & Flores-Huerta, 2015).Another cause is stress, which plays important role in the etiology of T2DM (S.J. Kelly & Ismail, 2015;Pouwer et al., 2010).Type 2 diabetic patients not only have to cope with this chronic disease, they are also at increased risk for several diseases like coronary heart disease, peripheral vascular disease, retinopathy, nephropathy, and neuropathy (Pouwer et al., 2010).To fight and control T2DM we have to seek out an alternative way of diagnosis and treatment based on patient's genetic information.This requires a deep insight into the etiology of this disease including associated single nucleotide polymorphism (SNP).", + "BackgroundType 2 Diabetes (T2D) is a complex metabolic disease that affects 25.8 million Americans in 2011, according to statistics reported by Centers for Disease Control and Prevention (CDC).T2D occurs when the body develops resistance to insulin due to the malfunction of insulin producing -cells.The developmental process of T2D involves a complex interplay between genetic and environmental factors.However, it is not clear how the underlying genetic defects give rise 
to T2D pathogenesis over time.Recent T2D genetic study results, particularly those from genome-wide association studies (GWAS), have yielded insights to the molecular mechanisms and underlying genetic risk factors of T2D [1].Among the many risk genes identified are: transcription factor 7-like 2 (TCF7L2) [2][3][4], peroxisome proliferator-activated receptor gamma (PPARG) [5][6][7], and potassium inwardlyrectifying channel, subfamily J, member 11 (KCNJ11) [5,6].", + "Aetiological factorsProspective studies suggest that the main pathophysiological defects leading to type 2 diabetes are insulin resistance and a relative insulin secretory defect.The main aetiological risk factors are age, obesity, family history, and physical inactivity.Dietary risk factors have recently emerged: risk is increased by high consumption of red and processed meat 13 and sugar-sweetened beverages, 14 and reduced by intake of fruit and vegetables, 15 some types of dairy products, 16 and some overall dietary patterns. 
17 Novel strategies to use quantifiable nutritional biomarkers are paving the way for more detailed understanding of the association between diet and diabetes. Although the heritability of type 2 diabetes is high (30-70%) and more than 60 genetic variants related with diabetes risk have now been identified, 18 even when combined into a genetic score, known genes contribute little to the prediction of diabetes. Phenotype-based risk models provide greater discrimination for diabetes, and the addition of genotypic information adds no more than 5-10% improvement in prediction. The current conclusion is that genetic variants provide insights into biological pathways and pathogenesis of diabetes, but not its prediction. It is likely that interactions between the environment/lifestyle and genetic factors provide the explanation for the risk of type 2 diabetes, but demonstrating such interaction is challenging. Encouraging research findings have recently shown higher absolute risk of diabetes associated with obesity at any level of genetic risk. 19 Prevention and screening", + "Introduction: Type 2 diabetes (T2D) is caused by the inability to regulate glucose levels as a result of insufficient insulin production or the incapability of the body to use bioavailable insulin (Asif 2014; Wong and Tabet 2015). Because of its complications such as retinopathy, nephropathy and heart disease, T2D is a critical disorder threatening adult health and life in humans. The number of T2D patients has steadily increased in recent decades and will continue to increase in the future, and is projected to reach 592 million patients worldwide by 2035 (Guariguata et al. 2014). In particular, Asian countries account for more than 60% of the world's T2D patients, and the T2D population is growing rapidly (Ramachandran et al. 
2012).", + "IntroductionDiabetes mellitus, also known as simply diabetes, is the most prevalent disease in Westernized, developed countries, and the prevalence of this disease increases with age, accounting for 8.4% of all deaths worldwide [1].Diabetes is a well-recognized multifactorial endocrine metabolic disorder characterized by hyperglycemia (high blood sugar levels over a prolonged period) triggered by insulin secretion deficiencies, insulin action or both [2].The chronic hyperglycemia of diabetes is associated with dysfunction, long-term damage and failure of different organs, particularly the kidneys, heart, blood vessels, nerves and eyes.The development of diabetes involves various pathogenic processes including autoimmune destruction of the pancreatic -cells with subsequent insulin insufficiency which causes insulin resistance [3].The reason for the carbohydrate, fat and protein metabolism disorders in diabetes is insulin deficient activity on target tissues.Insulin deficient action results from insufficient insulin secretion and/or diminished tissue response [4].The great majority of diabetes cases fall into two broad categories of etiopathogenetics.Type 1 diabetes (T1D), falls in one category, is caused due to an absolute deficiency in insulin secretion from pancreatic beta cells.Genetic marker tests and serological evidences of an autoimmune pathological process in pancreatic islets can often be utilized for identification of individuals with increased risk of developing T1D [5].The more prevalent form of diabetes is type 2 diabetes mellitus (T2DM), which falls in the second category and is caused by a combination of insulin resistance and an inadequate compensatory insulin secretory response [6].Consequently, a degree of hyperglycemia occurs that might cause pathological and functional changes in different target tissues but without clinical symptoms and the condition may persist for a long time before T2DM is detected.There are other specific types 
of diabetes, such as exocrine pancreatic diseases, endocrinopathies, diabetes induced by drugs or chemicals, infection, uncommon forms of immune-mediated diabetes, other genetic syndromes, such as Down syndrome chromosomal abnormalities, Klinefelter syndrome, and sometimes diabetes-related Turner syndrome.Depending on the severity of the underlying disease, the degree of hyperglycemia can change over time [7]." + ] + ], + "task_id": [ + "9309F248E5933718BFB625E4EF2D3E42", + "72FBC4F382B6502EAF41BD6682E63A2D", + "02C953165B9CA94E273DD4A04301C89F", + "2272C482CC247E746D15C9F55EDD8BCE", + "2AE18C9AAFB4E3A103F03C86BBEB2DD1" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_3.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_3.json new file mode 100644 index 0000000..35f4f6f --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_3.json @@ -0,0 +1,104 @@ +{ + "question": [ + "Does cycling reduce risk of diabetes?", + "How can GeneNetwork assist in identifying genetic factors involved in diabetes?", + "What specific tools within GeneNetwork are most useful for diabetes research, and how are they applied?", + "What role does insulin play in the regulation of blood glucose levels?", + "How does aging affect the risk of developing type 2 diabetes?" + ], + "answer": [ + "Yes, the text mentions that in the Atherosclerosis Risk in Communities (ARIC) study, the highest quartile of leisure activity, which primarily included cycling and walking, had 34% lower odds of developing diabetes over 6 years compared to the least active.", + "GeneNetwork can assist in identifying genetic factors involved in diabetes by integrating comprehensive genetic information from genome-wide association studies (GWAS) and single nucleotide polymorphisms (SNPs) related to type 2 diabetes (T2D). It constructs a T2D-specific molecular interaction network consisting of T2D genetic risk genes and their interacting gene partners. This network can help identify highly interconnected network", + "GeneNetwork provides a powerful statistical platform for online network analyses and mapping. It allows researchers to model causal networks that link DNA differences to traits such as differences in expression, cell number, volumes, and behavior. This can be particularly useful in diabetes research for identifying genes with disease relevance and exploring their functional connections. 
Tools like DAVID and GARNET can be used within GeneNetwork to search for enriched KEGG pathways and identify enriched Gene Ontology categories. Additionally, GeneNetwork allows for the construction of protein-protein interaction networks, which can be useful in understanding the complex interactions involved in diabetes.", + "Insulin plays a crucial role in the regulation of blood glucose levels. It is released by pancreatic β-cells in response to increased glucose levels. Insulin facilitates the uptake of glucose, amino acids, and fatty acids by insulin-sensitive tissues, such as muscle and adipose tissue. It also suppresses the production of glucose in the liver. If insulin resistance is present, β-cells increase insulin output to maintain normal glucose tolerance. However, if β-cells are incapable of this task, plasma concentrations of glucose increase.", + "Aging significantly increases the risk of developing type 2 diabetes. This is due to factors such as progressive insulin resistance, inactivity, and weight gain that often accompany advancing age. The risk of onset of type 2 diabetes increases exponentially with age, particularly after the age of 40. More than one in four Americans over the age of 65 years have diabetes. Aging is also associated with a redistribution of fat mass, which can lead to insulin resistance, a key factor in the development of type 2 diabetes." 
+ ], + "contexts": [ + [ + "Physical ActivityIn the Da Qing Impaired Glucose Tolerance and Diabetes Study, incident diabetes decreased by 46% in the exercise group [103].In the nonrandomized Malm Feasibility Study in 260 middle-aged men with impaired glucose tolerance, the incidence of diabetes was 50% lower in the intervention group after 5 years [104].In the Finnish Diabetes Prevention Study, subjects with a change in moderate-to-vigorous leisure-time physical activity (LTPA) in the highest tertile were 49% to 65% less likely to develop diabetes than those in the lowest tertile [105].In the Coronary Artery Risk Development in Young Adults study (CARDIA) with over 15 years of follow-up, there was a significant 17% reduction of risk of incident hypertension for every 300-exercise unit increment in average physical activity [106].In the Atherosclerosis Risk in Communities (ARIC) study, the highest quartile of leisure activity (primarily cycling and walking) had a 34% lower odds of developing hypertension over 6 years compared to the least active [107].Thus, physical activity reduces the risk of developing diabetes and hypertension.The mechanism involves changes in body weight and glucose tolerance, as well as other factors [107].", + "Conclusion:In this cohort of men with diabetes, low fitness level was associated with increased risk of CVD mortality within normal weight, overweight, and class 1 obese weight categories.", + "In aggregate, these findings from the FHS make several important points.First, the incidence rate of diabetes is increasing.Second, because the relative risk of diabetes as a CVD risk factor has remained constant over time, the relative importance of diabetes with respect to CVD has increased.Finally, individuals with diabetes remain inadequately managed with regard to CVD risk factor levels.These findings highlight the importance of early identification of diabetes and a means to identify diabetes early in the life course to promote the early 
aggressive management of CVD risk factors.Another major remaining question is why the relative risk for diabetes as a CVD risk factor has failed to decrease over time.As described earlier, the rates of CVD among participants in the FHS have decreased; but this reduction has been outpaced by those without diabetes (Fox et al. 2004a).In terms of primary prevention, we can aim to reduce the burden of uncontrolled CVD risk factors, including incompletely treated hypertension, dyslipidemia, and participants with diabetes who continue to smoke (Preis et al. 2009a).Observational studies such as the FHS can help to explore rates of treatment and control for known modifiable risk factors.", + "Physical activityNumerous epidemiologic studies show that increased physical activity reduces risk of diabetes, whereas sedentary behaviors increase risk.In the NHS (26), each 2-h/day increment of time spent watching television (TV) was associated with a 14% increase in diabetes risk.Each 2-h/day increment of standing or walking around at home was associated with a 12% reduction in risk.Each 1-h/day increment of brisk walking was associated with a 34% reduction in risk (Fig. 
3).These results indicate a continuum in the relationship between physical activity levels and diabetes risk.Among sedentary behaviors (TV watching, sitting at work, and other sitting), prolonged TV watching was associated with the highest risk.PREVENTABILITY OF TYPESeveral randomized clinical trials have demonstrated that diabetes is preventable.One of the first diabetes prevention trials was conducted in Daqing, China (58).After 6 years of active intervention, risk was reduced by 31, 46, and 42% in the diet-only, exercise-only, and diet-plus-exercise groups, respectively, compared with the control group.In a subsequent 14-year follow-up study, the intervention groups were combined and compared with control subjects to assess how long the benefits of lifestyle change can extend beyond the period of active intervention (59).Compared with control subjects, individuals in the combined lifestyle intervention group had a 51% lower risk of diabetes during the active intervention period, and a 43% lower risk over a 20-year follow-up.DietExcessive caloric intake is a major driving force behind escalating obesity and type 2 diabetes epidemics worldwide, but diet quality also has independent effects.In the Nurses' Health Study (NHS), we found that the quality of fats and carbohydrates play an important role in the development of diabetes, independent of BMI and other risk factors (11).In particular, higher dietary glycemic load (GL) and trans fat are associated with increased diabetes risk, whereas greater consumption of cereal fiber and polyunsaturated fat is associated with decreased risk (Fig. 
2).In a meta-analysis, we found that a 2 serving/day increment in whole-grain intake was associated with a 21% lower risk of diabetes (12).", + "Evidence from randomized controlled trailsThe effi cacy of lifestyle changes in obesity and T2DM prevention has been established in numerous randomized controlled trails (RCTs).Several of them may, however, be considered of major importance due to their large sample sizes (i.e., 458-3234 individuals) and long-term duration (i.e., 3-6 years).The Chinese Da Qing diabetes prevention study was the fi rst to investigate the eff ect of 6-year lifestyle change on body weight and diabetes incidence in individuals with impaired glucose tolerance (IGT) ( Pan et al., 1997 ).Pan and co-workers (1997) reported 42 % reduction in diabetes incidence, although no signifi cant diff erence in body weight was present.Similar results were found in the Finnish Diabetes Prevention Study (DPS) and the US Diabetes Prevention Program (DPP).DPS and DPP independently reported reduction in diabetes incidence of 58 % accompanied by significant reduction in body weight (5-7 %) as a result of the lifestyle modifi cation ( Knowler et al., 2002 ;Tuomilehto et al., 2001 ).These fi ndings were also confi rmed in Japanese and Indian populations, reporting 67.4 % and 28.5 % reduction in diabetes incidence, respectively ( Kosaka et 2011) reported signifi cant reduction in body weight and diabetes incidence at 1, as well as, at 3 years during a lifestyle modifi cation program carried out in a primary healthcare setting among subjects with IGT.All large-scale interventions have been successful in preventing T2DM during the active intervention period.Remarkably when the eff ectiveness of the lifestyle modifi cation programs was assessed on the long-term after discontinuation of the intervention, diabetes risk still remained substantially reduced.In the Finnish DPS, for instance, at extended follow-up 3 years after the 4-year intervention period a substantial 
reduction in body weight and T2DM incidence was still present (Lindstrom et al., al., 2002; Kosaka et al., 2005; Lindstrom et al., 2003; Tuomilehto et al., 2001). In some studies although no or just minor weight loss was achieved, diabetes incidence was also reduced (Pan et al., 1997; Ramachandran et al., 2006). In addition, on the long term weight was partially or totally regained in all of the studies (Knowler et al., 2009; Li et al., 2008; Lindstrom et al., 2006; Lindstrom et al., 2003). Despite this regain T2DM risk remained low or decreased further, thus the effect of lifestyle is unlikely to be solely due to body weight reduction. In support of this notion Pan et al. (1997) reported comparable decrease in T2DM incidence in the intervention group of Da Qing among overweight and lean individuals.", + "Epidemiological studies examining the associations between lifestyle behaviors and diabetes risk have reached similar conclusions as the clinical trials described above. For example, the 14-year follow-up University of Pennsylvania Alumni Health Study [52] (n = 5,990 men aged 39-68 years) showed PA (leisure time physical activity [LTPA] expressed in kcal expended per week through walking, stair climbing, and sports) was inversely associated with the incidence of T2D. Incidence rates declined as energy expenditure rose from 500 through 3,500 kcal/week. The age-adjusted relative risk ratio (RR) of T2D was reduced by about 6% for each 500 kcal increment increase in PA energy expenditure. Physical Activity and T2D Risk: Training studies show aerobic exercise enhances insulin action [43] and glucose metabolism [44] in healthy individuals and those at high risk of T2D. Exercise often normalizes plasma glucose levels by improving insulin sensitivity and glucose transportation [45]. Exercise can also improve endothelial function, reduce inflammation, and beneficially affect 
the autonomic nervous system [46]. Even in the absence of weight loss, exercise can enhance insulin sensitivity [9] and glycemic control [47]. These findings are particularly relevant as they show regular exercise can be used effectively as a treatment for preventing T2D from developing in individuals with IFG/IGT and for improving insulin action in people with manifest diabetes.", + "Previous studies of physical activity and risk of diabetes have been predominantly cross-sectional investigations in high-risk populations. Indirect evidence from descriptive comparisons of NIDDM prevalence in rural vs urban populations in Western Samoa [11,12] and the South Pacific [12] have supported the hypothesis that higher levels of physical activity may be protective against NIDDM. However, other aspects of urban living, including differences in diet, could have accounted for the variation in diabetes risk. Cross-sectional studies among Polynesians [13], Melanesian and Indian Fijians [14,15], Micronesians [15], Swedes [16], and Mauritians [17] have also proposed an association of physical activity with reduced prevalence of NIDDM. The absence of an association between physical activity and glucose intolerance, however, also has been observed [33,34]. In one retrospective study, a reduced risk of diabetes was observed among women who engaged in regular sports in college compared with those who did not, but obesity was not controlled in the analysis [18]. To our knowledge, only two previous prospective studies of physical activity and incidence of NIDDM have been reported, both supporting a protective effect of exercise [19,20]. Our results in male physicians are similar to our earlier findings in female nurses [20], suggesting that gender does not appreciably modify the relation between physical activity and NIDDM incidence. Objective: To examine prospectively the association between regular exercise and the subsequent development of non-insulin-dependent diabetes mellitus 
(NIDDM). Design: Prospective cohort study including 5 years of follow-up. Participants: 21 271 US male physicians participating in the Physicians' Health Study, aged 40 to 84 years and free of diagnosed diabetes mellitus, myocardial infarction, cerebrovascular disease, and cancer at baseline. Morbidity follow-up was 99.7% complete. Main Outcome Measure: Incidence of NIDDM. Results: At baseline, information was obtained about frequency of vigorous exercise and other risk indicators. During 105 141 person-years of follow-up, 285 new cases of NIDDM were reported. The age-adjusted incidence of NIDDM ranged from 369 cases per 100 000 person-years in men who engaged in vigorous exercise less than once weekly to 214 cases per 100 000 person-years in those exercising at least five times per week (P, trend, <.001). Men who exercised at least once per week had an age-adjusted relative risk (RR) of NIDDM of 0.64 (95% CI, 0.51 to 0.82; P=.0003) compared with those who exercised less frequently. The age-adjusted RR of NIDDM decreased with increasing frequency of exercise: 0.77 for once weekly, 0.62 for two to four times per week, and 0.58 for five or more times per week (P, trend, .0002). A significant reduction in risk of NIDDM persisted after adjustment for both age and body-mass index: RR, 0.71 (95% CI, 0.56 to 0.91; P=.006) for at least once per week compared with less than once weekly, and P, trend, .009, for increasing frequency of exercise. Further control for smoking, hypertension, and other coronary risk factors did not materially alter these associations. The inverse relation of exercise to risk of NIDDM was particularly pronounced among overweight men. Conclusions: Exercise appears to reduce the development of NIDDM even after adjusting for body-mass index. Increased physical activity may be a promising approach to the primary prevention of NIDDM.", + "Type 2 diabetes can be prevented or delayed by lifestyle modification, including increased 
physical activity, beneficial dietary changes, and weight reduction (22,44). However, only a few studies have investigated the effects of such lifestyle interventions on insulin sensitivity and insulin secretion in persons with IGT (21,46). On the basis of the 4-yr follow-up study of the DPS with repeated frequently sampled intravenous glucose tolerance test (FSIGT), insulin sensitivity improved along with lifestyle changes, while insulin secretion remained virtually unchanged (46). Most other data also indicate that physical activity, diet, and weight loss primarily increase insulin sensitivity. Insulin resistance and the associated glycemic stress may exhaust β-cells and impair their function. Regular physical activity may diminish glycemic stress by improving insulin sensitivity of target tissues (18). While the mechanisms of improved β-cell function in response to lifestyle interventions are still largely unknown, several studies suggest that physical activity (5,11), diet (19,26), weight loss (45), or their combination (21) may directly improve the first-phase insulin secretion that is an indicator of the β-cell function. Table notes: Model adjusted for age, gender, group, baseline value of moderate-to-vigorous physical activity, and baseline values and changes in body weight and in intakes of energy and energy-adjusted saturated fat and fiber. *The median (range) of each tertile of change in moderate-to-vigorous physical activity is shown. Adjusted interaction between moderate-to-vigorous physical activity (3 groups) and the polymorphism (2 groups) on the risk of developing type 2 diabetes. GENETIC FACTORS AND LIFESTYLE interact in the development of type 2 diabetes. Physical activity, favorable dietary changes, and weight reduction were essential components of a successful lifestyle intervention in two large randomized controlled trials on the prevention of type 2 diabetes in high-risk individuals with impaired glucose tolerance (IGT), including the Finnish Diabetes Prevention Study 
(DPS) (44) and the Diabetes Prevention Program (DPP) (22). In the DPS, increased physical activity was associated with a decreased risk of type 2 diabetes independently of changes in diet and body weight. The individuals who increased their physical activity most (i.e., were in the upper third of the change) were 66% less likely to develop type 2 diabetes than those in the lower third (24).", + "Aerobic activity, alone or in combination with diet, can reduce systolic blood pressure, reduce total cholesterol, raise HDL cholesterol, and improve endothelial function in overweight patients with young-onset type 2 diabetes [47]. However, any potential benefits to the cardiovascular disease risk profile are lost within 3-6 months after cessation of exercise training, and do not confer protection against later cardiovascular events [47,121]. Additionally, reviews [49,121,122] of the limited number of studies done to date have not identified substantial or lasting benefits of doing aerobic exercise on glucose homoeostasis for patients who are obese with young-onset type 2 diabetes, unless accompanied by dietary intervention.", + "Weight change is a complex outcome, as both the degree and pattern of weight change impact health. For example, in the Diabetes Prevention Program (DPP; described in more detail later), both short- and intermediate-term weight loss were associated with reduced diabetes risk and intermediate cardiometabolic risk factor levels, whereas weight cycling (defined as number of 5 lb [2.25 kg] weight cycles) raised diabetes risk, fasting glucose levels, insulin resistance, and systolic blood pressure. Initial (baseline to 1 month) and late (last 6 months of the 2-year intervention period) weight loss had no discernible impact on diabetes risk (26). Similar results have been reported in people with pre-existing diabetes who underwent lifestyle intervention as part of the Look AHEAD (Action for Health in Diabetes) trial (27). These studies point to alternative phenotypes 
that may be informative for genetics studies of weight loss/maintenance/regain.", + "Physical activity. Increased physical activity is an essential component of all effective lifestyle-based trials for the prevention of T2DM. Prospective evidence has shown that both aerobic exercise and resistance training independently have beneficial effects on preventing T2DM [64]. One study has shown that spending more time on moderate-intensity and vigorous-intensity physical activity is beneficial for preventing insulin resistance, independent of time spent sedentary [65]. By contrast, another study found that time spent sedentary was associated with an increased risk of T2DM, regardless of physical activity [66].", + "Multiple interventions in adults with T2D have been evaluated for risk reduction and prevention, both in the short and the long term. A recent systematic review (69) reported that after active interventions lasting from 6 months to >6 years, relative risk reduction achieved from lifestyle interventions (39%) was similar to that attained from use of drugs (36%); however, only lifestyle interventions had a sustained reduction in risk once the intervention period had ended. Analysis of the postintervention follow-up period (~7 years) revealed a risk reduction of 28% with lifestyle modification compared with a nonsignificant risk reduction of 5% from drug interventions.", + "Engagement in regular physical activity and increased physical fitness are recommended for the prevention and treatment of diabetes and other pathological conditions [5,18,19]. We recently demonstrated that four months of moderate physical training, besides being beneficial to glycemic control, was also effective in improving the redox homeostasis in diabetic patients, lowering the oxidant species production and/or increasing the endogenous antioxidant defenses [20]. In the present study, we aimed to analyse the effect of regular engagement in moderate physical training on telomere length, spontaneous and 
H2O2-induced DNA damage, and apoptosis in purified blood leukocytes derived from untrained and trained T2D subjects, compared to age-matched untrained and trained controls. In addition, we examined whether exercise training affected the transcriptional level of a set of genes involved in DNA repair systems, cell cycle control, as well as antioxidants and defence systems, by comparing untrained and trained T2D patients." + ], + [ + "Introduction: Complex diseases, such as diabetes and obesity, result from the interaction of genetic and environmental factors [1-3]. Approximately 170 gene loci have been robustly implicated in diabetes through genome-wide association studies [4]. Studies with knockout mouse models have identified hundreds of genes that can act autonomously to regulate insulin levels (MP:0001560) [5]. However, the underlying mechanisms by which these loci or genes contribute to disease remain elusive. Network modeling methods have been developed based on the premise that complex diseases are often caused by perturbation to a sub-network of genes [1,6-14]. We have applied these methods to identify causal genes for diabetes-related traits in multiple experimental mouse crosses [13,14] and human populations [1]. These analyses suggest that potentially many thousands of genes, under the right circumstances, can affect metabolic states.", + "Genetic factors are known to play a role in T2D and an understanding of the genetic basis of T2D could lead to the development of new treatments (Frayling, 2007a,b; Frayling & McCarthy, 2007; Frayling, 2008). With the increased prevalence of diabetes worldwide, the need for intensive research is of high priority. Sequencing of the human genome and development of a set of powerful tools has made it possible to find the genetic contributions to common complex diseases (Donnelly, 2011). Genome-wide association studies (GWAS) have been used to search for genetic risk factors for 
complex disease (Hindorff, Junkins et al., 2009;Hindorff, Sethupathy et al., 2009).Used in combination with the scaffold data of the human genome courtesy of the HUGO Project (2003) and the International HapMap Project (Thorisson et al., 2005), it is now possible to analyse the whole genome to identify genetic variants that contribute to common disease in a fast and efficient manner.", + "GENE DISCOVERY IN T2DWhy?", + "Candidate g ene a pproachThe molecular screening of candidate genes to search for genetic variants (either rare when the allele frequency is < 0.01, or common in the population tested) potentially associated with diabetes status (i.e. more frequent in individuals with T2DM) has so far been the most frequently used approach to tackle the genetic determinants of T2DM [61] .There are many reasons why specifi c genes may be candidates: A gene may have a known or presumed biologic function in glucose homeostasis or energy balance in humans.", + "Interactions in diabetes <p>An integrative analysis combining genetic interactions and protein interactions can be used to identify candidate genes/proteins for type 1 diabetes and other complex diseases.</p>", + "Received: 7 May 2009 Accepted: 25 February 2010Published: 25 February 2010References1. Sieberts SK, Schadt EE: Moving toward a system genetics view of disease. Mamm Genome 2007, 18:389-401. 2. Keller MP, Choi Y, Wang P, Davis DB, Rabaglia ME, Oler AT, Stapleton DS,Argmann C, Schueler KL, Edwards S, Steinberg HA, Chaibub Neto E,Kleinhanz R, Turner S, Hellerstein MK, Schadt EE, Yandell BS, Kendziorski C,Attie AD: A gene expression network model of type 2 diabetes links cellcycle regulation in islets with diabetes susceptibility. Genome Res 2008,18:706-716. 
3.", + "Genome-wide interaction studies have potential to identify gene variants that influence diabetes risk that might not be detected using hypothesis-driven approaches.However, the statistical power limitations of such studies when applying conventional tests of interaction, combined with the challenges of identifying large cohort collections with appropriately characterized environmental, genetic, and phenotypic data, pose challenges that conventional genetic association studies do not face.Several methods have been developed to mitigate these challenges; among the most promising is the joint meta-analysis approach, which is derived from the model with two degrees of freedom popularized by Kraft et al. (45) and developed further by Manning et al. (46).Manning et al. (47) went on to apply the joint meta-analysis approach in a genome-wide study of 52 cohorts in which they tested for SNP main effects and interactions (with BMI) on fasting glucose and insulin levels.The analysis yielded novel experiment-wide association signals for main effects, but none was discovered for interactions.", + "Genome-wide association studies (GWAS) have discovered germline genetic variation associated with type 2 diabetes risk (1)(2)(3)(4).One of the largest GWAS, involving DNA taken from individuals of European descent and conducted by the DIAGRAM (DIAbetes Genetics Replication And Meta-analysis) consortium, identified 65 loci associated with type 2 diabetes risk (1).However, for most of these loci, the precise identity of the affected gene and the molecular mechanisms underpinning the altered risk are not known.", + "Genome wide association studies (GWAS) have transformed the study of heritable factors influencing complex diseases such as type 2 diabetes (T2D), with the current tally of established risk loci approaching 70.Each of these loci has the potential to offer novel insights into the biology of this disease, and opportunities for clinical exploitation.However, the 
complexity of this condition has often frustrated efforts to achieve these functional and translational advances.This review describes progress made over the past year to expand genome wide association studies, to characterize the mechanisms through which diabetes risk loci operate, and to define the processes involved in diabetes predisposition.", + "Figure5.Consideration of the human gene network boosts recovery of validated type 2 diabetes genes from GWAS analysis of 2000 patients and 3000 controls. (A,B) Plotted using the same conventions as in Figure4, analyzing WTCCC GWAS data (Wellcome Trust Case Control Consortium 2007) for type 2 diabetes alone and in combination with HumanNet and measuring performance as AUC (<5% FPR) for recovering the top 20 genes from a type 2 diabetes meta-analysis of 4549 cases and 5579 controls(Zeggini et al. 2008).As for Crohn's disease, consideration of the network boosts performance across a wide range of parameter values.Notably, consideration of the network strongly implicates the genes CTNNB1 and BACH2 in type 2 diabetes; CTNNB1 is well studied in connection with type 2 diabetes and BACH2 has been previously implicated in type 1 diabetes and celiac disease (e.g.,Cooper et al. 2008;Madu et al. 
2009), but not type 2 diabetes.", + "GenomicsDuring the past few decades, candidate gene approach with case-control study design has been most successful in understanding the genetic etiology of any complex disease.The method begins with selection of putative candidate gene based on its functional role in disease related metabolic pathway, followed by prioritizing single nucleotide polymorphisms (SNPs) that have functional consequences either by affecting the gene regulation or its product.Finally, the prioritized SNPs/variants are genotyped in a random sample of cases and controls and tested for their association with the trait.So far, a total of 1874 unique markers that belong to 421 genes were identified as associated with type 2 diabetes through this approach (Lim et al. 2010).However, an overwhelming inconsistency is observed in the patterns of their association with the disease, with exception to the polymorphisms that belong to TCF7L2, CAPN10, PPARG, KCNJ11, ABCC8, HNF1A, HNF4A, GCK, PC-1/ENPPI, IRS, PTPN1, and LMNA genes which showed much greater degree of consistency (Kommoju and Reddy 2011;Ali 2013).Not being satisfied with this approach, researchers shifted the focus to genome wide association studies (GWAS), which is an agnostic method of testing for association of all the SNPs identified in human genome project with a particular disease through chip based microarray technologies such as Illumina and Affymetrics.A large number of cases and controls are screened through this method and the SNPs with strong signal/high significance (pB10 -08 ) are considered to be disease susceptible/causing.Only these SNPs are further evaluated for their functional consequences.Through this approach, numerous polymorphisms have been identified as associated with type 2 diabetes and the SNPs of TCF7L2, HHEX, CDKN2A/2B, IGF2BP2, SLC30A8, CDKAL1, HMGA2, KCNQ11, and NOTCHADAM30 genes being the most replicated ones (www.genome.gov/gwastudies).The search results for type 2 
diabetes associated genetic variants yielded 388 significant SNPs from 58 GWAS studies.However, many of these type 2 diabetes associated variants need to be functionally validated before attempting to understand their prospective clinical benefits.The TCF7L2 is the only gene which is hitherto functionally characterized as key transcription factor coding gene and involved in regulating the glucose homeostasis (Savic et al. 2011;Boj et al. 2012).As a key component of WNT signaling pathway, it is involved in pancreatic b-cell proliferation and in turn insulin secretion and action (Gupta et al. 2008).It was initially identified as associated with the disease through a genetic linkage study on the Icelandic population (Grant et al. 2006) and subsequently replicated in Danish (Grant et al. 2006), European (Scott et al. 2006) and US cohorts (Zhang et al. 2006) and currently known to be associated across the ethnic groups worldwide (Kommoju and Reddy 2011).Additionally, a 4kb haplotype block at 9p21.3 chromosomal region was found specific to and associated with type 2 diabetes (Silander et al. 2009).Harboring CDKN2A/CDKN2B genes with functional implications in cell proliferation pathway, this chromosomal region was observed to be associated with multiple complex diseases and needs detailed exploration for its potential as a therapeutic target in general and particularly with type 2 diabetes.However, the variants identified by GWAS were found to explain only 10% of variation in type 2 diabetes and most of those (more than 90%) are located in the non-coding region (Grarup et al. 2014;Scott et al. 2016).The search for rare variants with larger penetrance and functional significance is on through next generation and exome sequencing strategies (Jenkinson et al. 
2016).", + "One attractive methodology to circumvent the puzzle of choosing either a hypothesis-driven or an exploratory research may be the strategy of gene prioritization offered by the new bioinformatics tools based on the biological plausibility of a gene-disease association and on knowledge of the protein function. 6e propose an approach for expanding the selection of genes or loci of interest and prioritizing associations over GWAs related with genetic susceptibility to type 2 diabetes.The proposal profits from the recent initiatives of data sharing of the genome scan results that make the information publicly available as soon as they are generated and checked for quality.Both the DGI and the WTCCC are committed to embracing these principles as they made available all the phenotype-genotype data for type 2 diabetes.", + "Background: Many genetic studies, including single gene studies and Genome-wide association studies (GWAS), aim to identify risk alleles for genetic diseases such as Type II Diabetes (T2D).However, in T2D studies, there is a significant amount of the hereditary risk that cannot be simply explained by individual risk genes.There is a need for developing systems biology approaches to integrate comprehensive genetic information and provide new insight on T2D biology.Methods: We performed comprehensive integrative analysis of Single Nucleotide Polymorphisms (SNP's) individually curated from T2D GWAS results and mapped them to T2D candidate risk genes.Using protein-protein interaction data, we constructed a T2D-specific molecular interaction network consisting of T2D genetic risk genes and their interacting gene partners.We then studied the relationship between these T2D genes and curated gene sets.Results: We determined that T2D candidate risk genes are concentrated in certain parts of the genome, specifically in chromosome 20.Using the T2D genetic network, we identified highly-interconnected network \"hub\" genes.By incorporating T2D GWAS 
results, T2D pathways, and T2D genes' functional category information, we further ranked T2D risk genes, T2D-related pathways, and T2D-related functional categories. We found the highly interconnected T2D disease network \"hub\" genes most highly associated with T2D genetic risks to be PI3KR1, ESR1, and ENPP1. The well-characterized TCF7L2, contrary to our expectation, was not among the highest-ranked T2D gene list. Many interacting pathways play a role in T2D genetic risks, including the insulin signalling pathway, type II diabetes pathway, maturity onset diabetes of the young, adipocytokine signalling pathway, and pathways in cancer. We also observed significant crosstalk among T2D gene subnetworks which include insulin secretion, regulation of insulin secretion, response to peptide hormone stimulus, response to insulin stimulus, peptide secretion, glucose homeostasis, and hormone transport. Overview maps involving T2D genes, gene sets, pathways, and their interactions are all reported. Conclusions: Large-scale systems biology meta-analyses of GWAS results can improve interpretations of genetic variations and genetic risk factors. T2D genetic risks can be attributable to the summative genetic effects of many genes involved in a broad range of signalling pathways and functional networks. The framework developed for T2D studies may serve as a guide for studying other complex diseases. Conclusions: Large-scale systems biology meta-analyses of GWAS results can improve interpretations of genetic variations and genetic risk factors. In this work, we determined that T2D candidate risk genes are located in higher concentration in certain parts of the genome, specifically in chromosome 20. Using the T2D genetic network, we identified highly interconnected network \"hub\" genes. By incorporating T2D GWAS results, T2D pathways, and T2D genes' functional category information, we further ranked T2D risk genes, T2D-related pathways, and T2D-related functional categories. Overview maps involving 
T2D genes, gene sets, pathways, and their interactions are all reported. Moreover, we demonstrate a computational framework built upon disease-specific data integration, Figure 2: T2D risk gene pathway interaction network. Here, an edge will be created between two pathways, if and only if the pathways involved three or more risk genes. Figure 3: T2D risk gene functional category crosstalk network. For this figure an edge will be created between two functional categories for all significant Gene Ontology categories. To confirm the presence of molecular systems structures that may better explain missing heritability problems for T2D, we adopted a Systems Biology approach to studying T2D genetic risk gene networks as a whole rather than the risk genes individually. Prior to this study, several reports [10,11] examined genes implicated T2D differential expressions in affected tissues. In this study, we used T2D-associated SNP information curated from the Type 2 Diabetes Genetic Association Database (T2DGADB), which integrated comprehensively reported SNPs, their odds ratios, population description, and all related metadata from various T2D GWAS performed worldwide [12]. We further annotated individual SNPs collected from T2DGADB with information from the dbSNP database [13], including information such as nearby genes, chromosomal location, gene functional class, and base changes. To create a model for T2D genetic risk gene molecular systems structure, we built a gene interaction network seeded by T2D risk genes collected from T2DGADB and expanded with high-confidence protein interaction data collected from the Human Annotated and Predicted Protein Interaction database (HAPPI) [14]. We also ranked risk genes in the network according to these high confidence interactions.", + "Genetic factors appear to play a role in determining an individual's risk of developing diabetes. It is hoped that genetic studies will ultimately identify key genetic elements that help determine susceptibility to diabetes, disease progression, and responsiveness to specific therapies, as well as help identify novel targets for future intervention. A substantial number of genetic loci, gene polymorphisms, and mutations have already been reported as having variable degrees of association with one or other type of diabetes (type 1, type 2, maturity onset 
diabetes of the young [MODY]), while others appear to be involved in response to antihyperglycemic agents.We have compiled the following glossary of genetic and genomic terms relating to diabetes, which we hope will prove a useful reference to researchers and clinicians with an interest in this disease.This is by no means an exhaustive list, but includes many of the genetic loci and variants that have been studied in association with diabetes.Gene encoding insulin-like growth factor 2 mRNA binding protein 2 (also known as IMP-2).SNPs in the gene have been associated with type 2 diabetes IFIH1", + "To gain insights into how the linking nodes of our final network contribute to T2D biology, we used the DisGeNET database [37], which collates gene-disease information from public data as well as from literature via natural language processing tools.We focused on the 274 linking nodes included in our model to avoid circularity arising from using the seeds, and identified 92 (~33%) with known links to T2D (Additional file 1: Table S2).Examples include as follows: (a) NEUROD1 which encodes a transcription factor that is involved in the development of the endocrine cell lineage and has been implicated in monogenic diabetes [38], (b) PRKCB involved in insulin resistance [39] and (c) GNAS, implicated in beta-cell proliferation [40].For this last gene, mouse knockouts have been shown to produce phenotypes concordant with diabetes [41].These examples demonstrate the potential of these analyses to draw in \"linking\" nodes as related to T2D even when they are not located within genome-wide association signals.Background: Genome-wide association studies (GWAS) have identified several hundred susceptibility loci for type 2 diabetes (T2D).One critical, but unresolved, issue concerns the extent to which the mechanisms through which these diverse signals influencing T2D predisposition converge on a limited set of biological processes.However, the causal variants identified by GWAS 
mostly fall into a non-coding sequence, complicating the task of defining the effector transcripts through which they operate. Methods: Here, we describe implementation of an analytical pipeline to address this question. First, we integrate multiple sources of genetic, genomic and biological data to assign positional candidacy scores to the genes that map to T2D GWAS signals. Second, we introduce genes with high scores as seeds within a network optimization algorithm (the asymmetric prize-collecting Steiner tree approach) which uses external, experimentally confirmed protein-protein interaction (PPI) data to generate high-confidence sub-networks. Third, we use GWAS data to test the T2D association enrichment of the \"non-seed\" proteins introduced into the network, as a measure of the overall functional connectivity of the network. Results: We find (a) non-seed proteins in the T2D protein-interaction network so generated (comprising 705 nodes) are enriched for association to T2D (p = 0.0014) but not control traits, (b) stronger T2D-enrichment for islets than other tissues when we use RNA expression data to generate tissue-specific PPI networks and (c) enhanced enrichment (p = 3.9 × 10^-5) when we combine the analysis of the islet-specific PPI network with a focus on the subset of T2D GWAS loci which act through defective insulin secretion. Conclusions: These analyses reveal a pattern of non-random functional connectivity between candidate causal genes at T2D GWAS loci and highlight the products of genes including YWHAG, SMAD4 or CDK2 as potential contributors to T2D-relevant islet dysfunction. The approach we describe can be applied to other complex genetic and genomic datasets, facilitating integration of diverse data types into disease-associated networks." + ], + [ + "Data generated by these experiments are iteratively subjected to novel informatics approaches, network analysis, and modeling to find important regulatory nodes, discover the emergent property of the system, and predict the systems behavior under various conditions. GEO, Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/); BIND, Biomolecular Interaction Network Database (http://www.unleashedinformatics.com/index.php?pg=products&refer=bind). 
GENETICS: THE TIDE HAS TURNED TO RIGOROUS PHENOTYPING. The classical forward genetic screen has been the single most powerful tool to conclusively identify critical components of the circadian oscillator, and its contribution in advancing the field of chronobiology cannot be overstated.", + "This approach requires the accumulation and integration of many types of data, and also requires the use of many types of statistical tools to extract relevant patterns of covariation and causal relations as a function of genetics, environment, stage, and treatment. In this protocol we explain how to use the GeneNetwork web service, a powerful and free online resource for systems genetics. We provide workflows and methods to navigate massive multiscalar data sets and we explain how to use an extensive systems genetics toolkit for analysis and synthesis. GeneNetwork: A Toolbox for Systems Genetics. Megan K. Mulligan (1), Khyobeni Mozhui (2), Pjotr Prins (1,2), Robert W. Williams (1). (1) Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, USA. (2) Preventive Medicine, University of Tennessee Health Science Center, Memphis, USA. Abstract: The goal of systems genetics is to understand the impact of genetic variation across all levels of biological organization, from mRNAs, proteins, and metabolites, to higher-order physiological and behavioral traits.", + "GeneNetwork is one ofeither generate or test ideas by reusing data that oftenan interlinked trio of sites built up by NIAAA (GeneWeaverhave been rescued from the classic literature. 
Below is a short list of both well-known and more esoteric and WebGestalt are the other two) to house extensiveresources, many of which have been supported by NIAAA, data for human, monkey, rat, mouse, and fruit fly.", + "In the second part of this work the computed T2DM gene set has been used to identify biological networks on different layers of cellular information such as signaling and metabolic pathways, a comprehensive gene regulatory network and protein-protein interactions.Background: Multiple functional genomics data for complex human diseases have been published and made available by researchers worldwide.The main goal of these studies is the detailed analysis of a particular aspect of the disease.Complementary, meta-analysis approaches try to extract supersets of disease genes and interaction networks by integrating and combining these individual studies using statistical approaches.Results: Here we report on a meta-analysis approach that integrates data of heterogeneous origin in the domain of type-2 diabetes mellitus (T2DM).Different data sources such as DNA microarrays and, complementing, qualitative data covering several human and mouse tissues are integrated and analyzed with a Bootstrap scoring approach in order to extract disease relevance of the genes.The purpose of the meta-analysis is two-fold: on the one hand it identifies a group of genes with overall disease relevance indicating common, tissue-independent processes related to the disease; on the other hand it identifies genes showing specific alterations with respect to a single study.Using a random sampling approach we computed a core set of 213 T2DM genes across multiple tissues in human and mouse, including well-known genes such as Pdk4, Adipoq, Scd, Pik3r1, Socs2 that monitor important hallmarks of T2DM, for example the strong relationship between obesity and insulin resistance, as well as a large fraction ( 128) of yet barely characterized novel candidate genes.Furthermore, we explored 
functional information and identified cellular networks associated with this core set of genes such as pathway information, protein-protein interactions and gene regulatory networks.Additionally, we set up a web interface in order to allow users to screen T2DM relevance for any -yet non-associated -gene. Conclusion:In our paper we have identified a core set of 213 T2DM candidate genes by a metaanalysis of existing data sources.We have explored the relation of these genes to disease relevant information and -using enrichment analysis -we have identified biological networks on different layers of cellular information such as signaling and metabolic pathways, gene regulatory networks and protein-protein interactions.The web interface is accessible via http://t2dmgeneminer.molgen.mpg.de.", + "We decided to pursue the first hypothesis and adapted a systems biology perspective.Rather than looking for significant aberrations in expression of individual insulin-signaling genes, we looked for significant aberrations in the collective expression of a set of insulin-signaling genes whose protein products form a connected protein-protein interaction network.This was accomplished using a simple methodology referred to as gene network enrichment analysis (GNEA).", + "Exploring genes, molecules, and phenotypes is easily accomplished using GeneNetwork. In thismanuscript we will outline some simple use cases, and show how a small number of plausiblecandidate genes can be identified for an immune phenotype. 1. DataOnce you have navigated to genenetwork.org, there are two ways to search for data in GN. Thefirst is to use the global search bar located at the top of the page (Figure 1). This is a newfeature in GN that allows researchers to search for genes, mRNAs, or proteins across all of thedatasets.Recent improvements toGeneNetwork have reinvigorated it, including the addition of data from 10 species, multi-omicsanalysis, updated code, and new tools. 
The new GeneNetwork is now an exciting resource for predictive medicine and systems genetics, which is constantly being maintained and improved. Here, we give a brief overview of the process for carrying out some of the most common functions on GeneNetwork, as a gateway to deeper analyses, demonstrating how a small number of plausible candidate genes can be found for a typical immune phenotype.", + "This approach requires the accumulation and integration of many types of data, and also requires the use of many types of statistical tools to extract relevant patterns of covariation and causal relations as a function of genetics, environment, stage, and treatment. In this protocol we explain how to use the GeneNetwork web service, a powerful and free online resource for systems genetics. We provide workflows and methods to navigate massive multiscalar data sets and we explain how to use an extensive systems genetics toolkit for analysis and synthesis.", + "Readers may refer [42] for a comprehensive review on various available software tools. GeneNetWeaver (GNW) [43] is a Java-based reverse engineering tool for generating synthetic benchmark expression datasets from gold standard DREAM challenge network. E. coli and Yeast transcriptional regulatory networks are integrated as test case for benchmark. Comparative assessment of inference algorithms against DREAM challenge data can also be performed with the help GNW. Cytoscape [44] is a powerful tool most suitable for large-scale network analysis.", + "Researchers, however, have the possibility to fully explore the results by altering the thresholds on the open web resource. Although only protein-coding genes were included in our analysis, the same approach can be applied to non-coding genes to reveal their potential functions. Similarly, GeneBridge can also be utilized to identify novel gene-disease associations based on known disease-associated genes from databases, such as the Human Disease Ontology (DO) [207] or DisGeNET [208]. 
The GeneBridge toolkit could also be applied to large-scale proteomics datasets after correcting for the background of all measured proteins.", + "Exploring genes, molecules, and phenotypes is easily accomplished using GeneNetwork. In this manuscript we will outline some simple use cases, and show how a small number of plausible candidate genes can be identified for an immune phenotype. 1. Data. Once you have navigated to genenetwork.org, there are two ways to search for data in GN. The first is to use the global search bar located at the top of the page (Figure 1). This is a new feature in GN that allows researchers to search for genes, mRNAs, or proteins across all of the datasets. Recent improvements to GeneNetwork have reinvigorated it, including the addition of data from 10 species, multi-omics analysis, updated code, and new tools. The new GeneNetwork is now an exciting resource for predictive medicine and systems genetics, which is constantly being maintained and improved. Here, we give a brief overview of the process for carrying out some of the most common functions on GeneNetwork, as a gateway to deeper analyses, demonstrating how a small number of plausible candidate genes can be found for a typical immune phenotype.", + "Genome Biol 8(2):R25. Hubner N, Wallace CA, Zimdahl H, Petretto E, Schulz H et al (2005) Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat Genet 37(3):243-253. Ihaka R, Gentleman RC (1996) R: a language for data analysis and graphics. 
J Comput Graph Stat 5:299314Keller MP, Choi Y, Wang P, Davis DB, Rabaglia ME et al (2008) Agene expression network model of type 2 diabetes links cellcycle regulation in islets with diabetes susceptibility.", + "We next constructed protein-protein interaction networks.To do this, we selected 76 genes known from monogenic forms of diabetes, obesity, and hypertension or GWAS hits (type 2 diabetes, obesity, and hypertension) for which the lead association lies within the protein-coding part of the gene (Table S3).", + "To test this hypothesis, we used the Web-basedGeneNetwork databases that have been recently introducedto the scientific community and proved to be a powerful toolfor hypothesis-driven investigations (Chesler et al. 2003,2004; Wang et al. 2003). Researchers can take advantageof genetic diversity in panels of recombinant inbred mousestrains to use these databases for studies of the regulation ofgene expression and genetic mechanisms of complex traits. Our in silico investigation provided evidence for potentialfunctional relationships among the 21 DAT-associated proteins detected by mass spectrometry in this study.", + "Construction and analysis of the T2D risk genes networkTo further sift the results and explore functional connections, we also mapped genes onto known gene sets.For this purpose, we used DAVID [22,23] to search for enriched KEGG [24] pathways.We also used GARNET [25] to identify enriched Gene Ontology categories and their relationships.", + "Thereby such networks have the potential to beof importance in the emergence of precision medicine (Curtis, 2015; Desautels et al. , 2014;Glade Bender et al. , 2015; Jorgensen, 2015; Kummar et al. , 2015; Marquet et al. , 2015;Rubin, 2014) wherein therapeutic strategies need to be aligned with specific properties oftumors. 
Methods: GeneNetwork and WebGestalt. GeneNetwork is an open access, online data analysis resource for systems biology and systems genetics.", + "GeneNetwork.org also offers a powerful statistical platform for online network analyses and mapping, enabling numerous molecular questions to be probed in one centralized location (Chesler et al., 2003, 2005; Li et al., 2010; Mulligan et al., 2012, 2017, 2019). Most data are from groups of animals or humans who have been fully genotyped or even sequenced. As a result, it can be used to model causal networks that link DNA differences to traits such as differences in expression, cell number, volumes, and behavior using real-time computation and graphing." + ], + [ + "Insulin Resistance: Insulin is a pleiotropic hormone that plays a pivotal role in the development of hypertension, diabetes, and the metabolic syndrome. The main metabolic actions of insulin are to stimulate glucose uptake in skeletal muscle and heart and to suppress the production of glucose and very low-density lipoprotein (VLDL) in the liver [66]. Under fasting conditions, insulin secretion is suppressed, leading to increased glucose synthesis in the liver and kidneys (gluconeogenesis) and increased conversion of glycogen to glucose in the liver (glycogenolysis) [67]. After a meal, insulin is released from pancreatic β-cells and inhibits gluconeogenesis and glycogenolysis [67]. Insulin stimulates the sympathetic nervous system (SNS) to increase cardiac output and the delivery and utilization of glucose in the peripheral tissues [68]. Other metabolic effects of insulin include inhibition of glucose release from the liver, inhibition of the release of free fatty acids (FFAs) from adipose tissue, and stimulation of the process by which amino acids are incorporated into protein [67].", + "Insulin Resistance in Type 2 Diabetes: Insulin resistance is defined as impaired insulin-mediated glucose clearance into target tissues. Physiology studies many years ago showed 
most of the insulin-mediated clearance of a glucose load goes into skeletal muscle, plus the insulin response to the meal shuts down hepatic glucose production.We now know that the defect with insulin resistance is at both sites.In the fasting state, the degree of hyperglycemia is directly determined by the rate of glucose overproduction by the liver.With eating, failure of adequate insulin-mediated nutrient clearance into skeletal muscle combined with an attenuated halting of hepatic glucose production cause the raised postprandial glycemia.Reference ( 84) is an excellent review of the known pathophysiology from an investigator who performed many of the key studies.", + "The present: the crucial role of cells to glucose homoeostasis by feedback regulationThe importance of insulin resistance and -cell dysfunction to the pathogenesis of type 2 diabetes was debated for a long time; many thought that insulin resistance was the main abnormality in type 2 diabetes, and that inability to secrete insulin was a late manifestation. 5This notion changed with the fi nding that, as with most endocrine systems in human beings, a feedback loop operates to ensure integration of glucose homoeo stasis and maintenance of glucose concentration in a narrow range. 
7his feedback loop relies on crosstalk between cells and insulin-sensitive tissues (fi gure 1).Insulin released in response to -cell stimu lation mediates uptake of glucose, aminoacids, and fatty acids by insulin-sensitive tissues.In turn, these tissues feed back information to islet cells about their need for insulin.The mediator of this process has not been identifi ed, but probably includes integration between the brain and humoral system.If insulin resistance is present, as often happens in people with obesity, cells increase insulin output to maintain normal glucose tolerance.However, if cells are incapable of this task, plasma concentrations of glucose increase.Glucose metabolism is normally regulated by a feedback loop including islet cells and insulin-sensitive tissues, in which tissue sensitivity to insulin aff ects magnitude of -cell response.If insulin resistance is present, cells maintain normal glucose tolerance by increasing insulin output.Only when cells cannot release suffi cient insulin in the presence of insulin resistance do glucose concentrations rise.Although -cell dysfunction has a clear genetic component, environmental changes play an essential part.Modern research approaches have helped to establish the important role that hexoses, aminoacids, and fatty acids have in insulin resistance and -cell dysfunction, and the potential role of changes in the microbiome.Several new approaches for treatment have been developed, but more eff ective therapies to slow progressive loss of -cell function are needed.Recent fi ndings from clinical trials provide important information about methods to prevent and treat type 2 diabetes and some of the adverse eff ects of these interventions.However, additional long-term studies of drugs and bariatric surgery are needed to identify new ways to prevent and treat type 2 diabetes and thereby reduce the harmful eff ects of this disease.", + "Molecular mechanisms of insulin resistance. 
Binding of insulin to its receptor activates insulin receptor tyrosine kinase and phosphorylation of a family of insulin receptor substrates (IRSs), especially IRS1 and IRS2 (REF.105) (FIG.6).These phosphorylated IRS proteins bind to and activate intracellular signalling molecules, most important of which is phosphatidylinositol 3-kinase (PI3K).PI3K promotes glucose transporter type 4 (GLUT4) translocation to the plasma membrane, resulting in glucose uptake into skeletal muscle, and phosphorylates and inactivates the transcription factor forkhead box protein O1 (FOXO1), altering transcription of downstream genes.Insulin also stimulates the RAS-mitogen-activated p rotein kinase (MAPK) pathway.Figure 4 | Insulin secretion in response to glucose.a | Characteristic insulin secretory response (reconstructed by deconvolution of plasma C-peptide levels) to oral glucose in patients with type 2 diabetes mellitus (T2DM) and in body mass index (BMI)-matched non-diabetic individuals.Note the higher fasting secretion rate, the initial blunted secretory response and the later catch-up phase (due to higher glycaemia).b | The insulin secretion rates of panel a are here plotted against the concomitant plasma glucose concentrations to show the deficit in glucose sensing in patients versus normal glucose-tolerant (NGT) controls.Actual experimental data have been averaged and interpolated to produce these graphs.Box 1 | Glucose homeostasisFollowing a meal, insulin secretion is stimulated and glucagon secretion is inhibited by the combined actions of hyperinsulinaemia and hyperglycaemia.Approximately 60-70% of insulin secretion is dependent on the release of the incretin hormones, including glucagon-like peptide 1 (GLP1) and gastric inhibitory polypeptide (GIP) by the L cells and the K cells in the gut, respectively.Collectively, the changes in glucose, insulin and glucagon levels suppress hepatic glucose production, stimulate muscle glucose uptake and inhibit lipolysis; the latter 
results in a reduction in the free fatty acid concentration in blood, which further enhances the effect of insulin on the liver and muscle.Type 2 diabetes mellitus is associated with major disturbances in all of the preceding physiological responses: insulin secretion is impaired; fasting plasma glucagon levels are increased and fail to suppress normally after a meal; basal hepatic glucose production is increased and fails to suppress normally after a meal; muscle glucose uptake is impaired; fasting plasma free fatty acid levels are increased and fail to suppress normally following a meal; and the post-meal rise in GLP1 and GIP is normal or modestly decreased.However, there is severe -cell resistance to the stimulatory effect of both GLP1 and GIP on insulin secretion.Insulin secretion.-cells integrate inputs from substrates (such as glucose, FFAs, arginine, fructose and amino acids), hormones and nerve endings to adjust insulin release in response to changing demands (for example, fasting-feeding cycles, exercise and stress) on a minuteto-minute basis in order to maintain normal blood glucose levels, and inter-individual differences affect this adjustment.For example, a lean, insulin-sensitive adult might need as little as 0.5 U of insulin to dispose of an oral load of 75 g of glucose over 2 hours, whereas an obese, insulin-resistant, glucose-intolerant person might require 45 U to perform the same task (~90-fold inter-individual difference).In vivo tests in humans using intravenous or oral glucose, arginine, sulfonylureas (antidiabetic drugs) or mixed meals have demonstrated impaired -cell function in overt T2DM.However, reliable quantitation of in vivo -cell dysfunction requires some form of modelling 78 .Absolute insulin secretion in response to an oral glucose challenge can be normal or even increased in T2DM (FIG.4a), except in long-standing, poorly controlled disease, in which absolute insulin secretion is reduced.However, when insulin secretion rates are 
plotted against the concomitant plasma glucose concentrations, patients with T2DM secrete substantially less insulin than non-diabetic controls (FIG.4b).This decline in -cell glucose sensing occurs along a continuum extending from normo glycaemia through prediabetes to decompensated diabetes in adults 79 and children 80 , and is a potent predictor of progression to diabetes independently of insulin resistance and classic phenotypic predictors 79 .Absolute insulin secretion is a positive antecedent of deteriorating glucose tolerance.Furthermore, the ability of -cells to respond to the rate of increase in plasma glucose concentration (rate sensitivity) is impaired in individuals with T2DM 79 .Antecedent hyperglycaemia and high levels of incretin hormones (GLP1 and GIP) potentiate glucosestimulated insulin release in healthy individuals.In patients with T2DM, glucose-mediated potentiation of insulin release is increased compared with normal glucose-tolerant individuals (owing to the hyperglycaemia); incretin potentiation, however, is severely compromised 81 .The incretin defect is not reversed by reducing the plasma glucose concentration 82 .", + "The effect of insulin has also been investigated both in vivo and in vitro. In vivo, contradictory results were obtained depending onthe way of administration and the quantity ofinsulin used. For instance, the intraperitonealadministration of a pharmacological dose of insulin decreased expression of FBPase (PlagnesJuan et al. , 2008), but similar acute treatmentwith physiological dose exhibited opposite effect (Polakof et al. , 2010d). Inhibitory actionof insulin can nevertheless be observed afterlong-term infusion of physiological quantity ofinsulin (Polakof et al. 
, 2010d).", + "However, a suggestion that insulin exerts partial control over gluconeogenesis is observed since the activity of phosphoenolpyruvate carboxykinase in liver from younger diabetic mice is not as greatly increased as it is in liver from older diabetics with blood sugar concentrations greater than 250 mg/100 ml. Plasma insulin assay. The reasons for the ineffectiveness of this excess circulating insulin in maintaining normal blood sugar concentration and in regulating the rate of gluconeogenesis are obscure. A possibility, which cannot be excluded, is the presence of insulin antagonists [23]. However, their presence seems unlikely in view of the potent action of insulin in sustaining lipogenesis and in increasing glycolysis in these mice.", + "The pathophysiological processes leading to type 2 diabetes: Glucose, a monosaccharide, is the key carbohydrate of energy metabolism. The three major sources of circulating glucose in the human body are intestinal absorption, gluconeogenesis and glycogenolysis. Blood glucose homeostasis is regulated by gluco-regulatory hormones such as insulin, glucagon, amylin, glucagon-like peptide 1, glucose-dependent insulinotropic peptide, epinephrine, cortisol and growth hormone (Stephen et al. 
2004). Insulin is the key regulatory hormone of blood glucose homeostasis with its excitatory action of stimulating glucose uptake and inhibitory actions on gluconeogenesis, glycogenolysis, proteolysis, lipolysis and ketogenesis (Sonksen and Sonksen 2000). Ever since the role of insulin in glucose homeostasis is understood, it has been the primary therapeutic target in type 2 diabetes patients (Tibaldi 2013). The major pathological mechanisms of type 2 diabetes are the defective insulin secretion due to dysfunctional pancreatic β-cells and impaired insulin action through insulin resistance (Lin and Sun 2010; Ashcroft and Rorsman 2012).", + "Impaired β-cell function is considered a key factor in the pathogenesis of type 2 diabetes (T2D) driven by insulin resistance (1). Insulin secretion in response to an intravenous glucose stimulus is a two-phase process: the first peak of insulin secretion occurs rapidly within 5-10 min after the glucose infusion, followed by a second peak depending on the degree and duration of glucose stimulus (1). Although the insulin response to ingested glucose (e.g., from a meal) does not exhibit a clear biphasic shape under physiological conditions, an early insulin response with rapid elevations of portal and peripheral insulin concentrations has been observed (2,3). A previous study found that the plasma insulin response at 30 min after an oral glucose load was inversely associated with the 2-h plasma glucose concentrations in patients with impaired glucose tolerance (4). This implies that the early-phase insulin secretion is a marker for postprandial glucose homeostasis and plays a role in the development of T2D.", + "Introduction: Type 2 diabetes is characterised by an elevation in blood glucose in the fasting state and/or following a glucose challenge resulting from insulin resistance and insufficient compensatory insulin secretion by pancreatic beta islet cells. Insulin action, as the insulin sensitivity index (S_I), can be estimated from the 
frequently sampled IVGTT with minimal model. Other indices include the acute insulin response to glucose (AIR_g, reflecting insulin secretion) and the disposition index (DI = S_I × AIR_g, measuring overall glucose homeostasis and taking account of the hyperbolic relationship between S_I and insulin secretion). Glucose effectiveness (S_G) represents an insulin-independent effect whereby glucose mediates its own disposal from plasma. Impairments in these insulin action and glucose metabolism indices are recognised as prediabetic phenotypes involving pathogenic development and pathogenetic processes of type 2 diabetes. Exercise training improves peripheral S_I and S_G in healthy human subjects [1], and significant improvements in S_I, AIR_g, DI and S_G in response to 20 weeks of endurance exercise training have been observed and reported in the HERITAGE Family Study [2]. Recent investigations in HERITAGE provide further evidence that physiological training responses vary appreciably from person to person, and these individual differences are influenced by genetic factors [3].", + "(i) Removal of glucose from the blood is primarily achieved by insulin induction of glucose uptake into muscle. This involves insulin sensing and signalling within individual muscle cells, mobilisation of GLUT4 transporters to the cell membrane and conversion of glucose to glycogen for storage [31]. Each of these processes has strict regulatory mechanisms that respond to more than just the amount of insulin the cells are exposed to (e.g. glycogen content, exercise, adrenaline, hypoxia, lipids, etc.). 
(ii) Glucose can be removed from the blood by adipose tissue and is also a fuel source for most cells in the body.At the same time endogenous glucose production in the liver is suppressed by insulin [32], but also by other nutrients (including glucose), and the liver is the primary site of insulin removal from the blood.Therefore there are at least three major organs that contribute directly to the level of glucose and insulin in the blood, and which work in concert to cope with variations in nutrient load or requirement, as well as to induce counterregulatory pathways to limit rebound in any given response.It is now known that many of the proteins involved in these actions work in a tissue-specific fashion, and that most of the intracellular molecular pathways involved have inherent redundancy (Fig. 2), with the ability to mask minor changes in the activity of the proteins involved [33,34]. (iii) Whole-body insulin resistance could arise from hepatic, muscle or adipose insulin resistance or combinations thereof.Glucose homeostasis depends in large part on production of appropriate quantities of insulin by pancreatic b-cells correctly timed around nutrient ingestion.In the evolution of an individual case of T2DM, it is generally considered that sensitivity to insulinmediated glucose disposal and insulin suppression of hepatic glucose production diminishes over time (e.g. 
as a result of increasing adiposity), with an initial compensatory increase in insulin secretion from b-cells to achieve glucose homeostasis.At this stage, which may be asymptomatic and prolonged, absolute insulin concentrations measured in plasma may be higher than the reference range.For an individual developing T2DM, a plot against time of total insulin secretion across a standard oral glucose tolerance test (OGTT) is therefore an inverted ''U''-shape as b-cells (teleologically) fail to maintain compensation [15].As compensation becomes less effective (''b-cell exhaustion''), even in the absence of a further deterioration of insulin sensitivity, either impaired glucose tolerance or impaired fasting glucose will develop before finally, the threshold is crossed for a diagnosis of T2DM (as defined by current WHO/ ADA glucose criteria).This trajectory of increase in insulin resistance, b-cell compensation and subsequent failure is nicely demonstrated in the Whitehall II study, a prospective follow up of London civil servants (Fig. 
1) [16].In this model, insulin resistance plays an early (pre-diabetic) and important part in the development of T2DM, possibly even inducing b-cell failure due to the strain of prolonged compensation.Complex processes involved in insulin actionAs detailed earlier, clinical assessment of insulin sensitivity primarily relies on measurement of blood glucose and insulin, either in the fasted condition or under hormonal or nutrient ''clamp'' conditions.While the secretion of insulin is almost exclusively controlled by the functional state of the b-cell there are a large number of other tissues involved in maintaining proper response to changes in nutrients such as glucose.In addition there are multiple counter-regulatory mechanisms in the body to cope with changes in hormonal and nutrient exposure.In other words, mammals have evolved to keep a very tight control on blood glucose concentration and it is highly likely that multiple molecular problems would have to occur simultaneously to alter whole body insulin sensitivity significantly.", + "Pathophysiology and major risk factorsWhen the feedback loops between insulin action and insulin secretion do not function properly, the action of insulin in insulin-sensitive tissues such as liver, muscle and adipose tissue (insulin resistance in T2DM) and insulin secretion by pancreatic islet -cells (-cell dysfunction in T2DM) are affected, which results in abnormal blood levels of glucose 37 (FIG.2).In T2DM, insulin resistance contributes to increased glucose production in the liver and decreased glucose uptake in muscle and adipose tissue at a set insulin level.In addition, -cell dysfunction results in reduced insulin release, which is insufficient for maintaining normal glucose levels 38 .Both insulin resistance and -cell dysfunction occur early in the pathogenesis of T2DM, and their critical importance has been verified longitudinally in Pima Indian people progressing from normal glucose tolerance to impaired glucose 
tolerance to T2DM 39 .Figure 2 | Pathophysiology of hyperglycaemia in T2DM.Insulin secretion from the -cells in the pancreas normally reduces glucose output by the liver and increases glucose uptake by skeletal muscle and adipose tissue.Once -cell dysfunction in the pancreas and/or insulin resistance in the liver, skeletal muscle or adipose tissue occur, hyperglycaemia develops, leading to an excessive amount of glucose circulating in the blood.The various factors listed at the top affect insulin secretion and insulin action.T2DM, type 2 diabetes mellitus.", + "The role for pro-inflammatory cytokines in regulating insulin action and glucose homeostasis and their function in T2DM has been suggested by several lines of evidence." + ], + [ + "Type 2 diabetes (T2D) is a result of complex gene-environment interactions, and several risk factors have been identified, including age, family history, diet, sedentary lifestyle and obesity.Statistical models that combine known risk factors for T2D can partly identify individuals at high risk of developing the disease.However, these studies have so far indicated that human genetics contributes little to the models, whereas socio-demographic and environmental factors have greater influence 1 .Recent evidence suggests the importance of the gut microbiota as an environmental factor, and an altered gut microbiota has been linked to metabolic diseases including obesity 2,3 , diabetes 4 and cardiovascular disease 5 .", + "Diet, Nutrition, and Type 2 DiabetesObesity is pathophysiologically associated with the development of type II diabetes [199,200].Oxidative stress and inflammation, metabolic impairment and accelerated aging on both the micro-and macrocellular level contribute to the pathogenesis of metabolic diseases [201,202].", + "Our result provides a novel hypothesis on the mechanism for the connection between two aging-related diseases: Alzheimer's disease and type 2 diabetes.", + "IntroductionThe aetiology of type 2 diabetes 
is poorly defined: several studies indicate that the disease results from a combination of genetic susceptibility and external risk factors [1]. According to this multifactorial model, genetically predisposed subjects will not necessarily develop overt disease unless they are also exposed to particular environmental factors [2]. Important risk factors for the development of type 2 diabetes include a family history of diabetes, increased age, hypertension, lack of physical exercise, and obesity [1].", + "Type 2 diabetes, though poorly understood, is known to be a disease characterized by an inadequate beta-cell response to the progressive insulin resistance that typically accompanies advancing age, inactivity, and weight gain. 1 The disease accounts for substantial morbidity and mortality from adverse effects on cardiovascular risk and disease-specific complications such as blindness and renal failure. 
2 The increasing global prevalence of type 2 diabetes is tied to rising rates of obesity 2 - in part a consequence of social trends toward higher energy intake and reduced energy expenditure. However, the mechanisms that underlie individual differences in the predisposition to obesity remain obscure. Failure to understand the pathophysiology of diseases such as type 2 diabetes and obesity frustrates efforts to develop improved therapeutic and preventive strategies. The identification of DNA variants influencing disease predisposition will, it is hoped, deliver clues to the processes involved in disease pathogenesis. This would not only spur translational innovation but also provide opportunities for personalized medicine through stratification according to an individual person's risk and more precise classification of the disease subtype. In this article, I consider the extent to which these objectives have been realized.", + "Although the etiology of T2D has not been fully established, a number of risk factors are well defined. According to the ADA [22], the risk of developing T2D is associated with age (increased risk at 45 years), overweight/obesity, and lack of PA. T2D is more common in individuals with a family history of the disease, in certain ethnic groups (e.g., African-Americans, Hispanic-Americans, Native Americans, Asian-Americans, and Pacific Islanders), and in individuals with hypertension (140/90 mmHg in adults), dyslipidemia (high density lipoprotein cholesterol [HDL-C] 35 mg/dL (0.90 mmol/L) and/or a triglyceride level 250 mg/dL (2.82 mmol/L)), IFG, IGT, a history of vascular disease or gestational diabetes, or polycystic ovary syndrome. In addition, a range of common genetic variants are also known to raise the risk of T2D [23][24][25], of which some may interact with lifestyle factors to modify the risk of the disease [26]. Several examples are provided below.", + "Background: Type 2 diabetes mellitus is an important risk factor for Alzheimer disease and 
is more prevalent in elderly minority persons compared with non-Hispanic white persons.", + "Age. Age is another factor that has a considerable effect on outcomes in obesity and T2DM research.In humans, body weight increases with age and peaks at ~55 years in both men and women.Ageing per se is associated with a redistribution of both the fat-free mass and the fat mass, with the latter increase starting at ~30 years of age 129 .Intramuscular and intrahepatic fat are particularly increased in older persons, and this increase has been linked to insulin resistance 130 .Partially on the basis of these changes, ageing has been proposed to be an independent determinant of glucose tolerance, which progressively worsens with age 131,132 .", + "Age also plays a vital role in the onset of diabetes (Cowie & Eberhardt, 1995).In south-east Asia almost 97% diabetic patients are 40 years old or more (IDF Atlas, 2017).In Bangladesh, the reported age of diabetes is 40 years in 71% urban and 85% rural female, while in the case of male the proportion is 85.5% urban and 86.5% in rural population (IDF Atlas, 2017).The current study also pinpointed an exponential increase in the risk of onset of T2DM with the increase of age when 40 years was chosen as the reference (Table S4).", + "Type 2 diabetes incidence is increasing in youth, especially among the racial and ethnic groups with disproportionately high risk for developing type 2 diabetes and its complications: American Indians, African Americans, Hispanics/Latinos, Asians, and Pacific Islanders (9).Older age is very closely correlated to risk for developing type 2 diabetes.More than one in four Americans over the age of 65 years have diabetes, and more than half in this agegroup have prediabetes (9).The prevalence of type 2 diabetes in the U.S. 
is higher for males (6.9%) than for females (5.9%) (15). Independent of geography, the risk of developing type 2 diabetes is associated with low socioeconomic status. Low educational level increases risk by 41%, low occupation level by 31%, and low income level by 40% (16).", + "The aim of this study was to investigate the association between age at natural menopause and risk of developing type 2 diabetes, and to assess whether this association is independent of potential intermediate risk factors for type 2 diabetes. Furthermore, we examined the role of endogenous sex hormone levels in the association between age at natural menopause and type 2 diabetes.", + "The prevalence of type 2 diabetes in adolescents and young adults is dramatically increasing. Similar to older-onset type 2 diabetes, the major predisposing risk factors are obesity, family history, and sedentary lifestyle. Onset of diabetes at a younger age (defined here as up to age 40 years) is associated with 
longer disease exposure and increased risk for chronic complications. Young-onset type 2 diabetes also affects more individuals of working age, accentuating the adverse societal effects of the disease. Furthermore, evidence is accumulating that young-onset type 2 diabetes has a more aggressive disease phenotype, leading to premature development of complications, with adverse effects on quality of life and unfavourable effects on long-term outcomes, raising the possibility of a future public health catastrophe. In this Review, we describe the epidemiology and existing knowledge regarding pathophysiology, risk factors, complications, and management of type 2 diabetes in adolescents and young adults. Although drawing of definitive conclusions is difficult from these observational studies, their results suggest that young-onset type 2 diabetes is associated with a much more frequent occurrence of adverse macrovascular and microvascular outcomes and a more rapidly progressing severity of complications than is seen in type 1 diabetes or later-onset type 2 diabetes. Complications: Earlier onset of type 2 diabetes is associated with a greater lifetime risk of diabetes-associated complications. 
98 Evidence from several cross-sectional studies [99][100][101][102] has suggested that the burden of diabetes complications is greater for people with young-onset type 2 diabetes than for people with type 1 diabetes or later-onset type 2 diabetes. Based on a modelling study of a hypothetical cohort of adolescents and young adults in the USA, 99 overall life expectancy among patients diagnosed with type 2 diabetes at 20-40 years is reduced by 14 years in men and 16 years in women compared with people without diabetes. Summary and future research directions: Although it is tempting to extrapolate the disease course of type 2 diabetes in young people as just an earlier and more rapid form of type 2 diabetes in older adults, distinctive differences are evident. The young-onset phenotype has a stronger family history, a greater association with obesity, early loss of both first and second phases of insulin secretion alongside often severe insulin resistance, early onset and rapid progression of microvascular and macrovascular complications, and poor sustainability of responsiveness to oral glucose-lowering therapies, frequently necessitating early introduction of insulin. In a study of the age-specific incidence of type 2 diabetes in the UK (a retrospective cohort study of patients with newly diagnosed type 2 diabetes between 1990 and 2010), the investigators reported a substantial increase in the proportion of people aged 40 years or younger at diagnosis", + "Type 2 diabetes is a major risk factor for cardiovascular disease (CVD) and other age-related ailments and affects 200 million people worldwide (1). The prevalence of type 2 diabetes differs across regions and ethnicities, being higher in African-American, Asian, Native-American, and Hispanic populations. In addition to the classical disease biomarkers, type 2 diabetes patients exhibit significantly elevated oxidative DNA damage, as measured by concentrations of 8-hydroxydeoxyguanosine (8-OHdG) or 
8-hydroxyguanosine (8-OHG) in leukocytes (2) or urine (3) such that their use as biomarkers in the diagnosis of the disease has been considered (3). Mitochondria control both energy metabolism and reactive oxygen species (ROS) production (4-6). Thus, mitochondrial dysfunction may contribute to the development of type 2 diabetes (4). Furthermore, diabetic hamsters treated with inhibitors of advanced glycation end products (AGEs) showed reduced oxidative stress and restored pancreatic β-cell function (7). However, the mechanism underlying the development of type 2 diabetes, how that mechanism relates to DNA damage, and how type 2 diabetes increases the risk of CVD are not well understood." + ] + ], + "task_id": [ + "2A2860BB54BC0D36A929838ED41243A7", + "F35BF9C40081CE0521E562CD95BA4C2F", + "9DD88454267DEF2106A3EA7E6E8B5443", + "732D340E5C8F09381CEFA440AD2A7AB6", + "CE5922BDA6B949A17665AB4E1A8138D5" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_4.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_4.json new file mode 100644 index 0000000..145a6ba --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_diabetes_4.json @@ -0,0 +1,26 @@ +{ + "question": [ + "Can lifestyle changes reverse type 2 diabetes?" + ], + "answer": [ + "Lifestyle changes, including diet modification and increased physical activity, can help prevent and manage type 2 diabetes. They can improve insulin sensitivity, control blood glucose levels, and promote weight loss, which are all beneficial in managing the disease. However, they may not completely reverse the disease, especially in advanced stages." + ], + "contexts": [ + [ + "Accumulating evidence strongly demonstrates that the majority of type 2 diabetes cases can be prevented through diet and lifestyle modification.However, the adoption of a healthy diet and lifestyle requires not only individual behavioral changes, but also changes in our food, built, and social environments.Public health strategies that target the obesogenic environment are critical.Translating clinical and epidemiologic findings into practice requires fundamental shifts in public policies and health systems.To curb the diabetes epidemic, primary prevention through the promotion of a healthy diet and lifestyle should be a global public policy priority.Together, these clinical trials demonstrate that diet and lifestyle modification is highly effective in preventing type 2 diabetes in different ethnic and racial groups.There is an urgent need to translate the findings from these trials into clinical and public health practice.Emphasis should be placed on early adoption of healthy habits in pediatric populations because these practices track through to adulthood.PREVENTABILITY OF TYPESeveral randomized clinical trials have demonstrated that diabetes is preventable.One 
of the first diabetes prevention trials was conducted in Daqing, China (58). After 6 years of active intervention, risk was reduced by 31, 46, and 42% in the diet-only, exercise-only, and diet-plus-exercise groups, respectively, compared with the control group. In a subsequent 14-year follow-up study, the intervention groups were combined and compared with control subjects to assess how long the benefits of lifestyle change can extend beyond the period of active intervention (59). Compared with control subjects, individuals in the combined lifestyle intervention group had a 51% lower risk of diabetes during the active intervention period, and a 43% lower risk over a 20-year follow-up.", + "An obvious conclusion is a manipulation of lifestyle provides an opportunity to reverse the diabetes trend. Stated another way, we cannot change our genetic make-up, but we can alter environmental factors. Indeed, many studies have shown that diet and exercise slow the onset of diabetes in persons with IGT (2,17,18). Also, low glycemic index diets have been shown to promote weight loss along with having metabolic benefits in persons with type 2 diabetes (19). The difficulty, of course, is trying to get people to change their habits. One can predict that returning to healthy lifestyles would reverse the rising incidence of type 2 diabetes. Unfortunately, that is not a practical solution. Instead, the current approach is to better understand the pathogenesis of type 2 diabetes, hopefully followed by the development of pharmaceuticals that reverse the key pathogenic elements. We entered the 1990s knowing that type 2 diabetes was characterized by the classic triad of β-cell dysfunction, excess glucose production from the liver, and insulin resistance defined as impaired insulin-mediated glucose clearance into skeletal muscle (8). However, knowledge at that time provided no physiological connection between these organs. Another conundrum was how excess adiposity, i.e., being fat, caused insulin 
resistance, which again is a defect in skeletal muscle physiology.", + "Evidence from randomized controlled trials: The efficacy of lifestyle changes in obesity and T2DM prevention has been established in numerous randomized controlled trials (RCTs). Several of them may, however, be considered of major importance due to their large sample sizes (i.e., 458-3234 individuals) and long-term duration (i.e., 3-6 years). The Chinese Da Qing diabetes prevention study was the first to investigate the effect of 6-year lifestyle change on body weight and diabetes incidence in individuals with impaired glucose tolerance (IGT) (Pan et al., 1997). Pan and co-workers (1997) reported 42 % reduction in diabetes incidence, although no significant difference in body weight was present. Similar results were found in the Finnish Diabetes Prevention Study (DPS) and the US Diabetes Prevention Program (DPP). DPS and DPP independently reported reduction in diabetes incidence of 58 % accompanied by significant reduction in body weight (5-7 %) as a result of the lifestyle modification (Knowler et al., 2002; Tuomilehto et al., 2001). These findings were also confirmed in Japanese and Indian populations, reporting 67.4 % and 28.5 % reduction in diabetes incidence, respectively ( Kosaka et 2011) reported significant reduction in body weight and diabetes incidence at 1, as well as, at 3 years during a lifestyle modification program carried out in a primary healthcare setting among subjects with IGT. All large-scale interventions have been successful in preventing T2DM during the active intervention period. Remarkably when the effectiveness of the lifestyle modification programs was assessed on the long-term after discontinuation of the intervention, diabetes risk still remained substantially reduced. In the Finnish DPS, for instance, at extended follow-up 3 years after the 4-year intervention period a substantial reduction in body weight and T2DM incidence was still present ( Lindstrom et 
al., As already pointed out in several of the T2DM prevention studies the reduction in diabetes risk has been paralleled by substantial weight loss and weight reduction has been considered to have major importance for diabetes prevention ( Knowler et 1998 ). Hence, lifestyle modification seems to have an effect on T2DM not only through reduction in body weight, but also through improvement in insulin sensitivity, blood glucose control and lipid profile. Whereas there is convincing evidence that lifestyle changes can prevent T2DM in randomized controlled studies, so far little is known whether a lifestyle intervention could also modify cardiovascular morbidity and mortality. The 20-year follow-up results from the Chinese Da Qing diabetes prevention study showed a non-significant 17 % reduction in cardiovascular mortality in the combined (diet and/or PA) intervention group vs. controls (Li et al., 2008). Similarly, lifestyle intervention in the Finnish DPS was not found to reduce significantly cardiovascular mortality during the first 10 years of follow-up (Uusitupa et al., 2009). However, this study was not initially designed to examine the effect of lifestyle intervention on total mortality or cardiovascular morbidity, and therefore the statistical power may not have been sufficient to detect small differences in cardiovascular events between the 2 groups. Besides, a longer follow-up period might be needed to answer this question. In the Malmö Preventive trial with a 12-year follow-up of men with IGT total and cardiovascular mortality were lower among participants in the lifestyle intervention group, however, these results should be considered with caution due to the non-randomized design of the study (Eriksson and Lindgarde, 1998). Recent findings of bariatric surgery treatment of very obese subjects showed that weight loss indeed may reduce not only T2DM 
risk but also total mortality (Sjöström et al., 2007). Further investigations are needed to clarify whether prevention of T2DM by lifestyle modification is associated with cardiovascular disease prevention; until then decisions have to be made on the basis of the best available information. al., 2002; Kosaka et al., 2005; Lindstrom et al., 2003; Tuomilehto et al., 2001). In some studies although no or just minor weight loss was achieved, diabetes incidence was also reduced (Pan et al., 1997; Ramachandran et al., 2006). In addition, on the long term weight was partially or totally regained in all of the studies (Knowler et al., 2009; Li et al., 2008; Lindstrom et al., 2006; Lindstrom et al., 2003). Despite this regain T2DM risk remained low or decreased further, thus the effect of lifestyle is unlikely to be solely due to body weight reduction. In support of this notion Pan et al. (1997) reported comparable decrease in T2DM incidence in the intervention group of Da Qing among overweight and lean individuals. In conclusion, evidence from epidemiological studies and RCTs demonstrate that lifestyle modification comprising higher levels of PA and prudent food consumption may be effective in obesity and T2DM prevention. The positive effect of lifestyle on body weight seems somewhat transient, whereas the effect on T2DM is sustained for longer periods. Furthermore, lifestyle modification appears to have an effect on diabetes risk independently of body weight and even of weight loss.", + "Because lifestyle changes to reduce bodyweight have always been an important therapy for type 2 diabetes, investigators of Look AHEAD trial 156 examined the effect of weight reduction (achieved by an intensive lifestyle intervention) on cardiovascular events. Despite differential weight loss for more than 10 years and improvements in many cardiovascular risk factors (including blood pressure and lipids), lifestyle change did not reduce cardiovascular events compared with diabetes 
support and education (control group). This finding might have been because large proportions of participants in both groups received medical treatment for these risk factors. However, participants in the group receiving intensive lifestyle intervention who had a history of a cardiovascular event at baseline had a tendency for an increased risk of a subsequent cardiovascular event; 156 a similar finding was reported in ACCORD. 144 [Table 1: Selected therapeutic targets of largely untested mechanisms for type 2 diabetes. Glucokinase: reduce hepatic production of glucose.] Several other findings from Look AHEAD are worthy of comment. First, participants in the weight-loss group were more likely to achieve either partial or complete remission of diabetes, 157 had better glucose control needing fewer glucose-lowering drugs (including insulin), and were more likely to achieve a glycated haemoglobin A 1c measurement of less than 7% (53 mmol/mol) than were those in the control group. 158 However, despite weight loss and addition of drugs, patients in the treatment group had similar progression of diabetes to that of the control group, ie, with continuous increases in glycated haemoglobin A 1c . 156 Second, lifestyle change slowed progression of nephropathy. Third, other health outcomes associated with better quality of life (eg, sleep apnoea 159 and mobility 160 ) improved. Thus, intensive lifestyle change in patients with type 2 diabetes has benefits, but unfortunately not for cardiovascular outcomes, which remain the major cause of premature mortality in type 2 diabetes.", + "INTRODUCTION: Intensive lifestyle interventions (eg, promoting increased physical activity and weight loss) can be effective in decreasing the incidence of type 2 diabetes mellitus (T2DM). 
1 However, healthcare resources are limited, and participants in interventions to prevent diabetes should be prioritized. Identification of individuals at high risk of T2DM could facilitate the targeting of prevention efforts to those who could benefit from them and reduce the cost of preventing T2DM.", + "GENETIC FACTORS AND LIFESTYLE interact in the development of type 2 diabetes. Physical activity, favorable dietary changes, and weight reduction were essential components of a successful lifestyle intervention in two large randomized controlled trials on the prevention of type 2 diabetes in high-risk individuals with impaired glucose tolerance (IGT), including the Finnish Diabetes Prevention Study (DPS) (44) and the Diabetes Prevention Program (DPP) (22). In the DPS, increased physical activity was associated with a decreased risk of type 2 diabetes independently of changes in diet and body weight. The individuals who increased their physical activity most (i.e., were in the upper third of the change) were 66% less likely to develop type 2 diabetes than those in the lower third (24). Type 2 diabetes can be prevented or delayed by lifestyle modification, including increased physical activity, beneficial dietary changes, and weight reduction (22,44). [Table notes: Model adjusted for age, gender, group, baseline value of moderate-to-vigorous physical activity, and baseline values and changes in body weight and in intakes of energy and energy-adjusted saturated fat and fiber. *The median (range) of each tertile of change in moderate-to-vigorous physical activity is shown. Adjusted interaction between moderate-to-vigorous physical activity (3 groups) and the polymorphism (2 groups) on the risk of developing type 2 diabetes.] 
However, only a few studies have investigated the effects of such lifestyle interventions on insulin sensitivity and insulin secretion in persons with IGT (21,46). On the basis of the 4-yr follow-up study of the DPS with repeated frequently sampled intravenous glucose tolerance test (FSIGT), insulin sensitivity improved along with lifestyle changes, while insulin secretion remained virtually unchanged (46). Most other data also indicate that physical activity, diet, and weight loss primarily increase insulin sensitivity. Insulin resistance and the associated glycemic stress may exhaust β-cells and impair their function. Regular physical activity may diminish glycemic stress by improving insulin sensitivity of target tissues (18). While the mechanisms of improved β-cell function in response to lifestyle interventions are still largely unknown, several studies suggest that physical activity (5,11), diet (19,26), weight loss (45), or their combination (21) may directly improve the first-phase insulin secretion that is an indicator of the β-cell function.", + "Several studies have shown that treatment with lifestyle change or medication can reduce the progression from prediabetes to diabetes (98,99). Furthermore, a clinical benefit of early therapy has been demonstrated (100,101), with reductions in retinopathy and cardiovascular and all-cause mortality (102). This evidence suggests that identifying prediabetes at an early stage and keeping glucose levels close to normal could change the natural history of the disease (93).", + "Lifestyle modification including exercise, nutrition and behavioral changes is the cornerstone to prevent and treat type 2 diabetes. Oral antidiabetic medication, either as single agent or combination therapy, is frequently 
required to maintain metabolic control, as assessed by monitoring of glycated hemoglobin A 1C (HbA 1C ) levels. Eventually, a significant proportion of patients with type 2 diabetes require the exogenous administration of insulin [40].", + "Lifestyle changes9][120][121] Intervention studies involving diet alone to treat patients with young-onset type 2 diabetes have been limited. In a study 120 of 20 obese children and adolescents (mean age 14.5 years) with type 2 diabetes, improvements in weight (BMI was reduced from 43.5 to 39.3 kg/m²), insulin sensitivity, and HbA 1c concentration were seen after following a very low calorie diet (<800 kcal per day) for a 2 month period. These improvements were, however, not maintained after cessation of the diet.", + "Diet and lifestyle factors: Diet and lifestyle modification is an important aspect of T2DM prevention. Major clinical trials have demonstrated that intensive lifestyle interventions can lower the incidence of diabetes mellitus by 58% compared with control groups 55 . Trials have also shown that these interventions are more effective than pharmacological interventions 55 . Landmark clinical trials, such as the Diabetes Prevention Program in multi-ethnic Americans 55 , the Finnish Diabetes Prevention Study 56 and the Da Qing IGT and Diabetes Study in China 57 , have demonstrated that many cases of T2DM could be prevented through lifestyle interventions focused on increasing physical activity and adopting a healthy diet. Nevertheless, when lifestyle interventions are not feasible, pharmacological therapy can be considered as a strategy to prevent the development of T2DM. For example, metformin reduced the incidence of T2DM by 31% over an average follow-up period of 2.8 years among high-risk individuals from the USA who did not have diabetes mellitus 55 . Similarly, metformin reduced T2DM risk in clinical trials in India and China 58 . T2DM management: Modification of lifestyle, including weight loss, increasing physical activity and 
adopting a healthy diet, remains one of the first-line strategies for the management of T2DM.In the Look AHEAD (Action for Health in Diabetes) trial in the USA 133 , a 4-year intensive lifestyle intervention through caloric restriction and increased physical activity achieved increased weight loss, improved cardiometabolic risk profiles and a reduced requirement for medication to control CVD risk factors compared with the control group (who had diabetes mellitus and received support and education about lifestyle modifications) 134 .However, after a median follow-up period of 9.6 years, the trial was terminated because the intervention did not reduce the rate of CVD events 134 .", + "Multiple interventions in adults with T2D have been evaluated for risk reduction and prevention, both in the short and the long term.A recent systematic review (69) reported that after active interventions lasting from 6 months to .6 years, relative risk reduction achieved from lifestyle interventions (39%) was similar to that attained from use of drugs (36%); however, only lifestyle interventions had a sustained reduction in risk once the intervention period had ended.Analysis of the postintervention follow-up period (;7 years) revealed a risk reduction of 28% with lifestyle modification compared with a nonsignificant risk reduction of 5% from drug interventions." + ] + ], + "task_id": [ + "F0CC742EA104CB2C8B8BCA9CB6EB78F0" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_1.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_1.json new file mode 100644 index 0000000..861a366 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_1.json @@ -0,0 +1,113 @@ +{ + "question": [ + "What is the most cited environmental factor for the onset of asthma?", + "How would one extract the DNA, from say, flora or fauna?", + "genetics", + "what is bioinformatics", + "Explain the process of finding a genetic marker followed by a quantitative trait loci." + ], + "answer": [ + "The most cited environmental factor for the onset of asthma is living in an urban area, particularly in low-income settings. Other factors include exposure to air pollution, toxins in food and drink, and aerosols, especially during the rainy season.", + "DNA extraction from flora or fauna involves several steps. For flora, a common method is the mixed alkyl trimethyl ammonium bromide (MATAB) procedure. Plant material is ground and incubated in a pre-warmed extraction buffer, then purified with chloroform:isoamylalcohol, and DNA extracts are precipitated with isopropanol. For fauna, DNA can be extracted from various samples such as blood, tissue, or even stool. A common method involves cell rupture in the presence of EDTA to prevent DNA fragmentation, followed by enzymatic digestion of cell walls, solubilization of the cell membrane, and purification using phenol-chloroform extraction and ethanol precipitation. In both cases, the extracted DNA is usually stored at -20 C until use.", + "Genetics plays a significant role in various aspects of human life and health, including aging, weight loss, bone traits, physical activity levels, and susceptibility to diseases. Genetic factors can influence lifespan, with certain genes linked to conditions like Alzheimer's disease, heart disease, and cancer. 
Studies have also shown that genetics can affect the magnitude of weight loss and physical activity levels. In addition, genetics can impact bone traits and the response to different diets. Research on animal models has further demonstrated the influence of genetics on exercise capacity and responses to training regimens.", + "Bioinformatics is a field that deals with the application of computer systems to the understanding and organization of biological data. It involves the use of computational tools to gather, store, analyze and integrate biological and genetic information. This can then be used for gene-based drug discovery and development, prediction of protein function from sequence and structural information, and analysis of genomic data.", + "The process of finding a genetic marker followed by a quantitative trait loci (QTL) involves several steps. First, a population is developed for genetic mapping. This population can be a segregating population or a permanent population. The population is then genotyped using molecular markers. Next, the population is phenotyped for an interested trait. QTL analysis is then conducted using statistical procedures to find markers linked to the QTL. This involves single-marker regression across all chromosomes, where a hypothetical QTL is evaluated at the location of each marker locus. The significance of that QTL is estimated from a likelihood ratio statistic. A permutation test is then conducted to establish genome-wide significance criteria for the trait. The result is a list of marker loci that show a significant association with the trait. These loci are most likely to be near QTLs. The goal of QTL mapping is to identify regions of the genome that harbor genes relevant to a specified trait." 
+ ], + "contexts": [ + [ + "INTRODUCTIONAsthma is a chronic disease of the airways defined by its symptoms, which include reversible airflow obstruction, inflammation, and bronchial hyperresponsiveness.The ancient Egyptians, Greeks, and Romans made reference to the symptoms of asthma, and today the disease is estimated to affect 235-334 million people worldwide (44,53).The atopic triad.Perhaps the most widely recognized pattern of co-occurrence is the one of asthma, atopic dermatitis (eczema), and allergic rhinitis (hay fever), which together are referred to as the atopic triad and characteristically present clinically in a temporal sequence known as the atopic march.Within this sequence, atopic dermatitis is typically the first component to manifest, with approximately 20-30% of individuals with mild disease and 70% of those with severe disease going on to develop asthma.Individuals who undergo this distinctive sequence of disease progression frequently exhibit a more severe and persistent phenotype, with increased risk of allergen sensitization.", + "Clinically, asthma is characterized by episodes of coughing, chest tightness, wheezing, dyspnea, or sputum production.Often, asthma sufferers experience a combination of these symptoms, or some symptoms more than others.Pulmonary breathing tests typically demonstrate variable airway obstruction and hyperreactivity, but may be normal, even in patients with severe and uncontrolled disease [8].Thus, the diagnosis of asthma, which is based on general clinical symptoms and variable lung function testing, is non-specific and heavily dependent on clinical history.Within the \"umbrella\" diagnosis of asthma there exists a diverse array of differing clinical phenotypes [9].For example, childhood asthma is often associated with personal and parental atopic diseases (i.e., atopic dermatitis, food allergy, eosinophilic esophagitis, allergic rhinitis), viral infections, and tobacco smoke exposure [10].Alternatively, adult-onset 
asthma is less associated with atopic disease [11,12], but more associated with female sex [13], sinus disease [14], and preceding respiratory infections such as pneumonia [15].In addition, adult-onset disease is often of higher severity [12,16] with a faster and more persistent decline in lung function [17].Moreover, although severe patients are found in every demographic and age group, the most common phenotype is an adult female that is older and obese [18].IntroductionAn estimated 9% of children and 6% of adults in the United States have asthma [1].The total number of asthma sufferers worldwide is estimated to be over 300 million, with an additional 100 million expected to develop asthma by 2025 [2][3][4][5].Developed countries are the most affected, with some of the highest rates found in the United Kingdom, Australia, New Zealand and the Republic of Ireland [3].Asthma prevalence is rising significantly in developing countries in transition to a more Western lifestyle [3].In 2007, the cost of disease in the United States was estimated to be $56 billion in relation to medical expenses, missed days of work, and early deaths [1].The rate of asthma deaths has likely plateaued, but is still as high as 250,000 per year worldwide [6].Morbidity and mortality are particularly high in ethnic minorities living below or near the poverty line, and African American children had a death rate 10 times that of non-Hispanic white children in 2015 [7].Thus, asthma is a costly, growing health problem associated with high morbidity and mortality.", + "Getting accurate estimatesof exposures is difficult, whether this is air pollution or toxins in our food anddrink, but these are important questions. Rutter: That is an important point. From the twin study data it is clear thatenvironmental effects account for quite a lot of the variance on all the multifactorial disorders. Yet the kinds of measures that are used arent terribly solid. 
Theyinclude broad thing such as socio-economic status (SES). Even where there aregood measures the care taken in testing for environmental mediation is usuallypoor.Bronchiolitis, a diseasethat happens in the first year of life in many infants, is strongly associated withsubsequent asthma. We ascertained it in the first years of life and have been following these people to age 25 now. For the people who had bronchiolitis and nowhave asthma, their parents recall much better that they had bronchiolitis than thosewho dont have asthma now. It is at least twice more. Extraordinarily, some ofthese latter parents dont recall that they took their child to the doctor in the fi rstyear of life.If you arrive in the USA whenyou are young you have almost the same prevalence of asthma as an adult as thosewho are born in the USA and who are not Mexican. But if you arrive at older agesyou have less asthma. If you arrive at the age of 20 you have the same asthma riskas those born in Mexico (Eldeirawi et al 2005). Kotb: This is extremely interesting. There is a relationship between depressionand the immune system. This especially applies to natural killer (NK) cells, whichare the main cells that fight cancers.A more constructive approach is the use of refined measures of environment: an interviewthat quantifies the level of independence of stressful life events (Brown & Harris1978) or objectively recorded events in natural experiments (Kilpatrick et al2007). Factors that are considered as environmental, e.g. smoking, are strongly determined by personality and genetic factors. Personality-related factors and stressfullife events also influence detection of physical health outcomes including abdominal pain, appendectomy, peptic ulcer or diabetes control (Creed 2000).", + "; Guffey, S.E. Investigation into pedestrian exposure to near-vehicle exhaust emissions. Environ. Health2009, 8, 13. [CrossRef] [PubMed]Our World in Data.org. 2017. 
Available online: https://ourworldindata.org/data-review-air-pollution-deaths (accessed on 10 January 2022). Pope, C.A., III. Respiratory disease associated with community air pollution and a steel mill, Utah Valley. Am. J. Public Health 1989, 79, 623-628. [CrossRef] [PubMed] Pope, C.A., III. What do epidemiologic findings tell us about the health effects of environmental aerosols? J. Aerosol. Med. 2000, 13, 335-354. [CrossRef] [PubMed] Pope, C.A., III.", + "Case for Support BBSRC Grant Application September 2005 Integrative Analysis of the Genetic Factors behind Asthma and Atopic Dermatitis Part I: Research Proposal Background A Introduction of topic of research and its academic and wider context Asthma is the most common disease of childhood, and affects one child in seven in the United Kingdom. Atopic Dermatitis (AD, eczema) affects similar numbers of children. About 60% of children with severe AD will have concomitant asthma. Treatments for both diseases are unsatisfactory. Abandonment of orthodox medical therapy for AD is common in many families who have children with the disease.", + "This is most common during the rainy season when aerosols are created, which results in repeated inhalation of Bp [43, 44]. Environmental sampling studies reveal there is a positive association between the prevalence of disease and the degree of environmental contamination [7]. In addition to environmental factors, data suggests that host factors play an important role in mounting an immune response against infectious diseases [45] such as melioidosis. While healthy
While healthypersons can contract melioidosis, most patients in endemic regions have an underlyingpredisposition [28], which suggests that the immunological status of the patient caninfluence disease initiation and progression [15].", + "Sensitivity analysisWe did two sets of post-hoc sensitivity analyses to assess the effects of potential poor recall of age of onset among individuals with adult-onset asthma, and the effects of misclassification of COPD as asthma among the adultonset cases, even with exclusion of cases with a reported diagnosis of COPD, emphysema, or chronic bronchitis.First, to assure that the adult-onset cases did not include a significant proportion of childhood-onset asthma in which symptoms remitted in early life but then relapsed in adulthood, we replaced adult-onset cases with increasing proportions of randomly selected childhood-onset cases, and then tested for association at the two most significant childhood onset-specific loci.This procedure was repeated 20 times for each proportion to quantify the sampling variability (appendix pp 7-8).Second, we did two analyses in which we removed either individuals with ages of asthma onset between 46 and 65 years or adult-onset cases and controls with FEV/FVC <070.For each, we compared p values and ORs with the GWAS including all adult-onset cases (appendix pp 8-9).We used data for British white individuals from UK Biobank data release July 19, 2017. 8We extracted disease status (asthma, allergic rhinitis, atopic dermatitis, food allergy, chronic obstructive pulmonary disease (COPD), emphysema, and chronic bronchitis), age of on set of asthma, and sex from self-reported question naires and hospital records (International Classification of Diseases 10th revision [ICD-10] codes) by querying our in-house protected UK Biobank database server. 
9For our main case analysis, we included individuals who self-reported that they had doctor-diagnosed asthma.Further details of our research approach are provided in the appendix (pp 4-7).", + "; Guffey, S.E. Investigation into pedestrian exposure to near-vehicle exhaust emissions. Environ. Health2009, 8, 13. [CrossRef] [PubMed]Our World in Data.org. 2017. Available online: https://ourworldindata.org/data-review-air-pollution-deaths (accessed on10 January 2022). Pope, C.A. , III. Respiratory disease associated with community air pollution and a steel mill, Utah Valley. Am. J. Public Health1989, 79, 623628. [CrossRef] [PubMed]Pope, C.A. , III. What do epidemiologic findings tell us about the health effects of environmental aerosols? J. Aerosol. Med. 2000,13, 335354. [CrossRef] [PubMed]Pope, C.A. , III.", + "8 Thesocio-ecologic framework posits that various aspects of a childs environment directly and indirectly impact thechilds health and development.9 Drawing on this framework, Beck and colleagues10 examined several biologic,social and ecologic variables to provide a greater understanding of factors influencing asthma-related hospitalreadmissions for black children compared to their white counterparts. 
The study revealed that black childrenwere over two times as likely to be readmitted for an asthma-related illness compared to white children; thisresulted from significant differences in almost every socio-ecologic variable measured, including diseasemanagement practices and access to primary care.Specific AimsAsthma is the most common chronic pediatric medical condition in the United States, with a prevalenceover 9.6% in children under 18 years of age.1, 2 Low-income, urban children incur a disproportionate share ofasthma prevalence and morbidity;2-4 13% of children living below the poverty threshold are diagnosed withasthma compared to 8% of non-poor (>200% poverty),3 and poverty is associated with higher rates of asthmaattacks.1 Living in an urban area confers additional risk for asthma and increased ED utilization.4, 5Implementation of the National Asthma Education and Prevention Programs (NAEPP) Guidelines hascontributed to reductions in asthma morbidity and mortality rates, and these guidelines emphasize establishinga partnership between healthcare providers and patients/families to promote effective asthma management.6The NAEPP expert panel states, building a partnership requires that clinicians promote opencommunication and ensure that patients have a basic and accurate foundation of knowledge about asthma(p.124),6 yet care partnerships also require that the patient/parent effectively communicate issues such asemerging symptoms or response to medications.Vital & health statistics Series 3, Analytical and epidemiological studies. 2012(35):1-58. CDC. Current Asthma Prevalence. https://www.cdc.gov/asthma/most_recent_data.htm. 2015. UpdatedJune 2017. Accessed March 9, 2018. Northridge J, Ramirez OF, Stingone JA, Claudio L. The role of housing type and housing quality inurban children with asthma. Journal of urban health : bulletin of the New York Academy of Medicine. 2010;87(2):211-224. Flores G, Snowden-Bridon C, Torres S, et al. 
Urban minority children with asthma: substantialmorbidity, compromised quality and access to specialists, and the importance of poverty and specialtycare.Asthma Prevalence and DisparitiesAsthma is the most common chronic pediatric medical condition in the United States,1 affecting anestimated 6.2 million children annually.2 Poorly controlled pediatric asthma contributes to over 700,000 visits ayear to emergency departments (ED).1 Children living in impoverished, urban settings are disproportionatelyaffected by asthma,3 and the disparate impact of asthma is even worse among black and Latino children, andchildren whose parents have limited English proficiency (LEP) in these urban low-income areas.4-6 A 2017longitudinal study revealed that black race and Latino ethnicity are significantly associated with worse asthmaoutcomes including 1) asthma knowledge, 2) asthma-related quality of life, 3) asthma severity, and4) asthma control.The Journal of asthma : official journal of the Association for the Care of Asthma. 2017:16. Inkelas M, Garro N, McQuaid EL, Ortega AN. Race/ethnicity, language, and asthma care: findings froma 4-state survey. Annals of allergy, asthma & immunology : official publication of the American Collegeof Allergy, Asthma, & Immunology. 2008;100(2):120-127. National Asthma Education and Prevention Program. Expert Panel Report 3: Guidelines for theDiagnosis and Management of Asthma Bethesda, MD: National Institutes of Health, National Heart,Lung, and Blood Institute; 2007. Publication no. 08-045.1. NIH Consensus Group. Video report: What is mHealth?Contact PD/PI: Coker, Tumaini RuckerINTRODUCTION TO APPLICATIONResearch Plan OverviewChildhood asthma is the most common pediatric medical condition in the United States, anddisproportionately affects children living in low-income, urban settings. Many low-income, urban families rely onemergency department (ED) services as their source for sick care for their child. 
This is often due to not havinga primary care provider or sufficient access to their primary care provider for asthma management." + ], + [ + "Taxon Sampling and DNA ExtractionsWe extracted DNA from 72 pinned specimens from the National Museum of Natural History (NMNH) Entomology collection for this study.We plucked middle legs from the pinned bees using a pair of sterilized forceps and washed the tissue in 95% ethanol to remove dust, pollen, and other forms of accumulated debris on the bee legs.After evaporation of the ethanol (by drying the tissue on a clean Kimwipe ), the samples were placed in a freezer for several hours.DNA was then extracted destructively by grinding the frozen tissue with a sterile pestle, using a DNeasy Blood and TissueKit (Qiagen, Valencia, CA, USA) and following the manufacturer's protocol, except the DNA was eluted in 130L ddH 2 O instead of the supplied buffer.We ran 10L of each extract for 60 min at 100 volt on 1.5% agarose SB (sodium borate) gels, to estimate size of the genomic DNA.", + "Extraction of biomolecular fractions from faecal samples.Biomolecular fractions were extracted from unthawed, frozen faecal subsamples (150 mg) after pretreatment of the weighed subsamples with 1.5 ml RNAlater ICE (LifeTechnologies) overnight.The faeces-RNAlater ICE mixture was homogenized by bead-beating, as previously described 53 .Differential centrifugation and extraction using the All-In-One kit (Norgen Biotek) to recover DNA and proteins were carried out as previously described 53 .DNA fractions were supplemented with DNA extracted from 200 mg subsamples using the MOBIO Power Soil Kit.", + "Bulk DNA Extraction.Total DNA was collected from the cell pellets remaining after Ficoll density centrifugation for B lymphocyte isolation using the DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer's specifications.The concentrations of DNA were quantified using the Qubit High-Sensitivity dsDNA Kit, and the qualities of DNA were evaluated with 
1% agarose gel electrophoresis.", + "MethodsLaboratory procedures.We initially screened 107 ancient samples (Supplementary Data 1) in dedicated clean facilities at the ancient DNA lab of Jilin University, China, following published protocols for DNA extraction and library preparation 36,37 .Prior to sampling, we wiped all skeletal elements with 5% bleach and irradiated with UV-light for 30 min from each side.We drilled teeth to obtain fine powder using a dental drill (Dremel, USA).We sampled the dense part of petrous bones around the cochlea by first removing the outer part using the sandblaster (Renfert, Germany), and then grinding the clean inner part into fine powder with the mixer mill (Retsch, Germany).We digested the powder (50-100 mg) in 900 l 0.5 M EDTA (Sigma-Aldrich), 16.7 l of Proteinase K (Sigma-Aldrich), and 83.3 l ddH 2 O (Thermo Fisher, USA) at 37 C for 18 h.Then we transferred the supernatant to a MinElute silica spin column (QIAGEN, Germany) after fully mixed with the 13 ml custom binding buffer [5 M guanidine hydrochloride (MW 95.53), 40% Isopropanol, 90 mM Sodium Acetate (3 M), and 0.05% Tween-20] followed by two washes with PE buffer (80% ethanol).Then we eluted the DNA with 100 l TET buffer (QIAGEN, Germany).", + "Genomic DNA extractionLeukocytes were isolated from 5-ml peripheral blood samples.DNA was prepared by phenol extraction and chloroform extraction followed by isopropanol precipitation, washed with ethanol, and air-dried.Tris-EDTA buffer pH 8.0 was used to dissolve the final genomic DNA product.", + "The pulled down DNA fragments were extracted and purified using phenolchloroform extraction/ethanol precipitation.The samples were stored at -20 C until use.", + "DNA and RNA extractionFor nucleic acid extraction, pellets containing 2,000 to 5,000 nematodes were ground into fine powder with a liquid nitrogen-cooled mortar and pestle [88] and then extracted using either an RNeasy kit (Qiagen, Valencia, CA, USA) or a Genomic Tips kit 
(Qiagen; following the protocol for extraction of genomic DNA from cells in culture).Alternatives to the liquid nitrogen grinding procedure were attempted for DNA extraction (including homogenization, bead beating, three rounds of freeze-thaw, and simple incubation with the Genomic Tips digestion buffer from Qiagen, proteinase K and RNase A), but all resulted in the extraction of degraded genomic DNA.The integrity of genomic DNA after different extraction methods was evaluated by examination of highmolecular-weight genomic DNA using agarose gel electrophoresis and comparison of amplification of long PCR products from equal amounts of template (QPCR; described below).RNA was quantified with a NanoDrop Fluorospectrometer (NanoDrop Technologies, Wilmington, DE, USA) and analyzed for integrity with a BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA).DNA quantity was measured before QPCR using PicoGreen dye (Invitrogen Corporation, Carlsbad, CA, USA), as described previously [21].", + "Bacterial DNA extractionDNA was extracted from the freeze-dried luminal content of the 4 sections of the intestine using the method described by Salonen et al. 
[28].In short, approximately 0.1 g was used for mechanical and chemical lysis using 0.5 ml buffer (500 mM NaCl, 50 mM Tris-HCl (pH 8), 50 mM EDTA, 4% SDS) and 0.25 g of 0.1 mm zirconia beads and 3 mm glass beads.Nucleic acids were precipitated by addition of 130 l, 10 M ammonium acetate, using one volume of isopropanol.Subsequently, DNA pellets were washed with 70% ethanol.Further purification of DNA was performed using the QiaAmp DNA Mini Stool Kit (Qiagen, Hilden, Germany).Finally, DNA was dissolved in 200 l Tris/EDTA buffer and its purity and quantity were checked spectrophotometrically (ND-1000, nanoDrop technologies, Wilmington, USA).DNA isolation from scrapings of the small intestine and the colon Genomic DNA was isolated from the crushed scraping by using DNeasy W Blood and Tissue Kit (Qiagen, Venlo, the Netherlands) according to the manufacturer's instructions.The DNA was treated with RNase and eluted in Tris/EDTA buffer (pH 9.0).DNA purity and quantity were checked spectrophotometricaly (ND-1000, nanoDrop technologies, Wilmington, USA).", + "DNA extractionIn a strictly controlled, separate and sterile workplace, approximately 0.2 mL saliva and 50 mL PBS containing the plaque sample were mixed with Qiagen's AL buffer by pulse vortexing for 30 s (Qiagen, Valencia, CA).Total DNA was extracted from the suspension of each sample using a QIAamp DNA Mini Kit (Qiagen, Valencia, CA).Isolated DNA was eluted in 50 mL distilled water.", + "Most typically, DNA is extracted from blood samples, dried blood spots, buccal swabs, saliva, tissue and even urine and stool samples.In forensic science, other sources have been validated e.g.bone, tooth pulp, dandruff and others.", + "Blood samples were collected by jugular venipuncture from each animal into 6-ml EDTA vacutainer tubes (Greiner Bio-One, GmbH).The collected blood samples were kept in iceboxes until refrigerated at 4 C.Genomic DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen), as per the manufacturer's 
instructions with a slight modification of increased lysis time to 90 min.DNA quality and quantity were determined using 1% agarose gel electrophoresis (Merck) and Qubit 3.0 Fluorometer (Life Technologies) respectively.", + "Genomic DNA extractionDNA from MEF cultures or mouse liver was isolated by phenol/chloroform extraction, as described [11].", + "DNA isusually recovered from cells by methods that include cell rupture but thatprevent the DNA from fragmenting by mechanical shearing. This is generally undertaken in the presence of EDTA, which chelates the magnesium ionsneeded as cofactors for enzymes that degrade DNA, termed DNase. Ideally,cell walls, if present, should be digested enzymatically (e.g. , lysozyme in thebacteria or bacterial cell). In addition the cell membrane should be solubilizedusing detergent.In specific cases, such as insects,contamination can be reduced by hypochlorite treatment before extractionto avoid contact with foreign DNA (15). DNA preparation includes thedigestion of samples using different lysis buffers, which contain proteinaseK at several concentrations. DNA purification has been performed bythe classical phenol-chloroform extraction and ethanol precipitation (16). Further treatment with RNAse and a further round of extraction and precipitation has been recommended (5,17). Negative controls using distilled waterinstead of a DNA sample can detect possible environmental or reagentcontaminants.DNA solutions can be stored frozen,although repeated freezing and thawing tends to damage long DNA moleculesby shearing. A flow diagram summarizing the extraction of DNA is given inFig. 1.2. The above-described procedure is suitable for total cellular DNA. 
If the DNA from a specific organelle or viral particle is needed, it is best toisolate the organelle or virus before extracting its DNA, because the recoveryof a particular type of DNA from a mixture is usually rather difficult.", + "Isolation of Total DNA from Tissues.Total DNA was isolated as described (19) with slight modifications.Briefly, 0.1-g samples of tissue were frozen in liquid nitrogen, and DNA was extracted from the frozen tissues by the proteinase KSDSphenol method.", + "Genomic DNA extractionGenomic DNA was extracted by the mixed alkyl trimethyl ammonium bromide (MATAB) procedure.Briefly, 250 mg of plant material was ground in liquid nitrogen and immediately incubated in 2 ml of pre-warmed extraction buffer (100 mM Tris-HCl, pH 8, containing 20 mM EDTA, 1.4 M NaCl, 2% (w/v) MATAB, 1% (w/v) PEG6000 (polyethylene glycol), 0.5% (w/v) sodium sulfite, 20% (w/v) Igepal CA630, 20% (w/v) lithium dodecyl sulfate, and 20% (w/v) sodium deoxycholate) at 74 C for 20 min.After purification with 2 ml of chloroform:isoamylalcohol (24:1, v/v), DNA extracts were precipitated with 1.6 ml of isopropanol then resuspended in 1 ml of buffer (50 mM Tris-HCl, pH 8, containing 10 mM EDTA and 0.7 M NaCl).The extracts were purified on anion exchange columns (QIAGEN-tip 20) following the manufacturer's instructions (QIAGEN, Valencia, CA).", + "After three washes withice-cold phosphate buffer saline (PBS), DNA was extracted from 100-150mg of cecal contentsusing the QIAmp DNA stool Mini Kit (Qiagen) following mechanical cell lysis as describedpreviously [10]. The supernatant from the first wash, which was 10 times volume per weight ofcecal contents, was stored at -80C for sIgA measurements. Extracted DNA was initially amplified using universal primers for the V5-V6 region of the 16S rRNA gene and containing barcoded adapters. 
The forward primer used was 784F (5-RGGATTAGATACCC-3) and thereverse primer was 1064R (5-CGACRRCCATGCANCACCT-3).", + "The conventional DNA extraction procedure involved the homogenization of single D. magna in 400 l of sperm lysis buffer (100 mM Tris-HCl, pH 8; 500 mM NaCl; 10 mM ethylenediaminetetraacetic acid [EDTA], pH 8; 1% SDS; 2% mercaptoethanol) followed by RNase treatment (40 g, 37C for 1.5 h).The DNA was then extracted in phenol (pH 8) and chloroform:isoamyl alcohol (1:1).The DNA was finally precipitated by two volumes of ice-cold ethanol in the presence of 3 M sodium acetate (1/10 of the DNA volume) and was incubated at 80C overnight.Precipitated DNA was harvested by centrifugation, dried in air, and the final pellet dissolved in sterile analytic grade water." + ], + [ + "Recent developments on the genetics of aging can be seen as several streams of effort.In general, humans show a relatively modest (<50%) heritability of life spans (results obtained from twin studies discussed below).The apoE polymorphisms are remarkable for their influence on both cardiovascular disease and Alzheimer disease.In contrast, rare mutant genes with high penetrance cause these same diseases but with early onset and a major shortening of the life span.Shortlived laboratory models (fruit flies, nematodes, mice) are yielding rapid advances, with the discovery of mutants that increase life spans in association with altered metabolism, which leads to questions on the physiological organization of aging processes.Although these early findings do not show that a conserved genetic program actually controls aging processes across animal phylogeny, it is striking how frequently findings of metabolic rate, insulin signaling, and free radicals have emerged from very different approaches to aging in nematodes and mammals, for example.These findings hint that the genetic control of life span was already developed in the common ancestor of modern animals so that subsequent evolution of life 
spans was mediated by quantitative changes in the control of metabolism through insulin and the production of free radicals.", + "In orderto accomplish this task, we looked for possible novel genetic factors that regulatephysical activity levels. We used behavioral genetics methodology combined with atranslational genetics approach in order to propose genetic candidate regions as wellas candidate genes for this complex phenotype in humans (Chapter 2 and 3) andmice (Chapters 2, 3, and 4).", + "Since that time, observations across species have shown that life span can be extended by genetic factors.One of the first demonstrations of this entailed the study of recombinant inbred populations of the nematode worm Caenorhabditis elegans by Thomas E. Johnson.Then a postdoc in William (Bill) Wood's lab at the University of Colorado Boulder, Tom and Bill demonstrated that crosses of C. elegans strains did not display the heterosis effect that interfered with many other studies, \"As predicted, we found significant genetic effects on life span as well as other life history traits. 
\"This finding established a method for evaluating genetic factors that influenced life-span variation.In fact, their measurements of life span of the recombinant inbred strains demonstrated the heritability of life span to be 19%-51% (1).Consistent with theories of the 1970s and 1980s, it was concluded that these genetic factors were a collection of small influences across many genes.This finding was one of the first steps in demonstrating that genetic factors influence aging.As genetic analysis was making great progress in understanding other biological processes, such as developmental programming, the realization that aging could be investigated using the same tools was highly significant.GeneticsAging is influenced by genetic factors.It may be surprising to know that as recently as the 1970s and 1980s, the concept of modulating", + "Previous unbiased systemsgenetics approaches relying on the use of mouse genetic reference populations (GRPs) have been successful in identifying theunderlying mechanisms in complex metabolic traits, such asmitochondrial function (Chella Krishnan et al. , 2018; Norheimet al. , 2019; Williams et al. , 2016), lipid metabolism (Jha et al. ,2018a, 2018b; Linke et al. , 2020; Parker et al. , 2019), atherosclerosis (Bennett et al. , 2015; Smallwood et al. , 2014), and liver diseases (Chella Krishnan et al. , 2018; Hui et al. , 2018).", + "This population geneticmechanism also can maintain genetic variability for aging, like antagonistic pleiotropy. LARGE-EFFECT MUTANTS AND THE GENETICS OF AGINGOne approach that has become increasingly common in the characterization of the genetics of aging is to isolate aging mutants, usually from mutagenesis experiments, andthen to determine the mechanistic basis for the unusual life span in the mutants. Thisapproach has led to the discovery of genes that can enhance (e.g. , Maynard Smith 1958;Lin et al. 1988; reviewed in Guarente and Kenyon 2000, Kim 2007) or reduce life span(e.g. 
, Pearl and Parker 1922).Research with animal modelshas established that genetic factors explain a significant amount of variation in both exercise capacity in an untrained state (Koch and Britton 2001) and in the physiological responses to training regimens (Troxell et al. 2003). Bunger et al. (1994) reported the results of sixty generations of selecting laboratorymice for an index combining high body weight and high stress resistance, where the308L E V E L S O F O B S E R VAT I O Nlatter denoted the distance to exhaustion on a treadmill.", + "The DNA of over 500,000 people was read to reveal the specific 'genetic fingerprints' of each participant.Then, after asking each of the participants how long both of their parents had lived, Timmers et al. pinpointed 12 DNA regions that affect lifespan.Five of these regions were new and had not been linked to lifespan before.Across the twelve as a whole several were known to be involved in Alzheimer's disease, smoking-related cancer or heart disease.Looking at the entire genome, Timmers et al. 
could then predict a lifespan score for each individual, and when they sorted participants into ten groups based on these scores they found that top group lived five years longer than the bottom, on average.", + "Nature Genetics: adjustments, using a matched meta-analysis conducted on the same subset of 28 studies:", + "GENETIC ANALYSIS OF LONGEVITY, OF AGING, AND OF AGE-SENSITIVE TRAITS IN MICE: Biogerontology has just begun to benefit from the attention and skills of professional geneticists. Geneticists can attack problems of aging from several related but fundamentally distinct directions. Studies of rare mutations at individual loci, such as the Werner's syndrome locus WRN, whose mutant form produces, in middle-aged people, several of the diseases typically not seen until old age, can give attractive points of entry into the pathophysiology of age-related diseases. In mice there are now four reports of mutations (two naturally occurring and two artificially produced) that lead to impressive increases in mean and maximal longevity (Miskin and Masos, 1997; Brown-Borg et al., 1996; Miller, 1999; Migliaccio et al., 1999), and thus provide extremely valuable models for testing mechanistic ideas and the control of aging. Some of these, such as the dw/dw and df/df dwarfing mutations that affect levels of growth hormone and thyroid hormone, provide clues to endocrine-dependent pathways that could regulate age effects in multiple cells and tissues. The recent report (Migliaccio et al., 1999) that mouse life span can be extended by an induced mutation that diminishes cell susceptibility to apoptotic death after injury should stimulate new inquiries into the effects of altered cell turnover on age-dependent changes. Each of these mutations, however, is exceptionally rare in natural populations; despite their effect on longevity, perhaps mediated by a direct effect on aging, each of the mutations is likely to have, overall, a negative effect on reproductive success and thus fail to 
become fixed in natural mouse populations.", + "Genetics had a strong impact on femoral traits (eg, bone volume fraction [BV/TV] basal Ca, h2 = 0.60) as well as their RCR (eg, BV/TV, h2 = 0.32). Quantitative trait locus (QTL) mapping identified up to six loci affecting each bone trait. A subset of loci was detected in both diet groups, providing replication of environmentally robust genetic effects. Several loci control multiple bone phenotypes suggesting the existence of genetic pleiotropy. QTL controlling the bone RCR did not overlap with basal diet QTL, demonstrating genetic independence of those traits.", + "This population genetic mechanism also can maintain genetic variability for aging, like antagonistic pleiotropy. LARGE-EFFECT MUTANTS AND THE GENETICS OF AGING: One approach that has become increasingly common in the characterization of the genetics of aging is to isolate aging mutants, usually from mutagenesis experiments, and then to determine the mechanistic basis for the unusual life span in the mutants. This approach has led to the discovery of genes that can enhance (e.g., Maynard Smith 1958; Lin et al. 1988; reviewed in Guarente and Kenyon 2000, Kim 2007) or reduce life span (e.g., Pearl and Parker 1922).", + "(17) The role of genetics in bone was first suggested by early twin studies (18,19) and family studies. (20-23) For example, Krall and Dawson-Hughes (22) measured familial resemblance of bone density of female and male members of 40 families. They reported that 46-62% of variance in bone density was attributable to heredity. 
However, the fact that genetics does not explain all of the variation in bone mass suggests that bone mass is also influenced by other environmental factors as well as the interaction between genetics and extrinsic factors.", + "when examining the role that genetics may play in how children form attachments, as other studies have observed that parenting particularly affected children with various polymorphisms of genes that regulate the DA system (i.e., DAT1 9- and 10-repeat and Dopamine Receptor D4 7-repeat) and reward sensitivity (Bakermans-Kranenburg et al., 2008; Bosmans et al., 2020). Our findings further support the notion that multiple genes may make a child more or less susceptible to their caregiving environment (Belsky and Beaver, 2011; Roisman et al.", + "when examining the role that genetics may play in how children form attachments, as other studies have observed that parenting particularly affected children with various polymorphisms of genes that regulate the DA system (i.e., DAT1 9- and 10-repeat and Dopamine Receptor D4 7-repeat) and reward sensitivity (Bakermans-Kranenburg et al., 2008; Bosmans et al., 2020). Our findings further support the notion that multiple genes may make a child more or less susceptible to their caregiving environment (Belsky and Beaver, 2011; Roisman et al.", + "Previous unbiased systems genetics approaches relying on the use of mouse genetic reference populations (GRPs) have been successful in identifying the underlying mechanisms in complex metabolic traits, such as mitochondrial function (Chella Krishnan et al., 2018; Norheim et al., 2019; Williams et al., 2016), lipid metabolism (Jha et al., 2018a, 2018b; Linke et al., 2020; Parker et al., 2019), atherosclerosis (Bennett et al., 2015; Smallwood et al., 2014), and liver diseases (Chella Krishnan et al., 2018; Hui et al. 
, 2018).", + "Translational: Although there is much debate about the processes driving human aging, there is little doubt that genetic influences play a significant role (1). Humans clearly live very much longer than the currently favored laboratory models of aging, and such interspecies differences in reproductively 'fit' life span must have an inherited genetic foundation. Within human populations, environmental and behavioral exposures are important but at least a quarter of life expectancy variation in twin or family studies is attributable to inherited genetic or epigenetic factors (2). Age-related conditions such as type 2 diabetes, myocardial infarction, common cancers, and Alzheimer's disease (AD) typically have onsets after the fourth decade of life; \"successful\" agers delay these onsets until relatively late in life (3). Many aging traits and diseases show moderate heritability, including cardiovascular disease (CVD) (4) and impaired physical functioning (5), independent of known environmental risk factors.", + "Genetics of weight loss. A necessary condition for tailoring weight loss protocols to genetics or genomics is identifying reliable and meaningful genetic or genomic predictors. The heritability, or genetic variance, of weight loss first was documented in a careful laboratory study of identical twins. Bouchard and colleagues (C. Bouchard et al., 1994) induced weight loss in identical twin pairs through supervised exercise designed to produce daily energy balance deficits of 500 kcals. Strong similarity between co-twins as compared to non-related individuals provided some of the first evidence of genetic involvement in magnitude of weight loss with intervention.", + "lifestyle and changes in diet, a significant proportion of heritable factors also contribute to individual susceptibility (Hu 2011).", + "Genetics had a strong impact on femoral traits (eg, bone volume fraction [BV/TV] basal Ca, h2 = 0.60) as well as their RCR (eg, BV/TV, h2 = 0.32). 
Quantitative trait locus (QTL) mapping identified up to six loci affecting each bone trait. A subset of loci was detected in both diet groups, providing replication of environmentally robust genetic effects. Several loci control multiple bone phenotypes suggesting the existence of genetic pleiotropy. QTL controlling the bone RCR did not overlap with basal diet QTL, demonstrating genetic independence of those traits." + ], + [ + "At a high level, the Research and Development Space of Bioinformatics can be viewed as a set of non-orthogonal vectors (Figure 1) that describe Bioinformatic Activities, Biological Data Types, Biological Species, Computing Infrastructure, and Development Effort. Bioinformatic activities (acquisition, storage, retrieval, integration, analysis, visualization, modeling) need to be developed for multiple biological data types (nucleic and amino acid sequences, physical and linkage maps, RNA, protein and metabolite expression arrays and clinical and field assays) derived from multiple biological species using multiple biotechnology platforms. As Bioinformatics emerges as a discipline, however, it is likely that both research and development can and will be accommodated in large programmatic grants. 7. REFERENCES: Benton, D., 2000, Standards to Enable Bioinformatics Data and Information Integration, In Barnett Internationals 2nd Annual Bioinformatics and Data Integration Conference, Philadelphia, PA. Boyle, J., 1998, Building Component Software for the Biological Sciences, CCP11 Newsletter, 4:2214. Dowell, R., Jokerst, A., Day, S., Eddy, L., and Stein, L., 2001, The distributed annotation system, BMC Bioinformatics 2(7). This article is available at http://www.biomedcentral.com/1471-2105/2/7.3132William D. 
BeavisClinical AssaysBiologicalData TypesCellular NetworksMolecular NetworksProtein ExpressionInfrastructureRNA ExpressionMapsDNA SequenceBioinformaticActivitiesFlyAcquire DataStoreIntegrateQueryAnalyzeVisualizeModelYeast A.thalianaH.sapiensCow Pig corn soyBiologicalSpeciesCDevelopmentEffortFIGURE 1. Representation of the research and development space spanned by various aspects ofBioinformatics. to understanding the structure and evolution of whole genomes. Even the morefocused and applied bioinformatics goals, e.g.", + "The Bioinformatics (Modeling core) analyzed biological data (responseto infection by a pathogen) from projects using Bayesian network analysis and created aBayesian Network Webserver (BNW - http://compbio.uthsc.edu/BNW). We have obtained significant results for all projects supported by this grant funding. We aretherefore very enthusiastic to follow up on the data we have obtained. We are applying forfunding from different sources to continue these studies either as separate projects for thedifferent DoD priority pathogens, or as a big program project that will involve pathogens andsupporting cores to do omics studies.", + "Ball Department ofBiochemistry, Stanford University MedicalSchool, Stanford, CA, USAJames R. Brown Bioinformatics,GlaxoSmithKline Pharmaceuticals, UpperProvidence, PA, USAAruna Bansal Discovery and PipelineGenetics, GlaxoSmithKlinePharmaceuticals, Third Avenue, Harlow,Essex, UKElissa J. Chesler Oak Ridge NationalLaboratory, Biosciences Division, OakRidge, TN, USAMichael R. Barnes Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKBryan J. Barratt Research andDevelopment Genetics, AstraZeneca,Alderley Park, Macclesfield, Cheshire, UKMatthew J. Betts Structural andComputational Biology Programme, EMBL,Meyerhofstrasse 1, 69117 Heidelberg,GermanyDiana Blaydon Centre for CutaneousResearch, Institute of Cell and MolecularScience, Queen Marys School of Medicineand Dentistry, Whitechapel, London, UKKarl W. 
Broman Department ofBiostatistics, Johns Hopkins University,Baltimore, MD, USAEllen M. Brown Discovery Informatics,AstraZeneca, Alderley Park, Macclesfield,Cheshire, UKRichard R. Copley Wellcome Trust Centrefor Human Genetics, University of Oxford,Oxford, UKBarry Dancis Bioinformatics,GlaxoSmithKline Pharmaceuticals UpperProvidence, PA, USASteve Deharo Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKPaul S. Derwent Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKIan C. Gray Paradigm Therapeutics (S) PteLtd, 10 Biopolis Way, Singapore 138670Joel Greshock Translational Medicine,Clinical Pharmacology Division,GlaxoSmithKline Pharmaceuticals, UpperMerion, PA, USASimon C. Heath Centre National deGenotypage, Evry Cedex, FrancexviiiCONTRIBUTORSDavid P. Kelsell Centre for CutaneousResearch, Institute of Cell and MolecularScience, Queen Marys School of Medicineand Dentistry, Whitechapel, London,UKRalph McGinnis Wellcome Trust SangerInstitute, Hinxton, Cambridge, UKCharles A. Mein Genome Centre, QueenMarys School of Medicine and Dentistry,Charterhouse Square, London, UKMary Plumpton Bioinformatics,GlaxoSmithKline Pharmaceuticals,Stevenage, Hertfordshire, UKRobert B. Russell Structural andComputational Biology Programme, EMBL,Meyerhofstrasse 1, 69117 Heidelberg,GermanyPhilippe Sanseau Bioinformatics,GlaxoSmithKline Pharmaceuticals,Stevenage, Hertfordshire, UKColin A. M. Semple Bioinformatics, MRCHuman Genetics Unit, Edinburgh EH4 2XU,UKGavin Sherlock Department of Genetics,Stanford University Medical School,Stanford, CA, USAChristopher Southan Global CompoundSciences, AstraZeneca R&D, Molndal,SwedenMartin S. 
Taylor Wellcome Trust Centrefor Human Genetics, University of Oxford,Oxford, UKMagnus Ulvsback MolecularPharmacology, AstraZeneca R&D, Molndal,SwedenCharlotte Vignal Discovery and PipelineGenetics, GlaxoSmithKlinePharmaceuticals, Third Avenue, Harlow,Essex, UKChaolin Zhang Department of BiomedicalEngineering, State University of New Yorkat Stony Brook, NY, USAMichael Q. Zhang Cold Spring HarborLaboratory, Cold Spring Harbor, NY, USAXiaoyue Zhao Cold Spring HarborLaboratory, Cold Spring Harbor, NY, USAGlossary of BioinformaticsBLAST (Basic Local Alignment Search Tool) A tool for identifying sequences in adatabase that match a given query sequence.", + "TheNCBI creates automated systems for storing and analyzing knowledge about molecular biology, biochemistry, andgenetics; facilitating the use of such databases and software by the research and medical community; coordinatingefforts to gather biotechnology information both nationallyand internationally; and performing research into advancedmethods of computer-based information processing for analyzing the structure and function of biologically importantmolecules. NCBI bioinformatics-related resources may beaccessed through its home page at: www.ncbi.nlm.nih.gov. The NCBI has three principal branches:1. Computational Biology Branch (http://www.ncbi.nlm. nih.gov/CBBresearch/)2. Information Engineering Branch (http://www.ncbi.nlm. nih.gov/IEB/)3.", + "Bioinformatics 18(Suppl 1):S136S144. doi: 10.1093/bioinformatics/18.suppl_1.S136.", + "CBELife Sciences EducationVol. 9, 98 107, Summer 2010ArticleTeaching Bioinformatics and Neuroinformatics by UsingFree Web-based ToolsWilliam Grisham,* Natalie A. 
Schottler,* Joanne Valli-Marill, Lisa Beck,and Jackson Beatty**Department of Psychology and Office of Instructional Development, University of California, Los Angeles,Los Angeles, CA 90095; and Department of Psychology, Bryn Mawr College, Bryn Mawr, PA 19010Submitted November 9, 2009; Revised February 25, 2010; Accepted March 2, 2010Monitoring Editor: Mary Lee LedbetterThis completely computer-based modules purpose is to introduce students to bioinformaticsresources.We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse BrainLibrary), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and geneexpression analyses (University of California, Santa Cruz Genome Browser, National Center forBiotechnology Informations Entrez Gene, and the Allen Brain Atlas), and information resources(PubMed).", + "TheNCBI creates automated systems for storing and analyzing knowledge about molecular biology, biochemistry, andgenetics; facilitating the use of such databases and software by the research and medical community; coordinatingefforts to gather biotechnology information both nationallyand internationally; and performing research into advancedmethods of computer-based information processing for analyzing the structure and function of biologically importantmolecules. NCBI bioinformatics-related resources may beaccessed through its home page at: www.ncbi.nlm.nih.gov. The NCBI has three principal branches:1. Computational Biology Branch (http://www.ncbi.nlm. nih.gov/CBBresearch/)2. Information Engineering Branch (http://www.ncbi.nlm. nih.gov/IEB/)3.", + "CONCLUSIONNIH-PA Author ManuscriptBioinformatics is fundamentally about the information of biology. Information, in turn, isburied within a cacophony of data produced by a wide swath of molecular techniques. 
Inneuroscience, the breadth of data is exceptionally large as it spans genomics, proteomics,metabolomics, image analysis, and behavioral science, among other protocols, and requiresresearchers to store data with due diligence based on the data types, data scope and depth,and underlying querying requirements.", + "As David Searls, director of bioinformatics at SmithKline Beecham (King of Prussia, Pennsylvania), points out, bioinformatics is supported by theory; an increasing number of journals and scientific meetings are devoted to it; and it now has its own society, the International Society for Computational Biology (associated with the conference series Intelligent Systems for Molecular Biology), whose president is Larry Hunter of the National Library of Medicine.A case in point is Structural Bioinformatics (San Diego, California), a start-up company that, as its name suggests, is particularly interested in structural information about gene products.The company has been look-ing for a vice-president of bioinformatics since December -someone who takes a systems approach to structure-function issues, has a strong grounding in biology, cell biology and biochemistry and who knows how to use computational systems to solve these problems, but who is not necessarily a computational scientist.", + "Ball Department ofBiochemistry, Stanford University MedicalSchool, Stanford, CA, USAJames R. Brown Bioinformatics,GlaxoSmithKline Pharmaceuticals, UpperProvidence, PA, USAAruna Bansal Discovery and PipelineGenetics, GlaxoSmithKlinePharmaceuticals, Third Avenue, Harlow,Essex, UKElissa J. Chesler Oak Ridge NationalLaboratory, Biosciences Division, OakRidge, TN, USAMichael R. Barnes Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKBryan J. Barratt Research andDevelopment Genetics, AstraZeneca,Alderley Park, Macclesfield, Cheshire, UKMatthew J. 
Betts Structural andComputational Biology Programme, EMBL,Meyerhofstrasse 1, 69117 Heidelberg,GermanyDiana Blaydon Centre for CutaneousResearch, Institute of Cell and MolecularScience, Queen Marys School of Medicineand Dentistry, Whitechapel, London, UKKarl W. Broman Department ofBiostatistics, Johns Hopkins University,Baltimore, MD, USAEllen M. Brown Discovery Informatics,AstraZeneca, Alderley Park, Macclesfield,Cheshire, UKRichard R. Copley Wellcome Trust Centrefor Human Genetics, University of Oxford,Oxford, UKBarry Dancis Bioinformatics,GlaxoSmithKline Pharmaceuticals UpperProvidence, PA, USASteve Deharo Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKPaul S. Derwent Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKIan C. Gray Paradigm Therapeutics (S) PteLtd, 10 Biopolis Way, Singapore 138670Joel Greshock Translational Medicine,Clinical Pharmacology Division,GlaxoSmithKline Pharmaceuticals, UpperMerion, PA, USASimon C. Heath Centre National deGenotypage, Evry Cedex, FrancexviiiCONTRIBUTORSDavid P. Kelsell Centre for CutaneousResearch, Institute of Cell and MolecularScience, Queen Marys School of Medicineand Dentistry, Whitechapel, London,UKRalph McGinnis Wellcome Trust SangerInstitute, Hinxton, Cambridge, UKCharles A. Mein Genome Centre, QueenMarys School of Medicine and Dentistry,Charterhouse Square, London, UKMary Plumpton Bioinformatics,GlaxoSmithKline Pharmaceuticals,Stevenage, Hertfordshire, UKRobert B. Russell Structural andComputational Biology Programme, EMBL,Meyerhofstrasse 1, 69117 Heidelberg,GermanyPhilippe Sanseau Bioinformatics,GlaxoSmithKline Pharmaceuticals,Stevenage, Hertfordshire, UKColin A. M. Semple Bioinformatics, MRCHuman Genetics Unit, Edinburgh EH4 2XU,UKGavin Sherlock Department of Genetics,Stanford University Medical School,Stanford, CA, USAChristopher Southan Global CompoundSciences, AstraZeneca R&D, Molndal,SwedenMartin S. 
Taylor Wellcome Trust Centrefor Human Genetics, University of Oxford,Oxford, UKMagnus Ulvsback MolecularPharmacology, AstraZeneca R&D, Molndal,SwedenCharlotte Vignal Discovery and PipelineGenetics, GlaxoSmithKlinePharmaceuticals, Third Avenue, Harlow,Essex, UKChaolin Zhang Department of BiomedicalEngineering, State University of New Yorkat Stony Brook, NY, USAMichael Q. Zhang Cold Spring HarborLaboratory, Cold Spring Harbor, NY, USAXiaoyue Zhao Cold Spring HarborLaboratory, Cold Spring Harbor, NY, USAGlossary of BioinformaticsBLAST (Basic Local Alignment Search Tool) A tool for identifying sequences in adatabase that match a given query sequence.", + "The large number of bioinformatic tools that have beenmade available to scientists during the last few years has presented theproblem of which to use and how best to obtain scientifically valid answers(3). In this chapter, we will provide a guide for the most efficient way toanalyze a given sequence or to collect information regarding a gene, protein,structure, or interaction of interest by applying current publicly available software and databases that mainly use the World Wide Web.", + "At a high level, the Research and Development Space of Bioinformatics canbe viewed as a set of non-orthogonal vectors (Figure 1) that describeBioinformatic ActivitiesBiological Data TypesBiological SpeciesComputing InfrastructureDevelopment EffortBioinformatic activities (acquisition, storage, retrieval, integration, analysis,visualization, modeling) need to be developed for multiple biological data typesArchitectures for Integration of Data and Applications33(nucleic and amino acid sequences, physical and linkage maps, RNA, protein andmetabolite expression arrays and clinical and eld assays) derived from multiple biological species using multiple biotechnology platforms.As Bioinformaticsemerges as a discipline, however, it is likely that both research and developmentcan and will be accommodated in large programmatic grants. 7. 
REFERENCESBenton, D., 2000, Standards to Enable Bioinformatics Data and Information Integration, In BarnettInternationals 2nd Annual Bioinformatics and Data Integration Conference, Philadelphia, PA.Boyle, J., 1998, Building Component Software for the Biological Sciences, CCP11 Newsletter, 4:2214. Dowell, R., Jokerst, A., Day, S., Eddy, L., and Stein, L., 2001, The distributed annotation system, BMCBioinformatics 2(7). This article is available at http://www.biomedcentral.com/1471-2105/2/7.", + "Ball Department ofBiochemistry, Stanford University MedicalSchool, Stanford, CA, USAJames R. Brown Bioinformatics,GlaxoSmithKline Pharmaceuticals, UpperProvidence, PA, USAAruna Bansal Discovery and PipelineGenetics, GlaxoSmithKlinePharmaceuticals, Third Avenue, Harlow,Essex, UKElissa J. Chesler Oak Ridge NationalLaboratory, Biosciences Division, OakRidge, TN, USAMichael R. Barnes Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKBryan J. Barratt Research andDevelopment Genetics, AstraZeneca,Alderley Park, Macclesfield, Cheshire, UKMatthew J. Betts Structural andComputational Biology Programme, EMBL,Meyerhofstrasse 1, 69117 Heidelberg,GermanyDiana Blaydon Centre for CutaneousResearch, Institute of Cell and MolecularScience, Queen Marys School of Medicineand Dentistry, Whitechapel, London, UKKarl W. Broman Department ofBiostatistics, Johns Hopkins University,Baltimore, MD, USAEllen M. Brown Discovery Informatics,AstraZeneca, Alderley Park, Macclesfield,Cheshire, UKRichard R. Copley Wellcome Trust Centrefor Human Genetics, University of Oxford,Oxford, UKBarry Dancis Bioinformatics,GlaxoSmithKline Pharmaceuticals UpperProvidence, PA, USASteve Deharo Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKPaul S. Derwent Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKIan C. 
Gray Paradigm Therapeutics (S) PteLtd, 10 Biopolis Way, Singapore 138670Joel Greshock Translational Medicine,Clinical Pharmacology Division,GlaxoSmithKline Pharmaceuticals, UpperMerion, PA, USASimon C. Heath Centre National deGenotypage, Evry Cedex, FrancexviiiCONTRIBUTORSDavid P. Kelsell Centre for CutaneousResearch, Institute of Cell and MolecularScience, Queen Marys School of Medicineand Dentistry, Whitechapel, London,UKRalph McGinnis Wellcome Trust SangerInstitute, Hinxton, Cambridge, UKCharles A. Mein Genome Centre, QueenMarys School of Medicine and Dentistry,Charterhouse Square, London, UKMary Plumpton Bioinformatics,GlaxoSmithKline Pharmaceuticals,Stevenage, Hertfordshire, UKRobert B. Russell Structural andComputational Biology Programme, EMBL,Meyerhofstrasse 1, 69117 Heidelberg,GermanyPhilippe Sanseau Bioinformatics,GlaxoSmithKline Pharmaceuticals,Stevenage, Hertfordshire, UKColin A. M. Semple Bioinformatics, MRCHuman Genetics Unit, Edinburgh EH4 2XU,UKGavin Sherlock Department of Genetics,Stanford University Medical School,Stanford, CA, USAChristopher Southan Global CompoundSciences, AstraZeneca R&D, Molndal,SwedenMartin S. Taylor Wellcome Trust Centrefor Human Genetics, University of Oxford,Oxford, UKMagnus Ulvsback MolecularPharmacology, AstraZeneca R&D, Molndal,SwedenCharlotte Vignal Discovery and PipelineGenetics, GlaxoSmithKlinePharmaceuticals, Third Avenue, Harlow,Essex, UKChaolin Zhang Department of BiomedicalEngineering, State University of New Yorkat Stony Brook, NY, USAMichael Q. 
Zhang Cold Spring HarborLaboratory, Cold Spring Harbor, NY, USAXiaoyue Zhao Cold Spring HarborLaboratory, Cold Spring Harbor, NY, USAGlossary of BioinformaticsBLAST (Basic Local Alignment Search Tool) A tool for identifying sequences in adatabase that match a given query sequence.", + "There are online bioinformatics resources from which this type of information may be sourced.", + "Ball Department ofBiochemistry, Stanford University MedicalSchool, Stanford, CA, USAJames R. Brown Bioinformatics,GlaxoSmithKline Pharmaceuticals, UpperProvidence, PA, USAAruna Bansal Discovery and PipelineGenetics, GlaxoSmithKlinePharmaceuticals, Third Avenue, Harlow,Essex, UKElissa J. Chesler Oak Ridge NationalLaboratory, Biosciences Division, OakRidge, TN, USAMichael R. Barnes Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKBryan J. Barratt Research andDevelopment Genetics, AstraZeneca,Alderley Park, Macclesfield, Cheshire, UKMatthew J. Betts Structural andComputational Biology Programme, EMBL,Meyerhofstrasse 1, 69117 Heidelberg,GermanyDiana Blaydon Centre for CutaneousResearch, Institute of Cell and MolecularScience, Queen Marys School of Medicineand Dentistry, Whitechapel, London, UKKarl W. Broman Department ofBiostatistics, Johns Hopkins University,Baltimore, MD, USAEllen M. Brown Discovery Informatics,AstraZeneca, Alderley Park, Macclesfield,Cheshire, UKRichard R. Copley Wellcome Trust Centrefor Human Genetics, University of Oxford,Oxford, UKBarry Dancis Bioinformatics,GlaxoSmithKline Pharmaceuticals UpperProvidence, PA, USASteve Deharo Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKPaul S. Derwent Bioinformatics,GlaxoSmithKline Pharmaceuticals, ThirdAvenue, Harlow, Essex, UKIan C. Gray Paradigm Therapeutics (S) PteLtd, 10 Biopolis Way, Singapore 138670Joel Greshock Translational Medicine,Clinical Pharmacology Division,GlaxoSmithKline Pharmaceuticals, UpperMerion, PA, USASimon C. 
Heath Centre National deGenotypage, Evry Cedex, FrancexviiiCONTRIBUTORSDavid P. Kelsell Centre for CutaneousResearch, Institute of Cell and MolecularScience, Queen Marys School of Medicineand Dentistry, Whitechapel, London,UKRalph McGinnis Wellcome Trust SangerInstitute, Hinxton, Cambridge, UKCharles A. Mein Genome Centre, QueenMarys School of Medicine and Dentistry,Charterhouse Square, London, UKMary Plumpton Bioinformatics,GlaxoSmithKline Pharmaceuticals,Stevenage, Hertfordshire, UKRobert B. Russell Structural andComputational Biology Programme, EMBL,Meyerhofstrasse 1, 69117 Heidelberg,GermanyPhilippe Sanseau Bioinformatics,GlaxoSmithKline Pharmaceuticals,Stevenage, Hertfordshire, UKColin A. M. Semple Bioinformatics, MRCHuman Genetics Unit, Edinburgh EH4 2XU,UKGavin Sherlock Department of Genetics,Stanford University Medical School,Stanford, CA, USAChristopher Southan Global CompoundSciences, AstraZeneca R&D, Molndal,SwedenMartin S. Taylor Wellcome Trust Centrefor Human Genetics, University of Oxford,Oxford, UKMagnus Ulvsback MolecularPharmacology, AstraZeneca R&D, Molndal,SwedenCharlotte Vignal Discovery and PipelineGenetics, GlaxoSmithKlinePharmaceuticals, Third Avenue, Harlow,Essex, UKChaolin Zhang Department of BiomedicalEngineering, State University of New Yorkat Stony Brook, NY, USAMichael Q. Zhang Cold Spring HarborLaboratory, Cold Spring Harbor, NY, USAXiaoyue Zhao Cold Spring HarborLaboratory, Cold Spring Harbor, NY, USAGlossary of BioinformaticsBLAST (Basic Local Alignment Search Tool) A tool for identifying sequences in adatabase that match a given query sequence." + ], + [ + "This is an open access article distributed under the Creative Commons Attribution License,which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 1. 
Introduction: The association between a complex phenotypic trait and genetic markers on the chromosomes can be detected through statistical analysis, leading to the identification of quantitative trait loci (QTL), regions of the chromosomes that appear to be associated with the phenotype. Quantitative trait loci (QTL) are expected to be associated with the genes controlling some aspects of the phenotype.", + "Nowadays many different cost-efficient genotyping solutions (including sequencing and Single Nucleotide Polymorphisms arrays) have opened the way to systematic genome-wide fine mapping of quantitative traits (Quantitative Trait Locus or QTL mapping). The process of QTL mapping (Figure 1) consists in searching for genome regions that influence the value of a given trait. For example, identifying a QTL for plant height means finding a DNA region at which the plants that carry a certain allele tend to be significantly higher or lower than those carrying another allele.", + "QTLs are regions within the genome whose genetic variation modulates quantitatively a phenotype characteristic of the particular trait under study (Lynch and Walsh, 1998). Determining the association between variations in specific disease phenotypes or a trait, with variations in genotypes of a reference population can be used to locate a QTL. One of the methods used for mapping QTLs associated with complex traits is genetic markers-trait association. Genetic markers associated with certain loci can be inherited in linkage disequilibrium. Generating populations with linked loci in disequilibrium is achieved through either crosses between inbred lines, or use of the out-bred populations.", + "Often, the first step in analysis of new trait data is single-marker regression across all chromosomes. A hypothetical QTL is evaluated at the location of each marker locus, and the significance of that QTL is estimated from a likelihood ratio statistic (LRS) (Haley and Knott, 1992). 
For this analysis, WebQTL automatically does a permutation test to establish genomewide significance criteria for the trait (Churchill and Doerge, 1994).", + "One possible approach to facilitate this endeavor is to identify quantitative trait loci (QTL) that contribute to the phenotype and consequently unravel the candidate genes within these loci. Each proposed candidate locus contains multiple genes and, therefore, further analysis is required to choose plausible candidate genes. One of such methods is to use comparative genomics in order to narrow down the QTL to a region containing only a few genes. We illustrate this strategy by applying it to genetic findings regarding physical activity (PA) in mice and human.", + "Elucidation of the molecular basis of these traits has proven difficult as they are under the control of multiple genes and genetic loci. The standard approach to gene identification involves mapping by linkage analysis in experimental crosses, and this has led to the localization in the rat genome of hundreds of quantitative trait loci (QTLs) underlying trait variation (68). 
We refer to these loci as physiological quantitative trait loci (pQTLs).", + "Often, the first step in analysis of new trait data is single-marker regression across all chromosomes. A hypothetical QTL is evaluated at the location of each marker locus, and the significance of that QTL is estimated from a likelihood ratio statistic (LRS) (Haley and Knott, 1992). For this analysis, WebQTL automatically does a permutation test to establish genomewide significance criteria for the trait (Churchill and Doerge, 1994). By default, it returns a list of marker loci that show greater than suggestive association with the trait according to standard criteria (Lander and Kruglyak, 1995), but it will also accept user-defined criteria. Local maxima in the LRS in this list identify loci that are most likely to be near QTLs. WebQTL provides this list within a few seconds.", + "QTLs can be identified through their genetic linkage to visible marker loci with genotypes that can be readily classified [94, 97]. As such, markers that are genetically linked to a quantitative trait will segregate more often with trait values, whereas unlinked markers will lack an association with the phenotype [94, 98]. The principal goal of a QTL analysis is to identify all QTLs linked to a trait and discern whether phenotypic differences are mainly due to a few loci with large effects, or many loci with small effects [98].", + "This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 1. Introduction. The association between a complex phenotypic trait and genetic markers on the chromosomes can be detected through statistical analysis, leading to the identification of quantitative trait loci (QTL), regions of the chromosomes that appear to be associated with the phenotype.
Quantitative trait loci (QTL) are expected to be associated with the genes controlling some aspects of the phenotype.", + "The basic principle of classic QTL is trait segregation along with the markers and necessitated the availability of two or more genetically different lines corresponding with the phenotypic trait. Markers like single nucleotide polymorphisms (SNPs) and microsatellites are used for genotypic distinctions (Vignal et al., 2002). QTL mapping is achieved in four basic steps; the first one is the measurement of variation for a trait in the individuals. It is a prerequisite to have the traits that show phenotypic variability among the individuals (inbred strains).", + "Often, the first step in analysis of new trait data is single-marker regression across all chromosomes. A hypothetical QTL is evaluated at the location of each marker locus, and the significance of that QTL is estimated from a likelihood ratio statistic (LRS) (Haley and Knott, 1992). For this analysis, WebQTL automatically does a permutation test to establish genomewide significance criteria for the trait (Churchill and Doerge, 1994). By default, it returns a list of marker loci that show greater than suggestive association with the trait according to standard criteria (Lander and Kruglyak, 1995), but it will also accept user-defined criteria. Local maxima in the LRS in this list identify loci that are most likely to be near QTLs. WebQTL provides this list within a few seconds.", + "Often, the first step in analysis of new trait data is single-marker regression across all chromosomes. A hypothetical QTL is evaluated at the location of each marker locus, and the significance of that QTL is estimated from a likelihood ratio statistic (LRS) (Haley and Knott, 1992).
For this analysis, WebQTL automatically does a permutation test to establish genomewide significance criteria for the trait (Churchill and Doerge, 1994).", + "Quantitative Trait Locus (QTL) mapping. To map QTL, we used 934 AXB/BXA genetic informative markers obtained from http://www.genenetwork.org. For all the in vitro measurements and gene expression linkage analysis, a genome-wide scan was performed using R/qtl [57]. Significance of QTL logarithm-of-odds (LOD) scores was assessed using 1000 permutations of the phenotype data [114] and the corresponding p-values reported. For the cellular phenotypes, QTL significance was reported at a genome-wide threshold corresponding to p < 0.05.", + "Typically one may obtain a location known to derive from only one of the two parent strains that contains a chromosomal region that correlates with a trait of interest. Since the actual gene and gene product will frequently remain unknown, the region is referred to as quantitative trait locus (QTL), and is simply named for the trait itself (Alberts & Schughart, 2010). Growing sets of strain-dependent marker locations in established RI strains are continually updated in online repositories.", + "By definition, a quantitative trait locus is a chromosomal region that contains a gene, or genes, that regulate a portion of the genetic variation for a particular phenotype (Wehner et al. 2001). The goal of QTL mapping is to identify regions of the genome that harbour genes relevant to a specified trait. QTL map locations are commonly determined by initial screening of mice with specific genetic characteristics, such as recombinant inbred strains, the F2 of two inbred strains, or recombinant congenic strains (Flint 2003).", + "Often, the first step in analysis of new trait data is single-marker regression across all chromosomes. A hypothetical QTL is evaluated at the location of each marker locus, and the significance of that QTL is estimated from a likelihood ratio statistic (LRS) (Haley and Knott, 1992).
For this analysis, WebQTL automatically does a permutation test to establish genomewide significance criteria for the trait (Churchill and Doerge, 1994).", + "QTL linkage studies are conducted in order to map a region or regions of the genome which affect a continuous or quantitative trait. In agriculture, as soon as markers linked to QTL are found for economically important traits, these markers can be used for selecting individuals in breeding programmes. In human studies, the aim is often to identify markers indicating disease susceptibility. Current techniques for measuring markers are usually relatively slow and laborious. Newer DNA technology, such as SNP or single nucleotide polymorphisms (Kwok, 2001b; Patil et al.", + "Genomic regions linked to complex traits can be identified by genetic mapping and quantitative trait locus (QTL) analysis (Shehzad and Okuno 2014). QTL mapping. QTL mapping with molecular markers is the first strategy in genetic studies. In plant breeding, QTL mapping is an essential step required for marker-assisted selection (Mohan et al. 1997; Shehzad and Okuno 2014). The fundamental idea underlying QTL analysis is to associate genotype and phenotype in a population exhibiting a genetic variation (Broman and Sen 2009). Four steps of QTL mapping are (1) development of a population, (2) genotyping the population using molecular markers, (3) phenotyping the population for an interested trait, and (4) QTL analysis using statistical procedures to find markers linked to the QTL (Bernardo 2002). Populations used for genetic mapping can be a segregating population (F2 and backcross) or a permanent population (double haploids or recombinant inbred lines). Recombinant inbred lines (RILs) are developed by selfing of individual progenies of the F2 plants until homozygosity is achieved (F7-F8).", + "This tool allows systems genetic analysis of single genes or small sets of genes using a bottom-up approach. relations define quantitative trait loci (QTLs).
Because the marker is not typically the actual site of the polymorphism, interpolative methods have been developed to estimate the distance of the QTL from the marker and the strength of the association. Using multiple-regression and model-fitting methods, the true complexity of the phenotypic variation can be modeled through the consideration of multiple loci and environmental factors as predictors [13]." + ] + ], + "task_id": [ + "33FC2CC0F61BA22E4D095586B95703BD", + "59E2406798D265A3CB466B766683E63C", + "370380F3A38AC4A788463D14E0EC673A", + "1E0DA0931F4E3A8C2893353CCA114B10", + "02A94D1056FDA77BDA9AC6CFDE0D5FC6" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_2.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_2.json new file mode 100644 index 0000000..c8cde59 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_2.json @@ -0,0 +1,92 @@ +{ + "question": [ + "What about recombination in human centromeres?", + "How does recombination work in human centromeres?", + "What about recombination in the human genome?", + "Create a how to guide for genetic sequencing ", + "What is the significance of the length of telomeres? " + ], + "answer": [ + "Human centromeres contain the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B. This motif is present on every chromosome except for the Y chromosome. It is necessary for the formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres. Mice lacking CENP-B are viable and fertile.", + "In human centromeres, recombination involves the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B. This motif is present on every chromosome except for the Y chromosome. It is necessary for the formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres. Despite this, mice lacking CENP-B are viable and fertile. This suggests that recombination in human centromeres involves complex interactions between specific DNA motifs and proteins.", + "The text mentions the application of site-specific recombinase technology, which allows for the deletion, insertion, inversion, or exchange of chromosomal DNA with high fidelity. This suggests that recombination, a process that can cause these types of genetic changes, is a significant aspect of the human genome.", + "1. Start with an initial linkage or association in a genetic study.\n2. 
Use bioinformatics to analyze the genomic sequence, which is a common thread through every stage.\n3. Establish true orthology between genes in different species, especially if one is identified to play a role in a disease model.\n4. Apply the approaches to sequence data from any collection of organisms, with an emphasis on human genetics.\n5. Understand the genome structure and content to provide context for subsequent discussions.\n6. Utilize genotyping and sequencing technologies to produce, store, and analyze the sequence data.\n7. Use the genome sequence as a framework for integration of genetic and biological data.\n8. Analyze short-read, whole genome, DNA sequences.\n9. Perform comparative analysis of the genome sequences from members of a family to define sequencing errors and genetic heterozygosity.\n10. Track sequence changes/inconsistencies in inheritance from parent to offspring.", + "The length of telomeres is significant as it is associated with aging and disease. Shorter telomeres are considered a sign of advanced age and have been linked to age-related diseases, mortality, and higher risk of heart disease and infection-related death. Longer telomeres, on the other hand, are associated with protection from age-related diseases, better cognitive function, and healthier aging. Telomere length can also be influenced by factors such as stress, with high-stress situations potentially leading to faster telomere shortening and thus, accelerated aging." 
+ ], + "contexts": [ + [ + "Primate and mouse centromeric DNA contain the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B.In humans, the CENP-B box is present on every chromosome except for the Y chromosome [20].It is paradoxically necessary for formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres, and mice lacking CENP-B are viable and fertile [21,22].", + "Box 3 Mechanism of homologous recombination and end joiningThe severe phenotype of the mouse mutants and the highly cancer-prone human syndromes highlight the importance of homologous recombination.Mouse KU mutants display sensitivity to agents that lead to breaks in DNA, and have immunological problems because the KU proteins are involved in V(D)J recombination of antibody gene sequences.In addition, these mutants display poor development, several features of premature ageing and increased apoptosis of postmitotic neurons in the developing brain.Mice with defects in DNA-PK cs (SCID mice) display a similar but generally milder phenotype.In contrast, XRCC4-and ligase IV-knockout mice seem more severe, with late embryonic lethality resulting from massive ATM-and p53-dependent neuronal apoptosis 33,38 .Cells in G1 have only the homologous chromosome for recombination repair.However, this may be difficult to find in the complex genome.Moreover, it is potentially dangerous as a template for repair as it may lead to homozygosity for recessive mutations.As an alternative, the end-joining reaction simply links ends of a DSB together, without any template, using the end-binding KU70/80 complex and DNA-PK cs , followed by ligation by XRCC4-ligase4 (reviewed by 27,33; see the right panel of the figure, stages V-VII).The function of KU70/80 might involve end protection and approximating the ends, in addition to a signalling function by DNA-PK cs .End joining may be further facilitated when the ends are still held together through 
nucleosomes or other structures.End joining is sometimes associated with gain or loss of a few nucleotides if internal microhomologies are used for annealing before sealing.This implies the involvement of DNA polymerases and/or nucleases.Note that the KU complex is also involved in telomere metabolism 27,62 .found to be lethal 34 .Inactivation of ATR by itself is inviable already at the blastocyst stage.Inactivation of BRCA1 and BRCA2 in mice is also embryonically lethal; cell lines display defects in homologous recombination [35][36][37] .371A tentative scenario for the homologousrecombination reaction is depicted in the left panel of the figure.To promote strand invasion into homologous sequences, the 5-3 exonuclease activity of the RAD50/MRE11/NBS1 complex (also a substrate for ATM phosphorylation) exposes both 3 ends 30 (I).RPA facilitates assembly of a RAD51 nucleoprotein filament that probably includes RAD51-related proteins XRCC2, XRCC3, RAD51B, C and D. RAD52 stimulates filament assembly (II).RAD51 has, like its Escherichia coli RecA counterpart, the ability to exchange the single strand with the same sequence from a double-stranded DNA molecule.Correct positioning of the sister chromatids by cohesins probably facilitates the identification of a homologous sequence.A candidate for the complex chromatin transactions associated with these DNA gymnastics is RAD54, a member of the SWI/SNF family of DNA-dependent ATPases.After identification of the identical sister chromatid sequence, the intact double-stranded copy is used as a template to properly heal the broken ends by DNA synthesis (III).Finally, the so-called Hollidayjunctions are resolved by resolvases 27,33,60 (IV).Homologous recombination involves the simultaneous action of large numbers of the same molecules, which are found to be concentrated in radiation-induced nuclear foci.These depend on, and also include, the BRCA1 and BRCA2 proteins 36 .Recent evidence implicates BRCA2 directly or indirectly in 
nuclear translocation of RAD51 (ref.61).", + "This picture poses more questions than it seeks to answer.Is the grouping of the regions by product rather than by type of region correct?Given that the recombina- tion fraction between HLA-A and HLA-B is of the order of .08%,and that this is likely to represent a distance of at least hundreds of thousands of nucleotides, how are the pieces put together over such relatively long distances?Is it possible that regions of the DNA loop out, so that transcripts can be made directly from noncon- tiguous DNA sequences, the loops being held in place by small RNAs as suggested for the control of splicing by Steitz, and her colleagues [24] and by others [25]?If these small RNAs are coded for well outside the HLA region, does this provide a mechanism for control of expression of products by unlinked genes, as may be the case for one of the constituent polypeptides of the HLA-DR product?What might be the nature of the signals that control which of a multiple set of alternative regions is expressed by any given chromosome?", + "Mamm Genome. 2006; 17:220229. [PubMed: 16518689]72. Romanoski CE, et al. Systems genetics analysis of gene-by-environment interactions in humancells. Am J Hum Genet. 2010; 86:399410. [PubMed: 20170901]73. Myers S, Freeman C, Auton A, Donnelly P, McVean G. A common sequence motif associatedwith recombination hot spots and genome instability in humans. Nature Genet. 2008; 40:11241129. [PubMed: 19165926]74. Myers S, et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meioticrecombination. Science. 2010; 327:876879. [PubMed: 20044541]75. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nature Rev Genet. 2009; 10:392404.", + "Classification of common conserved sequences in mammalianintergenic regions. Hum. Mol. Genet. 2002, 11, 669674. 25. Zhu, L.; Swergold, G.D.; Seldin, M.F. 
Examination of sequence homology between humanchromosome 20 and the mouse genome: Intense conservation of many genomic elements. Hum. Genet. 2003, 113, 6070. 26. Pevzner, P.; Tesler, G. Human and mouse genomic sequences reveal extensive breakpoint reuse inmammalian evolution. Proc. Natl. Acad. Sci. USA 2003, 100, 76727677. 27. Christmann, R.B. ; Sampaio-Barros, P.; Stifano, G.; Borges, C.L. ; de Carvalho, C.R. ; Kairalla, R.;Parra, E.R. ; Spira, A.; Simms, R.; Capellozzi, V.L. ; et al.", + "a The table lists proteins in which mutations have been shown to increase homologous recombination (HR), gross chromosomal rearrangements (GCRs), chromosomal instability (CIN), sister chromatid exchanges (SCEs), tri-nucleotide repeat expansions and contractions (TNR), telomere fusions (Tel fusion), or fragile telomeres (Tel fragility).A phenotype inside brackets ([ ]) indicates that it is caused by overexpression of the protein.For further details and references see Supplementary Table1.Abbreviations: DSB, double-strand break; PCNA, proliferating cell nuclear antigen; RFC, replication factor C complex; SCF, Skp1-Cdc53/Cullin-F-box.Figure 3 Intermediates and chromosome structural alterations, as observed by different techniques. (a) Replication fork stalling, as monitored by 2D-gel electrophoresis and Southern analysis in yeast (for details about the technique, see Reference 161). (b) Slower human replication forks covering shorter DNA synthesis tracks, as determined by incorporation of IdU and CldU via DNA combing (52), which permits visualization of the process of replication on DNA fibers. (c) Accumulation of double-strand breaks (DSBs) or replicative stress, as inferred by H2AX foci or by H2AX pan staining, respectively, in human cells. (d ) DSBs or ssDNA (single-stranded DNA) gaps as seen directly by nuclear \"comet tails\" via single-cell electrophoresis assays in human cells (52). (e) Sister-chromatid exchanges (SCEs), as determined by Giemsa staining in human cells (207). 
( f ) Hyper-recombination, as determined by colony sectoring in yeast (5). ( g) Gross chromosomal rearrangements (GCRs), as determined by spectral karyotyping in mouse cells (118). (h) Translocations, as visualized by pulse-field gel electrophoresis in yeast (168). (i ) Fragile sites, as detected by mitotic spreads in human cells (109). ( j) Telomere fusions, as determined by CO-FISH (chromosome-orientation fluorescent in situ hybridization) in mouse cells (124). (k) Anaphase bridges, presumably resulting from unfinished replication, dicentric chromosomes, and sister-chromatid nondisjunction, as detected by fluorescence microscopy in mouse cells.Arrows indicate the specific structural alterations referred to in each panel; in panel h, closed and open arrows indicate the position where the translocated or missing parental chromosome migrate or should migrate, respectively.When necessary, a normal control is shown on top of the panel, with the exception of panel a, which is shown on the left.Detailed description of each technique can be found in the references provided.Photos are from the laboratories of A. Nussenzweig ( g), A. Losada (k), M. Blasco ( j), L. Tora (i ), and ours (all others).Abbreviations: HR, homologous recombination; NHEJ, nonhomologous end-joining.", + "In humans, the pericentromeric region of chromosome 9 is densely packed with segmental genomic duplications (segdups) and is prone to microdeletions and microduplications. 
5In order to evaluate this region for microdeletions and microduplications in family T, we screened genomic DNA from affected individual II-7 by arrayCGH with the Nimblegen HD2 platform with the previously described CHP-SKN sample 6 as the reference.Data were normalized and CNVs were called by identifying regions where Z-scores consistently deviated from the diploid mean.At 9q21.11, a genomic duplication of ~270 kb was apparent in the genomic DNA of II-7 (Figure 1D).The Genomic duplications may or may not be in tandem with their parent segment and may be either in the same or inverted orientation. 7We developed primers that would uniquely amplify genomic DNA with the duplication under each of these conditions.Forward (5 0 -CCCAGCAGA AGCAATGGTGGTAGCC-3 0 ) and reverse (5 0 -GGTGGTGAA TCCAAAAACACAAGAACAAAGTC-3 0 ) primers diagnostic for a tandem inverted duplication (Figure 2A) yielded products of expected size in family T relatives with hearing loss, but yielded no product in unaffected family T relatives (Figure 2B).Genotypes of all 58 participating relatives in family T indicated that the tandem inverted duplication was coinherited with hearing loss.The duplication spans approximately positions 71,705,804 to 71,974,823 (hg19) on chromosome 9 for a size of ~269,023 bp.The duplication includes the entire locus for the tight junction protein TJP2, which spans positions 71,788,971 to 71,870,124 (hg19).", + "Chromosomal context of human NORsHuman NORs are positioned on the short arms of the acrocentric chromosomes that still remain unsequenced and thus missing from the current human genome draft, GRCh38.p7.Seeking an understanding of the chromosomal context of human NORs and to identify potential NOR regulatory elements, my laboratory has begun to characterize the sequences on both proximal (centromeric) and distal (telomeric) sides of the rDNA arrays (Fig. 3A; Floutsakou et al. 
2013).Building on earlier reports of sequences distal and proximal to the rDNA array on HSA21 and HSA22, respectively (Worton et al. 1988;Sakai et al. 1995;Gonzalez and Sylvester 1997), 207 kb of sequence immediately proximal and 379 kb distal to rDNA arrays have been reported recently (Floutsakou et al. 2013).Consensus proximal junction (PJ) and distal junction (DJ) sequences were constructed mostly from chromosome 21 BACs (bacterial artificial chromosomes).Comparison of these sequences with BACs and cosmids derived from the other acrocentrics revealed that the PJ and DJ sequences are, respectively, 95% and 99% identical between all five acrocentric chromosomes.Conservation of DJ sequences among the acrocentrics is consistent with frequent recombination between the rDNA arrays on each of the acrocentric chromosomes (Worton et al. 1988).However, conservation of PJ sequences suggests that there must also be frequent recombination events in the interval between the centromere and rDNA arrays.Proximal sequences are almost entirely segmentally duplicated, similar to the regions bordering centromeres.Consequently, they are unlikely to contain any specific elements that would regulate the activity of the linked NOR.In contrast, the distal sequence is predominantly unique to the acrocentric short arms and is dominated by a very large inverted repeat.Each arm of the inverted repeat is >100 kb, and they share an average sequence identity of 80%.There is a large (40-kb) block of a 48base-pair (bp) satellite repeat, CER, at the distal end of the DJ (Fig. 3A).CER blocks are found distal to the rDNA on all acrocentric chromosomes, with additional pericentromeric blocks on chromosomes 14 and 22. 
Finally, there are two blocks of a novel 138-bp tandem repeat, ACRO138, present within the DJ.The conservation of DJ sequence between the five human acrocentric chromosomes provides a unique opportunity to visualize NORs by FISH.Whereas the rDNA content of NORs can vary greatly, probing of human metaphase chromosome spreads with a DJ BAC results in signal that is consistent between NORs (Floutsakou et al. 2013).Using this probing scheme, it was observed that in most human cell lines analyzed, including multiple primary lines, at least one and sometimes as many as four of the NORs present have very little or no detectable rDNA (C van Vuuren and B McStay, unpubl. ).Many studies have used silver staining of metaphase spreads prepared from stimulated human peripheral blood lymphocytes to determine how many NORs are active in normal human cells.The number of active NORs ranges from seven to 10, with an average of eight (Heliot et al. 2000).Possibly, NORs with low rDNA content are active but fall below a detection threshold in silver staining.At this point, it is worth considering the distribution of active versus silent rDNA repeats in humans and other mammals.If 50% of rDNA repeats are truly repressed, there are insufficient \"silent\" NORs to house them.We must conclude that active NORs are a mosaic of active and silent repeats.", + "However, excluding some cases, recombinationsuppression occurs in a small genomic tractwhere these genes are located, and it doesnot extend over most of the sex chromosomepair, as occurs in mammals and birds (Bergeroand Charlesworth, 2009). It is not clear if thissuppression occurs by the presence of inversions or as a modulation of the recombinationmechanism itself, but both could be involved(Bergero and Charlesworth, 2009). 
Evidence ofrecombination in the SD region in sex reversalindividuals supports the second hypothesis.", + "Orthologous chromosomes between baboon and human", + "Lichter P, Cremer T, Borden J, Manuelidis L, Ward DC (1988) Delineation ofindividual human chromosomes in metaphase and interphase cells by in situ suppression hybridization using recombinant DNA libraries. Hum Genet 80:2242343. Jang W, Yonescu R, Knutsen T, Brown T, Reppert T, Sirotkin K, Schuler GD, RiedT, Kirsch IR (2006) Linking the human cytogenetic map with nucleotide sequence:the CCAP clone set. Cancer Genet Cytogenet 168:89974.NatureGenet 1:22222555. Foote S, Vollrath D, Hilton A, Page DC (1992) The human Y chromosome: overlapping DNA clones spanning the euchromatic region. Science 258:606656. Chumakov IM, Rigault P, Le Gall I et al (1995) A YAC contig map of the humangenome. Nature 377:17529757. Hudson TJ, Stein LD, Gerety SS et al (1995) An STS-based map of the humangenome. Science 270:1945195458. Coffey AJ, Roberts RG, Green ED et al (1992) Construction of a 2.6-Mb contig inyeast artificial chromosomes spanning the human dystrophin gene using an STSbased approach. 
Genomics 12:47448459.", + "Figure 4 Schematic depiction of proposed mechanisms for observed intrachromosomal rearrangements.The blue and red arrows indicate the orientation of the integrated plasmid loci and the recovered mouse sequences, respectively, on the original non-rearranged chromosome (left column).All four combinations are given for an arbitrarily orientated chromosome (green line).The middle column shows how two breakpoints (lightning signs) could lead to the inversion or deletion of the encompassed chromosomal sequence (yellow-orange dual tone line) and result in a recoverable mutation in the right column.The last row indicates the two options for a transposition, in which either the transgene locus or the recovered mouse sequence is copied or excised (as indicated by the pink and light blue arrows) and integrates in the breakpoint at the other location.As mentioned above, by taking into account that for a genome rearrangement to be detected, the 5 plasmid sequence of the breakpoint in lacZ must remain intact and end immediately in front of the recovered mouse sequence, the simplest intrachromosomal mutation that could have taken place was inferred (Fig. 4).Rearrangements with breakpoints in the mouse genome on either site of the integrated plasmid concatamer, but with reversely orientated sequences, could be inversions (Fig. 4).Rearrangements in the direction of the integrated plasmids, proximal for chromosome 3 and distal for chromosome 4 (Fig. 3), with similarly orientated breakpoints in the mouse genome, could be deletions (Fig. 4).Rearrangements in the reverse direction of the integrated plasmids, with reversely orientated mouse sequences, are more complicated and might be owing to transpositions (Fig. 4).According to these schemes, half of the intrachromosomal rearrangements would have been inversions, whereas deletions and transpositions each made up one fourth (Fig. 
3).Alternatively, these rearrangements could be explained by translocations involving the transgene clusters integrated on either the homolog or the other chromosome.", + "FIGURE 3. Telomere arrays of chicken and human chromosomes: the chicken genome contains more telomere sequence than the human genome.Chicken (a) and human (b) metaphase chromosomes and interphase cells hybridized with a telomeric sequence-peptide nucleic acid (PNA)-fluorescein probe.Human and chicken slide preparations were processed, and images were captured using the same parameters.Qualitatively, the telomere-positive fluorescent signals (white spots) from chicken cells and chromosomes have greater intensity than those of human (4,6 diamidino-2-phenylindole, DAPI counterstain).", + "In a previous study on the accumulation of spontaneous genome rearrangements in normal mice with aging, we discovered that 50% of the events were intrachromosomal, i.e., large deletions or inversions [22].In contrast, in this present study most of the rearrangements resulted from inter-chromosomal recombination, in both the Ercc1-mutant and control animals (Table 3).Previously, we used lacZ-plasmid line 60 mice with integration sites on Chromosomes 3 and 4, while in the present study line 30 mice were used with a single integration site on Chromosome 11.This indicates that the relative frequency of translocations is founder line specific and could be due to the position of the lacZ-plasmid cluster on the chromosome.Indeed, the chromosomal integration sites in line 60 mice are in the E1 region of Chromosome 3 (half way along the chromosome) and the C5 region of Chromosome 4 (two-thirds of the way along the chromosome) [22], while the integration site of founder line 30 (used in this study) is on the centromeric tip of Chromosome 11 (region A1-A2; not shown).The proximal location on Chromosome 11 prevents the detection of all but relatively small intra-chromosomal recombinations; larger events would lead to loss of 
the centromere and, therefore, the entire chromosome.If the orientation of the integration site in line 30, which is currently unknown, is towards the centromere, transpositions and inversions towards the distal end are the only detectable large intra-chromosomal rearrangements (for a detailed explanation of the different chromosomal events that can occur at the lacZ locus, see [22])." + ], + [ + "Primate and mouse centromeric DNA contain the CENP-B box, a 17-bp motif that is bound by the centromere protein CENP-B.In humans, the CENP-B box is present on every chromosome except for the Y chromosome [20].It is paradoxically necessary for formation of de novo centromeres on artificial chromosomes, but it is not essential for the formation of neocentromeres, and mice lacking CENP-B are viable and fertile [21,22].", + "Box 3 Mechanism of homologous recombination and end joiningThe severe phenotype of the mouse mutants and the highly cancer-prone human syndromes highlight the importance of homologous recombination.Mouse KU mutants display sensitivity to agents that lead to breaks in DNA, and have immunological problems because the KU proteins are involved in V(D)J recombination of antibody gene sequences.In addition, these mutants display poor development, several features of premature ageing and increased apoptosis of postmitotic neurons in the developing brain.Mice with defects in DNA-PK cs (SCID mice) display a similar but generally milder phenotype.In contrast, XRCC4-and ligase IV-knockout mice seem more severe, with late embryonic lethality resulting from massive ATM-and p53-dependent neuronal apoptosis 33,38 .Cells in G1 have only the homologous chromosome for recombination repair.However, this may be difficult to find in the complex genome.Moreover, it is potentially dangerous as a template for repair as it may lead to homozygosity for recessive mutations.As an alternative, the end-joining reaction simply links ends of a DSB together, without any template, using 
the end-binding KU70/80 complex and DNA-PK cs , followed by ligation by XRCC4-ligase4 (reviewed by 27,33; see the right panel of the figure, stages V-VII).The function of KU70/80 might involve end protection and approximating the ends, in addition to a signalling function by DNA-PK cs .End joining may be further facilitated when the ends are still held together through nucleosomes or other structures.End joining is sometimes associated with gain or loss of a few nucleotides if internal microhomologies are used for annealing before sealing.This implies the involvement of DNA polymerases and/or nucleases.Note that the KU complex is also involved in telomere metabolism 27,62 .found to be lethal 34 .Inactivation of ATR by itself is inviable already at the blastocyst stage.Inactivation of BRCA1 and BRCA2 in mice is also embryonically lethal; cell lines display defects in homologous recombination [35][36][37] .371A tentative scenario for the homologousrecombination reaction is depicted in the left panel of the figure.To promote strand invasion into homologous sequences, the 5-3 exonuclease activity of the RAD50/MRE11/NBS1 complex (also a substrate for ATM phosphorylation) exposes both 3 ends 30 (I).RPA facilitates assembly of a RAD51 nucleoprotein filament that probably includes RAD51-related proteins XRCC2, XRCC3, RAD51B, C and D. 
RAD52 stimulates filament assembly (II).RAD51 has, like its Escherichia coli RecA counterpart, the ability to exchange the single strand with the same sequence from a double-stranded DNA molecule.Correct positioning of the sister chromatids by cohesins probably facilitates the identification of a homologous sequence.A candidate for the complex chromatin transactions associated with these DNA gymnastics is RAD54, a member of the SWI/SNF family of DNA-dependent ATPases.After identification of the identical sister chromatid sequence, the intact double-stranded copy is used as a template to properly heal the broken ends by DNA synthesis (III).Finally, the so-called Hollidayjunctions are resolved by resolvases 27,33,60 (IV).Homologous recombination involves the simultaneous action of large numbers of the same molecules, which are found to be concentrated in radiation-induced nuclear foci.These depend on, and also include, the BRCA1 and BRCA2 proteins 36 .Recent evidence implicates BRCA2 directly or indirectly in nuclear translocation of RAD51 (ref.61).", + "Homologous Recombination RepairDuring HR, broken DNA ends are first recognized by the Mre11-Rad50-Xrs2 (MRX) complex (Mre11-Rad50-Nbs1 (MRN) complex in mammals) and are processed by MRE11 to 3' single-strand DNA (ssDNA) tails through a series of 5' 3' strand resection activities [6,9,45].The C-terminus of NBS1 interacts with ATM and recruits it to DSBs [46].ATM belongs to the phosphatidylinositol-3-like kinaserelated kinase (PIKK) family and plays an important role in the propagation of the initial DSB lesion by phosphorylating a number of downstream substrates.In undamaged cells, ATM forms inactive dimers or multimers.Upon induction of DSBs, ATM is autophosphorylated at serine 1981, leading to its dissociation into activated monomers [47].Activated ATM rapidly phosphorylates and activates downstream repair factors to directly promote their recruitment to sites of DNA damage.Perhaps, the most important event is the 
ATM-dependent phosphorylation of the histone variant H2AX at the C-terminal of the protein, corresponding to Ser139 ( -H2AX) [48].Other substrates for activated ATM include the proteins SMC1, NBS1, CHK2, p53, BRCA1 and MDC1 [49].Key amongst these substrates are the Chk2 kinase and p53 which act to reduce cyclin-dependent kinase (CDK) activity and arrest cells in the various stages of cell cycle to allow time of the completion of DNA repair.Following recruitment and activation of ATM, BRCA1, BRCA2 and RAD52 epistasis group proteins including XRCC2, XRCC3, RAD51B, RAD51C and RAD51D [50] are also recruited to DSBs to further transmit signals to downstream processing enzymes.The single-strand overhangs are then rapidly bound by ssDNA-binding protein replication protein A (RPA), and recruit Rad51 and Rad52 to the damaged sites [45].Loading of Rad51 onto the ssDNA tail subsequently results in the formation of ssDNA-Rad51 nucleoprotein filament, which then searches for its homologous counterpart in the corresponding intact sister chromatid.If the specific region of duplex DNA is found, strand invasion is initiated in the presence of another set of HR facilitating proteins (Rad54, Rad55, and Rad57) followed by strand exchange and joint molecule formation [45,51].Once the Holliday junctions are resolved, distal broken ends are sealed through DNA synthesis by DNA polymerase, resulting in an errorfree repair event and preserving genetic contents [44].In response to antigen or humoral stimulation, class switch recombination (CSR) further diversifies antibodies by switching their isotypes [63].CSR occurs between two switch (S) regions located upstream of C H (constant regions of immunoglobulin heavy chain).Similar to V(D)J recombination, CSR also involves DSB generation and NHEJ.Upon humoral stimulation, activation-induced cytidine deaminase (AID) deaminates deoxycytidine (dC) resulting in deoxyuracil (dU) bases on both strands of two transcriptionally active S regions [64].The 
dU is excised by the uracil DNA glycosylases (UNG) and the resultant abasic site is further cut by apurinic/apyrimidinic endonuclease 1/2 (APE-1/2), generating single strand breaks (SSBs).Either two adjacent SSBs on opposite strands spontaneously lead to one DSB, or the MMR machinery is triggered to convert SSB to DSB [65].Deficiency of AID, UNG, APE or any of the MMR components, including Msh2, Msh6, Mlh1, Pms2 and Exo1, leads to loss or reduction of CSR in B cells [63].After DSB formation, the NHEJ pathway is activated.The Ku70-Ku80 heterodimers bind to the DNA ends and recruit necessary proteins to process the DNA ends to facilitate the ligation mediated by Xrcc4-DNA ligase IV complex [66].CSR in Ku70 / and Ku80 / B cells is nearly ablated [67,68].Either Xrcc4 or DNA ligase IV deficiency causes significant reduction in CSR [69,70].While compatible ends are joined rapidly by canonical NHEJ components, complex lesions need substantial processing and are re-ligated slowly.In the later case, ATM, 53BP1 and MRM complex cooperate with canonical NHEJ components to mediate end-joining recombination.Disruption of ATM, 53BP1 or MRN complex in mice leads to defects in either V(D)J recombination or CSR or both [71][72][73][74].Recent studies in mouse models deficient in NHEJ core components revealed a robust alt-NHEJ pathway that utilizes microhomology to mediate the end joining in CSR [69,70].Alt-NHEJ leads to Ig locus deletion and translocation.However, the molecular mechanisms underlying alt-NHEJ are not well elucidated so far.", + "Action of RecQ helicases on DNA recombination intermediatesTwo key intermediates of HR (homologous recombination) are the four-stranded HJ and the three-stranded D-loop.An early important observation was made that BLM and WRN selectively bind HJ structures in vitro and are capable of efficiently promoting ATP-dependent HJ branch migration through greater than 2 kb of DNA [54,55], suggesting they may act upon such four-stranded structures at 
blocked or collapsed replication forks to allow processing into mature recombinants.Other RecQ helicases that have been shown to efficiently unwind HJ structures include E. coli RecQ, Sgs1, RECQ1 and RECQ5 [15,16,56,57].The bacterial HJ core recognition protein RuvA inhibits HJ branch migration by BLM, WRN, RECQ1 or RECQ5 [15,16,55,58], suggesting that these RecQ helicases specifically recognize the HJ core where they initiate unwinding.", + "This picture poses more questions than it seeks to answer.Is the grouping of the regions by product rather than by type of region correct?Given that the recombina- tion fraction between HLA-A and HLA-B is of the order of .08%,and that this is likely to represent a distance of at least hundreds of thousands of nucleotides, how are the pieces put together over such relatively long distances?Is it possible that regions of the DNA loop out, so that transcripts can be made directly from noncon- tiguous DNA sequences, the loops being held in place by small RNAs as suggested for the control of splicing by Steitz, and her colleagues [24] and by others [25]?If these small RNAs are coded for well outside the HLA region, does this provide a mechanism for control of expression of products by unlinked genes, as may be the case for one of the constituent polypeptides of the HLA-DR product?What might be the nature of the signals that control which of a multiple set of alternative regions is expressed by any given chromosome?", + "Mamm Genome. 2006; 17:220229. [PubMed: 16518689]72. Romanoski CE, et al. Systems genetics analysis of gene-by-environment interactions in humancells. Am J Hum Genet. 2010; 86:399410. [PubMed: 20170901]73. Myers S, Freeman C, Auton A, Donnelly P, McVean G. A common sequence motif associatedwith recombination hot spots and genome instability in humans. Nature Genet. 2008; 40:11241129. [PubMed: 19165926]74. Myers S, et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meioticrecombination. 
Science. 2010; 327:876879. [PubMed: 20044541]75. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nature Rev Genet. 2009; 10:392404.", + "a The table lists proteins in which mutations have been shown to increase homologous recombination (HR), gross chromosomal rearrangements (GCRs), chromosomal instability (CIN), sister chromatid exchanges (SCEs), tri-nucleotide repeat expansions and contractions (TNR), telomere fusions (Tel fusion), or fragile telomeres (Tel fragility).A phenotype inside brackets ([ ]) indicates that it is caused by overexpression of the protein.For further details and references see Supplementary Table1.Abbreviations: DSB, double-strand break; PCNA, proliferating cell nuclear antigen; RFC, replication factor C complex; SCF, Skp1-Cdc53/Cullin-F-box.Figure 3 Intermediates and chromosome structural alterations, as observed by different techniques. (a) Replication fork stalling, as monitored by 2D-gel electrophoresis and Southern analysis in yeast (for details about the technique, see Reference 161). (b) Slower human replication forks covering shorter DNA synthesis tracks, as determined by incorporation of IdU and CldU via DNA combing (52), which permits visualization of the process of replication on DNA fibers. (c) Accumulation of double-strand breaks (DSBs) or replicative stress, as inferred by H2AX foci or by H2AX pan staining, respectively, in human cells. (d ) DSBs or ssDNA (single-stranded DNA) gaps as seen directly by nuclear \"comet tails\" via single-cell electrophoresis assays in human cells (52). (e) Sister-chromatid exchanges (SCEs), as determined by Giemsa staining in human cells (207). ( f ) Hyper-recombination, as determined by colony sectoring in yeast (5). ( g) Gross chromosomal rearrangements (GCRs), as determined by spectral karyotyping in mouse cells (118). (h) Translocations, as visualized by pulse-field gel electrophoresis in yeast (168). 
(i ) Fragile sites, as detected by mitotic spreads in human cells (109). ( j) Telomere fusions, as determined by CO-FISH (chromosome-orientation fluorescent in situ hybridization) in mouse cells (124). (k) Anaphase bridges, presumably resulting from unfinished replication, dicentric chromosomes, and sister-chromatid nondisjunction, as detected by fluorescence microscopy in mouse cells.Arrows indicate the specific structural alterations referred to in each panel; in panel h, closed and open arrows indicate the position where the translocated or missing parental chromosome migrate or should migrate, respectively.When necessary, a normal control is shown on top of the panel, with the exception of panel a, which is shown on the left.Detailed description of each technique can be found in the references provided.Photos are from the laboratories of A. Nussenzweig ( g), A. Losada (k), M. Blasco ( j), L. Tora (i ), and ours (all others).Abbreviations: HR, homologous recombination; NHEJ, nonhomologous end-joining.", + "Chromosomal context of human NORsHuman NORs are positioned on the short arms of the acrocentric chromosomes that still remain unsequenced and thus missing from the current human genome draft, GRCh38.p7.Seeking an understanding of the chromosomal context of human NORs and to identify potential NOR regulatory elements, my laboratory has begun to characterize the sequences on both proximal (centromeric) and distal (telomeric) sides of the rDNA arrays (Fig. 3A; Floutsakou et al. 2013).Building on earlier reports of sequences distal and proximal to the rDNA array on HSA21 and HSA22, respectively (Worton et al. 1988;Sakai et al. 1995;Gonzalez and Sylvester 1997), 207 kb of sequence immediately proximal and 379 kb distal to rDNA arrays have been reported recently (Floutsakou et al. 
2013).Consensus proximal junction (PJ) and distal junction (DJ) sequences were constructed mostly from chromosome 21 BACs (bacterial artificial chromosomes).Comparison of these sequences with BACs and cosmids derived from the other acrocentrics revealed that the PJ and DJ sequences are, respectively, 95% and 99% identical between all five acrocentric chromosomes.Conservation of DJ sequences among the acrocentrics is consistent with frequent recombination between the rDNA arrays on each of the acrocentric chromosomes (Worton et al. 1988).However, conservation of PJ sequences suggests that there must also be frequent recombination events in the interval between the centromere and rDNA arrays.Proximal sequences are almost entirely segmentally duplicated, similar to the regions bordering centromeres.Consequently, they are unlikely to contain any specific elements that would regulate the activity of the linked NOR.In contrast, the distal sequence is predominantly unique to the acrocentric short arms and is dominated by a very large inverted repeat.Each arm of the inverted repeat is >100 kb, and they share an average sequence identity of 80%.There is a large (40-kb) block of a 48base-pair (bp) satellite repeat, CER, at the distal end of the DJ (Fig. 3A).CER blocks are found distal to the rDNA on all acrocentric chromosomes, with additional pericentromeric blocks on chromosomes 14 and 22. Finally, there are two blocks of a novel 138-bp tandem repeat, ACRO138, present within the DJ.", + "However, excluding some cases, recombinationsuppression occurs in a small genomic tractwhere these genes are located, and it doesnot extend over most of the sex chromosomepair, as occurs in mammals and birds (Bergeroand Charlesworth, 2009). It is not clear if thissuppression occurs by the presence of inversions or as a modulation of the recombinationmechanism itself, but both could be involved(Bergero and Charlesworth, 2009). 
Evidence ofrecombination in the SD region in sex reversalindividuals supports the second hypothesis.", + "Figure2| The homologous and nonhomologous DNA end-joining pathways.a | Homologous recombination requires that the cell be diploid for the DNA region involved.A break on one chromosome can invade the homologous region on the homologous chromosome with the aid of radiation-sensitivity protein 51 (RAD51) and the single-stranded-binding protein, replication protein A (RPA).Subsequent steps involve DNA synthesis by DNA polymerase to copy the information from the intact chromosome, before ligating the newly synthesized region back to the chromosome that is undergoing repair (lower portion of diagram).The DNA crossovers are resolved to generate the two resulting intact duplexes (not shown).If the chromosome being copied is wild type, then the information content of the broken chromosome is restored to wild type.Other proteins that participate in this pathway include RAD54, RAD55, RAD57, breast cancer protein 1 (BRCA1) and BRCA2, and the Fanconi anaemia gene products76 .b | The NHEJ pathway starts with the binding of the Ku heterodimer (Ku70-Ku86) to the DNA ends32 .Ku is thought to then recruit the Artemis-DNA-PK cs complex, which functions as an endonuclease to trim 5 and 3 overhangs.After fill-in synthesis, Ku recruits the XRCC4-DNA-ligase-IV complex to carry out the ligation.When the Artemis-DNA-PK cs complex trims the DNA ends, nucleotides at the DNA ends are permanently lost.", + "Though the pathway is not fullyknown it employs a number of proteins including XRCC1, PARP1 (Poly ADP Ribose22polymerase 1, DNA ligase III, Polynucleotide kinase (PNK) , Flap endonuclease 1(Fen1), Mre11, Rad50 and Nbs1 [111-113]. Homologous recombination: In homologous recombination the broken end of a DSB isjoined to its correct partner by using the information in the sister chromatid (in G2phase), homologous chromosome or a similar repeat in the DNA. 
This pathway isstarted by recognition of the DSB by MRN complex.", + "Lichter P, Cremer T, Borden J, Manuelidis L, Ward DC (1988) Delineation ofindividual human chromosomes in metaphase and interphase cells by in situ suppression hybridization using recombinant DNA libraries. Hum Genet 80:2242343. Jang W, Yonescu R, Knutsen T, Brown T, Reppert T, Sirotkin K, Schuler GD, RiedT, Kirsch IR (2006) Linking the human cytogenetic map with nucleotide sequence:the CCAP clone set. Cancer Genet Cytogenet 168:89974.", + "Figure 4 Schematic depiction of proposed mechanisms for observed intrachromosomal rearrangements.The blue and red arrows indicate the orientation of the integrated plasmid loci and the recovered mouse sequences, respectively, on the original non-rearranged chromosome (left column).All four combinations are given for an arbitrarily orientated chromosome (green line).The middle column shows how two breakpoints (lightning signs) could lead to the inversion or deletion of the encompassed chromosomal sequence (yellow-orange dual tone line) and result in a recoverable mutation in the right column.The last row indicates the two options for a transposition, in which either the transgene locus or the recovered mouse sequence is copied or excised (as indicated by the pink and light blue arrows) and integrates in the breakpoint at the other location.As mentioned above, by taking into account that for a genome rearrangement to be detected, the 5 plasmid sequence of the breakpoint in lacZ must remain intact and end immediately in front of the recovered mouse sequence, the simplest intrachromosomal mutation that could have taken place was inferred (Fig. 4).Rearrangements with breakpoints in the mouse genome on either site of the integrated plasmid concatamer, but with reversely orientated sequences, could be inversions (Fig. 4).Rearrangements in the direction of the integrated plasmids, proximal for chromosome 3 and distal for chromosome 4 (Fig. 
3), with similarly orientated breakpoints in the mouse genome, could be deletions (Fig. 4).Rearrangements in the reverse direction of the integrated plasmids, with reversely orientated mouse sequences, are more complicated and might be owing to transpositions (Fig. 4).According to these schemes, half of the intrachromosomal rearrangements would have been inversions, whereas deletions and transpositions each made up one fourth (Fig. 3).Alternatively, these rearrangements could be explained by translocations involving the transgene clusters integrated on either the homolog or the other chromosome.", + "FIGURE 3. Telomere arrays of chicken and human chromosomes: the chicken genome contains more telomere sequence than the human genome.Chicken (a) and human (b) metaphase chromosomes and interphase cells hybridized with a telomeric sequence-peptide nucleic acid (PNA)-fluorescein probe.Human and chicken slide preparations were processed, and images were captured using the same parameters.Qualitatively, the telomere-positive fluorescent signals (white spots) from chicken cells and chromosomes have greater intensity than those of human (4,6 diamidino-2-phenylindole, DAPI counterstain).", + "In a previous study on the accumulation of spontaneous genome rearrangements in normal mice with aging, we discovered that 50% of the events were intrachromosomal, i.e., large deletions or inversions [22].In contrast, in this present study most of the rearrangements resulted from inter-chromosomal recombination, in both the Ercc1-mutant and control animals (Table 3).Previously, we used lacZ-plasmid line 60 mice with integration sites on Chromosomes 3 and 4, while in the present study line 30 mice were used with a single integration site on Chromosome 11.This indicates that the relative frequency of translocations is founder line specific and could be due to the position of the lacZ-plasmid cluster on the chromosome.Indeed, the chromosomal integration sites in line 60 mice are in the E1 
region of Chromosome 3 (half way along the chromosome) and the C5 region of Chromosome 4 (two-thirds of the way along the chromosome) [22], while the integration site of founder line 30 (used in this study) is on the centromeric tip of Chromosome 11 (region A1-A2; not shown).The proximal location on Chromosome 11 prevents the detection of all but relatively small intra-chromosomal recombinations; larger events would lead to loss of the centromere and, therefore, the entire chromosome.If the orientation of the integration site in line 30, which is currently unknown, is towards the centromere, transpositions and inversions towards the distal end are the only detectable large intra-chromosomal rearrangements (for a detailed explanation of the different chromosomal events that can occur at the lacZ locus, see [22])." + ], + [ + "Genome Res, 2011, 21: 17691776Mattick JS, Dinger ME. The extent of functionality in the humangenome. HUGO J, 2013, 7, doi:10.1186/1877-6566-1187-1182ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I,Green ED, Gunter C, Snyder M. An integrated encyclopedia of DNAelements in the human genome. Nature, 2012, 489: 5774Pheasant M, Mattick JS. Raising the estimate of functional humansequences. Genome Res, 2007, 17: 12451253Hu T, Long M, Yuan D, Zhu Z, Huang Y, Huang S. The geneticequidistance result, misreading by the molecular clock and neutraltheory and reinterpretation nearly half of a century later.", + "This approach enables, on the one hand, studying the process ofmammalian evolution and, on the other hand, translational studies using modelorganisms of complex human phenotypes. Detection of regions conserved betweendistant species points to high functional importance of these fragments of the DNAsequence. Human and mouse developmental lines diverged about 75 million years ago, andever since evolutionary forces shaped the two genotypes in a different manner(Waterston et al. , 2002). 
Nevertheless, the extent of the changes is, however, smallenough for conservation of local gene order (Waterston et al. , 2002).", + "First, the human and mouse genome projectselucidated the sequences of over 20,000 genes [Lander et al. ,2001; Venter et al. , 2001], and most are expressed in the CNS. The availability of gene sequences has allowed rapid analysis ofcandidate human disease and disorder genes and the isolation ofthe mouse homologues. Second, the application of site-specicrecombinase technology provides investigators with the opportunity to engineer genes in the mouse that will allow for thedeletion, insertion, inversion, or exchange of chromosomalDNA with high delity (for review see Branda and Dymechi,2004].", + "In some cases, structural variations, such as copy number polymorphisms,exist (Feuk et al. , 2006); however, because of the nature of the genome assemblyprocess, these will invariably be collapsed into a single contig that does not reflectthe natural sequence. To address the technical challenges of whole-genome assembly,the human genome is released as defined builds on a quarterly basis (Lander et al. ,2001; reviewed in Chapter 4). The increasing complexity of processes that mapdata to the genome implicitly involves some lag in availability of the most currentsequence assembly.Inpractical terms, this has meant that we acquire many fragments, from a few hundredbases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the humangenome, two unfinished draft sequences were produced by different methods, oneby the International Human Genome Sequencing Consortium (IHGSC) and one byCelera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).4Assembling a View of theHuman GenomeColin A. M. 
SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.", + "THE HUMAN GENOME PROJECT IS generating vast amounts of new information at breakneck speed and causing a fundamental shift in disease research.Now with the availability of a nearly complete, high-accuracy sequence of the mouse genome (7), a new and powerful paradigm for biomedical research is established.The remarkable similarity of mouse and human genomes, in both synteny and sequence, unconditionally validates the mouse as an exceptional model organism for understanding human biology.The discovery among inbred mouse strains of defined regions of high and low genomic variation inherited primarily from two ancestral Mus subspecies (6) holds great promise to make mapping and positional cloning more rapid and feasible.Haplotype maps of inbred mouse strains combined with sophisticated delineation of their phenotypic variation and gene expression patterns will enable complex trait analysis on an unprecedented scale.This issue of Journal of Applied Physiology highlights inbred strain surveys exploring phenotypic variation in drug responses [see Crabbe et al. (1) and Watters et al. 
(8) in this issue].These mouse initiatives demonstrate a viable, cost-effective alternative to human research requiring family studies, population linkage analysis, or genome-wide genotyping on a multitude of individuals for association mapping.", + "How Many Genes are There in the Human Genome?", + "The Landscape of Human Genome Variation", + "In some cases, structural variations, such as copy number polymorphisms,exist (Feuk et al. , 2006); however, because of the nature of the genome assemblyprocess, these will invariably be collapsed into a single contig that does not reflectthe natural sequence. To address the technical challenges of whole-genome assembly,the human genome is released as defined builds on a quarterly basis (Lander et al. ,2001; reviewed in Chapter 4). The increasing complexity of processes that mapdata to the genome implicitly involves some lag in availability of the most currentsequence assembly.Inpractical terms, this has meant that we acquire many fragments, from a few hundredbases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the humangenome, two unfinished draft sequences were produced by different methods, oneby the International Human Genome Sequencing Consortium (IHGSC) and one byCelera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). 
The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.", + "Science 291:130413513. Lander ES et al (2001) Initial sequencing and analysis of the human genome. Nature 409:8609214. Engle LJ, Simpson CL, Landers JE (2006) Using high-throughput SNP technologies to study cancer. Oncogene 25:159416015. Elston RC, Anne Spence M (2006) Advances in statistical human genetics over thelast 25 years. Stat Med 25:304930806. Larson GP et al (2005) Genetic linkage of prostate cancer risk to the chromosome3 region bearing FHIT. Cancer Res 65:8058147. Botstein D, Risch N (2003) Discovering genotypes underlying human phenotypes:past successes for mendelian disease, future approaches for complex disease.McPherson JD, Marra M, Hillier L et al (2001) A physical map of the humangenome. Nature 409:93494113. Burke DT, Carle GF, Olson MV. (1987) Cloning of large segments of exogenousDNA into yeast by means of artificial chromosome vectors. Science 236:80681214. Fleischmann RD, Adams MD, White O et al (1995) Whole-genome randomsequencing and assembly of Haemophilus influenzae Rd Science 269:49651215. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of theflowering plant Arabidopsis thaliana. 
Nature 408:79681516.", + "T he human genome has been cracked wide open in recent years and is spilling many of its secrets.More than 100 genome wide association studies have been conducted for scores of hu man diseases, identifying hun dreds of polymorphisms that are widely seen to influence disease risk.After many years in which the study of complex human traits was mired in false claims and methodologic inconsistencies, ge nomics has brought not only com prehensive representation of com mon variation but also welcome rigor in the interpretation of sta tistical evidence.Researchers now know how to properly account for most of the multiple hypothesis testing involved in mining the ge nome for associations, and most reported associations reflect real biologic causation.But do they matter?", + "In some cases, structural variations, such as copy number polymorphisms,exist (Feuk et al. , 2006); however, because of the nature of the genome assemblyprocess, these will invariably be collapsed into a single contig that does not reflectthe natural sequence. To address the technical challenges of whole-genome assembly,the human genome is released as defined builds on a quarterly basis (Lander et al. ,2001; reviewed in Chapter 4). The increasing complexity of processes that mapdata to the genome implicitly involves some lag in availability of the most currentsequence assembly.Inpractical terms, this has meant that we acquire many fragments, from a few hundredbases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the humangenome, two unfinished draft sequences were produced by different methods, oneby the International Human Genome Sequencing Consortium (IHGSC) and one byCelera Genomics (CG). 
The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).", + "In some cases, structural variations, such as copy number polymorphisms,exist (Feuk et al. , 2006); however, because of the nature of the genome assemblyprocess, these will invariably be collapsed into a single contig that does not reflectthe natural sequence. To address the technical challenges of whole-genome assembly,the human genome is released as defined builds on a quarterly basis (Lander et al. ,2001; reviewed in Chapter 4). The increasing complexity of processes that mapdata to the genome implicitly involves some lag in availability of the most currentsequence assembly.Inpractical terms, this has meant that we acquire many fragments, from a few hundredbases to a few hundred kilobases in length, of a genome that must then be assembled computationally to produce a continuous sequence. In the case of the humangenome, two unfinished draft sequences were produced by different methods, oneby the International Human Genome Sequencing Consortium (IHGSC) and one byCelera Genomics (CG). The IHGSC began with a BAC (bacterial artificial chromosome) clone-based physical map of the genome (IHGSC, 2001).4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond." 
+ ], + [ + "We (Hein, Schierup and Wiuf) have published a300 page book on molecular population genetics titled Gene Genealogies, Sequence Variation and Evolution OxfordUniversity Press, and are presently developing a tutorial in association mapping that we hope to publish as a booklet in2006 and are also involved in a very large EU collaboration (Holland, Denmark, Iceland and UK) to find susceptibilitygenes for breast and prostate cancer. In comparative genomics, the most fundamental investigation is to find genes in a pair of aligned genomes.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. 
It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. However, knowledge of the genome sequence did notimmediately clarify the nature and structure of human genetic variation.", + "2011) human genome reference sequence provides a basis foranalyzing short-read, whole genome, DNA sequences andFunct Integr Genomics (2012) 12:19sequencing of individuals from families segregating for aparticular phenotype (Roach et al. 2010; Dewey et al. 2011;Tian et al. 2012). The comparative analysis of the genomesequences from members of a family enabled the noise inDNA sequencing, namely, sequencing errors and genetic heterozygosity in DNA from a diploid organism, to be defined(Roach et al. 2010; Dewey et al. 2011). Through the process oftracking sequence changes/inconsistencies in inheritance fromparent to offspring, Roach et al.", + "Characteristics of genotyping and sequencing technologies", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. 
If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. However, knowledge of the genome sequence did notimmediately clarify the nature and structure of human genetic variation.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). 
Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. 
However, knowledge of the genome sequence did notimmediately clarify the nature and structure of human genetic variation.", + "Ample time was allotted to answer questions and a copy of \"A Guide to Your Genome\" (National Human Genome Research Institute 2007) was provided to further assist participants' understanding and ability to communicate results with family members or others.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). 
The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. However, knowledge of the genome sequence did notimmediately clarify the nature and structure of human genetic variation." + ], + [ + "In some organisms, there is no clear relationship between telomere length and lifespan.Age-related telomere attrition could not be detected in Daphnia pulex (57) or sea urchin species (Strongylocentrotus franciscanus and Lytechinus variegatus) (58).Studies in C. elegans examining natural variation in telomere length and experimentally manipulated telomere length detect no correlation with lifespan (59,60), and in Drosophila, which uses a telomerase-independent mechanism for telomere maintenance, there is a similar lack of correlation between longevity and telomere length (61).Similarly, data on sex differences in age-related telomere shortening are mixed.For example, in the ant species Lasius niger, the rate of telomere shortening is more rapid in short-lived males compared to longer-lived females.But, mean telomere length does not differ between the two types of females, queens and workers, despite the fact that queens live much longer than workers (up to 28 years vs 2-3 months) (62).These findings suggest that the question of how telomere shortening affects aging across species and how sex affects telomere attrition rates are complex.", + "With new methodologies to assess relative telomere length by Q-PCR, studies were designed to address the impact of telomere length on aging, aging associated pathologies, and mortality.One such study 
has correlated shorter leukocyte telomere lengths at age 60 with a three times higher risk of heart disease and an eightfold increase in risk of infection-related death (36), thereby associating measured relative cellular aging with disease and life expectancy. In a similar way, chronic stress was shown to correlate with short leukocyte telomere length, a phenomenon attributed to higher levels of oxidative stress at the cellular level (70). More recent studies have linked telomere length in smooth muscle cells with senescence and disease severity in patients with atherosclerosis (141,150). Leukocyte telomere length was also short in a cohort of similar patients and associated with a higher risk of developing occult cardiovascular disease (71). More data are needed to understand and validate the use of leukocyte telomere length as a biomarker for cardiovascular and other diseases.", + "Shortening of the telomeres at the ends of chromosomes has been associated with age-related disease and mortality [16][17][18]. A recent study identified a common haplotype of four SNPs in the human telomerase reverse transcriptase gene (hTERT) that is enriched in centenarians and associated with longer telomere length [19]. It was also shown that centenarians and their offspring maintain longer telomeres compared with controls and that longer telomeres are associated with protection from age-related diseases, better cognitive function and lipid profiles of healthy ageing [19].", + "New research has indicated how social factors, such as subordination, may translate into biological effects (Epel et al. 2004; Chae et al. 2014). In a now classic study, Epel et al. (2004) examined the telomere lengths of fifty-eight healthy premenopausal women who either had a healthy child (n = 19) or were giving care to a chronically ill child (n = 39).
They measured perceived stress, years of caregiving, telomere length, and oxidative stress. They found highly statistically significant differences in telomere length between women taking care of chronically ill children and those who had healthy children. They found highly statistically significant negative correlations between telomere length and perceived stress and years of caregiving. Telomerase activity had highly statistically significant negative correlations with perceived stress and years of caregiving. Oxidative stress was highly positively correlated with perceived stress and years of caregiving. They concluded that the telomere length shortening was equivalent to 9 to 17 years of aging in the high stress group. Telomere length is considered a biomarker of aging (Finch and Kirkwood 2000). Thus, this study showed that caregiver stress had essentially aged these women 9 to 17 years compared to women who had healthy children.", + "Adult studies have also found a negative correlation with baseline telomere length, suggesting a negative feedback regulation of leukocyte telomere length (Farzaneh-Far et al. 2010; Aviv et al. 2009; Epel et al. 2008; Nordfjall et al. 2009). It is possible that while our follow-up period was shorter than Shalev et al. 2013 and adult studies, which had a minimum of 5 year intervals with the exception of Puterman et al. (2015) who followed for a one-year time period, there may be biological regulation of telomere length at 4 and 5 years of age such that shorter telomeres are more robustly maintained, whereas longer telomeres have greater rates of decline, over a short period of one year. It is unlikely that this relationship is due to assay error or regression to the mean given the consistency of our findings across studies. We have had similar findings of longer telomeres having greater rates of decline and shorter telomeres being maintained in our different studies (Farzaneh-Far et al. 2010; Epel et al. 2008; Puterman et al.
2015).The single, consistent predictor of the rate of telomere attrition shown in multiple adult and the few child longitudinal studies is the baseline measurement of telomere length at the start of each study.This suggests the importance of understanding predictors of telomere length prior to adulthood, as it determines in part the rate of change (Revesz et al. 2014a, b;Nordfjall et al. 2009).Moreover, longitudinal studies in adults have had found that telomere attrition rate is dependent on baseline telomere length independent of any phenotypic predictors of shortening, such as disease or demographic variables (Nordfjall et al. 2009), attesting to the importance of studies to evaluate risk factors for shortening prior to adulthood.Rates of decline in childhood may be particularly relevant for later chronic disease risk as shorter telomere length has been implicated in disease progression through exposure to cellular senescence, inflammatory cytokines and adipocyte hypertrophy (Raschenberger et al. 2015;Willeit et al. 2014;Monickaraj et al. 2012;Fyhrquist et al. 2013).Telomere attrition in adultsAdditional early childhood telomere studies are also needed so as to better interpret disease across the lifespan.Specifically, retrospective adult studies have pointed to the importance of early life exposures, finding associations of shorter telomere length in adulthood with perinatal complications, and stressful and deprived early childhood environments, including factors associated with lower socioeconomic status and those of physical or social neglect (Drury et al. 2012;Shalev et al. 2014;Tyrka et al. 2010, Kananen et al. 2010).Some adult studies have found that predictors of shorter telomere length involving inflammation and oxidative stress exposures such as depression are only observed in younger adults compared with middle aged and older ones (Philips et al. 
2013), suggesting it may be harder to tease out risk factors for accelerated attrition as, firstly, these processes happen early in life, and, secondly, repeat exposures to inflammation and oxidative stress may statistically plateau out across older age groups.While a strong maternal heritability to telomere length is widely reported (Broer et al. 2013), with overall heritability of telomere length estimated to be 64 % (95 % CI 39-83 %) (Hjelmborg et al. 2015a, b), the heritability of attrition rate is poorly understood.Hjelmborg et al. recently published data on studies of monozygotic and dizygotic twins and calculated heritability of attrition rate to be 28 % (95 % CI 16-44 %), less than the heritability of telomere length, suggesting a sizable environmental component.However, this twin study emphasized the importance of understanding environmental factors at birth and in the first years of life to better understand telomere dynamics in adulthood, as environmental exposures in adulthood played a small role in explaining adult rate of telomere loss.Additionally, studies suggest that the genetic variants associated with leucocyte telomere length in adults do not determine leukocyte telomere length in children.Different genetic determinants impact child compared with adult telomere length (Stathopoulou et al. 2015).Stathopoulou et al. 
suggest that single nucleotide polymorphisms (SNPs), associated with generally minor effects on telomere length in adulthood, may determine adult telomere maintenance processes versus different SNPs which are responsible for regulating telomere attrition in childhood.Abstract Telomeres are the protective complexes at the end of chromosomes, required for genomic stability.Little is known about predictors of attrition in young children or the relationship between parental and child patterns of telomere change.Telomere length was assessed twice over one year, at 4 and at 5 years of age, in Latino preschool children (n = 77) and their mothers (n = 70) in whole blood leukocytes.Maternal and child rates of attrition during the same time period were compared in 70 mother-child pairs.More children showed lengthened telomeres over one year compared to their mothers and very few children showed attrition (2.6 %).Approximately 31 % of children and 16 % of mothers displayed lengthening over one year while 66 % of children showed maintenance in contrast with 74 % of mothers.The strongest predictor for child telomere length change was child's baseline telomere length (r = 0.61,p < 0.01).Maternal rate of change was associated with child rate of change (r = 0.33, p < 0.01).After controlling for child baseline telomere length, the relationship between child and maternal rate of change trended towards significance (Coeff = 0.20, 95 % CI 0.03 to 0.43; p = 0.08).", + "Blackburn and Epel, a health psychologist who did original research on how specific lifestyle and psychological habits can protect telomeres, published The Telomere Effect (Blackburn & Epel, 2017), in which they suggested that individuals with shorter telomeres developed diseases earlier in life (a shorter \"disease span\").What follows is the evidence from these authors, their colleagues, and other researchers describing how length of telomeres contributes to mind-body connection and healthy longevity.", + "As early as 
at the time of birth, each of the 92 telomeres of the human genome has its own characteristic length.Additionally, each telomere shortens by its individual attrition rate.In general, longer telomeres at birth are associated with higher age-dependent attrition rates and vice versa.Overall, telomere shortening appears more dynamic in males.It is generally accepted that telomeres shorten during DNA replication both in vitro and in vivo.In individuals, short telomeres are considered to be a sign of advanced age.Cawthon and coworkers (2003) showed that telomere shortening in humans likely contributes to mortality, supporting the hypothesis that they might act as a mitotic clock (Allsopp et al., 1992).Telomere length dynamics, however, does not seem to Abstract.During aging, telomeres are gradually shortened, eventually leading to cellular senescence.By T/C-FISH (telomere/centromere-FISH), we investigated human telomere length differences on single chromosome arms of 205 individuals in different age groups and sexes.For all chromosome arms, we found a linear correlation between telomere length and donor age.Generally, males had shorter telomeres and higher attrition rates.Every chromosome arm had its individual age-specifi c telomere length and erosion pattern, resulting in an unexpected heterogeneity in chromosomespecifi c regression lines.This differential erosion pattern, however, does not seem to be accidental, since we found a correlation between average telomere length of single chromosome arms in newborns and their annual attrition rate.Apart from the above-mentioned sex-specifi c discrepancies, chromosome arm-specifi c telomere lengths were strikingly similar in men and women.This implies a mechanism that arm specifi cally regulates the telomere length independent of gender, thus leading to interchromosomal telomere variations.In conclusion, these data suggest that with increasing physical and genetic length of whole chromosomes, the corresponding telomeres also 
tend to be longer and that recombination rate and telomere length are inversely proportional.In conclusion, a combination of overall and chromosomespecifi c shorter telomeres and more pronounced age-dependent telomere erosion could be observed in males.There is a prospective clinical study strongly suggesting that longer telomeres decrease the risk of dying (Cawthon et al., 2003).With this in mind, the telomere length discrepancies between the sexes may indeed be a factor infl uencing the differences in their life expectancy.In every chromosome a linear decline of telomere length with age was observed, being more pronounced in men independent of the examined chromosome arm.This might suggest that telomere length on single chromosome arms may be infl uenced by the same factors which determine overall telomere length.S. Mayer a S. Brderlein a S. Perner a I. Waibel a A. Holdenried a N. Ciloglu a C. Hasel a T. Mattfeldt a K.V. Nielsen b P. Mller a a Institute of Pathology, University of Ulm, Ulm (Germany); b DakoCytomation A/S, Glostrup (Denmark) follow uniformity.In previous studies, sex-specifi c differences in telomere length and attrition rate of men and women were found (Benetos et al., 2001;Cawthon et al., 2003;Nawrot et al., 2004), suggesting gender differences in behavior of telomeres.In individual chromosome arms, telomere length was also shown not to be homogeneous (Lansdorp et al., 1996;Benn, 1997;Martens et al., 1998;Surralles et al., 1999;Hao and Tan, 2001;Londono-Vallejo et al., 2001;Graakjaer et al., 2003), some telomeres being signifi cantly shorter, others longer than the average length.To date, these characteristics in telomere lengths could not be set in a biological context, as only a few groups have provided detailed information about chromosome-specifi c patterns of telomere distribution (Lansdorp et al., 1996;Graakjaer et al., 2003).Whether accumulation of short telomeres (Martens et al., 2000;Londono-Vallejo et al., 2001) or rather the shortest 
telomere of one specifi c chromosome arm (Hemann et al., 2001) elicits senescence, remains an open question so far.In recent literature, there are hints that the average telomere length may be higher in women and that their annual shortening rate may be somewhat lower (Vaziri et al., 1993;Rufer et al., 1998;Jeanclos et al., 2000), but these reported differences failed to reach statistical signifi cance except for one study (Jeanclos et al., 2000).Here, we provide compelling evidence that this is indeed the case.", + "Shortening of the telomeres at the ends of chromosomes has been associated with age-related disease and mortality [16][17][18].A recent study identified a common haplotype of four SNPs in the human telomerase reverse transcriptase gene (hTERT) that is enriched in centenarians and associated with longer telomere length [19].It was also shown that centenarians and their offspring maintain longer telomeres compared with controls and that longer telomeres are associated with protection from age-related diseases, better cognitive function and lipid profiles of healthy ageing [19]." + ] + ], + "task_id": [ + "2F8796A8C3DC633F00DB901C9BA396DA", + "DEE6D385D1B01B4155AA4ABE59515893", + "10ABD2210053119B18D94F1FE266E73E", + "C52A9690417093A861C669A0753689BD", + "93DE2EF005059DFEA5A7FBBA3BD17D03" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_3.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_3.json new file mode 100644 index 0000000..07fa4b8 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_3.json @@ -0,0 +1,116 @@ +{ + "question": [ + "Once a sperm combines with an egg, what determines how traits are passed on to the resulting lifeform? ", + "Why is genetic tracing matrilineal rather than patrilineal? ", + "How does one tell the difference between X and Y DNA, with respect to DNA tracing and determining QTLs?", + "what type of dataset is useful for qtl mapping analysis in genenetwork2? ", + "what are the bioinformatics tools for QTLs analysis?" + ], + "answer": [ + "The traits are determined by the combination of chromosomes from the sperm and egg during fertilization. Each parent contributes one set of 23 chromosomes, which include both dominant and recessive genes. These genes interact with each other and the environment, and sometimes by chance, to determine the traits of the offspring. The process of meiosis and recombination, or crossing over, also plays a crucial role in shuffling genetic material and creating genetic variation.", + "Genetic tracing is both matrilineal and patrilineal. Matrilineal tracing is done through mitochondrial DNA (mtDNA), which is passed from mother to all her children without any contribution from the father. Patrilineal tracing, on the other hand, is done through Y-DNA, which is passed from father to son. Both types of tracing provide different insights into an individual's ancestry.", + "The difference between X and Y DNA is not explicitly discussed in the context of determining QTLs in the provided text. However, in general, X and Y chromosomes are different in size and gene content, which can be identified through genetic sequencing. 
In the context of QTL studies, these differences could potentially influence the traits being studied. However, the specific process of how this is done in QTL mapping is not detailed in the provided text.", + "GeneNetwork2 utilizes datasets containing legacy SNP and transcriptome data for QTL mapping analysis. It also uses gene expression datasets from multiple brain regions and the entirety of > 7,000 BXD Published Phenotypes deposited in GeneNetwork2.", + "The bioinformatics tools for QTL analysis include R/qtl, QTL cartographer, MapQTL, WebQTL, QTL IciMapping, eQTL Explorer, eQTL Viewer, FastMap, Lirnet, and xQTL workbench. Other tools built into resources include QTL Analyst, Semantic Gene Organizer, and various tools for Gene Ontology overrepresentation and pathway matching." + ], + "contexts": [ + [ + "Selection could occur at multiple levels, from germ cell generation and propagation to fertilization and early embryonic growth.Chromosomal abnormalities, including aneuploidy, were found in 10-20% of spermatozoa and oocytes (20) and in the cleaved embryo, with a 21% rate of abnormalities in preimplantation embryos (21).These findings led to a model for natural selection against chromosome abnormalities (21).Selection extends to the end of gestation: Only approximately 30% of all conceptions result in a live birth, with more than half of aborted fetuses containing chromosomal abnormalities (22), a number likely to be an underestimate because of technological limitations in measuring all possible mutations.But even in the very small fraction of germ cell duos that survive this withering genome attack and result in a live birth, a number of severe de novo mutations will still be found (23).The data on gross chromosomal alterations suggest that overall, mutation frequency early in life is very high.The functional consequence, however, is limited because of selection.Somewhat surprisingly, this picture points toward an initial decline in genomic alterations, 
allowing the adult individual to acquire a somatic genome optimally equipped to provide function.", + "In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. In one of the experimentsdescribed next, evolutionary geneticists have nevertheless devised a way to compareevolution with and without recombination in the obligately sexual fruit fly.Sex brings harmful alleles together into thesame genetic background, allowing selection to more efficiently purge them fromthe population and potentially producing some offspring that are fitter than eitherparent. However, the benefit of recombining deleterious mutations may depend on thenature of the epistatic interactions between them. The mutational deterministic hypothesis(Kondrashov 1988) depends partly on this epistasis.This disparity in investment is the basis for the twofold cost: asexualfemales hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. Butsex usually also involves the basic process of physical recombination: the breakage andreunion of two different DNA or RNA molecules. Of these two processes, recombinationis clearly the more widespread feature of sexual reproduction. A variety of reproductivesystems, such as selfing and automixis, involve recombination but not outcrossing. Incontrast, relatively few reproductive systems have outcrossing without recombination.Longago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. 
Similarly, if alleles at different loci werejointly responsible for the production of phenotypes, sex has the potential to break apartcoadapted gene complexes, as it moves alleles away from genetic backgrounds wherebeneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?", + "Crossing over-The swapping of genetic material that occurs in the germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes cross over one another.Crossing over results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring.This process is also known as meiotic recombination.The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population.In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population.For example, the HbS mutation in the HBB gene (which produces the subunit of hemoglobin) causes sickle cell disease when present in both alleles, a detrimental consequence, but protects against malaria when present in 1 allele, a beneficial consequence, ensuring that the mutation persists in populations in areas of the world where malaria is endemic.Genes are passed from parents to offspring via the process of meiosis by which gametes, the egg cells in the mother and the sperm cells in the father, are generated.Ordinarily, each cell has 23 pairs of chromosomes; the gametes have 23 unpaired chromosomes.In meiosis, the 23 pairs are split so that each gamete receives 1 chromosome from each pair (Figures 8 and 9).Two gametes (egg and 
sperm) ultimately join into a single cell, the zygote, which has the full complement of 23 chromosome pairs restored. If all goes well, the zygote gives rise to a live offspring. Recombination (meiotic recombination): The swapping of genetic material that occurs in the germline. During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes recombine with one another. Recombination results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring. Also known as crossing over.", + "In the generation of gametes, crossing over regularly occurs, and genetic information is swapped between members of a chromosome pair. That doesn't matter within inbred animals, because the swapped parts are identical. In an F1 animal, however, the chromosomes of a particular pair are genetically different, one each having come from each parent. Each gamete produced will be unique, as will be each F2 zygote formed by uniting of the gametes from two F1 parents. An F2 group thus provides for expression of some genetic variability. This variability is limited to the allelic differences existing between the parent strains of the F1s, so that another F2, derived from different inbred strains, will express different genetic differences.", + "In most plants and animals, sex is a necessary component of reproduction, and the question for evolutionary biologists is why reproductive mechanisms have evolved that way. In one of the experiments described next, evolutionary geneticists have nevertheless devised a way to compare evolution with and without recombination in the obligately sexual fruit fly. Sex brings harmful alleles together into the same genetic background, allowing selection to more efficiently purge them from the population and potentially producing some offspring that are fitter than either parent. 
However, the benefit of recombining deleterious mutations may depend on the nature of the epistatic interactions between them. The mutational deterministic hypothesis (Kondrashov 1988) depends partly on this epistasis. This disparity in investment is the basis for the twofold cost: asexual females hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. But sex usually also involves the basic process of physical recombination: the breakage and reunion of two different DNA or RNA molecules. Of these two processes, recombination is clearly the more widespread feature of sexual reproduction. A variety of reproductive systems, such as selfing and automixis, involve recombination but not outcrossing. In contrast, relatively few reproductive systems have outcrossing without recombination. Long ago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci were jointly responsible for the production of phenotypes, sex has the potential to break apart coadapted gene complexes, as it moves alleles away from genetic backgrounds where beneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?", + "Traditionally, it has been agreed that the final sex of an individual (phenotypic sex) depends on two sequential processes: the sex determination system of the species and the gonad differentiation process (Valenzuela, 2008). However, recently, these two seemingly distinct processes are viewed as part of a general process leading to gonad formation and sex ratios (Sarre et al., 2004; Quinn et al. 
, 2011; Uller and Helantera, 2011). However, we expect that only at this level, the most significant contributions brought by integrating epigenetics will be made. Concluding Remarks and Future Prospects. Fish sex ratios are the result of a complex interaction between genetic, biochemical, and environmental interactions. The ultimate result of these interactions at the individual level is gender: male or female. However, at the population level, the combination of sex determination and differentiation sets the sex ratio. In turn, sex ratios define the reproductive capacity of populations and, if sex growth dimorphism exists, also the growth characteristics, something very important in an aquaculture context. The inheritance of sex based on major sex factors, also known as chromosomal sex determination, includes monofactorial and multifactorial SD mechanisms, with the presence of a [Functional Genomic Analysis of Sex Determination and Differentiation in Teleost Fish. Figure 8.2 panel labels: (A) zygote, sex determination, embryos, sex differentiation, larvae, juveniles, sex change, adults, time; (B) major sex factors, minor sex factors, monofactorial, polyfactorial, environmental, environmental differences.] Figure 8.2 Sex determination and differentiation in fish. (A) The processes of sex determination, sex differentiation, and sex change are represented along the timeline of development.", + "Obehav is, in turn, influenced by offspring genes and environment (Ogene and Oenvir respectively). Hence, indirect genetic effects (blue arrows) and direct genetic effects (red arrow) are important influencers of behaviour. B) Parent-offspring conflict theory predicts that parental resource investment and offspring solicitation behaviours are influenced by the fitness benefit to a focal individual (O), cost to a social partner such as a sibling (S1 and S2) or parent (P), and by their coefficient of relatedness (black arrows). Figure 2: Genomic imprinting can result in divergent phenotypes from the same genotype. A) A paternally imprinted gene, i.e. 
maternally expressed.", + "Because of the small contribution, through the sperm, of the paternal transcriptome to the fertilized zygote, and because of the stronger maternal contribution to child rearing in most model organisms, parental effects are typically thought of as synonymous with maternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading to differences in the birth weight of mice depending on the genotype of the mother (Cowley et al., 1989; Wolf et al., 2011). Therefore, the resulting phenotypic patterns lag a generation behind the genetic transmission of the causal variants. The most well-studied parental genetic effects are caused by deposition of maternal transcripts into the egg prior to fertilization, resulting in differences in early embryonic development depending on the genotype of the mother. Certain genes have also been shown to respond to maternal influence after birth through genetically defined maternal behaviors (Weaver et al., 2004).", + "It was believed by many that for each trait variant we should expect to find a corresponding genetic change, or gene for that trait. Through historical happenstance the relationship between genes and traits was set up and treated as if it were one-to-one. But the production of a trait involves not only genes, but also their interactions with each other and the environment, and chance." + ], + [ + "distinguishing prenatal from postnatal maternal effects, see below). Maternal effects can account for a large proportion of phenotypic variance, especially during early life, and for some traits explain more variation than direct genetic effects [33, 97, 99, 100, 102115]. However, maternal and offspring genotype are correlated (i.e. half their genes are shared), and in inbred lines they are fully confounded, thus separating the effects of their respective genotypes is difficult. 
To remove this confounding effect cross-fostering has been used, both in the laboratory and in the field [119, 131].", + "Using genetic markers, the pattern of inheritance can be tracked through families. For example, by analyzing a marker linked to the eye color gene in several generations, it is possible to determine from which grandparents a child has inherited its eye color alleles. More importantly, finding a marker linked to a disease can lead to location of the faulty gene causing the disease. Finding the gene is very valuable in the search for the cure. The distance between two loci can be expressed either as physical or genetic distance.", + "Although autosomal SNPs are commonly used as genetic markers to infer ancestry or race/ethnicity membership, haploid markers such as mitochondrial, Y-DNA, and X-linked markers are also important to provide separate stories of ancestry of individuals from paternal and maternal sides [42,43]. Therefore, genetic structure created due to autosomal markers could be different from those of lineage markers (often influenced by political, social, and migration history of individuals/populations). Mitochondrial DNA or mtDNA haploid is the maternally inherited mitochondrial genome (mtDNA) [44]. All children inherit mtDNA from their mother, with no admixture from the father. Like Y-line DNA, mtDNA is passed intact from one generation to the next but through maternal line. a) Autosomal DNA (testing both sexes) markers: autosomal DNA tests utilize DNA from the 22 pairs of autosomal chromosomes. Autosomal DNA is inherited from both parents. Autosomal testing provides percentages of ethnicity using autosomal DNA SNP test (i.e., ancestry informative markers), and it is the most commonly used test to infer ancestry across diploid genome. b) Y-DNA or Y-SNPs (paternal line testing) markers: a haploid Y-DNA is the paternally inherited non-recombining portion of the Y chromosome, and it tests only for males. The Y-DNA testing tests the Y chromosome which is passed 
intact from father to son with no DNA from the mother. Y-DNA testing can then be used to trace the direct paternal line. Y-DNA remains the same in each generation, allowing us to compare surnames from different regions to see if we are from the same family. Y-line testing does not indicate anything about the contributions of the other ancestors in a family tree. In other words, you could be three-quarters Native American, with only the direct paternal line being European, and this test would tell you nothing at all about those other three Native lines. When testing the Y-chromosome, there are two types of tests, short tandem repeat (STR) and SNP markers. STR tests are best for recent ancestry while SNP tests tell about more ancient ancestry. c) Mitochondrial DNA (maternal line testing) markers:", + "Additional information about past breeding practices can be gleaned by quantifying the number of reproductive males and females in a population. This can be achieved by comparing levels of genetic diversity between sex chromosomes, autosomes and mtDNA [99]. In cattle, for example, gene flow from aurochs is evident in the autosomes but is absent in mtDNA [41]. This has been interpreted as a management strategy that may have involved allowing insemination of domesticated females by wild bulls [41,100]. In horses, a comparison of the levels of diversity of the Y chromosome and the autosomal chromosomes demonstrated that some cultures allowed fewer males to breed and instead selected specific stallion bloodlines [55]. This male-oriented breeding strategy was not practised by the Romans and only became increasingly prominent in the past 1,000 years as a result of the growing influence of Oriental stallions (Arabian, Persian and Turkmen) [101].", + "Dr Ring: What makes the maternal gene so peculiar compared to the paternal? Dr Cookson: If you look in the epidemiologic sense, many studies show that there is increased risk of allergic disease if the mother is affected. However, very few studies have actually 
set out to test that formally and most of them might suffer from some sort of selection bias because the mother is more likely to be aware of her symptoms and feel guilty, and so on. It is very difficult to explain. Is it genomic imprinting, where the gene is only active when transmitted through the mother? I do not think all of these genes would be imprinted, though it is possible. It also seems that there are effects of the maternal phenotype. The maternal phenotype, if the mother is affected or unaffected, determines the strength of the maternal effect. Again, if a gene was imprinted, you would not expect maternal phenotype to be important. So, I think that this has something to do with maternal/fetal interaction, either through the placenta or shortly after birth. There is the issue of immune conflict between mother and child. At the same time, the mother is trying to prime the infant's immune system.", + "Genetic and Genomic Discovery Using Family Studies. Ingrid B. Borecki, PhD; Michael A. Province, PhD. Genetic studies traditionally have been performed on sets of related individuals, that is, families. Mendel's early studies in sweet peas (Pisum sativum) on the inheritance patterns of discrete traits from parents with specific mating types to offspring have shed light on the basic mechanisms of inheritance, including the fundamental laws of segregation of discrete factors (genes) from parents to offspring and the cosegregation of genes that are closely located on a chromosome (linkage). The distribution of traits within families exhibited mathematical segregation ratios in offspring from known mating types. These expected segregation ratios have been used as an important discovery tool in the study of human diseases in pedigrees, providing evidence for a multitude of single-gene disorders. Furthermore, in some cases, trait cosegregation with genetic markers with known positions provides mapping information that enables localization and, ultimately, identification of the 
relevant causative gene.", + "In fact, this idea has been pursued before in the context of signatures of reproductive isolation and shown to reveal patterns consistent with epistatic gene interactions that arise in the shape of Dobzhansky-Muller incompatibilities [10,11]. In contrast to the mouse data, the available human genotypes were derived from outbred, ethnically distinct populations. In this case pairs of functionally interacting genes can be detected following a slightly different approach.", + "Fig. 3. Illustrations of the three CEU pedigrees (black) showing how genetic information from distant patrilineal relatives (arrow; red, patrilineal lines) can identify individuals. Filled squares represent sequenced individuals. To respect the privacy of these families, only abbreviated versions are presented. The sex of the CEU grandchildren was randomized. The numbers of grandchildren are not given.", + "Discussion. Kinship and genetic drift. The expanded family of BXDs is a well powered resource for both forward and reverse genetic analyses of genome-to-phenome linkage. As this family has grown, relations among individual strains have become complex, requiring the use of linear mixed models (Arends et al., 2010; Sul et al., 2016; Zhou and Stephens, 2014) or nonparametric equivalents such as mixed random forests (Stephan et al., 2015) that account for kinship, epoch, and other cofactors. The family has kinship at several levels.", + "When I was in high school, I remember often trying to match my friends to their parents at various school functions and being surprised at how easy this was. As human geneticists, in spite of the enormous advances being made in our field, we still cannot answer many of the everyday questions that we are asked, such as: \"Why does he look just like his mother? 
\"Max Perutz [1], in a recent editorial comment in the New Scientist entitled \"The Molecular Biology of the Future,\" suggested some questions, for, as he put it, \"an examination in some future century.\" Here are two of them: (1) \"The time has come\" the Walrus said, \"To talk of many things ... And why the sea is boiling hot And whether pigs have wings.\" Calculate the amount of genetic information this would require in megacricks.", + "Using genetic markers, the pattern of inheritance can be tracked through families. For example, by analyzing a marker linked to the eye color gene in several generations, it is possible to determine from which grandparents a child has inherited its eye color alleles. More importantly, finding a marker linked to a disease can lead to location of the faulty gene causing the disease. Finding the gene is very valuable in the search for the cure. The distance between two loci can be expressed either as physical or genetic distance.", + "Another way of avoiding stratification is to use family-based samples. This approach has several theoretical advantages: as well as being immune to stratification [114], these samples can be used to determine whether an allele has different effects on disease when it is inherited maternally or paternally [115], and DISCORDANT SIB designs [116][117][118] can control for the effects of shared environment. Furthermore, more complex family-based designs are possible [119] that might allow combined association and linkage analysis [120], and family-based association tests have also been developed for quantitative traits [94][95][96][97][98]. However, pure sibship-based association studies are underpowered relative to case-control studies [107,116,117], and the requirement for living parents might introduce an age-of-onset bias towards younger patients for diseases that usually arise late in life. Furthermore, family-based samples are often much more difficult to collect, particularly if larger pedigrees are 
sought. Finally, the most commonly used family-based design, the TRANSMISSION DISEQUILIBRIUM TEST (TDT; see REF. 114) is susceptible to technical artefacts (see below).", + "There are also a number of companies that utilize ancestry informative markers (AIMs) and claim that they can provide accurate determinations of a person's ancestry. The problem with these services is their assumption that for all populations reliable genetic markers of high ancestry informative value exist. There is also a second assumption that the frequency of these markers has not changed through time. This may be true for persons of European descent, in areas that have not seen large population disruptions. However, it is doubtful that a reliable genetic marker panel can be produced for German or Lithuanian Jews, just as such a panel for Western or Central African regions that were impacted by the slave trade is less likely. To understand this sophistication requires training in evolutionary and population genetics. Unfortunately, many of the scientists working with these companies do not have adequate background in these disciplines. In general, American universities are not providing the majority of biology students training in these disciplines. Marocco (2000) reported that only 46 percent of the PhD-granting public universities and 15 percent of the PhD-granting private universities required evolution as a core course. Genetics is widely required as a core at the undergraduate level, but the topics of population and quantitative genetics are at the back of the major texts and the genetics courses are usually taught by molecular geneticists. Neither is evolution well covered in anthropology texts (White et al. 2009. 
) White and colleagues' (2009) study showed that these texts did not give a single accurate definition when the topic was present. Additionally, the definitions often changed when books were written for cultural versus physical anthropology and often changed within the same text. This means that even at the undergraduate level, the tools required to critically approach molecular reductionist thinking are not widely provided to students. Graduate curricula tend to be narrower than undergraduate training. Thus, the vast majority of scientists who go into human genetics, bioinformatics, computational biology, and genomics are not well prepared to address the complex interactions that account for the phenotypes we observe in modern societies.", + "To scrutinize the polygenic networks underlying complex diseases, however, mouse resources that are optimized to study the actions of isolated genetic loci on a fixed background will be insufficient on their own. For example, predisposition to the metabolic syndrome is inherited in a non-Mendelian fashion stressing genetic heterogeneity and multigenetic pathogenesis (Nandi et al., 2004). With the reawakening as to the extraordinary genetic resources and phenotypic diversity archived in extant inbred strains, however, a foundation is in place for tracking down these complex traits and quantitative trait loci (QTL).", + "Otherwise, tens of thousands of markers will appear significant in the genome-wide association studies using up to one million genetic markers. Approaches to control for stratification include use of self-report of ancestry or genetically derived principal components in the analysis. For studies using inbred mouse lines, a cladogram, which is a hierarchical grouping based on phylogenetic analysis of strain relatedness, can be created to subdivide inbred strains into more genetically homogenous subgroups.", + "These haplotype mosaics form the basis of genetic analysis and data integration in the CC and DO. 
In contrast to natural or commercial outbred populations, the founder haplotypes of these multiparental populations (and similar populations in other model organisms) are known and well characterized by sequencing. This presents a tremendous advantage in the search for causal variants of complex traits: provided a genomic segment in an experimental animal can be assigned to a founder haplotype using a few tagging markers, the remaining known variants can be imputed with essentially complete certainty.", + "Although bilateral descent is the norm in Western societies, it is not universal and there is variation with cultural practices around lineage. In certain societies, individuals place greater importance on (and have greater knowledge about) one side of the family than another (unilineal descent). Thus, individuals in patrilineal groups trace relationships through males only so that your father's brother's children are members of your family, but not your father's sisters (Kottak, 2007). They are members of their husband's group or family. Efforts to create a family pedigree may be hampered if the participant is not familiar with her mother's relatives, but her mother's brother's children (her cousins) may be able to supplement her overall family history. Knowledge about the cultural system of unilineal descent avoids assuming the universality of bilateral descent. Cultural beliefs such as these also have implications in the conduct of genetic research in terms of confidentiality and autonomy (Benkendorf et al., 1997; Wertz, 1997). One cannot assume that the named proband is in a position to speak for the extended family in agreeing to participate in any genetic research (DudokdeWit et al., 1997).", + "In particular in polygynous species, a female's offspring may have different fathers and are thus more closely related through the maternal than the paternal line. 
Therefore, any fitness cost to mothers, such as increased provisioning and care, affect maternally derived genes more strongly than paternally derived genes, leading to the silencing of the maternal copy (i.e. paternal expression) of genes that increase resource transfer. 5. Coadaptation between offspring and maternal traits. The genetics of the co-evolution of parental and offspring traits has been investigated using quantitative genetics models and in several empirical studies (Agrawal et al.", + "Because of the small contribution, through the sperm, of the paternal transcriptome to the fertilized zygote, and because of the stronger maternal contribution to child rearing in most model organisms, parental effects are typically thought of as synonymous with maternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading to differences in the birth weight of mice depending on the genotype of the mother (Cowley et al., 1989; Wolf et al., 2011)." + ], + [ + "Genetic mapping in mouse strains enhances the power of detecting modifier genes and identifying complex genetic interactions. Genomewide quantitative trait locus (QTL) analysis, as described in more detail below, represents a promising approach to detect genetic variants that are associated with specific phenotypes and interact with each other. In experimental crosses of two (inbred) strains the first generation (F1) of offspring is genetically heterozygous but equal. Then in the next generation (F2) the strain-specific genetic information is distributed across the genomes of their progeny and each offspring is genetically unique.", + "Second, and perhaps more important, is the difference in the size and types of the genetic reference populations. In our previous study, we mapped the QTL with 36 F2 mice that were genotyped at 82 markers. 
In the current study, by comparison, we were able to map QTLs after examining 342 mice from 55 strains that were genotyped at approximately 4000 markers.", + "This contrast can be exploited to identify subregions that underlie the trans-QTLs [67]. SNPs were counted for all four pairs of parental haplotypes (B vs D, B vs H, B vs C, and L vs S) and SNP profiles for the four crosses were compared (figure 6). Qrr1 is a highly polymorphic [PLoS Genetics | www.plosgenetics.org | November 2008 | Volume 4 | Issue 11 | e1000260 | QTL Hotspot on Mouse Distal Chromosome 1] Figure 5. QTL for aminoacyl-tRNA synthetases in distal Qrr1.", + "The traditional approach to QTL mapping is to use two strains that differ maximally in the phenotype as parental strains for genetic crosses, with the following caveats. QTL analysis based on a single cross will most likely reflect only a small portion of the net genetic variation, and QTL detection will be limited to regions where the two progenitor strains have functional polymorphisms. Data from multiple crosses, or from an HS, will overcome this limitation and can also be used to reduce QTL intervals [5,30].", + "These candidate genes are then sequenced in the two parental inbred strains looking for sequence differences in coding or regulatory regions. After fine mapping the QTL interval and shortening the list of plausible candidate polymorphisms, the major challenge remains: proving definitively which nucleotide polymorphism underlies the QTL. The most direct proof would be replacing one strain's allele with another strain's allele (creating a FIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). 
On the right, the parental, F1 hybrid, and intercross (F2) mouse generations are depicted.", + "Furthermore, splicing QTLs (sQTLs) rather than eQTLs could comprise the molecular mechanism linking DNA variants with YFP [53]; thus, sQTL analysis could uncover genes that would not normally be detected at the level of differential gene expression (DGE), [53] and thus, a differentially [Molecular-Genetic and Statistical Techniques for Behavioral and Neural Research, pp. 181-182] Figure 8.5 Schematic for immediate, rapid fine mapping in select F2 recombinants of the RCC-F2 cross. Top panel: Genome-wide significant QTL (green trace; red dashed line significance threshold; blue vertical lines Bayes credible interval).", + "Interval-specific haplotype analysis. Approximately 97% of the genetic variation between inbred mouse strains is ancestral [22], so regions of identity by descent (IBD) between two strains used to detect a QTL are highly unlikely to contain the causal genetic polymorphism underlying the QTL [28]. For example, a cross between C57BL/6J and A/J mice detected a blood pressure QTL on Chr 1 [7].", + "Interval-specific haplotype analysis. Approximately 97% of the genetic variation between inbred mouse strains is ancestral [22], so regions of identity by descent (IBD) between two strains used to detect a QTL are highly unlikely to contain the causal genetic polymorphism underlying the QTL [28]. For example, a cross between C57BL/6J and A/J mice detected a blood pressure QTL on Chr 1 [7].", + "At present, the BXD panel is composed of 80 different strains that all have been fully genotyped [26]. Variation in any quantifiable trait can be associated with the segregation of parental alleles, and linkage genetics can map this variation to quantitative trait loci (QTLs), thereby identifying the genomic region(s) affecting that trait. An overview of the QTL mapping approach is depicted in Figure 2. 
Classical QTL analysis has permitted the identification of loci that are associated with variation in HSC traits.", + "In general, linking genetic variation with trait variation identifies QTL and a significant linkage of phenotype and genotype suggests that the DNA status helps to determine trait expression. As stated above, mouse QTL studies provide distinct advantages over human studies in the examination of genetic causes of a quantitative trait (e.g. alcoholism), even in the absence of specific hypotheses regarding its aetiology or candidate genes. The progenitor mouse strains should have sufficient variation for the traits of interest and they should be genetically diverse enough to enable genetic mapping (BENNETT et al. 2006; FLINT 2003; GRISEL 2000). The sample size required for the identification of QTL depends largely on the effect size that a QTL contributes to phenotypes of interest. Inference about QTL can be made if one or more genetic markers are over- or underrepresented in the analysed individuals. Genotyping is often done by means of microsatellite markers, which contain mono-, di-, tri-, or tetranucleotide tandem repeats flanked by specific sequences (Figure 4a). This comparison gives information about the reliability of the observed genotype information: The more the marker locations differ between the two maps (which signifies variation in marker positions), the higher the possibility of genotyping errors. QTL mapping was done in several stages to identify loci acting individually and QTL that interacted, either additively or epistatically. To determine individually-acting QTL, a single QTL genome scan was conducted with the function scanone.", + "Importantly, whereas these studies required substantial labor, time, and resources, X-QTL is a quick and easy approach to achieve a comparable level of genetic dissection. The levels of complexity observed here (e.g. 
14 loci explaining 70% of the genetic variance for 4-NQO resistance) are still dramatically lower than those seen for some human traits in GWAS (e.g. 40 loci explaining 5% of the variance for height [2,5]). One obvious explanation is the difference in experimental designs (line crosses vs. population association studies), but differences in genetic architectures among species and traits may also contribute.", + "The method uses two pieces of information: mapping data from crosses that involve more than two inbred strains and sequence variants in the progenitor strains within the interval containing a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that any sequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals except at a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.", + "The method uses two pieces of information: mapping data from crosses that involve more than two inbred strains and sequence variants in the progenitor strains within the interval containing a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that any sequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals except at a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.", + "Genotyping all the individual progeny for markers that show allelic variation between the parental strains (either single nucleotide polymorphisms or simple sequence repeats) will allow the detection of associations between trait values and marker genotype, and in this way demonstrate to which set of markers a QTL is linked. 
To reduce the genotyping effort, selective genotypingof the individuals at the extremes of the phenotypic spectrum can be performed (20,23). Although these three approaches are in general considered to be the best to detect andmap QTL, they have several disadvantages for quantitative traits involving HSC.", + "So, how do you go about planning and performing a QTL study, and howdo you identify the responsible gene within a QTL that you have identified? Generally, one starts by performing a strain survey to find two parental inbredstrains that have a markedly different trait. One can now look up many differenttraits of inbred mice online at the Mouse Phenome Database (http://phenome. jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). However, the trait you maywant to study may not be present in wild type mice, so you may want to crossa mutant (or genetically engineered) strain onto several inbred strains.QTL Theory and PlanningThe theory behind the most basic form of QTL mapping is based upon intercrossing two inbred strains. The mouse genome consists of 19 pairs of autosomes (non sex-determining chromosome) and the X and Y chromosomes. Inthe example shown in Fig. 18.1, we are intercrossing stain A (shown with ablack chromosome pair) with strain B (shown with a white chromosome pair). The initial F1 (filial generation 1) mice are true hybrids, with each individualFrom: Molecular Biomethods Handbook, 2nd Edition.", + "These candidate genes are then sequenced in the two parental inbredstrains looking for sequence dierences in coding or regulatory regions. After ne mapping the QTL interval and shortening the list of plausiblecandidate polymorphisms, the major challenge remains \u0001 proving denitivelywhich nucleotide polymorphism underlies the QTL. The most direct proofwould be replacing one strains allele with another strains allele (creating aFIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). 
On the right, the parental, F1 hybrid, and intercross (F2) mousegenerations are depicted.", + "QTL mapping studies thenseek to detect the polymorphisms underlying the complex traits of interest byscanning for alleles that co-vary withthe traits. Similar experiments also can be conducted with special derivatives of inbredstrains known as recombinant inbred(RI) mice. These animals are derivedby cross-breeding two or more distinctparental strains (which often divergewidely for the trait of interest), followedby inbreeding of the offspring for severalgenerations (Bailey 1971). Given thecorrect breeding strategy, this method1This is an issue faced by GWASs researchers when classifyingsamples as cases or controls." + ], + [ + "The project also provides online analysis tools to allowidentification of correlations within its data set. GeneNetwork (http://www.genenetwork.org), encompassing WebQTL, is a database ofgenotypes and complex phenotypes ranging from gene expression to behaviour in standardinbred strains, and six panels of mouse recombinant inbred strains including the two largestsets (BXD and LXS) of approximately 80 strains each. Rat and Arabidopsis populations arealso represented. Approximately 1500 phenotypes spanning the 25 year history of thesestrains are incorporated in this public resource, many of which were retrieved from theliterature.", + "BioinformaticsAll of the genetic analyses were carried out in GeneNetwork, whichis an open source bioinformatics resource for systems genetics thatexists as both a repository for genetic, genomic and phenotypicdata together with a suite of statistical programs for data analysis that includes mapping and evaluating QTLs, examining phenotype/genotype correlations and building interaction networks. QTL mappingThe QTL mapping module of GeneNetwork was used to identifyQTLs for hippocampal morphometry and radial maze trait data. 
Thismodule enables interval mapping, composite interval mapping anda pairwise scan option to identify epistatic effects.", + "Thereare four options for QTL mapping on the GeneNetwork website: intervalmapping, marker regression analysis, composite interval mapping, and pairscan analysis. In this case, interval mapping was used to compute linkagemaps for the entire genome. The log of odds (LOD) score was used toassert that a causal relation exists between a chromosomal location and aphenotypic variant, such as Gsto1 expression variation.", + "Webqtl is an online database [110] of linked datasets, including genotype and expressiondata, covering multiple species including mouse, macaque monkey, rat, drosophila,arabidopsis, plants and humans [60]. While this tool cannot be used to calculate eQTLs, itcan be used to find and visualize eQTLs in different species, strains and tissues. It canperform single- and multiple-interval QTL mapping of up to 100 selected traits. Users canalso upload their own trait data for populations included in the database. It can also calculateand display trait-correlation matrices and network graphs (also for up to 100 traits).", + "Once the data is normalized appropriately (in our case, no normalization was required), the QTLcan be mapped. To do this, select the mapping tools drop down window (Figure 6). There arethree methods to choose from, GEMMA, Haley-Knott Regression, and R/qtl (Figure 6). Genomewide Efficient Mixed Model Analysis (GEMMA; github.com/genetics-statistics/GEMMA; (Zhouand Stephens, 2012) is a multivariate linear mixed model mapping tool that is used to mapphenotypes with SNPs with a correction for kinship or any other covariate of interest. 
Thisability to account for covariates is highly useful, but also this increases the time taken forcomputations.", + "WebQTL is the primary module in the GeneNetwork online resource (www.genenetwork.org),and provides a powerful environment to analyzetraits controlled by genetic variants (Chesler et al. 2004; Wang et al. 2003). It includes data from many485Fig. 2. Complexity of eQTL data. The graph shows a threedimensional schematic view of the high dimensionality ofthe eQTL data set generated from the BXH/HXB RI strainpanel (Hubner et al 2005; unpublished).", + "QTL MAPPING AND QTG DISCOVERY IN THE RCCA variety of statistical methods and tools have been developed for QTL mapping andimplemented in free software for public use. These methods are well suited for simplebackcross and F2 RCC populations. R/qtl9,39 was developed for identication ofQTLs and higher order modeling. Another Web-based tool, GeneNetwork orWebQTL (GeneNetwork.org),40 was developed for QTL mapping and to exploreassociations between variants, molecular traits (e.g. , gene expression), and higher orderphenotypes (e.g. , behavior) and facilitate QTG identication.", + "This enables gene expressioncorrelation and interval mapping, candidate gene searches and multitrait analyses. Each exported dataset was subject to an interval mapping analysis,which uses GeneNetworks embedded MapManager software(Manly et al . 2001) to perform HaleyKnott regression. Empirical P values were derived using 1000 permutations using the incorporatedpermutation feature of WebQTL. The peak of each statisticallysignificant (P -value <0.05) or suggestive (P -value <0.63) (Lander& Kruglyak 1995) QTL was determined based on empirical P values (Doerge & Churchill 1996). A one-LOD drop-off was usedto determine the QTL confidence interval about each peak.", + "The peak linkage valueand position was databased in GeneNetwork and userscan rapidly retrieve and view these mapping results forany probe set. 
Any of the QTL maps can also be rapidlyregenerated using the same Haley-Knott methods, againusing functions imbedded in GeneNetwork. GeneNetwork also enable a search for epistatic interactions (pairscanning function) and composite interval mapping withcontrol for a single marker. Data quality controlWe used two simple but effective methods to confirmcorrect sample identification of all data entered intoGeneNetwork.", + "QTL analysisAll QTL mapping for phenotypes was performed using the WebQTL software module of the170GeneNetwork (www.genenetwork.org) [34]. Interval mapping to evaluate potential QTLs wascalculated from the likelihood ratio statistics (LRS) as the softwares default measurement ofthe association between differences in traits and differences in particular genotype markers. Another common measure score, the log of the odds (LOD) ratio, can be converted from theLRS (LRS/4.61). Suggestive and significant LRS values were determined by applying 1000175permutations.", + "Once the data is normalized appropriately (in our case, no normalization was required), the QTLcan be mapped. To do this, select the mapping tools drop down window (Figure 6). There arethree methods to choose from, GEMMA, Haley-Knott Regression, and R/qtl (Figure 6). Genomewide Efficient Mixed Model Analysis (GEMMA; github.com/genetics-statistics/GEMMA; (Zhouand Stephens, 2012) is a multivariate linear mixed model mapping tool that is used to mapphenotypes with SNPs with a correction for kinship or any other covariate of interest. Thisability to account for covariates is highly useful, but also this increases the time taken forcomputations.", + "Unlike interval-specific haplotype analysis, which is most useful for narrowing a QTL shared bymultiple crosses, genome-wide haplotype analysisrequires only phenotype information from many inbredstrains and can effectively narrow a QTL identified inonly one experimental cross [36]. 
After narrowing the QTL to an interval that is !5 Mbusing these bioinformatics techniques or classical experimental methods, strain-specific sequence and geneexpression comparisons are effective for focusing on afew strong candidate genes (Figure 7).", + "Unlike interval-specific haplotype analysis, which is most useful for narrowing a QTL shared bymultiple crosses, genome-wide haplotype analysisrequires only phenotype information from many inbredstrains and can effectively narrow a QTL identified inonly one experimental cross [36]. After narrowing the QTL to an interval that is !5 Mbusing these bioinformatics techniques or classical experimental methods, strain-specific sequence and geneexpression comparisons are effective for focusing on afew strong candidate genes (Figure 7).", + "We considered QTL intervals that achieved genome-widesignificance for one phenotype, and genome-wide suggestive forothers, as highest priority for candidate gene analysis. The January 2017 BXD genotype file was used4 . Updated linear mixed model mapping algorithms are nowavailable on GeneNetwork 25 (Sloan et al. , 2016), that account forkinship among strains. These new algorithms include GEMMA(Zhou and Stephens, 2012), pyLMM6 (Sul et al. , 2016), andR/qtl27 .", + "GeneNetwork and WebQTL are our groups first attempts to embrace thesenew opportunities (Wang et al. 2003) and to generatean appropriate research environment that combinesdata sets, statistical resources, and summaries offindingsa knowledgebase (www.genenetwork.org). Mapping traits will become far easier; cloning allelicvariants for molecular and cellular phenotypes willprogress from difficult to trivial as it already has formost cis-QTL with high LOD scores.", + "Next, we used GeneNetwork2, an online analysis tool and data repository containinglegacy SNP and transcriptome datasets to explore gene regulatory networks (Chesler et al. 2004; Mulligan et al. 2017). 
We conducted both eQTL and PheQTL-eQTL network analysis using several BXD RI gene expressiondatasets from multiple brain regions (datasets documented in Supplementary Information) and using theentirety of > 7,000 BXD Published Phenotypes deposited in GeneNetwork2 [BXDPublish; GN602].", + "Thereare four options for QTL mapping on the GeneNetwork website: intervalmapping, marker regression analysis, composite interval mapping, and pairscan analysis. In this case, interval mapping was used to compute linkagemaps for the entire genome. The log of odds (LOD) score was used toassert that a causal relation exists between a chromosomal location and aphenotypic variant, such as Gsto1 expression variation.", + "Genetic MappingIn this study we utilize GeneNetwork, a database containing phenotypes and genotypes,and also serves as an analysis engine for quantitative trait locus (QTL) mapping, geneticcorrelations, and phenome-wide association studies (PheWAS) (Sloan et al. , 2016; Mulligan etal. , 2017; Watson and Ashbrook, 2020). QTL analysis involves connecting phenotype data withgenotype data to examine genetic variation in traits controlled by multiple genes and theirinteraction with the environment (also called complex traits)(Lynch et al. , 1998; Myles andWayne, 2008; Goddard et al. , 2016).", + "Once the resulting record set of thequery is returned, it can be further restricted by selectingrelevant records based on attached annotations before forwarding it for further analysis. To map genetic loci associated with mRNA abundance ortrait phenotypes, any one of the three QTL mapping functions currently employed by GeneNetwork's WebQTLmodule can be used. These are 1. interval mapping, 2. single-marker regression, or 3. composite mapping [29,30].", + "genenetwork.org/) a set of 3795 markers. Linkage is reported withgenome-wide significance levels based on 2000 permutation tests. 
Two types of QTL mapping analysessimple mapping using the HaleyKnott regression equation, and composite interval mappingwereutilized in this study. Simple interval mapping was performed toillustrate the significance of any QTLs that regulate the TID. As asecondary analysis, composite interval mapping which controlled forthe influence of Tyrp1 was also performed with the goal of identifyingany secondary QTLs that may have been masked by the major QTL onChr 4." + ], + [ + "BioinformaticsAll of the genetic analyses were carried out in GeneNetwork, whichis an open source bioinformatics resource for systems genetics thatexists as both a repository for genetic, genomic and phenotypicdata together with a suite of statistical programs for data analysis that includes mapping and evaluating QTLs, examining phenotype/genotype correlations and building interaction networks. QTL mappingThe QTL mapping module of GeneNetwork was used to identifyQTLs for hippocampal morphometry and radial maze trait data. Thismodule enables interval mapping, composite interval mapping anda pairwise scan option to identify epistatic effects.", + "Below we detail several software tools thathave been used for eQTL analysis and provide a short description of the advantages andlimitations of each package, with a focus on the ease of use for the wider scientificcommunity (Table 1). Several of the packages were not specifically developed for eQTLanalysis, and thus our speed comparisons should be viewed in that context. However, suchpackages may already be used in individual laboratories, and therefore we include those that,in our opinion, may be scaled for eQTL analysis on data from genome-wide array platforms.", + "Other Useful QTL Mapping SoftwaresQTL Cartographer is a highly capable mapping programone that may beparticularly suitable for those with a background in UNIX and who are comfortable with advanced statistical analysis. 
The program is available for threeoperating systems at http://statgen.ncsu.edu/qtlcart/cartographer.html. Windows QTL Cartographer at http://statgen.ncsu.edu/qtlcart/ WQTLCart.htmis a command-line sibling and a relatively more user friendly version of QTLCartographer. This program includes a powerful graphic tool for presenting mapping results and can import and export data in a variety of formats and provide agraphical interface to QTL Cartographers features.", + "Built in to this resource are tools such as QTL Analyst to identify candidate genes and polymorphisms; literature correlation analysis by the latent semanticindexing tool, Semantic Gene Organizer (Homayouni et al. , 2005); and integratedanalysis with many large SNP sets, Gene Ontology overrepresentation and pathwaymatching (Zhang et al. , 2004), and annotation resources. By coupling flexible analytic tools with both molecular endophenotypes and higher-order phenotypic data,users can pursue a wealth of integrative systems genetics queries.", + "These relative phenotypevalues are then analyzed in the context of the mouse genotypeusing WebQTL tools available on www.genenetwork.com,which provides the QTL mapping for phenotypes of interest. The bioinformatics tools allow us to inspect the single nucleotide polymorphism density within the mapped loci and toexamine the genes within the loci in order to narrow down thenumber of candidate genes that should be further interrogated. The tools also allow us to identify interactive loci, throughwhich we can discover interactive pathways modulating themeasured phenotype.", + "ReviewTRENDS in Genetics Vol.21 No.12 December 2005Bioinformatics toolbox for narrowingrodent quantitative trait lociKeith DiPetrillo, Xiaosong Wang, Ioannis M. Stylianou and Beverly PaigenThe Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, USAQuantitative trait locus (QTL) analysis is a powerfulmethod for localizing disease genes, but identifying thecausal gene remains difficult. 
Rodent models of diseasefacilitate QTL gene identification, and causal genesunderlying rodent QTL are often associated with thecorresponding human diseases.Recently developedbioinformatics methods, including comparativegenomics, combined cross analysis, interval-specificand genome-wide haplotype analysis, followed bysequence and expression analysis, each facilitated bypublic databases, provide new tools for narrowingrodent QTLs. Here we discuss each tool, illustrate itsapplication and generate a bioinformatics strategy fornarrowing QTLs. Combining these bioinformatics toolswith classical experimental methods should accelerateQTL gene identification. IntroductionQuantitative trait locus (QTL) analysis is a method tolocalize chromosomal regions harboring genetic variantsthat affect a continuously distributed, polygenic phenotype(including many common diseases) [1].Summary of bioinformatics tools for dissecting rodent QTLsBioinformatics toolComparative genomicsCombined cross analysisInterval-specific haplotypeanalysisGenome-wide haplotypeanalysisSequence comparisonExpression comparisonSummaryIdentifies regions of chromosomal synteny in QTLs that are concordant acrossspeciesRecodes genotype information from multiple crosses detecting a shared QTL intoone susceptibility and one resistance genotype to combine the crosses in a singleQTL analysisDetects regions of IBD within QTLs shared in multiple crossesAssociates conserved haplotype patterns across the genome with a phenotype ininbred strainsSearches strain-specific sequence databases for regulatory or coding polymorphisms within the QTL intervalSearches EST or microarray databases to identify genes expressed in an organ ofinterest or genes exhibiting differential expression between the strains of interestthe homologous regions in humans, which complicatesthis approach.", + "Recently developedbioinformatics methods, including comparativegenomics, combined cross analysis, interval-specificand genome-wide haplotype 
analysis, followed bysequence and expression analysis, each facilitated bypublic databases, provide new tools for narrowingrodent QTLs. Here we discuss each tool, illustrate itsapplication and generate a bioinformatics strategy fornarrowing QTLs. Combining these bioinformatics toolswith classical experimental methods should accelerateQTL gene identification. IntroductionQuantitative trait locus (QTL) analysis is a method tolocalize chromosomal regions harboring genetic variantsthat affect a continuously distributed, polygenic phenotype(including many common diseases) [1].ReviewTRENDS in Genetics Vol.21 No.12 December 2005Bioinformatics toolbox for narrowingrodent quantitative trait lociKeith DiPetrillo, Xiaosong Wang, Ioannis M. Stylianou and Beverly PaigenThe Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, USAQuantitative trait locus (QTL) analysis is a powerfulmethod for localizing disease genes, but identifying thecausal gene remains difficult. Rodent models of diseasefacilitate QTL gene identification, and causal genesunderlying rodent QTL are often associated with thecorresponding human diseases.Summary of bioinformatics tools for dissecting rodent QTLsBioinformatics toolComparative genomicsCombined cross analysisInterval-specific haplotypeanalysisGenome-wide haplotypeanalysisSequence comparisonExpression comparisonSummaryIdentifies regions of chromosomal synteny in QTLs that are concordant acrossspeciesRecodes genotype information from multiple crosses detecting a shared QTL intoone susceptibility and one resistance genotype to combine the crosses in a singleQTL analysisDetects regions of IBD within QTLs shared in multiple crossesAssociates conserved haplotype patterns across the genome with a phenotype ininbred strainsSearches strain-specific sequence databases for regulatory or coding polymorphisms within the QTL intervalSearches EST or microarray databases to identify genes expressed in an organ ofinterest or genes exhibiting differential expression 
between the strains of interestthe homologous regions in humans, which complicatesthis approach.", + "1 The234IntroductionModern high-throughput technologies generate large amounts of genomic, transcriptomic, proteomic and metabolomic data. However, existing open source web-based tools for QTL analysis, such as webQTL[358] and QTLNetwork [377], are not easily extendable to dierent settings and computationally scalable for whole genome analyses. xQTLworkbench makes it easy to analyse large and complex datasets usingstate-of-the-art QTL mapping tools and to apply these methods to millions of phenotypes using parallelized Big Data solutions [342].", + "Software developed towards facilitating mining ofgenetic expression and variant associations includeeQTL Explorer, eQTL Viewer, FastMap and Lirnet. Bioinformatics concepts relating to eQTL have beenreviewed in [116]. eQTL Explorer (http://web. bioinformatics.ic.ac.uk/eqtlexplorer/) [117] as anaddition to resources provided by previous softwareslike WebQTL [118] and QTL Express [119], enablesintegrated visualization using a Java graphicalinterfaces; extracts eQTL results from externalsources (multiple microarray experiments) andpresents them such that they can be compared amongeach other, and with the pQTL (protein expression)mapped to the genome. eQTL Viewer (http://statgen.", + "These relative phenotypevalues are then analyzed in the context of the mouse genotypeusing WebQTL tools available on www.genenetwork.com,which provides the QTL mapping for phenotypes of interest. The bioinformatics tools allow us to inspect the single nucleotide polymorphism density within the mapped loci and toexamine the genes within the loci in order to narrow down thenumber of candidate genes that should be further interrogated. 
The tools also allow us to identify interactive loci, throughwhich we can discover interactive pathways modulating themeasured phenotype.", + "Author ManuscriptPrevious studies have used bioinformatics analyses in conjunction with a specific set ofcriteria to narrow down the set of genes into those most likely to underlie the differentialresponse (Baker et al. , 2017, Cook et al. , 2015). In the present study, genes within thesignificant QTLs were identified using the online tools available at GeneNetwork.org. Thegene lists include expressed sequence tags and Riken clones.", + "Built in to this resource are tools such as QTL Analyst to identify candidate genes and polymorphisms; literature correlation analysis by the latent semanticindexing tool, Semantic Gene Organizer (Homayouni et al. , 2005); and integratedanalysis with many large SNP sets, Gene Ontology overrepresentation and pathwaymatching (Zhang et al. , 2004), and annotation resources. By coupling flexible analytic tools with both molecular endophenotypes and higher-order phenotypic data,users can pursue a wealth of integrative systems genetics queries.", + "Another database, WebQTL, provides multiple tools that, when used incombination, provide valuable insight into candidate gene lists (11). WebQTL isan online database with built in statistical tools that take advantage of the isogenicnature of recombinant inbred (RI) animals. It combines a large database ofcomplex traits collected using RI animals with built in software to perform QTLanalysis and produce correlations of traits (11). WebQTL has genotypicinformation on five different RI lines including those derived from the C57/B6and DBA inbred strains (BxDs) (11).", + "Tools for QTL analysis have been developed and released for researchers such asR/qtl, QTL cartographer, MapQTL, and WebQTL. Recently, Wang et al. 
(2012)developed a free software for QTL mapping called QTL IciMapping which constructsgenetic linkage maps and QTL analysis by simple interval mapping and inclusivecomposite interval mapping. QTL IciMapping is available for segregating and inbred9populations and nested association mapping populations. Unlike R/qtl, QTL IciMappingis not available for Unix/Linux. Running QTL IciMapping using a basic computer for thenumbers of SNPs identified from genotype-by-sequencing (GBS) is time-consuming.", + "Built in to this resource are tools such as QTL Analyst to identify candidate genes and polymorphisms; literature correlation analysis by the latent semanticindexing tool, Semantic Gene Organizer (Homayouni et al. , 2005); and integratedanalysis with many large SNP sets, Gene Ontology overrepresentation and pathwaymatching (Zhang et al. , 2004), and annotation resources. By coupling flexible analytic tools with both molecular endophenotypes and higher-order phenotypic data,users can pursue a wealth of integrative systems genetics queries.", + "Built in to this resource are tools such as QTL Analyst to identify candidate genes and polymorphisms; literature correlation analysis by the latent semanticindexing tool, Semantic Gene Organizer (Homayouni et al. , 2005); and integratedanalysis with many large SNP sets, Gene Ontology overrepresentation and pathwaymatching (Zhang et al. , 2004), and annotation resources. By coupling flexible analytic tools with both molecular endophenotypes and higher-order phenotypic data,users can pursue a wealth of integrative systems genetics queries." + ] + ], + "task_id": [ + "477FC54178046FE98BF97FAAC5FE167F", + "6B80ECC5F657EB7CBDE69D411A30D3EA", + "2DE25ABD7E487B80D0C489319640EACC", + "6498ED71891B79908B2E383D9AA5BAC5", + "B4BB83EB5D5C5C042E07173119046A13" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_4.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_4.json new file mode 100644 index 0000000..171450e --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_4.json @@ -0,0 +1,104 @@ +{ + "question": [ + "what are the statistical approaches for qtls analysis?", + "Create a how-to guide for GWAS analysis?", + " Create a how-to guide for genetic sequencing", + " Create a how-to guide for genetic sequencing.", + " What is the significance of the length of telomeres?" + ], + "answer": [ + "The statistical approaches for QTLs (Quantitative Trait Loci) analysis mentioned in the text include regression analysis, permutation tests, Pearson's correlation, and analysis of variance (ANOVA). QTL mapping analysis was performed using the WebQTL module of GeneNetwork. The GEMMA method was also used for QTL mapping on all chromosomes. Additionally, quantitative trait association for SNPs was performed using a series of ANOVA tests.", + "1. Carefully select the populations for the study, ensuring a large number of cases.\n2. Employ centralized SNP genotyping, data coordination, and control centers for quality control checks and standardized annotation.\n3. Conduct SNP-level association tests using methods like the likelihood ratio test (LRT) to obtain SNP level summary statistics.\n4. Perform a gene-level GWAS on the summary statistics using a hierarchically structured prior that incorporates the SNP-gene hierarchical structure.\n5. Use methods like meta-analysis to combine the results of multiple surveys and replication studies on promising variants.\n6. Incorporate existing information about the SNPs into the analysis, such as prior information about linkage or association evidence.\n7. Use tools like ePheWAS for applications in human cohorts.\n8. 
Share GWAS results to enable further understanding and analyses by other researchers.\n9. Deposit data in a public repository for wider scientific community access.\n10. Follow up on SNPs that merit further replication analysis.\n11. Use post-GWAS tools to make biological sense of the statistical genetic associations.\n12. Finally, report summarization and visualization of the GWAS results.", + "1. Choose the appropriate sequencing method: Depending on the specific requirements of your project, you may choose from various sequencing methods such as whole-genome sequencing, exome sequencing, or targeted sequencing.\n\n2. Prepare the sample: Extract the DNA or RNA from your sample. The quality and quantity of the extracted genetic material are crucial for successful sequencing.\n\n3. Library preparation: Prepare a sequencing library. This involves fragmenting the DNA or RNA into smaller pieces, then attaching specific adapters to the ends of these fragments.\n\n4. Sequencing: Use a sequencing platform to read the genetic information. Platforms like Illumina generate massive amounts of short reads.\n\n5. Data analysis: Analyze the sequencing data. This involves aligning the reads to a reference genome, identifying variants, and interpreting the results. Bioinformatics tools and databases such as UCSC Genome Browser, Ensembl, GenBank, MUSCLE, MAFFT, and PRANK can be used for these steps.\n\n6. Validation: Validate the findings using other techniques such as Sanger sequencing or qPCR.\n\n7. Report: Compile the results into a comprehensive report, including methods, findings, and interpretations.\n\nRemember, each step requires careful planning and execution, and the specific details may vary depending on the sequencing method and the research question.", + "1. Start with an initial linkage or association study to identify potential genes of interest.\n2. Use bioinformatics tools to analyze the genomic sequence, which is a common thread through every stage.\n3. 
Establish true orthology between genes in different species, especially if a gene is identified to play a role in a disease model. If two genes are truly orthologous, their evolution closely follows patterns of speciation.\n4. Apply the approaches to sequence data from any collection of organisms, with an emphasis on human genetics.\n5. Present an overview of genome structure and content to provide a context for subsequent discussions.\n6. Use DNA sequencing and other genome technologies to gather information about the mutations and functional abnormalities found in multiple samples from many major types of diseases.\n7. Analyze the sequence data using advanced technologies and bioinformatics.\n8. Finally, interpret the results of the genetic experiment, which can be a challenge on any level.", + "The length of telomeres is significant as it is associated with aging, age-related diseases, and mortality. Shorter telomeres are generally considered a sign of advanced age and have been linked to higher risks of heart disease, infection-related death, and other diseases. Longer telomeres, on the other hand, are associated with protection from age-related diseases, better cognitive function, and healthier aging. Telomere length can also potentially influence life expectancy, with some studies suggesting that longer telomeres decrease the risk of dying." + ], + "contexts": [ + [ + "For the observed difference in total variance of SS and LL, 0.93 and1.17 to be significant, the standard errors of the estimates need to be at most 0.06,a 2.5-fold reduction, implying a required sample size of 2.5 2.5 = 6.25 times oursample size or 5150 MZT. Plate 1 presents graphically the individual estimates of the items ai and bi foreach item i (using the model which includes known PLE). 
The x-axis represents the normally distributed trait, liability to depression and the y-axis is the probability of endorsement of an item.", + "False-positive rate and correct null distribution. To investigate the properties of the LT statistic under the null we computed the mean value in the simulations above when c = 0.0. As seen in Table 2 this has the correct value of 1.00. In addition it has the correct median, with λGC = 1.00, 5.00% of tests with P-value < 0.05 and 1.00% of tests with P-value < 0.01. We applied Kolmogorov-Smirnov test [31] to determine if the LT statistic differed significantly from a χ² (1 dof) distribution. The two-tailed K-S test of the full distribution was not significant (P-value = 0.34), nor was the K-S test restricted to the tail where the LT statistic had χ² > 3.84 (P-value = 0.21). In order to further investigate the extreme tail of the distribution we ran 10⁸ tests under the null and verified that 98 of the 10⁸ tests (10⁻⁶) had a P-value < 10⁻⁶. The LT statistic is a score test when the parameters are estimated correctly and will therefore have the correct null distribution. We investigated the properties of the LT statistic when the parameters were severely mis-estimated and found no inflation (see Text S1 in File S1). Furthermore, since the LT statistic is an ATT test between g and the posterior mean of the residual of the liability E(ε|z,t), it will not have an inflated false-positive rate provided that E(ε|z,t) does not have heavy tails or extreme heteroscedasticity [32]. E(ε|z,t) is the area under the tail of a normal distribution and will therefore not have these properties provided that the clinical covariate does not.", + "Statistical Analyses. Categorical variables and continuous variables were tested and compared for significant differences using t-tests. Quantitative trait association for SNPs was performed using a series of analysis of variance (ANOVA) tests. Hardy-Weinberg equilibrium (HWE) was assessed using the χ² goodness-of-fit 
statistic at a significance level of <0.05.To test all SNPs for genotypic association, adjusted odds ratios (ORs) and 95% confidence intervals (CI) were computed using Vassar statistic (http://vassarstats.net/) and SNPStats (http://bioinfo.iconcologia.net/SNPstats)softwares.All analyses were adjusted using linear regression.", + "statistical analysis after QC procedures (Supplementary data, Table S2).", + "Climate Correlations and Q X statistics for all six phenotypes in the global analysis.", + "3.2).Quantitative data are presented as mean standard error mean (SEM).Differences between two groups were analyzed by the student's t-test or Wilcoxon rank sum test.Differences among multiple groups were analyzed by the ANOVA.P values of multiple tests were adjusted by the Bonferroni method in the meta-analysis and were adjusted by the false discovery rate (FDR) in the differential Prior pairwise F ST values.D: Posterior pairwise F ST values.E: Blood eQTL analysis of rs3743121 (n = 30 in the TT group; n = 59 in the CT group; n = 62 in the CC group; expression levels were calculated as log 2 (fold of change, compared with TT group), *P < 0.05).Differential expression analysis of AQR in the skeletal muscle of rhesus macaques with metabolic syndrome (CTR: normal controls, MS: rhesus macaques with metabolic syndrome; n = 3 per group, data are presented as mean SEM, *P < 0.05; expression levels were calculated as fold of change compared with CTR group).", + "Data analysis and QTL mapping analysisDescriptive statistics like mean, standard deviation (SD), maximum and minimum trait value, coefficient of variation (CV%), analysis of variance (ANOVA) and heritability for each seed germination-related trait, and correlations among pairs of traits were calculated using the SPSS17.0software (http://www.spss.com).", + "Statistical AnalysisThe impact of delivery methods on knowledge and interpretation of their genomic risk and associations to categorical participant characteristics was 
assessed using χ² tests. These tests were used to analyze associations between categorical variables. Logistic regression was used to model the association between comprehension and the covariates of interest. Since only one participant was found to be of below adequate health literacy, this variable was not included in any further analysis. Change in agreement between patient responses to questions of risk was assessed using Kappa statistics and McNemar's test. Change in perceived risk after testing (1-week follow-up) was evaluated using Wilcoxon signed rank test and Spearman correlation coefficient. Two-sided p values are reported for all tests using a Type I error level of 0.05.", + "Empirical threshold significance values for pairwise interactions were determined using 100 permutations of the data set. Statistical analyses. Weighted least-squares analysis (WLS) was performed to analyze the TTD data of 14 inbred strains of 167/169 male and female mice (approximately 99% mortality). Preliminary investigations showed that the distribution of TTDs approximated normality, based on the Shapiro-Wilk test. Within-strain variances were found to be heterogeneous, based on Levene's test. The WLS methodology, combined with a normality assumption, has the optimum power to detect differences between means of strains and susceptibility groups, compared to nonparametric methods. The same methodology (WLS) was used to analyze organ CFU values, after a logₑ transformation was applied to approximate normality. Geometric means were calculated to estimate the medians of each strain and of sexes within each strain. Sex-specific analyses of TTDs and logₑ CFU determinations were also performed using WLS, with ANOVA models to compare strain means by sex. Comparisons of between-strain to within-strain variability were obtained by calculating the ratios of the variance between strains to the residual variance, or the average variance within strains. 
These were calculated for all TTD data and for each sex.", + "After applying the Shapiro-Wilk test to assess the normality of our data, an unpaired t-test was applied to analyse the TH measurement in different areas. Quantitative trait locus mapping. The QTL mapping was done with http://gn2.genenetwork.org/. The dataset containing dopamine measurements of dorsal striata of 32 CC strains was located with search terms (Species: Mouse (mm10); Group: CC Family; Type: Phenotypes; Dataset: CC Phenotypes) and navigated to Record CCF_10001 and CCF_10002. The QTL mapping was done with GEMMA on all chromosomes, MAF >= 0.05 with LOCO method.", + "Fig. 1. (a) Quantile-quantile (Q-Q) plot after meta-analysis for time to death. (b) Quantile-quantile (Q-Q) plot after meta-analysis for time to event.", + "Yet another approach 34 relies on combining univariate test statistics 35 either by forming an omnibus test or a linear combination of test statistics. With such approaches, however, only part of the information contained in the actual profiles is utilized when the data is reduced to univariate statistics and their covariance.", + "A correlation coefficient between the two measurements was highly significant (r = 0.99) indicating that technical error at this level of the analysis contributes little to case variation or strain variation. A paired t-test confirmed that the difference between the first and second estimations was not significant (t < 1, NS). http://www.biomedcentral.com/1471-2202/10/44 Analysis. Data were analyzed using standard ANOVA and multiple regression techniques (JMP, SAS Institute, Cary, NC). 
QTL analysis was performed using the WebQTL module of GeneNetwork (GN, http://www.genenetwork.org).", + "STATISTICAL ANALYSIS. Quantitative variables were expressed as median with first and third quartiles in brackets. Raw observation counts in taxa summary plots were normalized by calculating relative abundance. Qualitative variables were compared using the Fisher's exact test and quantitative variables using Mann-Whitney's test. A P value less than 0.05 was considered statistically significant. Statistical analyses were performed using SPSS software (version 18.0; IBM Corp., Armonk, NY).", + "It is readily apparent from Figure 1 that the values for norm reported here differ between the two analysis methods, resulting in different suggestive loci in the subsequent QTL analysis. Such a difference between the two calculation methods raises the question of whether one method is indeed more accurate, or whether the two methods capture different aspects of the mechanical response of the corneoscleral shell in response to changes in IOP. We further investigated this difference by performing Bland-Altman analysis (Supplementary Figure 7).", + "Scheme 2: Schematic illustration of the beam-walking test (author: Anna Zemanov; reproduced with permission). 4.3 Data analysis and statistics. Statistical tests include two-way ANOVA followed by Student-Newman-Keuls method, regression analysis (linear regression), permutation tests, and Pearson's correlation. To ensure that the desired traits are mapped, caution has to be taken because a trait may be tightly correlated with other traits. It is not desirable to unintentionally map genes that control motility or anxiety. 
Therefore there is a possibility of mapping wrong QTL orno QTL at all.", + "QUANTIFICATION AND STATISTICAL ANALYSISStatistical details, including sample size (n), what n represents, and statistical test used can be found in the figure legends.In most cases, sample size was great enough to assume normality based on the central limit theorem, and parametric statistical tests were used.Two-sided tests were employed for more conservative calculations of significance.The threshold for statistical significance was set at p < 0.05.Unless otherwise stated, data in bar graphs are expressed as mean standard deviation.A combination of MAT-LAB_R2019b, R (v3.6), and Prism GraphPad were used for statistical analysis.", + "Other statistical methodsWe used Mann-Whitney tests to estimate the significance of non-normally distributed continuous variables across two groups and analysis of variance to estimate the significance of normally distributed variables across three or more groups.To ensure that low variation in methylation is not fallaciously responsible for observed associations, we ran the top two highly significant T2D CpG sites against 1000 simulated phenotypes, incorporating the same transformations and using the same covariates as for other polygenic analyses.To quantify the agreement between microarray and pyrosequencing techniques, we used the Bland-Altman procedure and estimated the regression-based limits of agreement.These analyses were done using the Stata 12.0 software package.", + "Statistical analysis of T/C dataStatistical tests were performed using MATLAB 6.1 software (The MathWorks, Inc.).Comparative statistics was done with Wilcoxon rank sum test.Values of P !0.05 were considered statistically signifi cant.The gender-specifi c relationship between telomere size and age was estimated by linear regression, the correlations among variables were assessed by Pearson's correlation coeffi cient." 
+ ], + [ + "DiscussionWe proposed in this paper an integrative approach, iGWAS, that is able to analyze multiplatform genomic data under the family-based design.The model can be presented as a causal diagram (Fig. 1), which was set up based on the central dogma of molecular biology that DNA can be transcribed to mRNA expression and mRNA can then be translated to be protein to affect the phenotypic trait such as disease risk.The mediation diagram provides an intuitive illustration of our hypothesis.The iGWAS approach is integrative in different aspects.The model not only integrates different types of genomic data, i.e., SNP and gene expression data, but also incorporates different types of genetic/genomic association studies to delineate clinical outcome rather than perform a GWAS, an expression microarray study, and an eQTL study separately.Moreover, the iGWAS approach integrates biological knowledge into the computational model, as illustrated in the causal mediation diagram.", + "Further work is needed to determine how best to modify the tests proposed herein to deal with GWAS performed in structured populations.", + "GWAS and meta-analysisAt the discovery stage, genotyping data of the in-house Beijing dataset were obtained using the Affymetrix Genome-Wide Human SNP Array 5.0 (ThermoFisher, USA).Genome wide association study was performed using the PLINK v1.07 software (Purcell et al., 2007).", + "However, given that much ofthe variance is driven by societal, lifestyle and behavioralinfluences - and in addition there are also problemsrelated to DSM-based diagnostic criteria (Miller 2010) larger sample sizes for GWA analysis, inclusion of endophenotypes and CFG approaches are warranted. 
Havingalready massive transcriptomic, genetic and phenotypicdatasets available a Bayesian-like integration strategy canbe applied where multiple independent lines of geneticand genomic evidence is used, each by itself lackingsufficient discriminatory power, but combined leads tothe identification of high probability candidate genes orgene clusters.", + "Statistical analysisThe general statistical methods for linkage and GWA analyses are described in the Overview Methods [19].For diabetes-related quantitative traits we used additive GEE and FBAT models, testing associations between SNP genotypes and age-age 2 -sex-adjusted residual trait values.We kept 70,987 SNPs in the analyses that were on autosomes, had genotypic call rates 80%, HWE p 0.001 and MAF 10%.", + "In this manuscript, we summarize the strategies that we pursued to conduct the 100K genome-wide study, providing an overview for a series of 17 companion manuscripts (Table 1 of the Overview) describing associations with specific collections of traits [26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42].The primary purpose of this project was to generate hypotheses regarding genetic factors that may contribute to the wide spectrum of phenotypic variables collected in the FHS through a genome-wide approach.More specifically, we primarily hypothesized that common genetic variants contributing to phenotypic variation can be detected through a genome-wide association study (GWAS) and that genetic loci contributing to phenotypic variation can be detected through linkage.Each manuscript also examines whether the 100K analyses replicated previously reported associations with consistent evidence from the literature for some specific traits.The main purpose of this series of publications is to describe the association results made available for investigators and to direct readers to their free availability in the database of Genotype and Phenotype (dbGaP) public repository 
http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007 at the National Center for Biotechnology Information (NCBI), where these comprehensive results are posted and may be browsed in the context of multiple genomic tracks includ-ing Entrez Gene, RefSeq, dbSNP, genetic markers, and OMIM.The deposition of these data in a public repository is consistent with the long tradition of publishing preliminary results from the FHS to benefit the wider scientific community.", + "NIH-PA Author ManuscriptNIH-PA Author ManuscriptNat Rev Genet. Author manuscript; available in PMC 2013 November 01. Flint and EskinPage 21NIH-PA Author ManuscriptNIH-PA Author ManuscriptFigure 2. Overview of mouse GWASsNIH-PA Author ManuscriptMouse genome-wide association studies (GWASs) follow a common general approach. a |Mice in the study population are phenotyped for the traits of interest. Deciding which miceand their corresponding genetic structure to include in the study population is a key designconsideration in a mouse GWAS.", + "Prioritizing GWAS results: a review of statisticalmethods and recommendations for their application. Am J Hum Genet 2010;86:6e22. [76] Leiserson MDM, Eldridge JV, Ramachandran S, Raphael BJ. Network analysis of GWASdata. Curr Opin Genet Dev 2013;23:602e10. [77] Jia P, Zhao Z. Network.assisted analysis to prioritize GWAS results: principles,methods and perspectives. Hum Genet 2014;133:125e38. [78] Jensen MK, Pers TH, Dworzynski P, Girman CJ, Brunak S, Rimm EB. Proteininteraction-based genome-wide analysis of incident coronary heart disease. Circ Cardiovasc Genet 2011;4:549e56. [79] Schadt EE, Sachs A, Friend S. Embracing complexity, inching closer to reality. Sci STKE2005;2005:pe40.", + ", 2015) or GWAS summary statistics (Gusev et al. , 2016)could be used to facilitate the applications of our tools, especially ePheWAS, in such human cohorts. 
Altogether, this integrated systems genetics toolkit, which isfreely accessible on systems-genetics.org, can expedite in silicohypothesis generation and testing, facilitating the identificationand validation of new gene functions and gene networks inpopulations, which generally are robust and translate well acrossCell Systems 6, 113, January 24, 2018 11Please cite this article in press as: Li et al.", + "It is worth mentioning that other reports shared the concern about deciding the SNPs in GWAs that merit follow-up and further replication analysis.Chen et al. 19 recently proposed an approach for selecting SNPs based on a hierarchical model.This approach, which is not strictly based on biological plausibility of candidate's genes, allows the users to incorporate existing information about the SNPs into the analysis.For instance, the algorithm ranks P values assuming a weighting function that incorporates prior information about linkage or association evidence.", + "We performed a Bayesian gene-based GWAS analysis which is composed of 2 steps: We first conducted SNP-level association tests for the trio data using the likelihood ratio test (LRT) and obtained SNP level summary statistics and then conducted a gene-level GWAS on the summary statistics using a hierarchically structured prior that incorporates the SNPgene hierarchical structure.", + "Even on an individual GWA study scale, there are numerous benefits toinstituting the approaches discussed herein. Active capture of the complete processwill not only aid in the accurate interpretation of the individual study results but willalso permit the interpretation of results in a more comprehensive fashion through theintegration multiple data sets and results. 
7 Constructing Gene Networks to Enhance GWASand GOGE ResultsAs discussed, generating a GOGE data set and performing a first-pass analysis onthis scale of data is a major undertaking.6 Further Recommendations for Efficiency Gainsin GOGE StudiesRecent large-scale GWAS initiatives have made gains by employing economies ofscale in instituting centralized SNP genotyping, data coordination and control centers (http://www.hapmap.org, http://www.wtccc.org.uk/) [48], providing data setsthat have undergone common quality control checks and standardized annotation tomultiple researchers for individual analysis [31, 49]. Additionally, one of the mostrecent operational advances is the use of a single large common control populationfor multiple casecontrol GWAS studies [31].", + "Step 6: Report summarization and visualization. of GWAS in disease prediction.There are many steps during a gene-set analysis.They are shown below as Steps 1 through Step 6:", + "ConclusionAuthor ManuscriptWe propose a shift from cataloging statistical genetic associations to using post-GWAS toolsto make biological sense of them. Incorporating the strategies outlined here should helpprioritize individual gene targets amenable to functional and mechanistic validation acrossspecies, which can create opportunities to better characterize polygenic risk for AUD, testthe prognostic utility of these loci and scores, and identify therapeutic starting points. AcknowledgmentsThis study was supported by National Institutes of Health grants R01AA020634 and P50AA022537 (MM);R01AA022994 (SH); K02DA032573 and U01MH109532 (AA); and K01AA024152 (JES).Author ManuscriptThe principal challenge that we are confronted with is the lack of a well- or even adequatelypowered GWAS of AUD. Even though the approaches outlined here more efficiently harnessall existing GWAS data, reliability of the results from these GWAS hinge on their samplesize. 
One of the largest efforts that is currently under way is being led by the PsychiatricGenomics Consortiums Substance Use Disorders group and includes 15,000 cases withDSM-IV alcohol dependence and >37, 000 controls that are largely alcohol exposed(Agrawal et al. 2016).", + "The ultimate objectives full descriptions ofthe susceptibility architecture of major biomedical traitsand translation of the findings into clinical practice remain distant. With completion of the initial wave of GWA scans, itis timely to consider the status of the field. This reviewconsiders each major step in the implementation of aGWA scan, highlighting areas where there is an emerging consensus over the ingredients for success, and thoseaspects for which considerable challenges remain.Joint (meta) analysis of data364 | May 2008 | volume 9from comparable GWA scans9,34,35,38,103 provides a lowcost approach to enhance power for both main andjoint (genegene and geneenvironment) effects, obtainin silico replication, inform SNP selection for subsequentreplication efforts and explore potential sources of heterogeneity.", + "Methodological aspectsThe success of GWAS in detecting new associations and potential risk factors for any particular disease or condition depends greatly on the experimental design, on careful selection of the populations, on large number of cases and on collaborative analytical approaches.Meta-analysis is a method that combines the results of a number of surveys and of replication studies on the most promising variants.It investigates the underlying processes and has become standard practice for publications of GWAS that search for common genetic variants regulating complex traits and disease risk.", + "To date, a growing body of comprehensive methods has been developed for downstream analyses of GWAS.Sharing of summary statistics can help enable these analyses, for example, by providing researchers with a more convenient way to look-up genetic association effect estimates to 
conduct causal inference analyses using methods such as two-sample Mendelian Randomization which assumes samples are non-overlapping 3,4 .In addition, sharing GWAS results can help researchers to further their understanding of the shared genetic basis of T2D with other traits of interest, to perform fine-mapping to pinpoint the causal genetic variants or identify genetic loci shared with other risk factors and disease outcomes.Therefore, the aim of this current work was to provide a reference dataset for researchers to utilize in order to conduct further genetic analyses, generate hypotheses and improve understanding of the aetiology, the biological pathways and mechanisms of T2D and related metabolic and cardiovascular diseases." + ], + [ + "2009;25:175460. 82. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. Arobust, simple genotyping-by-sequencing (GBS) approach for high diversityspecies. PLoS One. 2011;6, e19379. 83. Ensemble Genomes. Available at: ftp://ftp.ensemblgenomes.org/. 84. Leinonen R, Sugawara H, Shumway M. The sequence read archive. NucleicAcids Res. 2011;39(Database issue):D1921. 85. Martin M. Cutadapt removes adapter sequences from high-throughputsequencing reads. EMBnet J. 2011;17:102. 86. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. Thesequence alignment/map format and SAMtools. Bioinformatics. 2009;25:20789. 87.", + "Biesecker, L., Mullikin, J., Facio, F., Turner,C., Cherukuri, P., Blakesley, R., Bouffard, G.,Chines, P., Cruz, P., Hansen, N., Teer, J.,Maskeri, B., Young, A., Manolio, T., Wilson,A., Finkel, T., Hwang, P., Arai, A., Remaley,A., Sachdev, V., Shamburek, R., Cannon, R.,and Green, E. (2009) The ClinSeq Project:Piloting large-scale genome sequencing forresearch in genomic medicine. Genome Res. 19, 16651674. 32. Wang, Z., Gerstein, M., and Snyder, M. (2009)RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 10, 5763. 33. 
Mortazavi, A., Williams, B., McCue, K.,Schaeffer, L., and Wold, B.Ng, S., Turner, E., Robertson, P., Flygare, S.,Bigham, A., Lee, C., Shaffer, T., Wong, M.,Bhattacharjee, A., Eichler, E., Bamshad, M.,Nickerson, D., and Shendure, J. (2009)Targeted capture and massively parallelsequencing of 12 human exomes. Nature. 461, 272276. 31.", + "Extensive documentation canbe downloaded in either pdf or Hypertext formats. The Tutorial is especially helpful, but readers should be aware that its files are somewhat inconspicuously tuckedin with Sample Data files, rather than being included in the Map Manager QTXmanual. For the current example, genotype data were downloaded from the Mouse GenomeDatabase (2001) (http://www.informatics.jax.org/). Specifically, it consists of mousechromosome 1 genotypes from the CopelandJenkins backcross, and a selected subset of 10 markers spanning the entire 100 cM length of the chromosome.", + "Resequencing of genomic regions of interest will also be necessary (see Electronic-DatabaseInformation for current examples).", + ", 2012;Chesler et al. , 2003; Jha et al. , 2018b, 2018a; Li et al. , 2018; Williams et al. , 2016). We havetherefore assembled deep companion resources, including full sequence for both parents(Baker et al. , 2019; Keane et al. , 2011; McKnite et al. , 2012; Wang et al. , 2016b; Wu et al. ,2014). Access to data and statistical tools are available from open-source web services(GeneNetwork.org and Systems-Genetics.org) (Li et al. , 2018; Sloan et al. , 2016; Williamsand Williams, 2017).", + "Best practice guidelines for the use of next-generation sequencing applications in genome diagnostics: A national collaborativestudy of dutch genome diagnostic laboratories. Human Mutation,34(10):13131321, Aug 2013. [363] D. Welter, J. MacArthur, J. Morales, T. Burdett, P. Hall,H. Junkins, A. Klemm, P. Flicek, T. Manolio, L. Hindor, and290BIBLIOGRAPHYet al. The nhgri gwas catalog, a curated resource of snp-trait associations. 
Nucleic Acids Research, 42(D1):D1001D1006, Dec2013. [364] Harm-Jan Westra, Marjolein J Peters, Tonu Esko, HaniehYaghootkar, Claudia Schurmann, Johannes Kettunen, Mark WChristiansen, Benjamin P Fairfax, Katharina Schramm, Joseph EPowell, and et al.", + "Zweig, A. S., Karolchik, D., Kuhn, R. M., Haussler, D., and Kent,W. J. (2008). UCSC genome browser tutorial. Genomics 92, 75 84. Vol. 9, Summer 2010107Downloaded from http://www.lifescied.org/ by guest on May 12, 2015", + "Useful Online Genomics Resources.", + "Bioinformatics tools for pathogen whole-genome sequencing", + "Genomic databases UCSC Human Genome Browser: visualize and browse genomes [4] https://genome.ucsc.edu/Ensembl: genomes and species tree [45] https://ensembl.org/GenBank: open access sequence database [44] https://www.ncbi.nlm.nih.gov/genbank/Multiple sequence alignment software MUSCLE: tool to align multiple sequences [52] http://www.ebi.ac.uk/Tools/msa/muscle/MAFFT: tool to align multiple genomic sequences [53] http://mafft.cbrc.jp/alignment/software/PRANK: tool to align multiple genomic sequences [54] http://www.ebi.ac.uk/goldman-srv/prank/Phylogenetic information TimeTree: database of divergence times [55] http://www.timetree.org/Mammalian supertrees: evolutionary trees [39] Fritz et al. [39] Analysis tools BLAST-basic local alignment search tool [46] https://blast.ncbi.nlm.nih.gov/Codeml-test for positive selection on phylogenies using multiple sequence alignments [56] http://abacus.gene.ucl.ac.uk/software/paml.html", + "The '1000 genomes' project and related individual sequencing projects", + "Sequencing a draft reference genomefor rainbow trout has been recently initiated; aconsortium involves collaborators from USDANational Cold and Cool Water Aquaculture(USDA-NCCCWA), Washington State University, University of Oregon, University ofCalifornia Davis, and West Virginia University (Palti et al. , 2011). 
The project will producea genome map and a draft reference genomesequence using the Illumina platform by sequencing the bacterial articial chromosomes(BACs) from the physical map minimal tilingpath (Palti, 2010; Miller et al. , 2011).", + "Extensive documentation canbe downloaded in either pdf or Hypertext formats. The Tutorial is especially helpful, but readers should be aware that its files are somewhat inconspicuously tuckedin with Sample Data files, rather than being included in the Map Manager QTXmanual. For the current example, genotype data were downloaded from the Mouse GenomeDatabase (2001) (http://www.informatics.jax.org/). Specifically, it consists of mousechromosome 1 genotypes from the CopelandJenkins backcross, and a selected subset of 10 markers spanning the entire 100 cM length of the chromosome.", + "The large number of bioinformatic tools that have beenmade available to scientists during the last few years has presented theproblem of which to use and how best to obtain scientifically valid answers(3). In this chapter, we will provide a guide for the most efficient way toanalyze a given sequence or to collect information regarding a gene, protein,structure, or interaction of interest by applying current publicly available software and databases that mainly use the World Wide Web.Hum Hered 63:678432122Gene/Protein Sequence AnalysisA Compilation of Bioinformatic ToolsBernd H. A. Rehm and Frank Reinecke1. IntroductionThe advent of automated high throughput DNA sequencing methods hasstrongly enabled genome sequencing strategies, culminating in determination ofthe entire human genome (1,2). An enormous amount of DNA sequence dataare available and databases still grow exponentially (see Fig. 22.1). 
Analysisof this overwhelming amount of data, including hundreds of genomes fromboth prokaryotes and eukaryotes, has given rise to the field of bioinformatics.", + "2014) (https://github.com/jaxcs/Seqnature) developed in the Churchill group at the Jackson Laboratory is similar, andis tailored to RNA-seq in the DO. Author ManuscriptConstruction of an individualized pseudogenome for a sample requires prior knowledge ofvariant sites in that samples genome. In, for instance, an F1 cross between strains for whichwhole-genome sequencing data are available, imputing the pseudogenome is trivial. Genomes of recombinant individuals (e.g. , CC or DO) can be expressed as mosaics offounder haplotypes on the basis of genotyping (discussed previously), and a pseudogenomestitched together accordingly.", + "In the following section, we provide an overview of the finished genome sequencing projects and report them in chronological order of their publication.", + "Extensive documentation canbe downloaded in either pdf or Hypertext formats. The Tutorial is especially helpful, but readers should be aware that its files are somewhat inconspicuously tuckedin with Sample Data files, rather than being included in the Map Manager QTXmanual. For the current example, genotype data were downloaded from the Mouse GenomeDatabase (2001) (http://www.informatics.jax.org/). Specifically, it consists of mousechromosome 1 genotypes from the CopelandJenkins backcross, and a selected subset of 10 markers spanning the entire 100 cM length of the chromosome.", + "Extensive documentation canbe downloaded in either pdf or Hypertext formats. The Tutorial is especially helpful, but readers should be aware that its files are somewhat inconspicuously tuckedin with Sample Data files, rather than being included in the Map Manager QTXmanual. For the current example, genotype data were downloaded from the Mouse GenomeDatabase (2001) (http://www.informatics.jax.org/). 
Specifically, it consists of mousechromosome 1 genotypes from the CopelandJenkins backcross, and a selected subset of 10 markers spanning the entire 100 cM length of the chromosome." + ], + [ + "We (Hein, Schierup and Wiuf) have published a300 page book on molecular population genetics titled Gene Genealogies, Sequence Variation and Evolution OxfordUniversity Press, and are presently developing a tutorial in association mapping that we hope to publish as a booklet in2006 and are also involved in a very large EU collaboration (Holland, Denmark, Iceland and UK) to find susceptibilitygenes for breast and prostate cancer. In comparative genomics, the most fundamental investigation is to find genes in a pair of aligned genomes.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. 
It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.2. von Heijne, G. Sequence analysis in molecular biology: Treasure trove or trivial pursuit(Academic Press, London, 1987). 3. Wolfsberg, T. G., Wetterstrand, K. A., Guyer, M. S., Collins, F. S. & Baxevanis, A. D. A usersguide to the human genome. Nature Genetics 32 (suppl) (2002). PrefaceI say locuslocus instead of genegene because if you work in human genetics longenough, you realize that you may never have a gene. But you learn not to let thatput you off. Peter A. HolmansMaking sense of the results of a genetic experiment is a challenge on any level.", + "Methods for DNA sequencing are constantly being improved, with the ultimate goal of sequencing a human genome in a single day for a cost of about US $1,000, an end that appears to be in sight (Hayden, 2014).In the very near future, whole-genome sequencing will be routinely available for clinical purposes, perhaps even beginning at birth.The major challenge ahead is the interpretation of this information.How do our genes interact with each other, and how does the environment contribute to the development of health and disease?What are the individual and societal implications of knowing our genome sequence?The answers to these and other important questions will unfold in the years ahead.Thus, we are truly in an era where precision medicine may soon become a reality.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). 
Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.2. von Heijne, G. Sequence analysis in molecular biology: Treasure trove or trivial pursuit(Academic Press, London, 1987). 3. Wolfsberg, T. G., Wetterstrand, K. A., Guyer, M. S., Collins, F. S. & Baxevanis, A. D. A usersguide to the human genome. Nature Genetics 32 (suppl) (2002). PrefaceI say locuslocus instead of genegene because if you work in human genetics longenough, you realize that you may never have a gene. But you learn not to let thatput you off. Peter A. 
HolmansMaking sense of the results of a genetic experiment is a challenge on any level.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.2. von Heijne, G. Sequence analysis in molecular biology: Treasure trove or trivial pursuit(Academic Press, London, 1987). 3. Wolfsberg, T. G., Wetterstrand, K. A., Guyer, M. S., Collins, F. S. & Baxevanis, A. D. A usersguide to the human genome. Nature Genetics 32 (suppl) (2002). 
PrefaceI say locuslocus instead of genegene because if you work in human genetics longenough, you realize that you may never have a gene. But you learn not to let thatput you off. Peter A. HolmansMaking sense of the results of a genetic experiment is a challenge on any level.", + "Ample time was allotted to answer questions and a copy of \"A Guide to Your Genome\" (National Human Genome Research Institute 2007) was provided to further assist participants' understanding and ability to communicate results with family members or others.", + "Whether within 10 or 12 (or 8) years, such inexpensive sequencing will change both research and clinical care, and progress does not need to wait even that long.The National Human Genome Research Institute (NHGRI) plans to focus a significant portion of the sequencing capacity that it supports on medical sequencing.For instance, the NHGRI and the National Cancer Institute are actively considering a Human Cancer Genome Project, 22 which would use DNA sequencing and a host of other genome technologies to gather information about the mutations and functional abnormalities found in multiple samples from many major types of cancer.Medical sequencing should also provide important insight into many other diseases.For example, sequencing all exons in X-linked mental retardation syndromes may reveal much about their etiology.Sequencing candidate genes in the extremes of the distribution of quantitative traits should also reveal much of importance about common diseases, such as coronary atherosclerosis. 23With further technological advances, other previously unimaginable research approaches will become real.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). 
Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.2. von Heijne, G. Sequence analysis in molecular biology: Treasure trove or trivial pursuit(Academic Press, London, 1987). 3. Wolfsberg, T. G., Wetterstrand, K. A., Guyer, M. S., Collins, F. S. & Baxevanis, A. D. A usersguide to the human genome. Nature Genetics 32 (suppl) (2002). PrefaceI say locuslocus instead of genegene because if you work in human genetics longenough, you realize that you may never have a gene. But you learn not to let thatput you off. Peter A. HolmansMaking sense of the results of a genetic experiment is a challenge on any level." 
+ ], + [ + "In birds, where erythrocyte telomere length (ETL) is measured, the majority of species sampled have shown no sex difference (36).Nonetheless, bird telomere dynamics are complex and, as with humans, may be affected by environment and stress.For example, a longitudinal study of black-tailed gulls (Larus crassitostris) over 2-5 years found no correlation between ETL and age or sex.Rather, ETL attrition was correlated with reduced food availability and environmental stressors (55).In a captive zebra finch (Taeniopygia guttata) population, male and female mean telomere length decreased with increasing age of the animals, but did differ between sexes (56).As these examples illustrate, the relationship between telomere length, lifespan, and sex is likely to be complex in other vertebrates.Telomere attrition rates in humans are not constant, and when sex differences in telomere length first appear is unclear.Telomere attrition occurs rapidly from birth, slowing around 4 years of age, and the subsequent trajectory of telomere attrition continues to change in an age-and sex-specific fashion (43).In some studies, neonates show no sex differences, regardless of tissue used [eg, (44)].But in others, female newborns are reported to have longer telomeres than males [eg, (45)].Interestingly, a twin study comparing adults reported that women had longer mean LTL than men when samesex twin pairs (mono-and dizygotic) were compared.In contrast, men and women from opposite-sex twin pairs had similar telomere lengths, a difference that the authors attributed to antenatal influences of opposite-sex twins on one another (46).Results from studies of LTL in prepubescent children are mixed, reporting mean LTL either greater in females than in males (47) or not different (48).In two studies of adolescents (ages 13-18 years old), mean LTL was greater in females than in males (49,50), suggesting that sex differences in telomere length may arise during sexual maturation.A longitudinal 
study of Danish twins found that women had longer LTLs at baseline and displayed decelerated LTL attrition following menopause (51).Crucially, while LTL in women declined with age, the relationship between LTL attrition and age was no longer significant if menopausal status was included as a covariate.These examples illustrate that while many studies find greater telomere length in females, this trend is not universal.In some organisms, there is no clear relationship between telomere length and lifespan.Age-related telomere attrition could not be detected in Daphnia pulex (57) or sea urchin species (Strongylocentrotus franciscanus and Lytechinus variegatus) (58).Studies in C. elegans examining natural variation in telomere length and experimentally manipulated telomere length detect no correlation with lifespan (59,60), and in Drosophila, which uses a telomerase-independent mechanism for telomere maintenance, there is a similar lack of correlation between longevity and telomere length (61).Similarly, data on sex differences in age-related telomere shortening are mixed.For example, in the ant species Lasius niger, the rate of telomere shortening is more rapid in short-lived males compared to longer-lived females.But, mean telomere length does not differ between the two types of females, queens and workers, despite the fact that queens live much longer than workers (up to 28 years vs 2-3 months) (62).These findings suggest that the question of how telomere shortening affects aging across species and how sex affects telomere attrition rates are complex.In sum, as adults men have shorter telomeres than women in most populations sampled (39).Whether the sex difference in telomere length appears shortly after conception or later in life is unclear.Similarly, whether the sex difference in telomere lengths observed in adult humans results from slower attrition rates, differential telomere length at earlier ages, sex differences in the effects of telomere length on 
survival, sex differences in telomere maintenance, or other factors in not clear.Additional, carefully controlled longitudinal studies on the dynamics of telomere length and attrition rates in multiple tissues using standardized methods are needed to better evaluate the mechanisms creating sex differences in human telomere attrition during aging.", + "With new methodologies to assess relative telomere length by Q-PCR, studies were designed to address the impact of telomere length on aging, aging associated pathologies, and mortality.One such study has correlated shorter leukocyte telomere lengths at age 60 with a three times higher risk of heart disease and an eightfold increase in risk of infection-related death (36), thereby associating measured relative cellular aging with disease and life expectancy.In a similar way, chronic stress was shown to correlate with short leukocyte telomere length, a phenomenon attributed to higher levels of oxidative stress at the cellular level (70).More recent studies have linked telomere length in smooth muscle cells with senescence and disease severity in patients with atherosclerosis (141,150).Leukocyte telomere length was also short in a cohort of similar patients and associated with a higher risk of developing occult cardiovascular disease (71).More data are needed to understand and validate the use of leukocyte telomere length as a biomarker for cardiovascular and other diseases.", + "Shortening of the telomeres at the ends of chromosomes has been associated with age-related disease and mortality [16][17][18].A recent study identified a common haplotype of four SNPs in the human telomerase reverse transcriptase gene (hTERT) that is enriched in centenarians and associated with longer telomere length [19].It was also shown that centenarians and their offspring maintain longer telomeres compared with controls and that longer telomeres are associated with protection from age-related diseases, better cognitive function and lipid 
profiles of healthy ageing [19].", + "Adult studies have also found a negative correlation with baseline telomere length, suggesting a negative feedback regulation of leukocyte telomere length (Farzaneh-Far et al. 2010;Aviv et al. 2009;Epel et al. 2008;Nordfjall et al. 2009).It is possible that while our follow-up period was shorter than Shalev et al. 2013 and adult studies, which had a minimum of 5 year intervals with the exception of Puterman et al. (2015) who followed for a one-year time period, there may be biological regulation of telomere length at 4 and 5 years of age such that shorter telomeres are more robustly maintained, whereas longer telomeres have greater rates of decline, over a short period of one year.It is unlikely that this relationship is due to assay error or regression to the mean given the consistency of our findings across studies.We have had similar findings of longer telomeres having greater rates of decline and shorter telomeres being maintained in our different studies (Farzaneh-Far et al. 2010;Epel et al. 2008;Puterman et al. 2015).The single, consistent predictor of the rate of telomere attrition shown in multiple adult and the few child longitudinal studies is the baseline measurement of telomere length at the start of each study.This suggests the importance of understanding predictors of telomere length prior to adulthood, as it determines in part the rate of change (Revesz et al. 2014a, b;Nordfjall et al. 2009).Moreover, longitudinal studies in adults have had found that telomere attrition rate is dependent on baseline telomere length independent of any phenotypic predictors of shortening, such as disease or demographic variables (Nordfjall et al. 
2009), attesting to the importance of studies to evaluate risk factors for shortening prior to adulthood.Rates of decline in childhood may be particularly relevant for later chronic disease risk as shorter telomere length has been implicated in disease progression through exposure to cellular senescence, inflammatory cytokines and adipocyte hypertrophy (Raschenberger et al. 2015;Willeit et al. 2014;Monickaraj et al. 2012;Fyhrquist et al. 2013).Telomere attrition in adultsAdditional early childhood telomere studies are also needed so as to better interpret disease across the lifespan.Specifically, retrospective adult studies have pointed to the importance of early life exposures, finding associations of shorter telomere length in adulthood with perinatal complications, and stressful and deprived early childhood environments, including factors associated with lower socioeconomic status and those of physical or social neglect (Drury et al. 2012;Shalev et al. 2014;Tyrka et al. 2010, Kananen et al. 2010).Some adult studies have found that predictors of shorter telomere length involving inflammation and oxidative stress exposures such as depression are only observed in younger adults compared with middle aged and older ones (Philips et al. 
2013), suggesting it may be harder to tease out risk factors for accelerated attrition as, firstly, these processes happen early in life, and, secondly, repeat exposures to inflammation and oxidative stress may statistically plateau out across older age groups.", + "Blackburn and Epel, a health psychologist who did original research on how specific lifestyle and psychological habits can protect telomeres, published The Telomere Effect (Blackburn & Epel, 2017), in which they suggested that individuals with shorter telomeres developed diseases earlier in life (a shorter \"disease span\").What follows is the evidence from these authors, their colleagues, and other researchers describing how length of telomeres contributes to mind-body connection and healthy longevity.", + "As early as at the time of birth, each of the 92 telomeres of the human genome has its own characteristic length.Additionally, each telomere shortens by its individual attrition rate.In general, longer telomeres at birth are associated with higher age-dependent attrition rates and vice versa.Overall, telomere shortening appears more dynamic in males.In recent literature, there are hints that the average telomere length may be higher in women and that their annual shortening rate may be somewhat lower (Vaziri et al., 1993;Rufer et al., 1998;Jeanclos et al., 2000), but these reported differences failed to reach statistical signifi cance except for one study (Jeanclos et al., 2000).Here, we provide compelling evidence that this is indeed the case.It is generally accepted that telomeres shorten during DNA replication both in vitro and in vivo.In individuals, short telomeres are considered to be a sign of advanced age.Cawthon and coworkers (2003) showed that telomere shortening in humans likely contributes to mortality, supporting the hypothesis that they might act as a mitotic clock (Allsopp et al., 1992).Telomere length dynamics, however, does not seem to Abstract.During aging, telomeres are gradually 
shortened, eventually leading to cellular senescence.By T/C-FISH (telomere/centromere-FISH), we investigated human telomere length differences on single chromosome arms of 205 individuals in different age groups and sexes.For all chromosome arms, we found a linear correlation between telomere length and donor age.Generally, males had shorter telomeres and higher attrition rates.Every chromosome arm had its individual age-specifi c telomere length and erosion pattern, resulting in an unexpected heterogeneity in chromosomespecifi c regression lines.This differential erosion pattern, however, does not seem to be accidental, since we found a correlation between average telomere length of single chromosome arms in newborns and their annual attrition rate.Apart from the above-mentioned sex-specifi c discrepancies, chromosome arm-specifi c telomere lengths were strikingly similar in men and women.This implies a mechanism that arm specifi cally regulates the telomere length independent of gender, thus leading to interchromosomal telomere variations.In conclusion, these data suggest that with increasing physical and genetic length of whole chromosomes, the corresponding telomeres also tend to be longer and that recombination rate and telomere length are inversely proportional.In conclusion, a combination of overall and chromosomespecifi c shorter telomeres and more pronounced age-dependent telomere erosion could be observed in males.There is a prospective clinical study strongly suggesting that longer telomeres decrease the risk of dying (Cawthon et al., 2003).With this in mind, the telomere length discrepancies between the sexes may indeed be a factor infl uencing the differences in their life expectancy.In every chromosome a linear decline of telomere length with age was observed, being more pronounced in men independent of the examined chromosome arm.This might suggest that telomere length on single chromosome arms may be infl uenced by the same factors which determine 
overall telomere length.S. Mayer a S. Brderlein a S. Perner a I. Waibel a A. Holdenried a N. Ciloglu a C. Hasel a T. Mattfeldt a K.V. Nielsen b P. Mller a a Institute of Pathology, University of Ulm, Ulm (Germany); b DakoCytomation A/S, Glostrup (Denmark) follow uniformity.In previous studies, sex-specifi c differences in telomere length and attrition rate of men and women were found (Benetos et al., 2001;Cawthon et al., 2003;Nawrot et al., 2004), suggesting gender differences in behavior of telomeres.In individual chromosome arms, telomere length was also shown not to be homogeneous (Lansdorp et al., 1996;Benn, 1997;Martens et al., 1998;Surralles et al., 1999;Hao and Tan, 2001;Londono-Vallejo et al., 2001;Graakjaer et al., 2003), some telomeres being signifi cantly shorter, others longer than the average length.To date, these characteristics in telomere lengths could not be set in a biological context, as only a few groups have provided detailed information about chromosome-specifi c patterns of telomere distribution (Lansdorp et al., 1996;Graakjaer et al., 2003).Whether accumulation of short telomeres (Martens et al., 2000;Londono-Vallejo et al., 2001) or rather the shortest telomere of one specifi c chromosome arm (Hemann et al., 2001) elicits senescence, remains an open question so far.", + "Shortening of the telomeres at the ends of chromosomes has been associated with age-related disease and mortality [16][17][18].A recent study identified a common haplotype of four SNPs in the human telomerase reverse transcriptase gene (hTERT) that is enriched in centenarians and associated with longer telomere length [19].It was also shown that centenarians and their offspring maintain longer telomeres compared with controls and that longer telomeres are associated with protection from age-related diseases, better cognitive function and lipid profiles of healthy ageing [19]." 
+ ] + ], + "task_id": [ + "D88EF655762CE3D524A7A1EEA3FA16ED", + "245DD8093F5D16F44C2AD7618245086C", + "F9F7EA3DC28534B161ED70DB401C7D11", + "4A06F8DF54C82D90E02F81D0E1E8B08A", + "BA6A505E62A0529DB883D036CBC1FD92" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_5.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_5.json new file mode 100644 index 0000000..1fe6931 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_5.json @@ -0,0 +1,112 @@ +{ + "question": [ + "Create a how-to guide for genetic sequencing", + "Create a guide for genetic sequencing", + "Define dyslipidemia.", + "What is cytochrome?", + "How does one tell the difference between X and Y DNA, with respect to DNA tracing and determining QTLs?" + ], + "answer": [ + "1. Identify the gene or genome of interest for sequencing.\n2. Choose the appropriate sequencing technology, such as next-generation sequencing.\n3. Prepare the sample by extracting DNA and creating a library for sequencing.\n4. Run the sequencing process using the chosen platform.\n5. Analyze the sequencing data using bioinformatics tools. This may include aligning the sequence reads to a reference genome, identifying variants, and interpreting the results.\n6. Validate the results using additional experiments or databases.\n7. Document and share the findings in a suitable format, such as a scientific paper or a database entry.\n8. For further analysis or research, use online resources like the '1000 genomes' project, the Mouse Genome Database, or the UCSC genome browser tutorial.\n9. Stay updated with best practice guidelines for the use of next-generation sequencing applications in genome diagnostics.", + "1. Initiate a project for sequencing, like the '1000 genomes' project or the rainbow trout project, involving collaborators from various institutions.\n2. Choose a sequencing platform, such as Illumina, and prepare the sample for sequencing.\n3. Use bioinformatics tools to analyze the sequence data. Tools like Seqnature for RNA-seq or Acembly for predicting gene structures can be used.\n4. 
Construct an individualized pseudogenome for the sample using prior knowledge of variant sites in the sample's genome.\n5. Use online resources like NCBI Map Viewer for graphical depictions of genetic and physical maps and to locate genes, markers, and SNPs on the assembled sequences.\n6. Use websites offering annotation of the draft genome for various analyses such as gene predictions and similarity searches.\n7. Monitor the progress of sequencing online and aim for a resolution of selective constraint down to a segment length of eight nucleotides.", + "Dyslipidemia is the term for blood fat disorders, which include high triglycerides, low HDL cholesterol, and high LDL cholesterol. These conditions can foster plaque buildups in artery walls.", + "Cytochrome is a type of protein that contains heme groups and is responsible for the transport of electrons. They are found in aerobic cells and play a crucial role in the respiratory chain, aiding in the process of oxidative phosphorylation. There are different types of cytochromes, including cytochromes a, b, and c, each undergoing oxidation-reduction changes in a determined sequence.", + "The difference between X and Y DNA in terms of DNA tracing and determining QTLs is not explicitly discussed in the background text. However, it is mentioned that the mouse genome consists of 19 pairs of autosomes and the X and Y chromosomes. In general, the X and Y chromosomes are different in size, gene content, and inheritance patterns, which could potentially influence the process of DNA tracing and determining QTLs. However, specific methods or implications related to these differences are not detailed in the provided text." + ], + "contexts": [ + [ + "Biesecker, L., Mullikin, J., Facio, F., Turner,C., Cherukuri, P., Blakesley, R., Bouffard, G.,Chines, P., Cruz, P., Hansen, N., Teer, J.,Maskeri, B., Young, A., Manolio, T., Wilson,A., Finkel, T., Hwang, P., Arai, A., Remaley,A., Sachdev, V., Shamburek, R., Cannon, R.,and Green, E. 
(2009) The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine. Genome Res. 19, 1665-1674. 32. Wang, Z., Gerstein, M., and Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 10, 57-63. 33. Mortazavi, A., Williams, B., McCue, K., Schaeffer, L., and Wold, B.", + "Extensive documentation can be downloaded in either pdf or Hypertext formats. The Tutorial is especially helpful, but readers should be aware that its files are somewhat inconspicuously tucked in with Sample Data files, rather than being included in the Map Manager QTX manual. For the current example, genotype data were downloaded from the Mouse Genome Database (2001) (http://www.informatics.jax.org/). Specifically, it consists of mouse chromosome 1 genotypes from the Copeland-Jenkins backcross, and a selected subset of 10 markers spanning the entire 100 cM length of the chromosome. The full list of organisms, target sequence coverage and progress in sequencing can be monitored online (http://www.genome.gov/10002154). Based on the equations of Eddy (2005) and simulations of Margulies et al. (2005), these genome sequences should provide resolution of selective constraint down to a segment length of eight nucleotides, approaching the same scale as individual transcription factor-binding sites.", + "Resequencing of genomic regions of interest will also be necessary (see Electronic-Database Information for current examples).", + "Best practice guidelines for the use of next-generation sequencing applications in genome diagnostics: A national collaborative study of Dutch genome diagnostic laboratories. Human Mutation, 34(10):1313-1321, Aug 2013. [363] D. Welter, J. MacArthur, J. Morales, T. Burdett, P. Hall, H. Junkins, A. Klemm, P. Flicek, T. Manolio, L. Hindorff, et al. The nhgri gwas catalog, a curated resource of snp-trait associations. Nucleic Acids Research, 42(D1):D1001-D1006, Dec 2013.
[364] Harm-Jan Westra, Marjolein J Peters, Tonu Esko, Hanieh Yaghootkar, Claudia Schurmann, Johannes Kettunen, Mark W Christiansen, Benjamin P Fairfax, Katharina Schramm, Joseph E Powell, et al.", + "Zweig, A. S., Karolchik, D., Kuhn, R. M., Haussler, D., and Kent, W. J. (2008). UCSC genome browser tutorial. Genomics 92, 75-84. Vol. 9, Summer 2010 107 Downloaded from http://www.lifescied.org/ by guest on May 12, 2015", + "Useful Online Genomics Resources.", + "Bioinformatics tools for pathogen whole-genome sequencing", + "The '1000 genomes' project and related individual sequencing projects", + "Sequencing a draft reference genome for rainbow trout has been recently initiated; a consortium involves collaborators from USDA National Cold and Cool Water Aquaculture (USDA-NCCCWA), Washington State University, University of Oregon, University of California Davis, and West Virginia University (Palti et al., 2011). The project will produce a genome map and a draft reference genome sequence using the Illumina platform by sequencing the bacterial artificial chromosomes (BACs) from the physical map minimal tiling path (Palti, 2010; Miller et al., 2011).", + "Extensive documentation can be downloaded in either pdf or Hypertext formats. The Tutorial is especially helpful, but readers should be aware that its files are somewhat inconspicuously tucked in with Sample Data files, rather than being included in the Map Manager QTX manual. For the current example, genotype data were downloaded from the Mouse Genome Database (2001) (http://www.informatics.jax.org/). Specifically, it consists of mouse chromosome 1 genotypes from the Copeland-Jenkins backcross, and a selected subset of 10 markers spanning the entire 100 cM length of the chromosome. The full list of organisms, target sequence coverage and progress in sequencing can be monitored online (http://www.genome.gov/10002154). Based on the equations of Eddy (2005) and simulations of Margulies et al.
(2005), these genome sequences shouldprovide resolution of selective constraint down to a segment length of eight nucleotides, approaching the same scale as individual transcription factor-bindingsites.", + "The large number of bioinformatic tools that have beenmade available to scientists during the last few years has presented theproblem of which to use and how best to obtain scientifically valid answers(3). In this chapter, we will provide a guide for the most efficient way toanalyze a given sequence or to collect information regarding a gene, protein,structure, or interaction of interest by applying current publicly available software and databases that mainly use the World Wide Web.Hum Hered 63:678432122Gene/Protein Sequence AnalysisA Compilation of Bioinformatic ToolsBernd H. A. Rehm and Frank Reinecke1. IntroductionThe advent of automated high throughput DNA sequencing methods hasstrongly enabled genome sequencing strategies, culminating in determination ofthe entire human genome (1,2). An enormous amount of DNA sequence dataare available and databases still grow exponentially (see Fig. 22.1). Analysisof this overwhelming amount of data, including hundreds of genomes fromboth prokaryotes and eukaryotes, has given rise to the field of bioinformatics.", + "2014) (https://github.com/jaxcs/Seqnature) developed in the Churchill group at the Jackson Laboratory is similar, andis tailored to RNA-seq in the DO. Author ManuscriptConstruction of an individualized pseudogenome for a sample requires prior knowledge ofvariant sites in that samples genome. In, for instance, an F1 cross between strains for whichwhole-genome sequencing data are available, imputing the pseudogenome is trivial. Genomes of recombinant individuals (e.g. 
, CC or DO) can be expressed as mosaics offounder haplotypes on the basis of genotyping (discussed previously), and a pseudogenomestitched together accordingly.", + "Geschwind and KonopkaPage 9Box 3NIH-PA Author ManuscriptThe challenges of next-generation sequencingNext-generation sequencing will be revolutionary in the amount and content of datagenerated, but there are many obstacles to surmount. Extensive comparisons ofsequencing data have not been published demonstrating whether there are batch effects indata due to sample preparation, library generation, flow cell preparation or machine run. Few studies have compared the commercial platforms for either gene expression or generegulation81,82. Data storage and analysis are currently a much larger challenge than datageneration.", + "In the following section, we provide an overview of the finished genome sequencing projects and report them in chronological order of their publication.", + "Extensive documentation canbe downloaded in either pdf or Hypertext formats. The Tutorial is especially helpful, but readers should be aware that its files are somewhat inconspicuously tuckedin with Sample Data files, rather than being included in the Map Manager QTXmanual. For the current example, genotype data were downloaded from the Mouse GenomeDatabase (2001) (http://www.informatics.jax.org/). Specifically, it consists of mousechromosome 1 genotypes from the CopelandJenkins backcross, and a selected subset of 10 markers spanning the entire 100 cM length of the chromosome.The full listof organisms, target sequence coverage and progress in sequencing can be monitored online (http://www.genome.gov/10002154). Based on the equations of Eddy( 2005) and simulations of Margulies et al. 
(2005), these genome sequences shouldprovide resolution of selective constraint down to a segment length of eight nucleotides, approaching the same scale as individual transcription factor-bindingsites.", + "Extensive documentation canbe downloaded in either pdf or Hypertext formats. The Tutorial is especially helpful, but readers should be aware that its files are somewhat inconspicuously tuckedin with Sample Data files, rather than being included in the Map Manager QTXmanual. For the current example, genotype data were downloaded from the Mouse GenomeDatabase (2001) (http://www.informatics.jax.org/). Specifically, it consists of mousechromosome 1 genotypes from the CopelandJenkins backcross, and a selected subset of 10 markers spanning the entire 100 cM length of the chromosome." + ], + [ + "gov/mapview/) evolved to allow graphical depictions of, and comparisons between,a wide range of genetic and physical maps in parallel with NCBI draft and finished sequence contigs. The locations of genes, markers, and SNPs are indicatedon the assembled sequences. As with Ensembl, there is a NCBI analysis protocolwhich aims to predict gene structures based upon EST and mRNA alignments withthe draft genome. This is carried out by a program called Acembly (unpublished;http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/index.html), which aims toderive gene structure from these alignments alone.There are now three well-designed websites (Table 4.1) offering users the chance tobrowse annotation of the draft human genome. All three sites offer a graphical interface to display the results of various analyses, such as gene predictions and similaritysearches, for draft and finished genomic sequence. These interfaces are indispensable for rapid, intuitive comparisons between the features predicted by differentprograms. 
For instance, one can see at once where an exon prediction overlaps withinterspersed repeats or a SNP.", + "Resequencing of genomic regions of interest will also be necessary (see Electronic-DatabaseInformation for current examples).", + "Useful Online Genomics Resources.", + "Bioinformatics tools for pathogen whole-genome sequencing", + "The '1000 genomes' project and related individual sequencing projects", + "Sequencing a draft reference genomefor rainbow trout has been recently initiated; aconsortium involves collaborators from USDANational Cold and Cool Water Aquaculture(USDA-NCCCWA), Washington State University, University of Oregon, University ofCalifornia Davis, and West Virginia University (Palti et al. , 2011). The project will producea genome map and a draft reference genomesequence using the Illumina platform by sequencing the bacterial articial chromosomes(BACs) from the physical map minimal tilingpath (Palti, 2010; Miller et al. , 2011).", + "gov/mapview/) evolved to allow graphical depictions of, and comparisons between,a wide range of genetic and physical maps in parallel with NCBI draft and finished sequence contigs. The locations of genes, markers, and SNPs are indicatedon the assembled sequences. As with Ensembl, there is a NCBI analysis protocolwhich aims to predict gene structures based upon EST and mRNA alignments withthe draft genome. This is carried out by a program called Acembly (unpublished;http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/index.html), which aims toderive gene structure from these alignments alone.There are now three well-designed websites (Table 4.1) offering users the chance tobrowse annotation of the draft human genome. All three sites offer a graphical interface to display the results of various analyses, such as gene predictions and similaritysearches, for draft and finished genomic sequence. These interfaces are indispensable for rapid, intuitive comparisons between the features predicted by differentprograms. 
For instance, one can see at once where an exon prediction overlaps withinterspersed repeats or a SNP.The full listof organisms, target sequence coverage and progress in sequencing can be monitored online (http://www.genome.gov/10002154). Based on the equations of Eddy( 2005) and simulations of Margulies et al. (2005), these genome sequences shouldprovide resolution of selective constraint down to a segment length of eight nucleotides, approaching the same scale as individual transcription factor-bindingsites.", + "The large number of bioinformatic tools that have beenmade available to scientists during the last few years has presented theproblem of which to use and how best to obtain scientifically valid answers(3). In this chapter, we will provide a guide for the most efficient way toanalyze a given sequence or to collect information regarding a gene, protein,structure, or interaction of interest by applying current publicly available software and databases that mainly use the World Wide Web.", + "2014) (https://github.com/jaxcs/Seqnature) developed in the Churchill group at the Jackson Laboratory is similar, andis tailored to RNA-seq in the DO. Author ManuscriptConstruction of an individualized pseudogenome for a sample requires prior knowledge ofvariant sites in that samples genome. In, for instance, an F1 cross between strains for whichwhole-genome sequencing data are available, imputing the pseudogenome is trivial. Genomes of recombinant individuals (e.g. , CC or DO) can be expressed as mosaics offounder haplotypes on the basis of genotyping (discussed previously), and a pseudogenomestitched together accordingly.", + "In the following section, we provide an overview of the finished genome sequencing projects and report them in chronological order of their publication.", + "gov/mapview/) evolved to allow graphical depictions of, and comparisons between,a wide range of genetic and physical maps in parallel with NCBI draft and finished sequence contigs. 
The locations of genes, markers, and SNPs are indicatedon the assembled sequences. As with Ensembl, there is a NCBI analysis protocolwhich aims to predict gene structures based upon EST and mRNA alignments withthe draft genome. This is carried out by a program called Acembly (unpublished;http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/index.html), which aims toderive gene structure from these alignments alone.There are now three well-designed websites (Table 4.1) offering users the chance tobrowse annotation of the draft human genome. All three sites offer a graphical interface to display the results of various analyses, such as gene predictions and similaritysearches, for draft and finished genomic sequence. These interfaces are indispensable for rapid, intuitive comparisons between the features predicted by differentprograms. For instance, one can see at once where an exon prediction overlaps withinterspersed repeats or a SNP.The full listof organisms, target sequence coverage and progress in sequencing can be monitored online (http://www.genome.gov/10002154). Based on the equations of Eddy( 2005) and simulations of Margulies et al. (2005), these genome sequences shouldprovide resolution of selective constraint down to a segment length of eight nucleotides, approaching the same scale as individual transcription factor-bindingsites.", + "gov/mapview/) evolved to allow graphical depictions of, and comparisons between,a wide range of genetic and physical maps in parallel with NCBI draft and finished sequence contigs. The locations of genes, markers, and SNPs are indicatedon the assembled sequences. As with Ensembl, there is a NCBI analysis protocolwhich aims to predict gene structures based upon EST and mRNA alignments withthe draft genome. 
This is carried out by a program called Acembly (unpublished;http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/index.html), which aims toderive gene structure from these alignments alone.There are now three well-designed websites (Table 4.1) offering users the chance tobrowse annotation of the draft human genome. All three sites offer a graphical interface to display the results of various analyses, such as gene predictions and similaritysearches, for draft and finished genomic sequence. These interfaces are indispensable for rapid, intuitive comparisons between the features predicted by differentprograms. For instance, one can see at once where an exon prediction overlaps withinterspersed repeats or a SNP.The full listof organisms, target sequence coverage and progress in sequencing can be monitored online (http://www.genome.gov/10002154). Based on the equations of Eddy( 2005) and simulations of Margulies et al. (2005), these genome sequences shouldprovide resolution of selective constraint down to a segment length of eight nucleotides, approaching the same scale as individual transcription factor-bindingsites.", + "LETTER RESEARCHthe sequence information available from the assembled scaftigs), resulting in the final reference gene catalogue used in this study." + ], + [ + "The dyslipidaemia in patients with Type 2 diabetes is characterised by hypertriglyceridaemia, low High Density Lipoprotein (HDL) and relatively normal Low Density Lipoprotein (LDL) and total cholesterol (Owen, et al., 2002).The lipid profiles of HNF1A MODY patients are similar to non-diabetic controls, but have higher HDL cholesterol and lower triglyceride levels compared to Type 2 diabetes patients (McDonald, et al., 2012;Owen, et al.,", + "Dyslipidaemia Atherogenic dyslipidemia is the overall term for blood fatdisordershigh triglycerides, low HDL cholesterol and high LDL cholesterolthat foster plaque buildups in artery walls. Ecogenetics The interaction of genetics with the environment. 
Eicosanoid Any of a family of naturally occurring substances derived from20-carbon polyunsaturated fatty acids; they include prostaglandins,thromboxanes, leukotrienes and epoxyeicosatrienoic acids, and function ashormones. Endophenotype Measurable biological characteristics thought to lie along thepathway from gene to disorder, but that are closer to the gene and are thoughtto have a simpler relationship with a given gene.", + "M A N U S C R I P T A C C E P T E D ACCEPTED MANUSCRIPTAbnormal lipid profiles have been known to be associated with the metabolic syndrome and AD for over a decade (Kuo et al., 1998;Roher et al., 1999;Merched et al., 2000).Plasma lipidomics allows the detection of sphingolipids and glycerophospholipids such as Cer, PI and PE that are present in very small amounts in the plasma.Changes in lipids disturb plasma membrane asymmetry (Axelsen et al., 2011), and this is likely to disturb peripheral liver A endosomal metabolism that is essential for mediating the clearance of A via ApoE-or another apolipoprotein-mediated pathway.", + "Serum lipids are important determinants of cardiovascular diseases and are related to morbidity [187] .The high heritability of circulating lipid levels is well established, and earlier studies of individuals with extreme lipid values or families with Mendelian forms of dyslipidemias have reported the involvement of numerous genes and respective proteins in lipid metabolism [188] .Recent GWA studies mostly carried out in samples enriched for T2DM cases have implicated a total of 19 loci controlling serum high density lipoprotein (HDL) cholesterol, low density lipoprotein (LDL) cholesterol and triglycerides (TG).The loci include the genes encoding ABCA1 , APOB , CELSR2 , CETP , DOCK7 , GALNT2 , GCKR , HMGCR , LDLR , LIPC , LIPG , LPL , MLXIPL , shown according to the number of at -risk alleles in each class (for a total of 15 single nucleotide polymorphisms genotyped in each individual, from the best replicated 
variants following the results of genome -wide association studies).The study was performed in 4232 patients with diabetes and 4595 normoglycemic adult subjects.Adapted from Cauchi et al. [193] .", + "Familial HypercholesterolemiaFamilial hypercholesterolemia is an inherited condition in which patients have extremely high blood levels of low-density lipoprotein (LDL) cholesterol, which results in abnormal deposition of cholesterol in various parts of the body and a dramatically increased risk of cardiovascular disease, which often manifests at an early age.Several genes have been implicated in this disorder.Mutations in LDLR, which encodes the LDL receptor, can affect the synthesis, structure, and function of the LDL receptor in a variety of ways, 6 resulting in the impaired ability of cells to remove cholesterol-carrying LDL particles from the bloodstream and thus the accumulation of LDL cholesterol in the blood.Although familial hypercholesterolemia is often regarded as an autosomal dominant disorder, LDLR mutations have an additive (codominant) effect such that patients who have 2 LDLR mutations have higher blood LDL cholesterol levels and experience earlier cardiovascular disease (as early as childhood) compared with patients with 1 LDLR mutation.Mutations in the APOB gene, which encodes the apolipoprotein B protein, which is a core protein of LDL particles and facilitates their removal from the bloodstream, can mimic the effects of LDLR mutations and result in familial hypercholesterolemia. 7 Finally, mutations in 2 other genes that encode proteins that affect the function of the LDL receptor, PCSK9 and LDLRAP1, can also result in familial hypercholesterolemia. 
8,9 Unlike the other 3 genes, LDLRAP1 mutations are recessive and thus are required to affect both copies of the gene for patients to manifest disease.", + "Dyslipidemia was defined according to ATP III.High TC was defined as serum level of TC equal to or greater than 6.21 mmol/L (!240 mg/dL).Low HDL-C was defined as serum level of HDL-C less than 1.03 mmol/L (<40 mg/dL).High LDL-C was defined as serum level of LDL-C equal to or greater than 4.16 mmol/L (!160 mg/dL).High TG was defined as serum level of triglyceride equal to or greater than 2.26 mmol/L (!200 mg/dL).The ratio of TC/HDL-C equal to or greater than five was defined as abnormal.All the participants were free of lipid-lowering medicines.Studies reported that dyslipidemia was associated with hypertension [27e30].However, in our current study, this association was not observed in the Yi people by multivariate logistic regression analysis.Significant association between ever alcohol drinking, dyslipidemia, low HDL-C and high ratio of TC/HDL-C was observed (OR<1).Our findings showed an association between increasing BMI and increasing prevalence of dyslipidemia.The association between ever smoking and dyslipidemia was not observed in the current research by multivariate logistic regression analysis.Compared with females aged below 45 years, females at menopausal stage had higher risk for dyslipidemia, adjusted for family income, educational level, physical activity, ever smoking, ever alcohol drinking, BMI, and history of hypertension and T2DM.In exploring associated factors for prevalence of dyslipidemia, we used multivariate logistic regression analysis with high TC, TG and LDL-C, low HDL-C and abnormal ratio of TC/HDL-C as dependent variables, respectively.Independent variables included age, gender, family income, educational level, physical activity, ever smoking, ever alcohol drinking, BMI, and history of hypertension and T2DM, each of them had a p-value less than 0.05.No factor associated with high 
LDL-C was observed.Overweight or obesity, and history of T2DM were independent factors related to dyslipidemia.Male Yi people had significantly higher risk for dyslipidemia, as compared with that in females.Ever alcohol drinking was associated with dyslipidemia, i.e., low HDL-C and higher ratio of TC/ HDL-C.Compared with light physical activity, people engaged in moderate or heavy labor had lower risk for high TG Fig. 1.Prevalence of dyslipidemia by time before and after emigration in the Yi migrants.There are some limitations in this study that require consideration in interpretation of our findings.One is limited sample size.Another one is that no dietary survey was conducted in the current study, so no dietary factors associated with dyslipidemia could be estimated.", + "Perturbation of lipid metabolismIt is known that the characteristic features of dyslipidemia in humans with type 2 diabetes are high plasma triglyceride concentration, low HDL cholesterol concentrations and increased concentration of small dense LDL-cholesterol particles, while total cholesterol is not increased in patients with diabetes.These lipid changes in these individuals may be due to an increased free fatty acid flux secondary to insulin resistance [49].However, the perturbations observed here in lipid metabolism reflect the state of already treated diabetes.Association studies with blood lipid parameters [Adamski et al., unpublished data, [50]] show that many PC species associate with HDL and total cholesterol levels while PE species associate with triglyceride levels.In this study, we observed lower phosphatidylcholine (PC) and higher phosphatidylethanolamine (PE) concentration in the diabetes group matches the lower HDL and total cholesterol levels and higher triglyceride levels in this group, indicating that these glycerophospholipids may provide a more differentiated view of the shifted lipid homeostasis in patients with diabetes as what can be obtained from the bulk blood 
cholesterol and triglyceride parameters alone.Consistent with this finding, Gall et al. [34] observed reduced levels of multiple acylglycerophosphocholine species that were highly correlated with insulin resistance as measured by the euglycemic clamp.", + "INTRODUCTIONCardiovascular disease (CVD) is the leading cause of morbidity and mortality worldwide (He et al., 2005;Lozano et al., 2013;Murray et al., 2013).Suboptimal lipid levels contribute to the atherosclerotic process, with clinical trials and observational studies demonstrating a strong relation between blood lipid concentrations and CVD (Hokanson and Austin, 1996;LaRosa et al., 1999;Di Angelantonio et al., 2009;Huxley et al., 2011).The heritabilities of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglyceride concentrations have long been established (Friedlander et al., 1997;Malhotra and Wolford, 2005;Luo et al., 2010;Zhang et al., 2010).More recently, genome-wide association studies (GWASs) have made important strides in identifying single nucleotide polymorphisms (SNPs) that contribute to the inter-individual variability in these complex phenotypes (Saxena et al., 2007;Kathiresan et al., 2008;Kooner et al., 2008;Wallace et al., 2008;Willer et al., 2008;Aulchenko et al., 2009;Kathiresan et al., 2009;Teslovich et al., 2010;Waterworth et al., 2010;Kim et al., 2011;Tan et al., 2012).Despite such progress, up to 75% of the variance in lipid levels due to genetic factors remains unexplained (Teslovich et al., 2010).Further research is needed to identify novel variants, genes, and biological pathways with important influences on lipid phenotypes.", + "AACE, Association of Clinical Endocrinologists; ADA, American Diabetes Association; HDL, high-density lipoprotein; IDF, International Diabetes Federation; LDL, low-density lipoprotein; NR, no recommendation; T2DM, type 2 diabetes mellitus; WDF, World Diabetes Foundation. *Individualized goals. 
High-risk or established cardiovascular disease.", + "Although hyperlipidemia is traditionally considered a risk factor for type 2 diabetes (T2D), evidence has emerged from statin trials and candidate gene investigations suggesting that lower LDL cholesterol (LDL-C) increases T2D risk.We thus sought to more comprehensively examine the phenotypic and genotypic relationships of LDL-C with T2D.Using data from the UK Biobank, we found that levels of circulating LDL-C were negatively associated with T2D prevalence (odds ratio 0.41 [95% CI 0.39, 0.43] per mmol/L unit of LDL-C), despite positive associations of circulating LDL-C with HbA 1c and BMI.We then performed the first genome-wide exploration of variants simultaneously associated with lower circulating LDL-C and increased T2D risk, using data on LDL-C from the UK Biobank (n 5 431,167) and the Global Lipids Genetics Consortium (n 5 188,577), and data on T2D from the Diabetes Genetics Replication and Meta-Analysis consortium (n 5 898,130).We identified 31 loci associated with lower circulating LDL-C and increased T2D, capturing several potential mechanisms.Seven of these loci have previously been identified for this dual phenotype, and nine have previously been implicated in nonalcoholic fatty liver disease.These findings extend our current understanding of the higher T2D risk among individuals with low circulating LDL-C and of the underlying mechanisms, including those responsible for the diabetogenic effect of LDL-C-lowering medications.Lipid-lowering medications, in particular from the statin drug class, are effective at lowering levels of circulating LDL-C and rates of adverse cardiovascular events (4) but convey an increased T2D risk (odds ratio [OR] 1.09) (5,6) in a dose-dependent manner (7).This increased risk, however, is outweighed at a population level by the cardiovascular event rate reduction.An increased T2D risk has also been reported in observational studies.Individuals with low levels of circulating 
LDL-C (e.g., ,60 mg/dL) exhibit a higher risk of prevalent and incident T2D (8,9), and among individuals with coronary disease, LDL-C and T2D are inversely related (10).In addition, individuals with familial hypercholesterolemia exhibit a decreased risk of T2D as well as lower BMI and triglyceride (TG) levels (11).", + "HypercholesterolemiaHyperlipidemia in the form of elevated cholesterol is among the most common medical disorder seen in individuals above the age of 40 years.Statins are the most widely obvserved drug class directed at lowering serum cholesterol.Statin drugs act by inhibiting HGM-CoA reductase activity and are among the most effective cholesterol-lowering agents available [156][157][158].However, there is a degree of variability in the response to statins among patients.Genetic variation at the APOE locus has been associated with plasma lipoprotein concentrations in both fasting and postprandial states [159].In this regard, APOE E2 carriers have been reported to be more responsive to lipid-lowering therapies [160].Differential response to statin medications has also been reported in relation to the patient's genotype status of the cholesteryl ester transfer gene [161], -fibrinogen gene [162] and lipoprotein lipase gene [163].Likewise, the ApoE protein has been associated with late-onset and sporadic Alzheimer's disease (AD) [164,165].However, in a study comparing influence of APOE genotype with clinical response to tacrine (acetylcholinesterase inhibitor), approximately two-thirds of the APOE E4 carriers presented ADAS scores that were worse compared with baseline levels [165].Whether ApoE will become clinically relevant in predicting response to AD therapy in the treatment of AD remains to be seen.", + "The Genetics of Hypercholesterolemia and Related Lipid PhenotypesHypercholesterolemia, Lipid Levels, and Their Familial Nature Population-based, long-term prospective studies and large clinical trials of the late 20th century incontrovertibly 
demonstrated that elevated LDL cholesterol (LDL-C) and reduced high-density lipoprotein cholesterol (HDL-C) were CVD risk factors. 88Clinical trials have demonstrated that lowering LDL-C and raising HDL-C can ameliorate risk. 89As this knowledge has been incorporated into clinical practice, mean serum total cholesterol concentrations have dropped in the United States in recent decades; however, 50% of US adults still have total cholesterol concentrations of at least 5.2 mmol/L (200 mg/dL), 90 the level that the National Cholesterol Education Program Expert Panel considers \"borderline-high risk.\" 91ecause at least half of the variation in serum cholesterol and other lipids can be explained by genetic variation, 91,92 unraveling the genetic pathogenesis of hypercholesterolemia and other lipid abnormalities could reap significant public health benefits.For example, identifying the common variants in genes that contribute to LDL-C and HDL-C could provide a knowledge base for the development of novel treatments and/or screening tests to determine who would most benefit from lifestyle modification or treatment for dyslipidemias.Important strides to this end have, in fact, already been made.", + "LipidsPopulation-based, long-term prospective studies and large clinical trials of the late 20th century incontrovertibly demonstrated that elevated LDL-C and reduced high-density lipoprotein cholesterol are CVD risk factors. 
111Because at least half of the variation in serum cholesterol and other lipids can be explained by genetic variation, 112,113 unraveling the genetic pathogenesis of hypercholesterolemia and other lipid abnormalities could reap significant public health benefits by providing a knowledge base for the development of novel treatments or screening tests to determine who would most benefit from lifestyle modification or treatment for dyslipidemias.Important strides to this end have, in fact, already been made.", + "Obesityn=7255 in 2 population studies; genome metabolome integrated network analysis; serum Valcrcel et al 39 Fatty acids and lipoprotein subclasses n=1269 individual twins, including 561 complete pairs; genetic and environmental cause of the associations of serum fatty acids with lipoprotein profile; serum Jelenkovic et al 41 Biomarkers and risk assessment Subclinical atherosclerosis n=1595 young adults; circulating biomarkers for 6-year high carotid intima media thickness, new systemic biomarkers with improved risk stratification for subclinical atherosclerosis in comparison with conventional lipids; serum Wrtz et al 19 Type 1 diabetes mellitus and kidney disease Up to n=3544 patients with type 1 diabetes mellitus; cross-sectional and prospective associations of various systemic metabolites and lipoprotein subclass measures with the severity of diabetic kidney disease and mortality; introducing multiparametric risk assessment of diabetic nephropathy; serum Mkinen et al 36 Mkinen et al 26 All-cause mortality n=17 345 from 2 general population cohorts; 4 circulating biomarkers for 5-year risk of death; biomarker associations with multiple causes of death suggest novel systemic connectivities across seemingly disparate morbidities; improved prediction of the short-term risk of death from all causes above established risk factors; serum and plasma Fischer et al 32", + "IntroductionCoronary artery disease (CAD) is the leading cause of morbidity and mortality 
worldwide.Although our understanding of cardiovascular disease is improving, the underlying mechanisms and the comprehensive and detailed pathogenesis of CAD remain unclear.Disequilibrium of lipid metabolism is a causative factor of predisposition to CAD (Weber and Noels 2011).As a key structural component of functioning lipoproteins such as chylomicrons, very low-density lipoprotein (VLDL), intermediate-density lipoprotein (IDL), and low-density lipoprotein (LDL), apolipoprotein B (APOB) is a crucial protein involved in the metabolism and maintenance of serum cholesterol homeostasis.These functioning lipoproteins participate in the process of transporting cholesterol and triglycerides throughout the circulation.Additionally, APOB binds to the LDL receptor, which mediates LDL degradation.Furthermore, a recent review indicated that subendothelial retention of APOB-containing lipoproteins was responsible for the initiation of atherogenesis (Benn 2009).Therefore, APOB plays an important role in the development of CAD." + ], + [ + "Recent genetic analysis suggests that free radical production can be increased by decoupling electron transport of ubiquinone to O 2 .A missense mutation in C. elegans cytochrome b560 causes a decrease in life span and oxygen hypersensitivity (as well as radiation hypersensitivity, a reasonable pleiotrophy because radiation induces free radicals, which damage DNA).Paradoxically, this is the opposite phenotype from the clk-1 defect in coenzyme Q biosynthesis.A model that explains this is that the cytochrome missense mutation causes a toxic build up of ubisemiquinone (a free radical that can generate superoxide) because the normal pathway for further reduction of singly reduced coenzyme Q is compromised (50).", + "Keywords: ethanol, Coenzyme Q, oxidative stress, hippocampus, mouse models, genetics, genomicsINTRODUCTIONCoenzyme Q (CoQ or ubiquinol) is a lipophilic molecule present in every cell membrane in thebody (Crane, 2001; Turunen et al. 
, 2004). It is best known for its roles as a mitochondrial electron transporter and a potent membrane anti-oxidant (Ernster and Dallner, 1995; Bentinger et al., 2007). CoQ is made up of a benzoquinone ring with an isoprenoid side chain (containing 6-10 units) conserved across species from yeast (as CoQ6), to mice (as CoQ7), to humans (as CoQ10) (Lenaz, 1985).", + "The MT-CYB, a polypeptide with approximately 400 amino acid residues is one of the integral subunit of complex III of electron transport chain. Previously, mutations in MT-CYB gene have been described in uterine tumors (Shaik et al. 2011), cardiomyopathy (Feigenbaum et al. 2006), exercise intolerance (Massie et al. 2010) and histiocytoid cardiomyopathy (Andreu et al. 2000). The MT-CYB T15062C, C15238A, T15378G and C15491G variants identified in the present study were specific to right atrial appendage tissues. These were previously not found to be reported in any of the human diseases in Mitomap database.", + "Mitochondria are indispensable organelles as they are responsible for the production of the majority of ATP in the cell. Most cellular ATP is generated by oxidative phosphorylation (OxPhos), a process through which electrons are extracted from reducing equivalents and transferred through four different respiratory complexes (RCs) present in the mitochondria inner membrane (CI-CIV). Electron transfer is coupled with the generation of a proton gradient through the mitochondrial membrane that drives the phosphorylation of ADP to ATP by the ATP-synthase complex (also known as complex V).", + "Second, the protein product of the CYP24A1 (cytochrome P450, family 24, subfamily A, polypeptide 1; HGNC:2602) gene is responsible for degradation of vitamin D into a physiologically inactive form. Vitamin D was shown to be essential for proper muscle functioning (Endo et al., 2003; Pfeifer et al.
, 2002) and polymorphisms in the vitamin D receptor are associated in humans with changed muscle strength in both genders (Windelinckx et al., 2007); these changes are likely to influence the levels of physical activity.", + "The product of the UQCR gene is the ubiquinol-cytochrome c reductase complex, also called mitochondrial complex III. It functions to form a part of the mitochondrial respiratory chain. It may also act as a binding factor for the iron-sulfur protein. Mitochondrial Complex III is composed of one mitochondrial-encoded subunit (MT-CYB) and ten nuclear-encoded subunits. The complex is located within the mitochondrial inner membrane and plays an important role in biochemical synthesis of ATP. It functions to catalyze electrons to transfer from succinate and nicotinamide adenine dinucleotide linked dehydrogenases to mitochondrially encoded cytochrome b. It also functions to utilize the energy to translocate protons across the membrane 27. Deficiency of isolated complex III has been detected in patients of neuromuscular and nonneuromuscular disorders in both children and adults 28.", + "Figure 3. Mito-nuclear co-adaptation from functional assays. (a) Activities of oxidative phosphorylation (OXPHOS) enzymes in cultured cells with a Mus musculus (Mm) nucleus and mitochondria from other species (i.e. cytonuclear hybrid, or 'cybrid' cells). Species names, abbreviations and divergence times in millions of years (my) are as follows: Mus spretus (Ms) ~2 my, Mus caroli (Mc) ~3 my, Mus dunni (Md) ~4 my, Mus pahari (Mp) ~6 my, Rattus norvegicus (Rn) and Otomys irroratus (Oi) ~12 my. OXPHOS enzyme activities were normalized to control activity (Mm, 100% ± s.d.
), and show normal levels with mitochondria from Ms, Mc and Md. The Mp cybrids showed normal complex I, II and III activity, but a deficiency in complex IV activity. The Rn cybrid showed deficiencies of complex I and III and a partial defect of complex IV, whereas the Oi cybrid showed a marked complex I and IV defect, and a severe complex III defect. (*P<0.05; **P<0.005; reproduced, with permission, from [49].) (b) Cytochrome oxidase complex (COX) activities in the copepod Tigriopus californicus using mitochondria isolated from Santa Cruz (SC) or San Diego (SD) individuals and cytochrome c isolated from either SC (light-green bars) or SD (dark-green bars). Assays at 18°C (bi) or 25°C (bii). (Error bars = s.e.; P<0.0001 for COX-by-CYTC, COX-by-Temperature, and CYTC-by-Temperature interactions. Reproduced, with permission, from [54].)", + "Mitochondria are indispensable organelles as they are responsible for the production of the majority of ATP in the cell. Most cellular ATP is generated by oxidative phosphorylation (OxPhos), a process through which electrons are extracted from reducing equivalents and transferred through four different respiratory complexes (RCs) present in the mitochondria inner membrane (CI-CIV). Electron transfer is coupled with the generation of a proton gradient through the mitochondrial membrane that drives the phosphorylation of ADP to ATP by the ATP-synthase complex (also known as complex V). A full list of affiliations appears at the end of the paper.", + "One is a conversion interaction; its inputs are citratecytosol + CoAcytosol + ATPcytosol and its outputs are acetyl-CoAcytosol + oxaloacetic acidcytosol + ADPcytosol + P04cytosol. The second is a catalytic interaction; its input is ATP citrate lyasecytosol. In another example, to represent the translocation of citrate from the mitochondrion to the cytosol, two entities and a single conversion interaction are used: citratemitochondrion goes to citratecytosol.
Theformation or modication of a protein complex can be represented. For example,ACLA and ACLB are the subunits that compose the enzyme ACL.", + "In 1925, Keilin (188) discovered cytochromes in aerobic cells.He concluded that there are three distinct pigments, which he called cytochromes a, b, and c, and that they underwent oxidation-reduction changes in a determined sequence, which bridges dehydrogenase discovered by Wieland (414) and oxygenase by Warburg (407), leading to the concept of the respiratory chain.NADH + Flavoprotein -+ Cytochromes b +c+a+a3+OzDuring 1940 -1950s extensive studies identified mitochondria as centers of energy metabolism.In 1950-1960s studies on isolated mitochondria had clarified gross structure and bioenergetics of the respiratory chain that produces most of bioenergy in a cell.Mitochondria carry out the tricarboxylic acid cycle and the P-oxidation pathway for fatty acids.These degradative sequences essentially remove hydrogen from metabolic fuels with the release of CO2 and transfer it through coenzymic carrier to the respiratory chain in the mitochondrial inner membrane.The chain passes the electrons sequentially through complex I (NADH dehydrogenase) or complex II (succinate dehydrogenase), coenzyme Q (CoQ), complex III (ubiquinol: cytochrome c oxidoreductase), cytochrome c, and complex IV (cytochrome oxidase) to oxygen to give water, as schematically illustrated in Figure 1.The released energy is used to pump protons out of the mitochondrial inner membrane, creating an electrochemical gradient.The energy stored in this gradient is the driving force for complex V (ATP synthetase), which is also associated with the inner membrane, to condense ADP and Pi to make ATP.From the above mechanism of oxygen reduction, a large quantity of ROS is expected to generate from the genetically defected active sites of cytochrome oxidase and/or cytochrome b, or with too much oxygen supply over enzymic capability to dispose ROS.For example, the 
cytochrome oxidase subunit II is assessed as the binding site of cytochrome c.Hence, lack of the subunit II, which is commonly detected in the patients with mitochondrial myopathy (379), with KSS (241), or with MERRF (218), inevitably results in the decreased binding capacity cytochrome c to the oxidase, namely, the increased Michaelis constant (K,) of cytochrome c, leading to enhanced reduction of the respiratory chain at cytochrome b region and complex I, from where ROS production will explode similar to the ischemia-reperfusion episode of heart.All the patients harboring severe point mutations in the cytochrome oxidase subunit genes or in the cytochrome b gene expressed most severe clinical phenotype (280); for instance, a recipient of heart transplantation at age 7 (283) or a case of fatal infantile cardiomyopathy died at age 1 (291).A greater magnitude of ROS is produced by isolated mitochondria exposed to hyperbaric oxygen (36) or in vivo reperfusion of ischemic heart (204,267).The histochemical examination of biopsied hepatic tron transfer carriers and coupling ATPase, with a direct conversion of promitochondria to respiratory functional organelles (398).Promitochondria look clearly identifiable as normal mitochondria, having a normal outer membrane and an inner membrane with poorly developed cristae.Hence, a correlation between mitochondrial morphology and human aging would be expected not in the gross structure, but in histochemical demonstration of mtDNA encoded enzymes.cells (386) revealed no distinct difference with age in the activity of succinate dehydrogenase that is encoded bY nuclear gene.In contrast, mtDNA-encoded cytochrome oxidase (complex IV) in 1 40 autopsied hearts revealed randomly distributed cardiomyocytes without enzyme activity (254).The expression of the defect was independent of an underlying heart disease, but age was a discriminating factor.The defects occurred sporadically in the second decad .e but were regularly present from the 
sixth decade on.The results indicate that cytochrom .eoxidasedeficient heart muscle cells represent a degenerative lesion associated with cellular aging and may be involved Concerning number and size of mitochondria with aging, Tauchi and Sato (386) carried out an extensive survey of area and circumference of mitochondria of the in the reduction of myocardial contractile ability in senescence.Similarly, histochemical activity of cytochrome oxidase in limb muscle and in diaphragm (255) revealed randomly distributed muscle fibers without the enzyme activity, in contrast to normal histochemical reactivity for succinate dehydrogenase.A histochemical analysis of in diaphragmatic muscles from 49 subjects of different ages (54) demonstrated respiratory failure (cytochrome oxidase negativity) in occasional fibers from the fourth decade on with an -lo-fold increase between the fourth and ninth decade (from 0.16 to 2.85%).It thus had been a matter of great interest to discover why cytochrome oxidase has not been shown to generate significant amounts of such intermediates.On the basis of optical studies of oxy-and peroxy-cytochrome oxidase by Chance et al. 
( 63), it became clear that the intermediates of oxygen reduction remain within the active site of cytochrome oxidase until the final reaction stage of water is achieved, probably for protection against cellular intoxication.From general properties of the mitochondrial generation of HzOz and effect of hyperbaric oxygen, it was postulated (36) that besides the well-known flavin reaction, formation of HZOZ may be due to interaction with an energy-dependent component of the respiratory chain at the cytochrome b level.These findings clearly indicated that the active sites of the complex IV and III, consisting of cytochromes a and b, respectively, play a crucial role not only for the cellular energy production, but also for protection against cellular oxidative damage.Hence, attenuation of the active sites, even low absolute level, could result in serious outcome in cellular viability.This point, however, has been not well recognized by the researchers until recently.", + "Second, the proteinproduct of the CYP24A1 (cytochrome P450, family 24, subfamilyA, polypeptide 1; HGNC:2602) gene is responsible for degradationof vitamin D into a physiologically inactive form. Vitamin D wasshown to be essential for proper muscle functioning [48,49] andpolymorphisms in the vitamin D receptor are associated in humanswith changed muscle strength in both genders [50]; these changesare likely to inuence the levels of physical activity. However, neither Mc3r nor Cyp24a1 contain a known non-synonymous codingSNP between the progenitor strains, A and B6.", + "The cytochrome P450s (CYPs) causes the site-specific oxidization of the cyclic skeleton synthesized by OSCs.", + "Second, the proteinproduct of the CYP24A1 (cytochrome P450, family 24, subfamilyA, polypeptide 1; HGNC:2602) gene is responsible for degradationof vitamin D into a physiologically inactive form. 
Vitamin D was shown to be essential for proper muscle functioning [48,49] and polymorphisms in the vitamin D receptor are associated in humans with changed muscle strength in both genders [50]; these changes are likely to influence the levels of physical activity. However, neither Mc3r nor Cyp24a1 contain a known non-synonymous coding SNP between the progenitor strains, A and B6.", + "Introduction: The mitochondrion of the modern human cell is the product of an ancient symbiosis in which an oxidative bacterium took up residence in the proto-nucleated cell that had developed motility and endocytosis. Following this initial symbiotic event, most of the genes of the mitochondrion were transferred to the nuclear DNA (nDNA) where they now reside, are replicated and transcribed. The resulting nDNA-encoded mitochondrial mRNAs are then translated on cytosolic ribosomes into proteins which are selectively imported into the mitochondrion. This mitochondrial protein import is frequently mediated by an amino terminal targeting peptide which is removed on entrance of the polypeptide into the mitochondrial matrix.", + "The ITCHY library gave rise to variants with improved kcat with the substrate used for selection compared to either of the parental enzymes and additionally showed activity on ethacrynic acid, a compound recognized by neither parental enzyme. This combination of a human with nonhuman enzymes to form active chimeras shows that this method could be used for the humanization of proteins with therapeutic values that show no conserved framework allowing for rational grafting. Chapter 36 Directed Protein Evolution 3.1.9. SHIPREC Cytochromes are proteins that contain heme groups and are responsible for the transport of electrons.", + "One is a conversion interaction; its inputs are citratecytosol + CoAcytosol + ATPcytosol and its outputs are acetyl-CoAcytosol + oxaloacetic acidcytosol + ADPcytosol + P04cytosol. The second is a catalytic interaction; its input is ATP citrate lyasecytosol.
In another example, to represent the translocation of citrate from the mitochondrion to the cytosol, two entities and a single conversion interaction are used: citratemitochondrion goes to citratecytosol. The formation or modification of a protein complex can be represented. For example, ACLA and ACLB are the subunits that compose the enzyme ACL." + ], + [ + "Genetic mapping in mouse strains enhances the power of detecting modifier genes and identifying complex genetic interactions. Genomewide quantitative trait locus (QTL) analysis, as described in more detail below, represents a promising approach to detect genetic variants that are associated with specific phenotypes and interact with each other. In experimental crosses of two (inbred) strains the first generation (F1) of offsprings is genetically heterozygous but equal. Then in the next generation (F2) the strain-specific genetic information is distributed across the genomes of their progeny and each offspring is genetically unique.", + "This contrast can be exploited to identify subregions that underlie the trans-QTLs [67]. SNPs were counted for all four pairs of parental haplotypes, B vs D, B vs H, B vs C, and L vs S, and SNP profiles for the four crosses were compared (figure 6). Qrr1 is a highly polymorphic PLoS Genetics | www.plosgenetics.org 8 November 2008 | Volume 4 | Issue 11 | e1000260 QTL Hotspot on Mouse Distal Chromosome 1 Figure 5. QTL for aminoacyl-tRNA synthetases in distal Qrr1.", + "The traditional approach to QTL mapping is to use two strains that differ maximally in the phenotype as parental strains for genetic crosses, with the following caveats. QTL analysis based on a single cross will most likely reflect only a small portion of the net genetic variation, and QTL detection will be limited to regions where the two progenitor strains have functional polymorphisms.
Data from multiple crosses, or from an HS, will overcome this limitation and can also be used to reduce QTL intervals [5,30].", + "These candidate genes are then sequenced in the two parental inbred strains looking for sequence differences in coding or regulatory regions. After fine mapping the QTL interval and shortening the list of plausible candidate polymorphisms, the major challenge remains proving definitively which nucleotide polymorphism underlies the QTL. The most direct proof would be replacing one strain's allele with another strain's allele (creating a FIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). On the right, the parental, F1 hybrid, and intercross (F2) mouse generations are depicted.", + "One key advantage of looking at transcripts with expression levels linking to behavioral QTL is the potential to make inferences about the causal DNA variants underlying behavioral traits and their mechanisms. In analysis of behavioral traits alone, since linkage disequilibrium is far-ranging in the BXD cross and each marker represents a wide stretch of genomic sequence across which inheritance is correlated, it is difficult to narrow down which gene and DNA sequence variation at a given mapped locus influences the behavioral trait.", + "Interval-specific haplotype analysis: Approximately 97% of the genetic variation between inbred mouse strains is ancestral [22], so regions of identity by descent (IBD) between two strains used to detect a QTL are highly unlikely to contain the causal genetic polymorphism underlying the QTL [28]. For example, a cross between C57BL/6J and A/J mice detected a blood pressure QTL on Chr 1 [7].", + "Interval-specific haplotype analysis: Approximately 97% of the genetic variation between inbred mouse strains is ancestral [22], so regions of identity by descent (IBD) between two strains used to detect a QTL are highly unlikely to contain the causal genetic polymorphism underlying the QTL [28].
For example, a cross between C57BL/6J and A/J mice detected a blood pressure QTL on Chr 1 [7].", + "At present, the BXD panel is composed of 80 different strains that all have been fully genotyped.26 Variation in any quantifiable trait can be associated with the segregation of parental alleles, and linkage genetics can map this variation to quantitative trait loci (QTLs), thereby identifying the genomic region(s) affecting that trait. An overview of the QTL mapping approach is depicted in Figure 2. Classical QTL analysis has permitted the identification of loci that are associated with variation in HSC traits.", + "The progenitor mouse strains should have sufficient variation for the traits of interest and they should be genetically diverse enough to enable genetic mapping (BENNETT et al. 2006; FLINT 2003; GRISEL 2000). The sample size required for the identification of QTL depends largely on the effect size that a QTL contributes to phenotypes on interest. Inference about QTL can be made if one or more genetic markers are over- or underrepresented in the analysed individuals. Genotyping is often done by means of microsatellite markers, which contains mono, di-, tri-, or tetranucleotide tandem repeats flanked by specific sequences (Figure 4a). In general, linking genetic variation with trait variation identifies QTL and a significant linkage of phenotype and genotype suggest that the DNA status helps to determine trait expression. As stated above, mouse QTL studies provide distinct advantages over human studies in the examination of genetic causes of a quantitative trait (e.g. alcoholism), even in the absence of specific hypotheses regarding its aetiology or candidate genes. This comparison gives information about the reliability of the observed genotype information: The more the marker locations differ between the two maps (which signifies variation in marker positions), the higher the possibility of genotyping errors.
QTL mapping was done in several stages to identify loci acting individually and QTL that interacted, either additively or epistatically. To determine individually-acting QTL, a single QTL genome scan was conducted with the function scanone.", + "Importantly, whereas these studies required substantial labor, time, and resources, X-QTL is a quick and easy approach to achieve a comparable level of genetic dissection. The levels of complexity observed here (e.g. 14 loci explaining 70% of the genetic variance for 4-NQO resistance) are still dramatically lower than those seen in for some human traits in GWAS (e.g. 40 loci explaining 5% of the variance for height 2,5). One obvious explanation is the difference in experimental designs (line crosses vs. population association studies), but differences in genetic architectures among species and traits may also contribute.", + "The method uses two pieces of information: mapping data from crosses that involve more than two inbred strains and sequence variants in the progenitor strains within the interval containing a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that any sequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals except at a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.", + "The method uses two pieces of information: mapping data from crosses that involve more than two inbred strains and sequence variants in the progenitor strains within the interval containing a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that any sequence variant is a quantitative trait nucleotide (QTN).
It is not necessary to genotype the animals except at a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.", + "which allows the estimation of the degree of dominance of detected QTLs. but point out that various designs or linkage analysis methods have advantages and disadvantages and need to be chosen for the question at hand. In general, there are two common statistical approaches for detecting and/or locating QTL. Both approaches involve moving along the chromosome and considering data for one or several markers at a time and relating these to the traits of interest.", + "Genotyping all the individual progeny for markers that show allelic variation between the parental strains (either single nucleotide polymorphisms or simple sequence repeats) will allow the detection of associations between trait values and marker genotype, and in this way demonstrate to which set of markers a QTL is linked. To reduce the genotyping effort, selective genotyping of the individuals at the extremes of the phenotypic spectrum can be performed (20,23). Although these three approaches are in general considered to be the best to detect and map QTL, they have several disadvantages for quantitative traits involving HSC.", + "So, how do you go about planning and performing a QTL study, and how do you identify the responsible gene within a QTL that you have identified? Generally, one starts by performing a strain survey to find two parental inbred strains that have a markedly different trait. One can now look up many different traits of inbred mice online at the Mouse Phenome Database (http://phenome.jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). However, the trait you may want to study may not be present in wild type mice, so you may want to cross a mutant (or genetically engineered) strain onto several inbred strains. QTL Theory and Planning: The theory behind the most basic form of QTL mapping is based upon intercrossing two inbred strains.
The mouse genome consists of 19 pairs of autosomes (non sex-determining chromosome) and the X and Y chromosomes. Inthe example shown in Fig. 18.1, we are intercrossing stain A (shown with ablack chromosome pair) with strain B (shown with a white chromosome pair). The initial F1 (filial generation 1) mice are true hybrids, with each individualFrom: Molecular Biomethods Handbook, 2nd Edition.", + "These candidate genes are then sequenced in the two parental inbredstrains looking for sequence dierences in coding or regulatory regions. After ne mapping the QTL interval and shortening the list of plausiblecandidate polymorphisms, the major challenge remains \u0001 proving denitivelywhich nucleotide polymorphism underlies the QTL. The most direct proofwould be replacing one strains allele with another strains allele (creating aFIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). On the right, the parental, F1 hybrid, and intercross (F2) mousegenerations are depicted.", + "QTL mapping studies thenseek to detect the polymorphisms underlying the complex traits of interest byscanning for alleles that co-vary withthe traits. Similar experiments also can be conducted with special derivatives of inbredstrains known as recombinant inbred(RI) mice. These animals are derivedby cross-breeding two or more distinctparental strains (which often divergewidely for the trait of interest), followedby inbreeding of the offspring for severalgenerations (Bailey 1971). Given thecorrect breeding strategy, this method1This is an issue faced by GWASs researchers when classifyingsamples as cases or controls." + ] + ], + "task_id": [ + "CDFC418BD568E839C09656C57808ADA1", + "5DEB102510F48D0BF9C278DC895A8BD1", + "58D6F365917926445960756A26B3FDC8", + "A5DEAEAC441B3BDC65B58EA6923FAE73", + "0CDD1C9219114BB2770C28D541F1060A" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_6.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_6.json new file mode 100644 index 0000000..250aed7 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_6.json @@ -0,0 +1,104 @@ +{ + "question": [ + "how does environment influence fertilisation", + "how does diet impact someone's height", + "which animal has the same number of chromosomes as human", + "what's ensures brains work", + "how do our brains maintain emotions" + ], + "answer": [ + "Environmental factors can influence fertilisation in several ways. Factors such as temperature, pressure, and chemical activity can affect the physiological processes involved in fertilisation. Lifestyle factors like smoking, irradiation, oral contraceptives, and low socioeconomic status can also impact natural conception. Moreover, environmental conditions can affect the quality of maternal care and offspring development. For instance, crowded larval environments can lead to a decline in quality over time, affecting the genetic polymorphism in populations. Additionally, environmental stressors can influence the responsiveness to prenatal stress, potentially affecting fertility.", + "The text does not provide information on how diet impacts someone's height.", + "The Bama miniature pig has the same number of chromosomes as humans.", + "The functioning of the brain is ensured by the complex interplay of neuronal networks, synaptic connections, and the precise regulation of molecular and cellular events. This includes the spatiotemporal regulation of the transcriptome, the signaling dynamics of integrated circuitry, and the balance between activating and inhibiting systems. 
Additionally, the maintenance of genomic integrity in neuronal cells is crucial for the brain's high metabolic activity.", + "Our brains maintain emotions through complex neural systems and processes. The reward-motivation system, which includes the striatum, amygdala, ventral tegmental area, orbitofrontal cortex, ventromedial prefrontal cortex, and anterior cingulate cortex, plays a key role in processing emotional and motivational information. The amygdala, a principal component of the limbic system, controls emotionality. Stress, anxiety, and other emotion-related traits are influenced by multiple genetic, environmental, experiential, and epigenetic factors. Additionally, hormones released by stressful experiences can enhance memory consolidation, further influencing emotional responses." + ], + "contexts": [ + [ + "This may be due to the cost of increased solicitation (reflected in reduced bodyweight for the effort expended) for which we found evidence in our study. Bodyweight is indeednegatively correlated with the level of offspring solicitation (GLM, F1,66 = 20.57, P < 0.001 e.g. day10, r = -0.39, and day 14, r = -0.44; Figure 6 and Supplementary file 2, e). DiscussionOur study of the genetics underlying family interactions has revealed that genes expressed in offspring can indirectly influence the quality of maternal behaviour and thus offspring fitness.", + "UNINTENDED SELECTION IN LABORATORY ENVIRONMENTSIn nature, environmental variables are often highly correlated. For example, high-pressuredeep-sea habitats are generally cold, hydrothermal vents being an extremely rare (butextremely interesting) exception. The solubility of oxygen in water is negatively related totemperature; thus, even oxygen-saturated aquatic environments can have less availableoxygen than colder, subsaturated regions. 
In terrestrial environments, the saturatingvapor pressure of water increases dramatically with temperature, so that a parcel of aircontaining the same absolute quantity of water vapor will have a lower relative humidityas it warms.It hasalso been observed that, over the span of one generation, crowded larval environmentsshow a temporal decline in quality (Borash et al. 1998). Ammonia levels increase overtime, while food and ethanol levels decrease. This complexity appears to be responsiblefor a genetic polymorphism in crowded populations. Very early-developing genotypeshave high feeding rates but low tolerance to ammonia, while late-developing genotypesfeed more slowly and can tolerate higher ammonia levels. There may be many naturalenvironments that exhibit similar patterns of temporal decay (Borash et al. 1998).Temperature is the mostimportant and common physical variable affecting the distribution and abundance oforganisms in nature, as a 10C increase in temperature causes most biochemical reactions to increase in rate two- to threefold. Typical physiological temperatures span040C, although more extreme limits are well known (e.g. , overwintering plants andinsects, hot springs bacteria). Thus, selection experiments using temperature may behighly relevant to the real world. For aquatic organisms, the osmotic strength of the surrounding medium is an important environmental variable.In nature, thermodynamic variables such as temperature, pressure, and chemicalactivity (i.e. , the concentration of salts, hydrogen ions, etc. )differ across habitats. Lifeitself requires input of raw materials from the environment (nutrients, water, ions, etc. )that can then be used to drive physiological processes and make more organisms. 
We consider here two categories of environmental variables that have been used asselective agents in laboratory natural selection experiments.In ahumid environment, higher temperature will increase metabolism but wont increaseevaporative water loss; but in a dry environment, higher temperatures will increasemetabolism and water loss. A call for greater ecological realism is not without precedent. Ecologists have developed sophisticated laboratory facilities that can mimic simple terrestrial ecosystems. AtSilwood Park, for instance, the Ecotron consists of fifteen environmental chambers ableto control and manipulate photoperiod, illumination (balanced spectrum, dawn/dusksimulation), temperature, humidity, rainfall, and even CO2 (Lawton 1996). The chambers house multispecies ecosystems, allowing for complex ecological interactions ofplants and animals.", + "Alternatively, the \"limited oocyte pool\" hypothesis (Warburton, 1989) suggests a more direct effect of antral oocyte pool size on the risk of aneuploidy.The limited number of antral follicles available in older women could lead to the selection of a suboptimal oocyte for ovulation, for example one that is either immature or postmature.Some experimental evidence in other mammals supports the idea that such Sample described in Warburton et al. 
(1986).", + "In the most general terms, three types of environmental factors can influence human health during aging: physical, chemical, and biological. Physical factors include temperature and solar radiation. Chemical factors from natural and biological sources include trace toxins (asbestos, lead, tobacco smoke), but also trace morphogens that can cause subtle abnormalities in development. Biological factors include diet and infectious organisms, but also stress from social interactions. We know little about the concentrations of a vast number of bioactive substances that may be present sporadically in the environment. It seems fair to say that our concept of the environment will evolve rapidly with new technical developments and may come to include multigenerational effects. For example, in the case of diabetes, the maternal physiological state existing before pregnancy can influence fetal growth. Moreover, the ovary acquires its full stock of eggs in the fetus: thus, the egg cell from which all of our cells stem was exposed to the environment of our maternal grandmother (Finch and Loehlin, 1998). The depth of the transgenerational environment is a completely obscure aspect of human experience.", + "Low human fertility. Another area of interest is that of changing fertility patterns in the developed world. Currently ESRC investment is focused upon both the economic and social trajectories of demographic change. There are a number of significant questions that need to be addressed in relation to involuntary infertility however. For example, infertility rates, which appear to be rising, and also the considerable variation that exists in the timing of the ending of the human reproductive span. Researchers need to know whether these factors are genetic, gene/environment interactions, or entirely environmentally induced, and why there is a need for heterogeneity of fecundity in biometric models of fertility, coital frequency or genetics (Hobcraft, 2003).", + "UNINTENDED 
SELECTION IN LABORATORY ENVIRONMENTSIn nature, environmental variables are often highly correlated. For example, high-pressuredeep-sea habitats are generally cold, hydrothermal vents being an extremely rare (butextremely interesting) exception. The solubility of oxygen in water is negatively related totemperature; thus, even oxygen-saturated aquatic environments can have less availableoxygen than colder, subsaturated regions. In terrestrial environments, the saturatingvapor pressure of water increases dramatically with temperature, so that a parcel of aircontaining the same absolute quantity of water vapor will have a lower relative humidityas it warms.It hasalso been observed that, over the span of one generation, crowded larval environmentsshow a temporal decline in quality (Borash et al. 1998). Ammonia levels increase overtime, while food and ethanol levels decrease. This complexity appears to be responsiblefor a genetic polymorphism in crowded populations. Very early-developing genotypeshave high feeding rates but low tolerance to ammonia, while late-developing genotypesfeed more slowly and can tolerate higher ammonia levels. There may be many naturalenvironments that exhibit similar patterns of temporal decay (Borash et al. 1998).Temperature is the mostimportant and common physical variable affecting the distribution and abundance oforganisms in nature, as a 10C increase in temperature causes most biochemical reactions to increase in rate two- to threefold. Typical physiological temperatures span040C, although more extreme limits are well known (e.g. , overwintering plants andinsects, hot springs bacteria). Thus, selection experiments using temperature may behighly relevant to the real world. For aquatic organisms, the osmotic strength of the surrounding medium is an important environmental variable.In nature, thermodynamic variables such as temperature, pressure, and chemicalactivity (i.e. , the concentration of salts, hydrogen ions, etc. 
)differ across habitats. Lifeitself requires input of raw materials from the environment (nutrients, water, ions, etc. )that can then be used to drive physiological processes and make more organisms. We consider here two categories of environmental variables that have been used asselective agents in laboratory natural selection experiments.In ahumid environment, higher temperature will increase metabolism but wont increaseevaporative water loss; but in a dry environment, higher temperatures will increasemetabolism and water loss. A call for greater ecological realism is not without precedent. Ecologists have developed sophisticated laboratory facilities that can mimic simple terrestrial ecosystems. AtSilwood Park, for instance, the Ecotron consists of fifteen environmental chambers ableto control and manipulate photoperiod, illumination (balanced spectrum, dawn/dusksimulation), temperature, humidity, rainfall, and even CO2 (Lawton 1996). The chambers house multispecies ecosystems, allowing for complex ecological interactions ofplants and animals.", + "How do we improve reproductive success and reduce the effects of maternal aging in the natural population as well as in the clinic?Identification of lifestyle factors that affect natural conception is important.Several factors including smoking, irradiation, oral contraceptives and low socioeconomic status (Christianson et al., 2004;Hunter et al., 2013) have been implicated but their molecular basis has yet to be elucidated.Disentangling the factors that influence aneuploidy may provide us with lifestyle interventions to reduce miscarriage rates and may move the J curve to the right and prevent the early truncation of reproductive lifespan caused by aneuploidy.", + "Several lines of evidence further suggest that fetal genetic effects may influence birth timing.First, fetal genes that are paternally imprinted mainly control placental and fetal membrane growth [16] .Because the placenta and fetal membranes likely play a 
role in preterm birth, fetal genes controlling these tissues may also contribute. Additionally, a study comparing the correlation in gestational age between full and half siblings suggests that preterm birth is influenced in part by fetal genetic factors [15]. Lastly, several studies suggest that paternity affects risk for the disorder. For example, several studies indicate that partner changes between pregnancies reduced risk of preterm birth [17,18]; however, changes in paternity may reflect association with long interpregnancy intervals rather than paternity effects per se. Paternal race also has been associated with preterm birth risk. Previous studies observed that preterm birth rates are highest when both parents are Black and remain higher when one parent is Black, whether that parent is the mother or father [19,20], suggesting that fetal race also influences birth timing. However, father's family history of preterm birth has been shown to have only a weak association with risk. While an early study of a Norwegian birth registry demonstrated a correlation between father and children's gestational ages [21], a more recent and extensive study of this registry suggested fathers contributed little to no risk to preterm delivery [22]. Similarly, a recent study [14] suggested that paternal genetics contributed little to gestational age, but could not refute the possible role of maternally-inherited genes expressed in the fetus. Hence, while paternally-inherited genes may contribute little to preterm birth or other disorders, maternally-inherited genes expressed in the fetus may still be important. Together, these data suggests that the fetal genome may contribute to birth timing, motivating further study defining the infant as the proband.", + "Young maternal age at conception may play a role in longer child telomere length but again the biology of these relationships including environmental versus genetic factors need to be better studied (Prescott et al. 
2012). Furthermore, as rate of change reflects both genetic and environmental influences, it is important to determine whether parental rate of change might covary with child rate of change.", + "6.2 Mechanisms of indirect genetic effects on maternal care. 6.2.1 Prenatal effects. Offspring effects on maternal investment and postnatal behaviour begin in utero. The placenta is vital for the development of offspring in eutherian mammals (John and Surani, 2000) by regulating transfer of nutrients from mother to offspring (Constância et al., 2002), which in turn increases maternal food intake (Newbern and Freemark, 2011) and also primes the maternal brain for parenting behaviour (Bridges et al., 1990, 1997).", + "DISCUSSION. Despite the fact that genetic factors that reduce the ability of an individual to reproduce are expected to be under intensive negative selection, reduced fertility is a common health condition in humans (de Kretser, 1997; Agarwal et al., 2015) and an important economic trait in dairy cattle. Previous studies that included measurements of progesterone and pregnancy-specific protein B levels suggested that a large portion of recorded non-conceptions in human and cattle are apparently the result of unrecognized EA (Edmonds et al., 1982; Humblot, 2001; Santos et al., 2004; Carthy et al., 2015). Markers are sorted in descending order of the probability to reject the null hypothesis of no effect on putative early abortion rate. The substitution effects and coefficients of determination are given for each marker for putative early abortion and conception status.", + "by the gross limitations of forward genetic approaches in humans, including limited ability to dissect environmental factors and gene × environment interactions, particularly the contribution of environmental factors in utero (Burmeister, McInnis, & Zöllner, 2008; Henriksen, 
Nordgaard, & Jansson, 2017), and these limitations in turn hinder the development of a mechanistic understanding of aetiology. Here, we [What this study adds: Identifies candidate genes that may moderate the effects of prenatal stress on cocaine responsiveness. Demonstrates sex as a factor that moderates the effects of early life stress on cocaine responsiveness.]" + ], + [ + "Year Period, lbs. Note. Weight changes are shown per increase in daily serving of the food or beverage. All weight changes were adjusted simultaneously for age, baseline body mass index, sleep duration, smoking status, physical activity, television watching, alcohol use, and all the dietary factors shown. Source. Adapted from Mozaffarian et al.6 Women who increased their adherence to a Western pattern (high intakes of red and processed meats, refined grains, sweets or desserts, SSBs, and potatoes) gained the most weight across 8 years of follow-up. In parallel, women who increased their adherence to a prudent pattern (high intakes of fruits, vegetables, whole grains, fish, poultry, and salad dressing) gained the least weight.14 In a recent study of dietary quality characterized by established healthy diet indices (i.e., a Mediterranean-style diet, the Alternate Healthy Eating Index, and the Dietary Approaches to Stop Hypertension diet), higher or increasing adherence to any of these indices was associated with less weight gain in a given 4-year interval through midlife, with greater benefits observed in overweight women.15 (For additional information on dietary assessments in the NHS, please see Hu et al. 
in", + "In one study, vitamin D levels were inversely correlated with BMI (r = -0.22, p = 0.025), suggesting some potential benefits for individuals living with obesity, although this remains to be investigated in a prospective study.14 For individuals living with obesity, an eight-week low-calorie diet programme supplemented with vitamin D led to a significant decrease in inflammatory markers, compared with the same diet with a placebo supplement.15 Obesity is associated with low plasma levels of 25-hydroxy-vitamin D, which can result from vitamin D deficiency.16–19 Therefore, we compared body weights of the mice from each group before and after treatment.", + "In all study cohorts, height and weight were measured wearing light clothing and no shoes, and BMI was calculated as weight divided by the square of the height (kg/m²). Written informed consent was obtained from all participants and the research protocol was approved by the local human research ethics committees.", + "age-adjusted height residuals, cm.", + "explained by genes predisposing to obesity. The National Academy of Sciences-National Research Council (NAS-NRC) World War (WW) II Veteran Twin Registry of White male twin pairs 5 had their height, weight, and BP measured at the induction physical examination, which offers a unique opportunity to investigate the following questions: (1) the relative influence of genetic and environmental factors on height, weight, BMI, and BP (SBP and DBP); (2) the extent to which genetic and environmental influences on SBP and DBP are shared with those influencing BMI; (3) whether BMI has any modifying effect on genetic and environmental influences on SBP and DBP. What Is New? 
In the largest twin cohort with measured (rather than self-reported) weight, height, and blood pressure (BP), we investigated (1) the relative influence of genetic and environmental factors on height, weight, body mass index (BMI), and BP (systolic BP [SBP] and diastolic BP [DBP]); (2) the extent to which genetic and environmental influences on SBP and DBP are shared with those influencing BMI; (3) whether BMI has any modifying effect on genetic and environmental influences on SBP and DBP.", + "Over the last few decades, the adoption in Asian populations of western-style diets of increased fats and carbohydrates and of more sedentary habits has led to a marked increase in obesity (23,24). In particular, a cohort of women from the ongoing Cebu Longitudinal Health and Nutrition Survey (CLHNS) based in the Philippines showed a sixfold increase in prevalence of overweight and obesity associated with nearly two decades of substantial and continuing socioeconomic modernization (also illustrated by an increase in mean weight of 6.8 7.1 kg) (24). The portion of increased prevalence due to the changes in environment vs. 
increased age of these women is unclear.", + "In one study, vitamin D levels were inversely correlated with BMI (r = -0.22, p = 0.025), suggesting some potential benefits for individuals living with obesity, although this remains to be investigated in a prospective study.14 For individuals living with obesity, an eight-week low-calorie diet programme supplemented with vitamin D led to a significant decrease in inflammatory markers, compared with the same diet with a placebo supplement.15 Obesity is associated with low plasma levels of 25-hydroxy-vitamin D, which can result from vitamin D deficiency.16–19 Therefore, we compared body weights of the mice from each group before and after treatment.", + "Results. Table 2 displays anthropometric, biochemical and dietary characteristics of the study sample. The sample included 288 men and 383 women, with a mean age of 40.59 ± 14.79 years. The individuals were on average overweight (BMI = 27.75 ± 7.63) and their dietary fat intake represented 34.3% of daily energy intake. The results of significant interactions (p value ≤ 0.01) are presented in table 3 and suggest that the majority of the SNPs that have been initially associated with T2DM at high levels of statistical significance in GWAS reports did not interact with dietary fat intake to influence either adiposity- or glucose homeostasis-related phenotypes.", + "Child weight parameters and accelerated shortening in childhood. Obesity at different points in early childhood did not correlate with the rate of telomere attrition from 4 to 5 years of age (Table 4). Similarly, abdominal obesity did not correlate with the rate of telomere attrition (p = 0.65) (Table 4).", + "Diet significantly alters lifespan, not weight gain per se. We chose to focus on two time points for body weight analyses: 100 days on diet as a point to evaluate early weight gain on HFD, and 400 days on diet, a stage that is close to the maximal weight on both diets. 
The mean weight of the population plateaus around 500 daysof age and declines thereafter on both diets.The consensus model highlighted a potential causal effect of diet on peak bodyweight measured relatively late in life (500 days), acting through circulating levels of totalNat Metab. Author manuscript; available in PMC 2022 March 22. Roy et al. Page 7Author Manuscriptand high-density lipoprotein cholesterol measured in the old-aged group (Extended Figure2). The Bayesian network analysis, as we structured it, failed to show any causality betweenserum metabolites and variability in lifespan.Early body weight gain associated with reduction in lifespanBody weight measured after 100 days on both diets also correlates negatively with lifespan,after adjusting for strain differences (Figure 3C), a one-gram increase now corresponding toa decrease of 4 days (p<0.0001, r = 0.22). Looking at change in body weight after 100 dayson diet, early body weight gain in response to HFD, but not CD, trended to be negativelycorrelated with lifespan, with a one-gram gain corresponding to a decrease of ~1.5 days (p =0.08, r = 0.06) (Figure 3D).(E) After 400 days on diet (~500 days age), body weight does not predictvariance in lifespan (see line labeled d in Panel A) (p = 0.63, r = 0.01) (n = 447 on CDand HFD). (F) Substantial weight change after prolonged HFD feedingdifference frombaseline to 400 days on diet (blue line)does not predict lifespan (p = 0.26, r = 0.02). (G)Strain-wise changes in median weight after 100 days on diets. Red points represent lifespansof cases on CD and blue points those on HFD. Lines represent median body weight (lefty axis).(C) Body weight after 100 dayson both diets (~260 days age) correlates negatively with lifespan (4 days/g, p <0.001, r =0.3, see line labeled c in Panel A) (n = 626 on CD, 665 on HFD). 
(D) Early weight changein response to HFD (blue line)the difference from baseline after 100 days on dietwasnegatively related with lifespan (4 days/g, p = 0.004, r = 0.1), but this is not true of casesremaining on CD.", + "Her father was 170 cm in height and grew significantly at the age of 14 years.Her mother was 153 cm in height with menarche at the age of 13 years.Her younger brother was 6 years old and maintained a height above the 50th percentile of the population with the same age and gender.There was no family history of diabetes mellitus or short stature.Her father is 172 cm, grew significantly at the age 15 years.Her mother is 158 cm and had menarche at age 14 years.Her elder brother is 22 years old and his height is 180 cm.There is no family history of diabetes mellitus or short stature.", + "Adult height is the result of both growth throughout childhood and loss of height during the aging process.We therefore assessed the influence of age on the 20 robust associations.We did not find any evidence that the effects on height were different in individuals o50 years compared to those aged 450 years (all P 4 0.01; similar results were obtained when we used a cut-off of 40 years of age), or when adjusting for age decade (see Supplementary Table 4 online).This suggests that the effects are predominantly on developmental and childhood growth rather than on processes involved in loss of height, although studies of more young adults and children are needed to confirm this.", + "IntroductionHeight, fat mass, and fat distribution differ substantially between men and women, and these differences may, in part, explain the sex-specific susceptibilities to certain diseases [1,2].A subtle sexual dimorphism in body composition is already apparent during childhood, and emerges more prominently during adolescence as boys start exceeding girls with regard to height and muscle mass, while girls accumulate more fat mass [3][4][5].These considerable differences in anthropometry 
may reflect sex-specific differences in steroid hormone regulation, adipogenesis, lipid storage, muscle metabolism, composition, and contractile speed, skeletal growth and maturation, or lipolysis, and suggest a genetic underpinning [1,2,[6][7][8][9][10]." + ], + [ + "To facilitate comparative research, the Alliance of GenomeResources provides an interface that allows users to searchfor and view genes, functional data, and disease associations from databases of the fly, mouse, rat, yeast, nematode,and zebra fish (http://www.alliancegenome.org, last access:3 January 2018). 3The mouse as a model animal for livestockresearchMice are mammals, sharing 92 to 95 % of protein coding genes with humans and other mammalian livestockspecies, such as cattle (Elsik et al. , 2009), pigs (Humphrayet al. , 2007), sheep (Iannuzzi et al. , 1999), and goats(Schibler et al. , 1998).", + "Sex ChromosomesSeveral studies have revealed high degrees of homology among autosomal chromosomes of bovids with similar banding patterns and gene order among the chromosome arms of cattle, river buffalo, sheep, and goats [14,15].Bovid sex chromosomes, unlike the highly similar autosomal chromosomes, share a slightly more complex rearrangement of sequences [5].Chromosome banding comparisons show that while large portions of these chromosomes are conserved, BBU-X has large blocks of constitutive heterochromatin that BTA-X lacks.Cytogenetic studies representing loci order on these sex chromosomes show complex rearrangements that may have occurred during the karyotype evolution of river buffalo and cattle.BBU-X and BTA-X share the same gene order but a different centromere position, indicating a centromere translocation event with the loss of constitutive heterochromatin in BTA-X, which differentiates it from BBU-X [5].Comparative FISH mapping shows the existence of a similar situation in river buffalo and cattle Y-chromosomes.BTA-Y and BBU-Y differ in an inversion including the centromere and breakage 
points in both arms (pericentric inversion) where BBU-Y is larger than BTA-Y and gains heterochromatin [5].Figure 1: At the cytogenetic level, water buffalo chromosomes can be matched to bovine chromosomes arm for arm.Each biarmed water buffalo chromosome is derived from the fusion of two bovine acrocentrics. (a) This shows the similar banding patterns for bovine chromosomes 29 and 16 to water buffalo chromosome 5 [22], (b) This shows similar banding patterns for bovine chromosome 12 and water buffalo chromosome 13 [22].", + "Second,it is possible to replicate experiments in reference cohorts (also known as referencepanels or reference populations), which is impossible in humans except for in cases ofmonozygotic twins. Third, it is easy to control the environment and model geneenvironment (GXE) interactions in mice [75]. Fourth, despite strong functional effects,the minor allele frequencies are often too low in the human population to attain sufficientstatistical power and significance in large association studies. In contrast, most of murinecrosses have been derived from two inbred strains, and as a result allele frequencies areclose to 0.5.", + "Figure S3.Chromosome karyotype of Bama miniature pig, Related to Figure 3. 
(A) Male and (B) Female.The examination of karyotype of Bama miniature pig by means of peripheral blood lymphocytes culture showed that the diploid chromosomes number was 38, 18 pairs of autosomes and one pair of sex chromosomes in both males (XY) and females (XX).The chromosomes were divided into four groups of a, b, c and d according to the standard of Reading Congress.The karyotype of the autosomes was 10sm+4st+10m+12t.The X chromosome was a metacentric chromosome whose length was between the 8 th and 9 th chromosome, while the Y chromosome was the smallest metacentric chromosome.Comparison of the BM genome with the human, and three common experimental animal (macaque, mouse, and dog), genomes unveiled three gene families, including ARF1 and IGHD, shared between the BM and human genomes but absent in macaque, mouse, and dog genomes (Figure S10).These genes may play roles in Alzheimer disease, pituitary dwarfism, and growth failure (from database ''Dis-GeNET'').The presence of these genes in the BM potentially facilitates research on the abovementioned diseases using this animal model.Moreover, BM has fewer unique genes compared with the Duroc (1,303 versus 1,531) (Figure S10), and the genes specific to BM were significantly enriched in the ''steroid hormone biosynthesis'' Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway (p = 0.00908), which is associated with sex hormone secretion, male testicles development, and rapid maturation of sperm.", + "Mice are evolutionarily relatively close to humans, and their size and short generationtime allows experiments to be set up and run with large enough numbers for statistical signicance. However, other types of model organisms such as zebrash[206] and worm[176] can oer unique advantagesover using rodents. While these organisms have a larger evolutionarydistance to humans, they are cheaper, faster and easier to breed and281.4. 
BIOINFORMATIC OPPORTUNITIEShave transparent bodies that are easy to dissect.[226]have shown that the conservation level between C. elegans and manis sucient to infer gene-gene interactions in man from worm data. Even though the global disease phenotypes may not be at all comparable, the molecular basis may be common (e.g. breast cancer andhigh male incidence of progeny). For example, research on stress response in C. elegans has provided detailed insight into the genetic andmolecular mechanisms underlying complex human diseases [294].", + "Even within mammals, where SrYis the gene responsible for testis determination,monotremata show a different multichromosomal sex determination mechanism (involving5X + 5Y chromosomes), and recently, threespecies in the rodent line have demonstrateda different switching gene to SrY (Graves andPeichel, 2010). The high conservation of sexdetermination within birds and mammals hasprobably to do with their high developmentFunctional Genomic Analysis of Sex Determination and Differentiation in Teleost Fishhomeostasis, including constant body temperature (Barske and Capel, 2008).", + "Drosophila melanogaster 240Xenopus laevis 600", + "Based on the branch-length valuesin Figure 6.1, a comparison of man and mouse has D = 0.63, but adding rat as athird species increases total D to 0.72. When calculating total D for an analysis, eachunique section of branch is counted only once, so rat adds only D = 0.086 to thetotal analysis; considerably more power could be added by using dog instead of, or inaddition to rat, as it would contribute D = 0.244 of unique branch length.", + "5C), being shared with ve other vertebrates,including dog (XM_848628), horse (XM_001916545), cow(NM_001099130), chimpanzee (XM_001150577), and human(NM_002202). This is also true of wild-derived subspecies andspecies of Mus, including Mus musculus musculus, Mus musculuscastaneus, and Mus spretus, which have also been sequenced aspart of the Mouse Genome Project. 
In marked contrast, 14 otherstrains of mice have the less conserved B allele. This suggeststhat the E-box is a regulatory element now widely propagated ina subset of laboratory mice, including B6/J.", + "Animal models have been widely used to study topicsthat could not be easily studied using human populations. In particular, rodent models such as those in mice havecontributed tremendously to our understanding of humangenetics and genomics. We will examine the sex similarityand dierence using data of whole genome gene expressionproles from a well-known mouse population of recombinant inbred (RI) strains derived from C57BL/6J andDBA/2J (BXD), which is the largest RI mouse populationand with remarkable data on whole genome expressionproles and phenotypes [1618].", + "They arenot more complex than mice or more deeply thoughtful than dogsor pigs, but they are incontrovertibly more similar biologically tohumans. This in itself is a reason for using these species in researchdespite the cost and emotional qualms their use engenders. A vocal wing of the legal community, led by Steven Wise,Laurence Tribe, and Alan Dershowitz, is exploring the idea thathumanity quotients can be assigned to life forms as diverse ashoneybees and chimpanzees, arguing that each species should beprovided with scaled legal protection.This scaling applies evento an organ such as brain that is considered unusually large inhumans; the brains of both mice and humans comprise roughly2% of total body mass, and in terms of neuron numbers mice areactually proportionally brainier than humans (approx 75 millionvs 100 billion neurons). Like humans, mice have significant bodysize sexual dimorphism; males typically weigh 3050% more thanfemales. 
The main advantage of small size is that a set of 810 animalscan be maintained in good health in a shoebox-sized cage.", + "Marsupial Sex Chromosomes and Sex DeterminationComparative sequencing, gene mapping, and chromosome painting between marsupials and eutherians, along with comparison with a chicken out-group, have revealed that the human X chromosome is made up of two ancient gene blocks, both of which are autosomal in chickens (Figure 6).One block, representing the marsupial X, is shared with approximately two-thirds of the eutherian X.A second block is also autosomal in marsupials and so must represent a region added to the X in a eutherian ancestor (44).Mapping the same genes in elephants shows that the fusion point of the ancient and added region corresponds to the centromere (109), suggesting an original Robertsonian fusion 160-105 Mya followed by a centric shift in the ancestor of non-afrotherian lineages.The lack of homology between the mammal XY and bird ZW sex chromosomes (81) and between the mammal XY and the varied systems of reptiles, frogs, and fish implies that the mammal XY system (and the SRY gene) arose later than 310 Mya (46).A much later date emerged from the surprising findings that the two gene blocks that make up the human XY pair are both autosomal in monotremes (platypus and echidna) and that the monotreme XY complex has homology instead to the bird ZW (133).This dates the emergence of the therian sex chromosomes and SRY at 166-160 Mya.Marsupials, like eutherians, normally have an XX female:XY male chromosomal sex determination or some simple variant [X 1 X 2 Y and XY 1 Y 2 systems, in which an autosome has become fused to the X or Y chromosome, are quite common in marsupials (49)].The X chromosome is smaller than the highly conserved 5% of the eutherian genome, and the basic Y chromosome is minute.The X and Y chromosomes do not undergo homologous pairing over a pseudoautosomal region in marsupials, which is a requirement for fertility 
in mice and humans. Instead, pairing makes use of a proteinaceous basal plate to which the X and Y are attached during meiosis and from which they segregate (33).", + "Based on the branch-length values in Figure 6.1, a comparison of man and mouse has D = 0.63, but adding rat as a third species increases total D to 0.72. When calculating total D for an analysis, each unique section of branch is counted only once, so rat adds only D = 0.086 to the total analysis; considerably more power could be added by using dog instead of, or in addition to rat, as it would contribute D = 0.244 of unique branch length.", + "Taking the most conservative estimate, Comparison of genome wide studies in vertebrates and flies" + ], + [ + "The neuronal networks formed by this large number of massively interconnected neurons generate complex spatiotemporal patterns of neuronal activity that require coordinated activity across large populations of neurons using both short- and long-range synaptic connections. On an even larger scale, the mammalian brain is composed of many structurally diverse networks, including the neocortex, thalamus, basal ganglia etc. Healthy brains are characterized by the continuous generation of behavior-related spatiotemporal activity patterns that propagate across multiple brain areas.", + "To retain genes that are more active when the brain is still undergoing core maturational processes in humans, we used BrainSpan to select autosomal transcripts expressed at least 1.5-fold more during the early postnatal development (0–18 months after birth) than in adulthood (20–40 years of age), with the final networks consisting of 154 genes in the PFC (see Table S4) and 72 genes in the NAcc (see Table S5).", + "Heath: Do you have a hypothesized mechanism by which you get from early separation to altered response when you are genetically vulnerable? 
Battaglia: Yes, one mechanism might have to do with the cholinergic system:intense stress causes some alternative splicing of acetylcholine esterase (Kaufer etal 1998). This has been found to be protective for the brain. One of our hypotheseshowever is that the same mechanism which may be protective for the higher braincan be a risk factor for the lower brain, for instance the medulla, because it mayenhance sensitivity to suffocatory stimuli (Battaglia & Ogliari 2005).", + "The rapidly expanding set of inference engines currently has 5 interrelated modules: BrainParts (gray matter regions, major fiber tracts, and ventricles),Cell Types, Molecules, Connections (between regions and celltypes), and Relations (between parts identified different neuroanatomical atlases). Nature Precedings : doi:10.1038/npre.2009.4000.1 : Posted 23 Nov 2009A genome-wide, 3-dimensional map of gene expression inthe adult mouse brain, the ABA reveals the expression patterns of approximately 20 000 genes throughout the adultmouse brain to the cellular level.", + "Furthermore, it was suggested that thebrain is prone to hyperactivity, and this hyperactive tone is down-regulated by brainregions and neurotransmitter systems that decrease the PA (Rowland 1998; Viggiano2008). It was proposed that the reticular activating system is responsible for arousalwhile cerebral cortex is mostly inhibitory (Rowland 1998). All in all, the interplay ofthe activating and inhibiting systems serves the purpose of maintaining so calledsensoristasis (term created by Schultz in 1965) which is an optimal level of sensorystimulation of the nervous system for each individual (Rowland 1998).", + "In the brain, more than any other organ, function followsform, he says. 
Cellular resolution of expression patterns will provenecessary to uncover as yet unknown relationships betweencircuitry, cell type, and gene expression in the brain, saysArthur Toga, a neuroscientist at the University of California,Los Angeles, and Allen Brain Atlas advisor. Ed Lein, aneuroscientist at the Allen Brain Institute, thinks thatmapping at the cellular scale will also redene anatomy. Traditionally, neuroanatomists have delineated brain regionspretty much by eye, identifying clusters of cells and patternsof connections that look the same.", + "Sensory, motor, and cognitive functionsrely on the signaling dynamics of integrated circuitry that isestablished during brain development. The CNS develops in asequence of events characterized by an initial stage of neurogenesis and migration prenatally, followed by an extended lateprenatal and postnatal period of neuronal and glial differentiation, establishment of synaptic connections, and renement ofthe integrated circuits [Levitt, 2003]. In humans, neurogenesisstarts at 8 weeks of gestation and, in some areas of the brain, suchas the frontal lobe, maturation of CNS circuitry continues wellinto adolescence.", + "To retaingenes that are more active when the brain is still undergoing corematurational processes in humans, we used BrainSpan to select autosomaltranscripts expressed at least 1.5-fold more during the early postnataldevelopment (018 months after birth) than in adulthood (2040 years ofage), with the nal networks consisting of 154 genes in the PFC (seeTable S4) and 72 genes in the NAcc (see Table S5).", + "Because brain tissue from individuals in these cohorts is not accessible during their lives, many studies use postmortem tissue to identify molecules (e.g. , RNA, protein) that are associatedwith performance on cognitive tests prior to death. 
However, in these studies, brain regions used to examine the molecularmediators of resilience are typically selected based on already-known involvement in disease risk (e.g. , hippocampus andprefrontal cortex). It is possible that molecular changes that confer resilience originate in brain regions outside those classically affected in AD and are thus not typically selected for analyses.In addition, mechanisms and molecules important forresilience are likely expressed and act well before the time at which these tissues can be accessed. This lack of access tobrain tissue early in the disease course is a signicant barrier to understanding the molecules most closely associated withthe onset of resilience (and/or dementia). In addition, the ability to test mechanistic hypotheses is generally limited in humanpopulations, as the identication of molecules associated with cognitive outcomes is largely correlative [81]. Other potentialcaveats and considerations, such as the limitations associated with retrospective group assignments, have been highlighted elsewhere [81].", + "It is a remarkablycomplex organ that integrates electrochemical signals, it receives and coordinatesactivities throughout the entire body. Despite the nonreplicative properties of thenervous system cells, it is proposed that through evolutionary pressures, thiscompartment has acquired distinct processes and mechanisms to minimizeneurodegeneration. One potential source of damage comes from our immunesystem, which has the capacity to scan the CNS and periphery for the presence offoreign antigens. The immune system is equipped with numerous effectorsmechanisms and can greatly alter the homeostasis and function of the CNS.", + "In the brain, more than any other organ, function followsform, he says. 
Cellular resolution of expression patterns will provenecessary to uncover as yet unknown relationships betweencircuitry, cell type, and gene expression in the brain, saysArthur Toga, a neuroscientist at the University of California,Los Angeles, and Allen Brain Atlas advisor. Ed Lein, aneuroscientist at the Allen Brain Institute, thinks thatmapping at the cellular scale will also redene anatomy. Traditionally, neuroanatomists have delineated brain regionspretty much by eye, identifying clusters of cells and patternsof connections that look the same.", + "The brain is the master organ of the body.It controls all other functions either directly or indirectly.The brain has two major types of cells, the neurons and glial cells.It is known that neurons, once differentiated are nondividing, and even in glial cells only a small fraction of them are dividing in adult and old ages (Korr, 1980).Thus it can be considered that most of the cells in an adult brain are postmitotic.Further, in the majority of the species the final number of differentiated neurons is reached very early in life (Dobbing, 1971) and therefore a neuron's life-span is almost equal to that of the whole animal.Considering the high metabolic activity in a neuronal cell, it must be of great necessity and importance to maintain the genomic integrity over a long period of time in order to keep up the fidelity of the cellular processes.Thus the processes of genomic damage and its repair assume special significance in nervous tissue.", + "Because brain tissue from individuals in these cohorts is not accessible during their lives, many studies use postmortem tissue to identify molecules (e.g. , RNA, protein) that are associatedwith performance on cognitive tests prior to death. However, in these studies, brain regions used to examine the molecularmediators of resilience are typically selected based on already-known involvement in disease risk (e.g. , hippocampus andprefrontal cortex). 
It is possible that molecular changes that confer resilience originate in brain regions outside those classically affected in AD and are thus not typically selected for analyses.", + "The brain is responsible for cognition, behavior, and much of what makes us uniquely human.The development of the brain is a highly complex process, and this process is reliant on precise regulation of molecular and cellular events grounded in the spatiotemporal regulation of the transcriptome.Disruption of this regulation can lead to neuropsychiatric disorders.RATIONALE: The regulatory, epigenomic, and transcriptomic features of the human brain have not been comprehensively compiled across time, regions, or cell types.Understanding the etiology of neuropsychiatric disorders requires sights into human development and disease.The brain is responsible for cognition, behavior, and much of what makes us uniquely human.The development of the brain is a highly complex process, and this process is reliant on precise regulation of molecular and cellular events grounded in the spatiotemporal regulation of the transcriptome.Disruption of this regulation can lead to neuropsychiatric disorders.RATIONALE: The regulatory, epigenomic, and transcriptomic features of the human brain have not been comprehensively compiled across time, regions, or cell types.Understanding the etiology of neuropsychiatric disorders requires sights into human development and disease.", + "Nonetheless, several CNS measures,including behavioral phenotypes, correlated with both brainand blood 5-HT traits. One conspicuous correlation with 5-HTturnover was found for brain weight, hippocampal weightand hippocampal stem cell number. Studies that manipulate5-HT turnover independent of CNS 5-HT turnover are neededto determine whether these correlations reflect a result of 5HT homeostasis per se vs. a shared determinant that may inearly life impact the trajectory of brain development (Bonninet al. 2007; Janusonis et al. 
2004; Mazer et al. 1997).", + "In vitro,for example, blocking extrasynaptic NMDAtype glutamate receptors prevents amyloid-induced DSBs in neuronal cultures. In theaggregate, these data are all well controlledand very convincing. And yet, we find ourselves asking, How canthis possibly be? Do the neurons of our brainreally do serious damage to their genome everytime we execute a mental task? If 2 h of thinkingis enough to trigger DSBs in even a small percentage of our nerve cells, then each cell mustput its genome in jeopardy many times over thecourse of a year.", + "Nonetheless, several CNS measures,including behavioral phenotypes, correlated with both brainand blood 5-HT traits. One conspicuous correlation with 5-HTturnover was found for brain weight, hippocampal weightand hippocampal stem cell number. Studies that manipulate5-HT turnover independent of CNS 5-HT turnover are neededto determine whether these correlations reflect a result of 5HT homeostasis per se vs. a shared determinant that may inearly life impact the trajectory of brain development (Bonninet al. 2007; Janusonis et al. 2004; Mazer et al. 1997).", + "Although neuroimaging techniques unveil certain facets of CNSstructure and function, the human brains molecular profile is only attainable throughexamination of postmortem tissue. Many of the characteristics of the human brain may notbe conserved across species, emphasizing the inherent value of postmortem human tissue forinterrogating neuropsychiatric disorders (Hynd et al. 2003; Sutherland et al. 2016). Further,high-resolution maps for gene expression of the human brain across developmental periods,combined with separate genetic and proteomic datasets, can reveal potential neurobiologicalpathways and circuits underlying disease (Parikshak et al. 2013; Willsey et al. 2013). Addict Biol." 
+ ], + [ + "Interestingly, fMRI studies in which personalized stressful imagery relating to pastexperiences was shown to healthy subjects, revealed significant increases in activationof the medial prefrontal cortex (mPFC), anterior cingulate, striatum, substantia nigra,thalamus, caudate, putamen, and hippocampus (Sinha, 2004), indicating a role for thestriatal-limbic-prefrontal circuits in response to emotional distress. In support of this,13eQTL mapping studies in the Miles laboratory implicate genes within the nucleusaccumbens (NAc) as potential modulators of anxiolytic-like phenotypes following acuteethanol administration (Putman, 2008; Wolen, 2012).", + "Proc Natl Acad Sci USA103:780785123242LeDoux JE (2000) Emotion circuits in the brain. Annu RevNeurosci 23:155184Lee GP, Meador KJ, Loring DW, Allison JD, Brown WS, PaulLK, Pillai JJ, Lavin TB (2004) Neural substrates of emotionas revealed by functional magnetic resonance imaging. CogBehav Neurol 17:917Li CX, Wei X, Lu L, Peirce JL Williams RW, Waters RS (2005)Genetic analysis of barrel field size in the first somatosensory area (S1) in inbred and recombinant inbred strains ofmice.J Neurosci 21:35033514McGaugh JL (2004) The amygdala modulates the consolidationof memories of emotionally arousing experiences. AnnuRev Neurosci 27:128McIntyre CK, Power AE, Roozendaal B, McGaugh JL (2003)Role of the basolateral amygdala in memory consolidation. Ann NY Acad Sci 985:273293Meyer-Lindenberg A, Buckholtz JW, Kolachana B, Hariri AR,Pezawas L, Blasi G, Wabnitz A, Honea R, Verchinski B,Callicott JH, Egan M, Mattay V, Weinberger DR (2006)Neural mechanisms of genetic risk for impulsivity andviolence in humans.Somatosens Mol Res 22:141150Lin CH, Hansen S, Wang Z, Storm DR, Tapscott SJ, Olson JM(2005) The dosage of the neuroD2 transcription factorregulates amygdala development and emotional learning. 
Proc Natl Acad Sci USA 102:1487714882Ling EA, Paterson JA, Privat A, Mori S, Leblond CP (1973)Investigation of glial cells in semithin sections. I. Identification of glial cells in the brain of young rats. J CompNeurol 149:4371Lu L, Airey DC, Williams RW (2001) Complex trait analysis ofthe hippocampus: mapping and biometric analysis of twonovel gene loci with specific effects on hippocampal structure in mice.", + "When attachments form in early infancy, activation and closer links are observed among neurobiological brain systems underpinning affiliation, reward, and stress management (Ulmer-Yaniv et al., 2016).Functional magnetic resonance imaging (fMRI) has been used to investigate the brain activity associated with humans' various social attachments (Feldman, 2017).These fMRIs provide evidence for three main inter-connected neural systems that integrate to establish, maintain, and enhance our attachments to others, including the rewardmotivation system (Berridge and Robinson, 1998), the embodied simulation/empathy network (Gallese, 2014), and mentalizing processes (Frith and Frith, 2006).The reward-motivation system comprises the striatum (nucleus accumbens, caudate, and putamen), amygdala, ventral tegmental area, orbitofrontal cortex, ventromedial prefrontal cortex, and anterior cingulate cortex (ACC).The existence of convergent projections from the cortex to the striatum, along with hippocampal and amygdala-striatal projections, places the striatum as a central entry port for processing emotional/motivational information supporting human attachment (Haber and Knutson, 2010;Robinson et al., 2012;Pauli et al., 2016).The reward-motivation system employs DA and oxytocin rich pathways (Schultz, 2000;Berridge et al., 2009;Haber and Knutson, 2010) and supports multiple attachment-related motivational behaviors, such as social orienting, social seeking, and maintaining contact (Acevedo et al., 2012;Chevallier et al., 2012).Attachments have an intrinsic motivational 
value that combine immediate hedonic responses with approach motivation, goal-directed behavior, and learning (Berridge and Robinson, 1998).", + "Genetics of emotional regulation: therole of the serotonin transporter in neural function. Trends CognSci 10: 182191. Hefner K, Holmes A (2007). Ontogeny of fear-, anxiety- anddepression-related behavior across adolescence in C57BL/6Jmice. Behav Brain Res 176: 210215. Herman JP, Ostrander MM, Mueller NK, Figueiredo H (2005). Limbic system mechanisms of stress regulation: hypothalamopituitaryadrenocortical axis. Prog Neuropsychopharmacol BiolPsychiatry 29: 12011213. Herry C, Bach DR, Esposito F, Di Salle F, Perrig WJ, Scheffler Ket al (2007). Processing of temporal unpredictability in humanand animal amygdala. J Neurosci 27: 59585966.Nat Neurosci 8: 828834. Phelps EA, LeDoux JE (2005). Contributions of the amygdala toemotion processing: from animal models to human behavior. Neuron 48: 175187. Porsolt RD, Bertin A, Jalfre M (1978). Behavioural despair in ratsand mice: strain differences and the effects of imipramine. Eur JPharmacol 51: 291294. Quirk GJ, Mueller D (2007). Neural mechanisms of extinctionlearning and retrieval. Neuropsychopharmacology 33: 5672. Radley JJ, Rocher AB, Miller M, Janssen WG, Liston C, Hof PR et al(2006). Repeated stress induces dendritic spine loss in the ratmedial prefrontal cortex. Cereb Cortex 16: 313320.", + "Like most, if not all, neural phenotypes, stress,anxiety, and other emotion-related traits are extremely complex and are defined by theinterplay of multiple genetic, environmental, experiential, and epigenetic factors. The work presented in this dissertation is a multi-scalar, integrative analysis of themolecular and neuroanatomic substrates that underlie emotion-related behavior. Theamygdala is a principle component of the limbic system that controls emotionality. 
UsingBXD recombinant inbred (RI) mice as model organisms, the anatomy and cellulararchitecture of the amygdalaspecifically, the basolateral amygdala (BLA)wasexamined to assess the level of structural variation in this brain region.To this end, weemployed a repeated restraint stress protocol that has been shown to cause significantalterations in the morphology of principal neurons in three key corticolimbic regionsknown to mediate the effects of stress: medial prefrontal cortex (mPFC), amygdala andhippocampus (e.g. , Vyas et al. , 2002; Vyas et al. , 2003; Govindarajan et al. , 2006;Bennur et al. , 2007; Shansky et al. , 2009).Sensory inputs with emotional components aretransmitted to the amygdala where they are processed and further relayed to other regionsto modulate autonomic and behavioral responses, and to form emotional memories(LeDoux, 2000; Rosen, 2004). As a neural substrate of emotionality, manyneuropsychiatric disorders have been associated with structural changes in the amygdala. Individuals with genetically predisposed susceptibility to anxiety and depression havebeen shown to have higher amygdala reactivity and smaller amygdala volumes (Pezawaset al. , 2005). Structural changes in the amygdala have also been associated with traumaticstress disorder, bipolar disorder, and aggressive behavior (Hayek et al.These studies have uncovered complex geneticsunderlying behavior with multiple loci modulating stress responsiveness, fear, andanxiety levels in mice (Willis-Owen & Flint, 2006). Significant concordance betweenhuman and mouse traits also exists, e.g. , the gene Rgs2 was shown to modulate anxiety inmice, and was subsequently found to be associated with anxiety in human (Yalcin et al. ,2004; Smoller et al. , 2008). 
1.3Neuroanatomy of Stress and Emotion-Related BehaviorThe amygdala is a neural structure central to the experience of emotions andexpression of emotional behavior.", + "Alterations in BDNF expression werealso found in response to emotions such asanxiety or fear in rodents (Rasmusson et al. ,2002), and it has been shown that BDNF affects emotional preferences in humans (Gasicet al. , 2009). It remains to be determined howthe stress itself or the associated behavioral responses contribute to mediating these changes. From this perspective, sh, whose brain organization is very similar to that of higher vertebrates, but is generally considered free ofemotional reactions, is interesting as an animalmodel of stress.", + "enhance our attachments to others, including the rewardmotivation system (Berridge and Robinson, 1998), the embodiedsimulation/empathy network (Gallese, 2014), and mentalizingprocesses (Frith and Frith, 2006). The reward-motivationsystem comprises the striatum (nucleus accumbens, caudate,and putamen), amygdala, ventral tegmental area, orbitofrontalcortex, ventromedial prefrontal cortex, and anterior cingulatecortex (ACC). The existence of convergent projections fromthe cortex to the striatum, along with hippocampal andamygdala-striatal projections, places the striatum as a centralentry port for processing emotional/motivational informationsupporting human attachment (Haber and Knutson, 2010;Robinson et al. , 2012; Pauli et al. , 2016).We focused on theprefrontal cortex due to its association with cognitive, emotionalfunctions, impulse control, and adaptive behaviors (Morecraftand Yeterian, 2002; Bechara and Van Der Linden, 2005), andthe striatum for its involvement in the reward motivation systemand potential to relate to attachment formation in infancyspecifically (Feldman, 2017). 
Convergent projections from thecortex to the striatum, along with hippocampal and amygdalastriatal projections, places the striatum as a central entry portfor processing emotional/motivational information supportinghuman attachments (Haber and Knutson, 2010; Robinson et al. ,2012; Pauli et al. , 2016; Feldman, 2017).", + "Central nervous system regions that regulate mood,emotion, feeding and reward are prominent sites of 5-HTsynthesis and release (Steinbusch 1981). As such, a numberof disorders have been reported to display alterations in CNS5-HT homeostasis, including anxiety, depression, obsessivecompulsive disorder and addiction (Barondes 1994). Althougha subject of significant debate (Risch et al. 2009), biochemicaland genetic evidence continues to drive consideration thatrisk for depression in some individuals may be linked to alimited capacity for normal brain 5-HT signaling.", + "enhance our attachments to others, including the rewardmotivation system (Berridge and Robinson, 1998), the embodiedsimulation/empathy network (Gallese, 2014), and mentalizingprocesses (Frith and Frith, 2006). The reward-motivationsystem comprises the striatum (nucleus accumbens, caudate,and putamen), amygdala, ventral tegmental area, orbitofrontalcortex, ventromedial prefrontal cortex, and anterior cingulatecortex (ACC). The existence of convergent projections fromthe cortex to the striatum, along with hippocampal andamygdala-striatal projections, places the striatum as a centralentry port for processing emotional/motivational informationsupporting human attachment (Haber and Knutson, 2010;Robinson et al. , 2012; Pauli et al. , 2016).We focused on theprefrontal cortex due to its association with cognitive, emotionalfunctions, impulse control, and adaptive behaviors (Morecraftand Yeterian, 2002; Bechara and Van Der Linden, 2005), andthe striatum for its involvement in the reward motivation systemand potential to relate to attachment formation in infancyspecifically (Feldman, 2017). 
Convergent projections from thecortex to the striatum, along with hippocampal and amygdalastriatal projections, places the striatum as a central entry portfor processing emotional/motivational information supportinghuman attachments (Haber and Knutson, 2010; Robinson et al. ,2012; Pauli et al. , 2016; Feldman, 2017).", + "(2010)Genetic control over the resting brain. Proc Natl Acad Sci U S A 107, 12231228. Glasser, M.F. , Smith, S.M. , Marcus, D.S. , Andersson, J.L.R. , Auerbach, E.J. , Behrens, T.E.J. ,Coalson, T.S. , Harms, M.P. , Jenkinson, M., Moeller, S., Robinson, E.C. , Sotiropoulos, S.N. ,Xu, J., Yacoub, E., Ugurbil, K. & Van Essen, D.C. (2016) The Human Connectome Projectsneuroimaging approach. Nat Neurosci 19, 11751187. Gracia-Rubio, I., Moscoso-Castro, M., Pozo, O.J. , Marcos, J., Nadal, R. & Valverde, O. (2016)Maternal separation induces neuroinflammation and long-lasting emotional alterations inmice.", + "He and othersconsidered that hormones released by stressful experiences couldenhance memory consolidation, indicating particularly the hormonesepinephrine and glucocorticoids as memory modulators (McGaugh &Roozendaal, 2009). It was suggested that several brain regions work insynergy to assure that emotionally significant experiences are wellremembered, and this could prepare the organism for future experiences by inducing long-term behavioral changes (Bahtiyar, Karaca,Henckens, & Roozendaal, 2020; McGaugh, 2013).", + "Central nervous system regions that regulate mood,emotion, feeding and reward are prominent sites of 5-HTsynthesis and release (Steinbusch 1981). As such, a numberof disorders have been reported to display alterations in CNS5-HT homeostasis, including anxiety, depression, obsessivecompulsive disorder and addiction (Barondes 1994). Althougha subject of significant debate (Risch et al. 
2009), biochemicaland genetic evidence continues to drive consideration thatrisk for depression in some individuals may be linked to alimited capacity for normal brain 5-HT signaling." + ] + ], + "task_id": [ + "6D733CABEB70E4DBF150EAAFFED6C973", + "6267E2FEFF0332F88C2294C8F32C1FC1", + "3FFA45D7124495B37B6F7F2B7B780AF3", + "499C63633BB95DE93DC3A89615496443", + "405240F6F75C3927C1088287E19920AD" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_7.json b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_7.json new file mode 100644 index 0000000..53a2a37 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_citizenscientist_general_7.json @@ -0,0 +1,40 @@ +{ + "question": [ + "what hormones do our brains release during stressful experiences?", + "what is the use of corticosterone?" + ], + "answer": [ + "During stressful experiences, our brains release hormones such as epinephrine, glucocorticoids, and corticosterone. The hypothalamic-pituitary-adrenal (HPA) axis also plays a significant role in this process, stimulating the release of these hormones.", + "Corticosterone, a type of glucocorticoid, is used in the body's response to stress. It initiates metabolic and physiological responses that help the body cope with stressors. It also provides negative feedback to inhibit activity in the hypothalamic-pituitary-adrenal (HPA) axis, which helps to dampen stress-activated defense reactions, including immune responses. Additionally, corticosterone can influence the expression of certain cytokines, potentially having anti-inflammatory effects." + ], + "contexts": [ + [ + "40. Roozendaal B, McEwen BS, Chattarji S. Stress, memory and theamygdala. Nat Rev Neurosci 10: 426 433, 2009. 41. Sala M, Perez J, Soloff P, Ucelli di Nemi S, Caverzasi E, Soares JC,Brambilla P. Stress and hippocampal abnormalities in psychiatric disorders. Eur Neuropsychopharmacol 14: 393405, 2004. 42. Shiba T, Kakuda S, Ishiguro M, Morita I, Oka S, Kawaski T,Wakatsuki S, Kato R. Crystal structure of GlcAT-S, a human glucuronyltransferase, involved in the biosynthesis of the HNK-1 carbohydrateepitope. Proteins 65: 499 508, 2006. 43.", + "Sci. 132.ter Heegde, F., De Rijk, R.H., Vinkers, C.H. , 2015. The brain mineralocorticoid receptorand stress resilience. Psychoneuroendocrinology 52, 92110. 
ter Horst, J.P., van der Mark, M.H. , Arp, M., Berger, S., de Kloet, E.R. , Oitzl, M.S. , 2012. Stress or no stress: mineralocorticoid receptors in the forebrain regulate behavioraladaptation. Neurobiol. Learn. Mem. 98, 3340. van Leeuwen, N., Bellingrath, S., de Kloet, E.R. , Zitman, F.G., DeRijk, R.H., Kudielka,B.M. , Wust, S., 2011. Human mineralocorticoid receptor (MR) gene haplotypesmodulate MR expression and transactivation: implication for the stress response. Psychoneuroendocrinology 36, 699709. Waterham, H.R.", + "Acute stress and dexamethasone rapidly increase hippocampal somatostatin synthesis and release fromthe dentate gyrus hilus. Hippocampus 11, 469477. Aubry, J.-M., Bartanusz, V., Jezova, D., Belin, D., Kiss, J.Z. , 1999. Single stress induceslong-lasting elevations in vasopressin mRNA levels in CRF hypophysiotrophic neurones, but repeated stress is required to modify AVP immunoreactivity. J.Neuroendocrinol. 11, 377384. Baker, J.A. , Li, J., Zhou, D., Yang, M., Cook, M.N. , Jones, B.C. , Mulligan, M.K. , Hamre,K.M. , Lu, L., 2017. Analyses of dierentially expressed genes after exposure to acutestress, acute ethanol, or a combination of both in mice.", + "Chronic unpredictable stress beforepregnancy reduce the expression of brain-derived neurotrophic factor and N-methyl-D-aspartate receptor inhippocampus of offspring rats associated with impairment of memory. Neurochem Res 35, 1038-49. Konig, P., Dedio, J., Oess, S., Papadakis, T., Fischer, A., Muller-Esterl, W. and Kummer, W., 2005. NOSIP and itsTinteracting protein, eNOS, in the rat trachea and lung. J Histochem Cytochem 53, 155-64. Longo, A., Oberto, A., Mele, P., Mattiello, L., Pisu, M.G. , Palanza, P., Serra, M. and Eva, C., 2015. NPY-Y1IPcoexpressed with NPY-Y5 receptors modulate anxiety but not mild social stress response in mice.Gynecol Endocrinol 12,61-7. MASajdyk, T.J., Schober, D.A. and Gehlert, D.R. , 2002. 
Neuropeptide Y receptor subtypes in the basolateral nucleusof the amygdala modulate anxiogenic responses in rats. Neuropharmacology 43, 1165-72. Shan, T., Ma, J., Ma, Q., Guo, K., Guo, J., Li, X., Li, W., Liu, J., Huang, C., Wang, F. and Wu, E., 2013.beta2-AR-HIF-1alpha: a novel regulatory axis for stress-induced pancreatic tumor growth and angiogenesis. DCurr Mol Med 13, 1023-34. TEShen, Q., Wang, X., Chen, Y., Xu, L., Wang, X. and Lu, L., 2009.", + "CRH-expressing neurons located in the PVN of the hypothalamus play a pivotal role in orchestrating thecentral stress response. CRH stimulates the release of ACTH from the anterior pituitary gland. In turn, ACTH acts on the adrenal cortex to increase the production andrelease of GC hormones. Proper functioning of all of these neurons is essential formaintaining a homeostatic state following a stressful event. Several neuronal pathways modulate HPA axis activity. For example, the hippocampus and prefrontalcortex inhibit the HPA axis, and the amygdala and monoaminergic input from thebrainstem stimulate CRH production by PVN neurons.Norepinephrinergic activation leadsto afferent catecholaminergic signaling from the NTS and ventrolateral medulla tothe periventricular nucleus (PVN) [52], which in turn activates the hypothalamicpituitaryadrenal (HPA) axis. Thus, activation of stress response systems is initiated, which assists in directing mood and behavior. Blood-borne cytokines are also known to produce a central response by acting oncytokine receptors within the brain. However, given their size and hydrophilicnature, trafcking to the brain was once considered to be isolated to circumventricular organs. In 1983, Blatteis et al.They havedetermined that psychological stress results in decreased hippocampal neurogenesisalongside signicant increases in hippocampal IL-1b protein, depressive-likebehavior, and ACTH and corticosterone release [169, 170]. 
The importance of IL-1in producing these behavioral and HPA axis outcomes was conrmed both genetically using IL-1r knockout mice and pharmacologically using IL-1ra. In IL-1R1knockout mice stress-induced brain, behavioral and endocrine perturbationsobserved in the wild-type mice were all attenuated [169, 170]. More importantly,these perturbations were abolished if wild-type mice were adrenalectomized [170].Koo JW, Duman RS (2008) IL-1beta is an essential mediator of the antineurogenic and anhedonic effects of stress. Proc Natl Acad Sci USA 105:751756118. Tanapat P, Hastings NB, Rydel TA, Galea LA, Gould E (2001) Exposure to fox odor inhibitscell proliferation in the hippocampus of adult rats via an adrenal hormone-dependent mechanism. J Comp Neurol 437:496504119. Malberg JE, Duman RS (2003) Cell proliferation in adult hippocampus is decreased by inescapable stress: reversal by uoxetine treatment. Neuropsychopharmacology 28:15621571120.RecentProg Horm Res 43:113173143. Rivier CL, Plotsky PM (1986) Mediation by corticotropin releasing factor (CRF) of adenohypophysial hormone secretion. Annu Rev Physiol 48:475494144. Jones MT, Gillham B (1988) Factors involved in the regulation of adrenocorticotropic hormone/beta-lipotropic hormone. Physiol Rev 68:743818145. Sapolsky RM, Romero LM, Munck AU (2000) How do glucocorticoids inuence stressresponses? Integrating permissive, suppressive, stimulatory, and preparative actions. EndocrRev 21:5589146. De Kloet ER, Vreugdenhil E, Oitzl MS, Joels M (1998) Brain corticosteroid receptor balancein health and disease. Endocr Rev 19:269301147. McEwen BS, Stellar E (1993) Stress and the individual.The HPA axis mediates theneuroendocrine response to stressors, both systemic stressors that threaten homeostasis and/or survival and perceived threats or psychogenic stressors [141]. 
Inputs tothe HPA axis provided by stressors and the endogenous circadian rhythm [142] actthrough central neural pathways to the paraventricular nuclei (PVN) of the hypothalamus, where CRH is synthesized. CRH (potentiated by arginine vasopressin[AVP]) [143, 144] stimulates the release of adrenocorticotropin (ACTH) from theanterior pituitary, which in turn stimulates synthesis and secretion of glucocorticoids from the adrenal cortex.", + "(2007). Corticotropin releasing hormonereceptor alterations elicited byacute and chronic unpredictablestressor challenges in stressorsusceptible and resilient strainswww.frontiersin.orgof mice. Behav. Brain Res. 181,180190. Bartels, A., and Zeki, S. (2004). The neural correlates of maternaland romantic love. Neuroimage 21,11551166. Bennett, H. A., Einarson, A., Taddio, A., Koren, G., and Einarson, T. R. (2004). Prevalence ofdepression during pregnancy: systematic review. Obstet. Gynecol. 103,698709. Beuzen, A., and Belzung, C. (1995). Link between emotional memoryand anxiety states: a study by principal component analysis. Physiol. Behav. 58, 111118.", + "Yun SJ, Park HJ, Yeom MJ, Hahm DH, Lee HJ, et al. (2002) Effect ofelectroacupuncture on the stress-induced changes in brain-derived neurotrophicfactor expression in rat hippocampus. Neurosci Lett 318: 8588. 31. Bousios S, Karandrea D, Kittas C, Kitraki E (2001) Effects of gender and stresson the regulation of steroid receptor coactivator-1 expression in the rat brain andpituitary. J Steroid Biochem Mol Biol 78: 401407. 32. Thome J, Pesold B, Baader M, Hu M, Gewirtz JC, et al. (2001) Stressdifferentially regulates synaptophysin and synaptotagmin expression in hippocampus. Biol Psychiatry 50: 809812. 33.", + "Vyas A, Jadhav S, Chattarji S (2006) Prolonged behavioral stress enhances synaptic connectivity in the basolateral amygdala. Neuroscience 143:387393. 
Wellman CL, Izquierdo A, Garrett JE, Martin KP, Carroll J, Millstein R, LeschKP, Murphy DL, Holmes A (2007) Impaired stress-coping and fear extinction and abnormal corticolimbic morphology in serotonin transporter knock-out mice. J Neurosci 27:684 691. Wiedholz LM, Owens WA, Horton RE, Feyder M, Karlsson RM, Hefner K,Sprengel R, Celikel T, Daws LC, Holmes A (2008) Mice lacking the AMPAGluR1 receptor exhibit striatal hyperdopaminergia and schizophreniarelated behaviors. Mol Psychiatry 13:631 640.", + "Crhbp regulates the activity of CRH (corticotropin releasinghormone), a stress hormone in the HPA axis (Westphal andSeasholtz, 2006), and is upregulated following stress (McClennenet al. , 1998). The HPA axis has a complex relationship with learning and memory- transient activation results in enhancement oflearning and memory (de Kloet et al. , 1999), while persistent activation levels results in cognitive deficit (de Kloet et al. , 2005). Enoch et al. reported that dense whole genome linkage scan ofhippocampal activation assessed by EEG resulted in a linkagepeak containing crhbp (Enoch et al. , 2008).", + "First, the CRH or corticotrophin releasing factor(CRF) is the principal regulator of the stress response, whichhas receptor genes expressed in several organs including brainsand heart (here we used crhr1, which is the type 1 CRHreceptor). UCN I (urocortin I precursor), UCN II (urocortin Iprecursor, which is a stresscopin-related petide), and UCN III(stresscopin) were among the 10 genes. It has been shown thatthis gene group is a member of the CRH peptide family and isfound in many discrete brain regions and that it influencesfeeding, anxiety, and auditory processing behaviors (28).", + "It has beendemonstrated that stress and corticosterone release are important regulators of hippocampal 5-HT1A receptors [294], and thatprolonged corticosterone treatment alters the responsiveness of5-HT1A receptors to 8-OH-DPAT in rat CA1 hippocampal neurons [295]. 
Interestingly, changes in the hippocampally mediated5-HT1A receptor responses depend on whether stress or corticosterone predominantly activate the mineralocorticoid receptoror the glucocorticoid receptor [296]. Depression is characterized by a dysregulated response to stress, which may result inchanges in hippocampal 5-HT1A receptors in humans.", + "Herman JP & Cullinan WE Neurocircuitry of stress: central control of the hypothalamopituitaryadrenocortical axis. Trends in Neurosciences 20, 7884 (1997). [PubMed: 9023876]10. Barbazanges A, Piazza PV, Le Moal M & Maccari S Maternal glucocorticoid secretion mediateslong-term effects of prenatal stress. J. Neurosci 16, 39433949 (1996). [PubMed: 8656288]11. Montano MM, Wang MH, Even MD & vom Saal FS Serum corticosterone in fetal mice: sexdifferences, circadian changes, and effect of maternal stress. Physiol. Behav 50, 323329 (1991). [PubMed: 1745676]12.", + "He and othersconsidered that hormones released by stressful experiences couldenhance memory consolidation, indicating particularly the hormonesepinephrine and glucocorticoids as memory modulators (McGaugh &Roozendaal, 2009). It was suggested that several brain regions work insynergy to assure that emotionally significant experiences are wellremembered, and this could prepare the organism for future experiences by inducing long-term behavioral changes (Bahtiyar, Karaca,Henckens, & Roozendaal, 2020; McGaugh, 2013).", + "Keywords: stress, hippocampus, microarray, C57BL/6J, DBA/2J, C57BL/6NJINTRODUCTIONStress is a generic term used to describe physiological and behavioral responses to realor perceived challenges (Wang et al. , 2013). These responses are integrated over multiplesystems (e.g. , autonomic, behavioral, endocrine, and immune) with the hypothalamic-pituitaryadrenal (HPA) axis being a main pillar of the neuroendocrine response to stress. 
Ultimately,stimulation of the HPA axis results in glucocorticoid hormone [corticosterone (CORT) in miceand cortisol in humans] release." + ], + [ + "Cortisol has widespread effects on the body, but its major roleis to counteract the stress response and attempt to return the body to homeostasis. Once this is achieved, cortisol can negatively regulate its own production throughinhibitory feedback loops at the level of secretion of both CRH and ACTH. Thisbiochemical pathway is known as the hypothalamic-pituitary-adrenal (HPA) axis.", + "Gulf War and Health: Volume 10: Update of HealthEffects of Serving in the Gulf War, 2016. National Academies Press. O'Callaghan, J.P., Kelly, K.A. , Locker, A.R. , Miller, D.B. , Lasley, S.M. , 2015a. Corticosterone primes theneuroinflammatory response to DFP in mice: potential animal model of Gulf War Illness. Journal ofneurochemistry 133, 708-721. O'Callaghan, J.P., Kelly, K.A. , Locker, A.R. , Miller, D.B. , Lasley, S.M. , 2015b. Corticosterone primes theneuroinflammatory response to DFP in mice: potential animal model of Gulf War Illness. J Neurochem133, 708-721. O'Callaghan, J.P., Michalovicz, L.T. , Kelly, K.A. , 2016.", + "The glucocorticoids, cortisol in humans and corticosterone in most rodents, have numerous metabolic and physiological effects andprovide negative feedback to inhibit HPA activity at the level of the pituitary, PVN,hippocampus, prefrontal cortex, and other brain areas [142144]. 
Glucocorticoids,acting on their receptors (mineralocorticoid receptors, MR; glucocorticoid receptors, GR), initiate metabolic and physiological responses that facilitate response toand coping with the stressor and, ultimately, dampen stress-activated defense reactions, including immune responses, to prevent them from overshooting and themselves causing harm [145, 146].In the short term, the metabolic and physiologicalchanges induced by the glucocorticoids promote survival (increased gluconeogenesis and blood pressure, suppressed immune and reproductive function). However,prolonged exposure to glucocorticoids can result in metabolic, cognitive, andimmune dysfunction [147]. Thus, it is important that the HPA axis be tightlycontrolled through efcient feedback and efcient termination of the stress response;the ability to turn off the stress response is as important as the ability to respondinitially [145].[45, 54] or by injection of glucocorticoid synthesis inhibitors [58] enhances theexpression of IL-1 and other cytokines in brain as a result of stress exposure. Thesendings parallel earlier studies showing that ADX enhanced the plasma IL-6response evoked by exposure to a novel environment [78] and t with the canonicalviewpoint that glucocorticoids have powerful anti-inammatory properties. Whilewe do not wish to dispute this viewpoint, it is interesting to note that antiinammatory actions of CORT are most pronounced at high and supraphysiologicalconcentrations, whereas lower concentrations of CORT appear to have someimmune-potentiating effects (e.g. , [6]).As such, an immunological stressor, such as infection or a mimetic such asLPS which works to activate the HPA axis, will result in downstream glucocorticoidrelease. This secretion of glucocorticoids should, in turn, attenuate the originatinginammation. 
However, this is often not the case when studying for patients ofdepression showing enhanced immune activation, and hence, additional processesmust also be at play such as the proinammatory nature of cytokines and glucocorticoid resistance (discussed below). It is important to recognize that cytokine regulation of HPA axis function is complex and requires a multifaceted approach to studying such complexity.CORT, on the other hand, appears to negatively inuence the expression of IL-1 and other cytokines throughinteraction with glucocorticoid responsive elements (GREs) in the promoter region of cytokine genes. There are also reciprocal interactions between NE andCORT, whereby NE is a key driver in central regulation of the HPA axis (via a1-adrenergic receptors) and CORT tempers the release of NE.[45] also demonstrated the profound impact of endogenous CORT to inhibit brain cytokine responses evoked by stress, showing that boththe magnitude and spatial distribution of IL-1 changes provoked by stress wereaugmented in ADX rats. Since then, our lab and others have gone on to successfullyexamine cytokine expression using gross dissections and ELISA detection (e.g. ,[22, 5557]) or at the level of gene expression using RT-PCR [5861], whichtogether further support the view that stress challenges have the ability to drivecytokine changes in the uninjured brain.The HPA axis has been shown to regulate immune responses. The main hormones involved in the HPA axis are corticotropin-releasing hormone (CRH), adrenocorticotropic hormone (ACTH), andglucocorticoids (GC), which are also called stress hormones. These hormones contribute to the regulation of immune responses and can also affect neuronal survival,neurogenesis, synaptic plasticity, and behavioral responses [1, 2]. 
The HPA axis isa three-tiered biological system that begins at the highest level with the release ofCRH from the hypothalamic paraventricular nucleus (PVN).Whether these low-dose facilitation effectsrelate more directly to the timing of CORT injection relative to cytokine measurements, or represent differential tissue sensitivity to glucocorticoids, remains to bedetermined [79, 80]. Moreover, ambient levels of CORT taken during or immediately after stress exposure do not appear to be predictive of central IL-1 expression[63, 65]. Regardless, the point here is that the long-held assumption that glucocorticoids exert unilateral, anti-inammatory actions does not broadly account for theliterature writ large.During theresolution phase of disease, hormone levels did not differ among the three prenatalgroups, and if anything, corticosterone and ACTH levels were somewhat lower inPAE compared to control animals with clinical signs of arthritis. It has been suggested [227] that low levels of glucocorticoid hormones in the context of high levelsof inammation may reect a disconnect between the immune and endocrine systems. As noted above, we have evidence for such a disconnect in our previous work.As such, CORT can interfere with IL-1 gene expression (and expression of othercytokines) through numerous pathways. When taken together, a simple framework for understanding the interactionbetween the major stress-responsive systems (SNS and HPA axis) and inammatoryconsequences of stress is that neuroinammatory consequences of stress are mechanistically intertwined between the stimulatory actions of the SNS and the inhibitoryactions of CORT (see [84] for a review), though much work remains to be done inthis area. This concept is depicted in Fig.The HPA response is slower, and the secretion of the glucocorticoid hormones initiates numerous metabolic and behavioral effects that mediateeffective coping with a stressor in the longer term. 
The HPA and autonomic systemsappear to be regulated by similar neurotransmitters (e.g. , acetylcholine, serotonin,norepinephrine, GABA). In addition, there is reciprocal stimulation of HPA andautonomic activity by CRH and norepinephrine and reciprocal actions of the glucocorticoids and catecholamines. The glucocorticoids are thought to feed back torestrain activity of both systems. Further, the activity and sensitivity of both systemsare modulated by stress and circadian inuences [140].Regardless, the evidence todate suggests that NE release is a key driver of inammatory processes in at leastsome brain structures during times of stress. In contrast to the actions of NE, which appear to stimulate the expression of IL-1and other cytokines, CORT appears to constrain cytokine gene and protein expression during times of stress, at least in broad strokes. Indeed, there is a plethora ofstudies demonstrating that removal of endogenous glucocorticoids through ADX142T. Deak et al.With that said, there are a variety of cellular mechanisms by which glucocorticoids impact gene expression for cytokines during times of stress, particularly in thecase of IL-1 gene expression. Many of these effects are produced through interaction of the CORT-receptor complex with repressor sites in the promoter region ofthe IL-1 gene, including the nGRE repressor site and interference with nuclear factor k-light chain-enhancer of activated B cells (NFkB) signaling. In addition, CORThas been shown to block cAMP response element-binding (CREB) phosphorylation, prevent NFkB binding to the DNA, and destabilize mRNA for IL-1 [8183].Similarly,paradigms involving neonatal immune activation have demonstrated increased propensities to develop stress-related behaviors alongside hypersecretion of corticosterone, elevated hippocampal cytokines, and altered GR abundance in adulthood[184186]. 
Such animal studies have been useful in dissociating the relationshipbetween the actions of cytokines and the HPA axis in the brain to provide a moredirect account of how these factors may inuence behavior. Raz Yirmiyas laboratory has been particularly inuential in this regard, reporting brain IL-1b to beresponsible for depressive-like behavior in relation to stress physiology.", + "Corticosterone primes theneuroinflammatory response to DFP in mice: Potential animal model of Gulf War Illness. J. Neurochem. 2015,133, 708721. OCallaghan, J.P.; Miller, D.B. Neuroinflammation disorders exacerbated by environmental stressors. Metabolism 2019, 100, 153951. [CrossRef] [PubMed]Ashbrook, D.G. ; Arends, D.; Prins, P.; Mulligan, M.K. ; Roy, S.; Williams, E.G. ; Lutz, C.M. ; Valenzuela, A.;Bohl, C.J. ; Ingels, J.F. ; et al. The expanded BXD family of mice: A cohort for experimental systems geneticsand precision medicine. BioRxiv 2019. [CrossRef]Peirce, J.L. ; Lu, L.; Gu, J.; Silver, L.M. ; Williams, R.W.", + "For example, the measurement of plasma cortisol, the main glucocor-ticoid secreted by the interrenal tissue of theteleosts in response to stress and the most usedindication of a stress status, may not be sufcient to assess physiological conditions under chronic stress, in particular due to theacclimation of the interrenal gland and theinuence of negative feedback mechanismson the hypothalamuspituitaryinterrenal axis(Rotllant et al. , 2000).Otherworks have assessed the changes in gene expression by microarray using an in vitro approach with cultured cells treated with animmune stressor (LPS) and compared the transcriptomic response when adding cortisol. Theresults showed that cortisol is able to counteract the immune activation, but other responseswere taking place anyway, such as the recoveryof cell activity, increase of protein synthesis, andenergetic metabolism (Mackenzie et al. , 2006).", + "Pruett SB, Fan R, Myers LP, Wu WJ, Collier S. 
Quantitative analysis of the neuroendocrine-immune axis: linearmodeling of the effects of exogenous corticosterone and restraint stress on lymphocyte subpopulations in thespleen and thymus in female B6C3F1 mice. Brain Behav Immun 2000 Dec;14(4):270-287. 56. Pruett SB, Fan R. Quantitative modeling of suppression of IgG1, IgG2a, IL-2, and IL-4 responses to antigen inmice treated with exogenous corticosterone or restraint stress. J Toxicol Environ Health A 2001 Feb9;62(3):175-189. 57. Munck A, Guyre PM, Holbrook NJ. Physiological functions of glucocorticoids in stress and their relation topharmacological actions." + ] + ], + "task_id": [ + "DA2C5FBAA7806455F89E896E641DD642", + "7B0629638DF00DF1183B67EE3BF39B1C" + ] +}
\ No newline at end of file
diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_aging_1.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_aging_1.json
new file mode 100644
index 0000000..fc034c8
--- /dev/null
+++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_aging_1.json
@@ -0,0 +1,103 @@
+{
+ "question": [
+ "What is the significance of the length of telomeres?",
+ "Which mouse genes have been associated with longevity?",
+ "what genetic factor are associated with aging",
+ "which genes are typically associated with early aging?",
+ "How do I generate a linkage or association mapping study in mice to understand aging?"
+ ],
+ "answer": [
+ "The length of telomeres is significant as it is associated with aging, disease, and mortality. Longer telomeres are generally associated with better health outcomes, including protection from age-related diseases, better cognitive function, and healthier lipid profiles. Shorter telomeres, on the other hand, have been linked to higher risks of heart disease, infection-related death, and other diseases. Telomere length can also be influenced by factors such as stress, with high-stress situations potentially leading to faster telomere shortening and premature aging.",
+ "The mouse genes associated with longevity include the C3H allele at D2Mit58, the BALB allele at D16Mit182, the C57BL/6 allele at D4Mit84, the C3H allele at D9Mit110, and the C57BL/6 and C3H alleles at D12Mit167. Additionally, a locus on chromosome 2 and another on chromosome 6 have been associated with longevity.",
+ "Several genetic factors are associated with aging. These include allele variants, polymorphisms (SNPs), and specific genes such as the microsomal transfer protein (MTP), PKA-anchoring protein (AKAP2) gene, FOXO3A, APOE, and genes in the HLA-DQA1/DRB1 and LPA regions. Other genes associated with aging are those highly expressed in the brain like HECW2, HIP1, BIN2, GRIA1, and genes involved in neural development and function like KCNQ4, LMO4, GRIA1, NETO1. Genes involved in autophagy like ATG4C are also associated with aging.",
+ "The genes typically associated with early aging are APOE and FOXO3A.",
+ "To generate a linkage or association mapping study in mice to understand aging, you would first need to select appropriate mouse strains. You could use inbred strains like C57BL/6J (B6) and DBA/2J (D2), or a recombinant inbred strain like BXD. You would then breed these mice, possibly creating an F2 generation cross or a backcross. After breeding, you would genotype and phenotype the offspring. For aging studies, you would monitor the mice over their lifespan, noting any changes in health, behavior, or physical characteristics. You could also perform genome-wide association mapping and correlation analyses against existing phenotypic and expression data sets to identify candidate genes involved in age-related decline. Additionally, you could use bioinformatics tools to analyze data and find patterns hinting at a common molecular mechanism. Finally, you would validate your findings using statistical analysis."
+ ], + "contexts": [ + [ + "In birds, where erythrocyte telomere length (ETL) is measured, the majority of species sampled have shown no sex difference (36).Nonetheless, bird telomere dynamics are complex and, as with humans, may be affected by environment and stress.For example, a longitudinal study of black-tailed gulls (Larus crassitostris) over 2-5 years found no correlation between ETL and age or sex.Rather, ETL attrition was correlated with reduced food availability and environmental stressors (55).In a captive zebra finch (Taeniopygia guttata) population, male and female mean telomere length decreased with increasing age of the animals, but did differ between sexes (56).As these examples illustrate, the relationship between telomere length, lifespan, and sex is likely to be complex in other vertebrates.Comparative studies of age-related telomere attrition in other species also reveal a variety of patterns.Barrett and Richardson (36) recently summarized the comparative data available on sex differences in telomere length.They found a strong correlation between male-biased mortality and either shorter telomeres or greater telomere attrition in males across bird and mammal taxa.However, telomere length did not differ between males and females in species where females are shorter-lived than males (36), suggesting that telomere shortening is not associated with species-specific longevity in a simple linear fashion.These studies generally suffer from relatively small sample sizes and are largely cross-sectional.Further, the use of diverse assays, different tissues (eg, leukocytes in mammals vs erythrocytes in birds), and lack of standardized benchmarks for accuracy makes comparisons between studies difficult.In some organisms, there is no clear relationship between telomere length and lifespan.Age-related telomere attrition could not be detected in Daphnia pulex (57) or sea urchin species (Strongylocentrotus franciscanus and Lytechinus variegatus) (58).Studies in 
C. elegans examining natural variation in telomere length and experimentally manipulated telomere length detect no correlation with lifespan (59,60), and in Drosophila, which uses a telomerase-independent mechanism for telomere maintenance, there is a similar lack of correlation between longevity and telomere length (61).Similarly, data on sex differences in age-related telomere shortening are mixed.For example, in the ant species Lasius niger, the rate of telomere shortening is more rapid in short-lived males compared to longer-lived females.But, mean telomere length does not differ between the two types of females, queens and workers, despite the fact that queens live much longer than workers (up to 28 years vs 2-3 months) (62).These findings suggest that the question of how telomere shortening affects aging across species and how sex affects telomere attrition rates are complex.", + "With new methodologies to assess relative telomere length by Q-PCR, studies were designed to address the impact of telomere length on aging, aging associated pathologies, and mortality.One such study has correlated shorter leukocyte telomere lengths at age 60 with a three times higher risk of heart disease and an eightfold increase in risk of infection-related death (36), thereby associating measured relative cellular aging with disease and life expectancy.In a similar way, chronic stress was shown to correlate with short leukocyte telomere length, a phenomenon attributed to higher levels of oxidative stress at the cellular level (70).More recent studies have linked telomere length in smooth muscle cells with senescence and disease severity in patients with atherosclerosis (141,150).Leukocyte telomere length was also short in a cohort of similar patients and associated with a higher risk of developing occult cardiovascular disease (71).More data are needed to understand and validate the use of leukocyte telomere length as a biomarker for cardiovascular and other diseases.", + 
"Shortening of the telomeres at the ends of chromosomes has been associated with age-related disease and mortality [16][17][18].A recent study identified a common haplotype of four SNPs in the human telomerase reverse transcriptase gene (hTERT) that is enriched in centenarians and associated with longer telomere length [19].It was also shown that centenarians and their offspring maintain longer telomeres compared with controls and that longer telomeres are associated with protection from age-related diseases, better cognitive function and lipid profiles of healthy ageing [19].", + "New research has indicated how social factors, such as subordination, may translate into biological effects (epel et al. 2004;Chae et al. 2014).In a now classic study, epel et al. ( 2004) examined the telomere lengths of fifty-eight healthy premenopausal women who either had a healthy child (n = 19) or were giving care to a chronically ill child (n = 39. )They measured perceived stress, years of caregiving, telomere length, and oxidative stress.They found highly statistically significant differences in telomere length between women taking care of chronically ill children and those who had healthy children.They found highly statistically significant negative correlations between telomere length and perceived stress and years of caregiving.Telomerase activity had highly statistically significant negative correlations with perceived stress and years of caregiving.Oxidative stress was highly positively correlated with perceived stress and years of caregiving.They concluded that the telomere length shortening was equivalent to 9 to 17 years of aging in the high stress group.Telomere length is considered a biomarker of aging (Finch and Kirkwood 2000).Thus, this study showed that caregiver stress had essentially aged these women 9 to 17 years compared to women who had healthy children.", + "The single, consistent predictor of the rate of telomere attrition shown in multiple adult and the few 
child longitudinal studies is the baseline measurement of telomere length at the start of each study.This suggests the importance of understanding predictors of telomere length prior to adulthood, as it determines in part the rate of change (Revesz et al. 2014a, b;Nordfjall et al. 2009).Moreover, longitudinal studies in adults have had found that telomere attrition rate is dependent on baseline telomere length independent of any phenotypic predictors of shortening, such as disease or demographic variables (Nordfjall et al. 2009), attesting to the importance of studies to evaluate risk factors for shortening prior to adulthood.Rates of decline in childhood may be particularly relevant for later chronic disease risk as shorter telomere length has been implicated in disease progression through exposure to cellular senescence, inflammatory cytokines and adipocyte hypertrophy (Raschenberger et al. 2015;Willeit et al. 2014;Monickaraj et al. 2012;Fyhrquist et al. 2013).Adult studies have also found a negative correlation with baseline telomere length, suggesting a negative feedback regulation of leukocyte telomere length (Farzaneh-Far et al. 2010;Aviv et al. 2009;Epel et al. 2008;Nordfjall et al. 2009).It is possible that while our follow-up period was shorter than Shalev et al. 2013 and adult studies, which had a minimum of 5 year intervals with the exception of Puterman et al. (2015) who followed for a one-year time period, there may be biological regulation of telomere length at 4 and 5 years of age such that shorter telomeres are more robustly maintained, whereas longer telomeres have greater rates of decline, over a short period of one year.It is unlikely that this relationship is due to assay error or regression to the mean given the consistency of our findings across studies.We have had similar findings of longer telomeres having greater rates of decline and shorter telomeres being maintained in our different studies (Farzaneh-Far et al. 2010;Epel et al. 
2008;Puterman et al. 2015).We found primarily maintenance and lengthening from 4 to 5 years of age in children, with minimal telomere attrition, indicating that most of the telomere loss happens in the first 4 years, plateauing by age 4. Lastly, we found close to 10 % of the variance in rate of change in children shared by mothers.While some of this shared variance is genetic, there are likely environmental factors that need to be further identified that impact rate of telomere length change.Abstract Telomeres are the protective complexes at the end of chromosomes, required for genomic stability.Little is known about predictors of attrition in young children or the relationship between parental and child patterns of telomere change.Telomere length was assessed twice over one year, at 4 and at 5 years of age, in Latino preschool children (n = 77) and their mothers (n = 70) in whole blood leukocytes.Maternal and child rates of attrition during the same time period were compared in 70 mother-child pairs.More children showed lengthened telomeres over one year compared to their mothers and very few children showed attrition (2.6 %).Approximately 31 % of children and 16 % of mothers displayed lengthening over one year while 66 % of children showed maintenance in contrast with 74 % of mothers.The strongest predictor for child telomere length change was child's baseline telomere length (r = 0.61,p < 0.01).Maternal rate of change was associated with child rate of change (r = 0.33, p < 0.01).After controlling for child baseline telomere length, the relationship between child and maternal rate of change trended towards significance (Coeff = 0.20, 95 % CI 0.03 to 0.43; p = 0.08).", + "Blackburn and Epel, a health psychologist who did original research on how specific lifestyle and psychological habits can protect telomeres, published The Telomere Effect (Blackburn & Epel, 2017), in which they suggested that individuals with shorter telomeres developed diseases earlier in life 
(a shorter \"disease span\").What follows is the evidence from these authors, their colleagues, and other researchers describing how length of telomeres contributes to mind-body connection and healthy longevity.", + "As early as at the time of birth, each of the 92 telomeres of the human genome has its own characteristic length.Additionally, each telomere shortens by its individual attrition rate.In general, longer telomeres at birth are associated with higher age-dependent attrition rates and vice versa.Overall, telomere shortening appears more dynamic in males.In conclusion, a combination of overall and chromosomespecifi c shorter telomeres and more pronounced age-dependent telomere erosion could be observed in males.There is a prospective clinical study strongly suggesting that longer telomeres decrease the risk of dying (Cawthon et al., 2003).With this in mind, the telomere length discrepancies between the sexes may indeed be a factor infl uencing the differences in their life expectancy.In every chromosome a linear decline of telomere length with age was observed, being more pronounced in men independent of the examined chromosome arm.This might suggest that telomere length on single chromosome arms may be infl uenced by the same factors which determine overall telomere length.S. Mayer a S. Brderlein a S. Perner a I. Waibel a A. Holdenried a N. Ciloglu a C. Hasel a T. Mattfeldt a K.V. Nielsen b P. 
Mller a a Institute of Pathology, University of Ulm, Ulm (Germany); b DakoCytomation A/S, Glostrup (Denmark) follow uniformity.In previous studies, sex-specifi c differences in telomere length and attrition rate of men and women were found (Benetos et al., 2001;Cawthon et al., 2003;Nawrot et al., 2004), suggesting gender differences in behavior of telomeres.In individual chromosome arms, telomere length was also shown not to be homogeneous (Lansdorp et al., 1996;Benn, 1997;Martens et al., 1998;Surralles et al., 1999;Hao and Tan, 2001;Londono-Vallejo et al., 2001;Graakjaer et al., 2003), some telomeres being signifi cantly shorter, others longer than the average length.To date, these characteristics in telomere lengths could not be set in a biological context, as only a few groups have provided detailed information about chromosome-specifi c patterns of telomere distribution (Lansdorp et al., 1996;Graakjaer et al., 2003).Whether accumulation of short telomeres (Martens et al., 2000;Londono-Vallejo et al., 2001) or rather the shortest telomere of one specifi c chromosome arm (Hemann et al., 2001) elicits senescence, remains an open question so far.In recent literature, there are hints that the average telomere length may be higher in women and that their annual shortening rate may be somewhat lower (Vaziri et al., 1993;Rufer et al., 1998;Jeanclos et al., 2000), but these reported differences failed to reach statistical signifi cance except for one study (Jeanclos et al., 2000).Here, we provide compelling evidence that this is indeed the case.It is generally accepted that telomeres shorten during DNA replication both in vitro and in vivo.In individuals, short telomeres are considered to be a sign of advanced age.Cawthon and coworkers (2003) showed that telomere shortening in humans likely contributes to mortality, supporting the hypothesis that they might act as a mitotic clock (Allsopp et al., 1992).Telomere length dynamics, however, does not seem to Abstract.During 
aging, telomeres are gradually shortened, eventually leading to cellular senescence. By T/C-FISH (telomere/centromere-FISH), we investigated human telomere length differences on single chromosome arms of 205 individuals in different age groups and sexes. For all chromosome arms, we found a linear correlation between telomere length and donor age. Generally, males had shorter telomeres and higher attrition rates. Every chromosome arm had its individual age-specific telomere length and erosion pattern, resulting in an unexpected heterogeneity in chromosome-specific regression lines. This differential erosion pattern, however, does not seem to be accidental, since we found a correlation between average telomere length of single chromosome arms in newborns and their annual attrition rate. Apart from the above-mentioned sex-specific discrepancies, chromosome arm-specific telomere lengths were strikingly similar in men and women. This implies a mechanism that arm-specifically regulates the telomere length independent of gender, thus leading to interchromosomal telomere variations.", + "Shortening of the telomeres at the ends of chromosomes has been associated with age-related disease and mortality [16][17][18]. A recent study identified a common haplotype of four SNPs in the human telomerase reverse transcriptase gene (hTERT) that is enriched in centenarians and associated with longer telomere length [19]. It was also shown that centenarians and their offspring maintain longer telomeres compared with controls and that longer telomeres are associated with protection from age-related diseases, better cognitive function and lipid profiles of healthy ageing [19]."
+ ], + [ + "DOI: https://doi.org/10.7554/eLife.75244\b24 of 30Chromosomes and Gene Expression | Genetics and GenomicsResearch articleContinuedAuthor(s)YearDataset titleDataset URLDatabase and IdentifierLongevityteam2021Genetics of longevity inBXD micehttp://www.BDL_10006, 10006genenetwork.org/show_trait?trait_id=10006&dataset=BXD-LongevityPublishLongevityteam2021Genetics of longevity inBXD micehttp://www.BDL_10010, 10010genenetwork.org/show_trait?trait_id=10010&dataset=BXD-LongevityPublishLongevityteam2021Genetics of longevity inBXD micehttp://www.BDL_10011, 10011genenetwork.org/show_trait?trait_id=10011&dataset=BXD-LongevityPublishLongevityteam2020Genetics of longevity inBXD micehttp://www.BDL_10021, 10021genenetwork.org/show_trait?trait_id=10021&dataset=BXD-LongevityPublishLongevityteam2020Genetics of longevity inBXD micehttp://www.BDL_10022, 10022genenetwork.org/show_trait?trait_id=10022&dataset=BXD-LongevityPublishLongevityteam2020Genetics of longevity inBXD micehttp://www.BDL_10025, 10025genenetwork.org/show_trait?trait_id=10025&dataset=BXD-LongevityPublishLongevityteam2021Genetics and epigeneticsof aging and longevity inBXD micehttp://www.BDL_10066, 10066genenetwork.org/show_trait?trait_id=10066&dataset=BXD-LongevityPublishReferencesAlbertsen HM, Smith SA, Mazoyer S, Fujimoto E, Stevens J, Williams B, Rodriguez P, Cropp CS, Slijepcevic P,Carlson M. 
1994.", + "DOI: https://doi.org/10.7554/eLife.75244\b24 of 30Chromosomes and Gene Expression | Genetics and GenomicsResearch articleContinuedAuthor(s)YearDataset titleDataset URLDatabase and IdentifierLongevityteam2021Genetics of longevity inBXD micehttp://www.BDL_10006, 10006genenetwork.org/show_trait?trait_id=10006&dataset=BXD-LongevityPublishLongevityteam2021Genetics of longevity inBXD micehttp://www.BDL_10010, 10010genenetwork.org/show_trait?trait_id=10010&dataset=BXD-LongevityPublishLongevityteam2021Genetics of longevity inBXD micehttp://www.BDL_10011, 10011genenetwork.org/show_trait?trait_id=10011&dataset=BXD-LongevityPublishLongevityteam2020Genetics of longevity inBXD micehttp://www.BDL_10021, 10021genenetwork.org/show_trait?trait_id=10021&dataset=BXD-LongevityPublishLongevityteam2020Genetics of longevity inBXD micehttp://www.BDL_10022, 10022genenetwork.org/show_trait?trait_id=10022&dataset=BXD-LongevityPublishLongevityteam2020Genetics of longevity inBXD micehttp://www.BDL_10025, 10025genenetwork.org/show_trait?trait_id=10025&dataset=BXD-LongevityPublishLongevityteam2021Genetics and epigeneticsof aging and longevity inBXD micehttp://www.BDL_10066, 10066genenetwork.org/show_trait?trait_id=10066&dataset=BXD-LongevityPublishReferencesAlbertsen HM, Smith SA, Mazoyer S, Fujimoto E, Stevens J, Williams B, Rodriguez P, Cropp CS, Slijepcevic P,Carlson M. 1994.", + "Leduc MS, Hageman RS, Meng Q et al (2010) Identification ofgenetic determinants of IGF-1 levels and longevity among mouseinbred strains. Aging Cell 9(5):823836. doi:10.1111/j.14749726.2010.00612.x10. Lang DH, Gerhard GS, Griffith JW et al (2010) Quantitative traitloci (QTL) analysis of longevity in C57BL/6J by DBA/2J (BXD)recombinant inbred mice. Aging Clin Exp Res 22(1):81911. Gelman R, Watson A, Bronson R et al (1988) Murine chromosomalregionscorrelatedwithlongevity. Genetics118(4):69370412. 
Jackson AU, Galecki AT, Burke DT et al (2002) Mouse loci associated with life span exhibit sex-specific and epistatic effects. Conclusions These results suggest a novel locus influencing survival in the B6/D2 genetic background, perhaps via a metabolic disorder that emerges by 200 days of age in male animals. Keywords Pathology Longevity \u0001 Lifespan \u0001 Mouse \u0001 Linkage \u0001 Introduction Longevity, the quintessential complex trait, likely reflects all aspects of an organism's life history. In humans, the heritability of age at death is estimated at 25-33 % [1]. Genetic contributions to mortality rates are thus of great interest and may aid in the understanding of disease etiology and the process of aging itself [2]. Here, we have extended this analysis to search for genotypes related to survival to the age of 800 days in a population of a reciprocal F2 cross between (B6) and (D2) mice. Since QTL for longevity in mice have shown strong sex specificity [10, 12], we conducted sex-specific analyses. In addition, we also determined whether there were any pathology changes associated with the loci that showed frequency distortions with aging. To confirm the associations of the loci of interest with longevity and pathology, we performed replication analyses on a panel of BXD recombinant inbred strains.", + "352(6291): p. aad0189. Liao, C.Y., et al., Genetic variation in the murine lifespan response to dietary restriction: from life extension to life shortening. Aging Cell, 2010. 9(1): p. 92-5. Johnson, M., Laboratory Mice and Rats. Mater. Methods, 2012. 2: p. 113. Fontaine, D.A. and D.B. Davis, Attention to Background Strain Is Essential for Metabolic Research: C57BL/6 and the International Knockout Mouse Consortium. Diabetes, 2016. 65(1): p. 25-33. Simon, M.M., et al., A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains. Genome Biol, 2013. 14(7): p. R82. Lilue, J., et al.", + "Mamm Genome 2001;12: 9302.
21 Gelman R, Watson A, Bronson R, Yunis E. Murine chromosomalregions correlated with longevity. Genetics 1988;118:693704. 22 Peirce JL, Lu L, Gu J, Silver LM, Williams RW. A new set of BXDrecombinant inbred lines from advanced intercross populations inmice. BMC Genet 2004;5:7. 23 Rahman ZS, Tin SK, Buenaventura PN et al. A novel susceptibilitylocus on chromosome 2 in the (New Zealand Black \u0004 New ZealandWhite) F1 hybrid mouse model of systemic lupus erythematosus. J Immunol 2002;168:30429. 24 Kono DH, Burlingame RW, Owens DG et al.", + "Conversely, the BXD strain with the shortest life span(BXD14) has the lowest responsiveness to the stimulatory effect ofTGF-2 when old (48). The region on chromosome 2 where asuggestive QTL regulating the responsiveness to TGF-2 in oldmice is located also contains two QTL for longevity (32). Finally,the strongest support for this hypothesis is the correlation betweenlongevity and the age-related increase in the serum-dependent effect of TGF-2 on LSK cells, the extent of which may determinestem cell function in aged mice.", + "FIGURE 8-5 Genetic regulation of longevity in mice stratified by cause of death.Female mice that inherit the C3H allele at D2Mit58 plus the BALB allele at D16Mit182 (light gray bars) have significantly higher longevity than their sisters (dark gray bars) with the C57BL/6 plus DBA/2 allele combination (\"all causes\" of death combined).Subsets of mice that died either of cancer or of a nonneoplastic (\"benign\") illness both show the association between genotype and longevity.Among the mice dying of neoplasia, subsets dying of lymphoma or of fibrosarcoma show equivalent, and significant, genotypic effects.Bars indicate means plus standard error of the mean.SOURCE:Miller et al. 
(unpublished results).The available dataset also provides examples in which genetic variants seem to influence the risk of specific late-life diseases.Figure 8-6, for example, shows longevity results for mice stratified by their inheritance at the 12th chromosome locus D12Mit167.This is a locus associated with differential longevity in both male and female mice, with the strongest effect (adjusted p < 0.01) seen in those mice living more than 657 days (Jackson et al., unpublished results).The longest-lived mice are those that inherit both the C57BL/6 allele from their mother and the C3H allele from their father; on average, they survive 93 days longer than siblings with the BALB plus C3H combination.Figure 8-6 shows that the D12Mit167, like the pair of loci illustrated in Figure 8-5, has significant and similar effects in mice dying of cancer (85 days) and in mice dying of non-neoplastic diseases (126 days).A more detailed analysis of the cancers, however, suggests that while lymphoma and hepatoma victims are equally protected by the favorable alleles (effect sizes of 93 and 167 days, respec- mice of two subgroups: those dying of the urinary syndrome MUS, and those dying of all other causes.The genetic analysis contrasts mice with both the C57BL/6 allele at D4Mit84 and the C3H allele at D9Mit110 to mice with any of the three other allele combinations.In the males dying of causes other than MUS, this allele pair is associated with a 170-day increment in longevity (post-hoc p < 0.00003).But for males that do die of MUS, the same allele combination is associated with a 187-day decline in mean life span (post-hoc p < 0.03).This effect is thus pleiotropic, in that these alleles accelerate death in mice susceptible to MUS, while postponing death for all other males in the population.Although these loci are associated with differential longevity in mice that do develop MUS, they do not have a significant effect on the chances that MUS will indeed occur (not shown).The 
risk of developing MUS seems to be under control of a separate locus on chromosome 6.As shown in the bottom panel of Figure 8-7, males that inherit the C3H allele at D6Mit268 are far more likely to develop MUS (28 percent risk) than are their brothers who receive the DBA/2 allele at this locus (7 percent risk; p = 0.012 by two-tailed Fisher's exact test).High levels of CD8M cells are associated with diminished longevity in mated females (left panel; p < 0.001), but not in virgin females (center panel).Among virgin males, those dying of diseases other than the urinary syndrome MUS show no association between CD8M and longevity (open circles, upper line), but those dying because of MUS show a nonsignificant trend (filled circles, lower line, R = -0.27,p = 0.13) similar to the relationship observed in mated females.SOURCE : Miller et al. (unpublished results).Male or female mice that inherit the C57BL/6 (maternal) and C3H (paternal) alleles at D12Mit167 (light gray bars) are longer lived than their siblings that inherit the BALB plus C3H combination.The \"effect size\" shown at the right represents that difference in mean longevity between mice in the two genetically different groups, with (**) = p < 0.01 and (*) = p < 0.05 by t-test.Similar effect sizes are seen for mice dying of cancer or of non-neoplastic illnesses (\"benign\"), and among the cancer deaths the genetic effect is similar for deaths due to lymphoma and hepatoma.The genetic effect on longevity seems to be minimal, however, for mice dying of fibrosarcoma.Bars show means plus standard errors.SOURCE : Miller et al. 
(unpublished results).Our own work has taken a different tack: we have attempted to determine whether mutations with differential effects on aging may be present within the many available populations of laboratory-adopted inbred mice.The goal is not so much to clone these genes-if indeed they existbecause positional cloning strategies of this kind require many thousands of animals and would be extremely expensive using an assay, age at death, that is itself so costly.Instead, the goal has been to use gene mapping methods to test hypotheses about aging and to develop new animal models that will be useful for testing well-specified hypotheses about the molecular basis for age-dependent changes.In the absence of a validated battery of biomarkers of aging, we (like most others) have reluctantly decided to use mouse life span as a crude surrogate for aging itself, reasoning that genetic alleles that extend life span well beyond the median for the tested population may be operating via an influence on aging itself.Work conducted using recombinant inbred mouse stocks (Gelman et al., 1988;de Haan and Van Zant, 1999) has suggested that life-span differences between pairs of inbred mouse lines might reflect the influence of as few as 4-7 polymorphic loci, providing some basis for hope that some of these would have an effect large enough to be detected by a genome scan experiment involving 300-1,200 mice.", + ", Vogler, G.P. , Vandenbergh,D.J. , Blizard, D.A. , Stout, J.T. & McClearn, G.E. Quantitative TraitLocus (QTL) Analysis of Longevity in C57BL/6J byDBA/2J (BXD)Recombinant Inbred Mice. Aging Clin Exp Res (in press). Lionikas, A., Blizard, D.A. , Vandenbergh, D.J. , Glover, M.G. ,Stout, J.T. , Vogler, G.P. , McClearn, G.E. & Larsson, L. (2003)Genetic architecture of fast- and slow-twitch skeletal muscleweight in 200-day-old mice of the C57BL/6J and DBA/2J lineage. Physiol Genomics 16, 141152. Lionikas A., Blizard D.A. , Gerhard G.S. , Vandenbergh D.J. , Stout J.T. 
,Vogler G.P. , McClearn G.E.", + "Deficiency mapping of quantitative trait loci affecting longevityin Drosophila melanogaster. Genetics 2000;156:11291146. [PubMed: 11063689]33. Ma RZ, et al. Identification of Bphs, an autoimmune disease locus, as histamine receptor H1. Science2002;297:620623. [PubMed: 12142541]Nat Rev Genet. Author manuscript; available in PMC 2007 November 5. Page 12NIH-PA Author Manuscript34. Vivian JL, Chen Y, Yee D, Schneider E, Magnuson T. An allelic series of mutations in Smad2 andSmad4 identified in a genotype-based screen of N-ethyl-N-nitrosourea-mutagenized mouseembryonic stem cells. Proc. Natl Acad. Sci. USA 2002;99:1554215547. [PubMed: 12432092]35. Vogel G. Scientists dream of 1001 complex mice.", + "34. Gelman R, Watson A, Bronson R & Yunis E Murine chromosomal regions correlated withlongevity. Genetics 118, 693704 (1988). [PubMed: 3163317]35. Houtkooper RHet al.The metabolic footprint of aging in mice. Sci. Rep1, (2011). 36. Houtkooper RHet al.Mitonuclear protein imbalance as a conserved longevity mechanism. Nature497, 451457 (2013). [PubMed: 23698443]37. Williams EGet al.An Evolutionarily conserved role for the aryl hydrocarbon receptor in theregulation of movement. PLOS Genet. 10, e1004673 (2014). [PubMed: 25255223]38. Lang DHet al.Quantitative trait loci (QTL) analysis of longevity in C57BL/6J by DBA/2J (BXD)recombinant inbred mice. Aging Clin. Exp. Res. 22, 819 (2010).", + "In addition,the B6 mouse strain is one of the longest-lived mouse strains with a mean lifespan of 3years versus other mouse strains with mean lifespan from 1.5-2 years. Therefore, it isevident that the genetic background of a particular mouse strain can have a profoundeffect on the biology of the HSC population as well as organismal longevity. 
Indeed, it isfor this reason that it is difficult to compare findings from various laboratories wheredifferent mouse strains are used.", + "NIH-PA Author ManuscriptThis study indicated a large amount of genetic variation for mouse longevity; heritabilitywas 34% for AL and 36% for DR (60% of AL food intake). There was no significantcorrelation between mean longevity under these two conditions, although maximumlifespans of the AL and DR mice were significantly correlated. Similar observations weremade at the UTHSCSA on the ILSXISS RI mice (Liao et al. , 2010a, b; Mattson 2010),where they also observed similar heritability (28% AL males, 36% AL females, 55% DRmales, 53% DR females).For females, hairs of the congenic mice grew 31% faster, also highly significant (P =0.0006, 1-tailed). These results validated the presence of a gene in the differential regionaffecting FE. DiscussionWe report the outcomes of a quantitative genetic study on aging and longevity in the mouse. We studied an extant series of recombinant inbred strains (ILSXISS) that have been usedboth in DR aging studies as well as to study alcohol sensitivity (Williams et al. , 2004).(2007) is a separate issue from the analyses conducted in thisstudy (the AL efficiency model will be tested in future studies). Exp Gerontol. Author manuscript; available in PMC 2011 September 1. Rikke et al. Page 8NIH-PA Author ManuscriptOther studies have also reported that individual mice that maintained the highest BW werelikely to be the longest-lived individuals among cohorts of genetically identical mice(Weindruch et al. , 1986; Harper et al. , 2006).", + "Age-associated changes are conserved between mouse strainsLife span and aging vary between mouse strains.For example, C57BL/6 mice are long-lived compared to the short-lived DBA/2 mice (Turturro et al. 
1999).To test the generality of our observations, we also examined LT-HSCs, ST-HSC and MPPs in young and old mice from the DBA/2 strain, which originates from a distinct breeding lineage (Fox 1997)." + ], + [ + "Genomic analysis of longevity offers the potential to illuminate the biology of human aging.Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated.We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD.Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.Genomic analysis of longevity offers the potential to illuminate the biology of human aging.Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated.We suggest that the effect of education on 
lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD.Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.", + "Recent developments on the genetics of aging can be seen as several streams of effort.In general, humans show a relatively modest (<50%) heritability of life spans (results obtained from twin studies discussed below).The apoE polymorphisms are remarkable for their influence on both cardiovascular disease and Alzheimer disease.In contrast, rare mutant genes with high penetrance cause these same diseases but with early onset and a major shortening of the life span.Shortlived laboratory models (fruit flies, nematodes, mice) are yielding rapid advances, with the discovery of mutants that increase life spans in association with altered metabolism, which leads to questions on the physiological organization of aging processes.Although these early findings do not show that a conserved genetic program actually controls aging processes across animal phylogeny, it is striking how frequently findings of metabolic rate, insulin signaling, and free radicals have emerged from very different approaches to aging in nematodes and mammals, for example.These findings hint that the genetic control of life span was already developed in the common ancestor of modern animals so that subsequent evolution of life spans was mediated by quantitative changes in the control of metabolism through insulin and the production of free radicals.", + "Background: Genetic research on longevity has provided important insights into the mechanism of aging and aging-related diseases.Pinpointing import genetic variants associated with aging could provide insights for aging research.Methods: We performed a whole-genome sequencing in 19 centenarians to establish the genetic basis of human longevity.Results: Using SKAT analysis, we 
found 41 significantly correlated genes in centenarians as compared to control genomes. Pathway enrichment analysis of these genes showed that immune-related pathways were enriched, suggesting that immune pathways might be critically involved in aging. HLA typing was next performed based on the whole-genome sequencing data obtained. We discovered that several HLA subtypes were significantly overrepresented. Conclusions: Our study indicated a new mechanism of longevity, suggesting potential genetic variants for further study.", + "Background: Biological aging estimators derived from DNA methylation data are heritable and correlate with morbidity and mortality. Consequently, identification of genetic and environmental contributors to the variation in these measures in populations has become a major goal in the field. Results: Leveraging DNA methylation and SNP data from more than 40,000 individuals, we identify 137 genome-wide significant loci, of which 113 are novel, from genome-wide association study (GWAS) meta-analyses of four epigenetic clocks and epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels, respectively. We find evidence for shared genetic loci associated with the Horvath clock and expression of transcripts encoding genes linked to lipid metabolism and immune function. Notably, these loci are independent of those reported to regulate DNA methylation levels at constituent clock CpGs. A polygenic score for GrimAge acceleration showed strong associations with adiposity-related traits, educational attainment, parental longevity, and C-reactive protein levels. Conclusion: This study illuminates the genetic architecture underlying epigenetic aging and its shared genetic contributions with lifestyle factors and longevity.", + "INTRODUCTION Human aging is affected by genes, life style, and environmental factors. The genetic contribution to average human aging can be modest with genes explaining 20-25% of the variability of
human survival to the mid-eighties (Herskind et al., 1996; Fraser and Shavlik, 2001). By contrast, genetic factors may have greater impact on survival to the ninth through eleventh decades (Tan et al., 2008). Notably, exceptional longevity is rare and may involve biological mechanisms that differ from those implicated in usual human aging.", + "Before the advent of NGS technologies, several scientists were interested in the study of allele variants associated with aging, but they were limited by the lack of aging rate biomarkers. Now, with NGS technologies, such biomarkers have emerged, such as the epigenetic clock that is described in the DNA methylation sequencing section of this chapter. In this post-genomic era, different strategies have been developed in order to understand the genetic factors involved in aging [17]. One strategy used is the study of aging in extreme longevity groups of people, called centenarians. Centenarians are a group that can reach an age above 100 years and has an incidence of 1 in every 10,000 people [18]. In a pioneering study using extreme longevity people (308 individuals belonging to 137 sibships showing extreme longevity), genome-wide scan analysis identified a region on chromosome 4 associated with extreme longevity [19] that corresponds to the microsomal transfer protein (MTP) [20], which is associated with abetalipoproteinemia and hypobetalipoproteinemia in humans [21,22]. Another approach to study the genetic factors involved in longevity consists in assessing allele frequencies from people of different ages, looking for those polymorphisms (SNPs) with enhanced allele frequencies in high-longevity individuals. Those alleles with diminished frequencies in aged individuals may be associated with age-related diseases. Using this approximation, an SNP that shifts isoleucine to valine was identified in the PKA-anchoring protein (AKAP2) gene. This polymorphism is associated with reduced longevity and cardiac disease [23]. Genome-wide
association studies (GWAS) have confirmed only three loci that affect longevity: FOXO3A, APOE, and an intergenic locus on chromosome 5q33.3[24][25][26].", + "Even more disappointing result is that some genes predisposing to geriatric diseases discovered by GWAS appear to be not correlated with human longevity (Beekman et al. 2010;Deelen et al. 2011).This result questions whether findings obtained from GWAS may provide insights into the bio-genetic mechanisms underlying a healthy lifespan.In fact, this finding is very surprising because (1) genetic studies of non-human species have discovered numerous genes predisposing to aging-related processes (Cutler and Mattson 2006;Vijg and Suh 2005;Kenyon 2005;Johnson 2006;Greer and Brunet 2008), (2) nongenetic association studies show that the long-living individuals are typically in better health compared to the short-living individuals (Barzilai et al. 2003;Willcox et al. 2008b;Willcox et al. 2008a;Evert et al. 2003), and (3) candidate-gene studies (but not GWAS) document that the same genes can affect diseases and lifespan (Koropatnick et al. 2008;Kulminski et al. 2011).This is an apparent paradox which has to be carefully examined.A prominent geneticist and evolutionary biologist T. G. Dobzhansky asserts that \"nothing in biology makes sense except in the light of evolution. \"Evolution primarily maximizes fitness of individuals of reproductive age.The classical evolutionary biological theory of aging claims that aging occurs because of decline in the force of natural selection with age (Kirkwood and Austad 2000).Then, according to that theory, aging-related (senescent) phenotypes with post-reproductive manifestation are non-adaptive and subject to stochastic variation.Therefore, at a first glance evolution should not be relevant to senescent phenotypes (apart so-called grandmother hypothesis; Hawkes et al. 
1998).Such phenotypes, however, can be caused by reproductive-age-related risk factors making, thus, evolution to be relevant to them (Vijg and Suh 2005;Di Rienzo and Hudson 2005;Drenos and Kirkwood 2010).", + "In conclusion, we performed a genome-wide association study of longevity-related phenotypes in individuals of European, East Asian and African American ancestry and identified the APOE and GPR78 loci to be associated with these phenotypes in our study.Moreover, our gene-level association analyses highlight a role for tissue-specific expression of genes at chromosome 5q13.3,12q13.2,17q21.31,and 19q13.32 in longevity.Genetic correlation analyses show that our longevity-related phenotypes are genetically correlated with several disease-related phenotypes, which in turn could help to identify phenotypes that could be used as potential biomarkers for longevity in future (genetic) studies.", + "This population geneticmechanism also can maintain genetic variability for aging, like antagonistic pleiotropy. LARGE-EFFECT MUTANTS AND THE GENETICS OF AGINGOne approach that has become increasingly common in the characterization of the genetics of aging is to isolate aging mutants, usually from mutagenesis experiments, andthen to determine the mechanistic basis for the unusual life span in the mutants. Thisapproach has led to the discovery of genes that can enhance (e.g. , Maynard Smith 1958;Lin et al. 1988; reviewed in Guarente and Kenyon 2000, Kim 2007) or reduce life span(e.g. 
, Pearl and Parker 1922).", + "MOST genetic studies involved with aging have focused on identifying genes contributing to particular diseases. More recently, it has been recognized that it is also valuable to examine genetic factors related to disease-free or healthy aging (1,2). Utilizing twins from the National Academy of Sciences-National Research Council (NAS-NRC) twin panel, we have demonstrated that healthy physical aging is under a significant degree of genetic influence, with a heritability over 50% (3). Our definition of healthy aging focused principally on freedom from cardiovascular disease, and has received considerable support in the more recent literature. Brand and colleagues (4) reported that parental age at death was a significant predictor of coronary heart disease death in the Framingham offspring study and concluded that familial similarities for age at death may be mediated through shared coronary heart disease risk factors. Frederiksen and colleagues (5) reported that increased parental life was associated with a reduction in odds ratio for their children to have diabetes, ischemic heart disease, heart failure, stroke, and hypertension. We have found that better midlife lipid levels and blood pressures were associated with increased parental longevity in the National Heart, Lung, and Blood Institute twin study (6). Centenarian siblings and offspring, besides having increased longevity, have been shown to have better health and better cardiovascular risk factor profiles (7)(8)(9)(10).", + "The lack of success in the identification of genes related to aging in humans may be due to the complexity of the phenotype. One approach to investigate aging and longevity is to compare frequencies of genetic variants between nonagenarians or centenarians and the general population. This approach led to the discovery of an association between APOE (Deelen et al., 2011; Ewbank, 2007; Gerdes et al., 2000) and more recently FOXO3A (Anselmi et al., 2009; Flachsbart et al.,
2009;Li et al., 2009a;Pawlikowska et al., 2009;Willcox et al., 2008) and human aging and longevity.However, a recent genome-wide association study (GWAS) of individuals reaching the age of 90 or older failed to identify genome-wide significant variants (Newman et al., 2010).Human longevity and healthy aging show moderate heritability (20%-50%).We conducted a meta-analysis of genome-wide association studies from 9 studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium for 2 outcomes: (1) all-cause mortality, and (2) survival free of major disease or death.No single nucleotide polymorphism (SNP) was a genome-wide significant predictor of either outcome (p 5 10 8 ).We found 14 independent SNPs that predicted risk of death, and 8 SNPs that predicted event-free survival (p 10 5 ).These SNPs are in or near genes that are highly expressed in the brain (HECW2, HIP1, BIN2, GRIA1), genes involved in neural development and function (KCNQ4, LMO4, GRIA1, NETO1) and autophagy (ATG4C), and genes that are associated with risk of various diseases including cancer and Alzheimer's disease.In addition to considerable overlap between the traits, pathway and network analysis corroborated these findings.These findings indicate that variation in genes involved in neurological processes may be an important factor in regulating aging free of major disease and achieving longevity.IntroductionThe recent, remarkable extension of life expectancy is largely attributed to the postponement of mortality at old age (Vaupel, 1997(Vaupel, , 2010)).The years of life gained in the older population residing in developed nations are a success story of public health measures and improved health care.In addition to such external factors, longevity and healthy aging consistently show a modest heritability between 20% and 50% and aging-associated genetic research may provide further insights into the mechanisms of aging (Herskind et al., 1996;McGue et al., 1993;Reed and Dick, 
2003).It has been postulated that genes involved in pathways associated with aging identified in animal models, such as insulin-like growth factor (IGF)-insulin signaling, regulation of lipoprotein metabolism, the mTOR pathway, and the oxidative stress response may also influence survival to old or even exceptionally old age in humans (Christensen et al., 2006;Kenyon, 2010;Vellai et al., 2003).However, in humans, common variants within genes involved in these pathways have not been consistently associated with lifespan (Christensen et al., 2006;Kenyon, 2010;Kuningas et al., 2008;Vijg and Suh, 2005).Human longevity and healthy aging show moderate heritability (20%-50%).We conducted a meta-analysis of genome-wide association studies from 9 studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium for 2 outcomes: (1) all-cause mortality, and (2) survival free of major disease or death.No single nucleotide polymorphism (SNP) was a genome-wide significant predictor of either outcome (p < 5 × 10^-8).We found 14 independent SNPs that predicted risk of death, and 8 SNPs that predicted event-free survival (p < 10^-5).These SNPs are in or near genes that are highly expressed in the brain (HECW2, HIP1, BIN2, GRIA1), genes involved in neural development and function (KCNQ4, LMO4, GRIA1, NETO1) and autophagy (ATG4C), and genes that are associated with risk of various diseases including cancer and Alzheimer's disease.In addition to considerable overlap between the traits, pathway and network analysis corroborated these findings.These findings indicate that variation in genes involved in neurological processes may be an important factor in regulating aging free of major disease and achieving longevity.", + "Many factors contribute to aging, including genes.This is the first article in a 10-part series that highlight some of what is known about the influence of genes on aging and emerging treatment options that may slow down or potentially reverse the
aging process.The series will address genes, adducts, and telomeres, decreased immune defenses, oxidation and inefficient mitochondria, toxins and radiation, glycosylation, caloric intake and sirtuin production, neurotransmitter imbalance, hormone mechanisms, reduced nitric oxide, and stem cell slowdown.Underpinning these factors are wear and tear on cells and aging as a result of inability to repair or replace these affected cells.These topics have been addressed in research, health magazines, and even by talk show hosts.There is even a LongevityMap website addressing significant and nonsignificant genetic association studies in aging across the human genome (http://genomics.senescence.info/longevity/).The series will address a scientific and clinical approach to genome-related aging topics.", + "The genetic basis of human longevity has so far been primarily investigated by association studies.Most results from these experiments have been difficult to confirm in independent samples, probably owing to the modest heritability, multifactorial nature, and heterogeneity of the phenotype (Christensen et al., 2006).To date, variation in only two genes has been identified, which has an effect on longevity in various populations: (i) the apolipoprotein E gene (APOE) (Schächter et al., 1994;Christensen et al., 2006) and (ii) the forkhead box O3A (FOXO3A) gene in the insulin-IGF1 signaling (IIS) pathway (Willcox et al., 2008;Flachsbart et al., 2009).Given the apparent lack of susceptibility candidates, it is conceivable that other genetic factors influence the function or expression of genes relevant for human longevity.", + "IntroductionApproximately 25-30% of the variation in adult lifespan is attributable to genetic factors that become more important with increasing age and exert their strongest effects in nonagenarians and centenarians (Gögele et al., 2010;Hjelmborg et al., 2006).As yet, however, only a few genetic variants have been found consistently to influence
longevity.The first to be discovered was the ε4 allele of the apolipoprotein E (APOE) gene, a mortality factor that predisposes to both Alzheimer's and cardiovascular diseases (Corder et al., 1993; Panza et al., 2004).APOE ε4 is the only variant with a reportedly large adverse effect upon survival at advanced age (Schächter et al., 1994), and this association has been replicated in several populations (Christensen et al., 2006).Variation in the human forkhead box O3A gene (FOXO3A), in contrast, has been found to be associated with the ability to live long, an effect corroborated by studies in Japanese, German, Italian, US-American, Jewish, Chinese and Danish populations (Anselmi et al., 2009;Flachsbart et al., 2009;Li et al., 2009;Pawlikowska et al., 2009;Soerensen et al., 2010;Willcox et al., 2008).More recently, we have identified exonuclease 1 (EXO1) as a potential novel longevity gene (Nebel et al., 2009).All three genes were detected through candidate-gene approaches.", + "GenAge: the aging gene database Philosophy and overview of resourcesIt is undisputed that genetic factors influence aging.In a remarkable series of recent breakthroughs, a number of genes capable of altering the aging process as a whole - or at least to a large degree - have been identified in animal models and even a few in humans (Finch & Ruvkun, 2001;de Magalhães, 2005;Kenyon, 2005).Furthermore, multiple alleles have been examined for their association with human exceptional longevity (Vijg & Suh, 2005).This is a fascinating and important area of research, yet there are now so many genes being associated with aging and longevity that keeping track of them all is becoming increasingly more difficult.Moreover, it is necessary now to study not only individual genes but their interactions with each other and with the environment, and how together genes give rise to a given phenotype: the so-called systems biology approach.To help researchers address these issues we created GenAge, a database of
genes related to longevity and/or aging.", + "I NCREASES in longevity of the general population world- wide are an unprecedented phenomenon with significant health and social impact.Although environmental factors have led to an increase in life span, there is ample evidence that genetic factors are involved in extreme longevity both in humans (1-7) and in other organisms (8).The protective genetic factors that lead to longevity are likely to involve fundamental processes of aging that may be different from those associated with early mortality or premature onset of age-related diseases in younger individuals.The mechanisms of aging in humans are far from understood, but available evidence suggests that several pathways-inflammation, oxidative stress and stress responses, cellular senescence, DNA damage and repair, and the growth hormone or insulinlike growth factor and insulin (GH, IGF, INS) axis-may play key roles (9)(10)(11)(12).Model organisms suggest that inhibiting the GH, IGF, or INS axis, which is involved in regulating cell proliferation, cell death, wound repair, and metabolism, may promote longevity by reducing oxidative stress and slowing the rate of cell replication and the accumulation of somatic-cell DNA mutations (13).There is also evidence for other important pathways such as the heatshock proteins and heat-shock factors that are highly conserved across species and play a role in prolongevity transcription pathways.Clinical and epidemiological investigations, including candidate gene studies, have suggested that inflammation pathways may affect life span and risk of age-related conditions such as cardiovascular disease (CVD) and its risk factors (14)(15)(16)(17)(18)(19).A combination of multiple genetic variants may be required for an individual to achieve exceptional longevity, which may account in part for its rarity." 
+ ], + [ + "Genomic analysis of longevity offers the potential to illuminate the biology of human aging.Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated.We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD.Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.Genomic analysis of longevity offers the potential to illuminate the biology of human aging.Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated.We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD.Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 
7 months while 1 year of education adds 11 months to expected lifespan.", + "Studies revealed from 300 to 750 genes related to longevity that are critically involved in a variety of life activities, such as growth and development, energy metabolism, oxidative stress, genomic stability maintenance, and neurocognition [4].These candidate genes include mainly APOE, a gene involved in lipoprotein metabolism [5,6].Others are those involved in cell cycle regulation, cell growth and signal transduction, the maintenance of genome stability, and the endocrine-related pathway [7][8][9].In addition, the candidates for longevity encompass genes related to drug metabolism, the ones involved in protein folding, stabilization, and degradation, as well those related to coagulation and regulation of circulation [10], etc.In most cases, these genes or their polymorphic sites were examined in multiple population replication studies, which discovered certain longevity-associated genes or pathways [4][5][6][7][8][9][10].", + "Additional association studies with these families and replication of these results with an independent data set should facilitate the positional cloning of a gene that influences the ability to age well and achieve exceptional longevity.Identification of the genes in humans that allow certain individuals to live to extreme old age should lead to insights on cellular pathways that are important to the aging process.", + "Before the advent of NGS technologies, several scientists were interested in the study of allele variants associated with aging, but they were limited by the lack of aging rate biomarkers.Now with NGS technologies, these biomarkers have been emerged such as the epigenetic clock that is described in the DNA methylation sequencing section of this chapter.In this post-genomic era, different strategies have been developed in order to understand the genetic factors involved in aging [17].One strategy used is the study of aging in extreme longevity 
groups of people, called centenarians.Centenarians are a group that can reach an age above 100 years and has an incidence of 1 every 10,000 people [18].In a pioneering study using extreme longevity people (308 individuals belonging to 137 sibships showing extreme longevity), genome-wide scan analysis identified a region on chromosome 4 associated with extreme longevity [19] that corresponds to the microsomal transfer protein (MTP) [20], which is associated with abetalipoproteinemia and hypobeta lipoproteinemia in humans [21,22].Another approach to study the genetic factors involved in longevity consists in assessing allele frequencies from people of different ages, looking for those polymorphisms (SNPs) with enhanced allele frequencies in high-longevity individuals.Those alleles with diminished frequencies in aged individuals may be associated with age-related diseases.Using this approximation, an SNP that shifts isoleucine to valine was identified in the PKA-anchoring protein (AKAP2) gene.This polymorphism is associated with reduced longevity and cardiac disease [23].Genome-wide association studies (GWAS) have confirmed only three loci that affect longevity: FOXO3A, APOE, and an intergenic locus on chromosome 5q33.3[24][25][26].", + "In conclusion, we performed a genome-wide association study of longevity-related phenotypes in individuals of European, East Asian and African American ancestry and identified the APOE and GPR78 loci to be associated with these phenotypes in our study.Moreover, our gene-level association analyses highlight a role for tissue-specific expression of genes at chromosome 5q13.3,12q13.2,17q21.31,and 19q13.32 in longevity.Genetic correlation analyses show that our longevity-related phenotypes are genetically correlated with several disease-related phenotypes, which in turn could help to identify phenotypes that could be used as potential biomarkers for longevity in future (genetic) studies.", + "The only two genes associated with human 
longevity that have been replicated in multiple populations are FOXO3A and APOE [11,12,15,26,28 -31].The effect sizes of these two genes for longevity are small with odds ratios of 1.26 and 1.45 for survival to age 100 in replicate studies for FOXO3A and APOE, respectively [10,29].These genes account for only a small portion of the genetic contribution to longevity measured through family heritability studies [4,5].Therefore, much of the heritability of lifespan remains to be explained.Gene associations with age-related traits found using longitudinal study data.", + "In most experimentally modified animal model systems, single-gene mutations in many different genes have major life extension effects (Fontana et al., 2010;Kenyon, 2010).However, natural human and animal longevity is presumed to be a complex trait (Finch & Tanzi, 1997).In humans, both candidate gene and genome-wide genetic association approaches have been applied in an attempt to identify longevity loci.The frequency of genetic variants has been typically compared between nonagenarian cases and young controls, revealing loci at which genetic variants may contribute to a higher or lower probability of survival into old age.The initial candidate gene studies aimed at finding human longevity genes were dominated by contradictory results (Christensen et al., 2006).The more consistent evidence obtained by repeated observation in independent cohort studies for association with longevity has so far only been observed for three loci, the apolipoprotein E (APOE) locus (Schachter et al., 1994;Christensen et al., 2006), the FOXO3A locus (Willcox et al., 2008;Flachsbart et al., 2009;Pawlikowska et al., 2009;Soerensen et al., 2010), and the AKT1 locus (Pawlikowska et al., 2009).Thus, despite the expectation that longevity would be influenced by many genetic variants with small effect sizes, the effect of variants has consistently been shown in only three genes.", + "The lack of success in the identification of 
genes related to aging in humans may be due to the complexity of the phenotype.One approach to investigate aging and longevity is to compare frequencies of genetic variants between nonagenarians or centenarians and the general population.This approach led to the discovery of an association between APOE (Deelen et al., 2011;Ewbank, 2007;Gerdes et al., 2000) and more recently FOXO3A (Anselmi et al., 2009;Flachsbart et al., 2009;Li et al., 2009a;Pawlikowska et al., 2009;Willcox et al., 2008) and human aging and longevity.However, a recent genome-wide association study (GWAS) of individuals reaching the age of 90 or older failed to identify genome-wide significant variants (Newman et al., 2010).Human longevity and healthy aging show moderate heritability (20%-50%).We conducted a meta-analysis of genome-wide association studies from 9 studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium for 2 outcomes: (1) all-cause mortality, and (2) survival free of major disease or death.No single nucleotide polymorphism (SNP) was a genome-wide significant predictor of either outcome (p < 5 × 10^-8).We found 14 independent SNPs that predicted risk of death, and 8 SNPs that predicted event-free survival (p < 10^-5).These SNPs are in or near genes that are highly expressed in the brain (HECW2, HIP1, BIN2, GRIA1), genes involved in neural development and function (KCNQ4, LMO4, GRIA1, NETO1) and autophagy (ATG4C), and genes that are associated with risk of various diseases including cancer and Alzheimer's disease.In addition to considerable overlap between the traits, pathway and network analysis corroborated these findings.These findings indicate that variation in genes involved in neurological processes may be an important factor in regulating aging free of major disease and achieving longevity.", + "Thus, substantially more work is needed in this area to establish whether longevity is driven by nuclear genomic stability.Diverse and unexpected bits of
evidence support a relationship.For example, a disproportionate number of genes identified in unbiased and targeted genome-wide association studies (GWASs) as associated with longevity are involved in genome maintenance (75).One study involved age of natural menopause in 70,000 women and led to the identification of 44 genetic variants associated with early or late menopause, a strong biomarker of healthy TIFs (telomere dysfunction-induced foci): co-localization of multiple DNA damage response factors and repair proteins on uncapped telomeric DNA aging (76).Approximately two-thirds of these are associated with genome maintenance genes.Seven of ten significantly associated pathways are involved in DNA repair.The highly significant overrepresentation of DNA repair pathways indicates an intimate connection between genome maintenance and aging phenotypes.From unrelated studies, we know that reduced expression of the repair endonuclease ERCC1-XPF causes accelerated aging (3), whereas ERCC1 is one of the top genes under positive selective pressure in the longest-lived mammalian species, the bowhead whale (77).Intriguingly, hepatocytes from old rats have impaired NER, whereas caloric restriction, which extends longevity, restored the NER capacity of old rats to that of youthful levels (42).In a human interventional study, brief caloric restriction increased NER capacity in PBMCs of individuals who had low NER prior to dietary intervention (78).Therefore, increased DNA repair capacity could promote longevity and may even prove amenable to improvement.", + "The antagonistic pleiotropy and hyperfunction theories of ageing predict the presence of genetic variants important for growth and development in early life with deleterious effects towards the end of the reproductive window 19,20 .While we are unable to directly capture the genetic effects on individuals before age 40 due to the study design of our datasets, we found that the life-extending variant near FOXO3 is 
associated with a delay in the age at menarche and a decrease in intracranial volume and cognitive abilities.It thus appears that there are loci exhibiting antagonistic effects, although we are unable to discern whether this is due to true pleiotropy or due to linkage of causal variants within a region. Genes which showed a significant effect (FDR < 5%) of gene expression on ageing traits are displayed here.Gene names are annotated with the direction of effect, where + and − indicate whether the life-extending association of the locus is linked with higher or lower gene expression, respectively.Locus: nearest gene to lead variant in the multivariate analysis, Chr: chromosome, Position: base-pair position of lead variant (GRCh37), Cis-genes: genes in physical proximity (<500 kb) to the lead variant of the locus which colocalise with the multivariate signal, Trans-genes: genes located more than 500 kb from the lead variant of the locus.", + "In addition to aging- and CR-related genes, another source of candidate genes and pathways for drug design are human longevity-associated genes (Barzilai and Shuldiner, 2001;Browner et al., 2004;Kenyon, 2010).Dozens of genes have now been associated with human longevity (de Magalhães et al., 2009a), although only a handful of genes have been shown to have consistent effects across populations.", + "Genes/loci identified by genome-wide association studies of longevity and lifespan traits.", + "The genetic basis of human longevity has so far been primarily investigated by association studies.Most results from these experiments have been difficult to confirm in independent samples, probably owing to the modest heritability, multifactorial nature, and heterogeneity of the phenotype (Christensen et al., 2006).To date, variation in only two genes has been identified, which has an effect on longevity in various populations: (i) the apolipoprotein E gene (APOE) (Schächter et al., 1994;Christensen et al., 2006) and (ii) the forkhead box O3A
(FOXO3A) gene in the insulin-IGF1 signaling (IIS) pathway (Willcox et al., 2008;Flachsbart et al., 2009).Given the apparent lack of susceptibility candidates, it is conceivable that other genetic factors influence the function or expression of genes relevant for human longevity.", + "The only two genes associated with human longevity that have been replicated in multiple populations are FOXO3A and APOE [11,12,15,26,28 -31].The effect sizes of these two genes for longevity are small with odds ratios of 1.26 and 1.45 for survival to age 100 in replicate studies for FOXO3A and APOE, respectively [10,29].These genes account for only a small portion of the genetic contribution to longevity measured through family heritability studies [4,5].Therefore, much of the heritability of lifespan remains to be explained.Gene associations with age-related traits found using longitudinal study data.", + "Candidate gene studies identified APOE and FOXO3A as human longevity genesThe first genetic longevity studies mainly focused on lifespan regulating loci that emerged from animal models [22].Lifespan Prospects & Overviews .... extension in animal models was obtained by applying caloric restriction or by modifying gene functions (mutagenesis) using RNA interference, knock-out or overexpression of single genes (GenAge; http://genomics.senescence.info/genes/)[23].The most interesting pathways identified using these models are the growth hormone (GH)/insulin/insulin-like growth factor 1 (IGF-1) signaling and mammalian target of rapamycin (mTOR) signaling pathways [24].Thus far, lifespan has been the main phenotype investigated in animal models.In order to make these models more translatable to human studies research should focus on defining the parameters that reflect the physiology and pathology of aging in both animals and humans [25,26].Most of the human candidate gene studies were performed in cross-sectional designs (Box 1 and Fig. 
1), comparing allele frequencies of potential longevity loci between highly aged individuals and young controls.The candidate gene studies based on single genes have pointed a role for genes involved in, e.g., GH/insulin/IGF-1 signaling, immune regulation, and lipoprotein metabolism (Supporting Information Table S1), although most of these results have not (yet) been confirmed in sufficient independent studies.The most convincing human longevity loci today are APOE and FOXO3A which have frequently been associated with longevity in cross-sectional studies (see for a review [26]) and survival in prospective studies [27][28][29] (Fig. 3).APOE encodes the protein apolipoprotein E which seems to play a role in e.g., lipoprotein metabolism, cognitive function, and immune regulation [30].FOXO3A encodes the protein forkhead box O3 which acts as a transcription factor for many different genes involved in processes like apoptosis and oxidative stress [31]." + ], + [ + "Mutation Rate in the Liver of Mice with AgeThe mouse is a good biological tool that allows the analyses of different tissues with little limitation on the amount of biological materials available.Mice are economical compared to larger mammals, and there is a huge volume of literature on the physiology, behavior, and biochemistry of such rodents.Importantly, it is possible to modify the diet of mice or treat them with drugs to mimic specific diseases and/or to improve their health status.Finally, their genomics and genetics have been extensively studied to such a point that now there is a battery of transgenic and knockout mice which, to some extent, phenocopy important age-related diseases.Many mice with mutations in different DNA repair proteins are available.Importantly, at least four transgenic lines with the lacI and/or LacZ reporter genes have been intensively used to estimate the mutation frequency or rate in the genome of different tissues with age.One such transgenic line bears a lambda shuttle vector 
that carries a lacI target and an alpha lacZ reporter gene [9,10].Genomic DNA is isolated from the tissue under study, and the shuttle vector is recovered by exposing the DNA to lambda phage packaging extracts in vitro.Mutations in the lacI target gene that inactivate the repressor gene allow expression of the alpha lacZ reporter gene, resulting in blue mutant plaques.Sequencing of the DNA from these plaques not only allows the estimation of the mutation frequency, but it also points to the type of mutation providing insights into potential mechanisms [9,10].The lacI gene is highly sensitive to base substitution and frame shift mutations, as well as small deletions and insertions, making the transgene an ideal choice for recovery of spontaneous and induced mutations [11,12].The Big Blue mouse contains approximately 40 copies of the lambda shuttle vector stably integrated as a tandem array at a single position in chromosome 4 [12].The MutaMouse contains the sequence of a phage carrying the lacZ gene integrated in a head-to-tail arrangement of approximately 40 copies located at a single insertion site in chromosome 3 [13,14].The technical difference in identifying mutations in these two mouse systems is that the Big Blue mouse model is based on forward mutations in the lacI reporter sequence derepressing the lacZ gene thereby yielding blue plaques as mutants.Thousands of plaques need to be examined.The Muta mouse is based on forward mutations in the lacZ reporter gene that can be easily selected because only mutants will generate plaques.Finally, the lacZ transgenic mice lines 30 and 60 bear a plasmid carrying the lacZ gene.Line 60 was found to have two integration sites, which were mapped to chromosomes 3 and 4. 
The plasmid integration site of line 30 is on chromosome 11.Each integration site in both transgenic lines has about ten to twenty plasmids per haploid genome [15].Plasmids are rescued by excision with the restriction enzyme HindIII, followed by separation from mouse genomic DNA by the use of magnetic beads coated with the lacI repressor protein, which will bind the lacI sequence.The recuperated DNA is then self-ligated to obtain circular plasmids that are finally transferred into Escherichia coli C bacteria (harboring a deletion of its own lacZ gene) for sequence analyses [15,16].Mice of line 60 are appropriate transgenic animals for the study genome rearrangements in the aging liver [15,17], and chromosomal translocations and deletions up to 66 megabases have been observed in the tissues of such mice [17].Such chromosomal rearrangements cannot be detected using the phage-based reporter models (the MutaMouse and the Big Blue models).", + "The availability of deep genome sequence data, and unrivaledmulti-omic and phenomic data make the BXDs a powerful tool with which to evaluate the causallinkage between genome, epigenome, and aging rates. In our previous work, we used an enrichment-based sequencing to assay the methylome in a modestnumber of BXD mice and reported rapid age-dependent methylation changes in mice on high-fat diet(HFD) and mice with higher body weight (Sandoval-Sierra et al. , 2020).", + "Byusing bioinformatics tools, data from various studies will be clustered and analyzed to find therelationship between myelin and myelin-related genes and see if any patterns can be found thathint at a common molecular mechanism. METHODSData will be pooled from various studies of alcohol on different strains of mice. Included in thisstudy will be DBA/2J (D2) and C57BL/6J (B6), two inbred mouse strains that exhibit contrastingdrinking behaviors. 
Other data will come from studies on ISS (inbred short sleep) and ILS(inbred long sleep) mice.", + "The availability of deep genome sequence data, and unrivaledmulti-omic and phenomic data make the BXDs a powerful tool with which to evaluate the causallinkage between genome, epigenome, and aging rates. In our previous work, we used an enrichment-based sequencing to assay the methylome in a modestnumber of BXD mice and reported rapid age-dependent methylation changes in mice on high-fat diet(HFD) and mice with higher body weight (Sandoval-Sierra et al. , 2020).", + "Here, we have extended this analysis to search forgenotypes related to survival to the age of 800 days in apopulation of a reciprocal F2 cross between (B6) and (D2)mice. Since QTL for longevity in mice have shown strongsex specificity [10, 12], we conducted sex-specific analyses. In addition, we also determined whether there wereany change in pathology changes associated with the locithat showed frequency distortions with aging. To confirmthe associations of the loci of interest with longevity andpathology, we performed replication analyses on a panel ofBXD recombinant inbred strains.J Gerontol A Biol Sci Med Sci 57(1):B9B1513. Foreman JE, Lionikas A, Lang DH et al (2009) Genetic architecture for hole-board behaviors across substantial time intervalsin young, middle-aged and old mice. Genes Brain Behav8(7):714727. doi:10.1111/j.1601-183X.2009.00516.x14. Lang DH, Conroy DE, Lionikas A et al (2009) Bone, muscle, andphysical activity: structural equation modeling of relationshipsand genetic influence with age. J Bone Miner Res24(9):16081617. doi:10.1359/jbmr.09041815. Blizard DA, Lionikas A, Vandenbergh DJ et al (2009) Bloodpressure and heart rate QTL in mice of the B6/D2 lineage: sexdifferences and environmental influences. 
Physiol Genomics36(3):158166.", + "Assessing epigenetic age in long-lived miceThe epigenetic-aging model was applied to the methylation profiles of long-lived mice and the age-matched controls not used for training (Additional file 2: Datasets used summary).Reductions in age were calculated by subtracting the epigenetic ages of the untreated, wild-type mice from those of the treated mice of the same genetic background.To assess the significance, we used an ANOVA for all 22-month-old mice or only 22-month-old UM-HET3 mice.We also compared the epigenetic ages between treatments with their agematched controls from the same genetic background using a t-test (Additional file 4: Treatment vs wild type stats).", + "Experimental Goals and SignificanceThe experimental goal of the work performed in this dissertation was to identifyspecific gene(s) and molecular pathways underlying HSC aging in two commonly usedstrains of inbred mice using a forward genetic approach.In order to understand the geneticbasis for the variation of HSC numbers in old B6 and D2 mice, we used theaforementioned forward genetic approach and performed genetic linkage analysis inBXD RI strains of mice. Using this approach, we identified a locus on murinechromosome 2 that is significantly linked to the variation in frequency of HSCs in agedB6 and D2 animals.Thus, in vitro and invivo results confirmed the linkage analysis, and demonstrated that the influence exertedby the D2 allele caused a significant reduction in HSC frequency and activity with age(Geiger et al. , 2005). Aging phenotypes in mice are difficult to study, partly because test subjects mustage ~2 years before they can be used in an experiment.In mice, the effect of aging onstem cells is highly strain-specific, thus suggesting genetic regulation plays a role in HSCaging. In C57BL/6 (B6) mice, the HSC population steadily increases with age, whereas inDBA/2 (D2) mice, this population declines. 
Our lab has previously mapped aquantitative trait locus (QTL) to murine chromosome 2 that is associated with thevariation in frequency of HSCs between aged B6 and D2 mice. In these dissertationstudies, I first aim to characterize the congenic mouse model which was generated byintrogressing D2 alleles in the QTL onto a B6 background.", + "We further demonstrated the effectiveness of the combineduse of genome-wide association mapping with correlation analysesagainst existing phenotypic and expression data sets to identifycandidate genes that may be involved in the age-related decline inadult neurogenesis. 18-month-old mice (at least 2 males and 2 females for C57BL/6J,A/J, CBA/J, DBA/2J, 129S1/SvImJ, and 129X1/SvJ; females only for theBALB/cByJ, C3H/HeJ, and FVB/NJ) were examined and comparedamong the 9 strains.", + "Accessing data resources in the mousephenome database for genetic analysis of murine life span and health span. J.Gerontol. A Biol. Sci. Med. Sci. 71 (2), 170177. Brown, R.E. , Stanford, L., Schellinck, H.M., 2000. Developing standardized behavioraltests for knockout and mutant mice. ILAR J. 41 (3), 163174. Bubier, J.A. , Jay, J.J., Baker, C.L. , Bergeson, S.E. , Ohno, H., Metten, P., Crabbe, J.C.,Chesler, E.J. , 2014. Identication of a QTL in Mus musculus for alcohol preference,withdrawal, and Ap3m2 expression using integrative functional genomics and precision genetics. Genetics 197 (4), 13771393. Burn, C.C. , 2008.", + "Breeding schemes for mouse genome-wide association study populationsa | In the classic F2 generation cross, two parental strains are mated to generate F1 strains. The F1 strains are then either mated to each other (intercross) or to one of the parentalstrains (backcross; not shown) to generate F2 offspring. These offspring are then genotypedand phenotyped. b | Recombinant inbred strains are generated by sibling mating F2intercross animals until the resulting progeny, at least 20 generations later, is fully inbred. 
These inbred lines are maintained in breeding colonies and can be purchased fromcommercial vendors.", + "Our own work has taken a different tack: we have attempted to determine whether mutations with differential effects on aging may be present within the many available populations of laboratory-adopted inbred mice.The goal is not so much to clone these genes-if indeed they existbecause positional cloning strategies of this kind require many thousands of animals and would be extremely expensive using an assay, age at death, that is itself so costly.Instead, the goal has been to use gene mapping methods to test hypotheses about aging and to develop new animal models that will be useful for testing well-specified hypotheses about the molecular basis for age-dependent changes.In the absence of a validated battery of biomarkers of aging, we (like most others) have reluctantly decided to use mouse life span as a crude surrogate for aging itself, reasoning that genetic alleles that extend life span well beyond the median for the tested population may be operating via an influence on aging itself.Work conducted using recombinant inbred mouse stocks (Gelman et al., 1988;de Haan and Van Zant, 1999) has suggested that life-span differences between pairs of inbred mouse lines might reflect the influence of as few as 4-7 polymorphic loci, providing some basis for hope that some of these would have an effect large enough to be detected by a genome scan experiment involving 300-1,200 mice.", + "Future studies may involve examination of aging mice from the CxB sRI strains todetermine whether the differences in EP and marginal cell density persist throughout thelifespan. The relationship between the cochlear lateral wall metrics (strial thickness,marginal cell density, and spiral ligament thickness) may become more obvious in oldermice with more pronounced age-related strial pathology. 
Additionally, due in part to therelatively small set of CxB RI strains, the suggested QTLs must be confirmed usingadditional methods.", + "To identify genes and molecularpathways regulating memory capabilities during aging, here weperform a forward systems genetic analysis on an aged cohort ofstrains from the BXD GRP. 2. Methods2.1. AnimalsMale and female mice were group housed (2e5 per cage) andmaintained in colony housing (12-hour light/dark cycle) with adlibitum access to food and water.", + "If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.Validation of the aging signature in mice\" subsection.A total of 110 male and virgin female C57BL/6JN mice were used.Mouse groups are summarized in ST9.In the aging cohort, 6 1 months old (mo), 10 3mo, 6 6mo, 6 9mo, 10 12mo, 6 15mo, 10 18mo, 10 21 mo, 5 24 mo, 6 27mo and 6 30mo were used.In the parabiosis cohort, 11 4mo and 18 19mo were used.", + "The DNA methylation results at these sites can be integrated into a multivariablemodel for epigenetic age predictions in B6 mice, which clearly correlate with the chronological age.10MethodsStatistical analysisMiceLaboratory C57BL/6J (B6), DBA/2J (D2) and BXD inbred micewere obtained from Janvier Labs (France). All mice were fed acidified water and food ad libitum, and housed under pathogen-freeconditions at the University of Kentucky, Division of LaboratoryAnimal Resource, the animal facility at CCHMC.", + "34. Gelman R, Watson A, Bronson R & Yunis E Murine chromosomal regions correlated withlongevity. Genetics 118, 693704 (1988). [PubMed: 3163317]35. Houtkooper RHet al.The metabolic footprint of aging in mice. Sci. Rep1, (2011). 36. Houtkooper RHet al.Mitonuclear protein imbalance as a conserved longevity mechanism. Nature497, 451457 (2013). [PubMed: 23698443]37. Williams EGet al.An Evolutionarily conserved role for the aryl hydrocarbon receptor in theregulation of movement. PLOS Genet. 10, e1004673 (2014). 
[PubMed: 25255223]38. Lang DHet al.Quantitative trait loci (QTL) analysis of longevity in C57BL/6J by DBA/2J (BXD)recombinant inbred mice. Aging Clin. Exp. Res. 22, 819 (2010)." + ] + ], + "task_id": [ + "42847DE50D50E6A9B26ED0B03CFD160E", + "AB589D2E046B211A7486A6C4BD4ECFB4", + "081B2DB92FD09DEBEF28ADBBDE7199D2", + "E3FFB15A9901BD8DB87B0F09D335BEA0", + "38797E46211127E5C7175E707D40325B" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_aging_2.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_aging_2.json new file mode 100644 index 0000000..21d6de3 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_aging_2.json @@ -0,0 +1,28 @@ +{ + "question": [ + "is there a specific genetic variation that can cause someone to live longer? please make your answer accessible to a non-expert" + ], + "answer": [ + "Yes, certain genetic variations have been associated with longer lifespans. For example, variations in the APOE, FOXO3A, and EXO1 genes have been linked to longevity. However, it's important to note that these genes don't guarantee a longer life, as longevity is influenced by a combination of genetic, environmental, and lifestyle factors." + ], + "contexts": [ + [ + "Genomic analysis of longevity offers the potential to illuminate the biology of human aging.Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated.We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD.Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.Genomic analysis of longevity offers the potential to illuminate the biology of human aging.Here, using 
genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA).We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity.Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated.We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD.Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.", + "DIFFERENT GENETIC MODELS OF EXCEPTIONAL LONGEVITYWe used the whole genome sequences of these two subjects to test different hypotheses about the genetics of exceptional longevity.These non-exclusive hypotheses and the results of the analyses are described in the sections that follow.We also used the genome sequences of these two subjects to test different genetic models of exceptional longevity.The insulin pathway, caloric restriction, and lipid metabolism significantly influence lifespan in other organisms including the mouse, fly, and worm (Christensen et al., 2006) and have provided natural candidates for the genetics of human exceptional longevity.Our analysis shows that while the man had several of the noted longevity variants in metabolic genes, particularly FOXO3A, the woman was homozygous only for one variant in HSP70 that is also common in the population.No additional novel coding SNPs in these putative genes were discovered, and the different genetic profiles of these candidate genes in the two supercentenarians suggest that not all of the 
genetic variants associated with exceptional longevity to-date are necessary to achieve such survival, and even if some of these variants may have a role in longevity there are likely many more yet to be discovered.This suggests that the metabolic hypothesis may be just one of the many paths to exceptional lifespan.It is also likely that environmental factors and possibly the genetic ancestry may influence the likelihood of an individual to live long ages directly or by interacting with the genetic background.The NECS has shown that the chance of male and female siblings of centenarians to live past 100 can be 8 and 17 times higher than the risk in the general population (Perls et al., 2002).Consistent with this observation, our data suggest that the genetic contribution increases with older and older ages as the limit of lifespan is approached (Sebastiani et al., 2012).The male supercentenarian included in this study had strong longevity in his family.Although we do not have information about the family history of the female supercentenarian, she has living offspring who are approaching their nineties in good health and are currently enrolled in the NECS.The heterogeneity of the results herein suggest that sequencing additional exceptionally old individuals of different genetic ancestry and possibly their family members will provide the critical information to understand roles of common and rare genetic determinants of exceptional longevity and healthspan.The nature and contribution of genetic variation to exceptional longevity remains unclear, particularly the role for undiscovered rare genetic variants with large effects and/or the presence of many common genetic variants with small effects (Bloss et al., 2010).Exceptional longevity is typically characterized by strong familiality (Perls et al., 2000(Perls et al., , 2002;;Atzmon et al., 2005;Schoenmaker et al., 2006) as well as a marked delay in disability (Terry et al., 2008) and, as human lifespan is 
approached at about age 110 years, many such individuals compress not only disability but also age-related diseases (Andersen et al., 2011).Studies of centenarians have provided strong evidence to support the hypothesis that a genetic contribution to human exceptional longevity is decisive, although only a small number of genetic variants with modest effects have been irrefutably linked to this phenotype (Schachter et al., 1994;Barzilai et al., 2003;Christensen et al., 2006;Wheeler and Kim, 2011).The technology of next generation sequencing provides a tool to generate data that may eventually provide an answer (Metzker, 2009).", + "Genetics of Interspecies Variation in Genome Instability and LongevityThe influence of genetics in longevity is most obvious when we consider the dramatic life span differences among species.Whereas a nematode worm can live no longer than approximately 30 days, a human can live to 100 years.It is generally assumed that such species-specific differences, which are far larger than the also-not-inconsiderable intraspecies variations in life span, reflect major", + "IntroductionWorldwide human populations have shown an increase in mean life expectancy in the past two centuries (Oeppen & Vaupel, 2002).This is mainly because of environmental factors such as improved hygiene, nutrition, and health care.The large variation in healthy lifespan among the elderly has prompted research into the determinants of aging and lifespan regulation.The genetic contribution to human lifespan variation was estimated at 25-30% in twin studies (Gudmundsson et al., 2000;Skytthe et al., 2003;Hjelmborg et al., 2006).The most prominent genetic influence is observed in families in which the capacity to attain a long lifespan clusters (Perls et al., 2000;Schoenmaker et al., 2006).Exceptional longevity can be reached with a low degree of age-related disability (Christensen et al., 2008;Terry et al., 2008), raising the question whether protective mechanisms against 
disease exist in long-lived subjects.In most experimentally modified animal model systems, single-gene mutations in many different genes have major life extension effects (Fontana et al., 2010;Kenyon, 2010).However, natural human and animal longevity is presumed to be a complex trait (Finch & Tanzi, 1997).In humans, both candidate gene and genome-wide genetic association approaches have been applied in an attempt to identify longevity loci.The frequency of genetic variants has been typically compared between nonagenarian cases and young controls, revealing loci at which genetic variants may contribute to a higher or lower probability of survival into old age.The initial candidate gene studies aimed at finding human longevity genes were dominated by contradictory results (Christensen et al., 2006).The more consistent evidence obtained by repeated observation in independent cohort studies for association with longevity has so far only been observed for three loci, the apolipoprotein E (APOE) locus (Schachter et al., 1994;Christensen et al., 2006), the FOXO3A locus (Willcox et al., 2008;Flachsbart et al., 2009;Pawlikowska et al., 2009;Soerensen et al., 2010), and the AKT1 locus (Pawlikowska et al., 2009).Thus, despite the expectation that longevity would be influenced by many genetic variants with small effect sizes, the effect of variants has consistently been shown in only three genes.", + "Clear evidence exists for heritability of human longevity, and much interest is focused on identifying genes associated with longer lives.To identify such longevity alleles, we performed the largest genomewide linkage scan thus far reported.Linkage analyses included 2118 nonagenarian Caucasian sibling pairs that have been enrolled in fifteen study centers of eleven European countries as part of the Genetics of Healthy Ageing (GEHA) project.In the joint linkage analyses we observed four regions that", + "Living to a late age without suffering any major health problems is a 
genetically influenced trait.To identify the genes contributing to this important phenotype, a 10 cM genome screen was performed in 95 pairs of male fraternal twins concordant for healthy aging.Individuals meeting these criteria were defined as those attaining the age of 70 free of cardiovascular disease (coronary surgery, diabetes, heart attack, and stroke) and prostate cancer.Six chromosomal regions were identified with logarithm of odds (LOD) scores greater than 1.2 ( p , .01).A region on chromosome 4 at marker D4S1564 produced a LOD score of 1.67; this was the same marker previously linked to extreme longevity segregating as an autosomal dominant trait in centenarian families.Our results provide independent evidence that a locus on the long arm of chromosome 4 is associated with better physical aging and/or longevity.Living to a late age without suffering any major health problems is a genetically influenced trait.To identify the genes contributing to this important phenotype, a 10 cM genome screen was performed in 95 pairs of male fraternal twins concordant for healthy aging.Individuals meeting these criteria were defined as those attaining the age of 70 free of cardiovascular disease (coronary surgery, diabetes, heart attack, and stroke) and prostate cancer.Six chromosomal regions were identified with logarithm of odds (LOD) scores greater than 1.2 ( p , .01).A region on chromosome 4 at marker D4S1564 produced a LOD score of 1.67; this was the same marker previously linked to extreme longevity segregating as an autosomal dominant trait in centenarian families.Our results provide independent evidence that a locus on the long arm of chromosome 4 is associated with better physical aging and/or longevity.", + "The DNA of over 500,000 people was read to reveal the specific 'genetic fingerprints' of each participant.Then, after asking each of the participants how long both of their parents had lived, Timmers et al. 
pinpointed 12 DNA regions that affect lifespan.Five of these regions were new and had not been linked to lifespan before.Across the twelve as a whole several were known to be involved in Alzheimer's disease, smoking-related cancer or heart disease.Looking at the entire genome, Timmers et al. could then predict a lifespan score for each individual, and when they sorted participants into ten groups based on these scores they found that top group lived five years longer than the bottom, on average.", + "The search for the genetic determinants of extreme human longevity has been challenged by the phenotype's rarity and its nonspecific definition by investigators.To address these issues, we established a consortium of four studies of extreme longevity that contributed 2,070 individuals who survived to the oldest one percentile of survival for the 1900 U.S. birth year cohort.We conducted various analyses to discover longevity-associated variants (LAV) and characterized those LAVs that differentiate survival to extreme age at death (eSAVs) from those LAVs that become more frequent in centenarians because of mortality selection (eg, survival to younger years).The analyses identified new rare variants in chromosomes 4 and 7 associated with extreme survival and with reduced risk for cardiovascular disease and Alzheimer's disease.The results confirm the importance of studying truly rare survival to discover those combinations of common and rare variants associated with extreme longevity and longer health span.The search for the genetic determinants of extreme human longevity has been challenged by the phenotype's rarity and its nonspecific definition by investigators.To address these issues, we established a consortium of four studies of extreme longevity that contributed 2,070 individuals who survived to the oldest one percentile of survival for the 1900 U.S. 
birth year cohort.We conducted various analyses to discover longevity-associated variants (LAV) and characterized those LAVs that differentiate survival to extreme age at death (eSAVs) from those LAVs that become more frequent in centenarians because of mortality selection (eg, survival to younger years).The analyses identified new rare variants in chromosomes 4 and 7 associated with extreme survival and with reduced risk for cardiovascular disease and Alzheimer's disease.The results confirm the importance of studying truly rare survival to discover those combinations of common and rare variants associated with extreme longevity and longer health span.", + "Longevity Genes-A Special CaseDemographers are fascinated by the possibility that one or more genes might determine the rate of decline in multiple organ systems.Several such genes have been identified in other species (Vaupel et al., 1998).These genes are sometimes called gerontogenes or longevity genes.The discovery of one or more genes that act as aging \"clocks\" in humans would be a major breakthrough for genetics.However, the mere existence of such genes would not have a major effect on demographic research.For example, a mutation in a longevity gene that was present in 0.1 percent of the population would still be rare (probably less than 1 percent) among centenarians. 
19Such a genotype would not explain much about survival to the oldest ages.Therefore, in order to be important for demographic research, there would have to be common polymorphisms associated with large differences in survival.Vaupel has estimated that there could be hundreds of genotypes with frequencies of 5-10 percent that lower death rates by 5-10 percent (Vaupel, personal communication).", + "Here, we review advances in genomic analysis within and across species to help refine the genetic foundations of age-associated diseases and longevity.As such, independent evolutionary occurrences of this species-specific lifespan change can empower comparative approaches to refine the shared mechanisms associating with longevity phenotypes.These evolutionary-refined gene sets can then be leveraged to focus statistical analysis within human cases of extreme longevity to discover core mechanisms of regulation.", + "IntroductionHuman longevity is influenced by multiple genetic and environmental factors.Approximately 25-32% of the overall variation in adult lifespan is because of genetic variation that becomes particularly important for survival at advanced age (Hjelmborg et al., 2006).Epidemiological studies have revealed that long-lived individuals (LLI), that is, people surviving to the 95th percentile of the respective birth cohort-specific age distributions (Gudmundsson et al., 2000), frequently show a favorable ('healthy') course of the aging process, with the absence or a delayed onset of agerelated diseases (Hitt et al., 1999).Hence, the LLI offer the key to elucidate the molecular mechanisms underlying the 'healthy aging' phenotype (Perls, 2006).", + "IntroductionApproximately 25-30% of the variation in adult lifespan is attributable to genetic factors that become more important with increasing age and exert their strongest effects in nonagenarians and centenarians (Go gele et al., 2010;Hjelmborg et al., 2006).As yet, however, only a few genetic variants have been 
found consistently to influence longevity.The first to be discovered was the e4 allele of the apolipoprotein E (APOE) gene, a mortality factor that predisposes to both Alzheimer's and cardiovascular diseases (Corder et al., 1993; Panza et al., 2004).APOE e4 is the only variant with a reportedly large adverse effect upon survival at advanced age (Scha chter et al., 1994), and this association has been replicated in several populations (Christensen et al., 2006).Variation in the human forkhead box O3A gene (FOXO3A), in contrast, has been found to be associated with the ability to live long, an effect corroborated by studies in Japanese, German, Italian, US-American, Jewish, Chinese and Danish populations (Anselmi et al., 2009;Flachsbart et al., 2009;Li et al., 2009;Pawlikowska et al., 2009;Soerensen et al., 2010;Willcox et al., 2008).More recently, we have identified exonuclease 1 (EXO1) as a potential novel longevity gene (Nebel et al., 2009).All three genes were detected through candidate-gene approaches.", + "Unraveling the heritability of human longevity was one of the first problems faced by geneticists.Just over a century ago, Mary Beeton and Karl Pearson [1] described a resemblance among relatives for the duration of life.A short time later, Yule [2] and Fisher [3] proved that the correlation is to be expected if lifespan is influenced by what had recently been termed 'genes' [4].Indeed, a century of correlation studies have established that something on the order of 30-50% of the total variation in human life span is attributable to genetic variation [5].Despite the wealth of diversity, specific genes contributing to this variation have proven notoriously difficult to identify.Sample size and issues of shared environment limit family-based methods such as linkage analysis, where rough genomic positions of important genetic variants are identified by comparing a small number of exceptionally long-lived people in defined pedigrees." 
+ ] + ], + "task_id": [ + "D53462CE61F52F7D31BB627998F4D75A" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_1.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_1.json new file mode 100644 index 0000000..f5a64b3 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_1.json @@ -0,0 +1,106 @@ +{ + "question": [ + "How is gene expression in the liver affected by diabetes?", + "Is any of the genes SH2B3, IFIH1 or ERBB3 related to diabetes?", + "nutrition is a factor for diabetes. how can genomics be use to better understand nutritional factors of diabets", + "nutrition is a factor for diabetes. construct an abstract about how can genomics be use to better understand nutritional factors of diabets", + "Is the gene TCF7L2 involved in diabetes?" + ], + "answer": [ + "In diabetic conditions, the expression of several key gluconeogenic enzymes in the liver is increased despite higher than normal circulating levels of plasma insulin. Additionally, the activities of insulin-dependent enzymes such as glueokinase, glucose 6-phosphate dehydrogenase, citrate lyase, malic enzyme, and acetyl-CoA synthetase are increased. However, the gene Slc2a2 also shows changes in the liver. Some studies also show that liver cells in diabetic mice are hypertrophied and filled with fat droplets.", + "Yes, the gene IFIH1 has been identified as contributing to susceptibility to type 1 diabetes. However, the text does not mention any direct relation of SH2B3 or ERBB3 to diabetes.", + "Genomics can be used to better understand the nutritional factors of diabetes through the study of nutrient-gene interactions and how an individual's genetic makeup can affect nutrient metabolism and response to nutrient intake. This field, known as nutritional genomics, can help develop tailored diets that maximize the use of nutrients and other functional ingredients present in food, aiding in the prevention and delay of diabetes and its complications. 
It can also help identify gene variants that interact with specific nutrients, potentially influencing diabetes risk.", + "Genomics provides a comprehensive understanding of the genetic factors contributing to diabetes, a global pandemic. Nutritional genomics studies focus on the interaction between dietary patterns and genetic variations, which play a crucial role in the development and treatment of diabetes. This approach aids in the development of tailored diets, maximizing the use of nutrients and functional ingredients in food to prevent and delay diabetes and its complications. The integration of genomic data with advanced statistical and computational methods can facilitate a better understanding of gene-environment interactions in diabetes manifestation. Furthermore, the identification of novel genetic factors associated with diabetes through advanced genetic techniques can contribute to personalized diabetes management. Therefore, genomics holds significant potential in understanding the nutritional factors of diabetes.", + "Yes, the gene TCF7L2 is involved in diabetes. Studies have shown that variants of the TCF7L2 gene are associated with an increased risk of type 2 diabetes (T2D)." + ], + "contexts": [ + [ + "Studies have reported that SLC2A9 is expressed in both kidney and liver of human and mice and is upregulated in diabetes mice 25 .The SLC2A9 expression was found to be governed by p53 gene and is mediated by oxidative stress 26 .Oxidative stress play major and deterministic role in patho-physiology of T2DM and has been observed to be higher in T2DM patients than healthy controls 27 .The higher expression of SLC2A9 in diabetic condition may be governed by higher oxidative stress in diabetics.In a recent study, Hurba et al. 
observed that there is no significant difference in transport activity of coding rs16890979 (Val253Ile) variant containing protein and wild type protein in Xenopus oocyte expression system 28 .The higher activity of SLC2A9 in T2DM subjects compared to normoglycemics may be attributed to higher expression of total SLC2A9 protein in T2DM condition.", + "Multiple studies on the transcriptome level have been performed that emphasize the diversity of the disease and the complex pathophysiological interactions between different tissues, including fat, muscle, liver, pancreatic beta cells and brain [1].In several human studies, tissue biopsies from diabetic and normoglycaemic individuals have been profiled [12,13].In mouse studies differences in diet or mouse strains have been used to identify distinct expression profiles [14][15][16].Complementary ChIP-on-Chip studies reveal the associated gene regulatory network of important transcription factors (TFs) active in the rele-vant tissues [17,18].In the context of the onset of diabetes, several studies on the proteomic level have revealed differential expression of intracellular proteins as well as of secretory proteins in adipose tissue [19].Despite the availability of these large amounts of data, their common content as well as their specific differences, in particular in gene sets between human and rodent studies, has not yet been systematically evaluated.On the other side Slc2a2 is also changed in liver.Ptpn1 is expressed in all tissues showing only small fold-changes.Several genes from OMIM or KO-mice do not change at all on the expression level.This indicates that only the complete loss of the associated protein alters the system whereas the gene's expression is not altered in T2DM.For KO-mice we also see a strong tendency to genes only expressed in mice.", + "The activities of several key gluconeogenic enzymes are increased in both young and adult diabetes mice as compared with controls [4,7] in spite of the higher 
than normal circulating levels of plasma insulin.In contrast the activities of the insulin dependent enzymes such as glueokinase, glucose 6-phosphate dehydrogenase, citrate lyase, malic enzyme and acetyl-CoA synthetase are increased indicating a normal response to elevated concentrations of plasma insulin [7].As in the obese mouse, insulin resistance coupled with a disappearance of receptor sites has been a consistent finding in most tissues studied [26].", + "Regulation of GWAS diabetes genes by glucose in pancreatic isletsMany of the recently discovered type 2 diabetes genes have been suggested to affect the development and/or function of pancreatic islets [6].The function, growth and survival of -cells can be regulated acutely and chronically by glucose [34].Thus, we examined whether the new type 2 diabetes susceptibility genes are regulated by overnight incubation in low (5 mM) or high (25 mM) glucose (Figure 5).Most genes were significantly or tended to be downregulated under conditions of high glucose.Cdkal1, Cdkn2a (Arf, P = 0.07), Ide, Jazf1, Camk1d, and Tspan8 (P = 0.06) expression levels were decreased ~50-60%.Meanwhile, the expression of Cdkn2b, Hhex (P = 0.10), Cdc123, Adamts9 (P = 0.09), and Thada were reduced 30-40%.To ensure the islets incubated in high glucose did not have globally decreased expression, we examined the expression of Txnip, which has been shown to be highly upregulated by glucose [35] and found that its expression was still significantly elevated in the islets cultured in high glucose (Figure 5).Mouse islets consist of -cells and other cell types.Thus, the MIN6 -cell line was also examined.We found that all the genes were expressed in this cell line (not shown), although this does not preclude that they also are expressed in other cell types within the islet.Figure 5 Regulation of new diabetes genes by glucose levels in pancreatic islets.Data are shown as fold-change, (2 Ct ) 2 CtSE[87], relative to those observed in the islets 
incubated in low (5 mM) glucose.Each group is the average of three replicates, each of which was comprised of pooled islets from two mice. * P < 0.05, *** P < 0.001.It has been hypothesized that most of the new genetic variants affect -cell function, development or survival but not insulin sensitivity [6].Consistent with this, we found all of the genes except Adam30 and Cdkn2a were expressed in pancreatic islets.These genes were expressed, however in the transformed -cell line, MIN6.The expression of all the genes except Lgr5 decreased following incubation of the islets in high glucose concentrations.It can thus be hypothesized that these genes may normally play a beneficial role in islet function, and a reduction in the expression of these genes could contribute to glucotoxic -cell dysfunction or survival.However, we also found evidence that most of the genes could have potential roles in other metabolically-relevant tissues.Genes affecting insulin sensitivity may be expected to be expressed in peripheral insulin sensitive tissues, such as liver and adipose tissue, and be responsive to metabolic status.Consumption of a high fat diet was associated with a tendency for the expression of several of these genes to be decreased.Similarly, many of the genes were regulated by feeding and fasting.Only the two splice isoforms of Cdkn2a had no evidence of metabolic regulation in any of the other tissues examined.", + "A recent study assessed gene expression in different islet cell types including the insulin-producing b-cells (Dorrell et al, 2011).A comparison showed that 240 of our 254 genes are covered by the microarray used by these authors.In all, 170 of these genes have a positive presence call in b-cells.This indicates that the majority of the genes we detected as differentially methylated in T2D islets are expressed in non-diabetic b-cells to a sufficient amount to be reliably detected by microarrays, that is, these are genes actively transcribed in b-cells.", + 
"Figure3: Challenges with identifying gene expression alterations in type 2 diabetes.Gene expression measurements from RNA-seq data typically represent only a snapshot of tissues' or cell types' transcriptome at a given point in time.In recent comparative analyses of islet intact and single cell transcriptomes from T2D and ND individuals, relatively few genes are significantly altered despite the clear phenotypic differences between them.This may suggest that the mechanisms that precede islet failure and T2D pathogenesis are post-transcriptional and cannot be detected in conventional RNA-seq analyses.However, it is also possible that the putative paths of these genes' alterations over the course of islet physiological decline and T2D development are simply being missed.Genes that are important for islet function and resilience (e.g., Gene A) and those whose expression directly induces or is the consequence of islet failure (e.g., Gene C) may be detected in a comparative analysis between islets at healthy and decompensated states.However, response genes that are temporarily induced by islet stress (e.g., Gene B) would not be detected in this comparison.", + "Figure 2. Diabetes increases the variability of gene expression levels in other experimental paradigms. (A) Microarray data from gene expression profiling in placentas from normal compared to diabetic pregnancies (Salbaum and Kappen, unpublished data) were processed as shown in Figure1B: the coefficient of variation was determined for each gene probe, and a histogram was obtained after logarithmic transformation.The curve representing the diabetic placenta samples was shifted to higher values, similar to the results obtained in embryos from diabetic pregnancies. 
(B) Publicly available microarray data from diabetic versus normal human kidney (GEO record GSE1009) were treated in the same fashion as described for embryonic or placental gene expression data.Similar to our own datasets, the curve representing the coefficients of variation for the diabetic samples is shifted toward higher values, again implying that the variability of gene expression levels is higher in diabetic samples compared to control samples.Our analysis of various expression profiling data sets suggests that, in the respective paradigms (mouse embryo, mouse placenta, and human kidney), diabetes leads to an increase in the variability of gene expression, possibly by affecting the precision of gene regulation in general.Although this would be consistent with our model for maternal diabetes-elicited NTD etiology, it is important to note that the currently available gene-profiling surveys were never designed to capture variability of gene expression as an explicit experimental parameter.In fact, microarray experiments are typically structured to eliminate variability as a confounding element as much as possible, such as through the use of pooled samples.To directly measure the extent of variability of gene expression brought about by maternal diabetes, it would be necessary to conduct expression-profiling experiments with individual embryo samples, and with a higher number of samples for each side of the experimental paradigm.In this way, it would be possible to not only classify genes according to their change in expression, but also according to their change in variability of gene expression.Such experiments would define which genes exhibit increased variability in expression levels.According to our model, these would be candidate genes to trigger birth defect pathogenesis.Functional assays will then be required to test which genes of this ''highly variable'' group are able to interact with the ''susceptibility'' component-NTD genes with consistent change 
of expression in all exposed individuals.", + "All these studies show that gene expression, in pancreatic islets, is very sensitive to nutrients and bioactive compounds present in food.The altered expression of genes involved in cell nutrient sensing, insulin synthesis, cell cycle, survival/apoptosis and cell maintenance can impair cell function and at the end facilitates cell failure (Figure 2).Figure 2. Effects of nutrients on cell gene expression.Pancreatic cells are able to sense dietary nutrients and respond to them releasing insulin.Different nutrients and their metabolites affect transcription of genes very important for maintenance of cell function and integrity.Flavonoids upregulate the expression of genes involved in insulin synthesis, nutrient-induced insulin release and cell proliferation and downregulate genes implicated in cell apoptosis.Proteins positively regulate insulin synthesis, insulin release, cell proliferation and growth upregulating the expression of mTOR, calcineurin and Pdx1.Fats upregulate OXPHOS genes leading to the generation of metabolic coupling factors critical for insulin exocytosis.On the other hand, a chronic exposure of -cells to high levels of fats (mainly saturated fatty acids) induces excessive levels of ROS and pro-inflammatory cytokines, leading to an increased apoptosis.The upregulation of the expression of cytokine genes and genes involved in pro-inflammatory signaling pathways, together with the downregulation of genes implicated in the antioxidant defenses of cells, contribute to cell apoptosis.Moreover, chronic exposure to fats and their byproducts downregulate the expression of genes necessary for insulin synthesis, nutrient-induced insulin release, cell integrity, maintenance and survival (Pdx1 and MafA).Impairment of -cell function is a hallmark of pancreatic -cell failure and may lead to development of DM.", + "It is worth mentioning that in [132], a meta-analysis study was conducted, where a collection of gene 
expression datasets of pancreatic beta-cells, conditioned in an environment resembling T1D induced apoptosis, such as exposure to proinflammatory cytokines, in order to identify relevant and differentially expressed genes. The specific genes were then characterized according to their function and prior literature-based information to build temporal regulatory networks. Moreover, biological experiments were carried out revealing that inhibition of two of the most relevant genes (RIPK2 and ELF3), previously unknown in T1D literature, has a certain impact on apoptosis.", + "The known tissue specificity of gene expression regulation means that the most informative studies will measure transcript levels in the specific tissue(s) relevant to the disease. In the case of type 2 diabetes, characterization of physiological responses (e.g., stimulus-induced insulin secretion, insulin sensitivity) suggests most loci are associated with defects in pancreatic β-cell function (2,3,7). Therefore there is a real need to measure gene expression in human β-cells (or whole islets, as these have been shown to be a suitable proxy [8]). There have, however, been very few reports linking type 2 diabetes-associated variation with islet gene expression using the classical eQTL approach (9,10).", + "Young diabetic mice, at the stage when they still have an increased capacity to utilize glucose, had increased hepatic activities of glucokinase, citrate lyase and acetyl-CoA synthetase (Table 3). However, glucose-6-phosphate dehydrogenase activity in the livers of mice in early diabetic stages was not quite as great as in normal livers. 
This enzyme may be the most sensitive to the action of insulin of the four enzymes mentioned since the livers of some diabetic mice in the group had glucose-6-phosphate dehydrogenase activity equal to that from normal mice. Thus the overall decrease in activity in livers from the group of 12 diabetic mice probably includes data from a few mice in the transitional stage when the ability to metabolize glucose was rapidly declining. Activities of all four enzymes in liver from older diabetic mice with blood sugar concentrations approaching 600 mg/100 ml were greatly reduced. Enzyme activities in adipose tissue showed the same general patterns as those in liver with the exception that glucose-6-phosphate dehydrogenase was clearly elevated in adipose tissue from the younger diabetic mice over that seen in adipose tissue from normal controls. Many of the liver cells of the diabetic mouse are hypertrophied and filled with fat droplets, especially in areas surrounding the hepatic veins (Fig. 5). The increase in glycogen content seen in Table 1 is not visible histologically as PAS-positive, diastase-digestible material, but a striking difference in glycogen distribution in livers from normals and from diabetics is apparent. In normal liver (Fig. 
4), glycogen is distributed fairly uniformly throughout, whereas in that from the diabetic (Fig.", + "To evaluate the effects of hyperglycemia or other metabolic consequences of DM per se on expression, we identified 12 genes altered in DM as compared with both nondiabetic groups but not as a function of family history (Table 4, which is published as supporting information on the PNAS web site). This included a 70-kDa heat-shock protein (HSP701A), which was decreased by 42% in DM and whose expression correlated inversely with fasting glucose for all subjects (r 0.77). Expression of a related HSP70 gene was previously found to be reduced in Caucasian diabetic subjects (20). Genes differentially expressed between control and diabetic subjects may reflect either the pathophysiology of insulin resistance (primary alterations) or secondary effects of hyperglycemia, hyperlipidemia, and other metabolic factors. To identify potentially primary expression changes associated with insulin resistance, we compared gene expression in FH (nondiabetic but insulin resistant) and FH controls. One hundred sixty-six genes were differentially expressed between FH and FH (P 0.05) (Table 3, which is published as supporting information on the PNAS web site); 55 were common to both [FH vs. DM] and [FH vs. FH] comparisons. No single gene remained differentially expressed after Benjamini-Hochberg multiple comparison testing. However, ontology classification analysis (17) revealed that 20S and 26S proteasome complexes were the top-ranked cellular component terms (Z 7.7 and 7.3); mitochondrion-linked genes were also overrepresented (Z 3.2). Cell structure (P 0.004), protein degradation (P 3.7 10 4 ), and energy generation (P 0.003) groups were represented to a greater extent than expected for random distribution; with multiple comparison testing, the protein degradation26S proteasome (P 1 10 5 ) group remained significant." 
+ ], + [ + "Figure 8 Molecular changes in the islets of patients with T2D mirror the processes altered in NOD mice. mRNA expression in human pancreatic islets from healthy individuals (n = 105) and those diagnosed with T2D (n = 14) was assessed through RNA-seq analysis. (a) Relationship between GLIS3 and MANF expression in healthy individuals (Spearman correlation P value = 0.043), individuals with T2D (Spearman correlation P value = 0.075) and all individuals (Spearman correlation P value = 0.028). (b-e) Expression of XRCC4 (b), LIG4 (c), H2AFX (d) and CDKN1A (e) in healthy islets as compared to islets from patients with T2D (P values shown after multiple-testing correction). The median and interquartile range (IQR; box) are shown, with error bars indicating 1.5 times the IQR. Individual values are shown if beyond 1.5 times the IQR. (f) Relationship between H2AFX and LIG4 expression in human islets (Spearman correlation P value = 5 × 10^-9).", + "All the genes involved in these pathways, as well as the genes involved in β-cell development and turnover, may be considered candidate genes for T2DM with predominant insulin deficiency.", + "One method of searching for the cause of NIDDM is via the candidate gene approach. Possible candidates for NIDDM include genes involved in specifying pancreatic islet β-cell phenotype and in directing β-cell development and β-cell responses of glucose-mediated insulin synthesis and secretion. The transcription factor islet-1 (Isl-1) has been shown to be a unique protein that binds to the mini-enhancer or Far-FLAT region (nucleotide -247 to -198) of the rat insulin I gene (7). Isl-1, a protein comprised of 349 residues (38 kD), is a member of the LIM/homeodomain family of proteins, named for the first three members described: lin-11, isl-1, and mec-3 (8,9). These proteins are comprised of three putative regulatory regions, two LIM domains (cysteine-rich motifs) in the amino terminus of the protein, a homeobox domain near the middle, and a 
glutamine-rich transcriptional activation domain at the carboxyl end (7,9).With the use of an antibody to Isl-1, expression was shown to be restricted to a subset of endocrine cells, including islets, neurons involved in autonomic and endocrine control, and selected other tissues in the adult rat (10)(11)(12).", + "ResultsImpairment or alteration of the insulin-signaling pathway is a commonly recognized feature of type 2 diabetes.It is therefore notable that the IS-HD gene set (Dataset S4) was not detected to be significantly transcriptionally altered by application of either hypergeometric enrichmentt test, DEA or GSEA.In particular, applying GSEA to the transcriptional profile dataset of diabetic and normal glucose-tolerant skeletal muscle described in Mootha et al. [10] did not identify a significant level of alteration in the IS-HD gene set (p 0.536), while DEA produced a comparably weak enrichment score (p 0.607).The failure to detect a significant transcriptional alteration in IS-HD may be explained by a number of factors.The enrichment results depended on the specific choice of the IS-HD gene set, and it is possible that an alternatively defined insulin-signaling gene set would be determined as significantly enriched.Additionally, expression changes in a few critical genes in IS-HD may be sufficient to substantially alter insulin signaling, and running DEA on the large IS-HD set may miss the contributions from these few genes.", + "35ABSTRACT 11A GENE EXPRESSION NETWORK MODEL OF TYPE 2 DIABETESESTABLISHES A RELATIONSHIP BETWEEN CELL CYCLEREGULATION IN ISLETS AND DIABETES SUSCEPTIBILITYMP Keller, YJ Choi, P Wang, DB Davis, ME Rabaglia, AT Oler, DS Stapleton,C Argmann, KL Schueler, S Edwards, HA Steinberg, EC Neto, R Klienhanz, STurner, MK Hellerstein, EE Schadt, BS Yandell, C Kendziorski, and AD AttieDepts.", + "Second, we performed an extensive manual curation according to a previously described b-cell-targeted annotation (Kutlu et al, 2003;Ortis et al, 
2010).In partial agreement with the IPA, we found these genes to fall into three broad categories: (1) genes related to b-cell dysfunction and death, (2) genes potentially facilitating the adaptation of the pancreatic islets to the altered metabolic situation in T2D and (3) genes whose role in disease pathogenesis remains to be unearthed (Figure 6B).The adaptation-related gene category contains few metabolism-associated genes (e.g., HK1, FBP2; Figure 6B, right part, Figure 7) and many more genes involved in signal transduction or encoding hormones, growth factors (e.g., EGF, FGF1, IGF2/IGF2AS; Figure 7), or transcription factors involved in important regulatory networks (for instance, FOXA2/HNF3B, PAX4 and SOX6) (Figure 6B, right part, Figure 7).In the b-cell dysfunction and death category, there were hypomethylated genes related to DNA damage and oxidative stress (e.g., GSTP1, ALDH3B1; Figure 7), the endoplasmic reticulum (ER) stress response (NIBAN, PPP2R4, CHAC1), and apoptosis (CASP10, NR4A1, MADD; Figure 6B, left part, Figure 7).Some genes of interest from the highlighted categories are depicted in Figure 7. Their annotated functions provide possible explanations of how the epigenetic dysregulation of these genes in diabetic islets is connected to T2D pathogenesis.Numerous genes that were identified by our methylation profiling approach have been functionally implicated in insulin secretion.Examination of the available literature on the function of these genes revealed three aspects of insulin secretion with which they interfere: some of these genes influence the expression of the insulin gene, like MAPK1 and SOX6, or its post-translational maturation, like PPP2R4 (cf. 
Figure 7 and references therein).Others can deregulate the process of insulin secretion itself (SLC25A5, Ahuja et al, 2007;RALGDS, Ljubicic et al, 2009) or influence synthesis as well as secretion (vitronectin, Kaido et al, 2006).A third group of differentially methylated genes affects (i) signalling processes in the b-cell leading to insulin secretion or (ii) glucose homeostasis in b-cells, thereby modulating insulin response upon stimulation.GRB10 (Yamamoto et al, 2008), FBP2 and HK1 (Figure 7) are examples for these genes.Additional genes found in our study have been implicated in the b-cells' capability to secrete insulin, though the mechanisms have not yet been fully established.The putative functions of these genes indicate a potential epigenetic impact on insulin secretion at multiple levels, namely signalling, expression/synthesis and secretion.", + "In summary, we have associated mutations in the SLC29A3 gene with diabetes mellitus in humans and the insulin signaling pathway in Drosophila.The mechanistic basis of these findings remains to be determined.This is strong evidence supporting the investment of resources to further investigate the role of SLC29A3 and its orthologs in diabetes and glucose metabolism in model systems.DISCUSSIONWe have identified mutations in the equilibrative nucleoside transporter 3 protein that are associated with an inherited syndrome of insulin-dependent DM, and provide prima facie evidence that the Drosophila ortholog of this protein interacts with the insulin signaling pathway.This is the first evidence that mutations in the human SLC29A3 gene can be associated with a diabetic phenotype.", + "These observations taken together suggest that molecules involved in innate immunity could serve as candidate genes that determine the susceptibility of sensitive strains of mice to virusinduced diabetes.Interestingly, deficiency of the Tyk2 gene results in a reduced antiviral response 24 .In addition, the human TYK2 gene was mapped to 
the possible type 1 diabetes susceptibility locus 25.", + "A recent sequencing study provides an example of detection of rare variants in type 1 diabetes. Targeted sequencing in a series of candidate coding regions resulted in IFIH1 being identified as the causal gene in a region associated with type 1 diabetes by GWA studies (58). IFIH1 encodes a cytoplasmic helicase that mediates induction of the interferon response to viral RNA. The discovery of IFIH1 as a contributor to susceptibility to type 1 diabetes has strengthened the hypothesis (70) about a mechanism of disease pathogenesis involving virus-genetic interplay and raised type 1 interferon levels as a cofactor in β-cell destruction. Nonetheless, it should be recognized that a component of the missing heritability (familial aggregation) in type 1 diabetes could well be due to unrecognized intra-familial environmental factors. Disease pathogenesis. Contemporary models of pathogenesis of type 1 diabetes support the involvement of two primary dramatis personae: the immune system and the β-cell. The known and newly identified genetic risk factors for type 1 diabetes present exciting opportunities to build on to the current cast of disease mechanisms and networks. Most of the listed genes of interest (Table 2) and those in extended regions are assumed to regulate immune function. Some of these genes, however, may also have roles in the β-cell (insulin being the most obvious example). Another gene, PTPN2, encoding a protein tyrosine phosphatase, was identified as affecting the risk for type 1 diabetes as well as for Crohn disease (47,71). PTPN2 is expressed in immune cells, and its expression is highly regulated by cytokines. However, PTPN2 is expressed also in β-cells, where it modulates interferon (IFN)-γ signal transduction and has been shown to regulate cytokine-induced apoptosis (72). Other candidate genes, such as NOS2A, IL1B, reactive oxygen species scavengers, and candidate genes, identified in large GWA studies of type 2 
diabetes, have not been found to be significant contributors to the susceptibility of type 1 diabetes (73).", + "Differential Expression Analyses of Type 1 Diabetes Mellitus Associated GenesFor the aforementioned 171 'novel' genes, we used t-test to compare ribonucleic acid expression signals in PBMCs or monocytes between type 1 diabetes mellitus patients and healthy controls.We found that 37 genes, including 21 non-HLA genes (e.g.FAM46B, OLFML3 and HIPK1), were differentially expressed between type 1 diabetes mellitus patients and controls (Table 2).For the differential expression study, the significance level of P < 5.0E-02 was used.", + "In this study, we have correlated the function and genotype of human islets obtained from diabetic and nondiabetic (ND) donors.We have analyzed a panel of 14 gene variants robustly associated with T2D susceptibility identified by recent genetic association studies.We have identified four genetic variants that confer reduced b-cell exocytosis and six variants that interfere with insulin granule distribution.Based on these observations, we calculate a genetic risk score for islet dysfunction leading to T2D that involves decreased docking of insulin-containing secretory granules, impaired insulin exocytosis, and reduced insulin secretion.", + "At present, insulin [15], glucokinase [16], amylin [17], mitochondrial DNA [18], and several transcriptional factors [19][20][21][22] are recognized as diabetogenic genes in pancreatic b-cells.In the present study we used the candidate gene approach in the examination of genomic variation in the a 1D and Kir6.2 channel genes in type 2 diabetic patients.", + "In summary, we report AEIs that are consistent with type 2 diabetes-associated variation regulating the expression of cis-linked genes in human islets.For some of the genes where significant AEI was identified (e.g., SLC30A8, WFS1), there is strong evidence from human genetics that small changes in gene dosage may have significant 
consequences for the pancreatic b-cell.For other genes with significant AEI (e.g., ANPEP, HMG20A), their role is less well defined, and hence this study should provide a platform for further work examining the effects of carefully manipulating the expression of these genes in human islets.", + "Results.Pathway analysis of genes with differentially methylated promoters identified the top 3 enriched pathways as maturity onset diabetes of the young (MODY), type 2 diabetes, and Notch signaling.Several genes in these pathways are known to affect pancreatic development and insulin secretion.", + "The authors then used mouse liver and adipose expressiondata from several mouse crosses to construct causal expression networks for the ERBB3 andRPS26 orthologs in the mouse. They then showed that ERBB3 is not associated with anyknown Type I diabetes genes whereas RPS26 is associated a network of several genes thatare part of the KEGG Type I diabetes pathway (Schadt et al. 2008). This type of analysisdemonstrates the power of combining human and mouse data with a network basedapproach that has been proposed for use in drug discovery (Schadt et al.", + "In conclusion, GWAS studies focusing on the causes of T2D have implicated islet dysfunction as a major contributing factor (18,71).By examining isolated islets for stress responses and cross-referencing gene hits with genes associated with glucose-stimulated insulin release in human populations with T2D, we identified 7 genes that may play a role in promoting or preventing islet decline in T2D.By further examining stress-induced expression changes in each of these genes, we identified 5 genes that stood out: F13a1 as a novel stress-inhibited gene in islets, Klhl6 and Pamr1 as induced genes specific to ER stress, Ripk2 as a broadly stress-induced gene, and Steap4 as an exceptionally cytokine-sensitive gene.These genes provide promising leads in elucidating islet stress responses and islet dysfunction during the development of 
T2D. Genome-wide association studies in human type 2 diabetes (T2D) have renewed interest in the pancreatic islet as a contributor to T2D risk. Chronic low-grade inflammation resulting from obesity is a risk factor for T2D and a possible trigger of β-cell failure. In this study, microarray data were collected from mouse islets after overnight treatment with cytokines at concentrations consistent with the chronic low-grade inflammation in T2D. Genes with a cytokine-induced change of 2-fold were then examined for associations between single nucleotide polymorphisms and the acute insulin response to glucose (AIRg) using data from the Genetics Underlying Diabetes in Hispanics (GUARDIAN) Consortium. Significant evidence of association was found between AIRg and single nucleotide polymorphisms in Arap3 (5q31.3), F13a1 (6p25.3), Klhl6 (3q27.1), Nid1 (1q42.3), Pamr1 (11p13), Ripk2 (8q21.3), and Steap4 (7q21.12). To assess the potential relevance to islet function, mouse islets were exposed to conditions modeling low-grade inflammation, mitochondrial stress, endoplasmic reticulum (ER) stress, glucotoxicity, and lipotoxicity. RT-PCR revealed that one or more forms of stress significantly altered expression levels of all genes except Arap3. Thapsigargin-induced ER stress up-regulated both Pamr1 and Klhl6. Three genes confirmed microarray predictions of significant cytokine sensitivity: F13a1 was down-regulated 3.3-fold by cytokines, Ripk2 was up-regulated 1.5- to 3-fold by all stressors, and Steap4 was profoundly cytokine sensitive (167-fold up-regulation). Three genes were thus closely associated with low-grade inflammation in murine islets and also with a marker for islet function (AIRg) in a diabetes-prone human population. This islet-targeted genome-wide association scan identified several previously unrecognized candidate genes related to islet dysfunction during the development of T2D.", + "Finally, several of the linking nodes introduced into this islet network through their PPI connections represent interesting candidates for a role in T2D pathogenesis, and there are several examples where 
external data provides validation of those assignments.An interesting example involves the gene GINS4 which maps at the ANK1 locus.Though this gene generated a low PCS [0.03] and was not included in the set of seed genes for this locus, GINS4 knock-down has an impact in a human beta-cell line [14].In addition, cyclin-dependent kinase 2 (CDK2) has been shown to influence beta-cell mass in a compensatory mechanism related to age-and diet-induced stress, connecting beta-cell dysfunction and progressive beta-cell mass deterioration [54].YHWAG is a member of the 14-3-3 family, known to be signalling hubs for beta-cell survival [55], and disruption of SMAD4 drives islet hypertrophy [56]." + ], + [ + "Researchers are expanding our understanding of genetic risk factors for diabetes through ongoing discoveries.Genetic variants associated with increased susceptibility to type 2 diabetes, a disease that affects more than 200 million people worldwide, have been identified (NHGRI & NIDDK, 2007).Such discoveries accelerate efforts to understand genetic contributions to chronic illness, as well as facilitate greater investigation of how these genetic factors interact with each other and with lifestyle factors.Ultimately, once the association of these variants with diabetes are confirmed, genetic tests may be utilized to identify (even before escalating blood sugars) those individuals, like Vanessa, who may be able to delay or prevent diabetes with healthy lifestyle decisions and behaviors.Information to assist nurses in this challenge is available in a toolkit \"Your Game Plan for Preventing Type 2 Diabetes\" (Your Game Plan, n.d.).Would you have known whether or not genetic testing was available for Vanessa?If you had said no to this question but could have explained the progress currently being made in understanding diabetes, Vanessa would have had access to the best care possible today.", + "Genomics has contributed to a better understanding of many disorders including 
diabetes.The following article looks at the ethical, social and legal consequences of genomic medicine and predictive genetic testing for diabetes.This is currently a field in its nascent stage and developing rapidly all over the world.The various ethical facets of genomic medicine in diabetes like its effects on patient physician relationship, risk communication, genetic counseling and familial factors are explored and elucidated from a clinical, ethical, social and legal perspective.Genomics has contributed to a better understanding of many disorders including diabetes.The following article looks at the ethical, social and legal consequences of genomic medicine and predictive genetic testing for diabetes.This is currently a field in its nascent stage and developing rapidly all over the world.The various ethical facets of genomic medicine in diabetes like its effects on patient physician relationship, risk communication, genetic counseling and familial factors are explored and elucidated from a clinical, ethical, social and legal perspective.", + "In conclusion, genome-wide studies have added valuable scientific data to our repertoire of diabetes knowledge.However, there have been few genomic nuggets that enable a more robust prediction of diabetes than is achieved by using common environmental risk factors and none that clarify the peculiar ethnic proclivities of type 2 diabetes.The latter realization ought to temper enthusiasm for the indiscriminate use of genetic testing for diabetes.", + "Genomics of T2DDiet, lifestyle, environment, and even genetic variation influence an individual's response to disease therapy.Like GWAS which identify genetic variants conferring risk for a disease, studies have been carried out for identifying genetic variants responsible for patient differences in drug response.Pharmacogenomics in diabetes focuses on the study of gene polymorphisms which influence an individual's response to antidiabetic drugs.Such genetic variants 
influence the pharmacodynamics and/or pharmacokinetics of the drug, thus affecting its efficacy or toxicity in an individual.The difference in response to treatments and therapies across individuals on account of these factors strengthens the case for personalized medicine in diabetes.", + "It is possible that there are genes that because of their known metabolic involvement are likely to interact with specific nutrients.For example, SLC30A8 which encodes a zinc transporter localized in secretory granules, interacted with dietary zinc to effect fasting insulin levels [132].However, the majority of GWAS variants have not shown interaction with environmental factors for effect on diabetes or related traits.Therefore, it is likely that prospective future studies will utilize improved assessment methods to increase power and avoid false interpretation [133,134].This could be enhanced by prioritizing variants that are most likely to have effects [135] or selective sampling according to extremes of the environmental factor could reduce the requirement for sample size [136].These and other strategies such as meta-analysis, nested case control and genotype-based studies have been recently reviewed [123,133] and the difficulties in measuring environmental exposures have been emphasized, including the application of analyses based on logistic regression [124] and problems with instruments such as physical activity questionnaires [137].Validated food frequency questionnaires are popular instruments for evaluation diabetes risk and are often used in conjunction with food analysis software [138,139].Similar methodology has been adapted to assess two predominant food consumption patterns by Prudent and Western [140], and demonstrated synergistic interaction with genotype and a less healthy Western dietary pattern in determining male risk for T2D by showing that the gene-diet interaction was higher in men with a high genetic risk score determined by a gene counting method 
[141].Also the effects of diet may predominate at specific developmental periods [142] suggesting that age and associated physiological changes are important as well as differences between genders.It has also been observed that homogeneity of an environmental factor such as physical activity in an Asian Indian study, may reduce ability to detect interaction, but could be solved by subgrouping by the level of activity [143], but increased recruitment would be needed to maintain power.", + "To date, studies of diabetes have played a major role in shaping thinking about the genetic analysis of complex diseases.Based on trends in genomic information and technology, combined with the growing public health importance of diabetes, diabetes will likely continue to be an important arena in which methods will be pioneered and lessons learned.It is with great enthusiasm that we look forward to this effort, and with avid curiosity we await to see whether the lessons of today will be supported by the data of tomorrow.", + "In recent years tremendous changes had occurred in the field of molecular genetics and personalized medicine especially on exploring novel genetic factors associated with complex diseases like T2D with the advancement of new and improved genetic techniques including the next generation sequencing (NGS).In this review, we summarize recent developments from studies on the genetic factors associated with the development of T2D in the Arab world published between 2015 and 2018, which were based on the latest available genetic technologies.Few such studies have been conducted in this region of the world.Therefore, our study will provide valuable contributions to advanced genetic research and a personalized approach to diabetes management.", + "Diabetes mellitus (DM) is considered a global pandemic, and the incidence of DM continues to grow worldwide.Nutrients and dietary patterns are central issues in the prevention, development and treatment of this disease.The 
pathogenesis of DM is not completely understood, but nutrient-gene interactions at different levels, genetic predisposition and dietary factors appear to be involved.Nutritional genomics studies generally focus on dietary patterns according to genetic variations, the role of gene-nutrient interactions, genediet-phenotype interactions and epigenetic modifications caused by nutrients; these studies will facilitate an understanding of the early molecular events that occur in DM and will contribute to the identification of better biomarkers and diagnostics tools.In particular, this approach will help to develop tailored diets that maximize the use of nutrients and other functional ingredients present in food, which will aid in the prevention and delay of DM and its complications.This review discusses the current state of nutrigenetics, nutrigenomics and epigenomics research on DM.Here, we provide an overview of the role of gene variants and nutrient interactions, the importance of nutrients and dietary patterns on gene expression,Diabetes mellitus (DM) is considered a global pandemic, and the incidence of DM continues to grow worldwide.Nutrients and dietary patterns are central issues in the prevention, development and treatment of this disease.The pathogenesis of DM is not completely understood, but nutrient-gene interactions at different levels, genetic predisposition and dietary factors appear to be involved.Nutritional genomics studies generally focus on dietary patterns according to genetic variations, the role of gene-nutrient interactions, genediet-phenotype interactions and epigenetic modifications caused by nutrients; these studies will facilitate an understanding of the early molecular events that occur in DM and will contribute to the identification of better biomarkers and diagnostics tools.In particular, this approach will help to develop tailored diets that maximize the use of nutrients and other functional ingredients present in food, which will aid in 
the prevention and delay of DM and its complications.This review discusses the current state of nutrigenetics, nutrigenomics and epigenomics research on DM.Here, we provide an overview of the role of gene variants and nutrient interactions, the importance of nutrients and dietary patterns on gene expression,It is important to promote greater research in this field because these findings will provide a framework for the development of genotype-dependent food health promotion strategies and the design of dietetic approaches for the prevention and management of DM.This knowledge has begun to provide evidence where specific targeted nutritional advice, such as following a Mediterranean Diet, helps to decrease cardiovascular risk factors and stroke incidence in people with polymorphisms strongly associated with T2DM [8].The aim of the present review was to provide insights regarding the role of nutrient-gene interactions in DM pathogenesis, prevention and treatment.In addition, we explored how an individual's genetic makeup can affect nutrient metabolism and the response to nutrient intake, potentially leading to DM.Nutrient-or dietary pattern-gene interactions in the development of DM.", + "A new generation of genetic studies of diabetes is underway.Following from initial genome-wide association (GWA) studies, more recent approaches have used genotyping arrays of more densely spaced markers, imputation of ungenotyped variants based on improved reference haplotype panels, and sequencing of protein-coding exomes and whole genomes.Experimental and statistical advances make possible the identification of novel variants and loci contributing to trait variation and disease risk.Integration of sequence variants with functional analysis is critical to interpreting the consequences of identified variants.We briefly review these methods and technologies and describe how they will continue to expand our understanding of the genetic risk factors and underlying biology of 
diabetes.A new generation of genetic studies of diabetes is underway.Following from initial genome-wide association (GWA) studies, more recent approaches have used genotyping arrays of more densely spaced markers, imputation of ungenotyped variants based on improved reference haplotype panels, and sequencing of protein-coding exomes and whole genomes.Experimental and statistical advances make possible the identification of novel variants and loci contributing to trait variation and disease risk.Integration of sequence variants with functional analysis is critical to interpreting the consequences of identified variants.We briefly review these methods and technologies and describe how they will continue to expand our understanding of the genetic risk factors and underlying biology of diabetes.", + "In a nutshell, genomic and post-genomic approaches identified a large number of biomarkers to ponder over and explore further but we are yet to identify universally accepted biomarker which can be used for the successful management and prevention of type 2 diabetes.In order to understand environment related modifications of genetic susceptibility, it may be prudent to conduct studies with integrated genomic-metabolomic approach.It is also imperative to gather existing molecular genetic data and curate it into uniform format and analyze the same for understanding the present status of research.A few attempts were, however, made to develop type 2 diabetes informative databases.While the databases T2DGADB and T2D-DB are only a collection of publications related to type 2 diabetes genetic association studies, proteinprotein interactions and expression studies, T2D@ZJU is a comprehensive collection of pathway databases, protein-protein interaction databases, and literature (Yang et al. 
2013).Further, T2D@ZJU is a user-friendly interface database that provides graphical output of information organized in networks.These attempts may provide basis for studying type 2 diabetes utilizing systems biology, which is a better approach for understanding complex genetic diseases.In this review, we briefly outlined salient features of pathophysiology and results of the genetic association studies hitherto conducted on type 2 diabetes.Primarily focusing on the current status of genomic research, we briefly discussed the limited progress made during the post-genomic era and tried to identify the limitations of the post-genomic research strategies.We suggested reanalysis of the existing genomic data through advanced statistical and computational methods and recommended integrated genomics-metabolomics approaches for future studies to facilitate understanding of the gene-environment interactions in the manifestation of the disease.We also propose a framework for research that may be apt for determining the effects of urbanization and changing lifestyles in the manifestation of complex genetic disorders like type 2 diabetes in the Indian populations and offset the confounding effects of both genetic and environmental factors in the natural way.In this review, we briefly outlined salient features of pathophysiology and results of the genetic association studies hitherto conducted on type 2 diabetes.Primarily focusing on the current status of genomic research, we briefly discussed the limited progress made during the post-genomic era and tried to identify the limitations of the post-genomic research strategies.We suggested reanalysis of the existing genomic data through advanced statistical and computational methods and recommended integrated genomics-metabolomics approaches for future studies to facilitate understanding of the gene-environment interactions in the manifestation of the disease.We also propose a framework for research that may be apt for determining 
the effects of urbanization and changing lifestyles in the manifestation of complex genetic disorders like type 2 diabetes in the Indian populations and offset the confounding effects of both genetic and environmental factors in the natural way.", + "The public health genomics approach to type 2 diabetes.So, while exciting gene discoveries are being made, what can we do?The answer may lie in the relatively new field of public health genomics, \"a multidisciplinary field concerned with the effective and responsible translation of genome-based knowledge and technologies to improve population health\" (12).Researchers, policymakers, and practitioners in public health genomics use population-based data on genetic variation and gene-environment interactions to develop, implement, and evaluate evidence-based tools for improving health and preventing disease.They also apply systematic evidence-based knowledge synthesis and appraisal of the clinical validity and utility of genomic applications in health practice.Validated genomic information is then integrated into disease control and prevention programs (13).", + "Genomics for Type 2 DiabetesMany research studies have been carried out on genetic-based illness prediction.Incorporating machine learning approaches with genetic-based illness prediction could result in an accurate outcome.This has intensified the role of Artificial Intelligence (AI) in healthcare.It has been estimated that approximately $36 billion will be invested in AI by 2025 [48].Deep genomics through machine learning approaches has outperformed accuracy in predicting and diagnosing illnesses such as cancer with minimal inclusion of radiologists.It is desired to have sufficient biological knowledge to understand how genetics can help us predict various conditions and analyze each chromosome to identify the disease-causing gene.Pre-existing research studies have focused on genomics and gene interaction patterns of various persistent illnesses such as 
Alzheimer's, multiple cancers, and Parkinson's." + ], + [ + "Researchers are expanding our understanding of genetic risk factors for diabetes through ongoing discoveries.Genetic variants associated with increased susceptibility to type 2 diabetes, a disease that affects more than 200 million people worldwide, have been identified (NHGRI & NIDDK, 2007).Such discoveries accelerate efforts to understand genetic contributions to chronic illness, as well as facilitate greater investigation of how these genetic factors interact with each other and with lifestyle factors.Ultimately, once the association of these variants with diabetes are confirmed, genetic tests may be utilized to identify (even before escalating blood sugars) those individuals, like Vanessa, who may be able to delay or prevent diabetes with healthy lifestyle decisions and behaviors.Information to assist nurses in this challenge is available in a toolkit \"Your Game Plan for Preventing Type 2 Diabetes\" (Your Game Plan, n.d.).Would you have known whether or not genetic testing was available for Vanessa?If you had said no to this question but could have explained the progress currently being made in understanding diabetes, Vanessa would have had access to the best care possible today.", + "Genetic factors for many decades have been known to play a critical role in the etiology of diabetes, but it has been only recently that the specific genes have been identified.The identification of the underlying molecular genetics opens the possibility for understanding the genetic architecture of clinically defined categories of diabetes, new biological insights, new clinical insights, and new clinical applications.This article examines the new insights that have arisen from defining the etiological genes in monogenic diabetes and the predisposing polymorphisms in type 2 diabetes.", + "Genomics has contributed to a better understanding of many disorders including diabetes.The following article looks at the ethical, 
social and legal consequences of genomic medicine and predictive genetic testing for diabetes.This is currently a field in its nascent stage and developing rapidly all over the world.The various ethical facets of genomic medicine in diabetes like its effects on patient physician relationship, risk communication, genetic counseling and familial factors are explored and elucidated from a clinical, ethical, social and legal perspective.Genomics has contributed to a better understanding of many disorders including diabetes.The following article looks at the ethical, social and legal consequences of genomic medicine and predictive genetic testing for diabetes.This is currently a field in its nascent stage and developing rapidly all over the world.The various ethical facets of genomic medicine in diabetes like its effects on patient physician relationship, risk communication, genetic counseling and familial factors are explored and elucidated from a clinical, ethical, social and legal perspective.", + "Diabetes is a genetically complex multifactorial disease that requires sophisticated consideration of multigenic and phenotypic influences.As well as standard nonparametric methods, we used novel approaches to evaluate and identify locus heterogeneity.It has also proved productive to consider phenotypes such as age at type 2 diabetes onset and obesity, which may define a more homogeneous subgroup of families.A genome-wide scan of 247 African-American families has identified a locus on chromosome 6q and a region of 7p that apparently interacts with early-onset type 2 diabetes and low BMI, as target regions in the search for African-American type 2 diabetes susceptibility genes.", + "In conclusion, genome-wide studies have added valuable scientific data to our repertoire of diabetes knowledge.However, there have been few genomic nuggets that enable a more robust prediction of diabetes than is achieved by using common environmental risk factors and none that clarify the 
peculiar ethnic proclivities of type 2 diabetes.The latter realization ought to temper enthusiasm for the indiscriminate use of genetic testing for diabetes.", + "To date, studies of diabetes have played a major role in shaping thinking about the genetic analysis of complex diseases.Based on trends in genomic information and technology, combined with the growing public health importance of diabetes, diabetes will likely continue to be an important arena in which methods will be pioneered and lessons learned.It is with great enthusiasm that we look forward to this effort, and with avid curiosity we await to see whether the lessons of today will be supported by the data of tomorrow.", + "In recent years tremendous changes had occurred in the field of molecular genetics and personalized medicine especially on exploring novel genetic factors associated with complex diseases like T2D with the advancement of new and improved genetic techniques including the next generation sequencing (NGS).In this review, we summarize recent developments from studies on the genetic factors associated with the development of T2D in the Arab world published between 2015 and 2018, which were based on the latest available genetic technologies.Few such studies have been conducted in this region of the world.Therefore, our study will provide valuable contributions to advanced genetic research and a personalized approach to diabetes management.", + "Nonetheless, \"evidence\" for the genetics of diabetes risk is mounting, often at the expense of understanding the social context and determinants of the disease.Biogenetic views tend to trump sociological views in the diabetes research imaginary of consortium members.However, the genetic epidemiologists who make up part of the diabetes consortium are not ignorant of the effects of proper diet and adequate exercise. 
\"Take away the television and the automobile and diabetes would all but disappear,\" quipped the head of one lab.Neither are researchers unsympathetic to those who suffer from social inequality in the United States.Their career and intellectual interests lie in genetic explanations of diabetes, which, as I aim to show in this discussion, involves folding political and economic social relationships into biomedical discourse.In fact, the case of diabetes genetic epidemiology illustrates how, in spite of the sympathies of diabetes scientists, arrangements of racial inequality in the United States find their way into diabetes research publications and drug company promotional campaigns.To illustrate this phenomenon further, I present two tales from the field, one dealing with the naming of a publication article, the other with the marketing of a diabetes drug.", + "DiscussionOur study provides insight into the relative importance of clinical risk factors and those that are related to a panel of DNA variants associated with type 2 diabetes.Obesity was a strong risk factor for future diabetes, a risk that almost doubled in subjects with a family history of diabetes.However, the addition of data from genotyping of the known DNA variants to clinical risk factors (including a family history of diabetes) had a minimal, albeit statistically significant, effect on the prediction of future type 2 diabetes.Notably, the ability of genetic risk factors to predict future type 2 diabetes improved with an increasing duration of follow-up, suggesting that assessment of genetic risk factors is clinically more meaningful the earlier in life they are measured.", + "Diabetes mellitus (DM) is considered a global pandemic, and the incidence of DM continues to grow worldwide.Nutrients and dietary patterns are central issues in the prevention, development and treatment of this disease.The pathogenesis of DM is not completely understood, but nutrient-gene interactions at different levels, 
genetic predisposition and dietary factors appear to be involved.Nutritional genomics studies generally focus on dietary patterns according to genetic variations, the role of gene-nutrient interactions, genediet-phenotype interactions and epigenetic modifications caused by nutrients; these studies will facilitate an understanding of the early molecular events that occur in DM and will contribute to the identification of better biomarkers and diagnostics tools.In particular, this approach will help to develop tailored diets that maximize the use of nutrients and other functional ingredients present in food, which will aid in the prevention and delay of DM and its complications.This review discusses the current state of nutrigenetics, nutrigenomics and epigenomics research on DM.Here, we provide an overview of the role of gene variants and nutrient interactions, the importance of nutrients and dietary patterns on gene expression,Diabetes mellitus (DM) is considered a global pandemic, and the incidence of DM continues to grow worldwide.Nutrients and dietary patterns are central issues in the prevention, development and treatment of this disease.The pathogenesis of DM is not completely understood, but nutrient-gene interactions at different levels, genetic predisposition and dietary factors appear to be involved.Nutritional genomics studies generally focus on dietary patterns according to genetic variations, the role of gene-nutrient interactions, genediet-phenotype interactions and epigenetic modifications caused by nutrients; these studies will facilitate an understanding of the early molecular events that occur in DM and will contribute to the identification of better biomarkers and diagnostics tools.In particular, this approach will help to develop tailored diets that maximize the use of nutrients and other functional ingredients present in food, which will aid in the prevention and delay of DM and its complications.This review discusses the current state of 
nutrigenetics, nutrigenomics and epigenomics research on DM.Here, we provide an overview of the role of gene variants and nutrient interactions, the importance of nutrients and dietary patterns on gene expression,The aim of the present review was to provide insights regarding the role of nutrient-gene interactions in DM pathogenesis, prevention and treatment.In addition, we explored how an individual's genetic makeup can affect nutrient metabolism and the response to nutrient intake, potentially leading to DM.It is important to promote greater research in this field because these findings will provide a framework for the development of genotype-dependent food health promotion strategies and the design of dietetic approaches for the prevention and management of DM.This knowledge has begun to provide evidence where specific targeted nutritional advice, such as following a Mediterranean Diet, helps to decrease cardiovascular risk factors and stroke incidence in people with polymorphisms strongly associated with T2DM [8].", + "A new generation of genetic studies of diabetes is underway.Following from initial genome-wide association (GWA) studies, more recent approaches have used genotyping arrays of more densely spaced markers, imputation of ungenotyped variants based on improved reference haplotype panels, and sequencing of protein-coding exomes and whole genomes.Experimental and statistical advances make possible the identification of novel variants and loci contributing to trait variation and disease risk.Integration of sequence variants with functional analysis is critical to interpreting the consequences of identified variants.We briefly review these methods and technologies and describe how they will continue to expand our understanding of the genetic risk factors and underlying biology of diabetes.A new generation of genetic studies of diabetes is underway.Following from initial genome-wide association (GWA) studies, more recent approaches have used genotyping 
arrays of more densely spaced markers, imputation of ungenotyped variants based on improved reference haplotype panels, and sequencing of protein-coding exomes and whole genomes.Experimental and statistical advances make possible the identification of novel variants and loci contributing to trait variation and disease risk.Integration of sequence variants with functional analysis is critical to interpreting the consequences of identified variants.We briefly review these methods and technologies and describe how they will continue to expand our understanding of the genetic risk factors and underlying biology of diabetes.", + "In this review, we briefly outlined salient features of pathophysiology and results of the genetic association studies hitherto conducted on type 2 diabetes.Primarily focusing on the current status of genomic research, we briefly discussed the limited progress made during the post-genomic era and tried to identify the limitations of the post-genomic research strategies.We suggested reanalysis of the existing genomic data through advanced statistical and computational methods and recommended integrated genomics-metabolomics approaches for future studies to facilitate understanding of the gene-environment interactions in the manifestation of the disease.We also propose a framework for research that may be apt for determining the effects of urbanization and changing lifestyles in the manifestation of complex genetic disorders like type 2 diabetes in the Indian populations and offset the confounding effects of both genetic and environmental factors in the natural way.In this review, we briefly outlined salient features of pathophysiology and results of the genetic association studies hitherto conducted on type 2 diabetes.Primarily focusing on the current status of genomic research, we briefly discussed the limited progress made during the post-genomic era and tried to identify the limitations of the post-genomic research strategies.We suggested 
reanalysis of the existing genomic data through advanced statistical and computational methods and recommended integrated genomics-metabolomics approaches for future studies to facilitate understanding of the gene-environment interactions in the manifestation of the disease.We also propose a framework for research that may be apt for determining the effects of urbanization and changing lifestyles in the manifestation of complex genetic disorders like type 2 diabetes in the Indian populations and offset the confounding effects of both genetic and environmental factors in the natural way.In a nutshell, genomic and post-genomic approaches identified a large number of biomarkers to ponder over and explore further but we are yet to identify universally accepted biomarker which can be used for the successful management and prevention of type 2 diabetes.In order to understand environment related modifications of genetic susceptibility, it may be prudent to conduct studies with integrated genomic-metabolomic approach.It is also imperative to gather existing molecular genetic data and curate it into uniform format and analyze the same for understanding the present status of research.A few attempts were, however, made to develop type 2 diabetes informative databases.While the databases T2DGADB and T2D-DB are only a collection of publications related to type 2 diabetes genetic association studies, proteinprotein interactions and expression studies, T2D@ZJU is a comprehensive collection of pathway databases, protein-protein interaction databases, and literature (Yang et al. 
2013).Further, T2D@ZJU is a user-friendly interface database that provides graphical output of information organized in networks.These attempts may provide basis for studying type 2 diabetes utilizing systems biology, which is a better approach for understanding complex genetic diseases.", + "Genetic factors appear to play a role in determining an individual's risk of developing diabetes.It is hoped that genetic studies will ultimately identify key genetic elements that help determine susceptibility to diabetes, disease progression, and responsiveness to specific therapies, as well as help identify novel targets for future intervention.A substantial number of genetic loci, gene polymorphisms, and mutations have already been reported as having variable degrees of association with one or other type of diabetes (type 1, type 2, maturity onset diabetes of the young [MODY]), while others appear to be involved in response to antihyperglycemic agents.We have compiled the following glossary of genetic and genomic terms relating to diabetes, which we hope will prove a useful reference to researchers and clinicians with an interest in this disease.This is by no means an exhaustive list, but includes many of the genetic loci and variants that have been studied in association with diabetes.Gene encoding insulin-like growth factor 2 mRNA binding protein 2 (also known as IMP-2).SNPs in the gene have been associated with type 2 diabetes IFIH1" + ], + [ + "In 2006, a large-scale association study identified TCF7L2 as an important genetic factor for T2D in Icelandic individuals [10].This discovery was a significant breakthrough as this association was then widely confirmed in populations of European origin and other ethnic groups, such as Japanese and American individuals [50][51][52][53][54][55][56][57].Therefore, TCF7L2 was regarded as the most significant T2D susceptibility gene identified to date.3.1.Impact of TCF7L2 on the Risk of T2D.TCF7L2 is the most intensively studied 
locus for T2D risk so far.The risk alleles of TCF7L2 were associated with enhanced expression of this gene in human islets as well as impaired insulin secretion both in vitro and in vivo.The authors also observed an impaired incretin effect in subjects carrying risk alleles of TCF7L2 and proposed the engagement of the enteroinsular axis in T2D [119].Dennis and colleagues then verified this result and indicated that TCF7L2 variant rs7903146 affected risk of T2D, at least in part, through modifying the effect of incretins on insulin secretion.This was not due to reduced secretion of glucose-dependent insulinotropic polypeptide (GIP) and glucagon-like peptide 1 (GLP-1), which exhibit an important physiological role in boosting insulin secretion following meals, but rather due to the effect of TCF7L2 on the sensitivity of -cells to incretins [120].TCF7L2 has also been linked to altered pancreatic islet morphology as exemplified by increased individual islet size and altered alpha and beta cell ratio/distribution within human islets [121].This phenomenon is also observed in other in vivo or in vitro studies [122][123][124].This further strengthened the evidence for the role of TCF7L2-associated alteration of cell types in islets in the pathogenesis of T2D.TCF7L2 encodes the transcription factor TCF4 which is related to Wnt signaling pathway and which plays a critical role in the pathogenesis of T2D.The major effector of the canonical Wnt signaling pathway is known as catenin/TCF.This bipartite transcription factor is formed by free -catenin (-cat) and a member of the TCF protein family, including TCF7L2 (previously known as TCF-4) [125].GWAS have revealed the involvement of a Wnt ligand (Wnt-5b), Wnt coreceptor (LRP-5), and the Wnt pathway effector TCF7L2 in the development of diabetes [126].Several previous studies also provide evidence that the -catenin/TCF axis participates in pancreatic cell proliferation and differentiation [127][128][129][130][131]. 
Treatment of -cells with purified Wnt protein or activated -catenin augmented the proliferation of these cells [132].Intriguingly, deletion of -catenin within the pancreatic epithelium resulted in an almost complete lack of acinar cells, whereas deletion of -catenin specifically in differentiated acinar cells had no such effect [128], suggesting that the TCF7L2-related Wnt signaling mainly perturbs pancreatic growth but not pancreatic function.However, deletion of islet TCF7L2 expression from -cells did not show any demonstrable effects on glucose-stimulated insulin secretion (GSIS) in adult mice, whereas manipulating TCF7L2 levels in the liver caused hypoglycemia and reduced hepatic glucose production [133].In concordance with these results, risk alleles in TCF7L2 were associated with hepatic but not peripheral insulin resistance and enhanced rate of hepatic glucose production in human [119].Therefore, TCF7L2-related disruption of -cell function is probably the indirect consequence of primary events in liver or other organs/systems.", + "Variant of transcription factor 7like 2(TCF7L2) gene confers risk of type 2 diabetes. Nat. Genet. 38: 320323. doi: 10.1038/ng1732GuhaThakurta D., Xie T., Anand M., Edwards S.W. , Li G., WangS.S. & Schadt E.E. 2006. Cis-regulatory variations: A study ofSNPs around genes showing cis-linkage in segregating mousepopulations. BMC Genomics 7: 235. doi: 10.1186/1471-21647-235Gunter C. 2008. Quantitative genetics. Nature 456: 719. doi:10.1038/456719aHaines J.L. , Hauser M.A. , Schnidt S., Scott W.K. , OlsonL.M. , Gallins P., Spencer K.L. , Kwan S.Y. 
, Noureddine M.,Gilbert J.R., Schnetz-Boutaud N., Agarwal A., Postel E.A.", + "One obvious locus to consider is TCF7L2 in the context of type 2 diabetes.Common genetic variation located within the gene encoding transcription factor 7 like 2 (TCF7L2) has been consistently reported to be strongly associated with the disease.Such reports range from 2006, when we first published the association [3], to the recent transethnic meta-analysis GWAS of type 2 diabetes [4].Our data also lead us to conclude that TCF7L2 could also play a role in the pathogenesis of type 2 diabetes.Note that although TCF7L2 is known to have multiple isoforms, our expression data revealed no significant differences in these splice variants (ESM Table 6).", + "In conclusion, our study confirms the involvement of TCF7L2 gene in the T2DM susceptibility.Moreover, as shown also by the logistic regression analysis results, we describe a significant contribution of the TCF7L2 genetic variability to the emerging diabetic complications such as retinopathy and CAN.DiscussionThis study examined the relationships between genetic variants of TCF7L2 gene and T2DM in an Italian population.Although the disease progression results from an interplay of environmental factors and genetic predisposition, in recent years TCF7L2 gene has been considered the strongest genetic determinant for the risk of developing T2DM [2-4, 19, 20].The gene encodes a transcription factor of the canonical Wnt signaling pathway, expressed in several tissues, known to have developmental roles in determining cell fate, survival, proliferation and movement [9].Wnt signaling plays an important role also in B-cell proliferation and insulin secretion and influences synthesis of glucagon-like peptide 1 (GLP-1) in intestinal L-cells [21].In our study, besides the confirmation of the role of TCF7L2 gene in the susceptibility to T2DM, we investigated whether variants of this gene could also be associated with diabetic complications in our diabetic 
population.", + "Recently, two moderately linked intronic SNPs (rs7903146 and rs12255372; r 2 0.7) in the confirmed diabetes risk gene TCF7L2 [transcription factor 7-like 2 (T-cell-specific, HMG-box); OMIM entry no.602228] were shown to affect GLP-1 responsiveness of -cells, as evidenced by a hyperglycemic clamp combined with GLP-1 infusion (199).This was confirmed by comparison of the effect of the representative SNP rs7903146 on insulin secretion upon an oral vs. an iv glucose load (200).Plasma GLP-1 levels were not different between the genotypes (199,200).TCF7L2 encodes a component of the bipartite transcription factor complex -catenin/transcription factor 7-like 2 that is involved in the Wnt signaling pathway (236).Using knockdown by RNA interference and overexpression by transfection, it was demonstrated, in human and murine islets, that TCF7L2 is required for -cell survival and -cell proliferation as well as for glucose-and incretin-stimulated insulin secretion (237).Furthermore, expression of the insulin gene was found to strongly correlate with TCF7L2 expression (200) and was decreased after TCF7L2 knockdown, suggesting that the insulin gene represents a direct target gene of transcription factor 7-like 2 (238).Importantly, novel results of Maedler's group (239) revealed that the expression of GLP-1 and GIP receptors in human islets likewise depends on the presence of transcription factor 7-like 2 providing a plausible explanation for this gene's involvement in incretin responsiveness of -cells.", + "In studies where overt T2D has been the phenotype the majority of associated polymorphisms have encoded proteins known to be involved in -cell metabolism; for example TCF7L2, KCNJ11 and HHEX have shown robust association [170,171].This suggests that these genes could prove useful in predicting -cell preservation during the course of T2D.The glucokinase gene (GCK) coding for the initial glucose-sensing step in the -cell can have activating mutations causing 
hypoglycemia that might provide structural and functional models leading to drug targets for treating T2D [172].In the GoDARTs study, investigators examined the medication response of metformin and sulphonylurea based on the TCF7L2 variants mainly affecting the -cell.The carriers of the at risk 'T' allele responded less well to sulphonylurea therapy than metformin [173].Also it is of significant public health interest that in the Diabetes Prevention Program, lifestyle modifications were shown to reduce the risk of diabetes conferred by risk variants of TCF7L2 at rs7093146, and in placebo participants who carried the homozygous risk genotype (TT), there was 80% higher risk for developing diabetes compared to the lifestyle intervention group carrying the same risk genotypes [35].These findings could herald significant future progress in the field of T2D pharmacogenomics, possibly leading to the development and use of agents tailored on the basis of genotype.", + "The first moves towards large-scale association mappingThe earliest indication that the 'hypothesis-free' association approach to gene identification might succeed for T2D came from the discovery that variants within the transcription factor 7-like 2 (TCF7L2) gene had a substantial effect on T2D susceptibility [15].TCF7L2 encodes a transcription factor that is active in the Wnt-signalling pathway and that had no 'track-record' as a candidate for T2D; indeed, this susceptibility effect was detected through a search for microsatellite associations across a large region of chromosome 10 that had been previously implicated in T2D susceptibility by linkage [16].Subsequent fine-mapping efforts localized the likely causal variant(s) to an intron within TCF7L2 [15,17].The fact that this signal was found within a region of apparent T2D linkage seems to have been serendipitous, because none of these variants within TCF7L2 are capable of explaining the linkage effect [15,17].Across a swathe of replication studies 
[3][4][5][6][7]18], it has become clear that TCF7L2 variants have a substantially stronger effect on T2D risk than those in PPARG and KCNJ11, with a per-allele odds ratio of $1.4 (Table 1; Figure 2).As a result, the 10% of Europeans that are homozygous for the risk allele have approximately twice the odds of developing T2D as those carrying no copies [15,18].The evidence implicating variants within TCF7L2 in T2D susceptibility has naturally prompted efforts to understand the mechanisms involved.Current evidence indicates that alteration of TCF7L2 expression or function disrupts pancreatic islet function, possibly through dysregulation of proglucagon gene expression, LGR5, leucine-rich repeat-containing G-protein coupled; NOTCH2, Notch homologue 2 (Drosophila); PPARG, peroxisome proliferator-activated receptor gamma; SLC30A8, solute carrier family 30 (zinc transporter), member 8; TCF7L2, transcription factor 7 like 2; THADA, thyroid adenoma associated; TSPAN8, tetraspanin 8; WFS1, Wolfram syndrome1.b Estimates of effect size (given as per-allele odds ratios, i.e. the increase in odds of diabetes per copy of the risk allele) and risk-allele frequencies are all reported for Europeandescent populations based on available data (Figure 2).", + "The genetic association between T2D and variants in transcription factor 7-like 2 (TCF7L2) was first discovered in a 2).It is interesting that the T allele of rs7903146 increases T2D risk while decreasing BMI, opposing the idea that increased BMI leads to insulin resistance and T2D.In comparison to FTO and MC4R variants, TCF7L2 variants have a much larger effect on T2D risk and a smaller effect on BMI, which might indicate that the TCF7L2 variants act via T2D to affect BMI (Fig. 
2).TCF7L2 is a transcription factor functioning in WNT signaling, which is crucial for cell proliferation, motility, normal embryogenesis, and regulation of myogenesis and adipogenesis (reviewed in [96]).Although the causal variant is still unclear, the T2D risk allele appears to act via lowering the levels of insulin secretion and influencing beta-cell function (reviewed in [51,96,97]).", + "To date, more than 70 genes have been identified as involved in T2DM, primarily by association analysis [34].In addition, via GWAS arrays, more than 100 SNPs have been identified for T2DM [35].From the 50 novel loci associated with T2DM previously identified, more than 40 loci have been associated with T2DM-related traits, including fasting proinsulin, insulin and glucose (Table 1) [36][37][38][39].However, for T2DM-related traits, such as the HOMA index or pancreatic cell function, there are virtually no published data examining the relationship between these traits or the genotype and environment interactions.Clinical investigations of some loci have suggested that the genetic components of T2DM risk act preferentially through cell function [40].Among all 40 loci associated with T2DM-related traits, only transcription factor-7-like 2 (TCF7L2) was shown to clearly contribute to T2DM risk [41].Several studies in white European [42], Indian [43], Japanese [44], Mexican American [45] and West African [46] individuals have shown a strong association between TCF7L2 and T2DM.It is also noteworthy that these populations represent the major racial groups with a high prevalence of T2DM.In all populations, TCF7L2 showed a strong association, with the odds of developing T2DM increased by 30%-50% for each allele inherited.This finding indicates an approximately double odds ratio compared to most other diabetes susceptibility polymorphisms.TCF7L2 is a transcription factor involved in the Wnt signaling pathway that is ubiquitously expressed, and it has been observed that TCF7L2 risk 
alleles result in the overexpression of TCF7L2 in pancreatic cells.This overexpression causes reduced nutrient-induced insulin secretion, which results in a direct predisposition to T2DM as well as an indirect predisposition via an increase in hepatic glucose production [47].", + "From the first GWA study of T2D, published recently in Nature [141], the strongest association observed was with a gene that was already established as having a role in the disease, namely the Wnt-signaling pathway member, transcription factor 7-like 2 (TCF7L2) [142], which has already been extensively independently replicated [143][144][145][146][147][148][149][150][151][152].This association has now been refined utilizing a West African patient cohort [153]; this is due to the fact that, in this cohort, the associated SNP is contained in a smaller LD block due to higher haplotype diversity in populations of African ancestry and thus the region most likely to contain the functional variant was narrowed down.The precise mechanism of action for this variant and its influence on the susceptibility to T2D is still to be elucidated; but it is speculated that it could operate through the alteration of levels of the insulinotropic hormone, GLP-1, one of the peptides encoded by the proglucagon gene whose expression in enteroendocrine cells is transcriptionally regulated by TCF7L2 [118].In tandem with insulin, GLP-1 has a strong influence on blood glucose homeostasis [118].Indeed, GLP-1 analogs and inhibitors of dipeptidyl peptidase IV are currently in clinical development.It has been noted that individuals with both impaired glucose tolerance and the at-risk TCF7L2 variant are more likely to go on to develop T2D, with the effect reported to be stronger in a placebo group than in metformin and lifestyle-intervention groups [143].The variant is also associated with decreased insulin secretion, but not increased insulin resistance at baseline [143].The risk-conferring genotypes in TCF7L2 are thus 
associated with impaired -cell function, but not with insulin resistance and may, therefore, give some indication on optimal therapeutic intervention for the one in five T2D cases this variant impacts.", + "TCF7L2Transcription factor 7-like 2 was first implicated when a signal associated with Type 2 diabetes on chromosome 10q was shown in Icelandic populations to host a microsatellite DG10748, containing single nucleotide polymorphisms rs7903146 and rs12255372 in intron 3 of the TCF7L2 gene [20], associated with a ~45% increase in Type 2 diabetes risk per allele.As such, the TCF7L2 locus presently represents the strongest known genetic determinant of Type 2 diabetes.Risk allele carriers show impaired insulin production [21] and b-cell dysfunction in vitro [22].", + "Among all the loci, TCF7L2 so far has shown the strongest association with the largest effect size for type 2 diabetes in Europeans (5,(7)(8)(9)(10)(11)(12), Amish (25), and Indians (22,26,27), but not in Chinese (28) and Japanese (29) subjects.The present study confirms the association of TCF7L2 with type 2 diabetes with the largest effect size.The TCF7L2 gene product has been implicated in blood glucose homeostasis (5,30), and the variant rs7903146 is reported to be associated with measures of glucose metabolism (25).Consistent with these observations, we also found a strong association of TCF7L2 with HOMA-B and a nominal association with FPG and 2-h PPG, confirming the physiological role of TCF7L2 in glucose homeostasis.", + "In summary, we have identified a variant in a previously unknown candidate gene for type 2 diabetes, TCF7L2, within a previously reported linkage region on 10q 1,8 .We have observed association of a composite at-risk allele of microsatellite DG10S478 within intron 3 of the TCF7L2 gene to type 2 diabetes in Iceland, which was subsequently replicated in Denmark and the US with similar frequency and relative risks.These data from three populations constitute strong evidence in 
support of the notion that variants of the TCF7L2 gene contribute to the risk of type 2 diabetes.", + "TCF7L22.1.Background.The gene-encoding Transcription 7 Like-2 (TCF7L2, previously called TCF4) is the most important T2D susceptibility gene identified to date, with genetic variants strongly associated with diabetes in all major racial groups [27][28][29].Signals in this locus are the most consistently identified across various GWAS and are associated with the highest elevation of risk of developing adult-onset T2D.Each copy of the risk T-allele at rs7903146 has an increased odds ratio for T2D of 1.4-1.5 [60].Inheritance of the risk allele is also a useful predictor for the likelihood of conversion from a state of prediabetes to T2D [61,62].Additionally, results from a small number of studies also indicate that TCF7L2 variation may play an important role in cases of early onset T2D [63,64].", + "One of the strongest T2DM risk-association in all the GWAS studies was found for common variants in TCF7L2, a gene coding for a transcription factor that is part of the WNT signaling pathway involved in the regulation of myogenesis and angiogenesis, but also critical for the embryonic development of pancreatic islets [19].Recently, it has been shown that the variant allele results in overexpression of TCF7L2 in pancreatic beta-cells, reducing insulin secretion in response to a variety of stimuli [6,8].The odd ratios (OR, is an estimate of the relative risk, with values [1.0 indicating a positive and \\1.0 a negative association, conferred by each additional risk allele carried at each locus) calculated in the pooled studies for the T allele in the snp7903146 of TCF/L2 was 1.37 (1.31-1.43)[13].This variant resides in an intron of the gene.Other variants at this locus also confer increased risk for T2DM, although the specific genetic defect that results in impaired insulin secretion in carriers has not been identified yet.Alternatively, other genes in the region may 
contribute to T2DM susceptibility.Associations between the T variant of TCF7L2 and T2DM have been consistently confirmed in geographically, ethnically, and environmentally diverse populations (references in [19], without evidence of heterogeneity across ethnic groups [2].", + "The C to T (genomic position: 114748339) substitution at SNP rs7903146 of the intron 3 (IVS3C>T) is associated with T2DM and may function through impaired glucagon-like peptide 1 secretion, which is stimulated more by fat than by carbohydrate ingestion [25,26].TCF7L2 is present on chromosome 10q25, spanning 215.9 kb.It considered the most influential gene in determining the genetic susceptibility for T2DM today [27].TCF7L2 is the key transcriptional factor regulating glucose metabolism through the Wnt signaling pathway and has been reported to be critical for the development of the pancreas and islets during embryonic growth [3].Genetic variants in this gene are associated with increased risk of T2DM in a variety of study populations [28,29].", + "The variants in TCF7L2, MC4R, CDC123, KCNQ1, IGF2BP2, and SLC16A11 have all been previously associated with T2D in adults (20,25).In addition, a prior study in SEARCH reported that genetic variation in TCF7L2 is associated with an increased risk of T2D in African American youth, with the OR for diabetes stronger in African American than in non-Hispanic White youth (12).The divergent ethnicity-based results did not replicate in ProDiGY, suggesting that the earlier findings in SEARCH might be due to statistical fluctuations in the context of smaller sample sizes." + ] + ], + "task_id": [ + "8FFF9DCC307B8DBF2C8485637F2ABEF4", + "C319861B08978CF5F7E6F0CD3A517A81", + "68EF3BE5EC2106766CA9CC700135E2FA", + "8590501C57DC5C321AB5E1036F233027", + "CD1F7EAE0FDC758A8167118927ADFE71" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_1_two.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_1_two.json new file mode 100644 index 0000000..3758538 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_1_two.json @@ -0,0 +1,76 @@ +{ + "question": [ + "Is any of the genes SH2B3, IFIH1 or ERBB3 related to diabetes?", + "How can I use genenetwork to find genes related with diabetes in humans?", + "How can I use the GeneNetwork tool to find genes related with diabetes in humans?" + ], + "answer": [ + "Yes, the gene IFIH1 is identified as a contributor to susceptibility to type 1 diabetes. However, the text does not mention any direct relation of SH2B3 or ERBB3 to diabetes.", + "GeneNetwork can be used to find genes related to diabetes in humans by analyzing Genome-Wide Association Study (GWAS) data. This involves integrating this data with the human gene network, which can boost the performance of recovering validated type 2 diabetes genes. The network can also strongly implicate certain genes in type 2 diabetes. Additionally, the DisGeNET database can be used to collate gene-disease information, which can contribute to understanding the biology of type 2 diabetes. This approach can identify", + "You can use the GeneNetwork tool to find genes related to diabetes in humans by navigating to genenetwork.org and using the global search bar at the top of the page. You can search for genes, mRNAs, or proteins across all of the datasets. Use standard gene symbols containing more than two characters in the name for best results. You can also switch to phenotypes and search for any phenotype of interest. Additionally, you can use the Select and search pull-down menus to choose a population of interest." 
+ ], + "contexts": [ + [ + "Figure 8 Molecular changes in the islets of patients with T2D mirror the processes altered in NOD mice.mRNA expression in human pancreatic islets from healthy individuals (n = 105) and those diagnosed with T2D (n = 14) was assessed through RNA-seq analysis. (a) Relationship between GLIS3 and MANF expression in healthy individuals (Spearman correlation P value = 0.043), individuals with T2D (Spearman correlation P value = 0.075) and all individuals (Spearman correlation P value = 0.028). (b-e) Expression of XRCC4 (b), LIG4 (c), H2AFX (d) and CDKN1A (e) in healthy islets as compared to i slets from patients withT2D (P values shown after multiple-testing correction).The median and interquartile range (IQR; box) are shown, with error bars indicating 1.5 times the IQR.Individual values are shown if beyond 1.5 times the IQR. (f) Relationship between H2AFX and LIG4 expression in human islets (Spearman correlation P value = 5 10 9 ).Parallel transcriptional regulation in human isletsTo determine whether the findings observed in mice were applicable to humans, we investigated whether the pathway identified in NOD mice also demonstrated genetic linkage to diabetes or glucose regulation traits in humans.GLIS3 polymorphisms have previously been associated with altered glucose regulation; we additionally identified nominally significant associations for MANF, XRCC4 and LIG4 polymorphisms (Supplementary Table 2).In an independent approach that takes into account environmental effects, we analyzed RNA-seq data from human pancreatic islets isolated from 119 donors, including 14 diagnosed with T2D 28 .To assess the validity of the Glis3-Manf relationship observed in mice, we investigated the relationship of these two genes in human islets.A trend toward reduced GLIS3 expression was observed in T2D islets, whereas MANF expression appeared unchanged (Supplementary Fig. 
13).Critically, a significant positive relationship was observed between GLIS3 and MANF levels in human islets (Fig. 8a).Next, we investigated whether patients with T2D might exhibit reduced XRCC4 expression, analogous to the NOD polymorphisms.We found no change in XRCC4 expression in T2D islets (Fig. 8b); however, the levels of the obligate binding partner encoded by LIG4 were significantly reduced (Fig. 8c).In mice, Xrcc4 polymorphisms were associated with increased senescence; likewise, in patients with T2D, the levels of the senescence markers H2AFX (Fig. 8d) and CDKN1A (Fig. 8e) were increased.Finally, a direct relationship was observed between reduced LIG4 and increased H2AFX levels (Fig. 8f).Although the cause of coregulation cannot be assessed in ex vivo human islets, the parallel with NOD mice strongly supports a conservation of diabetes susceptibility mechanisms across species.3,500,000 3,000,000 2,500,000 2,000,000 1,500,000 1,000,000 500,000 0 Fluorescence", + "All the genes involved in these pathways, as well as the genes involved in b-cells development and turnover, may be considered candidate genes for T2DM with predominant insulin deficiency.", + "One method of searching for the cause of NIDDM is via the candidate gene approach.Possible candidates for NIDDM include genes involved in specifying pancreatic islet (3-cell phenotype and in directing fj-cell development and (3-cell responses of glucose-mediated insulin synthesis and secretion.The transcription factor islet-1 (Isl-1) has been shown to be a unique protein that binds to the mini-enhancer or Far-FLAT region (nucleotide -247 to -198) of the rat insulin I gene (7).Isl-1, a protein comprised of 349 residues (38 kD), is a member of the LIM/homeodomain family of proteins, named for the first three members described: lin-11, isl-1, and mec-3 (8,9).These proteins are comprised of three putative regulatory regions, two LIM domains (cysteine-rich motifs) in the amino terminus of the protein, a 
homeobox domain near the middle, and a glutamine-rich transcriptional activation domain at the carboxyl end (7,9).With the use of an antibody to Isl-1, expression was shown to be restricted to a subset of endocrine cells, including islets, neurons involved in autonomic and endocrine control, and selected other tissues in the adult rat (10)(11)(12).", + "ResultsImpairment or alteration of the insulin-signaling pathway is a commonly recognized feature of type 2 diabetes.It is therefore notable that the IS-HD gene set (Dataset S4) was not detected to be significantly transcriptionally altered by application of either hypergeometric enrichmentt test, DEA or GSEA.In particular, applying GSEA to the transcriptional profile dataset of diabetic and normal glucose-tolerant skeletal muscle described in Mootha et al. [10] did not identify a significant level of alteration in the IS-HD gene set (p 0.536), while DEA produced a comparably weak enrichment score (p 0.607).The failure to detect a significant transcriptional alteration in IS-HD may be explained by a number of factors.The enrichment results depended on the specific choice of the IS-HD gene set, and it is possible that an alternatively defined insulin-signaling gene set would be determined as significantly enriched.Additionally, expression changes in a few critical genes in IS-HD may be sufficient to substantially alter insulin signaling, and running DEA on the large IS-HD set may miss the contributions from these few genes.", + "35ABSTRACT 11A GENE EXPRESSION NETWORK MODEL OF TYPE 2 DIABETESESTABLISHES A RELATIONSHIP BETWEEN CELL CYCLEREGULATION IN ISLETS AND DIABETES SUSCEPTIBILITYMP Keller, YJ Choi, P Wang, DB Davis, ME Rabaglia, AT Oler, DS Stapleton,C Argmann, KL Schueler, S Edwards, HA Steinberg, EC Neto, R Klienhanz, STurner, MK Hellerstein, EE Schadt, BS Yandell, C Kendziorski, and AD AttieDepts.", + "Second, we performed an extensive manual curation according to a previously described b-cell-targeted 
annotation (Kutlu et al, 2003;Ortis et al, 2010).In partial agreement with the IPA, we found these genes to fall into three broad categories: (1) genes related to b-cell dysfunction and death, (2) genes potentially facilitating the adaptation of the pancreatic islets to the altered metabolic situation in T2D and (3) genes whose role in disease pathogenesis remains to be unearthed (Figure 6B).The adaptation-related gene category contains few metabolism-associated genes (e.g., HK1, FBP2; Figure 6B, right part, Figure 7) and many more genes involved in signal transduction or encoding hormones, growth factors (e.g., EGF, FGF1, IGF2/IGF2AS; Figure 7), or transcription factors involved in important regulatory networks (for instance, FOXA2/HNF3B, PAX4 and SOX6) (Figure 6B, right part, Figure 7).In the b-cell dysfunction and death category, there were hypomethylated genes related to DNA damage and oxidative stress (e.g., GSTP1, ALDH3B1; Figure 7), the endoplasmic reticulum (ER) stress response (NIBAN, PPP2R4, CHAC1), and apoptosis (CASP10, NR4A1, MADD; Figure 6B, left part, Figure 7).Some genes of interest from the highlighted categories are depicted in Figure 7. Their annotated functions provide possible explanations of how the epigenetic dysregulation of these genes in diabetic islets is connected to T2D pathogenesis.Numerous genes that were identified by our methylation profiling approach have been functionally implicated in insulin secretion.Examination of the available literature on the function of these genes revealed three aspects of insulin secretion with which they interfere: some of these genes influence the expression of the insulin gene, like MAPK1 and SOX6, or its post-translational maturation, like PPP2R4 (cf. 
Figure 7 and references therein).Others can deregulate the process of insulin secretion itself (SLC25A5, Ahuja et al, 2007;RALGDS, Ljubicic et al, 2009) or influence synthesis as well as secretion (vitronectin, Kaido et al, 2006).A third group of differentially methylated genes affects (i) signalling processes in the b-cell leading to insulin secretion or (ii) glucose homeostasis in b-cells, thereby modulating insulin response upon stimulation.GRB10 (Yamamoto et al, 2008), FBP2 and HK1 (Figure 7) are examples for these genes.Additional genes found in our study have been implicated in the b-cells' capability to secrete insulin, though the mechanisms have not yet been fully established.The putative functions of these genes indicate a potential epigenetic impact on insulin secretion at multiple levels, namely signalling, expression/synthesis and secretion.", + "In summary, we have associated mutations in the SLC29A3 gene with diabetes mellitus in humans and the insulin signaling pathway in Drosophila.The mechanistic basis of these findings remains to be determined.This is strong evidence supporting the investment of resources to further investigate the role of SLC29A3 and its orthologs in diabetes and glucose metabolism in model systems.DISCUSSIONWe have identified mutations in the equilibrative nucleoside transporter 3 protein that are associated with an inherited syndrome of insulin-dependent DM, and provide prima facie evidence that the Drosophila ortholog of this protein interacts with the insulin signaling pathway.This is the first evidence that mutations in the human SLC29A3 gene can be associated with a diabetic phenotype.", + "These observations taken together suggest that molecules involved in innate immunity could serve as candidate genes that determine the susceptibility of sensitive strains of mice to virusinduced diabetes.Interestingly, deficiency of the Tyk2 gene results in a reduced antiviral response 24 .In addition, the human TYK2 gene was mapped to 
the possible type 1 diabetes susceptibility locus 25 .", + "A recent sequencing study provides an example of detection of rare variants in type 1 diabetes. Targeted sequencing in a series of candidate coding regions resulted in IFIH1 being identified as the causal gene in a region associated with type 1 diabetes by GWA studies (58). IFIH1 encodes a cytoplasmic helicase that mediates induction of the interferon response to viral RNA. The discovery of IFIH1 as a contributor to susceptibility to type 1 diabetes has strengthened the hypothesis (70) about a mechanism of disease pathogenesis involving virus-genetic interplay and raised type 1 interferon levels as a cofactor in β-cell destruction. Nonetheless, it should be recognized that a component of the missing heritability (familial aggregation) in type 1 diabetes could well be due to unrecognized intra-familial environmental factors. Disease pathogenesis. Contemporary models of pathogenesis of type 1 diabetes support the involvement of two primary dramatis personae: the immune system and the β-cell. The known and newly identified genetic risk factors for type 1 diabetes present exciting opportunities to build on to the current cast of disease mechanisms and networks. Most of the listed genes of interest (Table 2) and those in extended regions are assumed to regulate immune function. Some of these genes, however, may also have roles in the β-cell (insulin being the most obvious example). Another gene, PTPN2, encoding a protein tyrosine phosphatase, was identified as affecting the risk for type 1 diabetes as well as for Crohn disease (47,71). PTPN2 is expressed in immune cells, and its expression is highly regulated by cytokines. However, PTPN2 is expressed also in β-cells, where it modulates interferon (IFN)-γ signal transduction and has been shown to regulate cytokine-induced apoptosis (72). Other candidate genes, such as NOS2A, IL1B, reactive oxygen species scavengers, and candidate genes, identified in large GWA studies of type 2 
diabetes, have not been found to be significant contributors to the susceptibility of type 1 diabetes (73).", + "Differential Expression Analyses of Type 1 Diabetes Mellitus Associated GenesFor the aforementioned 171 'novel' genes, we used t-test to compare ribonucleic acid expression signals in PBMCs or monocytes between type 1 diabetes mellitus patients and healthy controls.We found that 37 genes, including 21 non-HLA genes (e.g.FAM46B, OLFML3 and HIPK1), were differentially expressed between type 1 diabetes mellitus patients and controls (Table 2).For the differential expression study, the significance level of P < 5.0E-02 was used.", + "In this study, we have correlated the function and genotype of human islets obtained from diabetic and nondiabetic (ND) donors.We have analyzed a panel of 14 gene variants robustly associated with T2D susceptibility identified by recent genetic association studies.We have identified four genetic variants that confer reduced b-cell exocytosis and six variants that interfere with insulin granule distribution.Based on these observations, we calculate a genetic risk score for islet dysfunction leading to T2D that involves decreased docking of insulin-containing secretory granules, impaired insulin exocytosis, and reduced insulin secretion.", + "At present, insulin [15], glucokinase [16], amylin [17], mitochondrial DNA [18], and several transcriptional factors [19][20][21][22] are recognized as diabetogenic genes in pancreatic b-cells.In the present study we used the candidate gene approach in the examination of genomic variation in the a 1D and Kir6.2 channel genes in type 2 diabetic patients.", + "In summary, we report AEIs that are consistent with type 2 diabetes-associated variation regulating the expression of cis-linked genes in human islets.For some of the genes where significant AEI was identified (e.g., SLC30A8, WFS1), there is strong evidence from human genetics that small changes in gene dosage may have significant 
consequences for the pancreatic β-cell. For other genes with significant AEI (e.g., ANPEP, HMG20A), their role is less well defined, and hence this study should provide a platform for further work examining the effects of carefully manipulating the expression of these genes in human islets.", + "The authors then used mouse liver and adipose expression data from several mouse crosses to construct causal expression networks for the ERBB3 and RPS26 orthologs in the mouse. They then showed that ERBB3 is not associated with any known Type I diabetes genes whereas RPS26 is associated with a network of several genes that are part of the KEGG Type I diabetes pathway (Schadt et al. 2008). This type of analysis demonstrates the power of combining human and mouse data with a network-based approach that has been proposed for use in drug discovery (Schadt et al.", + "Genome-wide association studies in human type 2 diabetes (T2D) have renewed interest in the pancreatic islet as a contributor to T2D risk. Chronic low-grade inflammation resulting from obesity is a risk factor for T2D and a possible trigger of β-cell failure. In this study, microarray data were collected from mouse islets after overnight treatment with cytokines at concentrations consistent with the chronic low-grade inflammation in T2D. Genes with a cytokine-induced change of 2-fold were then examined for associations between single nucleotide polymorphisms and the acute insulin response to glucose (AIRg) using data from the Genetics Underlying Diabetes in Hispanics (GUARDIAN) Consortium. Significant evidence of association was found between AIRg and single nucleotide polymorphisms in Arap3 (5q31.3), F13a1 (6p25.3), Klhl6 (3q27.1), Nid1 (1q42.3), Pamr1 (11p13), Ripk2 (8q21.3), and Steap4 (7q21.12). To assess the potential relevance to islet function, mouse islets were exposed to conditions modeling low-grade inflammation, mitochondrial stress, endoplasmic reticulum (ER) stress, glucotoxicity, and lipotoxicity. RT-PCR revealed that one or 
more forms of stress significantly altered expression levels of all genes except Arap3.Thapsigargininduced ER stress up-regulated both Pamr1 and Klhl6.Three genes confirmed microarray predictions of significant cytokine sensitivity: F13a1 was down-regulated 3.3-fold by cytokines, Ripk2 was up-regulated 1.5-to 3-fold by all stressors, and Steap4 was profoundly cytokine sensitive (167-fold up-regulation).Three genes were thus closely associated with low-grade inflammation in murine islets and also with a marker for islet function (AIRg) in a diabetes-prone human population.This islet-targeted genome-wide association scan identified several previously unrecognized candidate genes related to islet dysfunction during the development of T2D.In conclusion, GWAS studies focusing on the causes of T2D have implicated islet dysfunction as a major contributing factor (18,71).By examining isolated islets for stress responses and cross-referencing gene hits with genes associated with glucose-stimulated insulin release in human populations with T2D, we identified 7 genes that may play a role in promoting or preventing islet decline in T2D.By further examining stress-induced expression changes in each of these genes, we identified 5 genes that stood out: F13a1 as a novel stress-inhibited gene in islets, Klhl6 and Pamr1 as induced genes specific to ER stress, Ripk2 as a broadly stress-induced gene, and Steap4 as an exceptionally cytokine-sensitive gene.These genes provide promising leads in elucidating islet stress responses and islet dysfunction during the development of T2D.", + "Finally, several of the linking nodes introduced into this islet network through their PPI connections represent interesting candidates for a role in T2D pathogenesis, and there are several examples where external data provides validation of those assignments.An interesting example involves the gene GINS4 which maps at the ANK1 locus.Though this gene generated a low PCS [0.03] and was not included in the set of seed genes for this locus, GINS4 knock-down has an impact in a human beta-cell line [14].In addition, 
cyclin-dependent kinase 2 (CDK2) has been shown to influence beta-cell mass in a compensatory mechanism related to age-and diet-induced stress, connecting beta-cell dysfunction and progressive beta-cell mass deterioration [54].YHWAG is a member of the 14-3-3 family, known to be signalling hubs for beta-cell survival [55], and disruption of SMAD4 drives islet hypertrophy [56]." + ], + [ + "Beyond new gene discovery in the field of research, an important challenge in the next coming years is how to set up a more open population-level and high-quality genetic screening strategy aiming to improve etiological diagnosis in almost all of cases with early-onset diabetes.", + "In briefGardner et al. queried the genomes of over 400,000 individuals and identified novel genes associated with type 2 diabetes risk.The biological function of these genes highlights potentially new therapeutic avenues for treatment of type 2 diabetes.", + "Results: Here we report on a meta-analysis approach that integrates data of heterogeneous origin in the domain of type-2 diabetes mellitus (T2DM).Different data sources such as DNA microarrays and, complementing, qualitative data covering several human and mouse tissues are integrated and analyzed with a Bootstrap scoring approach in order to extract disease relevance of the genes.The purpose of the meta-analysis is two-fold: on the one hand it identifies a group of genes with overall disease relevance indicating common, tissue-independent processes related to the disease; on the other hand it identifies genes showing specific alterations with respect to a single study.Using a random sampling approach we computed a core set of 213 T2DM genes across multiple tissues in human and mouse, including well-known genes such as Pdk4, Adipoq, Scd, Pik3r1, Socs2 that monitor important hallmarks of T2DM, for example the strong relationship between obesity and insulin resistance, as well as a large fraction ( 128) of yet barely characterized novel candidate 
genes.Furthermore, we explored functional information and identified cellular networks associated with this core set of genes such as pathway information, protein-protein interactions and gene regulatory networks.Additionally, we set up a web interface in order to allow users to screen T2DM relevance for any -yet non-associated -gene.DiscussionThe first part of our study was devoted to the identification of genes related to T2DM using different heterogeneous data sources in different organisms.Genes have been scored in each individual study according to their disease relevance and an overall score across the different studies has been computed that reflects their total disease relevance.By this approach we were able to identify 213 genes that have a general disease relevance showing high scores in many different studies as well as genes that have a specific disease relevance expressing high scores in only a few studies.", + "GENE DISCOVERY IN T2DWhy?", + "Genetic approaches to studying type 1 diabetesTwo approaches have been used to identify diabetes susceptibility genes: genome-wide linkage studies and candidate gene association studies [see also Field (57) for a discussion of these approaches as applied to type 1 diabetes].These approaches have definitively shown that the major histocompatibility complex (MHC) locus, also called human leukocyte antigen or HLA, contains the major inherited factor(s) that determines diabetes risk.At least two other genes contain variants that almost certainly affect risk: the insulin gene (INS) and CTLA4.We will review the merits of these two genetic approaches used to identify diabetes susceptibility genes and the results obtained thus far.We also discuss the possible impact of genetic and genomic advances on future genetic studies.", + "Received: 7 May 2009 Accepted: 25 February 2010Published: 25 February 2010References1. Sieberts SK, Schadt EE: Moving toward a system genetics view of disease. Mamm Genome 2007, 18:389-401. 2. 
Keller MP, Choi Y, Wang P, Davis DB, Rabaglia ME, Oler AT, Stapleton DS,Argmann C, Schueler KL, Edwards S, Steinberg HA, Chaibub Neto E,Kleinhanz R, Turner S, Hellerstein MK, Schadt EE, Yandell BS, Kendziorski C,Attie AD: A gene expression network model of type 2 diabetes links cellcycle regulation in islets with diabetes susceptibility. Genome Res 2008,18:706-716. 3.", + "In conclusion, the findings presented in our study suggest high power for gene-based association analyses in detecting disease-susceptibility genes across the human genome.Our findings point to the involvement of new pathways in the pathogenesis of type 1 diabetes mellitus, and provide more insights into the genetic basis of type 1 diabetes mellitus.", + "A systematic genomewide search for type 2 diabetes-susceptibility genes was performed on a subset of 440 participants in the 27 most informative extended families.Of the 440 individuals, 116 are diabetics (including probands), giving a prevalence of 26.4%.There are 3,745 relative pairs, with varying degrees of genetic", + "Genome-wide association studies (GWAS) have discovered germline genetic variation associated with type 2 diabetes risk (1)(2)(3)(4).One of the largest GWAS, involving DNA taken from individuals of European descent and conducted by the DIAGRAM (DIAbetes Genetics Replication And Meta-analysis) consortium, identified 65 loci associated with type 2 diabetes risk (1).However, for most of these loci, the precise identity of the affected gene and the molecular mechanisms underpinning the altered risk are not known.", + "Figure5.Consideration of the human gene network boosts recovery of validated type 2 diabetes genes from GWAS analysis of 2000 patients and 3000 controls. 
(A,B) Plotted using the same conventions as in Figure4, analyzing WTCCC GWAS data (Wellcome Trust Case Control Consortium 2007) for type 2 diabetes alone and in combination with HumanNet and measuring performance as AUC (<5% FPR) for recovering the top 20 genes from a type 2 diabetes meta-analysis of 4549 cases and 5579 controls(Zeggini et al. 2008).As for Crohn's disease, consideration of the network boosts performance across a wide range of parameter values.Notably, consideration of the network strongly implicates the genes CTNNB1 and BACH2 in type 2 diabetes; CTNNB1 is well studied in connection with type 2 diabetes and BACH2 has been previously implicated in type 1 diabetes and celiac disease (e.g.,Cooper et al. 2008;Madu et al. 2009), but not type 2 diabetes.", + "A new generation of genetic studies of diabetes is underway.Following from initial genome-wide association (GWA) studies, more recent approaches have used genotyping arrays of more densely spaced markers, imputation of ungenotyped variants based on improved reference haplotype panels, and sequencing of protein-coding exomes and whole genomes.Experimental and statistical advances make possible the identification of novel variants and loci contributing to trait variation and disease risk.Integration of sequence variants with functional analysis is critical to interpreting the consequences of identified variants.We briefly review these methods and technologies and describe how they will continue to expand our understanding of the genetic risk factors and underlying biology of diabetes.", + "One attractive methodology to circumvent the puzzle of choosing either a hypothesis-driven or an exploratory research may be the strategy of gene prioritization offered by the new bioinformatics tools based on the biological plausibility of a gene-disease association and on knowledge of the protein function. 
We propose an approach for expanding the selection of genes or loci of interest and prioritizing associations over GWAs related with genetic susceptibility to type 2 diabetes. The proposal profits from the recent initiatives of data sharing of the genome scan results that make the information publicly available as soon as they are generated and checked for quality. Both the DGI and the WTCCC are committed to embracing these principles as they made available all the phenotype-genotype data for type 2 diabetes.", + "In this review, however, we focus on a different route from human genetics to translation, one that derives estimates of an individual's predisposition to diabetes and its subtypes (in the form of polygenic scores) from the patterns of individual genetic variation at sites known to influence diabetes predisposition.", + "Family-based studies of the genetic determinants of type 2 diabetes and related precursor quantitative traits (QTs, e.g. plasma insulin and glucose levels) and GWA studies have now provided an abundance of evidence for potentially causative genes. These results have been drawn together onto a single map of the human genome sequence [86]. The goal is to look for genomic locations where the presence of a potential underlying type 2 diabetes gene has been attested to repeatedly: diabetes genetic 'hot spots'. Such replication increases our confidence of the presence of an underlying gene. While GWA studies look for diabetes genes using a different approach to linkage analysis, the ultimate goal is the same: to find the genetic determinants of the disease. Therefore, the results of linkage and association must eventually match each other. The current analysis identifies multiple linkage locations that differ from those found in the recent GWA studies [87-89], and suggests the location of additional major type 2 diabetes susceptibility genes.", + "INTRODUCTION Multiple genome-wide association studies (GWASs) have correlated type 2 diabetes mellitus (T2DM) 
with genetic variants, yielding a large number of loci and associated gene products that are linked to the disease phenotype-often with little or no insight into the mechanism underlying that link (Hivert et al., 2014).The current challenge is to establish robust systems to systematically evaluate the role of these loci using disease-relevant cells.Previous studies have used patient samples, cell lines, or animal models to seek mechanistic insight but with significant limitations.Large variation is observed in primary patient samples, perhaps due to genetic heterogeneity, whereas animal models present major physiological and metabolic differences that hamper understanding of the precise function of human genes in T2DM.Therefore, a robust system to systematically evaluate the role of T2DM-associated genes using disease-relevant human cells will provide an important tool for diabetes research and spur the development of precision (allele-specific) therapies, exemplified by the use of sulfonylurea drugs to treat patients carrying certain KCNJ11 mutations (Gloyn et al., 2004).", + "Genomic information associated with Type 2 diabetes.", + "To gain insights into how the linking nodes of our final network contribute to T2D biology, we used the DisGeNET database [37], which collates gene-disease information from public data as well as from literature via natural language processing tools.We focused on the 274 linking nodes included in our model to avoid circularity arising from using the seeds, and identified 92 (~33%) with known links to T2D (Additional file 1: Table S2).Examples include as follows: (a) NEUROD1 which encodes a transcription factor that is involved in the development of the endocrine cell lineage and has been implicated in monogenic diabetes [38], (b) PRKCB involved in insulin resistance [39] and (c) GNAS, implicated in beta-cell proliferation [40].For this last gene, mouse knockouts have been shown to produce phenotypes concordant with diabetes [41].These 
examples demonstrate the potential of these analyses to draw in \"linking\" nodes as related to T2D even when they are not located within genome-wide association signals.", + "Human Genome Project - its Implications in Diabetes Genetics. The USA coordinator of the Human Genome Project at the National Institute of Health (NIH), Francis Collins (Bethesda, MD), expects the entire human genome to be sequenced by 2002, the complete sequence of chromosomes 22 and 7 already being available in 1999. The NIH will invest US$ 75 million to identify another 500 000 SNPs genome wide. The USA SNP mapping will be based on 500 cell lines and would have to be followed by linkage mapping in all major populations. The other global players of the Human Genome Project, including the SNP consortium and several private companies, are also putting major efforts into the identification of genes encoding type 2 diabetes. Extensive international collaborations will be crucial in order to carry the enormous financial and manpower burden needed to achieve these goals. Therefore, the data generated must be freely accessible throughout the scientific community. As diabetes will become a WHO priority in 2000, this might foster more investment into the research of the genetics of diabetes.", + "Genetic predisposition to diabetes mellitus type 2: will large collaborative efforts be able to overcome the geneticist's nightmare?" + ], + [ + "This approach requires the accumulation and integration of many types of data, and also requires the use of many types of statistical tools to extract relevant patterns of covariation and causal relations as a function of genetics, environment, stage, and treatment. In this protocol we explain how to use the GeneNetwork web service, a powerful and free online resource for systems genetics. 
We provide workflows and methods to navigate massive multiscalar data sets and we explain how to use an extensive systems genetics toolkit for analysis and synthesis.", + "GeneNetwork is an interactive software (Geisert et al., 2009), which enables users readily to reconstruct genetic network based on microarray data without being intimately involved in complicated mathematical computation. Materials and methods. Mice. One pair of heterozygous (lew/ ) mice was purchased from the Mouse Mutant Stock Resource colonies at The Jackson Laboratory (TJL). A breeding colony was then established by mating them at the University of Tennessee Health Science Center (UTHSC).", + "T2DM-GeneMiner web tool. In order to allow users to screen the disease potential of any given gene of interest we developed T2DM-GeneMiner, a web interface summarizing the results of our work (Figure 1, [35]). The user interface is shown for the well-known Adipoq and the resulting bar plots for two other genes, Pdk4 and Cfd, with lower content of available information. The resource is searchable by gene or protein IDs (for example Ensembl ID or gene symbol). The score distribution is shown as a bar plot and, where available, functional information is displayed. The two rightmost bars show the entropy, indicating uniform or specific score distribution, and the score. The red line at the score bar indicates the cut-off. Background: Multiple functional genomics data for complex human diseases have been published and made available by researchers worldwide. The main goal of these studies is the detailed analysis of a particular aspect of the disease. Complementary, meta-analysis approaches try to extract supersets of disease genes and interaction networks by integrating and combining these individual studies using statistical approaches. Results: Here we report on a meta-analysis approach that integrates data of heterogeneous origin in the domain of type-2 diabetes mellitus (T2DM). Different data sources such as DNA microarrays 
and, complementing, qualitative data covering several human and mouse tissues are integrated and analyzed with a Bootstrap scoring approach in order to extract disease relevance of the genes.The purpose of the meta-analysis is two-fold: on the one hand it identifies a group of genes with overall disease relevance indicating common, tissue-independent processes related to the disease; on the other hand it identifies genes showing specific alterations with respect to a single study.Using a random sampling approach we computed a core set of 213 T2DM genes across multiple tissues in human and mouse, including well-known genes such as Pdk4, Adipoq, Scd, Pik3r1, Socs2 that monitor important hallmarks of T2DM, for example the strong relationship between obesity and insulin resistance, as well as a large fraction ( 128) of yet barely characterized novel candidate genes.Furthermore, we explored functional information and identified cellular networks associated with this core set of genes such as pathway information, protein-protein interactions and gene regulatory networks.Additionally, we set up a web interface in order to allow users to screen T2DM relevance for any -yet non-associated -gene. Conclusion:In our paper we have identified a core set of 213 T2DM candidate genes by a metaanalysis of existing data sources.We have explored the relation of these genes to disease relevant information and -using enrichment analysis -we have identified biological networks on different layers of cellular information such as signaling and metabolic pathways, gene regulatory networks and protein-protein interactions.The web interface is accessible via http://t2dmgeneminer.molgen.mpg.de.", + "Exploring genes, molecules, and phenotypes is easily accomplished using GeneNetwork. In thismanuscript we will outline some simple use cases, and show how a small number of plausiblecandidate genes can be identified for an immune phenotype. 1. 
Data. Once you have navigated to genenetwork.org, there are two ways to search for data in GN. The first is to use the global search bar located at the top of the page (Figure 1). This is a new feature in GN that allows researchers to search for genes, mRNAs, or proteins across all of the datasets. Similarly, by using the dropdown menu on the left (Figure 1), a user can switch to phenotypes, and search for any phenotype of interest in the same way. Figure 1: The global search bar, also called the Search All function, is a good area to start exploring genes, mRNA, and proteins within GeneNetwork. To best use this new tool, use standard gene symbols containing more than two characters in the name. Another area to acquire data is the Select and search pull-down menus (Figure 2). To get started, the user has to choose a population of interest.", + "This approach requires the accumulation and integration of many types of data, and also requires the use of many types of statistical tools to extract relevant patterns of covariation and causal relations as a function of genetics, environment, stage, and treatment. In this protocol we explain how to use the GeneNetwork web service, a powerful and free online resource for systems genetics. We provide workflows and methods to navigate massive multiscalar data sets and we explain how to use an extensive systems genetics toolkit for analysis and synthesis.", + "The Web tool G2D (Genes to Diseases) prioritizes genes across a user-entered chromosomal region according to their possible relation to an inherited disease by a combination of data mining of OMIM, PubMed MESH terms and Gene Ontology (GO) classification. The tool allows users to inspect any region of the human genome to find candidate genes related to a genetic disease or phenotype defined in OMIM. 
It does this by identifying GO terms that match MESH terms for an OMIM record.", + "Researchers, however, have the possibility to fully explore the results by altering the thresholds on the open web resource. Although only protein-coding genes were included in our analysis, the same approach can be applied to non-coding genes to reveal their potential functions. Similarly, GeneBridge can also be utilized to identify novel gene-disease associations based on known disease-associated genes from databases, such as the Human Disease Ontology (DO) [207] or DisGeNET [208]. The GeneBridge toolkit could also be applied to large-scale proteomics datasets after correcting for the background of all measured proteins.", + "Protein interaction networks. We searched for protein networks spanning the regions shown to interact genetically (P values < 0.05; Table 2). This was performed using a high-confidence human protein inter- Markers of predictive value for T1D identified by decision tree analysis on T1D genome scan data from 1321 affected sib pair families. Markers identified in the total data set are ranked according to significance level (P < 0.05). Markers from data subsets are 'selected markers' and were selected on basis of whether they confirm loci from the latest T1D genome scan [25] or other references [26; 27]. D.f. = degrees of freedom.", + "Exploring genes, molecules, and phenotypes is easily accomplished using GeneNetwork. In this manuscript we will outline some simple use cases, and show how a small number of plausible candidate genes can be identified for an immune phenotype. 1. Data. Once you have navigated to genenetwork.org, there are two ways to search for data in GN. The first is to use the global search bar located at the top of the page (Figure 1). 
This is a new feature in GN that allows researchers to search for genes, mRNAs, or proteins across all of the datasets. Similarly, by using the dropdown menu on the left (Figure 1), a user can switch to phenotypes, and search for any phenotype of interest in the same way. Figure 1: The global search bar, also called the Search All function, is a good area to start exploring genes, mRNA, and proteins within GeneNetwork. To best use this new tool, use standard gene symbols containing more than two characters in the name. Another area to acquire data is the Select and search pull-down menus (Figure 2). To get started, the user has to choose a population of interest.", + "Users begin by selecting one or more human diseases and clicking on Compare. The genes associated with the selected disease are tested for enrichment against all sets of known associated genes for worm phenotypes. The result reveals functionally coherent, evolutionarily conserved gene networks. Alternatively, users can also start by selecting worm phenotypes, which are tested against human diseases. In addition to cross-species testing, results of within-species disease enrichment are also available (e.g. to find the closest related human disease for another input human disease).", + "GeneNetwork is an interactive software (Geisert et al., 2009), which enables users readily to reconstruct genetic network based on microarray data without being intimately involved in complicated mathematical computation. Materials and methods. Mice. One pair of heterozygous (lew/ ) mice was purchased from the Mouse Mutant Stock Resource colonies at The Jackson Laboratory (TJL). A breeding colony was then established by mating them at the University of Tennessee Health Science Center (UTHSC).", + "Genome Biol 8(2):R25 Hubner N, Wallace CA, Zimdahl H, Petretto E, Schulz H et al (2005) Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. 
Nat Genet 37(3):243253Ihaka R, Gentleman RC (1996) R: a language for data analysis andgraphics. J Comput Graph Stat 5:299314Keller MP, Choi Y, Wang P, Davis DB, Rabaglia ME et al (2008) Agene expression network model of type 2 diabetes links cellcycle regulation in islets with diabetes susceptibility.", + "We next constructed protein-protein interaction networks.To do this, we selected 76 genes known from monogenic forms of diabetes, obesity, and hypertension or GWAS hits (type 2 diabetes, obesity, and hypertension) for which the lead association lies within the protein-coding part of the gene (Table S3).", + "First, we describe the construction of a functional network for human genes.This network spans 87% of validated protein coding genes, and provides strong predictive power for a majority of currently known genetic diseases.We evaluate six alternate approaches for prioritizing candidate disease genes using this network, and demonstrate the strongest overall performance with algorithms related to Google's PageRank.We then show that this network, in conjunction with genome-wide association data for Type 2 diabetes and Crohn's disease, boosts the identification of disease-associated genes that were discovered in later meta-analyses.This work suggests both a specific strategy and a general path to future improvements for the interpretation of GWAS data.Taken together, our work demonstrates that a high-quality functional network for human genes can provide a powerful resource for identifying causal genes in human disease.A new functional gene network for human genesIn order to test the general ability of a gene network to prioritize disease genes, particularly in conjunction with GWAS studies, we constructed a genome-scale functional network of human genes, incorporating diverse expression, protein interaction, genetic interaction, sequence, literature, and comparative genomics data, including both data collected directly from human genes, as well as that from 
orthologous genes of yeast, worm, and fly.The resulting HumanNet gene network can be accessed through a web interface (http://www.functionalnet.org/humannet).Using this interface, researchers can easily search the network using a set of ''seed'' Network-guided genome-wide association mining genes of interest.The interface returns a list of genes ranked according to their connections to the seed genes, together with the evidence used to identify each coupling.The interactions and evidence can be downloaded, and a network visualization tool has been incorporated.All linkages can also be downloaded for independent analysis.", + "Construction and analysis of the T2D risk genes networkTo further sift the results and explore functional connections, we also mapped genes onto known gene sets.For this purpose, we used DAVID [22,23] to search for enriched KEGG [24] pathways.We also used GARNET [25] to identify enriched Gene Ontology categories and their relationships.", + "GeneNetwork.org also offers a powerful statistical platform foronline network analyses and mapping, enabling numerous molecular questions to be probed in one centralized location(Chesler et al. , 2003, 2005; Li et al. , 2010; Mulligan et al. , 2012,2017, 2019). Most data are from groups of animals or humanswho have been fully genotyped or even sequenced. As a result, itcan be used to model causal networks that link DNA differencesto traits such as differences in expression, cell number, volumes,and behavior using real-time computation and graphing." + ] + ], + "task_id": [ + "173BE1F01E4A6074A0EBB7CC6137AC8D", + "F55160D302C899B2131AA8502A640684", + "B52AF52D46499DE2B98933F1786EC9E0" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_2.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_2.json new file mode 100644 index 0000000..4a847fa --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_diabetes_2.json @@ -0,0 +1,113 @@ +{ + "question": [ + "what are confounding factors in diabetes?", + "How is the immune system related to diabetes?", + "What are the genomic variants associated with immune system components and diabetes?", + "What is the role of the immune system in the metabolomics of diabetes and associated conditions?", + "Can the landscape of QTL and GWAS hits be used to dissect the role of immune system in diabetes and complications?" + ], + "answer": [ + "Confounding factors in diabetes include age, sex, BMI, waist circumference, family history, smoking, hypertension, diet quality, physical inactivity, obesity, prediabetes, metabolic syndrome, exposure to environmental pollutants, and certain genetic factors. Socioeconomic status, psychological stress, and certain lifestyle-related risk factors such as physical inactivity and poor diet are also considered confounding factors.", + "The immune system is closely related to diabetes, particularly Type 1 diabetes, which is an autoimmune disorder. In genetically susceptible individuals, the immune system can be triggered by certain environmental factors to produce islet autoantibodies against pancreatic cells, increasing their risk for Type-1 diabetes. In Type 1 diabetes, the immune system destroys insulin-producing cells in the pancreatic islets of Langerhans, leading to insulin deficiency and hyperglycemia. The balance between regulatory and effector T cells determines disease risk, activation, and progression. 
Genetic factors also play a role in controlling the immune system and influencing susceptibility to Type 1 diabetes.", + "The genomic variants associated with immune system components and diabetes include variants in JAZF1, CDC123-CAMK1D, TSPAN8-LGR5, THADA, ADAMTS9, and NOTCH2. These variants have been reported to affect pancreatic β-cell functions. Additionally, variants within the HLA locus and non-HLA genetic loci from published GWAS of European background were found to affect immune phenotypes and function. Variants in 63 independent T1D loci were present in the data, and 13 of these were associated with susceptibility to T1D. Other T1D-associated variants were found in the Immunochip, a large scale genotyping platform.", + "The immune system plays a significant role in the metabolomics of diabetes and associated conditions. Chronic low-grade inflammation and activation of the innate immune system are associated with insulin resistance and β-cell dysfunction in type 2 diabetes mellitus (T2DM). The infiltration of macrophages into pancreatic islets accelerates β-cell dysfunction. These macrophages secrete chemokines and stimulate immune cell migration, as well as the release of pro-inflammatory cytokines. High blood concentrations of pro-inflammatory cytokines, such as C-reactive protein, interleukin-6 (IL-6), and tumour necrosis factor (TNF), are associated with an increased risk of T2DM. Furthermore, cellular oxidative stress, which induces an inflammatory response, is known as one of the leading causes of insulin resistance and islet β-cell dysfunction in T2D.", + "Yes, the landscape of QTL and GWAS hits can be used to dissect the role of the immune system in diabetes and its complications. The studies mentioned in the text have identified associations between genetic factors and immune-related mechanisms in diabetes. This includes the identification of pathways and genes that may serve as potential intervention targets. 
Furthermore, the studies have shown a correlation between immune-cell populations and ex vivo cytokine production in response to various stimulations, suggesting a direct link between genetic variants and immune functionality in diabetes." + ], + "contexts": [ + [ + "A wide array of other dietary compounds and environmental triggers have been shown to affect diabetes development in animal models, and for some of these such as omega-3 fatty acids (312), there is limited proof in human patients.", + "Type 2 diabetes (T2D) is a result of complex gene-environment interactions, and several risk factors have been identified, including age, family history, diet, sedentary lifestyle and obesity.Statistical models that combine known risk factors for T2D can partly identify individuals at high risk of developing the disease.However, these studies have so far indicated that human genetics contributes little to the models, whereas socio-demographic and environmental factors have greater influence 1 .Recent evidence suggests the importance of the gut microbiota as an environmental factor, and an altered gut microbiota has been linked to metabolic diseases including obesity 2,3 , diabetes 4 and cardiovascular disease 5 .", + "Dietary factors [source]Reduced risk Mediterranean diet pattern [130] Fruit and vegetable intake [131] Fermented dairy products [132] Fatty fish intake [133] Tea intake [134] Elevated risk Red and processed meat intake [135] Sweetened beverages [136] Null association Total dairy products or milk intake [132] Total fish intake [133] Dietary energy density [137] Carbohydrate intake [138] a Further information about the InterAct project can be found at www.inter-act.eu.There are also other forthcoming publications on dietary factors and the risk of diabetes.cohort studies also found an increased diabetes incidence among passive smokers [142].Finally, in-utero exposure to maternal smoking is associated with overweight and obesity which may predispose to 
diabetes and other metabolic disturbances in the offspring [143].Psychosocial factors encompass two broad areas which are more closely related to socioeconomic status or to psychological/psychiatric factors.Within the InterAct study, people who had a lower educational level had a 70% higher relative risk for diabetes, which remained at around 40% even after adjustment for differences in obesity [144].The association between emotional stress, job strain, anxiety and depressive disorders and increased incidence of type 2 diabetes is less well-established, but recent data [145][146][147] strongly indicate that this area merits further study to better understand the relationship between these potential risk factors.", + "It isplausible that such factors may also operate at the very beginning of the humanlifecourse but their identity, and the environmental factors they synergize with,remain unknown (Bloomfield et al 2006), awaiting discovery. Chaufan also makes a strong case that inequalities in the provision of healthcare and education are compounding the growing problem of type 2 diabetes inthe developed (and increasingly, less developed) nations today (Chaufan 2007). This is an important point, and one with which we agree, but it is concerned primarily with issues about resource allocation and distributive justice.Type 2 diabetes mellitus as an illustrative exampleThe persuasiveness of Chaufans argument comes from her dependence on type 2diabetes as her main illustrative example. 
It is true that environmental factors canaccount for up to 8090% of the population attributable risk for this condition(Cooper & Psaty 2003), and it may be that in a profoundly diabetogenic environment such as exists in many 21st century developed countries, knowing about G E interactions adds little per se to the management of an overweight and inactivepopulation.", + "Understanding risk factors for diabetes is therefore critical to its early diagnosis.Key risk factors for diabetes include obesity (Mokdad et al. 2001;Must et al. 1999) and prediabetes.A fasting blood sugar well into the \"reference range\" has been shown to be a risk factor for diabetes (Tirosh et al. 2005).Indeed, we have shown that the 4-year risk of diabetes among participants in the FHS with prediabetes ranges from a 12.7-fold increase (in men) to a 22.3fold increase (in women) (Levitzky et al. 2008).The metabolic syndrome, a constellation of metabolic risk factors that have been observed to cluster with each other more than would be expected by chance (Meigs et al. 1997), was formally acknowledged as a syndrome involving the fulfillment of at least 3 criteria, including elevated waist circumference, impaired fasting glucose, elevated blood sugar, elevated triglycerides, or low high-density lipoprotein cholesterol (Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults 2001).The presence of the metabolic syndrome is a strong risk factor for the subsequent development of diabetes, conferring a nearly 7-fold increased risk among those with as compared with those without the metabolic syndrome (Wilson et al. 2005).As a means of better trying to identify who is at early risk for diabetes, a prediction equation for incident diabetes was developed in the FHS (Wilson et al. 
2007).A \"simple clinical model\" was derived, which includes parental history of diabetes, obesity, hypertension, low high-density lipoprotein cholesterol, elevated triglyceride levels, and impaired fasting glucose; the c-statistic for this model was robust at 0.85.Importantly, more complex models with variables such as waist circumference, insulin resistance, 2-hour postprandial glucose derived from an oral glucose tolerance test, and C-reactive protein were not independent predictors of diabetes.This prediction model highlights how simple clinical variables that are readily available can be used to identify individuals at high risk for developing diabetes even before they have evidence of the disease.In aggregate, these findings from the FHS make several important points.First, the incidence rate of diabetes is increasing.Second, because the relative risk of diabetes as a CVD risk factor has remained constant over time, the relative importance of diabetes with respect to CVD has increased.Finally, individuals with diabetes remain inadequately managed with regard to CVD risk factor levels.These findings highlight the importance of early identification of diabetes and a means to identify diabetes early in the life course to promote the early aggressive management of CVD risk factors.Another major remaining question is why the relative risk for diabetes as a CVD risk factor has failed to decrease over time.As described earlier, the rates of CVD among participants in the FHS have decreased; but this reduction has been outpaced by those without diabetes (Fox et al. 2004a).In terms of primary prevention, we can aim to reduce the burden of uncontrolled CVD risk factors, including incompletely treated hypertension, dyslipidemia, and participants with diabetes who continue to smoke (Preis et al. 
2009a).Observational studies such as the FHS can help to explore rates of treatment and control for known modifiable risk factors.", + "DietExcessive caloric intake is a major driving force behind escalating obesity and type 2 diabetes epidemics worldwide, but diet quality also has independent effects.In the Nurses' Health Study (NHS), we found that the quality of fats and carbohydrates play an important role in the development of diabetes, independent of BMI and other risk factors (11).In particular, higher dietary glycemic load (GL) and trans fat are associated with increased diabetes risk, whereas greater consumption of cereal fiber and polyunsaturated fat is associated with decreased risk (Fig. 2).In a meta-analysis, we found that a 2 serving/day increment in whole-grain intake was associated with a 21% lower risk of diabetes (12).", + "IntroductionThe aetiology of type 2 diabetes is poorly defined: several studies indicate that the disease results from a combination of genetic susceptibility and external risk factors [1].According to this multifactorial model, genetically predisposed subjects will not necessarily develop overt disease unless they are also exposed to particular environmental factors [2].Important risk factors for the development of type 2 diabetes include a family history of diabetes, increased age, hypertension, lack of physical exercise, and obesity [1].", + "Environmental factors such as age, weight gain, excessive energy intake, physical inactivity and inheritance of genes predisposing to insulin resistance are major risk factors for development of T2D.Nutrient imbalances such as deficiency of vitamin D [19] and increased iron absorption and storage in the body [20,21], changes in gut microbiota [22] and exposure to pollutants [23] may confer risk for development of T2D.Early-life or intrauterine environment [24] and epigenetics [25] also play a role in conferring susceptibility to diabetes.Obstructive sleep apnea, which is associated with 
obesity, insulin resistance and glucose intolerance, also contributes to the pathology of T2D [26].", + "What these predisposing factors share is an ability to negatively impact the glucose homeostasis system through worsening of insulin resistance or to impair b-cell function.Superimposing these factors onto a genetically compromised glucose homeostasis system raises the risk of progressing to hyperglycemia.It is the rapid emergence of these disadvantageous environmental factors that is causing the worldwide diabetes epidemic.This concept of environmental changes promoting diabetes was highlighted many years ago by populations that rarely experienced type 2 diabetes, but then moved from a nomadic or farm existence to urban environments followed by an explosion of diabetes, typically with profound obesity: Pima Indians in the Southwest U.S., Saharan nomadic tribes, Australian Aborigines, and many others.Particularly dramatic were studies that showed reversal of the diabetes when they returned to their prior way of life (15).A recent example of this is the rapidly rising incidence of type 2 diabetes in China and India as people move from the country to cities-there is a 0.1-0.2%incidence of diabetes for rural farmers in China as opposed to well more than 5% for city dwellers.Perhaps the scariest example of this is children in the U.S. where the obesity statistics worsen yearly.As many as 20% of U.S. 
children are now obese, and they are developing all of the elements of the metabolic syndrome-insulin resistance, hypertension, hyperlipidemia, and glucose intolerance (16).", + "Taken together, non-invasive risk factors including age, sex, BMI, waist circumference, family history, smoking or hypertension form the basis of all diabetes risk scores.Routine clinical biomarkers, such as glucose, HbA 1c , lipids and uric acid, have the potential to improve the predictive ability of these basic risk factors, but AROCs rarely exceed 0.85.This argues in favour of a search for novel risk factors to further improve the accuracy of diabetes risk models.", + "There are two major factors that underlie these alarming projections.The first is T2D is associated with age, and Western populations are aging rapidly.The second major explanation is our lifestyles have changed dramatically in recent years.Epidemiological studies have identified strong T2D risk relationships for obesity, sedentary behavior [2][3][4], and diets rich in energy [5], processed carbohydrates [6], and animal fats [7].Collectively, these lifestyle factors impede the actions of insulin and raise hepatic glucose production, which can result in the diminution of endogenous insulin production and T2D.The strongest evidence for a causal relationship between adverse lifestyle behaviors and T2D comes from randomized controlled trials that show intensive lifestyle interventions involving structured exercise regimes which promote habitual physical activity (PA) and have a major beneficial impact on diabetes incidence in high-risk individuals [8,9].", + "In multivariate analyses (Table 3), diabetes was related to a higher risk of all-cause MCI even after adjusting for age, sex, ethnic group, years of education, APOE 4, hypertension, low-density lipoprotein level, heart disease, stroke, and current smoking (HR, 1.4; 95% confidence interval [CI], 1.1-1.8).", + "Clinical Factors Predicting Incidence of DiabetesIn both the 
MPP and Botnia studies, a family history of diabetes, an increased BMI, and increased levels of blood pressure and serum levels of triglycerides, apolipoprotein A-I, and liver enzymes were independent predictors of future type 2 diabetes (Table 1).In the MPP study, current smoking was also associated with a marked increase in the risk of diabetes.Impaired insulin secretion and action, particularly insulin secretion adjusted for insulin resistance (disposition index), were strong predictors of future diabetes.The presence of a first-degree family history of diabetes doubled the risk of the disease that was seen with an increased BMI (Fig. 2A) and a low disposition index (Fig. 2B).", + "The worldwide explosion of the rates of diabetes and other metabolic diseases in the last few decades cannot be fully explained only by changes in the prevalence of classical lifestyle-related risk factors, such as physical inactivity and poor diet.For this reason, it has been recently proposed that other \"nontraditional\" risk factors could contribute to the diabetes epidemics.In particular, an increasing number of reports indicate that chronic exposure to and accumulation of a low concentration of environmental pollutants (especially the so-called persistent organic pollutants (POPs)) within the body might be associated with diabetogenesis.In this review, the epidemiological evidence suggesting a relationship between dioxin and other POPs exposure and diabetes incidence will be summarized, and some recent developments on the possible underlying mechanisms, with particular reference to dioxin, will be presented and discussed.The worldwide explosion of the rates of diabetes and other metabolic diseases in the last few decades cannot be fully explained only by changes in the prevalence of classical lifestyle-related risk factors, such as physical inactivity and poor diet.For this reason, it has been recently proposed that other \"nontraditional\" risk factors could contribute to the 
diabetes epidemics.In particular, an increasing number of reports indicate that chronic exposure to and accumulation of a low concentration of environmental pollutants (especially the so-called persistent organic pollutants (POPs)) within the body might be associated with diabetogenesis.In this review, the epidemiological evidence suggesting a relationship between dioxin and other POPs exposure and diabetes incidence will be summarized, and some recent developments on the possible underlying mechanisms, with particular reference to dioxin, will be presented and discussed.", + "In sum, it is clear that multiple risk factors are involved in diabetes-associated cognitive decrements as well as in dementia in relation to diabetes 38 .On the basis of our assessment of the literature, it is also clear that there are still substantial knowledge gaps on how the risk factors interconnect, how the risk factors translate to potentially modifiable mechanisms and which genetic factors are involved.", + "Aetiological factorsProspective studies suggest that the main pathophysiological defects leading to type 2 diabetes are insulin resistance and a relative insulin secretory defect.The main aetiological risk factors are age, obesity, family history, and physical inactivity.Dietary risk factors have recently emerged: risk is increased by high consumption of red and processed meat 13 and sugar-sweetened beverages, 14 and reduced by intake of fruit and vegetables, 15 some types of dairy products, 16 and some overall dietary patterns. 
17Novel strategies to use quantifiable nutritional biomarkers are paving the way for more detailed understanding of the association between diet and diabetes.Although the heritability of type 2 diabetes is high (30e70%) and more than 60 genetic variants related with diabetes risk have now been identified, 18 even when combined into a genetic score, known genes contribute little to the prediction of diabetes.Phenotype-based risk models provide greater discrimination for diabetes, and the addition of genotypic information adds no more than 5e10% improvement in prediction.The current conclusion is that genetic variants provide insights into biological pathways and pathogenesis of diabetes, but not its prediction.It is likely that interactions between the environment/lifestyle and genetic factors provide the explanation for the risk of type 2 diabetes, but demonstrating such interaction is challenging.Encouraging research findings have recently shown higher absolute risk of diabetes associated with obesity at any level of genetic risk. 19evention and screening" + ], + [ + "V. IMMUNE EVENTS IN TYPE 1 DIABETESSeveral silent immune events occur before the clinical symptoms of type 1 diabetes become apparent.Most importantly, autoantibodies are produced and self-reactive lymphocytes become activated and infiltrate the pancreas to destroy the insulin-producing beta-cells in the islets of Langerhans (56).This persistent, targeted destruction may go undetected for many years, and the first clinical symptoms only become apparent after a majority of the beta-cells have been destroyed or rendered dysfunctional, making the individual dependent on insulin for survival (Fig. 
2).Therefore, high priority is given to the search for \"biomarkers\" as whistleblowers of an ongoing autoimmune response.We will highlight some important immunological events here.Additional information on immune cell cross-talk in T1D can be found elsewhere (243).", + "IntroductionType 1 diabetes (T1D) results from immune-mediated selective destruction of pancreatic islet cells resulting in insulin deficiency and hyperglycemia [1,2].Symptoms of polydipsia, polyuria, polyphagia and weight loss manifest when significant numbers of islet cells have been destroyed.However, antibodies to islet autoantigens can be detected in peripheral blood prior to clinical disease [1,3].With early diagnosis of disease or assessment of risk, immune therapy may impede islet destruction and preserve insulin production, delaying onset of clinical manifestations [2].", + "Background: The immune system matures mainly during the postnatal period through breastfeeding, and is partly modified by nutritive factors.The manner by which early feeding practices influence the development of type 1 diabetes mellitus (TID) is not clear.Also the use of genetics in prognostic evaluation of the disease has not be studied intensely.", + "Figure 1-Schematic of the pathogenesis of diabetes.Genetic and environmental factors, acting via complex immunological mechanisms, result in b-cell destruction that leads to type 1 diabetes.Gene-environment interactions also underlie susceptibility to type 2 diabetes, the pathophysiological hallmarks of which include insulin resistance and b-cell dysfunction.", + "The results revealed that a major type of immune actors known as T cells are under the control of genetic factors associated with type 1 diabetes susceptibility.For instance, a specific type of T cells showed shared genetic control with type 1 diabetes.In addition, 15 loci were identified that influenced immune responses in the patients.Among those, 12 have never been reported to be involved in immune 
responses in healthy people, implying that these regions might only regulate the immune system of individuals with type 1 diabetes and other similar disorders.Finally, Chu, Janssen, Koenen et al. propose 11 genes within the identified loci as potential targets for new diabetes medication.These results represent an important resource for researchers exploring the genetic and immune basis of type 1 diabetes, and they could open new avenues for drug development.Many studies have highlighted the role of environmental, genetical, and immunological factors in the pathogenesis of T1D (Pociot and Lernmark, 2016;Rewers and Ludvigsson, 2016).Environmental factors such as being overweight, infections, microbiome composition, and dietary deficiencies have been reported as risk factors for T1D (Rewers and Ludvigsson, 2016).In turn, the immunological pathogenesis (Cabrera et al., 2016) of T1D includes innate inflammation and adaptive immunity, such as enhanced T cell responses (Hundhausen et al., 2016).In the last two decades, large genome-wide association studies (GWAS) performed have underscored the contribution of genetic polymorphisms to T1D for the susceptibility, with ~60 genomic loci associated with T1D risk identified (Barrett et al., 2009;Bradfield et al., 2011;Cooper et al., 2008;Grant et al., 2009;Huang et al., 2012;Onengut-Gumuscu et al., 2015;Ram et al., 2016).While these loci show significant enrichment in specific immune-related biological pathways, such as cytokine signaling and T cell activation (Barrett et al., 2009;Cooper et al., 2008), the functional consequences of many of these loci and genetic variants are still unknown.We thus lack information that could link the genetic susceptibility factors to the immunological pathways potentially important for T1D pathogenesis.The genetically regulated inflammatory response signature in T1D may also be relevant for the inflammatory response in general and may become modified by the chronic hyperglycemic state.The 
composition and activity of the human immune system is under genetic control, and people with certain changes in their genes are more susceptible than others to develop type 1 diabetes.Previous studies have identified around 60 locations in the human DNA (known as loci) associated with the condition, but it remains unclear how these loci influence the immune system and whether diabetes will emerge.Interrelationship between immune-cell counts and cytokine production in T1DWe collected blood samples from 243 T1D patients (300DM cohort), following a previously described methodology (Aguirre-Gamboa et al., 2016;Ter Horst et al., 2016;Li et al., 2016).The baseline characteristics of the 300DM and a cohort of healthy individuals (500FG) are shown in Supplementary file 1B.Their median age was 53.5 years (range 20-85), and they had a median diabetes duration of 28 years (range 1-71 years).Hence, the cohort generally consisted of middle-aged people with long-standing T1D.We measured 72 types of immune cells covering both lymphocytes and monocyte lineages and 10/6 (300DM/500FG) different cytokines released in response to stimulation with four types of human pathogens in both cohorts (Figure 1A).Background: The large inter-individual variability in immune-cell composition and function determines immune responses in general and susceptibility o immune-mediated diseases in particular.While much has been learned about the genetic variants relevant for type 1 diabetes (T1D), the pathophysiological mechanisms through which these variations exert their effects remain unknown.Methods: Blood samples were collected from 243 patients with T1D of Dutch descent.We applied genetic association analysis on >200 immune-cell traits and >100 cytokine production profiles in response to stimuli measured to identify genetic determinants of immune function, and compared the results obtained in T1D to healthy controls.Results: Genetic variants that determine susceptibility to T1D significantly 
affect T cell composition.Specifically, the CCR5+ regulatory T cells associate with T1D through the CCR region, suggesting a shared genetic regulation.Genome-wide quantitative trait loci (QTLs) mapping analysis of immune traits revealed 15 genetic loci that influence immune responses in T1D, including 12 that have never been reported in healthy population studies, implying a disease-specific genetic regulation.Conclusions: This study provides new insights into the genetic factors that affect immunological responses in T1D.Background: The large inter-individual variability in immune-cell composition and function determines immune responses in general and susceptibility o immune-mediated diseases in particular.While much has been learned about the genetic variants relevant for type 1 diabetes (T1D), the pathophysiological mechanisms through which these variations exert their effects remain unknown.Methods: Blood samples were collected from 243 patients with T1D of Dutch descent.We applied genetic association analysis on >200 immune-cell traits and >100 cytokine production profiles in response to stimuli measured to identify genetic determinants of immune function, and compared the results obtained in T1D to healthy controls.Results: Genetic variants that determine susceptibility to T1D significantly affect T cell composition.Specifically, the CCR5+ regulatory T cells associate with T1D through the CCR region, suggesting a shared genetic regulation.Genome-wide quantitative trait loci (QTLs) mapping analysis of immune traits revealed 15 genetic loci that influence immune responses in T1D, including 12 that have never been reported in healthy population studies, implying a disease-specific genetic regulation.Conclusions: This study provides new insights into the genetic factors that affect immunological responses in T1D.", + "Type 2 diabetes is characterized by the failure of the -cells to compensate for peripheral insulin resistance (6).Within the last decade, an 
increasing body of evidence has accumulated in favor of a putative role of immuno-related mechanisms and factors in the pathogenesis of type 2 diabetes, both with regard to the progressive β-cell failure and destruction and to the peripheral insulin resistance (2,3).", + "T1DM pathogenesis involves innate and adaptive immune activity (13) coupled with failures in central and peripheral tolerance mechanisms that enable expansion of disease-mediating autoreactive T cells (14). Other immune cells are also involved, including B cells, as evidenced by the development of autoantibodies that precede clinical onset in almost all patients (15). Chemokines and cytokines are involved in T1DM pathogenesis by influencing immune activity, impairing β-cell function, and inducing β-cell death (16,17).", + "If the pathogenesis of diabetes begins in very early life (perhaps even prenatally), then the immune status of the mother during pregnancy could be as relevant as the immune status of her diabetes-at-risk offspring. If so, then elucidating the genetic basis of Type I diabetes will also require analysis of maternal genotype and maternal-fetal genotype interactions. Very few studies of this nature have been conducted. Furthermore, if viral infection is involved in the initiation of the autoimmune process, then genetic differences between individuals in immune response towards viruses could alter their predisposition to Type I diabetes.", + "Figure 1. Genetic and environmental risk factors impact inflammation, autoimmunity, and metabolic stress. These states affect β-cell mass and/or function such that insulin levels are eventually unable to respond sufficiently to insulin demands, leading to hyperglycemia levels sufficient to diagnose diabetes. In some cases, genetic and environmental risk factors and gene-environment interactions can directly impact β-cell mass and/or function. Regardless of the pathophysiology of diabetes, chronic high blood glucose levels are associated with microvascular
and macrovascular complications that increase morbidity and mortality for people with diabetes. This model positions β-cell destruction and/or dysfunction as the necessary common factor to all forms of diabetes. Among the environmental associations linked to type 1 diabetes are enteroviral and other infections (51,52) and altered intestinal microbiome composition (53). The timing of exposure to foods including cereal (54) and nutrients such as gluten (55) may influence β-cell autoimmunity. Low serum concentrations of vitamin D have been linked to type 1 diabetes. Perinatal risk factors and toxic doses of nitrosamine compounds have been implicated in the genesis of diabetes.", + "In type 1 diabetes, the autoimmune destruction of β cells by the cellular and humoral immune system in the pancreatic islets of Langerhans leads to impaired insulin secretion and subsequently to hyperglycemia. This type of diabetes is characterized by the appearance of antigen-specific T cells and antibodies in peripheral blood which are directed against a variety of β-cell antigens including glutamic acid decarboxylase, tyrosine phosphatase IA-2, a zinc transporter and insulin. The onset of type 1 diabetes frequently occurs before 20 years of age, but disease manifestation is also common in adult patients. Exogenous administration of insulin is necessary to maintain glucose homeostasis and to prevent early and late diabetic complications [32,36]. In type 2 diabetes, comprising approximately 90% of the cases of diabetes mellitus, hyperglycemia is the consequence of a relative insulin deficiency and insulin resistance of various tissues including muscle and adipose tissue. While in early type 2 diabetes, insulin resistance and the resulting increased metabolic demand may be overcome by increased pancreatic insulin secretion, failure of β cells to maintain adequate insulin production and a decrease in β-cell mass are common in progressive disease, resulting in chronic hyperglycemia and loss of metabolic
control [33,37,38]. Hyperinsulinemia is associated with down-regulation of insulin receptors, thus further contributing to the exhaustion of insulin production in β cells [39]. Overweight and obesity are significant risk factors for type 2 diabetes, which is increasing as a consequence of the Western lifestyle. Hence, diabetes is expected to become an even greater health problem in the future deserving further attention [33,37].", + "Brief Genetics Report. Type 1 diabetes results from an immune-mediated destruction of insulin-producing β-cells in the pancreatic islets of Langerhans. The activation of autoreactive lymphocytes and the cytokine-induced apoptosis of pancreatic β-cells play a major role in the etiology of type 1 diabetes. 1,25-Dihydroxyvitamin D3 [1,25(OH)2D3] inhibits lymphocyte activation and affects other elements of the immune system, such as cytokine and immunoglobulin production, as well as major histocompatibility complex (MHC) class II and cluster of differentiation (CD)-4 expression (1). In NOD mice, the development of diabetes can be prevented by administration of 1,25(OH)2D3 (2), which inhibits lymphocyte activation and restores the altered ratio of CD4/CD8 cells.", + "Type 1 diabetes is an autoimmune disorder afflicting millions of people worldwide. Once diagnosed, patients require lifelong insulin treatment and can experience numerous disease-associated complications. The last decade has seen tremendous advances in elucidating the causes and treatment of the disease based on extensive research both in rodent models of spontaneous diabetes and in humans. Integrating these advances has led to the recognition that the balance between regulatory and effector T cells determines disease risk, timing of disease activation, and disease tempo. Here we describe current progress, the challenges ahead and the new interventions that are being tested to address the unmet need for preventative or curative therapies.", + "The immune system of some
genetically susceptible children can be triggered by certain environmental factors to produce islet autoantibodies (IA) against pancreatic β cells, which greatly increases their risk for Type-1 diabetes. An environmental factor under active investigation is the gut microbiome due to its important role in immune system education.", + "At clinical onset (stage 3), β-cell-targeted autoimmunity is likely to have occurred for a prolonged period, as indicated by the presence of CD4+ and CD8+ T cells, dendritic cells, macrophages and B cells in and around the islets of Langerhans in many, but not all, patients with newly diagnosed T1DM [2,104]. These data are based on observations from samples obtained at disease onset by fine-needle biopsy [105] or by high-risk minimal pancreatic tail resection [106], and they have confirmed previous data from pancreatic tissue samples from individuals who have succumbed to diabetic ketoacidosis (that is, acidosis due to the breakdown of lipids to ketones as an alternative source of glucose) [2,107,108]. In this setting, the inflammatory lesion does not affect all islets, and the insulitis process is patchy. Importantly, the volume or mass of islet cells producing glucagon, somatostatin or pancreatic polypeptide remains unaffected at the clinical onset of T1DM [2,104]. At present, there is no explanation of why the β cells and not the cells that produce glucagon, somatostatin or pancreatic polypeptide are attacked by the immune system. Separate autoantibodies that target human pancreatic cells producing glucagon and those that produce somatostatin have been found in some patients, but further studies of these potentially unique patients are needed [109]."
+ ], + [ + "In 2008, to increase the power of identifying variants with modest effects, a meta-analysis of three GWAS, including Diabetes Genetics Initiative (DGI), Finland-United States Investigation of NIDDM Genetics (FUSION), and Wellcome Trust Case Control Consortium (WTCCC), was conducted. This study detected at least six previously unknown loci that reached genome-wide significance for association with T2D (P < 5 × 10^-8), with the loci being JAZF1, CDC123-CAMK1D, TSPAN8-LGR5, THADA, ADAMTS9, and NOTCH2 [19]. Genetic variants in JAZF1, CDC123-CAMK1D, TSPAN8-LGR5, and THADA have been reported to affect pancreatic β-cell functions [59,60].", + ", for the Diabetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium. Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) [1-11]. Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and ~2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975. We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P = 5.0 × 10^-14), CDC123-CAMK1D (P = 1.2 × 10^-10), TSPAN8-LGR5 (P = 1.1 × 10^-9), THADA (P = 1.1 × 10^-9), ADAMTS9 (P = 1.2 × 10^-8) and NOTCH2 (P = 4.1 × 10^-8) gene regions. Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D.", + "Results from genome-wide association studies (GWAS) of type 1 diabetes (T1D) (Barrett et al., 2009), T2D (reviewed in Prokopenko et al., 2008), and related metabolic traits (Dupuis et al.,
2010; Ingelsson et al., 2010; Prokopenko et al., 2009) suggest that genetic variation in cis-regulatory elements may play an important role in β cell (dys)function and diabetes susceptibility (De Silva and Frayling, 2010). Of the 18 most strongly associated single-nucleotide polymorphisms (SNPs) in each of the T2D-associated loci, only 3 are missense variants; the remaining are noncoding (Prokopenko et al., 2008). Furthermore, there is evidence for allele-specific effects of two T2D-associated SNPs on the islet expression level of nearby genes (TCF7L2 [Lyssenko et al., 2007] and MTNR1B [Lyssenko et al., 2009]). However, the dearth of annotation of functional regulatory elements has limited the capacity to investigate the role of regulatory variation in complex diseases such as T2D.", + "Genetic studies of type 1 diabetes (T1D) have identified 50 susceptibility regions [1,2], finding major pathways contributing to risk [3], with some loci shared across immune disorders [4-6]. To make genetic comparisons across autoimmune disorders as informative as possible, a dense genotyping array, the Immunochip, was developed, from which we identified four new T1D-associated regions (P < 5 × 10^-8). A comparative analysis with 15 immune diseases showed that T1D is more similar genetically to other autoantibody-positive diseases, significantly most similar to juvenile idiopathic arthritis and significantly least similar to ulcerative colitis, and provided support for three additional new T1D risk loci. Using a Bayesian approach, we defined credible sets for the T1D-associated SNPs. The associated SNPs localized to enhancer sequences active in thymus, T and B cells, and CD34+ stem cells. Enhancer-promoter interactions can now be analyzed in these cell types to identify which particular genes and regulatory sequences are causal. T1D results from the autoimmune destruction of pancreatic β cells, leading to absolute dependence on exogenous insulin to regulate blood glucose levels [7]. In the present study, we
designed and used the Immunochip, a custom Illumina Infinium high-density genotyping array, to (i) identify additional risk loci for T1D, (ii) refine mapping of T1D risk loci to their sets of most associated credible SNPs in order to (iii) analyze the locations of the credible SNPs with respect to regulatory sequences in tissues and cell types, and (iv) assemble summary genome-wide association study (GWAS) and Immunochip results from multiple immune diseases to allow comparisons of the genetic risk profiles of these diseases. The T1D SNP and indel content selected for inclusion on the Immunochip was chosen on the basis of the 41 T1D-associated regions known at the time (February 2010) [1] and 3,000 'wildcard' SNPs that tagged candidate genes or other SNPs with suggestive evidence of association (5 × 10^-8 < P < 1 × 10^-5) from GWAS of T1D. In parallel, we collected and curated all available association results for immune diseases for which the Immunochip was designed. To allow efficient comparison and downstream analysis by the research community, we created a publicly available, integrated, web-based portal (ImmunoBase; see URLs) containing complete association summary statistics that are available for querying, browsing or bulk download.", + "Impact of T1D GWAS SNPs on immune phenotypes in T1D patients. Considering that T1D is a multifactorial disease with a genetic component, we tested whether the known risk variants of T1D affect immune phenotypes and function. We first checked SNPs within the HLA locus in our association studies on cell proportion and cytokine production level. Consistent with our previous findings in 500FG, we did not observe any significant associations of HLA allelic variants in 300DM. We then acquired non-HLA genetic loci from published GWAS of European background from the GWAS Catalog (November 2019) (Buniello et al., 2019). Among these, genetic variants in 63 independent T1D loci were present in our data, and we found that 13 of these 63
were indeed associated with susceptibility to T1D with nominal significance (p-value < 0.05) (Supplementary file 1C).Figure 2. Impact of type 1 diabetes (T1D) genome-wide association studies (GWAS) single-nucleotide polymorphisms (SNPs) on immune phenotypes. (A) Quantile-quantile (Q-Q) plots of quantitative trait locus (QTL) profiles of 62 T1D GWAS loci grouped by cell populations.The distribution of p-values of associations with T cells traits (blue) shows a significant deviation from an expected uniform distribution (dashed line). (B) Histogram showing number of associations observed (red line) and those in permutations (blue bars). (C) Heatmap of QTL profiles of cell proportion carrying certain chemokine receptors across 62 T1D GWAS loci, colored by log10(p-values) and effect direction of the T1D risk allele.Arrowhead indicates a T1D risk allele rs11574435-T.The online version of this article includes the following figure supplement(s) for figure 2: Figure supplement 1. Qqplots of QTL profiles of 62 T1D GWAS loci grouped by cytokine types.We next investigated whether these genetic risk loci for T1D affect immune parameters and function.The quantile-quantile plot of the association of the 63 T1D GWAS loci with different cell types and cytokines illustrates an inflated deviation from an expected uniform distribution (Figure 2A, Figure 2-figure supplement 1).We further tested whether this deviation can be explained by chance by comparing the association of immune traits with T1D GWAS SNPs with that of 1000 randomly selected independent SNPs (Figure 2B, Materials and methods).The p-value shows that the T1D GWAS SNPs are enriched in association with T cell traits in the T1D cohort (p-value = 0.007).", + "Table 1Polymorphisms in the human genome associated with type 1 diabetes (Adapted from (Ram et al., 2016b)).The genetic polymorphism data (i.e.SNPs) has been associated with T1D using genome-wide association studies and meta-analyses (references as noted).SNP, single 
nucleotide polymorphism.", + "Recent large genome-wide association studies (GWAS) have identified multiple loci which harbor genetic variants associated with type 2 diabetes mellitus (T2D), many of which encode proteins not previously suspected to be involved in the pathogenesis of T2D. Most GWAS for T2D have focused on populations of European descent, and GWAS conducted in other populations with different ancestry offer a unique opportunity to study the genetic architecture of T2D. We performed genome-wide association scans for T2D in 3,955 Chinese (2,010 cases, 1,945 controls), 2,034 Malays (794 cases, 1,240 controls), and 2,146 Asian Indians (977 cases, 1,169 controls). In addition to the search for novel variants implicated in T2D, these multi-ethnic cohorts serve to assess the transferability and relevance of the previous findings from European descent populations in the three major ethnic populations of Asia, comprising half of the world's population.
Of the SNPs associated with T2D in previous GWAS, only variants at CDKAL1 and HHEX/IDE/KIF11 showed the strongest association with T2D in the meta-analysis including all three ethnic groups. However, consistent direction of effect was observed for many of the other SNPs in our study and in those carried out in European populations. Close examination of the associations at both the CDKAL1 and HHEX/IDE/KIF11 loci provided some evidence of locus and allelic heterogeneity in relation to the associations with T2D. We also detected variation in linkage disequilibrium between populations for most of these loci that have been previously identified. These factors, combined with limited statistical power, may contribute to the failure to detect associations across populations of diverse ethnicity. These findings highlight the value of surveying across diverse racial/ethnic groups towards the fine-mapping efforts for the causal variants and also of the search for variants, which may be population-specific.", + "The T1DGC, using the same samples as in the MHC and candidate gene
investigations, reevaluated 382 SNPs from 21 recently reported candidate genes, assembling nearly 4,000 ASP families and fully characterizing (through tagging SNPs and reported variants) the genetic contributions to type 1 diabetes risk.These results suggest that, aside from the MHC, 11p15 (INS), 2q33 (CTLA and other genes), 10p15.1 (IL2RA), and 1p13 (PTPN22), few of these published candidate genes can be replicated.In addition, a total of 1,715 SNPs were selected from the Wellcome Trust Case Control Consortium (WTCCC) GWA study of type 1 diabetes, and 581 SNPs were selected that exhibited association with autoimmune disease and type 2 diabetes loci (45,46).These studies confirmed established loci (above) (47,48) and suggested additional risk conferred by loci on chromosomes 5q31 (TCF7 [P19T], transcription factor 7, T-cell specific, HMG-box), 18q12 (FHOD3, formin homology two domain containing 3), and Xp22 (TLR8/ TLR7 toll-like receptor 8/toll-like receptor 7).Type 1 diabetes has many susceptibility loci and therefore pathways in common with autoimmune diseases.With the recent exception of GLIS3 (49), no genetic overlap was found between type 1 diabetes and type 2 diabetes loci (45,46,50).The dataset established by the T1DGC from its Candidate Gene Workshops is available from the NIDDK Central Repository.Genome-wide linkage.A number of genome-wide scans for linkage to type 1 diabetes have been reported (4,(51)(52)(53)(54)(55).All these studies consistently demonstrated linkage of type 1 diabetes to the MHC and specifically to the HLA genes on human chromosome 6p21.3.Additional regions with evidence of linkage have been identified, but many of these regions have not been reproduced in independent studies.", + "The latest and largest meta-analyses for T1D [4] and T1D diagnosis age [9] have been performed with variants from the ImmunoChip, a large scale but targeted genotyping platform which covers only loci previously associated with immunological diseases.We now 
took a genome-wide approach by performing a large genome-wide association study (GWAS) meta-analysis in 12,539 individuals with T1D from the Finnish Diabetic Nephropathy (Finn-Diane) Study, the UK Genetic Resource Investigating Diabetes (UK GRID), and Sardinia cohorts.Our aim was to identify variants affecting T1D diagnosis age and thereafter, utilizing the genome-wide coverage of our analysis, we aimed to link the variants to open chromatin indicating active gene expression in different cell types and finally, we performed transcriptome-wide association analyses in disease-relevant tissues.", + "Genome-wide association studies (GWAS) have identified >100 independent SNPs that modulate the risk of type 2 diabetes (T2D) and related traits.However, the pathogenic mechanisms of most of these SNPs remain elusive.Here, we examined genomic, epigenomic, and transcriptomic profiles in human pancreatic islets to understand the links between genetic variation, chromatin landscape, and gene expression in the context of T2D.We first integrated genome and transcriptome variation across 112 islet samples to produce dense cis-expression quantitative trait loci (cis-eQTL) maps.Additional integration with chromatin-state maps for islets and other diverse tissue types revealed that cis-eQTLs for islet-specific genes are specifically and significantly enriched in islet stretch enhancers.High-resolution chromatin accessibility profiling using assay for transposase-accessible chromatin sequencing (ATACseq) in two islet samples enabled us to identify specific transcription factor (TF) footprints embedded in active regulatory elements, which are highly enriched for islet cis-eQTL.Aggregate allelic bias signatures in TF footprints enabled us de novo to reconstruct TF binding affinities genetically, which support the high-quality nature of the TF footprint predictions.Interestingly, we found that T2D GWAS loci were strikingly and specifically enriched in islet Regulatory Factor X (RFX) 
footprints.Remarkably, within and across independent loci, T2D risk alleles that overlap with RFX footprints uniformly disrupt the RFX motifs at high-information content positions.Together, these results suggest that common regulatory variations have shaped islet TF footprints and the transcriptome and that a confluent RFX regulatory grammar plays a significant role in the genetic component of T2D predisposition.", + "Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge.Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs.We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls.We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk.This brings the number of T2D loci now confidently identified to at least 10.", + "A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants Laura J. Scott, 1 Karen L. Mohlke, 2 Lori L. Bonnycastle, 3 Cristen J. Willer, 1 Yun Li, 1 William L. Duren, 1 Michael R. Erdos, 3 Heather M. Stringham, 1 Peter S. Chines, 3 Anne U. Jackson, 1 Ludmila Prokunina-Olsson, 3 Chia-Jen Ding, 1 Amy J. Swift, 3 Narisu Narisu, 3 Tianle Hu, 1 Randall Pruim, 4 Rui Xiao, 1 Xiao-Yi Li, 1 Karen N. Conneely, 1 Nancy L. Riebow, 3 Andrew G. Sprau, 3 Maurine Tong, 3 Peggy P. 
White, 1 Kurt N. Hetrick, 5 Michael W. Barnhart, 5 Craig W. Bark, 5 Janet L. Goldstein, 5 Lee Watkins, 5 Fang Xiang, 1 Jouko Saramies, 6 Thomas A. Buchanan, 7 Richard M. Watanabe, 8,9 Timo T. Valle, 10 Leena Kinnunen, 10,11 Gonalo R. Abecasis, 1 Elizabeth W. Pugh, 5 Kimberly F. Doheny, 5 Richard N. Bergman, 9 Jaakko Tuomilehto, 10,11,12 Francis S. Collins, 3 * Michael Boehnke 1 * Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge.Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs.We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls.We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk.This brings the number of T2D loci now confidently identified to at least 10.", + "A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants Laura J. Scott, 1 Karen L. Mohlke, 2 Lori L. Bonnycastle, 3 Cristen J. Willer, 1 Yun Li, 1 William L. Duren, 1 Michael R. Erdos, 3 Heather M. Stringham, 1 Peter S. Chines, 3 Anne U. Jackson, 1 Ludmila Prokunina-Olsson, 3 Chia-Jen Ding, 1 Amy J. Swift, 3 Narisu Narisu, 3 Tianle Hu, 1 Randall Pruim, 4 Rui Xiao, 1 Xiao-Yi Li, 1 Karen N. Conneely, 1 Nancy L. Riebow, 3 Andrew G. Sprau, 3 Maurine Tong, 3 Peggy P. White, 1 Kurt N. Hetrick, 5 Michael W. 
Barnhart, 5 Craig W. Bark, 5 Janet L. Goldstein, 5 Lee Watkins, 5 Fang Xiang, 1 Jouko Saramies, 6 Thomas A. Buchanan, 7 Richard M. Watanabe, 8,9 Timo T. Valle, 10 Leena Kinnunen, 10,11 Gonalo R. Abecasis, 1 Elizabeth W. Pugh, 5 Kimberly F. Doheny, 5 Richard N. Bergman, 9 Jaakko Tuomilehto, 10,11,12 Francis S. Collins, 3 * Michael Boehnke 1 * Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge.Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs.We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls.We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk.This brings the number of T2D loci now confidently identified to at least 10.", + "GWAS-Identified Variants in Protein-Coding RegionsGWAS-identified variants associated with T2D risk include single nucleotide polymorphisms (SNP), deletions, insertions and short sequence repeats (6,92).Although the majority of the variants reside in intergenic or intragenic regions, a few (less than 5%) are in protein-coding regions.As potential drug targets, these variant-containing genes have been subjected to investigation in b cells in recent years (5) using cellular and mouse knockout systems, as described in the examples below:", + "A systematic search for the variants associatedwith 
Type 2 diabetes mellitus, a common complex disease, was recently done by testing 392,935 single-nucleotide polymorphisms in a French case-control cohort (13). They used Illumina Infinium Human1 BeadArrays, which assay 109,365 SNPs chosen using a gene-centred design; and HumanHap300 BeadArrays, which assay 317,503 SNPs chosen to tag haplotype blocks identified by the Phase I HapMap. There were 59 SNPs, showing significant association with the disease in genome-wide study, which were tested on a larger cohort using the Sequenom iPlex assay. They identified four SNPs containing variants that confer type 2 diabetes risk. These loci include a nonsynonymous polymorphism in the zinc transporter SLC30A8, which is expressed exclusively in insulin-producing β-cells, and two linkage disequilibrium blocks that contain genes potentially involved in β-cell development or function (IDE-KIF11-HHEX and EXT2-ALX4). Even when genome-wide studies are possible, there are statistical difficulties arising due to multiple hypotheses testing. A good review of this issue and possible solutions are presented in (14). 3.2.3. Pool-Based Genome-Wide Association Studies. Genotyping of individual samples for genome-wide association (GWA) studies may be cost-prohibitive.", + "Association of genetic variants in genes encoding T2D and obesity drug targets. The study design consisted of initial discovery of variants with suggestive associations to targeted genotyping and in silico follow-up analyses (Fig. 1). We investigated the association of 121 variants in six genes encoding therapeutic targets in use or in development for T2D or obesity (CNR2, DPP4, GLP1R, SLC5A1, HTR2C, and MCHR1), drawn from a recent targeted exome sequencing study of 202 genes encoding drug targets (8), with variation in the following traits: T2D, obesity, body mass index (BMI), waist circumference, fasting glucose, fasting insulin, and 2-hour glucose (Fig.
1). In the \"discovery analysis,\" we identified seven variants potentially associated with T2D- or obesity-related traits (where P < 0.001 or which were in a target of interest to GSK and P < 0.05) (Table 1). For these seven variants, \"follow-up analysis\" was performed by targeted genotyping in up to 39,979 additional individuals of European ancestry. Where possible, in silico follow-up analysis was performed for traits and variants available in large-scale genetic consortia data." + ], + [ + "Elucidate the pathogenesis linking obesity and type 2 diabetes. A better understanding of mechanisms linking obesity, insulin resistance, and type 2 diabetes may ultimately facilitate more individualized treatment. One future research priority is to clarify how identified gene variants affect glucose, fatty acid, and energy metabolism at both cellular and whole-body levels. Rather than searching for a single factor or theory explaining the predisposition to β-cell decompensation in obese individuals, a multifactorial, synergistic explanation seems more compatible with current knowledge. Multiple mechanisms may link β-cell dysfunction to systemic insulin resistance, including differing cellular responses to nutrient excess and impaired brain neurocircuits governing energy homeostasis. One way to approach this complex pathophysiology is to examine glucose-tolerant obese patients and study the association with and progression to β-cell decompensation.", + "The framework described in this paper is aimed to address two key questions: (1) Can biological processes be identified that are consistently deregulated in different models of insulin resistance and diabetes and that may be manifested in a tissue-dependent or independent manner?
(2) On a higher level, can tissue or condition-specific interaction networks be identified that more precisely characterize different insulinresistance models and suggest causal mechanisms?Author SummaryType 2 diabetes mellitus currently affects millions of people.It is clinically characterized by insulin resistance in addition to an impaired glucose response and associated with numerous complications including heart disease, stroke, neuropathy, and kidney failure, among others.Accurate identification of the underlying molecular mechanisms of the disease or its complications is an important research problem that could lead to novel diagnostics and therapy.The main challenge stems from the fact that insulin resistance is a complex disorder and affects a multitude of biological processes, metabolic networks, and signaling pathways.In this report, the authors develop a network-based methodology that appears to be more sensitive than previous approaches in detecting deregulated molecular processes in a disease state.The methodology revealed that both insulin signaling and nuclear receptor networks are consistently and differentially expressed in many models of insulin resistance.The positive results suggest such network-based diagnostic technologies hold promise as potentially useful clinical and research tools in the future.affected in the disease state. (3) Evaluate the hypothesis that genes in a given gene set are observed in a higher proportion (i.e., enriched) than expected by chance in the HSN and repeat for each gene set in the assembly.Repeat (2) and (3) for every insulin resistant or diabetic condition compared to normal in the dataset. (4) Order the gene sets of interest based on the number of different HSNs where they appear enriched. 
(5) For each gene set, assign a p-value to the number of conditions where it is enriched. The gene sets with a significant p-value are taken as transcriptionally affected across a broad set of diabetes-related models. Consistent with the stated goal of GNEA, gene sets enriched in a few conditions, while potentially interesting in their own right, will not generally be assigned a significant p-value (Figure 1).", + "of Biochemistry, Biostatistics & Medical Informatics, University of Wisconsin, Madison, WI; Rosetta Inpharmatics, Seattle, WA; KineMed, Emeryville, CA; Dept Nutritional Sciences & Toxicology, University of California, Berkeley, CA, USA. Insulin resistance is necessary but not sufficient for the development of type 2 diabetes. Diabetes results when pancreatic β-cells fail to compensate for insulin resistance by increasing insulin production through an expansion of β-cell mass or increased insulin secretion. Communication between insulin target tissues and β-cells may initiate this compensatory response. Correlated changes in gene expression between tissues can provide evidence for such intercellular communication.", + "The origin of chronic inflammatory processes observed in metabolic disorders is still a matter of debate.9 The recent obesity epidemic is a driving force for the worldwide increasing incidence of type 2 diabetes (T2D) as more than 80% of patients with T2D are overweight. Obesity-induced insulin resistance is the dominant underlying pathophysiological factor.10 As insulin resistance and metabolic inflammation are frequently observed in parallel, research in the past decade has tried to connect these two phenomena. It is widely accepted that the aetiology of insulin resistance is complex and involves various pathways.11 It is, however, also increasingly established that inflammatory pathways are critically involved in the evolution of insulin resistance. 
12Overnutrition and certain diets could represent major starting points as they might alter the gut microbiota, lead to changes in lipid metabolism, hepatic steatosis and finally systemic inflammation. 13 14It remains, however, unclear at which sites inflammatory processes are initiated and the GI tract with its significantly altered microbiota could reflect one of the early events in these disorders.", + "Type 2 diabetes mellitus (T2D) is a common complex disease whose pathogenic mechanisms are known to a considerable extent [8,9].Several organs including pancreatic islets, liver, skeletal muscle, adipose tissues, gut, hypothalamus and the immune system play a role in its pathogenesis [10].Numerous multifactorial mechanisms that include genetic and environmental factors related to obesity are involved in the development of insulin resistance and impaired insulin secretion [8,9].Insulin resistance is associated with inactivity, obesity and ageing [8].The insulin secreting pancreatic islet b cells respond to insulin resistance by enhancing their mass and metabolic function.T2D however develops when increase in insulin secretion by b cells is not able to keep pace with the increase in insulin resistance [8,11].The latter thus characterizes both prediabetic condition and T2D.Prediabetic insulin resistance state however does not always lead to diabetes; enhanced secretion of insulin by b cells compensates for deficient insulin action in a considerable proportion of prediabetic individuals who do not develop T2D.Though the inability of b cells to secrete enough insulin primarily typifies T2D, the dysfunction can also be demonstrated in normoglycemic subjects [12].Therefore, derangements in both insulin secretion and Figure 1.Schematic representation of the workflow.T2D GWAS genes do not directly relate (indicated by 'X' on the left side) to pathways associated with disease pathophysiology.Conspicuously, effect of identified risk variants on continuous glycemic measures 
in nondiabetic subjects chiefly explains only perturbation of insulin secretion, not insulin resistance. Further, the genes found as associated with the disease do not clearly relate to processes and pathways consistent with the known aspects of T2D pathophysiology. The main aim of the present study was to ask the question (indicated by '?' on the right side) if GWAS data when considered in conjunction with interactome, toxicogenome and disease transcriptome data reveal genome to phenome correlation in T2D. Data available in public domain for GWAS, interactome and toxicogenome was used in the analysis. For disease transcriptome, new experimental data was generated. We specifically examined if interaction network of genes reported in T2D GWAS, genes showing altered expression after treatment with various antidiabetic drugs, and genes that are differentially expressed in insulin responsive tissues in male and female T2D patients do converge on insulin secretion, insulin resistance and other T2D associated pathophysiological pathways. doi:10.1371/journal.pone.0053522.g001", + "This underlying β-cell decompensation manifests clinically as elevated fasting and postprandial blood glucose levels, diagnostic criteria for diabetes [4,5]. In humans, diabetes is often correlated with obesity, leading to a long-standing hypothesis that insulin resistance is a consequence of overnutrition and elevated dietary fatty acids [6]. 
Chronic metabolic overload has a detrimental effect on whole body metabolism, and there is increasing evidence that the liver and adipose play a causal role to drive this metabolic disequilibrium (Figure 1).", + "Increasing evidence from more recent studies also suggested that inflammatory processes may have a pivotal role in metabolic diseases: prospective studies have shown that high plasma interleukin 6 (IL-6) levels increased T2DM risk [116], but conflicting associations were found between a promoter polymorphism (G-174C) in IL6 and T2DM [117,118]. In a large joint analysis of 21 case-control studies, representing >20 000 participants in one of the largest association studies addressing the role of a candidate gene in T2DM susceptibility, the IL6 promoter variant was found to be associated with a lower risk (OR 0.91, P = 0.037) [119]. In addition, association between T2DM and IL6R-D358A was reported in Danish white people [120], and with TNF G-308A promoter SNP in the Finnish Diabetes Prevention Study [118]. The effects of both IL6 and IL6R variants on developing T2DM risk in interaction with age have been reported in a prospective study of a general French population [46].", + "In the long term, these new approaches should identify additional genes and metabolic markers; profiles obtained through these assessments could provide the level of detail needed to establish the mediator (or mediators) of the feedback loop that interconnects cells with insulin-sensitive tissues, and help to unravel the heterogeneity of the disease. Furthermore, these assessments should complement and advance present understanding of the best approaches to treat the dysregulated metabolic milieu in type 2 diabetes, which includes not only glucose but also fatty acids and amino acids. Glucose metabolism is normally regulated by a feedback loop including islet cells and insulin-sensitive tissues, in which tissue sensitivity to insulin affects magnitude of β-cell response. If 
insulin resistance is present, cells maintain normal glucose tolerance by increasing insulin output. Only when cells cannot release sufficient insulin in the presence of insulin resistance do glucose concentrations rise. Although β-cell dysfunction has a clear genetic component, environmental changes play an essential part. Modern research approaches have helped to establish the important role that hexoses, amino acids, and fatty acids have in insulin resistance and β-cell dysfunction, and the potential role of changes in the microbiome. Several new approaches for treatment have been developed, but more effective therapies to slow progressive loss of β-cell function are needed. Recent findings from clinical trials provide important information about methods to prevent and treat type 2 diabetes and some of the adverse effects of these interventions. However, additional long-term studies of drugs and bariatric surgery are needed to identify new ways to prevent and treat type 2 diabetes and thereby reduce the harmful effects of this disease. The epidemic of type 2 diabetes. The worldwide explosion of obesity has resulted in an ever-increasing prevalence of type 2 diabetes, a noncommunicable disease that affects more than 370 million people. 
1 Without concerted efforts to address the pathogenesis and treatment of this syndrome, the harmful macrovascular and microvascular outcomes of type 2 diabetes will remain a major burden for decades to come. In this Review we examine aspects of the pathogenesis and treatment of type 2 diabetes, and discuss future needs if the most damaging result of obesity is to be reversed.", + "Metabolomics. A metabolomics approach has been applied to diabetes in several population-based studies in recent years, summarized in [68]. Metabolomics profiling was previously performed typically in a small subset of large populations, and the number of metabolites was limited. In recent studies MR analysis has been combined in metabolomics in order to claim causality of 
the metabolites found to be associated with the risk of diabetes. Nowak and collaborators investigated the effects of insulin resistance and insulin secretion on fatty acid levels [69]. The original cohort included 910 elderly men (ULSAM cohort). Insulin sensitivity was determined with gold standard measurement, the hyperinsulinemic euglycemic clamp, and beta-cell function with a Disposition Index during an oral glucose tolerance test. A total of 192 metabolites were measured using untargeted plasma metabolomics by liquid chromatography/mass spectrometry. MR analysis was based on two separate cohorts (PIVUS and TwinGene, n = 2,613) followed by replication in three independent studies profiled on different metabolomics platforms (KORA/TwinsUK, n = 7,824; CHARGE consortium, n = 8,961; and Finnish consortium, n = 8,330). In the observational part of the study the authors reported that bile acid, glycerophospholipid and caffeine metabolism were associated with insulin resistance, and fatty acids biosynthesis markers with impaired insulin secretion. In MR analysis the authors discovered and replicated causal effects of insulin resistance on lower levels of monounsaturated fatty acids, palmitoleic acid and oleic acid. Beta-cell function did not have causal effects on any metabolites measured. The limitation of this study is a relatively small size of the ULSAM cohort, and the limited number of metabolites measured.", + "Our understanding of the pathophysiology of T2DM has been aided by the discovery of novel disease biomarkers. High blood concentrations of pro-inflammatory cytokines, such as C-reactive protein, interleukin-6 (IL-6) and tumour necrosis factor (TNF), are associated with an increased risk of T2DM 30, whereas a high concentration of adiponectin, which has anti-inflammatory effects, is associated with a reduced risk 31. Lower levels of sex hormone-binding globulin are associated with increased risk 32, as are higher blood concentrations of branched-chain and aromatic amino 
acids 33. Gut flora metabolites might predict future risk of T2DM because the gut microbiota is involved in energy extraction from the diet, modification of host gene expression, and increasing metabolic endotoxaemia (the level of endotoxins in blood) and chronic inflammation 34.", + "Several lines of evidence suggest that T2D is an inflammatory disease (Donath and Shoelson 2011). Recent results from clinical trials with anti-inflammatory drugs have supported this hypothesis, and immunomodulatory strategies for the treatment of T2D to lower blood glucose levels in patients have been proposed (Barry et al. 2016). Cellular oxidative stress is known as one of the leading causes of insulin resistance and islet β-cell dysfunction in T2D (Evans et al. 2003) by inducing an inflammatory response.", + "In this mini-review, we discuss this question in the context of recent advances in the understanding of the physiology of glucose metabolism in order to determine whether the classical understanding of T2DM pathophysiology should be revised and more focus placed on the b-cell in the development of therapies for T2DM. In particular, we consider the extent to which the difficulty in identifying insulin resistance genes to date reflects limitations of study design, inadequate physiological assessment of insulin resistance or the complex underlying pathophysiology of insulin resistance (i.e. multiple parallel compensatory pathways). Conclusion. We would propose that it is highly probable that more insulin resistance than b-cell dysfunction T2DM susceptibility genes remain undiscovered at the present time, most likely due to problems associated with study design and the complex nature of physiological responses to nutrients and insulin. In addition, it must be understood that even with 38 genes identified relevant to T2DM pathophysiology, the risk conferred by these combined genes accounts for only a small proportion of overall risk. It must be remembered that the rapid changes in T2DM 
incidence and prevalence observed in recent decades are a result of the interaction of a stable genetic background with a rapidlychanging environment.Future intervention at newly-discovered insulin secretion controlling loci should improve b-cell function allowing a more robust defence against environmental insult.Targeting oxidative stress, metabolic stress and low grade inflammation may provide fruitful avenues.However, novel therapeutic approaches, whether pharmacological or nonpharmacological, which can target the effects of diet-induced obesity on tissue-specific insulin resistance in the early pathogenesis of T2DM remain a central and invaluable goal of research aiming to halt the rapidly-increasing prevalence of T2DM and its complications worldwide.", + "| INTRODUCTIONChronic low-grade inflammation and activation of the innate immune system are associated with insulin resistance, and -cell dysfunction in type 2 diabetes mellitus (T2DM) (Ehses, Perren, Eppler, Ribaux, & Pospisilik, 2007;Pickup, 2004).Recent studies have reported that the infiltration of the macrophages to pancreatic islets accelerates the -cell dysfunction.These macrophages secrete chemokines and stimulate the immune cell migration, as well as release of pro-inflammatory cytokines.In addition, the elevated glucose and palmitate concentrations increase chemokines release that induce granulocyte colony-stimulating factor and macrophage inflammatory protein-1 from human and mouse pancreatic islets both in vitro and in vivo (Ehses et al., 2007;Inoue et al., 2018).", + "To date, systematic review of the effects of disease risk variants on processes contributing to the diabetic state has mostly been restricted to the examination of basal indices of b-cell (BC) function or insulin sensitivity (2,3).These studies have demonstrated that most, but not all, of these loci exert their primary effects on disease risk through deficient insulin secretion rather than insulin resistance (IR) (2,(4)(5)(6).", + 
"The role for pro-inflammatory cytokines in regulating insulin action and glucose homeostasis and their function in T2DM has been suggested by several lines of evidence.Obesity, T2DM, and inflammation: Molecular mechanism(s) of associationIn obese people, insulin resistance is linked to the increased release of adipocyte-derived bioactive metabolites (ADBMs) such as lipids, free fatty acids, monocyte chemoattractant protein-1 (MCP-1), and pro-inflammatory cytokines. 30It should be emphasized, however, that although obesity is viewed as a predisposing factor to insulin resistance, other factors may also contribute.A study of young, insulin-resistant, lean offspring of patients with T2DM and insulin-sensitive controls of similar body mass index (BMI) showed similar plasma concentrations of TNF-, IL-6, and adiponectin between the insulin-resistant and insulin-sensitive groups. 34his suggests that in lean people, systemic inflammation may not play a significant role in the development of insulin resistance.In this case, proposed mechanisms for insulin resistance might then be attributed to a dysregulation of intramyocellular fatty acid metabolism. 14In the liver this would also include an altered expression of transcription factor 6- (ATF6) which controls expression of gluconeogenic genes. 35enetic predisposition also may contribute to the development of T2DM.Genome-wide association (GWA) and candidate gene studies over the past few years have so far uncovered 19 genes associated with T2DM. 36The disease-related genetic variants identified have high frequencies in the populations assessed although their individual contributions to increases in risk of T2DM are modest.Ongoing GWAs that target lowfrequency genetic variants and assess copy number variants (CNVs) in addition to single nucleotide polymorphisms (SNPs) are likely to identify additional loci associated with T2DM risk, and some of these may play a significant role in the risk of disease development. 
36In lean subjects with T2DM, the dysregulation of fatty acid metabolism, the abnormal expression of gluconeogenic genes and the genetic predisposition necessitate the development of an additional set of biomarkers that target this subpopulation and relate to these risk factors." + ], + [ + "Key points Genome-wide association studies (GWAS) have identified >400 signals associated with the risk of type 2 diabetes mellitus (T2DM). The pancreatic islet has been identified as a key tissue involved in mediating GWAS signals in T2DM risk. Integrating genetic, epigenomic and cellular data can unlock the biology behind GWAS signals.", + "DISCUSSIONGenome-wide linkage scans aimed at identifying QTLs for type 2 diabetes and its associated traits are accumulating.However, findings seldom replicate across studies.Because type 2 diabetes represents a complex disorder with substantial clinical and genetic heterogeneity, efforts to define and identify genetically homogeneous subsamples", + "DiscussionThe present study applied a high-throughput functional genomics approach to identify the associations between genetic factors and inflammatory phenotype in patients with T1D.The results confirm a correlation between baseline immune-cell populations and ex vivo cytokine production in response to bacterial, fungal, non-microbial, and TLR ligand stimulations.We provide evidence for a direct link between T1D GWAS loci and immune functionality, particularly through circulating T cell subpopulations.We show that T cell alteration is largely driven by T1D genetics, while B cells do not show a significant association with T1D GWAS loci.The association between the proportion of CCR5+ Tregs and T1D susceptibility through CCR genes suggests that T1D-associated genetic variants contribute to alteration of immune function through a cumulative effect.Finally, out of 28 genome-wide significant genetic loci regulating immune-cell proportions and cytokine production, we identified 12 immune phenotype 
QTLs specific to 300DM. We also found 11 druggable genes as candidates for therapeutic intervention. Altogether, this study provides several novel insights into the genetic variability of immune traits in T1D. In the present study we aimed to comprehensively describe the immunopathological consequences of the genetic variants linked to T1D susceptibility, using a high-throughput functional genomics approach. As a part of the Human Functional Genomics Project (HFGP) (Netea et al., 2016), we carried out deep immunophenotyping in peripheral blood samples from a cohort of 243 T1D patients (300DM) using cell subpopulation composition and cytokine production upon stimulations as proxies of immunological function. Part of the results were then compared to those obtained in a population-based cohort of 500 healthy individuals (500FG) that successfully characterized the impact of genetic factors (Aguirre-Gamboa et al., 2016; Li et al., 2016) on immune responses in healthy individuals. Here, we systematically evaluate the genetic regulation of the immune phenotypes in T1D and show how genetic variations affect immune-cell traits and cytokine production in response to stimulations. In total, we identify 15 genome-wide significant genomic loci (p-value < 5 × 10^-8) associated with immune phenotypes in the 300DM cohort, including 12 novel loci that have never been reported in any healthy population study. These data provide a deeper understanding of the immune mechanisms involved in the pathophysiology of T1D and affecting the general inflammatory response and may open avenues toward the development of novel diagnostics and potentially immunotherapies.", + "These GWA studies, as well as detecting new loci, provided the first 'genome-wide' perspective of the landscape of T2D susceptibility and thereby enabled clearer 'bench-marking' of other claimed T2D-susceptibility effects for which the accumulated evidence from candidate-gene studies remained somewhat equivocal [40]. Examples include 
variants in the genes encoding calpain-10 (CAPN10; thought to be involved in b-cell function), insulin (INS; an obvious candidate) and PC-1 (ENPP1; the product of which is known to modulate insulin-receptor function). None of these genes has featured prominently in GWA analyses to date and, although this does not necessarily exclude a contribution to T2D predisposition, it indicates that the main effects attributable to these variants are small and/or subject to substantial modification by genetic background or environmental exposures. Either way, it seems likely that exorbitantly large sample sets will be required before such signals can attain the standard of proof now available for the loci described in Table 1.", + "Genome-wide association studies (GWAS) have made a significant contribution to our current knowledge of the role(s) of genetic variation in population-level susceptibility to T1D (Mychaleckyj et al., 2010).", + "Introduction. Genome-wide association studies (GWAS) have identified approximately 80 loci robustly associated with predisposition to type 2 diabetes (T2D) [1-3] and a further 70 influencing a range of continuous glycemic traits [4-10] in non-diabetic subjects. There is substantial, though far from complete, overlap between these two sets of loci. Physiological studies in non-diabetic individuals indicate that most of these loci primarily influence insulin secretion rather than insulin sensitivity, highlighting a key role for the pancreatic islets of Langerhans in the mechanistic underpinnings of these association signals [11,12]. These findings have motivated efforts to catalogue the epigenomic and transcriptional landscape of human islets and to apply these findings to deliver biological insights into disease pathogenesis. Recently, it has been shown, for example, that GWAS signals for T2D and fasting glucose show significant co-localization with islet enhancers [13,14].", + "It has proven to be challenging to isolate the 
genes underlying the genetic components conferring susceptibility to type 1 and type 2 diabetes.Unlike previous approaches, 'genome-wide association studies' have extensively delivered on the promise of uncovering genetic determinants of complex diseases, with a number of novel disease-associated variants being largely replicated by independent groups.This review provides an overview of these recent breakthroughs in the context of type 1 and type 2 diabetes, and outlines strategies on how these findings will be applied to impact clinical care for these two highly prevalent disorders.", + "Functional pathway and network analyses of GWAS data combined with proteomic/transcriptome data, i.e. expression data, have also highlighted how candidate genes interact and may be involved in immune-related mechanisms (6)(7)(8).This has added significantly to our understanding of T1DM etiology.Finally, T1DM susceptibility variants may affect both development Pociot et al. (9) and persistence (10)(11)(12) of autoimmunity and thus might serve as potential intervention targets in clinical studies aiming at diminishing autoimmunity.ConclusionsA major challenge is to translate GWAS findings into causal variants and target genes.The Immunochip effort has greatly contributed to our understanding of disease mechanisms by identifying pathways, which could not be linked to diabetes by existing hypothetical models.Diabetes is probably a much more diverse disease than the current subdivision into T1DM and T2D implies and a more precise subdivision into subgroups may also pave the way for a more individualized medicine.A holistic systems biology approach will also be required to obtain a complete picture of how genetic variation alters a protein function leading to diabetes.The rapid technology development during the past years holds promises that this will be possible in a not too distant future.", + "IntroductionGenome wide association studies (GWAS) of type 2 diabetes mellitus and relevant 
endophenotypes have shed new light on the complex etiology of the disease and underscored the multiple molecular mechanisms involved in the pathogenic processes leading to hyperglycemia [1]. Even though these studies have successfully mapped many diabetes risk genetic loci that could not be detected by linkage analysis, the risk single nucleotide polymorphisms (SNP) have small effect sizes and generally explain little of disease heritability estimates [2]. The poor contribution of risk loci to diabetes inheritance suggests a prominent role of environmental factors (e.g. diet, physical activity, lifestyle), gene-environment interactions and epigenetic mechanisms in the pathological processes leading to the deterioration of glycemic control [3,4].", + "Genome wide association studies (GWAS) have transformed the study of heritable factors influencing complex diseases such as type 2 diabetes (T2D), with the current tally of established risk loci approaching 70. Each of these loci has the potential to offer novel insights into the biology of this disease, and opportunities for clinical exploitation. However, the complexity of this condition has often frustrated efforts to achieve these functional and translational advances. This review describes progress made over the past year to expand genome wide association studies, to characterize the mechanisms through which diabetes risk loci operate, and to define the processes involved in diabetes predisposition.", + 
"More recently, GWA studies have become feasible in large cohorts of patients and controls. Using this approach compelling evidence for genetic variants involved in type 1 diabetes [31-33], type 2 diabetes [31,34-37], age-related macular degeneration [38], inflammatory bowel disease [39], heart disease [40,41] and breast cancer [42] have already been described.", + "Molecular Biology Reports, 37: 501-505. Lyssenko V, Groop L (2009) Genome-wide association study for type 2 diabetes: clinical applications. Current Opinion in Lipidology, 20: 87-91. Maltecca C, Weigel KA, Khatib H, Cowan M, Bagnato A (2009) Whole-genome scan for quantitative trait loci associated with birth weight, gestation length and passive immune transfer in a Holstein × Jersey crossbred population. Animal Genetics, 40: 27-34. Mardis ER (2008a) The impact of next-generation sequencing technology on genetics. Trends in Genetics, 24: 133-141. Mardis ER (2008b) Next-generation DNA sequencing methods. Annual Review of Genomics and Human Genetics, 9: 387-402.", + "How do we identify the major 'culprits' at the implicated genome-wide association study loci? 
If population-based genetics, including genome-wide association studies, have allowed progress in the identification of Type 2 diabetes loci to be rapid over the past few years, progress towards determining which of the gene variants close to the implicated loci confer altered disease risk and how (at the molecular, cellular and whole body level) has lagged some way behind.Indeed, given the number of possible single nucleotide polymorphisms and genes, unravelling these questions represents a monumental challenge, requiring multiple, complementary approaches.Nonetheless, the rewards of success, in terms of new understanding of disease mechanisms and even the identification of new targets for therapeutic intervention, are likely to be great, potentially allowing the treatment of underlying disease aetiology in a personalized (stratified) manner.", + "Background: Many genetic studies, including single gene studies and Genome-wide association studies (GWAS), aim to identify risk alleles for genetic diseases such as Type II Diabetes (T2D).However, in T2D studies, there is a significant amount of the hereditary risk that cannot be simply explained by individual risk genes.There is a need for developing systems biology approaches to integrate comprehensive genetic information and provide new insight on T2D biology.", + "INTRODUCTIONMultiple genome-wide association studies (GWASs) have correlated type 2 diabetes mellitus (T2DM) with genetic variants, yielding a large number of loci and associated gene products that are linked to the disease phenotype-often with little or no insight into the mechanism underlying that link (Hivert et al., 2014).The current challenge is to establish robust systems to systematically evaluate the role of these loci using disease-relevant cells.Previous studies have used patient samples, cell lines, or animal models to seek mechanistic insight but with significant limitations.Large variation is observed in primary patient samples, perhaps due to 
genetic heterogeneity, whereas animal models present major physiological and metabolic differences that hamper understanding of the precise function of human genes in T2DM.Therefore, a robust system to systematically evaluate the role of T2DM-associated genes using disease-relevant human cells will provide an important tool for diabetes research and spur the development of precision (allele-specific) therapies, exemplified by the use of sulfonylurea drugs to treat patients carrying certain KCNJ11 mutations (Gloyn et al., 2004).", + "Background: Genome-wide association studies (GWAS) have identified several hundred susceptibility loci for type 2 diabetes (T2D).One critical, but unresolved, issue concerns the extent to which the mechanisms through which these diverse signals influencing T2D predisposition converge on a limited set of biological processes.However, the causal variants identified by GWAS mostly fall into a non-coding sequence, complicating the task of defining the effector transcripts through which they operate.Methods: Here, we describe implementation of an analytical pipeline to address this question.First, we integrate multiple sources of genetic, genomic and biological data to assign positional candidacy scores to the genes that map to T2D GWAS signals.Second, we introduce genes with high scores as seeds within a network optimization algorithm (the asymmetric prize-collecting Steiner tree approach) which uses external, experimentally confirmed protein-protein interaction (PPI) data to generate high-confidence sub-networks.Third, we use GWAS data to test the T2D association enrichment of the \"non-seed\" proteins introduced into the network, as a measure of the overall functional connectivity of the network.", + "Genetic predisposition to diabetes mellitus type 2: will large collaborative efforts be able to overcome the geneticist's nightmare?" 
+ ] + ], + "task_id": [ + "00BE70B5D71A5926E56942909C8B2A92", + "CD3820AA1BD96613F78FDF3CF5C8AB3D", + "A4CE2F2F8E08E5F16C94A1BCF540D881", + "1B8618ADB274F928B3AACAB1C71A927E", + "055110B765AA502F9AAECE68CEC0DD24" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_general_1.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_1.json new file mode 100644 index 0000000..02296c7 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_1.json @@ -0,0 +1,101 @@ +{ + "question": [ + "What are the potential benefits and risk associated with gene editing technologies like CRISPRR-Cas9?", + "How does epigenetics inluence gene expression without changing the underlying DNA sequence?", + "Describe the role of mitochondrial DNA in heredity and how it differs from nuclear DNA.", + "What are the ethical considerations surrounding prenatal genetic testing and the selective termination of pregnancies based on genetic factors?", + "Create a how-to guide for genetic sequencing." + ], + "answer": [ + "The potential benefits of gene editing technologies like CRISPR-Cas9 include precise modification of DNA sequences, which can be used to alter gene function, treat genetic diseases, improve crop species, and advance biomedical research. It can also be used for functional screening in drug development and personalized medicine. However, there are risks and limitations associated with this technology. These include off-target effects or unintended modifications, which could potentially lead to harmful consequences. There's also the risk of triggering an immune response, and the potential for wide-ranging deletions or recombination events. Ethical concerns also arise, particularly in the context of editing human genomes.", + "Epigenetics influences gene expression without changing the underlying DNA sequence through mechanisms such as DNA methylation, histone modifications, and chromatin remodeling. These processes can alter the structure of the DNA and its accessibility to transcription factors, thereby regulating gene expression. 
For instance, DNA methylation typically represses gene expression, while histone modifications can either enhance or repress gene expression depending on the specific modification. These changes can be heritable and are influenced by environmental and lifestyle factors.", + "Mitochondrial DNA (mtDNA) is inherited solely from the maternal lineage, unlike nuclear DNA which requires both paternal and maternal contributions. This uniparental transmission of mtDNA is ensured by complex mechanisms that eliminate paternal mitochondria from sperm during fertilization. mtDNA also exhibits a higher mutation rate than nuclear DNA, leading to significant population-level mtDNA polymorphisms. These polymorphisms can be clustered into distinct haplogroups that represent major branch points on the mitochondrial phylogenetic tree. In contrast, nuclear DNA undergoes bi-parental recombination.", + "The ethical considerations surrounding prenatal genetic testing and selective termination of pregnancies based on genetic factors include the potential for implicit pressure on individuals to violate personal ethics to reduce financial burden on society, the risk of routinization of testing leading to social or medical expectations of testing in all eligible individuals, and the potential compromise of values of informed consent and individual autonomy. There are also cultural and religious beliefs to consider, as well as the potential psychological impact on parents who may feel guilt if they are carriers of genetic conditions. Furthermore, the decision to terminate a pregnancy based on genetic factors is a joint decision between parents, and the involvement of extended family members in this process varies greatly across different cultures.", + "1. Start with an initial linkage or association: This is the first step in a genetic study where a trait or disease is linked to a specific region of the genome.\n\n2. 
Perform Laboratory Genotyping: This involves determining the genetic variation of an individual by examining their DNA sequence using biological assays and comparing it to a standard sequence.\n\n3. Use Genomic Sequence: The role of genomic sequence is crucial at every stage. It serves as a common thread that ties all the stages together.\n\n4. Establish True Orthology: Determine the true orthology between genes in different species. If two genes are truly orthologous, their evolution closely follows patterns of speciation.\n\n5. Apply Bioinformatics: Use bioinformatics tools and techniques to analyze the data. This can be applied to sequence data from any collection of organisms.\n\n6. Understand Genome Structure and Content: Provide an overview of genome structure and content to provide a context for subsequent discussions.\n\n7. Improve DNA Sequencing Methods: Constantly work on improving DNA sequencing methods with the ultimate goal of sequencing a human genome in a single day for a cost of about US $1,000.\n\n8. Interpret the Information: The major challenge ahead is the interpretation of this information. Understand how our genes interact with each other, and how the environment contributes to the development of health and disease.\n\n9. Consider the Implications: Understand the individual and societal implications of knowing our genome sequence.\n\n10. Keep Up with Technological Advances: With further technological advances, other previously unimaginable research approaches will become real." 
+ ], + "contexts": [ + [ + "Gene editing has gained considerable interest with the identification of the CRISPR-Cas9 system, 27 which allows for a targeted modification in the DNA sequence of an organism.Researchers can utilize their knowledge of the basic biology of the gene and its protein function to precisely change the DNA sequence, thus altering the protein function of the gene and allowing for edits to stay within the species.Researchers at the University of Missouri used the CRISPR-Cas9 system to modify the CD163 gene such that the PRRS virus is not able to replicate inside the pig. 28This slight modification of the swine genome through gene editing keeps the pigs from succumbing to PRRS which has an annual estimated loss to the United States swine industry of over $660 million per year.Despite this benefit, given the public's concerns over food safety, it is likely that approval for such technology is years away in the US, Canada and Europe.However, in some cultures, there is a wide range of non-livestock species that are consumed.Therefore, it is conceivable that these countries and cultures may be open to transgenic/gene edited livestock.They may see the importance of useful gene editing which may lead to approval and consumption of reasonable genetically edited animal products such as those with modifications that are already found in nature or those that offer a substantial welfare benefit to society.", + "As a researcher who has devoted an entire career since 1994 to the development of genome editing tools and methods, I have been amazed by the rapid progress in the field over the last few years.Considering the widespread use of the tools, I am sure that the pace will continue to accelerate.Indeed, programmable nucleases, may eventually enable humans-products of evolution-to become masters of evolution.delivered preassembled recombinant Cas9-guide RNA ribonucleoproteins (RNPs) into animal embryos 6,9 and plant 11 and mammalian cells [73][74][75] 
.Indeed, Cas9 RNPs were rapidly turned over in cells 73 , reducing off-target effects and mosaicism in gene-edited organisms 11 .Cas9 RNPs can be delivered into cells by various methods, including microinjection 6,9 , electroporation 73 , lipofection 74 and protein transduction 75 .Importantly-and unlike in conventional gene therapy, where therapeutic genes are delivered via plasmids or viral vectors-Cas9 RNP delivery does not involve the use of exogenous DNA; host innate immune responses against foreign DNA are not elicited, and undesired integration of foreign DNA into the host genome is avoided.The ability to genetically modify living cells and organisms is a fundamental tool for biological research, but achieving highly specific targeted changes has been technically demanding.Genome editing has been recently democratized by the development of RGENs (see Glossary in Box 1), repurposed from the type II CRISPR-Cas9 prokaryotic adaptive immune system 1 .Unlike other programmable nucleases, namely ZFNs and TALENs, whose target specificities are determined by modifying their DNA-binding domains, CRISPR-Cas9 can be customized by replacing guide RNAs, making the system much more affordable and scalable.Cas9 nucleases have been successfully used for modifying genomes in human cells [2][3][4][5] , animals [6][7][8][9] and plants 10,11 , heralding the age of genome editing.Furthermore, Cas9 or guide RNAs have been linked to various effector proteins to enable targeted gene regulation 12,13 and epigenome modifications 14,15 .It is worth noting, however, that many of these feats had been demonstrated previously using other nucleases or DNA-binding proteins 1,16 .In this Perspective, I shed light on early genome editing platforms that laid the groundwork for the widespread use of CRISPR-Cas9 in research and medicine (Fig. 
1).", + "In comparison to a transgenic approach, a gene editing technique such as CRISPR-Cas9 offers the advantage that gene-edited crops are not considered genetically modified organism (GMO) in some countries, such as the US, where the demand for natural food colorants such as anthocyanins is high.Indeed, the use of GMO crops as a source of natural pigments may be inconsistent with consumer interests.However, carrot cultivars engineered with either the transgenic or gene editing approach have not been reported so far, but their development is possible.", + "Caveats and Ethical Concerns of CRISPR-Cas ApplicationsDespite the presence of both a PAM sequence and a specific gRNA, the CRISPR-Cas9 system is not infallible.In fact, DSBs can occur at different sites in the genome, potentially causing so-called \"off-target\" effects.This eventuality remains to date the biggest concern in the field, as possible undesirable modifications must be properly identified and followed in order to guarantee safety for medical purposes.Nevertheless, there is still little evidence of the biological consequence of Cas9 off-target effects.Two recent studies describe new methods to investigate potential off-target effects in both mammals and plants [33,34].In both cases, whole-genome sequencing revealed that selective nucleotide changes, such as conversion of an adenine to a guanine, caused off-target occurrence very rarely, with a frequency comparable to the one of spontaneous mutations.However, substitution of a cytosine with a thymidine was linked to a sizable number of off-target mutations.This newly acquired information adds to the plethora of studies conducted on the safety of CRISPR, which altogether highlight the need for the establishment of clinical standards for the future use of genome-editing techniques in the clinic.Despite this and other technical challenges still ahead for CRISPR genome editing, the pace at which this technology has developed in recent years suggests 
many of these concerns could be addressed soon, as long as proper ethical guidelines and regulatory mechanisms are established.The notable accuracy and versatility of CRISPR-Cas for genome editing also opened the door to its use in preclinical and translational settings.In the latter case, CRISPR in vivo gene editing has led to several proof-of-concept studies that would have been unachievable without it, as in the first ever correction of inherited pathogenic mutations linked to degenerative disease in a living organism [22] and even shown to be possible in human embryos [23,24].It also has great potential in the field of precision medicine as large-scale population DNA sequencing studies have provided vast amounts of information linking particular diseases with specific genetic mutations which could, in theory, be targeted through CRISPR [25,26].This could be used during the identification and validation of potential DNA targets during the development of personalised drug or cell therapies, which will require the generation of engineered cell lines and/or animal models.Techniques such as HDR-mediated gene targeting are too labour intensive, with low targeting efficiencies and long times necessary for their establishment, and consequently are not ideally suited for drug discovery purposes.Conversely, CRISPR-Cas has been proven to be efficient for editing virtually any kind of cell line, from primary immune cells to induced pluripotent stem cells (iPSCs) [27,28].Additionally, CRISPR can also be used for functional screening in the development of combined inhibitory therapy aimed at strengthening the efficiency of targeted therapeutics.An example of the latter is shown in a study where a variation of the technology known as CRISPR interference (CRISPRi) was used in genome-wide scale to identify different survival pathways used by cancer cells after oncogene inactivation and allowing the identification of successful combination therapies [29].In terms of 
translational applications, the overall safety of CRISPR genome editing in humans will require long-term scrutiny before its adoption in the clinic.Nonetheless, a number of CRISPR-based clinical trials are currently in progress, including studies focused on targeting patients' own T cells in order to improve the immune response towards some forms of malignant cancer [30,31], and others aimed at correcting pathogenic mutations in the hematopoietic cells of patients with beta-thalassemia and sickle cell disease [32].ConclusionsThere is no reason to doubt that the development of CRISPR-Cas genome editing represents an unprecedented breakthrough in modern science, as it has potential applications in a wide array of disciplines ranging from agriculture, zoology and renewable energy to biomedicine and synthetic biology.This powerful tool holds promise for further elucidating the molecular causes of ageing by allowing scientists to probe genetic and epigenetic pathways with a level of sophistication that was unattainable just a few years ago.It will allow so in traditional animal and cell models of ageing, but it will also drastically accelerate the generation of refined versions of those models or even allow the development of new research approaches in non-model organisms.Moreover, CRISPR-based genome editing is already having a significant impact in research aiming to understand the cellular and molecular origins of age-related diseases, as well as developing potential treatments against them.The application of CRISPR-Cas gene editing for the treatment of age-related diseases is not over the horizon yet, as it will require the identification of causative genes and their role under a variety of contexts that could be as diverse as the ageing process is across individuals.However, CRISPR-Cas might also hold the key for solving such conundrum, as it has opened the way for achieving true personalised medicine by providing both the precision and scalability required for 
conducting genome-wide functional screens during the refinement of drug-and cell-based therapies for age-related diseases.Since its discovery, CRISPR-Cas technology has ignited a biological revolution by providing a highly versatile platform that allows fast and efficient genome editing in an ever-growing list of organisms.In this chapter we will first describe the most recent advances in the development and application of the CRISPR-Cas platform in biomedical research.Then we will discuss the most recent and notable basic research applications of this technology in the study of the molecular causes of ageing.Finally, we will review how CRISPR-Cas has been used for creating new models for the study of age-related diseases, as well as for manipulating diseaseassociated gene pathways.", + "CRISPR screening technologiesThe discovery of CRISPR-Cas9 as a sequence-specific programmable nuclease democratized gene editing and fueled progress in forward genetic screening [20 , 66] .Genetic screens using Cas9 with a pooled singleguide RNA (sgRNA) library allow the interrogation of seemingly all genes in a genome in a single experiment [96 , 97] [null] .Engineered Cas9 variants further extend the versatility of forward genetic screening.Catalytically inactive Cas9 (dCas9) fused with chromatin effector domains permit specific activation (CRISPRa) or inhibition (CRISPRi) of gene expression [37 , 54] .Recently developed and emerging technologies -base editors, prime editors, and Cas transposases -are beginning to enable new types of genetic screens with directed, controlled, and on demand mutations by allowing the creation of user specified modifications, such as single base conversion, deletions, and insertions [4 , 42 , 58] .", + "Coming on the heels of engineered nucleases, CRISPR-Cas9 tools have accelerated the pace of genomic research by permitting highly efficient knockouts or edits of virtually any gene in cells or model organisms.Multiple CRISPR-Cas9-based clinical 
trials are in progress or are expected to begin soon.Although Cas9engineered cells haven't yet demonstrated efficacy at scale, early trial results suggest that such cells are stable and don't cause acute adverse reactions in humans.Long-term safety is yet to be determined.Current applications largely focus on single-gene disorders for which gene editing can be carried out ex vivo on appropriate cells, such as bone marrow hematopoietic stem cells in the case of sickle cell anemia.Exploration is under way to develop delivery systems that can target the gene-editing apparatus to the appropriate tissue in vivo.Over the past 8 years, CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 (CRISPR-associated protein 9) technologies have emerged as accessible and adaptable tools for studying and altering genomes. 5RISPR-Cas9 can be used to induce genome edits by creating targeted DNA breaks that trigger site-specific DNA repair.In nextgeneration formats, it can also control the transcriptional output of genes or alter genome sequences using a process of nucleotide base editing that does not require repair of DNA breaks.As these technologies continue to mature, it will become increasingly possible to alter cellular genomes efficiently and accurately.", + "The type II CRISPR-Cas9 systems, repurposed from prokaryotic adaptive immune responses, are now widely used for targeted genome modifications in plants, animals, and human cells (Kim et al. 2014;Woo et al. 2015;Zuris et al. 2015).In particular, Cas9 nucleases have shown promise for gene and cell therapy (Maeder and Gersbach 2016).Typically, these nucleases are expressed or delivered in vivo using plasmid DNA or viruses (Yin et al. 2014;Ran et al. 2015).However, plasmid DNA delivery is often inefficient, especially in vivo, and can cause integration of small plasmid fragments degraded by endogenous nucleases at on-target and offtarget sites in the genome (Kim et al. 
2014).Viral delivery of Cas9 can be highly efficient in vivo (Ran et al. 2015;Long et al. 2016;Nelson et al. 2016;Tabebordbar et al. 2016), but may be hampered by antibodies or T cells induced against the protein (Shankar et al. 2007;Calcedo et al. 2015;Chew et al. 2016).We and others have shown that preassembled Cas9 ribonucleoproteins (RNPs) can be delivered to human primary and stem cells and mice to modify target genes (Kim et al. 2014;Schumann et al. 2015;Zuris et al. 2015).Cas9 RNPs are rapidly turned over in cells, reducing off-target effects.Furthermore, Cas9 RNPs are unlikely to be limited by host immune systems because they function and disappear before the generation of antibodies and T cells directed against them.Currently, despite these advantages of RNPs, the difficult delivery of Cas9 RNPs in vivo limits its utility for therapeutic applications (Zuris et al. 2015).Here, we show that in vivo genome editing of an wild-type gene, whose up-regulation is responsible for pathogenesis, could be a new therapeutic modality for the treatment of nongenetic degenerative diseases.Our ultimate goal is to harness Cas9 RNPs for a clinical application of therapeutic genome surgery in patients with AMD.", + "Clustered regularly interspaced short palindromic repeat (CRISPR)-Cas nucleases have revolutionized the field of gene editing and have tremendous application in the field of molecular medicine [98][99][100][101][102].Despite a significant surge in CRISPR/Cas9mediated genome editing in various disease models, the progress in the field of AD has lagged behind substantially.We believe that genome editing can significantly improve the development of AD models and also create novel opportunities for the development of the next generation precision targeted AD gene and stem cell therapies.Since there are several excellent review articles on CRISPR/Cas9-mediated genome editing, here we will limit our focus on select recent articles that are noteworthy.CRISPR/Cas9 system 
can be engineered to either activate transcription (gain-of-function) or achieve gene silencing (Loss-of-function).Dahlman et al. have developed a CRISPR-based system that uses catalytically active Cas9 and distinct single guide (sgRNA) constructs to activate and knockout different genes in the same cell [103].Konermann et al. have used structure-guided engineering of a CRISPR-Cas9 complex to mediate efficient transcriptional activation at endogenous genomic loci [104].Using crystallographic studies, they have engineered a combination of sgRNA2.0,NLS-dCas9-VP64 and MS2-p65-HSF1 to develop one of the most effective transcription activation system.", + "Limitations of CRISPR-Cas9CRISPR provides a simple and easy tool not only for in vitro use but potentially also for in vivo genome editing.However, there are limitations and downsides to this approach.First, and despite considerable improvements in the technology, the risk of the offtarget effect remains and must be considered carefully.Second, DSB may lead to wide-ranging deletions or recombination events involving the on-target site (204).Third, in cycling cells, DNA double strand breaks caused by Cas9 cleavage may trigger a P53 response leading to apoptosis and enrichment for potentially oncogenic P53-deficient cells (205,206).Fourth, subjects may generate antibodies to Cas9, potentially limiting gene therapies (207,208).Genome editing tools that target the desired genomic region and allow for variants to be altered (e.g. from risk to protective), or for more substantial changes to be made (e.g. 
the deletion of a longer stretch of DNA harbouring a number of variants) and can help to answer each of these questions.These technologies are evolving rapidly (Figure 1 and Table 2).The most recently developed of these, Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) technology, originally developed by Doudna, Charpentier and their colleagues (72,73) and Zhang and his colleagues (50) has become a widely used tool for this purpose.Engineered CRISPR/Cas9 technology uses a guide RNA (gRNA) to direct CRISPR-associated endonuclease (Cas) to the target DNA and generate a double strand DNA break.Correction of a mutation or variant in the target DNA sequence can then be carried out by homology-directed DNA repair (HDR) with a donor template.Since its discovery eight years ago, CRISPR technology has evolved quickly to be a critical part of the molecular biologist's toolbox.", + "INTRODUCTIONGenome editing technologies based on the clustered regularly interspaced short palindromic repeats (CRISPR)-associated endonuclease Cas9 enable rapid and efficient modification of endogenous genes in a variety of cell types, allowing for analysis of gene function in many organs in vivo.CRISPR-Cas9 induces DNA double strand breaks (DSBs) at single-guide RNA (sgRNA)-specific loci in the genome, which are repaired through either non-homologous end-joining (NHEJ) or homology-directed repair (HDR) pathways.While NHEJ introduces unpredictable pattern of insertion or deletion (indel) mutations, HDR directs a precise recombination event between a homologous DNA donor template and the damaged DNA site (Cong et al., 2013;Cox et al., 2015;Doudna and Charpentier, 2014;Heidenreich and Zhang, 2016;Jinek et al., 2012;Mali et al., 2013;Sander and Joung, 2014;Wang et al., 2013;Yang et al., 2013).Thus, HDR can be used to precisely introduce sequence insertions, deletions or mutations by encoding the desired changes in the donor template DNA.", + "CRISPR technology has rapidly changed 
the face of biological research, such that precise genome editing has now become routine for many labs within several years of its initial development.What makes CRISPR/Cas9 so revolutionary is the ability to target a protein (Cas9) to an exact genomic locus, through designing a specific short complementary nucleotide sequence, that together with a common scaffold sequence, constitute the guide RNA bridging the protein and the DNA.Wild-type Cas9 cleaves both DNA strands at its target sequence, but this protein can also be modified to exert many other functions.For instance, by attaching an activation domain to catalytically inactive Cas9 and targeting a promoter region, it is possible to stimulate the expression of a specific endogenous gene.In principle, any genomic region can be targeted, and recent efforts have successfully generated pooled guide RNA libraries for coding and regulatory regions of human, mouse and Drosophila genomes with high coverage, thus facilitating functional phenotypic screening.In this review, we will highlight recent developments in the area of CRISPR-based functional genomics and discuss potential future directions, with a special focus on mammalian cell systems and arrayed library screening.CRISPR technology has rapidly changed the face of biological research, such that precise genome editing has now become routine for many labs within several years of its initial development.What makes CRISPR/Cas9 so revolutionary is the ability to target a protein (Cas9) to an exact genomic locus, through designing a specific short complementary nucleotide sequence, that together with a common scaffold sequence, constitute the guide RNA bridging the protein and the DNA.Wild-type Cas9 cleaves both DNA strands at its target sequence, but this protein can also be modified to exert many other functions.For instance, by attaching an activation domain to catalytically inactive Cas9 and targeting a promoter region, it is possible to stimulate the expression 
of a specific endogenous gene.In principle, any genomic region can be targeted, and recent efforts have successfully generated pooled guide RNA libraries for coding and regulatory regions of human, mouse and Drosophila genomes with high coverage, thus facilitating functional phenotypic screening.In this review, we will highlight recent developments in the area of CRISPR-based functional genomics and discuss potential future directions, with a special focus on mammalian cell systems and arrayed library screening.The recent development of clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 for experimental purposes has dismantled the perception that genome editing technology is off-limits for screening in mammalian systems (Heintze et al., 2013).Since this system employs the basic principle of Watson-Crick base pairing for gene targeting, generation of libraries with whole-genome target coverage is relatively easy and cost-effective.For instance, simple protocols are available to synthesize pooled lentiviral libraries by in silico design of oligonucleotides, which can then be cloned, packaged and delivered to cells by viral transduction (Paddison et al., 2004;LeProust et al., 2010).Similarly, the generation of arrayed libraries can be achieved by following protocols originally developed for arrayed shRNA library production that have been in use for a number of years (Moffat et al., 2006).All in all, the stage is set for CRISPR to make an enormous impact on genomic screening and thus scientific discovery in the coming years, and recent demonstrations of this system have shown great promise (Shalem et al., 2015).However, a number of technical challenges must be addressed in order to maximize the benefit of this technology.In this review, we will discuss current applications of CRISPR in functional genomics and provide a perspective on future developments in this area.", + "Genome editing for crop improvementReports of CRISPR-Cas9-based genome editing 
first appeared in 2013 (Cong et al., 2013;Feng et al., 2013;Mao et al., 2013).Since then, genome editing technologies have proven to be powerful and efficient tools for the improvement of many crop species.At present, genome editing has been widely used to introduce/modify agronomically important traits, such as increased yield, improved nutritional quality, and resistance to biotic and abiotic stresses, in multiple crops, including rice, wheat, maize, tomato, and potato (Lu et al., 2017;Soyk et al., 2017;Tang et al., 2017;D'Ambrosio et al., 2018;Ye et al., 2018;Miao et al., 2019;Zhang et al., 2019;Zhong et al., 2019;Butt et al., 2020;Zhang et al., 2020c;Li et al., 2021b;Zhan et al., 2021).CRISPR-Cas-based genome editing has been extended to targeted mutagenesis, base editing, and precisely targeted gene/allele replacement or tagging in plants.mportantly, using CRISPR-Cas9 technology, transgenes present in the genomes of genome-edited plants can be removed by chromosomal segregation via a simple self-pollination or hybridization step.Gene editing technologies continue to be developed and utilized (Mao et al., 2013;Lu and Zhu, 2017;Lu et al., 2020)." 
+ ], + [ + "Epigenetics was originally thought of as the inheritance of traits not solely based on DNA sequence and has evolved substantially since its inception roughly fifty years ago.DNA methylation, which generally occurs at CpG islands, is the best characterized epigenetic modification that regulates gene expression and is inheritable.Recently, the term epigenetics has broadened rather than focusing so much on heredity, with a more all-encompassing and unifying definition as \"the structural adaptation of chromosomal regions so as to register, signal or perpetuate altered activity states\" (12).Histone modifications are now widely accepted to play a role in epigenetics; however, there are questions as to what role they specifically play.Histone modifications could precede or succeed DNA methylation, and whether they initiate the transcriptional memory or simply maintain it is still debated (10).In recent years, our understanding of these epigenetic mechanisms governing gene expression patterns without changes in the basic gene coding sequence has increased dramatically.However, the relationships to pathological and disease states such as diabetes and its complications are less clear and of much current interest.", + "EPIGENETIC STUDIESAn epigenetic mechanism is a biochemical alteration to the DNA molecule that does not change the sequence of the DNA but does infl uence gene expression.Epigenetics is often defi ned as the \"study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence\" (Russo, Martienssen, & Riggs, 1996, p. 
1).", + "Epigenetics refers to reversible heritable mechanisms, which can affect gene expression without underlying changes in DNA sequences, but rather via chromatin modifications. Eukaryotic chromatin is a highly condensed structure containing repeating structural subunits, the nucleosomes. Each nucleosome consists of a histone octamer assembled from two copies of each histone (H2A, H2B, H3, and H4, as well as histone variants, such as macroH2A, H3.3 and H2A.Z), wrapped by 147 base pairs of DNA [3,4]. Each core histone possesses histone-fold domains serving for the interaction of the histones and N-terminal histone-tails. These tails can be subjected to post-translational modifications, which frequently affect gene expression. These modifications include, for instance, histone acetylation, methylation, phosphorylation and ubiquitination [5].", + "Introduction: Epigenetics is used to denote the regulation of gene transcription that cannot be attributed to sequence variation in the DNA. Although the term epigenetics includes a number of different mechanisms, DNA methylation and histone modification are most commonly discussed. DNA methylation in mammals appears to be specific to cytosine, predominantly to CpG (cytosine-phosphate-guanine) dinucleotides. In promoter regions, CpG sites are often clustered in CpG islands (CGIs) where methylation is believed to repress gene expression [1]. Even small changes in the methylation of a promoter region can introduce stable changes in gene expression, leading to silencing of a gene [2,3]. Recently, different array- and sequence-based techniques for measuring the genome-wide DNA methylation pattern have been developed, and the different techniques have yielded concordant results [4,5].", + "Epigenetic modifications reflect transmissible changes in gene expression that are not caused by changes in DNA sequence, but rather by methylation of cytosine residues in DNA and modifications of DNA-associated proteins such as
histones.Epigenetic changes are influenced by environmental and lifestyle factors ranging from exposures in utero to adulthood.These changes are flexible over time in individuals and differ among tissue types in various sites in the body, thereby making their use as biomarkers for the prediction of disease challenging (see Supplementary Data online for full references).Currently, bead array-based platforms allow the genome-wide measurement of DNA methylation at more than 400,000 sites throughout the genome.", + "Epigenetic modifications reflect transmissible changes in gene expression that are not caused by changes in DNA sequence, but rather by methylation of cytosine residues in DNA and modifications of DNA-associated proteins such as histones.Epigenetic changes are influenced by environmental and lifestyle factors ranging from exposures in utero to adulthood.These changes are flexible over time in individuals and differ among tissue types in various sites in the body, thereby making their use as biomarkers for the prediction of disease challenging (see Supplementary Data online for full references).Currently, bead array-based platforms allow the genome-wide measurement of DNA methylation at more than 400,000 sites throughout the genome.", + "Epigenetics-Same Genes, But Different OutcomesEpigenetic modifications refer to dynamic changes written on and erased in and around our genes by specialised enzymes, which do not alter the DNA nucleotide sequence itself, but instead modify how it is transcribed.DNA does not exist naked within a eukaryotic cell, but is dynamically packaged as a DNA-protein complex called chromatin, that facilitates the packaging of extraordinary lengths of DNA into the tight confines of the cell nucleus.When a gene product is needed, chromatin is selectively unwound and made 'open' to allow access to transcription factors (known as euchromatin).Potentially more so than the DNA sequence itself, changes to the structure and accessibility of 
chromatin significantly influence the regulation of gene expression, both between different cells and within an individual cell over its lifetime.These changes are partly determined by epigenetics.This means that the same genes can result in different phenotypes without changes in the DNA sequence.For example, every cell in the body is genetically identical and has a same gene for insulin, but only the -cells of the pancreas have permissive epigenetic changes allowing open chromatin and insulin gene transcription.Elsewhere insulin expression is silenced by repressive epigenetic changes leading to chromatin condensation (known as heterochromatin).In the same way, genetically identical twins can become progressively more different as they age through accumulating epigenetic changes, even though their genetic similarity never changes.", + "Although the current ecological epigenetics literature is primarily focused on DNA methylation, other epigenetic modifications can alter gene expression.Histone modifications alter the way DNA is packaged and change the accessibility of the packaged DNA for transcription.These modifications can also interact with DNA methylation (Richards and Elgin 2002;Rapp and Wendel 2005).The activity of transposable elements, regions of DNA that have the ability to move within the genome and integrate into new sites, are regulated primarily by small interfering RNAs or by DNA methylation (Kazazian 2004;Kejnovsky et al. 2012;Richards et al. 2012a;Slotkin et al. 2012).Transposable elements have the potential to alter gene expression and function when inserted within coding regions, so regulation of these areas of the genome is highly important (Kazazian 2004;Feschotte 2008).Small interfering RNAs are active in DNA methylation pathways and histone methylation pathways.Similarities between these pathways in animals and plants suggest evolutionary conservation in these epigenetic processes (Saze et al. 
2012).", + "Epigenetics is concerned with the study of heritable changes other than those in the DNA sequence and encompasses two major modifications of DNA or chromatin: DNA methylation and post-translational modification of histones (Callinan and Feinberg, 2006). These modifications are critical regulatory cues, making DNA more or less accessible to DNA-binding proteins. Preliminary evidence suggests that epigenetics is something that geneticists must think about in their genetic analysis. Flanagan et al. (2006) demonstrated evidence of significant epigenetic variability in human sperm cells, suggesting that epigenetic patterns can be efficiently transmitted across generations, possibly influencing phenotypic outcomes in health and disease.", + "Epigenetics refers to mechanisms that can modify gene expression and phenotype without changes in the underlying DNA sequence (26,27). Although different cell types in a person are assumed to have identical DNA sequences, they possess distinct differences in their epigenetic information, such as DNA methylation and posttranslational modifications (PTMs) of histone proteins contained in the chromatin. Epigenetic modifications can occur when individuals are exposed to environmental factors, such as infections and nutritional changes, and can predispose them to diseases such as diabetes (28). Nucleosomes, the basic subunits of chromatin, consist of octamers of histones H2A, H2B, H3, and H4, wrapped by DNA. PTMs of histones (e.g., acetylation, methylation, phosphorylation, and ubiquitylation) form an epigenetic layer together with DNA methylation (29-31), which affects gene transcription. Acetylation of histones at lysine residues is generally associated with transcriptionally active genes, whereas lysine methylation leads to gene activation or repression, depending on the specific site and level of methylation (29,30,32). Alterations in histone PTMs and their interactions with other nuclear proteins at gene promoters or other
regulatory regions can lead to relatively stable epigenetic changes that alter chromatin structure. In turn, this can lead to long-term dysregulated gene expression and disease progression.", + "The third epigenetic mechanism that influences gene expression is DNA methylation, which involves the addition of a methyl group to cytosine bases that are adjacent to a guanine base in a sequence of DNA. These pairs of cytosine bases adjacent to nucleotides containing guanine (i.e., CpG dinucleotides) are often located near promoter sequences of many genes. Unlike chromatin remodeling and histone modifications, DNA methylation is heritable. That is, when DNA is replicated, the CpG dinucleotides on the newly synthesized strand are methylated before the cell divides. Thus, the two daughter cells that are produced by mitosis have the same DNA methylation pattern as the parent cell. When CpG dinucleotides are unmethylated, the genes are transcriptionally active. Methylated CpG dinucleotides, however, are transcriptionally inactive.", + "Things are made worse by the fact that we now know that exposure to toxic materials and stress can alter gene expression epigenetically. Epigenetic change refers to changes in gene expression that do not involve any change in nucleotide sequence (Strachan and Read 2011). These effects can be transient within the life span of an individual or they can be passed on to future generations. We know of at least three mechanisms that can accomplish these changes: DNA methylation, chromatin remodeling and histone modification, and microRNAs (Strachan and Read 2011; Babenko, Kovalchuk, and Metz 2012). DNA methylation involves the addition of a methyl (CH3-) group to the 5th carbon of a pyrimidine ring. The majority of these are found in CpG dinucleotides. Human DNA is associated with histone proteins. These proteins are rich in the amino acids lysine and arginine. Usually 147 base pairs are wrapped around a complex of eight core histone molecules
(called a nucleosome). Histones are subject to different modifications that impact gene expression; these include acetylation (adding a CH3CO- group), the addition of up to three methyl groups to the amino acid lysine, and phosphorylation of serines. This is accomplished by a large family of enzymes: histone acetyltransferases (HATs), histone methyltransferases (HMTs), and histone kinases. Generally these enzymes are associated with the expression of genes, while histone deacetylases, histone demethylases, and histone phosphatases reverse these effects. Small RNAs can act to alter gene expression either at the level of transcription (DNA to RNA) or of translation (mRNA to amino acid).", + "Epigenetic factors influence the regulation of gene expression without altering the DNA sequence and act as the bridge that links the intrinsic and extrinsic signals [6]. The most common epigenetic modifications include DNA methylation, histone modification and RNA-based mechanisms [7]. In particular, DNA methylation is one of the best-studied epigenetic modifications in recent decades, and plays a crucial role in many biological processes, such as development, differentiation, genomic imprinting and X chromosome inactivation (XCI) [8,9]. With the accumulation of findings, the biological importance of DNA methylation attracts more and more attention. For example, several studies suggest that DNA methylation participates in the process involving the formation and stabilization of memories [10,11]. A recent study demonstrated that demethylation by silencing DNA methyltransferase enzymes (DNMTs) affects cell survival [12]. More interestingly, growing evidence is revealing that variations in DNA methylation caused by environmental stimuli can be transmitted from parents to offspring [13].", + "The field of \"epigenetics\" identifies the instructions (mechanisms) of gene expression (changes in the chromosome) caused by mechanisms other than changes in the DNA sequences. These instructions are important for
normal functionality while their malfunction may lead to ageing, cancer, diseases, and maladaptive behavioral traits (Plomin, 2003).", + "Epigenetics refers to heritable gene expression changes that occur without DNA sequence alterations, and includes DNA methylation and histone modifications such as acetylation, methylation, and ubiquitylation. These modifications can result in either gene activation or gene repression. That epigenetic gene repression appears to be involved in the maintenance of stemness became apparent when a number of developmental regulators were found to be epigenetically silenced in murine ESCs and activated upon induction of ESC differentiation (Bernstein et al. 2006; Boyer et al. 2006).", + "Epigenetics is concerned with the study of heritable changes other than those in the DNA sequence and encompasses two major modifications of DNA or chromatin: DNA methylation and post-translational modification of histones (Callinan and Feinberg, 2006). These modifications are critical regulatory cues, making DNA more or less accessible to DNA-binding proteins. Preliminary evidence suggests that epigenetics is something that geneticists must think about in their genetic analysis. Flanagan et al.
(2006) demonstrated evidence of significant epigenetic variability in human sperm cells, suggesting that epigenetic patterns can be efficiently transmitted across generations, possibly influencing phenotypic outcomes in health and disease.", + "Epigenetic changes, such as DNA methylation and histone modifications, are also heritable and affect mRNA expression (14). These changes to the structure of DNA do not affect the sequence and can change with advancing age (15). It appears that genetic and epigenetic variations exert their effects by altering either the amount of RNA transcribed from a gene or the relative proportion of alternatively expressed isoforms produced by the alternative splicing mechanisms. These ultimately affect other downstream elements of the pathway, such as binding partners or inhibitors, resulting in a change in phenotype. It will therefore be necessary for future research programs to integrate genetic variation, epigenetics, and associated gene expression profiles to understand the origins of heritable traits and diseases. Such mechanistic understanding may contribute to the discovery of new therapeutic targets for aging pathologies (16).", + "Epigenetics: Changes arising from alterations in gene expression levels that are caused by reversible chemical modification of DNA, but not changes to the DNA sequence passed on from parents to offspring.", + "Epigenetics is concerned with the study of heritable changes other than those in the DNA sequence and encompasses two major modifications of DNA or chromatin: DNA methylation and post-translational modification of histones (Callinan and Feinberg, 2006). These modifications are critical regulatory cues, making DNA more or less accessible to DNA-binding proteins. Preliminary evidence suggests that epigenetics is something that geneticists must think about in their genetic analysis. Flanagan et al.
(2006) demonstrated evidence of significant epigenetic variability in human sperm cells, suggesting that epigenetic patterns can be efficiently transmitted across generations, possibly influencing phenotypic outcomes in health and disease.", + "Epigenetics is concerned with the study of heritable changes other than those in the DNA sequence and encompasses two major modifications of DNA or chromatin: DNA methylation and post-translational modification of histones (Callinan and Feinberg, 2006). These modifications are critical regulatory cues, making DNA more or less accessible to DNA-binding proteins. Preliminary evidence suggests that epigenetics is something that geneticists must think about in their genetic analysis. Flanagan et al. (2006) demonstrated evidence of significant epigenetic variability in human sperm cells, suggesting that epigenetic patterns can be efficiently transmitted across generations, possibly influencing phenotypic outcomes in health and disease." + ], + [ + "Oxidative stress and mitochondrial DNA: Not long after it was discovered that mitochondria have their own genetic apparatus, Harman proposed that mitochondria play a central role in the free radical theory of aging [16]. This idea was developed further by Miquel et al. [330], and the notion that mtDNA mutagenesis played a role in aging took hold. The phenotypical importance of mutations in mtDNA was demonstrated by Wallace et al. [331] and Holt et al.
[332], who first showed that Leber's hereditary optic neuropathy and mitochondrial myopathies were caused by mtDNA mutations (reviewed in [333]).Because mtDNA is so close to the site of mitochondrial ROS production, it is exposed to considerably higher oxidative stress, resulting in 3-fold higher levels of DNA oxidative damage (the previously quoted 20-fold figure is apparently due to an isolation artifact [334,335]).In the 1990s a series of papers reported that the frequency of mitochondrial DNA deletions increases dramatically with age, being essentially undetectable in young individuals and reaching levels as high as 2% of mtDNA in old individuals.This age-related increase in mtDNA deletions was found in organisms as diverse as worms, mice, and humans (reviewed in [24,336]).The same is also true with mtDNA point mutations [337,338].Certain mtDNA polymorphisms have been found in increased frequency in centenarians, implying a protective effect during aging [339][340][341].Similar protective effects of mtDNA polymorphisms have been reported for the age-related neurodegenerative condition, Parkinson's disease [342].", + "Variation in the structure and function of mitochondria underlies variation in organismal energetics broadly (Seebacher et al., 2010) and evidence for the importance of mitochondrial function in the evolution of natural populations continues to accumulate (Ballard and Melvin, 2010;Glanville et al., 2012;Hicks et al., 2012;Kurbalija Novii et al., 2015).For example, variation in mitochondrial DNA sequences (mtDNA) can determine whole-organism metabolism, i.e., the rate at which organisms process energy from their environment, a phenomenon widespread across animal taxa (Arnqvist et al., 2010;Ballard et al., 2007;Ballard and Pichaud, 2014;Havird et al., 2019;Hood et al., 2018;James et al., 2016;Wolff et al., 2014).Specifically, mtDNA sequence variants are linked to functional metabolic differences in fish (Chapdelaine et al., 2020;Flight et al., 
2011;Healy et al., 2019), birds (Scott et al., 2011), and mammals (Fontanillas et al., 2005), including humans (Amo and Brand, 2007;Dato et al., 2004;Niemi et al., 2003;Tranah et al., 2011).These mtDNA variants are often correlated with environmental factors such as temperature and altitude (Storz et al., 2010).However, other studies attempting to link mitochondrial function to mitochondrial DNA (mtDNA) sequence variation or environmental factors have offered mixed reports (Amo and Brand, 2007;Flight et al., 2011;Fontanillas et al., 2005;Hicks et al., 2012).The results here point to several potentially fruitful research directions.We have identified how nonsynonymous mutations in the mitochondrial genome associate with variation in whole-organism metabolism (including CytB, ND1, ND5 and ND6).A next step will be to characterize the molecular details of how these changes affect molecular function.It would also be beneficial to describe how variation in cellular oxygen consumption rate scales up to determine whole-organism metabolic rate across a range of temperatures, thus identifying potential mismatches across levels of organization that may impact organismal performance (Gangloff and Telemeco, 2018).While the interconnected processes that shape organismal and population-level responses to environmental variation do not lend themselves to simple narratives, and many molecular processes interact to produce the emergent ecotypic divergences at the phenotypic level, it is clear that the mitochondria play a central role even as that role may change across populations and ecological contexts (Fig. 
1).Research within well-characterized natural systems, such as these garter snake populations, can offer illustrative case studies of how mitochondria respond to their environments, and thus impact physiological pathways and evolutionary patterns, creating variation in life histories and aging.Despite the complexities underlying observed variation in mitochondrial function, recent work has demonstrated examples of how evolution and plasticity in mitochondrial function across populations within a species can shape life histories.For example, evidence from Drosophila has demonstrated the effect of temperature on components of the ETC and has linked mtDNA variants to metabolic thermosensitivity (Pichaud et al., 2012), to differences in whole-organism metabolic rates (Kurbalija Novii et al., 2015), and to fitness-related traits (Ballard et al., 2007;Pichaud et al., 2011;Pichaud et al., 2010).In general, studies in birds and mammals demonstrate that mitochondria of longer-lived species are more efficient in ATP production, produce less reactive oxygen species, and demonstrate increased antioxidant capacities (Barja and Herrero, 2000;Ku et al., 1993;Lambert et al., 2007).While some studies in lizards and snakes demonstrate a similar pattern (Olsson et al., 2008;Robert et al., 2007), the extent to which these results are generalizable across vertebrate taxa is not yet known.The diversity of life-history traits and immense variation in longevity demonstrated by reptiles, both within and among species, make these taxa ideal candidates for understanding how variation in mitochondrial physiology drives this variation in whole-organism traits (reviewed in Hoekstra et al., 2019).Such work has moved to the forefront with a recent focus on the ecological and evolutionary significance of aging processes in wild populations (reviewed in Nussey et al., 2013;Fletcher and Selman, 2015;Gaillard and Lematre, 2020).Over evolutionary time, differential mortality rates are a selective 
force in shaping genetic structure.This results in divergence of a variety of physiological networks that shape, ultimately, patterns of aging and longevity in different habitats (Monaghan et al., 2008;Stojkovi et al., 2017).Such selective pressures can have differential effects on the nuclear and mitochondrial genomes (McKenzie et al., 2019;Wolff et al., 2014).Genetic variation in the mitochondrial genome is known to drive mitochondrial function in many species (Ballard and Melvin, 2010;McKenzie et al., 2019;Novelletto et al., 2016) and we find this in our system as well.Whole organism metabolic rate varies with the mitochondrial genome haplogroups we identified in this study.T. elegans individuals with the introgressed T. sirtalis mitochondrial genome had the lowest metabolic rate and had 68 amino acid changes in the ETC genes relative to the T. elegans mitochondrial genomes.As species divergence are a continuation of population divergence, this introgression provides additional insight into how genetic variation can alter mitochondrial function.Whether the lower metabolic rate in our snakes with the introgressed mitochondrial genome is due to the fixed amino acid changes between the species or a mismatch between the coadapted nuclear and mitochondrially-encoded ETC proteins that could alter function of the mitochondria (Burton et al., 2013;Haenel, 2017;Rawson and Burton, 2002;Toews et al., 2014;Wolff et al., 2014) will require further comparisons to T. sirtalis individuals.Building on previous work in this system, the current study tests three primary hypotheses about how variation in mtDNA and mitochondrial function relate to variation in life-history traits and aging within this system (Fig. 
1): (1) First, we test whether rates of cellular oxygen consumption in isolated immune cells exhibit patterns that are consistent with the hypothesis that cellular processes drive whole-organism senescence and aging, and if these patterns differ between the SA and FA ecotypes and between sexes.By measuring basal, ATP-production associated, and maximal rates of cellular oxygen consumption, we further test for evidence that phenotypic divergence is dependent on a specific aspect of oxidative phosphorylation within immune cells.The energetics of these cells are particularly important given their essential role in modulating disease and infection, important factors contributing to senescence (Metcalf et al., 2019).We predict that SA snakes will maintain levels of cellular oxygen consumption across age, whereas the FA snakes will show a decline with age, especially in ATP-associated rates, possibly due to continual degradation of electron transport chain functionality from accumulating oxidative damage and reduced DNA repair mechanisms (Robert and Bronikowski, 2010;Schwartz and Bronikowski, 2013). ( 2) Second, we expand our mitochondrial genomics dataset to quantify mtDNA genetic structure across the landscape and test whether mtDNA haplotypes, and alleles at a nonsynonymous SNP in the Cytochrome B (CytB) gene correlate with aging ecotypes. 
(3) Third, we test the hypothesis that variation in mtDNA correlates with whole-organism variation in metabolic rates, suggesting a pathway linking mitochondrial genetic variation in mtDNA to whole-organism energetics.We first test whether different haplotypes differ in resting metabolic rate.Then, we test the effects of the nonsynonymous SNP in CytB on resting metabolic rate.The CytB gene encodes a component of complex III of the ETC, and was previously found to segregate between these life-history ecotypes (Schwartz et al., 2015).This SNP results in an amino acid substitution from isoleucine (aliphatic, hydrophobic) to threonine (hydrophilic) on a region that comes into close contact with a nuclear-encoded subunit (Schwartz et al., 2015).We combine previously published and new data on whole-organism resting metabolic rates (oxygen consumption) to test for the effects of this nonsynonymous mutation in three populations where we find heterogeneity at this nucleotide, thus allowing us to disentangle the effects of shared environment (population) from sequence variation (SNP).We predict that this SNP will correlate with variation in whole-organism metabolic rate, demonstrating a putatively adaptive difference between the derived and ancestral sequence.By utilizing this integrative data setfrom genes to organelles to whole organisms to populationsin a known life-history context, we are able to test hypotheses across levels of organization to provide a more complete picture of the complicated story of mitochondria and life history (Havird et al., 2019).", + "mtDNA DiversityUnlike the nuclear genome, which requires both paternal and maternal contributions, mtDNA is inherited solely from the maternal lineage.It is unclear what advantage a uniparental mtDNA transmission confers, but one possibility is to minimize the number of distinct genomes to maximize the efficiency of a multi-genomic system (Hill et al. 
2019).In fact, humans have developed complex, redundant mechanisms to ensure uniparental inheritance of mtDNA (DeLuca and O'Farrell 2012; Rojansky et al. 2016).Paternal mitochondria from sperms that enter into the egg during fertilization are actively and selectively eliminated via mitophagy through two E3 ligases, PARKIN, and MUL1 (Rojansky et al. 2016).PARKIN and MUL1 serve redundant purposes, and mitophagy becomes insufficient to eliminate paternal mtDNA only in the absence of both (Rojansky et al. 2016).Even though oocytes have at least a thousand-fold more mitochondria than a sperm cell (Rojansky et al. 2016) and heteroplasmy levels would be very low if paternal mtDNA were to contaminate the embryo, the results can still be non-trivial.However, challenging this notion, a recent study provides evidence of potential paternal transmission (Luo et al. 2018), but awaits further corroborating studies (Lutz-Bonengel and Parson 2019).MtDNA exhibit a higher mutation rate than nuclear DNA, leading to significant population-level mtDNA polymorphisms (van Oven and Kayser 2009; Wallace 1999; Wallace and Chalkia 2013).In fact, the co-evolution of the mitonuclear genomes has been proposed to be driven by mtDNA mutations that select for compensatory changes in the nuclear genome (Havird and Sloan 2016).Populations that share similar mtDNA polymorphisms can be clustered into distinct haplogroups that are designated using all letters of the alphabet (i.e., A through Z).The mtDNA haplogroups represent major branch points on the mitochondrial phylogenetic tree that have strong regional ties around the globe, thus supporting the concept of a 'mitochondrial eve' (Wallace 1999).Haplogroups present inherently different mitonuclear interactions (Zaidi and Makova 2019), which eventually affect the aging process (Wolff et al. 
2016).For example, one haplogroup commonly found in Ashkenazi Jews can interact with a specific enrichment of an amino acid sequence in complex I, and result in altered susceptibility to type 2 diabetes mellitus (Gershoni et al. 2014).The effect of mitonuclear compatibility on lifespan is influenced by environmental cues in flies (Drummond et al. 2019).It is unclear if mitonuclear compatibility is invariable throughout an organism's life, or antagonistically pleiotropic during aging, making it a difficult moving target to understand.", + "Background: The accumulation of mitochondrial DNA (mtDNA) mutations, and the reduction of mtDNA copy number, both disrupt mitochondrial energetics, and may contribute to aging and age-associated phenotypes.However, there are few genetic and epidemiological studies on the spectra of blood mtDNA heteroplasmies, and the distribution of mtDNA copy numbers in different age groups and their impact on age-related phenotypes.In this work, we used whole-genome sequencing data of isolated peripheral blood mononuclear cells (PBMCs) from the UK10K project to investigate in parallel mtDNA heteroplasmy and copy number in 1511 women, between 17 and 85 years old, recruited in the TwinsUK cohorts.", + "DiscussionTwo significant questions are raised by the findings that mitochondrial DNA can integrate into the nucleus.Firstly, is this an extraordinarily rare event or is it occurring continually and at high frequency?Secondly, can such an event have pathological consequences to the organism?", + "PhylogenyThe mtDNA is maternally inherited (120) by offspring through the oocyte cytoplasm; namely, the mother transmits her mtDNAs to all of her offspring, and her daughters transmit their mtDNAs to the next generation.This is the consequence of the fact that the mature oocyte such as mouse (304) or bovine (144) contains lOO-1,000 times more mtDNA than is found in somatic cells.Hence, the few sperm mtDNAs that enter the egg (130) have little effect on the 
genotype.The maternal inheritance results in sequentially diverged mtDNA polymorphism of modern human, as shown in Figure 2. The polymorphism derives from the combinations of small deletions and additions of <14 bp in noncoding region and base substitutions including some point mutations in coding region.There have been few reports on distinct correlation between mitochondrial morphology and human aging, except changes in number and size of mitochondria associated with age.Concerning the gross structure of mitochondria, the overwhelming importance of the cell nucleus in mitochondrial biogenesis should be noted, because the major parts of mitochondrial proteins are encoded by nuclear genes that are stable during life with the efficient repair mechanism for nDNA.Early data on DNA polymorphism detected by restriction endonuclease (263) have suggested that the evolutionary change of mtDNA in higher animals occurs mainly by nucleotide substitution rather than by deletion and insertion.The mtDNA nucleotide sequence evolves 6-17 times faster than comparable nuclear DNA gene sequences (51,52,405).Rapid evolution of mtDNA of higher primates including human, 0.02 base substitutions per site per million years, was calculated from the restriction map of mtDNA (51).Because orthodox recombination mechanism appears to be absent in mtDNA (128), germline mutation seems to go down to posterity as maternal inheritance from our common ancestor (57).", + "A number of conclusions may be drawn from these results.Firstly, the data begin to answer the question of how closely mtDNA replication is kept in synchrony with nuclear DNA replication: it would appear to be regulated not by direct coupling to the nuclear DNA replication, but rather by the cell mass to be serviced by mitochondria.", + "It may be that high mtDNA levels are indeed indicative of compromised mitochondria, but that the underlying defects are unrelated to alterations in the DNA sequence.Alternatively, elevated quantities 
of mtDNA might be associated with increased metabolic requirements of the embryo, rather than organelles of suboptimal function.It is possible that embryos produced by older oocytes are under some form of stress and therefore have larger energy requirements.Functional experiments will be required to address these questions.Whatever the underlying basis, the current study has unequivocally demonstrated that female reproductive aging is associated with changes in the mtDNA content at the blastocyst stage.", + "Age-associated alterations of the mitochondrial genome occur in several different species; however, their physiological relevance remains unclear.The age-associated changes of mitochondrial DNA (mtDNA) include nucleotide point mutations and modifications, as well as deletions.In this review, we summarize the current literature on age-associated mtDNA mutations and deletions and comment on their abundance.A clear need exists for a more thorough evaluation of the total damage to the mitochondrial genome that accumulates in aged tissues. 
1997 Elsevier Science Inc.", + "Mitochondrial genetics: One underexplored avenue for determining maternal risk for preterm birth involves the influence of the mitochondrial genome. The high mutation rate of mitochondrial DNA (mtDNA), together with the fact that most of its encoded proteins are evolutionarily conserved, allowing for the selection of neutral or beneficial variants, has generated interest in defining human mtDNA variations and their roles in human biology [58].", + "Clearly, as mitochondrial metabolic and genetic therapies advance for treating mitochondrial disease, they will also be available to enhance the personal lives of others. However, mitochondrial genetic variation appears to have been one of the primary factors that permitted our ancestors to adapt to new environments, survive adverse conditions, and multiply throughout the globe. Is it possible that by taking over control of individual mtDNA variation, we might also be setting our species on the road to functional decline and ultimately extinction? Mitochondrial therapeutics and performance enhancement: It is now clear that not all mtDNA variation is deleterious. Indeed, about 25% of all ancient mtDNA variation appears to have caused functional mitochondrial changes and thus been adaptive. Those mtDNA variants that are adapted to warm climates result in tightly coupled OXPHOS, thus maximizing ATP output and minimizing heat production. The presence of these mtDNAs permits maximum muscle performance but also predisposes sedentary individuals who consume excess calories to multiple problems. They would be prone to be overweight and their mitochondria would generate excessive ROS, thus making them susceptible to a variety of degenerative diseases, cancer and premature aging. Partially uncoupled mitochondria generate more heat, but at the expense of ATP production. Individuals with these variants are better able to tolerate the cold, and are less prone to obesity. They also generate 
less ROS, making them resistant to degenerative diseases and aging. Finally, the mitochondria are why we breathe. Hence, mitochondrial variation might be an important factor in individual predisposition to altitude sickness.", + "Human mtDNA codes for 13 essential polypeptide components of the mitochondrial oxidative phosphorylation (OXPHOS) system. mtDNA undergoes strict maternal inheritance, resulting in the absence of bi-parental recombination (Elson et al., 2001) and has a high mutation rate (Tuppen et al., 2010). As such, the evolution of mtDNA is characterised by the emergence of distinct lineages (or haplogroups) (Hernstadt et al., 2002). This results in high levels of mtDNA variation at the population level despite its rather small size, which is also illustrated by the large number of sub-haplogroups (van Oven and Kayser, 2009). Africa" + ], + [ + "A number of additional ethical implications must be considered. Associating financial investments with the prevention of disease, especially where reproductive decisions are involved, requires sensitivity, caution, and ethical rigor. Funding decisions based on imputed cost-savings must not result in implicit pressure on individuals to violate personal ethics to reduce financial burden on society. As discussions regarding prenatal testing have demonstrated, 35 there is a risk that \"routinization\" of testing may lead to social or medical expectations of testing in all eligible individuals. These expectations, if linked with financial incentives for the health system, could risk applying implicit pressure on serious, and potentially irreversible, personal decisions. Such expectations, if applied at the population level, could risk becoming normalized, compromising the values of informed consent and individual autonomy. With regard to pregnancies affected by a genetic condition identified through population carrier screening, we modeled the decision to terminate affected pregnancies conservatively (0.50). This is despite the literature 
suggesting rates above 0.90 for elective TOP for conditions such as Down syndrome 33 and SMA. 34We recognize this issue is controversial, and that laws and ethical positions vary considerably between countries/ jurisdictions.Variations in population attitudes based on age, religion, and other factors, as well as the criticality of preserving individual choice, were acknowledged in adopting this highly conservative estimate.", + "The use of genetic testing from pre-conception through adulthood is expanding rapidly.As a result of this expansion, new ethical issues are emerging related to genetic testing and informed consent.These new issues create ethical challenges for nurses and all healthcare providers.Currently expanding areas include newborn screening and genetic testing of children.These new ethical challenges will be described below.The use of genetic testing from pre-conception through adulthood is expanding rapidly. Psychological risks for parents who are carriers may include parental guilt.", + "Ethnic and cultural backgrounds may also play a role in the decisions that families make regarding prenatal testing.Moyer et al. (1999) concluded that Caucasian women more often undergo prenatal diagnoses than African American or Asian women, or Latinas.Furthermore, Awwad et al. 
(2008) found American couples less inclined to involve extended relatives in the prenatal decision-making process than Native Palestinian couples.Both of these examples clearly indicate that cultural differences can impact the ways in which families negotiate prenatal decisions.Further research needs to investigate how different families engage in such discussions and decision-making processes, especially as prenatal testing becomes more common and better able to predict or prevent a wider range of genetic conditions.Tightly closed ethnic groups remain at high risk of serving as carriers for genetic mutations, but the management of this possibility varies greatly.For example, some Ashkenazi Jewish groups use screening for mutations for Tay-Sachs disease (TSD) as the basis for rabbinical marriage advice; whereas, children born to Amish families in Pennsylvania more often present with glutaric aciduria type 1 (GA1) but, given their beliefs, parents tend not to accept prenatal testing because of the implication of abortion (McKusick, 2000).Researchers studying factors that contribute toward a couple's choice to undergo prenatal testing have determined that partners base their decision upon several factors, including, but not limited to: parental beliefs about abortion, attitudes regarding disability and their \"perceptions of the usefulness of having the information revealed by genetic tests\" (Moyer et al., 1999, p. 522).Abortion beliefs constitute a key issue in the decision-making process.Even though a majority of parents receiving abnormal prenatal test results terminate their pregnancies (Redlinger-Grosse, Bernhardt, Berg, Muenke, & Biesecker, 2002), Moyer et al. 
noted that, when asked, more families reported that they would make use of prenatal testing than would be willing to terminate a pregnancy.The decision to continue or terminate a pregnancy after prenatal testing Downloaded by [University of the Sunshine Coast] at 10:32 05 August 2017 comprises a joint decision between both parents (e.g., Awwad et al., 2008;Beeson & Golbus, 1985); however, the nature of the conversations leading to the decision and the involvement of extended family members in the decisionmaking process remains highly understudied.The Genetic Divide(s) and CommunicationThe ability of scientists to \"map\" disease through several generations (Collins, 1999) raises practical and ethical issues of access to resulting opportunities and creates family communication challenges.Currently, prenatal testing for chromosomal diseases has become increasingly common (Moyer et al., 1999).Options such as pre-implantation genetic diagnosis (PGD) can identify over 1,250 disease-related mutations creating an opportunity for parents to select unaffected embryos for implantation in the womb (R. M. 
Green, 2008).Test results provide potential parents with information that may lead to decisions involving intervention in the genetic makeup of future children.Although some families welcome such options, others may be unable or unwilling to consider such procedures, due to fi nancial concerns or moral/ethical/religious beliefs.", + "Privacy IssuesFinally, privacy issues should be seriously considered when the use of genetic testing is contemplated, especially with respect to whole-genome sequencing of healthy people.It is an unanswered question under what circumstances, to what extent, and by what means genetic data should be incorporated into the medical record.Although easy access to such data could be helpful to providers in improving patient care, it remains to be seen how other parties (eg, insurance companies) might act on the data in ways that do not benefit patients.The US Congress acted to prohibit discrimination by employers and health insurers on the basis of genetic testing with the Genetic Information Nondiscrimination Act in 2008, but further safeguards will undoubtedly be needed as the health implications of genetic data become clearer.", + "The ethical evaluation of genetic testing in children is traditionally based on the balance of clinical benefits and risks (American Society of Human Genetics Board of Directors and the American College of Medical Genetics All correspondence concerning this article should be addressed to Benjamin Wilfond, MD, Treuman Katz Center for Pediatric Bioethics, Seattle Children's Hospital, Metropolitan Park West M/S: MPW 8-2, 1100 Olive Way, Room 876, Seattle WA 98101, USA.E-mail: benjamin.wilfond@seattlechildrens.org Board of Directors, 1995;Andrews, Fullerton, Holtzman, & Motolsky, 1994;Clarke, 1994;Wertz, Fanos, & Reilly, 1994).In the early 1990s, when there were only scant data about children who had received genetic tests results, the presumption was to give greater weight to the potential risks and to restrict 
testing.However, this criterion is not necessarily consistent with the general practice of respecting broad parental discretion in health care decisionmaking for and on behalf of their children.In general, parents are the presumed decision makers for their children and their decisions are respected unless they are abusive or neglectful (Buchanan & Brock, 1989;Goldstein, Freud, & Solnit, 1979;Ross, 1998).The tension between assessments of benefits and risks made by health care providers and policy makers, and the procedural respect owed to parental authority will be clearly tested as the ability to conduct and interpret whole-genome sequencing and related technologies gain in momentum.Ethical Considerations in Developing Policy for ''Comprehensive'' Genomic TestingIn the near future, genomic testing is likely to become more accessible and will provide both information about the risks of common conditions such as heart disease, diabetes, and hypertension as well as predictions about individual responses to specific pharmaceuticals and other medical therapies (Aspinall & Hamermesh, 2007).Over time, the number and range of conditions for which such testing is available is likely to expand to include more behavioral traits, ranging from information about anxiety and depression, to attention and addiction (Rothstein, 2005).Objective Ethical evaluation of genetic testing in children is traditionally based on balancing clinical benefits and risks.However, this focus can be inconsistent with the general practice of respecting parental decision-making about their children's health care.We argue that respect for parental decision-making should play a larger role in shaping pediatric genetic testing practices, and play a similar role regarding decisions to use emerging genomic technologies.Methods Genomic testing involves the examination of thousands of DNA markers spanning genes throughout the genome and their interrelationships, yielding virtually limitless 
interpretations.We presume that parents and providers should proceed cautiously in applying genomic testing in children, as we explore how genomic testing will stress the fault lines of the traditional ethical analysis.Results Empirical data about the psychosocial risks and benefits of genetic testing of children do not reveal serious harms, yet virtually no such data exist yet about genomic testing.Unless empirical social and behavioral data indicate that genomic testing is highly likely to cause serious harms to the children, parental decisions to obtain comprehensive genomic testing in their children should be respected.Once comprehensive genomic testing of children becomes routine, resultant information may be more easily integrated by families than anticipated.Conclusions Research on the social and behavioral impact of comprehensive genomic testing on children and their families is needed to further inform parents, clinicians, and policy makers.Objective Ethical evaluation of genetic testing in children is traditionally based on balancing clinical benefits and risks.However, this focus can be inconsistent with the general practice of respecting parental decision-making about their children's health care.We argue that respect for parental decision-making should play a larger role in shaping pediatric genetic testing practices, and play a similar role regarding decisions to use emerging genomic technologies.Methods Genomic testing involves the examination of thousands of DNA markers spanning genes throughout the genome and their interrelationships, yielding virtually limitless interpretations.We presume that parents and providers should proceed cautiously in applying genomic testing in children, as we explore how genomic testing will stress the fault lines of the traditional ethical analysis.Results Empirical data about the psychosocial risks and benefits of genetic testing of children do not reveal serious harms, yet virtually no such data exist yet about 
genomic testing.Unless empirical social and behavioral data indicate that genomic testing is highly likely to cause serious harms to the children, parental decisions to obtain comprehensive genomic testing in their children should be respected.Once comprehensive genomic testing of children becomes routine, resultant information may be more easily integrated by families than anticipated.Conclusions Research on the social and behavioral impact of comprehensive genomic testing on children and their families is needed to further inform parents, clinicians, and policy makers.To the extent that ''personal meaning'' gains wider acceptance as a legitimate criterion for expanding the availability of new tests and applications of genomic technology, the current policies and practices of restricting some genetic testing of children and mandating other tests will need to be reevaluated.There will be some parents who will find the information that becomes available through new technologies and data useful in shaping their parenting practices, while others will be more skeptical of their value.These disparate parental judgments may be independent of professional assessments of clinical validity and utility.Extrapolating from the empirical data about predictive genetic testing of children in at-risk families discussed earlier, we speculate that once comprehensive genomic testing of children becomes routine, the information may be more easily integrated by families than might be predicted.This is not meant to imply that whatever information parents want about their children should be provided carte blanche.Clearly, education and counseling will be crucial to ensure that families understand the limitations of the information.However, restrictions and mandates should be based on a criterion of risk of serious harm (Diekema, 2004).Given the lack of data confirming harm and the related data that indicate children may fare better than anticipated, such restrictions and mandates cannot 
be justified.Policies and practices will also need to clarify the role of the older adolescent in the decision-making process, although the issues related to balancing and assessing parental and adolescent interests and preferences goes beyond the focus of this article.This is also not meant to ignore the professional and moral obligation to educate parents and to help parents make good decisions on behalf of their children.It is morally appropriate for providers to strongly recommend particular tests in infancy and young childhood (i.e., PKU testing), and to strongly discourage other tests (e.g., ApoE testing of children for adult onset Alzheimer disease and heart disease because ApoE is not predictive but only provides an increased relative risk and has limited sensitivity and specificity) (Roberts, Cupples, Relkin, Whitehouse, & Green, 2005).Selective and directive recommendations are a routine aspect of pediatric practice.However, it will become increasingly important for professional organizations to begin to reconcile their support for mandatory genetic testing for some conditions and their support for restrictions for other conditions with the broad discretion that parents have and need in the health care arena in order to promote their children's well-being.What limits should be imposed, if any, need to be determined prior to commercial feasibility.In this article, we consider how genetic testing decisions for children have been made traditionally and how the anticipation of comprehensive genomic testing in the near future will stress the fault lines of traditional approaches.The potential for comprehensive genomic testing in children could shift the equilibrium towards expanding or reducing parental discretion, and forces us to reexamine the evidence for our genetic testing policies and practices.We will highlight specific domains where further empirical social and behavioral research is necessary to inform policy and practice.", + "Prenatal genetics is 
largely practiced by maternal-fetal medicine specialists due to severe deficiency in the number of qualified clinical geneticists.Recent years have witnessed a tremendous growth in the demand for chorionic villous sampling and amniocentesis for the diagnosis of single gene disorders.At KFSHRC alone, the number of prenatal samples that are tested for single gene disorders has increased from 5 in 2004 to 250 in 2013.Therapeutic abortion is permitted by law if performed within 120 days from the time of fertilization in order to comply with the Islamic view of the timing of ensoulment (Alkuraya and Kilani 2001).However, the approved indication for the procedure, which is \"severe malformation\", must be authorized by three attending-level physicians.The definition of \"severe\" is left to the discretion of the medical team after consulting with the family.For example, intellectual disability is a common indication for many therapeutic abortion procedures.Contrary to commonly held views, we have shown that early prenatal diagnosis is the method of choice for couples who had one or more children with single gene disorders, as long as they are provided with a culturally sensitive genetic counseling that addresses their religious and cultural concerns (Alkuraya and Kilani 2001).Nearly 45% of these couples opt for early prenatal diagnosis compared to 35% who choose preimplantation genetic diagnosis (PGD) (Alkuraya 2013a).PGD is available freely at KFSHRC but is also provided by the private sector.Noninvasive prenatal screening using cell-free fetal DNA in maternal blood is quickly becoming integrated in prenatal care.KFSHRC offers this test routinely to all pregnant women regardless of their perceived risk and the MOH is considering making this test available throughout its vast network of hospitals and medical centers.", + "Social and psychological implications of accessing genetic services and information.", + "A corollary of the predictive power of genetic information is 
the limited ability to prevent or treat many conditions with significant genetic factors involved.Indeed, virtually all of the complex ethical and legal issues relevant to genetic testing would disappear if there were effective preventions or treatments available for genetic conditions.The ability to predict future disease in conjunction with a limited ability to do much about it has important social and psychological implications that must be addressed in conducting genetic research.", + "Interpretations of the literature will likely mirror the priorities and evaluative tendencies of the reader.Are you willing to accept the overall trends in genetic and genomic testing evaluation and to trust that the existing clinical approaches will apply informed consent appropriately while identifying and supporting the rare individual who has a serious adverse response to the testing?If so, you might advocate that attention be turned more toward other issues relevant to the effective implementation of genetic and genomic testing.Or do you feel a strong need to understand in more detail the possible psychosocial harms of the testing, particularly the subtler impacts or responses of individuals who do not fit the norm?In that case, you would likely encourage renewed and innovative efforts to study the psychosocial consequences of the receipt of risk information from genetic and genomic testing.", + "Other social issues require our attention if genomic medicine is to benefit our patients.How should genetic tests be regulated?What, if any, are the appropriate uses of direct-to-consumer marketing of genetic tests?The Internet has recently had a proliferation of genetic-testing sites that feature claims grounded in greed and pseudoscience, rather than in data or reality.How will health care providers and the public distinguish between these and responsible testing services, whether they are available through the Internet or in the hospital?", + "Environmental FactorsAs widespread 
use of genetic testing increases, it is the responsibility of the medical community to ensure its equitable use across socioeconomic and cultural spectrums." + ], + [ + "To overcome the lack of phenotypic information in the 1000 Genomes Project, The ClinSeq Project was developed to pilot large-scale genome sequencing for research in genomic medicine at the National Institutes of Health Clinical Research Center in Bethesda, MD. 40 The study seeks to enroll 1000 individuals who will be evaluated for personal health status and family history. The project aims to:", + "We (Hein, Schierup and Wiuf) have published a 300-page book on molecular population genetics titled Gene Genealogies, Sequence Variation and Evolution (Oxford University Press), and are presently developing a tutorial in association mapping that we hope to publish as a booklet in 2006 and are also involved in a very large EU collaboration (Holland, Denmark, Iceland and UK) to find susceptibility genes for breast and prostate cancer. In comparative genomics, the most fundamental investigation is to find genes in a pair of aligned genomes.", + "Key bioinformatic steps to take a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stage regions in man (see Chapter 5). Similar issues also exist in the establishment of true orthology between genes in different species, where one is identified to play a role in a disease model. If two genes are truly orthologous, their evolution closely follows patterns of speciation (Fitch, 2000). In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily on Bioinformatics for Geneticists, Second Edition. Edited by Michael R. 
Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.This fully indexed but semi-intelligibleBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C4CH 1 BIOINFORMATICS CHALLENGES FOR THE GENETICISTbook of life immediately began to serve as a valuable framework for integration ofgenetic and biological data. 
However, knowledge of the genome sequence did notimmediately clarify the nature and structure of human genetic variation.", + "Methods for DNA sequencing are constantly being improved, with the ultimate goal of sequencing a human genome in a single day for a cost of about US $1,000, an end that appears to be in sight (Hayden, 2014).In the very near future, whole-genome sequencing will be routinely available for clinical purposes, perhaps even beginning at birth.The major challenge ahead is the interpretation of this information.How do our genes interact with each other, and how does the environment contribute to the development of health and disease?What are the individual and societal implications of knowing our genome sequence?The answers to these and other important questions will unfold in the years ahead.Thus, we are truly in an era where precision medicine may soon become a reality.", + "Characteristics of genotyping and sequencing technologies", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. 
We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.", + "IntroductionSince the first human genome was sequenced at an estimated cost of $150 million,several advanced high-throughput techniques some with lower costs - have come up. Atthe same time, this resulted in a data deluge and a critical need to connect theheterogeneous sequencing data and associated annotations structural and functional with the basic tenets of biology or molecular basis of development and disease.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. 
Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond.", + "Ample time was allotted to answer questions and a copy of \"A Guide to Your Genome\" (National Human Genome Research Institute 2007) was provided to further assist participants' understanding and ability to communicate results with family members or others.", + "Whether within 10 or 12 (or 8) years, such inexpensive sequencing will change both research and clinical care, and progress does not need to wait even that long.The National Human Genome Research Institute (NHGRI) plans to focus a significant portion of the sequencing capacity that it supports on medical sequencing.For instance, the NHGRI and the National Cancer Institute are actively considering a Human Cancer Genome Project, 22 which would use DNA sequencing and a host of other genome technologies to gather information about the mutations and functional abnormalities found in multiple samples from many major types of cancer.Medical sequencing should also provide important insight into many other diseases.For example, sequencing all exons in X-linked mental retardation syndromes may reveal much about their etiology.Sequencing 
candidate genes in the extremes of the distribution of quantitative traits should also reveal much of importance about common diseases, such as coronary atherosclerosis. 23With further technological advances, other previously unimaginable research approaches will become real.", + "Key bioinformatic steps totake a genetic study from an initial linkage or association to laboratory genotyping are illustrated. The reader should note the role of genomic sequence as a common thread through every stageregions in man (see Chapter 5). Similar issues also exist in the establishment oftrue orthology between genes in different species, where one is identified to play arole in a disease model. If two genes are truly orthologous, their evolution closelyfollows patterns of speciation (Fitch, 2000).In general terms, the approaches we describe can be applied to sequence data from any collection of organisms, but our emphasis here is primarily onBioinformatics for Geneticists, Second Edition. Edited by Michael R. Barnes2007 John Wiley & Sons, Ltd ISBN 978-0-470-02619-9 (HB) ISBN 978-0-470-02620-5 (PB)\u0002C106CH 6 COMPARATIVE GENOMICSquestions of relevance to human genetics. We begin, in Section 6.2 by presenting anoverview of genome structure and content, providing a context for the subsequentdiscussions.4Assembling a View of theHuman GenomeColin A. M. SempleBioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK4.1 IntroductionThe miraculous birth of the draft human genome sequence took place againstthe odds. It was only made possible by parallel revolutions in the technologiesused to produce, store and analyse the sequence data, and by the development ofnew, large-scale consortia to organize and obtain funding for the work (Watson,1990). The initial flood of human sequence has subsided as the sequencing centres have sequenced genomes from other mammalian orders and beyond." 
+ ] + ], + "task_id": [ + "EC2BFCD8E06079A3E595114822D2A44D", + "59AA8E8D7A97CFF47C6CFD65629F29F9", + "C58CFF41F7422B321DF88A110E278FD5", + "BCD1175CCB27FBA1E6F9D7670B17E527", + "5341FE6588C6175BC8A688A483928BC0" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_general_1_two.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_1_two.json new file mode 100644 index 0000000..e40ab00 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_1_two.json @@ -0,0 +1,109 @@ +{ + "question": [ + "Which genes give a predisposition to developing T1D?", + "What is ensembl", + "Which database can I use for genetic, genomics, phenotype, and disease-related data generated from rat research?", + "What is RGD?", + "What resources can I use to do pathway analyses?" + ], + "answer": [ + "The genes that give a predisposition to developing T1D include the HLA class II genes, the insulin locus (INS) VNTR on 11p15, the cytotoxic T-lymphocyte-associated protein 4 (CTLA4) locus on 2q31, the PTPN22 gene on 1p13, and the interleukin 2 receptor alpha (IL2RA) on 10p15. Other genes associated with T1D include those in the HLA region on chromosome 6, specifically HLA alleles DR3-DQ2 or DR4-DQ8, and alleles at HLA-DQB1.", + "Ensembl is a joint project between the EBI and the Wellcome Trust Sanger Institute. It is a publicly available web resource that contains automatically annotated genomes and integrates data from a wide range of biological research sources. The Ensembl database, launched in 1999, was the first to provide a window on the draft genome, curating the results of a series of computational analyses. It provides detailed information about the human genome including variants like SNPs, insertions, deletions and somatic mutations for several species. 
Ensembl also provides a DAS reference server giving access to a wide range of specialist annotations of the human genome.", + "You can use the Rat Genome Database (RGD) for genetic, genomic, phenotype, and disease-related data generated from rat research.", + "The background text does not provide information on what RGD is.", + "You can use various resources for pathway analyses such as Ingenuity Pathway Analysis (IPA), Disease Association Protein-Protein Link Evaluator (DAPPLE), Pathway Studio, MetaCore, REACTOME, LitInspector, NetPath, Predictive Networks, GeneGo, Database for Annotation, Visualization and Integrated Discovery (DAVID), PATHWAYASSIST, E! Ensemble, Protonet, Pandora, Pubmatrix, KEGG, Reactome, g:Profiler, Gene Ontology, Biocarta, GenMAPP, WebGestalt, Gene Set Enrichment Analysis (GSEA), ClueGo, CluePedia, Cytoscape, RegulonDB, WIT.UM-BBD, EcoCyc, MetaCyc, Enzyme and metabolic pathways database, and Gene-Set Enrichment Analysis (GSEA)." + ], + "contexts": [ + [ + "A. Genetic ScreeningWe have discussed above the genetic component of T1D.The genetic susceptibility to T1D is determined by genes related to immune function with the potential exception of the insulin gene (434).The genetic susceptibility component of T1D allows some targeting of primary preventive care to family members of diagnosed T1D patients, but there is no complete inheritance of the disease.Nevertheless, the risk for developing T1D compared with people with no family history is 10 -15 times greater.Although 70% of individuals with T1D carry defined risk-associated genotypes at the HLA locus, only 3-7% of the carriers of such genetic risk markers develop diabetes (3).II. 
THE GENETICS OF TYPE 1 DIABETESA comprehensive overview of genetic data in mouse and human is beyond the scope of this article.Instead, we will focus on how the various susceptibility genes and environmental triggers can fit in a mechanistic model for T1D etiology.", + "T1D risk is strongly influenced by multiple genetic loci and as yet poorly understood environmental factors.The disease is highly heritable, with first-degree relatives of cases displaying approximately 15 times greater risk than the general population.Concordance in monozygotic twins is also as high as 50%.A number of genetic determinants of T1D had already been established before the era of genome-wide association studies.The strongest genetic factors include the HLA class II genes, encoding highly polymorphic antigen-presenting proteins that account for almost 50% of the genetic risk for T1D.Other established loci confer more modest, but substantial effects, such as the insulin locus (INS) VNTR on 11p15 [44][45][46][47], the cytotoxic T-lymphocyte-associated protein 4 (CTLA4) locus on 2q31 [48][49][50][51] and the PTPN22 gene on 1p13 [52,53].More recently, convincing statistical support for an additional T1D susceptibility locus on 10p15 harboring the interleukin 2 receptor alpha (IL2RA) was uncovered, utilizing non-coding SNPs [54][55][56].However, the majority of other associations in the pre-GWA era remain controversial [57][58][59], and linkage studies have established the fact that there are no other loci with an effect size approaching that of HLA.", + "Clearly genetics play an important role in the T1D disease process as both MZ and DZ twins have the same environmental exposures but different concordance rates and length to diagnosis of the second twin.Numerous genes have been associated with T1D, the most significant being the HLA region on chromosome 6 [6].More than 90% of type 1 diabetics carry HLA alleles DR3-DQ2 or DR4-DQ8 compared to no more than 40% of the general population 
[7].Alleles at HLA-DQB1 are known to be, in part, protective [8].Single nucleotide polymorphisms (SNPs) are also associated with T1D.A recent genome-wide association study of approximately 2,000 patients with each of 7 common, chronic diseases, including T1D, and 7,000 shared controls confirmed the association of SNPs in 5 previously identified regions with T1D and discovered 5 novel associations.However, the authors concluded that these regions, with the exception of the HLA on chromosome 6, confer only modest effects on T1D, and ''the association signals so far identified account for only a small proportion of overall familiality'' [9].These results suggest that additional genetic variants contribute to inheritance of T1D.Type 1 diabetes (T1D) tends to cluster in families, suggesting there may be a genetic component predisposing to disease.However, a recent large-scale genome-wide association study concluded that identified genetic factors, single nucleotide polymorphisms, do not account for overall familiality.Another class of genetic variation is the amplification or deletion of .1 kilobase segments of the genome, also termed copy number variations (CNVs).We performed genome-wide CNV analysis on a cohort of 20 unrelated adults with T1D and a control (Ctrl) cohort of 20 subjects using the Affymetrix SNP Array 6.0 in combination with the Birdsuite copy number calling software.We identified 39 CNVs as enriched or depleted in T1D versus Ctrl.Additionally, we performed CNV analysis in a group of 10 monozygotic twin pairs discordant for T1D.Eleven of these 39 CNVs were also respectively enriched or depleted in the Twin cohort, suggesting that these variants may be involved in the development of islet autoimmunity, as the presently unaffected twin is at high risk for developing islet autoimmunity and T1D in his or her lifetime.These CNVs include a deletion on chromosome 6p21, near an HLA-DQ allele.CNVs were found that were both enriched or depleted in patients with or 
at high risk for developing T1D.These regions may represent genetic variants contributing to development of islet autoimmunity in T1D.Type 1 diabetes (T1D) tends to cluster in families, suggesting there may be a genetic component predisposing to disease.However, a recent large-scale genome-wide association study concluded that identified genetic factors, single nucleotide polymorphisms, do not account for overall familiality.Another class of genetic variation is the amplification or deletion of .1 kilobase segments of the genome, also termed copy number variations (CNVs).We performed genome-wide CNV analysis on a cohort of 20 unrelated adults with T1D and a control (Ctrl) cohort of 20 subjects using the Affymetrix SNP Array 6.0 in combination with the Birdsuite copy number calling software.We identified 39 CNVs as enriched or depleted in T1D versus Ctrl.Additionally, we performed CNV analysis in a group of 10 monozygotic twin pairs discordant for T1D.Eleven of these 39 CNVs were also respectively enriched or depleted in the Twin cohort, suggesting that these variants may be involved in the development of islet autoimmunity, as the presently unaffected twin is at high risk for developing islet autoimmunity and T1D in his or her lifetime.These CNVs include a deletion on chromosome 6p21, near an HLA-DQ allele.CNVs were found that were both enriched or depleted in patients with or at high risk for developing T1D.These regions may represent genetic variants contributing to development of islet autoimmunity in T1D.", + "Background: The immune system matures mainly during the postnatal period through breastfeeding, and is partly modified by nutritive factors.The manner by which early feeding practices influence the development of type 1 diabetes mellitus (TID) is not clear.Also the use of genetics in prognostic evaluation of the disease has not be studied intensely. 
Aim:To study the relationship between early infant feeding patterns and susceptibility to TID through the HLA-DRB1 and DQ allelic polymorphism and identify the genes of high predictive value in the prognostic model. Methods:The study included 24 diabetic children with TID matched with 21 controls.All the children were exposed to detailed history of the disease process and anthropometry for weight, height and body mass index.Blood samples were collected from all 45 cases for measuring HLA-DRB1and HLA-DQB1allelic polymorphism for the susceptible genes of HLA-DRB1 0301, 0302, 0401 and 0402 and HLA-DQB1*02 and for the protective genes HLA-DRB1 07,*13 by polymerase chain reaction sequence specific primer (PCR-SSP) done by genomic DNA extraction using Genomic DNA purification kits.Results: Allelic polymorphism for the susceptible genes of HLA-DRB1 were shown to be higher in the diabetic group compared to the control group especially for the 0302 and 0401 alleles at P<0.05, but was not significant for HLA-DRB1-0301 and 0402 at P>0.05.HLADRB1*07 and HLADRB1*13 were significantly higher in the breastfed healthy but not in the diseased or the formula fed groups (p<0.001)(p<0.05).The detection of HLADRB1 0401 allele was more with retinopathy and HLADRB1 0301 allele with microalbuminuria. 
Conclusions:The absence of protective genes is a strong predictor of TID.Susceptibility genes are influenced by early feeding patterns and in turn affect the clinical course of the disease that could be of prognostic value in TID.", + "More than 60 susceptibility loci have been identified (Table 1).The greatest genetic risk (50%) for T1D is conferred by alterations to immune genes, especially those encoding the classical HLAs (Ounissi-Benkalha and Polychronakos, 2008).Other genetic loci (Table 1) are believed to influence population-level risk for T1D, although it is poorly understood how these non-HLA loci contribute to disease susceptibility (Ram et al., 2016a).The genetics of type 1 diabetesThere is a strong genetic risk to T1D.This is exemplified by (Redondo et al., 2001) who demonstrated a strong concordance of genetic inheritance (65%) and T1D susceptibility in monozygotic twin pairs.That is, when one sibling is afflicted, there is a high probability that the other twin will develop T1D by the age of 60 years.Additionally, autoantibody positivity and islet destruction was observed after a prospective long-term follow-up of monozygotic twins of patients with T1D, despite initial disease-discordance among the twins (Redondo et al., 2008).", + "Family and twin studies indicate that a substantial fraction of susceptibility to type 1 diabetes is attributable to genetic factors.These and other epidemiologic studies also implicate environmental factors as important triggers.Although the specific environmental factors that contribute to immune-mediated diabetes remain unknown, several of the relevant genetic factors have been identified using two main approaches: genome-wide linkage analysis and candidate gene association studies.This article reviews the epidemiology of type 1 diabetes, the relative merits of linkage and association studies, and the results achieved so far using these two approaches.Prospects for the future of type 1 diabetes genetics research are 
considered.Family and twin studies indicate that a substantial fraction of susceptibility to type 1 diabetes is attributable to genetic factors.These and other epidemiologic studies also implicate environmental factors as important triggers.Although the specific environmental factors that contribute to immune-mediated diabetes remain unknown, several of the relevant genetic factors have been identified using two main approaches: genome-wide linkage analysis and candidate gene association studies.This article reviews the epidemiology of type 1 diabetes, the relative merits of linkage and association studies, and the results achieved so far using these two approaches.Prospects for the future of type 1 diabetes genetics research are considered.", + "CONCLUSIONThe greatest genetic risk (both increased risk, susceptible, and decreased risk, protective) for type 1 diabetes is conferred by specific alleles, genotypes, and haplotypes of the HLA class II (and class I) genes.There are currently about 50 non-HLA region loci that also affect the type 1 diabetes risk.Many of the assumed functions of the non-HLA genes of interest suggest that variants at these loci act in concert on the adaptive and innate immune systems to initiate, magnify, and perpetuate -cell destruction.The clues that genetic studies provide will eventually help lead us to identify how -cell destruction is influenced by environmental factors.While there is extensive overlap between type 1 diabetes and other immune-mediated diseases, it appears that type 1 and type 2 diabetes are genetically distinct entities.These observations may suggest ways to help identify causal gene(s) and, ultimately, a set of disease-associated variants defined on specific haplotypes.Unlike other complex human diseases, relatively little familial clustering remains to be explained for type 1 diabetes.The remaining missing heritability for type 1 diabetes is likely to be explained by as yet unmapped common variants, rare variants, 
structural polymorphisms, and gene-gene and/or gene-environmental interactions, in which we can expect epigenetic effects to play a role.The examination of the type 1 diabetes genes and their pathways may reveal the earliest pathogenic mechanisms that result in the engagement of the innate and adaptive immune systems to produce massive -cell destruction and clinical disease.The resources established by the international T1DGC are available to the research community and provide a basis for future discovery of genes that regulate the earliest events in type 1 diabetes etiology-potential targets for intervention or biomarkers for monitoring the effects and outcomes of potential therapeutic agents.", + "IntroductionOver 60 loci in the genome contribute to genetic predisposition to type 1 diabetes (T1D) [1][2][3][4][5] in which insulin deficiency results from an autoimmune attack against insulin-producing beta cells of the pancreatic islets.Heterogeneity in the disease aetiology is recently acknowledged and immunological processes leading to T1D in individuals diagnosed later in life appear different from the processes in individuals having disease onset in early childhood, in which B cells are involved in the pathological process in the pancreas [5].Different genes and genetic variants may thus affect disease course at varying ages, also suggested by the high diagnosis age correlation (r 2 = 0.95) in Finnish monozygotic twins concordant for T1D [6].Of the known T1D risk loci, however, only the HLA locus and a few non-HLA loci, have been associated with age at diagnosis [7][8][9][10].Genetic risk score combines risk-increasing alleles into a single score and the genetic risk score for T1D has already been suggested for clinical use for screening of infants at highest T1D risk [11].All disease-susceptibility variants are included in the score, but only a few known T1D variants have stronger effects in individuals with early-onset disease [10].Genes affecting type 1 
diabetes diagnosis age / A. Syreeni et al.Genome-wide search for genes affecting the age at diagnosis of type 1 diabetes.", + "The risk for T1D is strongly influenced by multiple genetic loci and environmental factors.The disease is heritable, with first-degree relatives of patients with T1D being at 15-fold greater risk for developing the condition than the general population.", + "Type 1 DiabetesThe higher type 1 diabetes prevalence observed in relatives implies a genetic risk, and the degree of genetic identity with the proband correlates with risk (22)(23)(24)(25)(26). Gene variants in one major locus, human leukocyte antigen (HLA) (27), confer 50-60% of the genetic risk by affecting HLA protein binding to antigenic peptides and antigen presentation to T cells (28).Approximately 50 additional genes individually contribute smaller effects (25,29).These contributors include gene variants that modulate immune regulation and tolerance (30)(31)(32)(33), variants that modify viral responses (34,35), and variants that influence responses to environmental signals and endocrine function (36), as well as some that are expressed in pancreatic b-cells (37).Genetic influences on the triggering of islet autoimmunity and disease progression are being defined in relatives (38,39).Together, these gene variants explain ;80% of type 1 diabetes heritability.Epigenetic (40), gene expression, and regulatory RNA profiles (36) may vary over time and reflect disease activity, providing a dynamic readout of risk.", + "Type 1 diabetes risk stratification by T1D family history and HLA genotyping", + "Genetics. 
T1DM is a polygenic disease that is influ enced by environmental factors.Genetic risk factors are necessary but not sufficient for disease, as their pene trance is low.The concordance rate of T1DM among monozygotic twins is reported to be only 30%, although a recent study that involved longterm followup suggested that this percentage might be higher 47,48 .", + "Presently, 48 other genomic regions, referred to as susceptibility regions, have been found to also confer susceptibility to T1D (Burren et al., 2011;Steck and Rewers, 2011;Yang et al., 2011;Bluestone et al. 2010;Poicot et al., 2010;Todd et al., 2010;Todd et al., 2007).But their contribution is minimal in comparison to the HLA locus (Gillespie, 2014).Also, research has shown that less than 10% of individuals with HLA-conferred diabetes susceptibility actually progress to clinical disease (Knip andSiljandera, 2008, Wenzlau et al., 2008).This implies that additional factors are needed to trigger and drive -cell destruction in genetically predisposed persons (Knip and Siljandera, 2008).Environmental factors are believed to influence the expression of T1D.The reason being that in the case of identical twins, if one twin has T1D, the other twin only has it 30%-50% of the time, despite having the same genome.This means that other factors contribute to the prevalence or onset of this disease (Knip et al., 2005)." + ], + [ + "Zerbino, D. R., Achuthan, P., Akanni, W., Amode, M. R., Barrell,D., Bhai, J., Billis, K., Cummins, C., Gall, A., Girn, C. G., Gil,L., Gordon, L., Haggerty, L., Haskell, E., Hourlier, T., Izuogu, O.G., Janacek, S. H., Juettemann, T., To, J. K., Laird, M. R., Lavidas, I., Liu, Z., Loveland, J. E., Maurel, T., McLaren, W., Moore,B., Mudge, J., Murphy, D. N., Newman, V., Nuhn, M., Ogeh, D.,Ong, C. K., Parker, A., Patricio, M., Riat, H. S., Schuilenburg,H., Sheppard, D., Sparrow, H., Taylor, K., Thormann, A., Vullo,A., Walts, B., Zadissa, A., Frankish, A., Hunt, S. 
E., Kostadima,M., Langridge, N., Martin, F. J., Muffato, M., Perry, E., Ruffier,M., Staines, D. M., Trevanion, S. J., Aken, B. L., Cunningham,F., Yates, A., and Flicek, P.: Ensembl 2018, Nucl.", + "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carriedout and the way the results are displayed. 4.4.1 EnsemblEnsembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the WellcomeTrust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbardet al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide awindow on the draft genome, curating the results of a series of computational analyses.Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequenceassemblies as its starting point, but it is now based upon NCBI assemblies. TheEnsembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that arecomputationally predicted (by the GENSCAN gene prediction program) and alsosupported by a significant BLAST match to one or more expressed sequences orproteins. Ensembl also identifies the positions of known human genes from publicsequence database entries, usually using GENEWISE to predict their exon structures.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 
4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.Ensembl provides a DAS referenceserver giving access to a wide range of specialist annotations of the humangenome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy ahypothesis (top-down data mining), or to interrogate a database in order togenerate new hypotheses based on rigorous statistical correlations (bottom-updata mining). Domain (protein) A region of special biological interest within a single proteinsequence.", + "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carriedout and the way the results are displayed. 4.4.1 EnsemblEnsembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the WellcomeTrust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbardet al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide awindow on the draft genome, curating the results of a series of computational analyses.Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequenceassemblies as its starting point, but it is now based upon NCBI assemblies. TheEnsembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that arecomputationally predicted (by the GENSCAN gene prediction program) and alsosupported by a significant BLAST match to one or more expressed sequences orproteins. 
Ensembl also identifies the positions of known human genes from publicsequence database entries, usually using GENEWISE to predict their exon structures.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.Ensembl provides a DAS referenceserver giving access to a wide range of specialist annotations of the humangenome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy ahypothesis (top-down data mining), or to interrogate a database in order togenerate new hypotheses based on rigorous statistical correlations (bottom-updata mining). Domain (protein) A region of special biological interest within a single proteinsequence.", + "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carriedout and the way the results are displayed. 4.4.1 EnsemblEnsembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the WellcomeTrust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbardet al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide awindow on the draft genome, curating the results of a series of computational analyses.Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequenceassemblies as its starting point, but it is now based upon NCBI assemblies. 
TheEnsembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that arecomputationally predicted (by the GENSCAN gene prediction program) and alsosupported by a significant BLAST match to one or more expressed sequences orproteins. Ensembl also identifies the positions of known human genes from publicsequence database entries, usually using GENEWISE to predict their exon structures.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.Ensembl provides a DAS referenceserver giving access to a wide range of specialist annotations of the humangenome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy ahypothesis (top-down data mining), or to interrogate a database in order togenerate new hypotheses based on rigorous statistical correlations (bottom-updata mining). 
Domain (protein) A region of special biological interest within a single proteinsequence.", + "EnsemblEnsembl is a publicly available web resource that contains automatically annotated genomes.It is integrated with other available biological databases like Jasper for binding motifs.It is a much larger web resource than T1Dbase, and contains general information about the human genome including variants.These include SNPs, insertions, deletions and somatic mutations (Alterations in DNA that occur after conception, meaning that they are not inherited) for several species.Data from Ensembl can be accessed in a number of ways.The names of all the SNPs that occur in the T1D susceptibility regions can be collected from Ensembl using the Biomart tool (Kinsella et al., 2011).To achieve this, the coordinates of the T1D regions obtained from T1Dbase are uploaded to the biomart query page which allows one to search the genome browser and retrieve data like the names, chromosomal positions, and genic positions (referred to as \"consequence to transcript\", in Ensembl) of the SNPs.The SNP genic positions tell if a SNP is located within a gene, adjacent to a gene or whether they occur in inter-genic positions between gene coding regions, as well as the particular genes in which they are located.Advantages of Ensembl:There is a number of advantages to using Ensembl. (i) It is a larger web resource than T1Dbase and integrates data from a wide range of biological research sources into its database.Therefore, available information is quite comprehensive. (ii) Genic positions for 99% of the variants obtained from T1Dbase could be retrieved. 
(iii) Ensembl contains quality checks for genetic variants in its variation pipeline.A variant is flagged as failed if certain quality criteria are not met, for instance if none of the variant alleles match the reference allele of the variant.Generally, Ensembl was found to give more detailed information regarding the genic positions of variants compared to T1Dbase.Information about genes, including gene names, chromosomal coordinates, biotype (coding or non-coding), and number of splice variants, can also be retrieved from Ensembl.", + "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carriedout and the way the results are displayed. 4.4.1 EnsemblEnsembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the WellcomeTrust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbardet al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide awindow on the draft genome, curating the results of a series of computational analyses.Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequenceassemblies as its starting point, but it is now based upon NCBI assemblies. TheEnsembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that arecomputationally predicted (by the GENSCAN gene prediction program) and alsosupported by a significant BLAST match to one or more expressed sequences orproteins. Ensembl also identifies the positions of known human genes from publicsequence database entries, usually using GENEWISE to predict their exon structures.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. 
Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.Ensembl provides a DAS referenceserver giving access to a wide range of specialist annotations of the humangenome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy ahypothesis (top-down data mining), or to interrogate a database in order togenerate new hypotheses based on rigorous statistical correlations (bottom-updata mining). Domain (protein) A region of special biological interest within a single proteinsequence." + ], + [ + "The database contains trait data for severalhundred phenotypes including common inbreds, consomics, 80 BXD recombinant inbreds,hybrids, and over 60,0000 mutagenised mice including ENU mutants and several knockoutlines. SOPs are employed for phenotypic data acquisition. This publicly accessible databaseis an excellent example of one that can be made significantly more valuable to thecommunity with a standard in place for the reporting of these protocols. PhenoSITE (http://www.gsc.riken.go.jp/Mouse/phenotype/top.htm) provides baselinephenotype data for three inbred strains and their F1 hybrids.", + "The MouseGenome Database (MGD) has structured their mouse genomic data in terms of the Mammalian Phenotype Ontology[10]. Similarly, the Rat Genome Database (RGD) [11] alsodeveloped a phenome database, integrated with its genomicdata. In humans, the GeneNetwork (WebQTL) provides adatabase of complex traits with mappings to quantitative traitloci [12]. And several studies have focused on integratinghuman phenome and genome resources. For example, Butteet al. 
created a large-scale phenomegenome network byintegrating the Unied Medical Language System with humanmicroarray gene expression data [13]; and Aerts et al.de la Cruz N, Bromberg S, Pasko D, Shimoyama M, Twigger S, et al. (2005)The Rat Genome Database (RGD): Developments towards a phenomedatabase. Nucleic Acids Res 33: D485D491. Wang J, Williams RW, Manly KF (2003) WebQTL: Web-based complex traitanalysis. Neuroinformatics 1: 299308. Butte AJ, Kohane IS (2006) Creation and implications of a phenomegenome network. Nat Biotechnol 24: 5562. Aerts S, Lambrechts D, Maity S, Van Loo P, Coessens B, et al. (2006) Geneprioritization through genomic data fusion. Nat Biotechnol 24: 537544.", + "Shur-Jen Wang provided an overview of the Rat Genome Database, which provides a platform to improve model selection.The database includes a quantitative phenotype tool that provides expected ranges for a phenotype of interest across strain groups, drawing from published literature and other deposited data and resources.This tool can also be used to link phenotypic variation to damaging genomic variants, which are shown in parallel.", + "This is apublicly available database that contains phenotypes from hundreds of studies and alsolists basal gene expression data for many tissues, including brain regions. 3.4. Why Mice? The European house mouse (Mus musculus) has served as human analogue in basicresearch for many decades. Ethical and logistic limitations preclude almost all toxicogeneticresearch in humans. Genome-wide association studies in humans have revealed the geneticbasis for individual differences in several diseases; however, the exact mechanisms for geneaction are difficult to ascertain. Thus, the use of animal models to uncover mechanismsbecomes the approach [61,62].", + "A number of public data resources are also being established to provide freelyaccessible microarray data on drug- and toxicity-related phenotypes. 
For example,the Chemical Effects in Biological Systems (CEBS) database (Mattes et al. , 2004) isa highly recommended resource that accommodates gene-expression profiles, andproteomics and metabolomics data and allows very complex queries across morethan 100 experiments, mostly performed in rat liver. These experiments include datagenerated after exposure to members of key drug classes, including the antidiabetic,troglitazone (Rezulin); the antiepileptic, valproic acid; and the antidepressive, fluoxetine (Prozac) among other drugs (Mattes et al. , 2004).", + "Although these as yet include only alimited number of laboratories and genotypes, they all try to enlist larger groupsof researchers and to expand the animalmodels covered, and they are publicly available. It will be beneficial for the redesign ofnew behavioral measures that raw behavioral data will be available as well in thesedatabases. Access to this information will allowexperimenters to extract from the databasethe size of the genotype-by-laboratory interaction relevant to their experiment.", + ", 2014; see Section 9). GeneNetwork is a database that enables searching for 4000 phenotypes from multiple studies in the BXD, HXB, and in other recombinant inbred rodent families, as well as in other model organismsand even humans (Mulligan et al. , 2017). GeneNetwork employed asomewhat dierent strategy than MPD in that it did not rely solely onresearchers submitting their data. Instead the database operators extracted the data from the scientic literature and integrated them into auniform format (Chesler et al. , 2003).In the future, these two dataresources, the per strain phenotype data storage with thorough protocoldocumentation in MPD, the Rat Genome Database, and genetic analysissuite in GeneNetwork.org will be more closely integrated (Mulliganet al. , 2017). The public database of the International Mouse Phenotyping221Neuroscience and Biobehavioral Reviews 87 (2018) 218232N. Kafka et al. 
Consortium (IMPC) is intended to be the rst truly comprehensivefunctional catalogue of a mammalian genome (Morgan et al. , 2009;Koscielny et al. , 2014).", + "Useful Databases for the Exploration of Relationships Among Genetic Variations and Specific Phenotypes.", + "Shimoyama M, De Pons J, Hayman GT, Laulederkind SJ, Liu W, Nigam R, Petri V, Smith JR,Tutaj M, Wang S-J, The Rat Genome Database 2015: genomic, phenotypic and environmentalvariations and disease, Nucleic acids research 43(D1) (2014) D743D750. [PubMed: 25355511][24]. Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, Meehan TF, Weninger WJ,Westerberg H, Adissu H, High-throughput discovery of novel developmental phenotypes, Nature537(7621) (2016) 508. [PubMed: 27626380][25].", + "All data presented in this paper were deposited in the online databaseGeneNetwork (www.genenetwork.org), an open web resource that containsgenotypic, gene expression, and phenotypic data from several genetic referencepopulations of multiple species (e.g. mouse, rat and human) and various celltypes and tissues.35;36 It provides a valuable tool to integrate gene networks andphenotypic traits, and also allows cross-cell type and cross-species comparativegene expression and eQTL analyses.", + "This is apublicly available database that contains phenotypes from hundreds of studies and alsolists basal gene expression data for many tissues, including brain regions. 3.4. Why Mice? The European house mouse (Mus musculus) has served as human analogue in basicresearch for many decades. Ethical and logistic limitations preclude almost all toxicogeneticresearch in humans. Genome-wide association studies in humans have revealed the geneticbasis for individual differences in several diseases; however, the exact mechanisms for geneaction are difficult to ascertain. 
Thus, the use of animal models to uncover mechanismsbecomes the approach [61,62].", + "The Mouse Phenome Database would be a natural choice: it already provides acontrolled vocabulary for representing phenotype measurements and enforces correct strain nomenclature tofacilitate accurate comparisons across studies. Effectiveintegration of phenotypic and genetic data, facilitated bythe databases and analytical tools presented in this review,is critical to realizing the promise of the CC as it existstoday.", + "A number of public data resources are also being established to provide freelyaccessible microarray data on drug- and toxicity-related phenotypes. For example,the Chemical Effects in Biological Systems (CEBS) database (Mattes et al. , 2004) isa highly recommended resource that accommodates gene-expression profiles, andproteomics and metabolomics data and allows very complex queries across morethan 100 experiments, mostly performed in rat liver. These experiments include datagenerated after exposure to members of key drug classes, including the antidiabetic,troglitazone (Rezulin); the antiepileptic, valproic acid; and the antidepressive, fluoxetine (Prozac) among other drugs (Mattes et al. , 2004).", + "The GeneNetwork database provides open accessto BXD and other RI strain derived microarray data, single nucleotide polymorphism (SNP) data,and phenotypic data for quantitative trait loci analysis and gene expression correlation analyses. Gene expression data were exported for manually selected probes in the PDNN hippocampusdatabase (Hippocampus Consortium M430v2), and the PDNN whole brain database (INIA BrainmRNA M430). 
The Hippocampus database was chosen as one of the most elaborate brain databases,as well as most highly recommended dataset on GeneNetwork itself (http://www.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112).", + "The Mouse Phenome Database would be anatural choice: it already provides a controlled vocabulary for representing phenotypemeasurements and enforces correct strain nomenclature to facilitate accurate comparisonsacross studies. Effective integration of phenotypic and genetic data, facilitated by thedatabases and analytical tools presented in this review, is critical to realizing the promise ofthe CC as it exists today.", + "RGD database (www.rgd.mcw.edu) provides updated genetic,genomic, phenotype, and disease data generated from mouse, rat,and human. A total of 450 genes were downloaded using cardiomyocyte, myocyte, and cardiomyopathy as the keywords. GWAS Catalog (www.ebi.ac.uk/gwas) database provides published genome-wide association studies in human populations. Atotal of 126 genes associated with cardiomyopathy disease with pvalue 5 10 6 were downloaded using cardiomyopathy asthe key word. IMPC database (http://www.mousephenotype.org/) provides detailed phenotype data for the knockout mouse. A total of 636genes were downloaded using cardiomyocyte, myocyte, andcardiomyopathy as key words. collaborative eort [19].", + "A number of public data resources are also being established to provide freelyaccessible microarray data on drug- and toxicity-related phenotypes. For example,the Chemical Effects in Biological Systems (CEBS) database (Mattes et al. , 2004) isa highly recommended resource that accommodates gene-expression profiles, andproteomics and metabolomics data and allows very complex queries across morethan 100 experiments, mostly performed in rat liver. 
These experiments include datagenerated after exposure to members of key drug classes, including the antidiabetic,troglitazone (Rezulin); the antiepileptic, valproic acid; and the antidepressive, fluoxetine (Prozac) among other drugs (Mattes et al. , 2004).", + "A number of public data resources are also being established to provide freelyaccessible microarray data on drug- and toxicity-related phenotypes. For example,the Chemical Effects in Biological Systems (CEBS) database (Mattes et al. , 2004) isa highly recommended resource that accommodates gene-expression profiles, andproteomics and metabolomics data and allows very complex queries across morethan 100 experiments, mostly performed in rat liver. These experiments include datagenerated after exposure to members of key drug classes, including the antidiabetic,troglitazone (Rezulin); the antiepileptic, valproic acid; and the antidepressive, fluoxetine (Prozac) among other drugs (Mattes et al. , 2004)." + ], + [ + "d", + "Summary", + "b gg n n e e r c S", + "G", + "d", + "npg", + "Hence only G2D and Gentrepid will be discussed here.", + "F, forward; R, reverse.", + "~~~.", + "n.d.n.d.", + "3KR", + "What Is Relevant?", + "R5. Ubuntu philosophya)R5. Ubuntu philosophy (See page 66)", + "RSet in 10/12 pt Dutch801BT by Aptara\u0002Inc., New Delhi, IndiaDisclaimerThe publisher and the author make no representations or warranties with respect to the accuracy orcompleteness of the contents of this work and specically disclaim all warranties, including withoutlimitation warranties of tness for a particular purpose. No warranty may be created or extended bysales or promotional materials. The advice and strategies contained herein may not be suitable forevery situation. 
This work is sold with the understanding that the publisher is not engaged inrendering legal, accounting, or other professional services.", + "vid", + "npg", + "HG LG HG LG HG LG HG LG HG LG HG LG HG LG", + "rMZ" + ], + [ + "Pathway analysisSignificant over-representation of biochemical pathways from KEGG and Reactome as well as gene ontology terms were taken from the output of g:Profiler, http://biit.cs.ut.ee/gprofiler/ [15].Lists of genes (n > 10) pertaining to a given type of GxE interaction, i.e., either a particular phenotype or environmental factor, served as input to the pathway/ontology tool.g:Profiler was run with default settings.", + "Pathway EnrichmentPathway analyses were performed to explore possible biological mechanisms that may underlie the associations between the identified genes and aging pathways.We used The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, GO ontology, Pathway commons, and disease-associated genes from WebGestalt for our analyses (Wang et al. 2013).For each pathway, the hypergeometric test was used to detect the overrepresentation of our set of genes among all genes in the pathway.Lastly, FDR was controlled using the Benjamini-Hochberg procedure.In all cases, the complete set of proteincoding genes was used as the background.", + "Multiple exploratory dataanalysis will be used since different analysis can reveal different aspects of the data (Leung Y.F. ,Cavalieri D.). The program EASE (Expression Analysis Systematic Explorer) will furtheranalyze the data by looking at over-represented functional categories of genes in the network. Ingenuity Pathway Analysis will help to identify biological pathways that are relevant to thegenes of interest. The data will be analyzed using WebQTL which will link gene expressionwith behavioral data. 
Important specific genes found in the study will be further confirmed byreal time PCR.", + "Pathway analysisThe identified CpGs were annotated to nearest genes and evaluated for enrichment of gene-sets in the Reactome and the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways using Gene-Set Enrichment Analysis (GSEA) (http://www.broadinstitute.org/gsea/index.jsp).", + "Ingenuity Pathway Analysis (IPA)The IPA software (Ingenuity Systems, Inc.) was used to carry out the network composition analyses.The Ingenuity Canonical Pathways analysis was used to identify the most significant pathways that were set from the Ingenuity Pathway Analysis library.The significance of the association between a data set and the canonical pathway was measured in two ways: (1) a ratio of the number of molecules from the data set that map to the pathway divided by the total number of molecules that map to the canonical pathway was displayed, and (2) Fisher's exact test was used to calculate a p-value to determine the probability that the association between the genes in the dataset and the canonical pathway can be explained by chance alone [28].", + "Pathway analysisPathway analyses were carried out using the core analysis function of the Ingenuity Pathway Analysis software (IPA, Ingenuity Systems).We performed gene-based tests for association based on results from the PAR-dr and WL-dr discovery GWAS, using the Versatile Gene-based Association Study (VEGAS) software. 
(16) The full list of genes and gene-based p-values generated by VEGAS was uploaded into IPA for use as a reference set (16,965 genes were available for the PAR-dr analysis and 16,953 for the WL-dr analysis).From this list p-value cut-offs of 0.01 or 0.05 were used to identify IPA focus molecules (Supplemental Section 7).Networks generated by IPA provide insight into the molecular interactions of the focus molecules, independent of any predictions of biological function.", + "Inmetabolic pathways analysis , using bioinformatics toolssuch as RegulonDB, WIT.UM-BBD, EcoCyc,MetaCyc,Enzyme and metabolic pathways database, KEGG bythe researchers willprovide them with theencyclopaedic information about biochemical products ,substrates, catalysing enzymes,amino acids,carbohydrates, lipids and toxic compounds etc. and theirmetabolic pathways specific diseases related to thefailure in their functions. Bioinformatics tools likeKEGG, KEGG BRITE, Gene network database,Genepath help the researchers in analysis of genetic pathwaysand regulatory networks in such a ways that giveinformation about the genes, transcriptional factors,miRNA, genes encode enzymes involved in geneticrelated diseases.The techniques integrate the molecular information from thedatabases with simulation of metabolic networks. These methods also help in representation of genes, proteins andmetabolic pathways in combination with dynamic simulated environment. In this paper we reviewed someapplicable bioinformatics tools for analytical study of three types of pathways such as metabolic, genetics andsignalling pathways along with the information about their principle, work system and their direct access link to thedatabases and programs. 
This study helps scientists in fast, economic, high accuracy and large scale based outputs ofpathways analysis of their appropriate research involving the biochemical pathways.", + "Well-established methodologies such as Gene Set EnrichmentAnalysis (GSEA) [41] help in differentiating pathways as functionalunits from experimental populations. Manually curated pathwaysbased on expert knowledge and existing literature obtained fromthe Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/pathway.html) are another alternative measure used for validation [21]. Biological Network Inference from Microarray Data, Current Solutions, and AssessmentsTo evaluate the biological significance of a inference method,researchers explored an alternative measure based on Gene Ontology (GO) against functional, biological enrichment of a group ofgenes derived from inferred network modules [34].", + "Pathway analyses.We used two different programs for pathway analysis: Ingenuity (see URLs), version August 2012, application build 172788, content version 14197757) and the Disease Association Protein-Protein Link Evaluator (DAPPLE) 39 .", + "PATHWAYASSIST includes an automatedtext-mining tool, which enables the software to generate pathways from the entire PubMed database and other publicsources. Thus, we surveyed all published work in PubMedand extracted data on each candidate gene relating to itstranscriptional regulation, its binding partners and any othergene/protein that modifies or interacts with it. This analysiswas presented graphically and colour-coding genes identifiedin our study enabled easy identification of the genes lying inoverlapping pathways.", + "For example, Gene Ontology [1], Biocarta [2], GenMAPP[3] and KEGG [4] all allow a list of genes to be crossedwith biological functions and genetic networks, includingmetabolic, signalling or other regulation pathways. Basicstatistical analysis (e.g. 
, [5,6]) can then determinewhether a pathway is over-represented in the list, andwhether it is over-activated or under-activated. However,one can argue that introducing information on the pathway at this point in the analysis process sacrifices somestatistical power to the simplicity of the approach.", + "Gene Ontology and Pathway analysisData sets were interrogated using the Ingenuity Pathways Analysis (IPA) application (Ingenuity Systems, Redwood City, CA; http://www.ingenuity.com).IPA was used to identify enriched canonical pathways, gene networks, functional classes, and toxicity lists (molecules involved in known toxicity processes).", + "Analysing participating pathways is an important aspectof any genes functional analysis strategy. In this view,REACTOME (http://www.reactome.org) [13] is a crossreferenced, manually curated and peer reviewed pathwaydatabase. LitInspector (http://www.litinspector.org) [14]and NetPath (http://www.netpath.org/index.html) [15]allow one to access curated signal transduction related literature and interaction pathways respectively. PredictiveNetworks (http://predictivenetworks.org/) [16] integratesgene interactions and networks information from PubMedliterature and other online biological databases and presents it in an accessible and efficient user interface. Twoother noteworthy commercial tools are GeneGo andIngenuity IPA.", + ", 2011; Kim et al. , 2011b; Zhang et al. ,2011). A number of pathway analysis software packages are available such as PathwayStudio(http://www.ariadnegenomics.com/),and MetaCoreTM (http://www.genego.com/metacore.php). 
In such software packages, thealgorithms calculate the statistical signicanceof the expression changes across every group orpathway in the database, thus, allowing identication of groups or pathways most stronglyaffected by the observed expression changes(http://www.ariadnegenomics.com/technologyresearch/pathway-analysis/).", + "Network analyses.Network analyses were carried out using the Ingenuity Pathway Analysis tool 66 .P values for canonical pathways and functions were calculated from the observed number of candidate genes in the gene set, compared with the number expected under the null hypothesis and corrected (Bonferroni) for the number of pathways tested.", + "Pathway enrichment analysis.Pathway enrichment analysis for the predicted genomic key driver variants was performed using the ClueGo(v2.1.7) 74and CluePedia(v1.1.7) 75plugins in Cytoscape(v.3.1.0) 76with the GO database (29.02.2016 download).Pathways with a Bonferroni-corrected p-value are shown with full data in Supplementary Data 4. Pathway enrichment analysis for the coexpression modules from transcriptomic analysis was performed by R package goseq with default parameters 77 .", + "Pathway analysisFor the 85 learning-associated genes, we used a combination of bioinformatics software that included E! 
Ensemble, Protonet, Pandora, and Pubmed and Pubmatrix searches (Becker et al., 2003).We also used http://bind.cafor protein-protein interaction information.Using this approach (Burger et al., 2007;Velardo et al., 2004) we found information on 50 genes (Table 3 and Supplementary Table 3); the other 35 transcripts were expressed sequence tags (EST).", + "Finally, using the top 24 results, we conducted a pathway analysis with the Database for Annotation, Visualization and Integrated Discovery (http://david.abcc.ncifcrf.gov/).", + "Pathway analysis helps to add structure to the very large amount of data generated by microarrays.This type of analysis allows determining whether differentially methylated genes belong to predefined networks more than by chance alone.Gene ontology enrichment was performed using the Ingenuity Pathway Analysis (IPA) software (Ingenuity System).IPA compares a provided list of genes (differentially methylated genes in this case) to a reference list of genes included in various biological pathways.It provides a P value based on a hypergeometric test identifying over-represented gene ontology categories." + ] + ], + "task_id": [ + "029A427CEEBABE644F12EE390469B134", + "7C028B1D0013EA11574B094986ABE4C2", + "55562016699AFE4B8AD9A7F29A806CB5", + "C9B1B98F9207B79EBBC98790A769CB51", + "242918F32291CC085DEB319A7EE3284B" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_general_2.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_2.json new file mode 100644 index 0000000..ca4b36c --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_2.json @@ -0,0 +1,108 @@ +{ + "question": [ + "Once a sperm combines with an egg, what determines how traits are passed onto the resulting lifeform?", + "Why is genetic tracing matrilineal rather than patrilineal?", + "Explain the process of DNA replication and how it ensures accurate copying of genetic information during cell division.", + "What are the potential benefits and risks associated with gene editing technologies like CRISPR-Cas9?", + "How does one tell the difference between X and Y DNA, with respect to DNA tracing and determining QTLs?" + ], + "answer": [ + "The traits are determined by the combination of genes from both the sperm and the egg. This process involves meiosis, where each gamete (sperm and egg) contributes one chromosome to each pair, resulting in a zygote with a full complement of 23 chromosome pairs. The process of recombination or crossing over, where similar DNA sequences from the paired chromosomes swap genetic material, also plays a crucial role in determining the traits of the offspring. This results in a shuffling of genetic material and contributes to the genetic variation seen among offspring.", + "Genetic tracing is matrilineal due to the inheritance of mitochondrial DNA (mtDNA), which is passed from mother to all her children without any admixture from the father. This allows for a clear lineage tracing through the maternal line. In contrast, Y-DNA is passed from father to son, allowing for patrilineal tracing, but it does not provide information about the genetic contributions of other ancestors in a family tree.", + "DNA replication is a process where the DNA molecule creates two identical copies of itself. 
This process begins with the separation of the two strands of the mother cell DNA. New nucleotides are then assembled to form two double helices identical to the original one. This is facilitated by the base pairing rules where adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G). This ensures that each daughter cell receives an exact copy of the DNA. The replication process is crucial during cell division as it allows for the accurate transmission of genetic information from one generation of cells to the next.", + "The potential benefits of gene editing technologies like CRISPR-Cas9 include the ability to modify genes for the treatment of diseases, improvement of crop species, and the development of personalized drug or cell therapies. It can also be used for functional screening in the development of therapies and for the study of molecular causes of ageing. However, there are risks associated with these technologies. These include off-target effects, which can lead to unwanted mutations, and the potential for wide-ranging deletions or recombination events. There's also a risk of triggering a P53 response leading to apoptosis in cycling cells, and the potential for subjects to generate antibodies to Cas9, which could limit gene therapies. Furthermore, the long-term safety of CRISPR genome editing in humans is yet to be determined.", + "The text does not provide specific information on how to differentiate between X and Y DNA in the context of DNA tracing and determining QTLs." 
+ ], + "contexts": [ + [ + "Selection could occur at multiple levels, from germ cell generation and propagation to fertilization and early embryonic growth. Chromosomal abnormalities, including aneuploidy, were found in 10-20% of spermatozoa and oocytes (20) and in the cleaved embryo, with a 21% rate of abnormalities in preimplantation embryos (21). These findings led to a model for natural selection against chromosome abnormalities (21). Selection extends to the end of gestation: Only approximately 30% of all conceptions result in a live birth, with more than half of aborted fetuses containing chromosomal abnormalities (22), a number likely to be an underestimate because of technological limitations in measuring all possible mutations. But even in the very small fraction of germ cell duos that survive this withering genome attack and result in a live birth, a number of severe de novo mutations will still be found (23). The data on gross chromosomal alterations suggest that overall, mutation frequency early in life is very high. The functional consequence, however, is limited because of selection. Somewhat surprisingly, this picture points toward an initial decline in genomic alterations, allowing the adult individual to acquire a somatic genome optimally equipped to provide function.", + "The phenotype of the F1 hybrids is compared to those of the parental inbred strains to reveal dominance or semi-dominance relationships between the alleles that affect the phenotype. Phenotypic differences between reciprocal F1 hybrids indicate that one or more of the following factors may affect the trait: (1) sex linkage (X- or Y-linked traits), (2) genomic imprinting of QTLs that affect the phenotype, (3) prenatal maternal effects (effects of intrauterine environment), and/or (4) postnatal maternal or paternal effects (effects of maternal and/or paternal parenting behaviour on offspring).", + "Sex brings harmful alleles together into the same genetic background, allowing selection to more efficiently purge them from the population and potentially producing some offspring that are fitter than either parent. However, the benefit of recombining deleterious mutations may depend on the nature of the epistatic interactions between them. The mutational deterministic hypothesis (Kondrashov 1988) depends partly on this epistasis. In most plants and animals, sex is a necessary component of reproduction, and the question for evolutionary biologists is why reproductive mechanisms have evolved that way. In one of the experiments described next, evolutionary geneticists have nevertheless devised a way to compare evolution with and without recombination in the obligately sexual fruit fly. This disparity in investment is the basis for the twofold cost: asexual females hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. But sex usually also involves the basic process of physical recombination: the breakage and reunion of two different DNA or RNA molecules. Of these two processes, recombination is clearly the more widespread feature of sexual reproduction. A variety of reproductive systems, such as selfing and automixis, involve recombination but not outcrossing. 
In contrast, relatively few reproductive systems have outcrossing without recombination.", + "Crossing over-The swapping of genetic material that occurs in the germline. During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes cross over one another. Crossing over results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring. This process is also known as meiotic recombination. The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population. In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population. For example, the HbS mutation in the HBB gene (which produces the β subunit of hemoglobin) causes sickle cell disease when present in both alleles, a detrimental consequence, but protects against malaria when present in 1 allele, a beneficial consequence, ensuring that the mutation persists in populations in areas of the world where malaria is endemic. Genes are passed from parents to offspring via the process of meiosis by which gametes, the egg cells in the mother and the sperm cells in the father, are generated. Ordinarily, each cell has 23 pairs of chromosomes; the gametes have 23 unpaired chromosomes. In meiosis, the 23 pairs are split so that each gamete receives 1 chromosome from each pair (Figures 8 and 9). Two gametes (egg and sperm) ultimately join into a single cell, the zygote, which has the full complement of 23 chromosome pairs restored. If all goes well, the zygote gives rise to a live offspring. Recombination (meiotic recombination)-The swapping of genetic material that occurs in the 
germline. During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes recombine with one another. Recombination results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring. Also known as crossing over.", + "In the generation of gametes, crossing over regularly occurs, and genetic information is swapped between members of a chromosome pair. That doesn't matter within inbred animals, because the swapped parts are identical. In an F1 animal, however, the chromosomes of a particular pair are genetically different, one each having come from each parent. Each gamete produced will be unique, as will be each F2 zygote formed by uniting of the gametes from two F1 parents. An F2 group thus provides for expression of some genetic variability. This variability is limited to the allelic differences existing between the parent strains of the F1s, so that another F2, derived from different inbred strains, will express different genetic differences.", + "Sex brings harmful alleles together into the same genetic background, allowing selection to more efficiently purge them from the population and potentially producing some offspring that are fitter than either parent. However, the benefit of recombining deleterious mutations may depend on the nature of the epistatic interactions between them. The mutational deterministic hypothesis (Kondrashov 1988) depends partly on this epistasis. In most plants and animals, sex is a necessary component of reproduction, and the question for evolutionary biologists is why reproductive mechanisms have evolved that way. 
In one of the experiments described next, evolutionary geneticists have nevertheless devised a way to compare evolution with and without recombination in the obligately sexual fruit fly. This disparity in investment is the basis for the twofold cost: asexual females hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. But sex usually also involves the basic process of physical recombination: the breakage and reunion of two different DNA or RNA molecules. Of these two processes, recombination is clearly the more widespread feature of sexual reproduction. A variety of reproductive systems, such as selfing and automixis, involve recombination but not outcrossing. In contrast, relatively few reproductive systems have outcrossing without recombination.", + "Aberrant recombination patterns on chromosomes that have missegregated have also been identified as an important factor, in both male and female gametes (Table I). This is because recombination together with cohesion of sister chromatids establish the unique 'bivalent' chromosome structure where homologous partner chromosomes are tethered together, a configuration that is critical for their accurate segregation in meiosis I (Fig. 2A). The remarkable feature is that recombination occurs in foetal oocytes whereas chromosome segregation takes place decades later (Fig. 2A). Since mammalian oocytes are arrested at the G2/M transition (or dictyate stage), this raises the intriguing question of how the bivalent is maintained until the meiotic divisions.", + "Traditionally, it has been agreed that the final sex of an individual (phenotypic sex) depends on two sequential processes: the sex determination system of the species and the gonad differentiation process (Valenzuela, 2008). However, recently, these two seemingly distinct processes are viewed as part of a general process leading to gonad formation and sex ratios (Sarre et al. 
, 2004; Quinn et al., 2011; Uller and Helantera, 2011). However, we expect that only at this level, the most significant contributions brought by integrating epigenetics will be made. Concluding Remarks and Future Prospects Fish sex ratios are the result of a complex interaction between genetic, biochemical, and environmental interactions. The ultimate result of these interactions at the individual level is gender: male or female. However, at the population level, the combination of sex determination and differentiation sets the sex ratio. In turn, sex ratios define the reproductive capacity of populations and, if sex growth dimorphism exists, also the growth characteristics, something very important in an aquaculture context.", + "Obehav is, in turn, influenced by offspring genes and environment (Ogene and Oenvir respectively). Hence, indirect genetic effects (blue arrows) and direct genetic effects (red arrow) are important influencers of behaviour. B) Parent-offspring conflict theory predicts that parental resource investment and offspring solicitation behaviours are influenced by the fitness benefit to a focal individual (O), cost to a social partner such as a sibling (S1 and S2) or parent (P), and by their coefficient of relatedness (black arrows). Figure 2: Genomic imprinting can result in divergent phenotypes from the same genotype. A) A paternally imprinted gene, i.e. maternally expressed.", + "Because of the small contribution, through the sperm, of the paternal transcriptome to the fertilized zygote, and because of the stronger maternal contribution to child rearing in most model organisms, parental effects are typically thought of as synonymous with maternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading to differences in the birth weight of mice depending on the genotype of the mother (Cowley et al., 1989; Wolf et al. 
, 2011). Therefore, the resulting phenotypic patterns lag a generation behind the genetic transmission of the causal variants. The most well-studied parental genetic effects are caused by deposition of maternal transcripts into the egg prior to fertilization, resulting in differences in early embryonic development depending on the genotype of the mother. Certain genes have also been shown to respond to maternal influence after birth through genetically defined maternal behaviors (Weaver et al., 2004).", + "The phenotype of the F1 hybrids is compared to those of the parental inbred strains to reveal dominance or semi-dominance relationships between the alleles that affect the phenotype. Phenotypic differences between reciprocal F1 hybrids indicate that one or more of the following factors may affect the trait: (1) sex linkage (X- or Y-linked traits), (2) genomic imprinting of QTLs that affect the phenotype, (3) prenatal maternal effects (effects of intrauterine environment), and/or (4) postnatal maternal or paternal effects (effects of maternal and/or paternal parenting behaviour on offspring).", + "It was believed by many that for each trait variant we should expect to find a corresponding genetic change, or gene for that trait. Through historical happenstance the relationship between genes and traits was set up and treated as if it were one-to-one. But the production of a trait involves not only genes, but also their interactions with each other and the environment, and chance." + ], + [ + "distinguishing prenatal from postnatal maternal effects, see below). Maternal effects can account for a large proportion of phenotypic variance, especially during early life, and for some traits explain more variation than direct genetic effects [33, 97, 99, 100, 102-115]. However, maternal and offspring genotype are correlated (i.e. half their genes are shared), and in inbred lines they are fully confounded, thus separating the effects of their respective genotypes is difficult. 
To remove this confounding effect cross-fostering has been used, both in the laboratory and in the field [119, 131].", + "Using genetic markers, the pattern of inheritance can be tracked through families. For example, by analyzing a marker linked to the eye color gene in several generations, it is possible to determine from which grandparents a child has inherited its eye color alleles. More importantly, finding a marker linked to a disease can lead to location of the faulty gene causing the disease. Finding the gene is very valuable in the search for the cure. The distance between two loci can be expressed either as physical or genetic distance.", + "Although autosomal SNPs are commonly used as genetic markers to infer ancestry or race/ethnicity membership, haploid such as mitochondria, Y-DNA, and X-linked markers are also important to provide separate stories of ancestry of individuals from paternal and maternal sides [42,43]. Therefore, genetic structure created due to autosomal markers could be different from those of lineage markers (often influenced by political, social, and migration history of individuals/populations). Mitochondrial DNA or mtDNA haploid is the maternally inherited mitochondrial genome (mtDNA) [44]. All children inherit mtDNA from their mother, with no admixture from the father. Like Y-line DNA, mtDNA is passed intact from one generation to the next but through maternal line. a) Autosomal DNA (testing both sexes) markers: autosomal DNA tests utilize DNA from the 22 pairs of autosomal chromosomes. Autosomal DNA is inherited from both parents. Autosomal testing provides percentages of ethnicity using autosomal DNA SNP test (i.e., ancestry informative markers), and it is the most commonly used test to infer ancestry across diploid genome. b) Y-DNA or Y-SNPs (paternal line testing) markers: a haploid Y-DNA is the paternally inherited non-recombining portion of the Y chromosome, and it tests only for males. The Y-DNA testing tests the Y chromosome which is passed 
intact from father to son with no DNA from the mother.Y-DNA testing can then be used to trace direct paternal line.Y-DNA remains the same in each generation, allowing us to compare surname from different regions to see if we are from the same family.Y-line testing does not indicate anything about the contributions of the other ancestors in a family tree.In other words, you could be 3/4th Native American, with only the direct paternal line being European, and this test would tell you nothing at all about those other three Native lines.When testing the Y-chromosome, there are two types of tests, short tandem repeat (STR) and SNP markers.STR tests are best for recent ancestry while SNP tests tell about more ancient ancestry.c) Mitochondrial DNA (maternal line testing) markers:", + "Additional information about past breeding practices can be gleaned by quantifying the number of reproductive males and females in a population.This can be achieved by comparing levels of genetic diversity between sex chromosomes, autosomes and mtDNA 99 .In cattle, for example, gene flow from aurochs is evident in the autosomes but is absent in mtDNA 41 .This has been interpreted as a management strategy that may have involved allowing insemination of domesticated females by wild bulls 41,100 .In horses, a comparison of the levels of diversity of the Y chromosome and the autosomal chromosomes demonstrated that some cultures allowed fewer males to breed and instead selected specific stallion bloodlines 55 .This male-oriented breeding strategy was not practised by the Romans and only became increasingly prominent in the past 1,000 years as a result of the growing influence of Oriental stallions (Arabian, Persian and Turkmen) 101 .", + "Dr Ring: What makes the maternal gene so peculiar compared to the paternal?Dr Cookson: If you look in the epidemiologic sense, many studies show that there is increased risk of allergic disease if the mother is affected.However, very few studies have actually 
set out to test that formally and most of them might suffer from some sort of selection bias because the mother is more likely to be aware of her symptoms and feel guilty, and so on.It is very difficult to explain.Is it genomic imprinting, where the gene is only active when transmitted through the mother?I do not think all of these genes would be imprinted, though it is possible.It also seems that there are effects of the maternal phenotype.The maternal phenotype, if the mother is affected or unaffected, determines the strength of the maternal effect.Again, if a gene was imprinted, you would not expect maternal phenotype to be important.So, I think that this has something to do with maternal/fetal interaction, either through the placenta or shortly after birth.There is the issue of immune conflict between mother and child.At the same time, the mother is trying to prime the infant's immune system.", + "Genetic and Genomic Discovery Using Family StudiesIngrid B. Borecki, PhD; Michael A. Province, PhD G enetic studies traditionally have been performed on sets of related individuals, that is, families.Mendel's early studies in sweet peas (Pisum sativum) on the inheritance patterns of discrete traits from parents with specific mating types to offspring has shed light on the basic mechanisms of inheritance, including the fundamental laws of segregation of discrete factors (genes) from parents to offspring and the cosegregation of genes that are closely located on a chromosome (linkage).The distribution of traits within families exhibited mathematical segregation ratios in offspring from known mating types.These expected segregation ratios have been used as an important discovery tool in the study of human diseases in pedigrees, providing evidence for a multitude of single-gene disorders.Furthermore, in some cases, trait cosegregation with genetic markers with known positions provides mapping information that enables localization and, ultimately, identification of the 
relevant causative gene.", + "In fact, this idea has been pursued before in thecontext of signatures of reproductive isolation and shown to revealpatterns consistent with epistatic gene interactions that arise in theshape of Dobzhansky-Muller incompatibilities [10,11]. In contrast to the mouse data, the available human genotypeswere derived from outbred, ethnically distinct populations. In thiscase pairs of functionally interacting genes can be detectedfollowing a slightly different approach.", + "Family StructureThe first re-identification method (FAMILY) employs genealogical data accompanying genomic data.Genealogies, rich in depth and structure, permit the construction of complex familial relationships.Consider a simple family structure of two parents and one child.Since the parental genders are guaranteed, there exist 2 variants of this structure, since the child's gender is either male or female.When disease status is taken into account, it is represented as a Boolean variable; either an individual afflicted or not afflicted.In this aspect, all three family members can be represented as three attributes {Father, Mother, Child}, and there exist (father's disease status)*(mother's disease status)*(child's disease status)*(child's gender) = 2*2*2*2 = 16 possible family-disease combinations.In reality, pedigrees are much more robust than a simple nuclear family.For example, a three-generation family of two children per family permits on the order of 10 5 distinct variants of the family-disease structure and 10 6 individuals that could be uniquely characterized.The number of combinationsk is larger when supplementary information, such as living status or medical/genetic features, is considered. 
16e ability to determine unique family structures is only one part of the re-identification process.These structures must be linked to identifiable information, which, in many instances, is publicly available in the form of various genealogical databases.These databases are accessible both offline and via the World Wide Web.For example, genealogical records are available in many public databases, including ,Ancestry.com>,,Infospace.com>,,RootsWeb.com>,,GeneaNet.com>,,FamilySearch.org>, and ,Genealogy.com>. {From such data, it is not difficult to construct family structures and, with such information in hand, an adversary can link disease-labeled family structures to named individuals.", + "Fig. 3. Illustrations of the three CEU pedigrees (black) showing how genetic information from distant patrilineal relatives (arrow; red, patrilineal lines) can identify individuals.Filled squares represent sequenced individuals.To respect the privacy of these families, only abbreviated versions are presented.The sex of the CEU grandchildren was randomized.The numbers of grandchildren are not given.", + "When I was in high school, I remember often trying to match my friends to their parents at various school functions and being surprised at how easy this was.As human geneticists, in spite of the enormous advances being made in our field, we still cannot answer many of the everyday questions that we are asked, such as: \"Why does he look just like his mother? \"Max Perutz [1], in a recent editorial comment in the New Scientist entitled \"The Molecular Biology of the Future,\" suggested some questions, for, as he put it, \"an examination in some future century. \"Here are two of them: (1) \"The time has come\" the Walrus said, \"To talk of many things ...And why the sea is boiling hot And whether pigs have wings. \"Calculate the amount of genetic information this would require in megacricks.", + "Using genetic markers, the pattern of inheritance can be tracked throughfamilies. 
For example, by analyzing a marker linked to the eye color gene in several generations, it is possible to determine from which grandparents a child has inherited its eye color alleles. More importantly, finding a marker linked to a disease can lead to location of the faulty gene causing the disease. Finding the gene is very valuable in the search for the cure. The distance between two loci can be expressed either as physical or genetic distance.", + "In contrast, genomic imprinting is due to epigenetic changes within the individual causing differential gene expression characterized by either complete or partial silencing of one parental allele (Barlow, 2011; Abramowitz and Bartolomei, 2012; Ashbrook and Hager, 2013). As both mothers and fathers had contact with the pups in our study, our observed PGEs could come from either parent. Among quantitative USV traits only peak amplitude of call displayed a possible parent-of-origin effect. For call number, call duration, mean peak frequency, and all morphological traits, there were no significant parent-of-origin effects in reciprocal F1 females. 
In contrast, Thornton et al.", + "Another way of avoiding stratification is to use family-based samples.This approach has several theoretical advantages: as well as being immune to stratification 114 , these samples can be used to determine whether an allele has different effects on disease when it is inherited maternally or paternally 115 , and DISCORDANT SIB designs [116][117][118] can control for the effects of shared environment.Furthermore, more complex family-based designs are possible 119 that might allow combined association and linkage analysis 120 , and family-based association tests have also been developed for quantitative traits [94][95][96][97][98] .However, pure sibship-based association studies are underpowered relative to case-control studies 107,116,117 , and the requirement for living parents might introduce an age-of-onset bias towards younger patients for diseases that usually arise late in life.Furthermore, family-based samples are often much more difficult to collect, particularly if larger pedigrees are sought.Finally, the most commonly used family-based design, the TRANSMISSION DISEQUILIBIRIUM TEST (TDT; see REF. 
114) is susceptible to technical artefacts (see below).", + "Because mtDNA is not subjected (as far as we know) to sexual recombination and crossover at the time of nuclear meiosis, nature must call on other means to ensure that inevitable germ plasm mtDNA mutations (Medvedev, 1981) are not transmitted.These mutations among primary oocytes, on the face of it, can be expected to increase with time, that is with maternal age.Empirical data on this question are incomplete and conflicting, being mostly confined so far to searches for deletions rather than point mutations (Chen et al., 1995;Keefe et al., 1995).It is inevitable, however, that there will be such mutations and that there must therefore be a reliable physiological mechanism (a) for giving an opportunity for back-mutations to occur, (b) for selecting in favor of those back-mutations (thus preserving the genome) and in favor of rare advantageous mutations, and (c) for preventing the spread of persistent harmful mutations through the population -mutations that are too slight (or too late in origin) to have escaped intraovarian culling.The sheer conservation of the mitochondrial genome over 0.5 billion years or more, despite a mutation rate estimated at 10 -20 times that of nuclear DNA, is ample reason to conclude that such a physiological purification process must exist.", + "To scrutinize the polygenic networks underlying complex diseases, however, mouse resourcesthat are optimized to study the actions of isolated genetic loci ona fixed background will be insufficient on their own. For example, predisposition to the metabolic syndrome is inherited ina non-Mendelian fashion stressing genetic heterogeneity andmultigenetic pathogenesis (Nandi et al. , 2004). 
With the reawakening as to the extraordinary genetic resources and phenotypic diversity archived in extant inbred strains, however, a foundation is in place for tracking down these complex traits and quantitative trait loci (QTL).", + "Otherwise, tens of thousands of markers will appear significant in the genome-wide association studies using up to one million genetic markers. Approaches to control for stratification include use of self-report of ancestry or genetically derived principal components in the analysis. For studies using inbred mouse lines, a cladogram, which is a hierarchical grouping based on phylogenetic analysis of strain relatedness, can be created to subdivide inbred strains into more genetically homogeneous subgroups.", + "Although bilateral descent is the norm in Western societies, it is not universal and there is variation with cultural practices around lineage. In certain societies, individuals place greater importance on (and have greater knowledge about) one side of the family than another (unilineal descent). Thus, individuals in patrilineal groups trace relationships through males only so that your father's brother's children are members of your family, but not your father's sisters (Kottak, 2007). They are members of their husband's group or family. Efforts to create a family pedigree may be hampered if the participant is not familiar with her mother's relatives, but her mother's brother's children (her cousins) may be able to supplement her overall family history. Knowledge about the cultural system of unilineal descent avoids assuming the universality of bilateral descent. Cultural beliefs such as these also have implications in the conduct of genetic research in terms of confidentiality and autonomy (Benkendorf et al., 1997; Wertz, 1997). One cannot assume that the named proband is in a position to speak for the extended family in agreeing to participate in any genetic research (DudokdeWit et al., 1997).", + "In particular in polygynous species, a 
female's offspring may have different fathers and are thus more closely related through the maternal than the paternal line. Therefore, any fitness cost to mothers, such as increased provisioning and care, affects maternally derived genes more strongly than paternally derived genes, leading to the silencing of the maternal copy (i.e. paternal expression) of genes that increase resource transfer. 5. Coadaptation between offspring and maternal traits The genetics of the co-evolution of parental and offspring traits has been investigated using quantitative genetics models and in several empirical studies (Agrawal et al. In this scenario, genes expressed in parents will be selected for their effects on parental behaviour while genes expressed in offspring will be selected for their effects on influencing parental behaviour. At the genetic level the predicted conflict between paternal and maternal genomes is thought to have led to the evolution of genomic imprinting (monoallelic gene expression). Genomic imprinting effects are good examples of offspring genetic effects on maternal care because of the impact on the quality of maternal care and level of resource provisioning (e.g. Li et al., 1999)." + ], + [ + "When a cell divides in two, both daughter cells must receive a copy of all the DNA, i.e. the whole genome. During replication the two strands of the mother cell DNA are separated, and new nucleotides are put together to make two double helices identical to the original one, see Figure 2.1. ATAAGACCG. . . . . . . . ATTCTGGCGACCG. . . . TGGCTA. . ATTCCG. . . CTGGC Figure 2.1: A DNA chain consists of two strands of complementary nucleotides. When DNA is replicated, two double chains identical to the original one are created. The human genome consists of approximately 3 billion nucleotide pairs. The chain is divided into pieces called chromosomes. 
A gene is a short segment of a chromosome where the nucleotide sequence gives the blueprint for a particular substance in the body, for example insulin. Only a small fraction of the DNA consists of genes. In between the genes there are long non-coding regions of which the function is largely unknown. Germ cells originate from 46-chromosome cells, and a sophisticated process called meiosis ensures that exactly 23 chromosomes, and exactly one from each homologous pair, ends up in each daughter cell. Before the homologous chromosomes are distributed to the daughter cells they are paired up side by side. While they are positioned close together a process called crossover often occurs, see Figure 2.2. The homologous chromosomes randomly exchange large chunks of DNA. As a result, each chromosome that a child has inherited from a parent will most often contain segments from both grandparents.", + "Replication handling Replication is a significant part of any comparative experimentation to raise accuracy and more significantly to deliver a basis for recognized statistical interpretation which is nowadays becoming broadly accepted for genomic data. In genetic and genomic context, replication can have various forms [97]: technical replicates, duplicate gene spots, and biological replicates. It is vital to understand that any sort of replication provides information only concerning the specific source of changeability related to that kind of replication and no other. Based on the experimental setting, it may consequently be imperative to consider one, two or all these categories of replicates.", + "Central dogma-An explanation of the flow of genetic information within a cell. Information is stored in the DNA of the genome, transcribed into RNA, and translated into protein. With a few exceptions, genetic information follows this path only in the forward direction. Basics of Molecular Biology Deoxyribonucleic acid (DNA) is a molecule with 2 strands that are wrapped around each other in a 
helical formation, hence its description as a double helix (Figure 1).The outer portion of the helix contains the sugar and phosphate backbone; the inner portion contains the coding bases: adenine (A), cytosine (C), guanine (G), and thymine (T).The genetic information of an organism is determined by the order of the sequence of the bases; with 4 bases available, the number of potential sequences is almost infinite.The versatility of DNA results from the obligatory pairing of bases in the 2 strands, forming base pairs.An adenine in 1 strand is always matched up with a thymine in the other strand, and cytosine is always paired with guanine.Thus, the 2 strands contain redundant information, and each can serve as a template on which a new complementary strand can be synthesized.This allows easy duplication of the DNA so that, when a cell divides into 2 cells, each descendant cell receives the same genetic information as the original cell.Figure 1.The structure of DNA.Each DNA strand has a sugarphosphate backbone (not shown in detail) with a sequence of bases that come in 4 versions: adenine (A), cytosine (C), guanine (G), and thymine (T).Two DNA strands can combine to form a double helix, the stable form of DNA found in chromosomes.Holding the strands together are base pairs: Guanine on 1 strand binds to cytosine on the other strand, and adenine on 1 strand binds to thymine on the other strand.Thus, the 2 strands are complementary and contain redundant information.Figure 8. Meiosis, part 1.Before the first cell division, meiotic recombination (crossing over) between a chromosome pair occurs.Figure 9. Meiosis, part 2. The second cell division yields gametes, which have only half of the complete genome (unpaired chromosomes).Two gametes subsequently fuse (fertilization) to create a zygote that has a complete genome and can give rise to an organism.Figure 8. 
Meiosis, part 1.Before the first cell division, meiotic recombination (crossing over) between a chromosome pair occurs.Figure 9. Meiosis, part 2. The second cell division yields gametes, which have only half of the complete genome (unpaired chromosomes).Two gametes subsequently fuse (fertilization) to create a zygote that has a complete genome and can give rise to an organism.", + "When a cell divides in two, both daughter cells must receive a copy of allthe DNA, i.e. the whole genome. During replication the two strands of themother cell DNA are separated, and new nucleotides are put together to maketwo double helices identical to the original one, see Figure 2.1. ATAAGACCG. . . . . . . . ATTCTGGCGACCG. . . . TGGCTA. . ATTCCG. . . CTGGCFigure 2.1: A DNA chain consists of two strands of complementary nucleotides.WhenDNA is replicated, two double chains identical to the original one are created. The human genome consists of approximately 3 billion nucleotide pairs. The chain is divided into pieces called chromosomes. A gene is a short segment of a chromosome where the nucleotide sequence gives the blueprint fora particular substance in the body, for example insulin. Only a small fractionof the DNA consists of genes. In between the genes there are long non-codingregions of which the function is largely unknown.", + ". . . . . . . Appendices301Appendix ASummaryAll organisms have a genome made of DNA (deoxyribonucleic acid). The genome can be found in nearly every cell and is the blueprint for thegrowth, development, maintenance and repair of the body. It performsthese functions by transcribing small pieces of DNA, the genes, fromthe genome and translating them to proteins. 
These proteins are the tiny workhorses of the body that break down food, give bones their strength, make muscles move, let brains think, and so on.", + "Every nucleated cell in our body, with the exception of egg and sperm, has a complete genome in its nucleus. Each time the cell divides by the process of mitosis, all the DNA in that cell is replicated, so that each of the two new daughter cells has its own copy of the entire genome. The mitochondria, which produce the energy required for all the cell's functions, contain a small circular DNA molecule that is also part of the genome. Every living organism has a complete genome in each of its cells. And the structure of all DNA is the same. The DNA in human cells has the same structure as the DNA in the cells of a butterfly, a whale, a flower, or a worm. What differs is simply the amount of DNA carried by each organism and the order of the nucleotides in each strand.", + "Introduction Ever since the structure of DNA was elucidated by Watson and Crick in 1953, it has been generally assumed that genomic DNA, in view of its vital role in transferring hereditary information from generation to generation, is a stable molecule unaltered in its structure by the surrounding events. This taken for granted, its remarkable attribute of stability has turned out to be a myth. As noted by Haynes (1988) DNA is made up of rather ordinary molecules that are not endowed with any peculiar kind of quantum mechanical stability. As such, DNA must be able to undergo all kinds of structural modifications at the body temperature and with many other chemicals in proximity. Much evidence has accumulated in recent years to prove that this is indeed the case, and normal cellular metabolism itself is enough to cause various types of damage to the genomic apparatus. If the genomic DNA can be assaulted in so many ways the natural question that would emerge is: How is genetic informational integrity maintained and transmitted through generations?", + "The second 
form of genome partitioning was by DNA replication direction. Since the entire genome is replicated every time a cell divides (but only a portion is transcribed), replication direction has the potential to exert larger asymmetries in mutational data. However, determining direction is much more challenging for replication than transcription, since the precise locations of replication origins in the human genome are not known. This has precluded a comprehensive analysis of replicative strand asymmetry thus far.", + "Each gene is a segment of deoxyribonucleic acid (DNA) and the genes are joined together to make up a set of very long DNA molecules called chromosomes. In diploid organisms like humans and mouse, there are two copies of each chromosome. One copy is inherited from each parent. DNA is comprised of a sequence of nucleotides and the four primary DNA bases found in nucleotides are Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). Each base binds with another specific base (T with A and C with G).", + "If this DNA were to be uncoiled and laid out end to end, it would extend about 3 m. Obviously, this cannot possibly fit into a cell, and extended DNA would be susceptible to breakage during replication and cell division. In eukaryotes, genetic material is thus organized into complexes of DNA with core histones and other chromosomal proteins that together form chromatin. The chromatin repeating unit includes two copies each of four core histones H2A, H2B, H3 and H4 (collective molecular mass 206,000) wrapped by 146 bp of DNA.", + "The core of the human genome is a DNA double helix containing ~3 billion base pairs of genetic information. It is continuously challenged by a variety of genotoxic stresses that cause ruptures of the DNA sugar-phosphate backbone. DSBs are the most lethal type of DNA damage. They can be caused by collapse of the DNA replication fork or, less commonly, induced directly by environmental insults such as ionizing radiation or radiomimetic drugs. 
To manage these lesions, cells have evolved two main pathways of DSB repair. Homologous recombination occurs in mitotic cells, usually during the S and G2 phases.", + "Cellular and Genetic Changes. This section will explain how cells normally divide. It will also describe how an unexpected change in the structure of DNA can sometimes cause harm to the body. New tools to study genetic variations of common diseases and to identify genetic variations common to specific diseases will also be presented. Table 1. Glossary of Genetic and Genomic Terms. Deoxyribonucleic acid (DNA) - The chemical inside the nucleus of a cell that carries genetic instructions for making living organisms. Double Helix - The structural arrangement of DNA, which looks something like an immensely long ladder twisted into a helix or coil. The sides of the \"ladder\" are formed by a backbone of sugar and phosphate molecules, and the \"rungs\" consist of nucleotide bases joined weakly in the middle by hydrogen bonds." + ], + [ + "Gene editing has gained considerable interest with the identification of the CRISPR-Cas9 system, 27 which allows for a targeted modification in the DNA sequence of an organism. Researchers can utilize their knowledge of the basic biology of the gene and its protein function to precisely change the DNA sequence, thus altering the protein function of the gene and allowing for edits to stay within the species. Researchers at the University of Missouri used the CRISPR-Cas9 system to modify the CD163 gene such that the PRRS virus is not able to replicate inside the pig.
28 This slight modification of the swine genome through gene editing keeps the pigs from succumbing to PRRS, which has an annual estimated loss to the United States swine industry of over $660 million per year. Despite this benefit, given the public's concerns over food safety, it is likely that approval for such technology is years away in the US, Canada and Europe. However, in some cultures, there is a wide range of non-livestock species that are consumed. Therefore, it is conceivable that these countries and cultures may be open to transgenic/gene edited livestock. They may see the importance of useful gene editing which may lead to approval and consumption of reasonable genetically edited animal products such as those with modifications that are already found in nature or those that offer a substantial welfare benefit to society.", + "As a researcher who has devoted an entire career since 1994 to the development of genome editing tools and methods, I have been amazed by the rapid progress in the field over the last few years. Considering the widespread use of the tools, I am sure that the pace will continue to accelerate. Indeed, programmable nucleases may eventually enable humans, products of evolution, to become masters of evolution. delivered preassembled recombinant Cas9-guide RNA ribonucleoproteins (RNPs) into animal embryos 6,9 and plant 11 and mammalian cells [73][74][75]. Indeed, Cas9 RNPs were rapidly turned over in cells 73, reducing off-target effects and mosaicism in gene-edited organisms 11. Cas9 RNPs can be delivered into cells by various methods, including microinjection 6,9, electroporation 73, lipofection 74 and protein transduction 75. Importantly, and unlike in conventional gene therapy, where therapeutic genes are delivered via plasmids or viral vectors, Cas9 RNP delivery does not involve the use of exogenous DNA; host innate immune responses against foreign DNA are not elicited, and undesired integration of foreign DNA into the host genome is
avoided.", + "In comparison to a transgenic approach, a gene editing technique such as CRISPR-Cas9 offers the advantage that gene-edited crops are not considered genetically modified organisms (GMOs) in some countries, such as the US, where the demand for natural food colorants such as anthocyanins is high. Indeed, the use of GMO crops as a source of natural pigments may be inconsistent with consumer interests. However, carrot cultivars engineered with either the transgenic or gene editing approach have not been reported so far, but their development is possible.", + "The notable accuracy and versatility of CRISPR-Cas for genome editing also opened the door to its use in preclinical and translational settings. In the latter case, CRISPR in vivo gene editing has led to several proof-of-concept studies that would have been unachievable without it, as in the first ever correction of inherited pathogenic mutations linked to degenerative disease in a living organism [22] and even shown to be possible in human embryos [23,24]. It also has great potential in the field of precision medicine as large-scale population DNA sequencing studies have provided vast amounts of information linking particular diseases with specific genetic mutations which could, in theory, be targeted through CRISPR [25,26]. This could be used during the identification and validation of potential DNA targets during the development of personalised drug or cell therapies, which will require the generation of engineered cell lines and/or animal models. Techniques such as HDR-mediated gene targeting are too labour intensive, with low targeting efficiencies and long times necessary for their establishment, and consequently are not ideally suited for drug discovery purposes. Conversely, CRISPR-Cas has been proven to be efficient for editing virtually any kind of cell line, from primary immune cells to induced pluripotent stem cells (iPSCs) [27,28]. Additionally, CRISPR can also be used for functional screening in
the development of combined inhibitory therapy aimed at strengthening the efficiency of targeted therapeutics. An example of the latter is shown in a study where a variation of the technology known as CRISPR interference (CRISPRi) was used at genome-wide scale to identify different survival pathways used by cancer cells after oncogene inactivation, allowing the identification of successful combination therapies [29]. In terms of translational applications, the overall safety of CRISPR genome editing in humans will require long-term scrutiny before its adoption in the clinic. Nonetheless, a number of CRISPR-based clinical trials are currently in progress, including studies focused on targeting patients' own T cells in order to improve the immune response towards some forms of malignant cancer [30,31], and others aimed at correcting pathogenic mutations in the hematopoietic cells of patients with beta-thalassemia and sickle cell disease [32]. Caveats and Ethical Concerns of CRISPR-Cas Applications. Despite the presence of both a PAM sequence and a specific gRNA, the CRISPR-Cas9 system is not infallible. In fact, DSBs can occur at different sites in the genome, potentially causing so-called \"off-target\" effects. This eventuality remains to date the biggest concern in the field, as possible undesirable modifications must be properly identified and followed in order to guarantee safety for medical purposes. Nevertheless, there is still little evidence of the biological consequence of Cas9 off-target effects. Two recent studies describe new methods to investigate potential off-target effects in both mammals and plants [33,34]. In both cases, whole-genome sequencing revealed that selective nucleotide changes, such as conversion of an adenine to a guanine, caused off-target occurrence very rarely, with a frequency comparable to the one of spontaneous mutations. However, substitution of a cytosine with a thymidine was linked to a sizable number of off-target mutations. This newly
acquired information adds to the plethora of studies conducted on the safety of CRISPR, which altogether highlight the need for the establishment of clinical standards for the future use of genome-editing techniques in the clinic. Despite this and other technical challenges still ahead for CRISPR genome editing, the pace at which this technology has developed in recent years suggests many of these concerns could be addressed soon, as long as proper ethical guidelines and regulatory mechanisms are established. Conclusions. There is no reason to doubt that the development of CRISPR-Cas genome editing represents an unprecedented breakthrough in modern science, as it has potential applications in a wide array of disciplines ranging from agriculture, zoology and renewable energy to biomedicine and synthetic biology. This powerful tool holds promise for further elucidating the molecular causes of ageing by allowing scientists to probe genetic and epigenetic pathways with a level of sophistication that was unattainable just a few years ago. It will do so in traditional animal and cell models of ageing, but it will also drastically accelerate the generation of refined versions of those models or even allow the development of new research approaches in non-model organisms. Moreover, CRISPR-based genome editing is already having a significant impact in research aiming to understand the cellular and molecular origins of age-related diseases, as well as developing potential treatments against them. The application of CRISPR-Cas gene editing for the treatment of age-related diseases is not over the horizon yet, as it will require the identification of causative genes and their role under a variety of contexts that could be as diverse as the ageing process is across individuals. However, CRISPR-Cas might also hold the key for solving such a conundrum, as it has opened the way for achieving true personalised medicine by providing both the precision and scalability required for conducting
genome-wide functional screens during the refinement of drug- and cell-based therapies for age-related diseases. Since its discovery, CRISPR-Cas technology has ignited a biological revolution by providing a highly versatile platform that allows fast and efficient genome editing in an ever-growing list of organisms. In this chapter we will first describe the most recent advances in the development and application of the CRISPR-Cas platform in biomedical research. Then we will discuss the most recent and notable basic research applications of this technology in the study of the molecular causes of ageing. Finally, we will review how CRISPR-Cas has been used for creating new models for the study of age-related diseases, as well as for manipulating disease-associated gene pathways.", + "Caveats of advanced genome editing tools. Off-target effects. The DNA-binding domains of ZFNs and TALENs need to be very specific for the target site to avoid off-target cleavage, which results in unwanted mutations and potentially cytotoxic effects [27]. CRISPR/Cas9 is also known to generate off-target alterations, albeit apparently at low incidence [28,29], since mispairing is allowed between the guide RNA and the genomic DNA. Nonetheless, caution is required in their design and use. Some strategies involving the optimization of the guide RNA/Cas9 include using software tools to predict potential off-target sites (http://omictools.com/crispr-cas9-c1268-p1.html), truncating the guide RNA (<20 nucleotides) to decrease off-target mutagenesis [30], lowering the dosage of guide RNA and Cas9 plasmids, and decreasing the number of mismatches between the guide RNA and the genomic DNA. A \"double nick\" system with Cas9 nickase, which contains a single inactive catalytic domain, may also be used [31-33]. Figure 1: Genome editing methodologies which can be applied to human pluripotent stem cells. Homologous recombination (HR), or the more advanced tools such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) or clustered regularly interspaced short palindromic repeat (CRISPR)/Cas system can be applied to human pluripotent stem cells (hPSCs) either to 1) create naturally occurring mutations or 2) repair a mutation to generate isogenic controls in hPSCs, to understand the function of a gene of interest.",
+ "CRISPR screening technologies. The discovery of CRISPR-Cas9 as a sequence-specific programmable nuclease democratized gene editing and fueled progress in forward genetic screening [20, 66]. Genetic screens using Cas9 with a pooled single-guide RNA (sgRNA) library allow the interrogation of seemingly all genes in a genome in a single experiment [96, 97]. Engineered Cas9 variants further extend the versatility of forward genetic screening. Catalytically inactive Cas9 (dCas9) fused with chromatin effector domains permit specific activation (CRISPRa) or inhibition (CRISPRi) of gene expression [37, 54]. Recently developed and emerging technologies (base editors, prime editors, and Cas transposases) are beginning to enable new types of genetic screens with directed, controlled, and on demand mutations by allowing the creation of user specified modifications, such as single base conversion, deletions, and insertions [4, 42, 58].", + "Coming on the heels of engineered nucleases, CRISPR-Cas9 tools have accelerated the pace of genomic research by permitting highly efficient knockouts or edits of virtually any gene in cells or model organisms. Multiple CRISPR-Cas9-based clinical trials are in progress or are expected to begin soon. Although Cas9-engineered cells haven't yet demonstrated efficacy at scale, early trial results suggest that such cells are stable and don't cause acute adverse reactions in humans. Long-term safety is yet to be determined. Current applications largely focus on single-gene disorders for which gene editing can be carried out ex vivo on appropriate cells, such as bone marrow
hematopoietic stem cells in the case of sickle cell anemia. Exploration is under way to develop delivery systems that can target the gene-editing apparatus to the appropriate tissue in vivo. Over the past 8 years, CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 (CRISPR-associated protein 9) technologies have emerged as accessible and adaptable tools for studying and altering genomes. 5 CRISPR-Cas9 can be used to induce genome edits by creating targeted DNA breaks that trigger site-specific DNA repair. In next-generation formats, it can also control the transcriptional output of genes or alter genome sequences using a process of nucleotide base editing that does not require repair of DNA breaks. As these technologies continue to mature, it will become increasingly possible to alter cellular genomes efficiently and accurately.", + "The type II CRISPR-Cas9 systems, repurposed from prokaryotic adaptive immune responses, are now widely used for targeted genome modifications in plants, animals, and human cells (Kim et al. 2014; Woo et al. 2015; Zuris et al. 2015). In particular, Cas9 nucleases have shown promise for gene and cell therapy (Maeder and Gersbach 2016). Typically, these nucleases are expressed or delivered in vivo using plasmid DNA or viruses (Yin et al. 2014; Ran et al. 2015). However, plasmid DNA delivery is often inefficient, especially in vivo, and can cause integration of small plasmid fragments degraded by endogenous nucleases at on-target and off-target sites in the genome (Kim et al. 2014). Viral delivery of Cas9 can be highly efficient in vivo (Ran et al. 2015; Long et al. 2016; Nelson et al. 2016; Tabebordbar et al. 2016), but may be hampered by antibodies or T cells induced against the protein (Shankar et al. 2007; Calcedo et al. 2015; Chew et al. 2016). We and others have shown that preassembled Cas9 ribonucleoproteins (RNPs) can be delivered to human primary and stem cells and mice to modify target genes (Kim et al. 2014; Schumann et al.
2015; Zuris et al. 2015). Cas9 RNPs are rapidly turned over in cells, reducing off-target effects. Furthermore, Cas9 RNPs are unlikely to be limited by host immune systems because they function and disappear before the generation of antibodies and T cells directed against them. Currently, despite these advantages of RNPs, the difficult delivery of Cas9 RNPs in vivo limits its utility for therapeutic applications (Zuris et al. 2015). Here, we show that in vivo genome editing of a wild-type gene, whose up-regulation is responsible for pathogenesis, could be a new therapeutic modality for the treatment of nongenetic degenerative diseases. Our ultimate goal is to harness Cas9 RNPs for a clinical application of therapeutic genome surgery in patients with AMD.", + "Clustered regularly interspaced short palindromic repeat (CRISPR)-Cas nucleases have revolutionized the field of gene editing and have tremendous application in the field of molecular medicine [98][99][100][101][102]. Despite a significant surge in CRISPR/Cas9-mediated genome editing in various disease models, the progress in the field of AD has lagged behind substantially. We believe that genome editing can significantly improve the development of AD models and also create novel opportunities for the development of the next generation precision targeted AD gene and stem cell therapies. Since there are several excellent review articles on CRISPR/Cas9-mediated genome editing, here we will limit our focus on select recent articles that are noteworthy. CRISPR/Cas9 system can be engineered to either activate transcription (gain-of-function) or achieve gene silencing (loss-of-function). Dahlman et al. have developed a CRISPR-based system that uses catalytically active Cas9 and distinct single guide RNA (sgRNA) constructs to activate and knockout different genes in the same cell [103]. Konermann et al.
have used structure-guided engineering of a CRISPR-Cas9 complex to mediate efficient transcriptional activation at endogenous genomic loci [104]. Using crystallographic studies, they have engineered a combination of sgRNA2.0, NLS-dCas9-VP64 and MS2-p65-HSF1 to develop one of the most effective transcription activation systems.", + "Limitations of CRISPR-Cas9. CRISPR provides a simple and easy tool not only for in vitro use but potentially also for in vivo genome editing. However, there are limitations and downsides to this approach. First, and despite considerable improvements in the technology, the risk of off-target effects remains and must be considered carefully. Second, DSBs may lead to wide-ranging deletions or recombination events involving the on-target site (204). Third, in cycling cells, DNA double strand breaks caused by Cas9 cleavage may trigger a P53 response leading to apoptosis and enrichment for potentially oncogenic P53-deficient cells (205,206). Fourth, subjects may generate antibodies to Cas9, potentially limiting gene therapies (207,208). Genome editing tools that target the desired genomic region and allow for variants to be altered (e.g. from risk to protective), or for more substantial changes to be made (e.g.
the deletion of a longer stretch of DNA harbouring a number of variants) can help to answer each of these questions. These technologies are evolving rapidly (Figure 1 and Table 2). The most recently developed of these, Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) technology, originally developed by Doudna, Charpentier and their colleagues (72,73) and Zhang and his colleagues (50), has become a widely used tool for this purpose. Engineered CRISPR/Cas9 technology uses a guide RNA (gRNA) to direct CRISPR-associated endonuclease (Cas) to the target DNA and generate a double strand DNA break. Correction of a mutation or variant in the target DNA sequence can then be carried out by homology-directed DNA repair (HDR) with a donor template. Since its discovery eight years ago, CRISPR technology has evolved quickly to be a critical part of the molecular biologist's toolbox.", + "INTRODUCTION. Genome editing technologies based on the clustered regularly interspaced short palindromic repeats (CRISPR)-associated endonuclease Cas9 enable rapid and efficient modification of endogenous genes in a variety of cell types, allowing for analysis of gene function in many organs in vivo. CRISPR-Cas9 induces DNA double strand breaks (DSBs) at single-guide RNA (sgRNA)-specific loci in the genome, which are repaired through either non-homologous end-joining (NHEJ) or homology-directed repair (HDR) pathways. While NHEJ introduces unpredictable patterns of insertion or deletion (indel) mutations, HDR directs a precise recombination event between a homologous DNA donor template and the damaged DNA site (Cong et al., 2013; Cox et al., 2015; Doudna and Charpentier, 2014; Heidenreich and Zhang, 2016; Jinek et al., 2012; Mali et al., 2013; Sander and Joung, 2014; Wang et al., 2013; Yang et al., 2013). Thus, HDR can be used to precisely introduce sequence insertions, deletions or mutations by encoding the desired changes in the donor template DNA.", + "CRISPR technology has rapidly changed
the face of biological research, such that precise genome editing has now become routine for many labs within several years of its initial development. What makes CRISPR/Cas9 so revolutionary is the ability to target a protein (Cas9) to an exact genomic locus, through designing a specific short complementary nucleotide sequence, that together with a common scaffold sequence, constitute the guide RNA bridging the protein and the DNA. Wild-type Cas9 cleaves both DNA strands at its target sequence, but this protein can also be modified to exert many other functions. For instance, by attaching an activation domain to catalytically inactive Cas9 and targeting a promoter region, it is possible to stimulate the expression of a specific endogenous gene. In principle, any genomic region can be targeted, and recent efforts have successfully generated pooled guide RNA libraries for coding and regulatory regions of human, mouse and Drosophila genomes with high coverage, thus facilitating functional phenotypic screening. In this review, we will highlight recent developments in the area of CRISPR-based functional genomics and discuss potential future directions, with a special focus on mammalian cell systems and arrayed library screening.
The recent development of clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 for experimental purposes has dismantled the perception that genome editing technology is off-limits for screening in mammalian systems (Heintze et al., 2013). Since this system employs the basic principle of Watson-Crick base pairing for gene targeting, generation of libraries with whole-genome target coverage is relatively easy and cost-effective. For instance, simple protocols are available to synthesize pooled lentiviral libraries by in silico design of oligonucleotides, which can then be cloned, packaged and delivered to cells by viral transduction (Paddison et al., 2004; LeProust et al., 2010). Similarly, the generation of arrayed libraries can be achieved by following protocols originally developed for arrayed shRNA library production that have been in use for a number of years (Moffat et al., 2006). All in all, the stage is set for CRISPR to make an enormous impact on genomic screening and thus scientific discovery in the coming years, and recent demonstrations of this system have shown great promise (Shalem et al., 2015). However, a number of technical challenges must be addressed in order to maximize the benefit of this technology. In this review, we will discuss current applications of CRISPR in functional genomics and provide a perspective on future developments in this area.", + "Genome editing for crop improvement. Reports of CRISPR-Cas9-based genome editing
first appeared in 2013 (Cong et al., 2013; Feng et al., 2013; Mao et al., 2013). Since then, genome editing technologies have proven to be powerful and efficient tools for the improvement of many crop species. At present, genome editing has been widely used to introduce/modify agronomically important traits, such as increased yield, improved nutritional quality, and resistance to biotic and abiotic stresses, in multiple crops, including rice, wheat, maize, tomato, and potato (Lu et al., 2017; Soyk et al., 2017; Tang et al., 2017; D'Ambrosio et al., 2018; Ye et al., 2018; Miao et al., 2019; Zhang et al., 2019; Zhong et al., 2019; Butt et al., 2020; Zhang et al., 2020c; Li et al., 2021b; Zhan et al., 2021). CRISPR-Cas-based genome editing has been extended to targeted mutagenesis, base editing, and precisely targeted gene/allele replacement or tagging in plants. Importantly, using CRISPR-Cas9 technology, transgenes present in the genomes of genome-edited plants can be removed by chromosomal segregation via a simple self-pollination or hybridization step. Gene editing technologies continue to be developed and utilized (Mao et al., 2013; Lu and Zhu, 2017; Lu et al., 2020)." + ], + [ + "Second, and perhaps more important, is the difference in the size and types of the genetic reference populations. In our previous study, we mapped the QTL with 36 F2 mice that were genotyped at 82 markers. In the current study, by comparison, we were able to map QTLs after examining 342 mice from 55 strains that were genotyped at approximately 4000 markers.", + "This contrast can be exploited to identify subregions that underlie the trans-QTLs [67]. SNPs were counted for all four pairs of parental haplotypes (B vs D, B vs H, B vs C, and L vs S) and SNP profiles for the four crosses were compared (figure 6). Qrr1 is a highly polymorphic (PLoS Genetics, www.plosgenetics.org, November 2008, Volume 4, Issue 11, e1000260; QTL Hotspot on Mouse Distal Chromosome 1) Figure 5.
QTL for aminoacyl-tRNA synthetases in distal Qrr1.", + "The traditional approach to QTL mapping is to use two strains that differ maximally in the phenotype as parental strains for genetic crosses, with the following caveats. QTL analysis based on a single cross will most likely reflect only a small portion of the net genetic variation, and QTL detection will be limited to regions where the two progenitor strains have functional polymorphisms. Data from multiple crosses, or from an HS, will overcome this limitation and can also be used to reduce QTL intervals [5,30].", + "These candidate genes are then sequenced in the two parental inbred strains looking for sequence differences in coding or regulatory regions. After fine mapping the QTL interval and shortening the list of plausible candidate polymorphisms, the major challenge remains: proving definitively which nucleotide polymorphism underlies the QTL. The most direct proof would be replacing one strain's allele with another strain's allele (creating a FIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). On the right, the parental, F1 hybrid, and intercross (F2) mouse generations are depicted.", + "Furthermore, splicing QTLs (sQTLs) rather than eQTLs could comprise the molecular mechanism linking DNA variants with YFP53; thus, sQTL analysis could uncover genes that would not normally be detected at the level of differential gene expression (DGE),53 and thus, a differentially (Molecular-Genetic and Statistical Techniques for Behavioral and Neural Research) Figure 8.5 Schematic for immediate, rapid fine mapping in select F2 recombinants of the RCC-F2 cross.
Top panel: Genome-wide significant QTL (green trace; red dashed line significance threshold; blue vertical lines Bayes credible interval).", + "The fuzzy functional boundaries of genes and the high density of sequence variants in linkage disequilibrium shift the burden of proof from pure mapping to functional genomics, comparative analysis of human cohorts, complementary animal models, and direct pharmacological and genetic engineering (Smemo et al., 2014). Mapping with the BXDs has high power. How many replicates and strains are needed to detect and resolve QTLs? To start with the conclusion: it is almost always better to study small numbers of as many strains as possible (Andreux et al., 2012; Belknap, 1998).", + "Interval-specific haplotype analysis. Approximately 97% of the genetic variation between inbred mouse strains is ancestral [22], so regions of identity by descent (IBD) between two strains used to detect a QTL are highly unlikely to contain the causal genetic polymorphism underlying the QTL [28]. For example, a cross between C57BL/6J and A/J mice detected a blood pressure QTL on Chr 1 [7].", + "Interval-specific haplotype analysis. Approximately 97% of the genetic variation between inbred mouse strains is ancestral [22], so regions of identity by descent (IBD) between two strains used to detect a QTL are highly unlikely to contain the causal genetic polymorphism underlying the QTL [28]. For example, a cross between C57BL/6J and A/J mice detected a blood pressure QTL on Chr 1 [7].", + "At present, the BXD panel is composed of 80 different strains that all have been fully genotyped.26 Variation in any quantifiable trait can be associated with the segregation of parental alleles, and linkage genetics can map this variation to quantitative trait loci (QTLs), thereby identifying the genomic region(s) affecting that trait. An overview of the QTL mapping approach is depicted in Figure 2.
Classical QTL analysis has permitted the identification of loci that are associated with variation in HSC traits.", + "The progenitor mouse strains should have sufficient variation for the traits of interest and they should be genetically diverse enough to enable genetic mapping (BENNETT et al. 2006; FLINT 2003; GRISEL 2000). The sample size required for the identification of QTL depends largely on the effect size that a QTL contributes to phenotypes of interest. Inference about QTL can be made if one or more genetic markers are over- or underrepresented in the analysed individuals. Genotyping is often done by means of microsatellite markers, which contain mono-, di-, tri-, or tetranucleotide tandem repeats flanked by specific sequences (Figure 4a). This comparison gives information about the reliability of the observed genotype information: The more the marker locations differ between the two maps (which signifies variation in marker positions), the higher the possibility of genotyping errors. QTL mapping was done in several stages to identify loci acting individually and QTL that interacted, either additively or epistatically. To determine individually-acting QTL, a single QTL genome scan was conducted with the function scanone. In general, linking genetic variation with trait variation identifies QTL and a significant linkage of phenotype and genotype suggests that the DNA status helps to determine trait expression. As stated above, mouse QTL studies provide distinct advantages over human studies in the examination of genetic causes of a quantitative trait (e.g. alcoholism), even in the absence of specific hypotheses regarding its aetiology or candidate genes.", + "Importantly, whereas these studies required substantial labor, time, and resources, X-QTL is a quick and easy approach to achieve a comparable level of genetic dissection. The levels of complexity observed here (e.g.
14 loci explaining 70% of the genetic variance for 4-NQO resistance) arestill dramatically lower than those seen in for some human traits in GWAS (e.g. 40 lociexplaining 5% of the variance for height 2,5). One obvious explanation is the difference inexperimental designs (line crosses vs. population association studies), but differences ingenetic architectures among species and traits may also contribute.", + "The method uses two pieces of information: mapping data from crosses thatinvolve more than two inbred strains and sequence variants in the progenitor strains within the intervalcontaining a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that anysequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals exceptat a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.", + "The method uses two pieces of information: mapping data from crosses thatinvolve more than two inbred strains and sequence variants in the progenitor strains within the intervalcontaining a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that anysequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals exceptat a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.", + "Genotyping all the individual progeny formarkers that show allelic variation between the parental strains (either single nucleotide polymorphisms or simple sequence repeats) will allow the detection of associations between trait values and marker genotype, and in this way demonstrate to whichset of markers a QTL is linked. 
To reduce the genotyping effort, selective genotypingof the individuals at the extremes of the phenotypic spectrum can be performed (20,23). Although these three approaches are in general considered to be the best to detect andmap QTL, they have several disadvantages for quantitative traits involving HSC.", + "So, how do you go about planning and performing a QTL study, and howdo you identify the responsible gene within a QTL that you have identified? Generally, one starts by performing a strain survey to find two parental inbredstrains that have a markedly different trait. One can now look up many differenttraits of inbred mice online at the Mouse Phenome Database (http://phenome. jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). However, the trait you maywant to study may not be present in wild type mice, so you may want to crossa mutant (or genetically engineered) strain onto several inbred strains.", + "In any case, precision much finerthan this, while welcome, will often not be critical. The fuzzy functional boundaries of genes and the high density of sequence variants in linkage disequilibrium shifts the burden of proof frompure mapping to functional genomics, comparative analysis ofhuman cohorts, complementary animal models, and direct pharmacological and genetic engineering (Smemo et al. , 2014). Mapping with BXDs has high powerHow many replicates and strains are needed to detect andresolve QTLs?", + "These candidate genes are then sequenced in the two parental inbredstrains looking for sequence dierences in coding or regulatory regions. After ne mapping the QTL interval and shortening the list of plausiblecandidate polymorphisms, the major challenge remains \u0001 proving denitivelywhich nucleotide polymorphism underlies the QTL. The most direct proofwould be replacing one strains allele with another strains allele (creating aFIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). 
On the right, the parental, F1 hybrid, and intercross (F2) mousegenerations are depicted.", + "QTL mapping studies thenseek to detect the polymorphisms underlying the complex traits of interest byscanning for alleles that co-vary withthe traits. Similar experiments also can be conducted with special derivatives of inbredstrains known as recombinant inbred(RI) mice. These animals are derivedby cross-breeding two or more distinctparental strains (which often divergewidely for the trait of interest), followedby inbreeding of the offspring for severalgenerations (Bailey 1971). Given thecorrect breeding strategy, this method1This is an issue faced by GWASs researchers when classifyingsamples as cases or controls." + ] + ], + "task_id": [ + "12BEAFA9366519672FC8B06959FB2DAF", + "64FEC152131BC6502E15EA6A6348D70B", + "3F9EDFE9A0222EA70459EC8985F134C4", + "A010490B55F739DF95BB82DF2B0F5AA3", + "471A145E9CA1E517E462499ABCA8EA2D" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_general_3.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_3.json new file mode 100644 index 0000000..7cf9999 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_3.json @@ -0,0 +1,103 @@ +{ + "question": [ + "For text and biological resources, do you mean add some books (on biology stuff) or/and web resources (as ensembl) on your system?", + "what is ensembl?", + "What is the difference between QTL mapping and GWAS?", + "How do I determine which gene in my QTL is causal for the trait?", + "Why do males have two Y chromosomes and females only one?" + ], + "answer": [ + "The text suggests using online bioinformatics resources such as Ensembl, UCSC Human Genome Browser, and others for research and data analysis. It doesn't specifically mention adding books or web resources to a system, but refers to utilizing these online tools and databases for information retrieval and bioinformatic analysis.", + "Ensembl is a joint project between the EBI and the Wellcome Trust Sanger Institute that provides a database for genome data. Launched in 1999, it was the first to provide a window on the draft genome, curating the results of computational analyses. It contains automatically annotated genomes and integrates data from a wide range of biological research sources. Ensembl also provides tools for data retrieval and analysis, and it includes quality checks for genetic variants in its variation pipeline.", + "QTL mapping is a statistical method used to identify regions in the genome that correlate with variation in a phenotype. It has high power, meaning it can detect a QTL, but it does not precisely identify which of the many genes within the QTL is causal. On the other hand, GWAS (Genome-Wide Association Studies) is a method used to identify genetic variants associated with complex traits. 
It has low power, requiring large numbers of individuals, but it has high precision, often identifying smaller candidate regions.", + "To determine which gene in your QTL is causal for the trait, you can start by narrowing down the list of candidate genes within the QTL. This can be done by performing a strain survey, using genetically engineered mice to alter the expression of a candidate gene, or using comparative genomics to narrow down the QTL to a region containing only a few genes. You can then investigate whether the expression of these genes correlates with the phenotype(s) of interest. If a variant within a gene causes a change in its expression, and the expression of that gene correlates with expression of a phenotypic trait of interest, it is considered a good candidate. Network analyses can also be used to build up more evidence for which gene is causal. Finally, you can confirm the candidate genes by complementation of a QTL, which can be achieved in several ways, including transgenic complementation.", + "Actually, males have one Y chromosome and one X chromosome, while females have two X chromosomes. This combination determines the sex of an individual." + ], + "contexts": [ + [ + "For certain types of important digital objects, there are well-curated, deeply-integrated,special-purpose repositories such as Genbank3, Worldwide Protein Data Bank (wwPDB4), andUniProt5 in the life sciences; Space Physics Data Facility (SPDF; http://spdf.gsfc.nasa.gov/) and Set ofIdentications, Measurements and Bibliography for Astronomical Data (SIMBAD6) in the spacesciences. 
These foundational and critical core resources are continuously curating and capturing highvalue reference datasets and ne-tuning them to enhance scholarly output, provide support for bothhuman and mechanical users, and provide extensive tooling to access their content in rich, dynamicways.", + "These centralized data resources can often be veryhelpful for gaining a quick overview of an unfamiliar pathway or gene, but inevitablyone needs to re-enter the literature to build up a fuller picture and to answer the questions that are most relevant to the target phenotype or gene. The Internet is also anexcellent resource to help in this process; this probably makes the ubiquitous searchengine Google (http://www.google.com) one of the most powerful bioinformaticstools.The expansionof Web-linked interoperativity and interrogation tools means that new options willalready be available by the time this book is in print. One consequence of these advances could be the perception of a diminished necessity to perform bioinformaticanalysis. Although this is true in the sense that secondary database include an increasing amount of precooked bioinformatic data, there is a paradox in that the moresophisticated the public annotation becomes, the more important it is to understandthe underlying principles.Data retrieval at UCSC is facilitated by text and BLAT (Kent, 2002; a BLAST-likealgorithm) searches and bulk downloads of annotation or sequence data. Other complementary tools at UCSC have extended the functionality of UCSC. For instance,the Proteome Browser graphically displays protein properties such as hydrophobicity, charge and structural features across any publicly available protein sequence(Hinrichs et al. , 2006). As with Ensembl, the UCSC website has been well designedand is sympathetic to the naive user, but the UCSC graphical interface is more Spartan. 
If Ensembl is Disney, then UCSC is South Park.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.", + "Toachieve this goal, we integrated and make available big reference datain chapters 2 and 3, bridged model organism to human data in chapter4, translated generic methods into clinical applications in chapters 5and 6, and developed a platform to bring innovations into practice inchapter 7. The resources currently available are already plentiful, and both theamount and types of molecular life science data is growing at a tremendous pace.", + "We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. 
Students integrate information gathered from websites dealing with anatomy (Mouse BrainLibrary), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and geneexpression analyses (University of California, Santa Cruz Genome Browser, National Center forBiotechnology Informations Entrez Gene, and the Allen Brain Atlas), and information resources(PubMed).", + "Useful Online Genomics Resources.", + "These centralized data resources can often be veryhelpful for gaining a quick overview of an unfamiliar pathway or gene, but inevitablyone needs to re-enter the literature to build up a fuller picture and to answer the questions that are most relevant to the target phenotype or gene. The Internet is also anexcellent resource to help in this process; this probably makes the ubiquitous searchengine Google (http://www.google.com) one of the most powerful bioinformaticstools.The expansionof Web-linked interoperativity and interrogation tools means that new options willalready be available by the time this book is in print. One consequence of these advances could be the perception of a diminished necessity to perform bioinformaticanalysis. Although this is true in the sense that secondary database include an increasing amount of precooked bioinformatic data, there is a paradox in that the moresophisticated the public annotation becomes, the more important it is to understandthe underlying principles.Data retrieval at UCSC is facilitated by text and BLAT (Kent, 2002; a BLAST-likealgorithm) searches and bulk downloads of annotation or sequence data. Other complementary tools at UCSC have extended the functionality of UCSC. For instance,the Proteome Browser graphically displays protein properties such as hydrophobicity, charge and structural features across any publicly available protein sequence(Hinrichs et al. , 2006). 
As with Ensembl, the UCSC website has been well designedand is sympathetic to the naive user, but the UCSC graphical interface is more Spartan. If Ensembl is Disney, then UCSC is South Park.", + "These centralized data resources can often be veryhelpful for gaining a quick overview of an unfamiliar pathway or gene, but inevitablyone needs to re-enter the literature to build up a fuller picture and to answer the questions that are most relevant to the target phenotype or gene. The Internet is also anexcellent resource to help in this process; this probably makes the ubiquitous searchengine Google (http://www.google.com) one of the most powerful bioinformaticstools.The expansionof Web-linked interoperativity and interrogation tools means that new options willalready be available by the time this book is in print. One consequence of these advances could be the perception of a diminished necessity to perform bioinformaticanalysis. Although this is true in the sense that secondary database include an increasing amount of precooked bioinformatic data, there is a paradox in that the moresophisticated the public annotation becomes, the more important it is to understandthe underlying principles.Data retrieval at UCSC is facilitated by text and BLAT (Kent, 2002; a BLAST-likealgorithm) searches and bulk downloads of annotation or sequence data. Other complementary tools at UCSC have extended the functionality of UCSC. For instance,the Proteome Browser graphically displays protein properties such as hydrophobicity, charge and structural features across any publicly available protein sequence(Hinrichs et al. , 2006). As with Ensembl, the UCSC website has been well designedand is sympathetic to the naive user, but the UCSC graphical interface is more Spartan. 
If Ensembl is Disney, then UCSC is South Park.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.", + "There are online bioinformatics resources from which this type of information may be sourced.", + "There aremany resources for annotating the results of a genome-wide study,all located in diverse databases and other web content. Having theability to harmonize and analyze historic data, together with highlycurated public resource data such as that found in model organismdatabases, adds tremendous depth and orthogonal informationsources to prioritize and refine the results of genetic analysis.", + "These centralized data resources can often be veryhelpful for gaining a quick overview of an unfamiliar pathway or gene, but inevitablyone needs to re-enter the literature to build up a fuller picture and to answer the questions that are most relevant to the target phenotype or gene. The Internet is also anexcellent resource to help in this process; this probably makes the ubiquitous searchengine Google (http://www.google.com) one of the most powerful bioinformaticstools.The expansionof Web-linked interoperativity and interrogation tools means that new options willalready be available by the time this book is in print. One consequence of these advances could be the perception of a diminished necessity to perform bioinformaticanalysis. 
Although this is true in the sense that secondary database include an increasing amount of precooked bioinformatic data, there is a paradox in that the moresophisticated the public annotation becomes, the more important it is to understandthe underlying principles.Data retrieval at UCSC is facilitated by text and BLAT (Kent, 2002; a BLAST-likealgorithm) searches and bulk downloads of annotation or sequence data. Other complementary tools at UCSC have extended the functionality of UCSC. For instance,the Proteome Browser graphically displays protein properties such as hydrophobicity, charge and structural features across any publicly available protein sequence(Hinrichs et al. , 2006). As with Ensembl, the UCSC website has been well designedand is sympathetic to the naive user, but the UCSC graphical interface is more Spartan. If Ensembl is Disney, then UCSC is South Park." + ], + [ + "Annotation, preprocessing and categorization of dataWe used Ensembl (version 39) as the annotation reference database.Homology between human and mouse genes was derived via BioMart.The total number of genes under study comprises 15,277 Ensembl mouse genes representing the union of the homologue genes from all data sources.An overview about the T2DM specific datasets is given in Table 1.", + "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carriedout and the way the results are displayed. 4.4.1 EnsemblEnsembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the WellcomeTrust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbardet al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide awindow on the draft genome, curating the results of a series of computational analyses.Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequenceassemblies as its starting point, but it is now based upon NCBI assemblies. 
TheEnsembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that arecomputationally predicted (by the GENSCAN gene prediction program) and alsosupported by a significant BLAST match to one or more expressed sequences orproteins. Ensembl also identifies the positions of known human genes from publicsequence database entries, usually using GENEWISE to predict their exon structures.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.Ensembl provides a DAS referenceserver giving access to a wide range of specialist annotations of the humangenome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy ahypothesis (top-down data mining), or to interrogate a database in order togenerate new hypotheses based on rigorous statistical correlations (bottom-updata mining). Domain (protein) A region of special biological interest within a single proteinsequence.", + "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carriedout and the way the results are displayed. 4.4.1 EnsemblEnsembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the WellcomeTrust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbardet al. 
, 2002; http://www.ensembl.org/), launched in 1999, was the first to provide awindow on the draft genome, curating the results of a series of computational analyses.Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequenceassemblies as its starting point, but it is now based upon NCBI assemblies. TheEnsembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that arecomputationally predicted (by the GENSCAN gene prediction program) and alsosupported by a significant BLAST match to one or more expressed sequences orproteins. Ensembl also identifies the positions of known human genes from publicsequence database entries, usually using GENEWISE to predict their exon structures.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.Ensembl provides a DAS referenceserver giving access to a wide range of specialist annotations of the humangenome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy ahypothesis (top-down data mining), or to interrogate a database in order togenerate new hypotheses based on rigorous statistical correlations (bottom-updata mining). 
Domain (protein) A region of special biological interest within a single proteinsequence.Anothergrowing area of activity is in cataloguing the genetic variation present in humanpopulations as Ensembl reflects the progress of the International Haplotype MapProject (Thorisson et al. , 2005). More speculative data, such as GENSCAN-predicted exons that have not beenincorporated into Ensembl-confirmed genes, may also be viewed. This means thatthe display can be used as a workbench for the user to develop personalized annotation.", + "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carriedout and the way the results are displayed. 4.4.1 EnsemblEnsembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the WellcomeTrust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbardet al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide awindow on the draft genome, curating the results of a series of computational analyses.Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequenceassemblies as its starting point, but it is now based upon NCBI assemblies. TheEnsembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. The idea is to identify confirmed genes that arecomputationally predicted (by the GENSCAN gene prediction program) and alsosupported by a significant BLAST match to one or more expressed sequences orproteins. 
Ensembl also identifies the positions of known human genes from publicsequence database entries, usually using GENEWISE to predict their exon structures.", + "EnsemblEnsembl is a publicly available web resource that contains automatically annotated genomes.It is integrated with other available biological databases like Jasper for binding motifs.It is a much larger web resource than T1Dbase, and contains general information about the human genome including variants.These include SNPs, insertions, deletions and somatic mutations (Alterations in DNA that occur after conception, meaning that they are not inherited) for several species.Data from Ensembl can be accessed in a number of ways.The names of all the SNPs that occur in the T1D susceptibility regions can be collected from Ensembl using the Biomart tool (Kinsella et al., 2011).To achieve this, the coordinates of the T1D regions obtained from T1Dbase are uploaded to the biomart query page which allows one to search the genome browser and retrieve data like the names, chromosomal positions, and genic positions (referred to as \"consequence to transcript\", in Ensembl) of the SNPs.The SNP genic positions tell if a SNP is located within a gene, adjacent to a gene or whether they occur in inter-genic positions between gene coding regions, as well as the particular genes in which they are located.Advantages of Ensembl:There is a number of advantages to using Ensembl. (i) It is a larger web resource than T1Dbase and integrates data from a wide range of biological research sources into its database.Therefore, available information is quite comprehensive. (ii) Genic positions for 99% of the variants obtained from T1Dbase could be retrieved. 
(iii) Ensembl contains quality checks for genetic variants in its variation pipeline.A variant is flagged as failed if certain quality criteria are not met, for instance if none of the variant alleles match the reference allele of the variant.Generally, Ensembl was found to give more detailed information regarding the genic positions of variants compared to T1Dbase.Information about genes, including gene names, chromosomal coordinates, biotype (coding or non-coding), and number of splice variants, can also be retrieved from Ensembl.", + "doi:10.1093/nar/gkp858Cunningham F, Amode MR, Barrell D, Beal K,Billis K, Brent S, Carvalho-Silva D, ClaphamP, Coates G, Fitzgerald S, Gil L, Giron CG,Gordon L, Hourlier T, Hunt SE, Janacek SH,Johnson N, Juettemann T, Kahari AK, KeenanS, Martin FJ, Maurel T, McLaren W, MurphyDN, Nag R, Overduin B, Parker A, PatricioM, Perry E, Pignatelli M, Riat HS, SheppardD, Taylor K, Thormann A, Vullo A, WilderSP, Zadissa A, Aken BL, Birney E, Harrow J,Kinsella R, Muffato M, Ruffier M, Searle SM,Spudich G, Trevanion SJ, Yates A, ZerbinoDR, Flicek P (2015) Ensembl 2015.", + "But the four sites are not equivalent; there are important distinctions between them in terms of the data analysed, the analyses carriedout and the way the results are displayed. 4.4.1 EnsemblEnsembl is a joint project between the EBI (http://www.ebi.ac.uk/) and the WellcomeTrust Sanger Institute (http://www.sanger.ac.uk/). The Ensembl database (Hubbardet al. , 2002; http://www.ensembl.org/), launched in 1999, was the first to provide awindow on the draft genome, curating the results of a series of computational analyses.Until January 2002 (Release 3.26.1), Ensembl used the UCSC draft sequenceassemblies as its starting point, but it is now based upon NCBI assemblies. TheEnsembl analysis pipeline consists of a rule-based system designed to mimic decisions made by a human annotator. 
The idea is to identify confirmed genes that arecomputationally predicted (by the GENSCAN gene prediction program) and alsosupported by a significant BLAST match to one or more expressed sequences orproteins. Ensembl also identifies the positions of known human genes from publicsequence database entries, usually using GENEWISE to predict their exon structures.Data retrieval is extremely well catered for in Ensembl, with text searches of alldatabase entries, BLAST searches of all sequences archived, and the availability of bulkdownloads of all Ensembl data and even software source code. Ensembl annotationcan also be viewed interactively on ones local machine with the Apollo viewer (Lewiset al. , 2002; http://www.fruitfly.org/annot/apollo/). 4.4.2 The UCSC Human Genome BrowserThe UCSC Human Genome Browser (UCSC) bears many similarities to Ensembl;it, too, provides annotation of the NCBI assemblies, and it displays a similar array offeatures, including confirmed genes from Ensembl.Ensembl provides a DAS referenceserver giving access to a wide range of specialist annotations of the humangenome (for more detail, see http://www.ensembl.org/das/). Data mining The ability to query very large databases in order to satisfy ahypothesis (top-down data mining), or to interrogate a database in order togenerate new hypotheses based on rigorous statistical correlations (bottom-updata mining). Domain (protein) A region of special biological interest within a single proteinsequence." + ], + [ + "However, the twomethods do not necessarily give the same result because theyare measuring complementary aspects of an association. Asquantitative trait loci (QTL) are added to Q, we expect highercompleteness because the QTL in Q cover more segments ofeach chromosome. However, if these quantitative trait loci(QTL) are unrelated to G, we expect many of them to beempty. 
Similarly, as genes are added to G, we expect higheraccuracy because selected genes are found in more locations.", + "QTL can be mapped through GWAS or GWLS. eQTLAn expression Quantitative Trait Locus is a region in the genome at which allelic variation correlates with the mRNA expression level variation of a certain gene. Distant eQTLA distant (or trans) eQTL is an eQTL which is located far from the gene it controls (forexample on a different chromosome). Local eQTLA local (or cis) eQTL is an eQTL which is located nearby the gene it controls in the genome.", + "QTL mapping, GWAS and genomic selectionInformation from SNP markers are being increasingly used to generate a deeper knowledge of the genetic basis of important traits and speed up the genetic progress in aquaculture species by means of GWAS and genomic selection, respectively (Y aez et al., 2014).GWAS allows the identification of genetic variants associated with complex traits (i.e QTL).When one or few QTL explain a high percentage of genetic variance for a particular trait, it is possible to improve the trait more rapidly by means of MAS.However, the complexity of some traits and the absence of QTL with major effects constrain the successful implementation of MAS.In contrast, genomic selection is the most appropriate way to select for traits that are controlled by several loci of small effects (i.e.polygenic traits) (Meuwissen et al., 2013).", + "Mouse QTL mapping has high power but low precision (i.e., we can detect a QTL, but do not know which of tens or hundreds of genes is causal), whereas human GWAS has low power but high precision (tens or hundreds of thousands of individuals are needed, but candidate regions are often smaller).By combining the power of mouse QTL mapping and the precision of human PheWAS, we can do more than both individually.Candidate genes might show up in our analysis here that did not show up in our above analysis for several reasons, the most common being that gene expression 
was not measured in the relevant cell type or timepoint.", + "In order to differentiate eQTL from QTLfor phenotypes other than gene expression (phenotypic, physiologic, or clinical QTL),here we will refer to the latter as pQTL. Although the GG approach is relatively recent and is still evolving, there are somegeneral features o f this approach that can be summarized as follows: (1) mapping o ftranscripts profiles as quantitative traits, (2) classification o f eQTL in cis and trans actingmode o f action, (3) identification o f loci where large number o f transcripts map, whichReproduced with permission of the copyright owner.", + "Quantitative trait locus-mapping is a statistical methodused to map chromosomal intervals (loci) that contribute toheritable variance in phenotypes. The method simply compares the inheritance of allelic variants (B or D genotypesin our case) with differences in phenotypes. A QTL willgenerally cover a region that includes 10100 genes, andthese positional candidates can then be ranked roughly onthe basis of criteria such as the types of DNA variants, patterns of mRNA expression, data from complementary humangenetic cohorts (GWAS and linkage) and relevant literatureabout gene effects on central nervous system structure andfunction.", + "These approaches were used to identify dozens of quantitative trait loci (QTL) for BMD and other bone traits [15,16]. However, identifying causative genes underlying QTL proved challenging [17]. Over the last decade, genemapping approaches have transitioned from low-resolution linkage mapping to high-resolution GWASs [11]. The first GWASs in mice used panels of inbred mouse strains [1821] andby leveraging accumulated recombinations, this approach significantly increased mapping resolution [19].", + "The process of QTL mapping includes phenotyping andgenotyping at least several hundred animals from an informative mapping population (e.g. , B6D2 F2). 
Once a QTL isdetected and confirmed, additional work is needed to identify the specific gene(s) in the QTL interval responsible forthe phenotypic variation. An elegant way to improve QTLmapping resolution is through the development and testingof interval-specific congenic strains (Darvasi, 1997). Thismethod has been successfully used to fine map an ethanolwithdrawal QTL on mouse chromosome 4 (Fehr et al. ,2002; Shirley et al. , 2004).", + "However, the twomethods do not necessarily give the same result because theyare measuring complementary aspects of an association. Asquantitative trait loci (QTL) are added to Q, we expect highercompleteness because the QTL in Q cover more segments ofeach chromosome. However, if these quantitative trait loci(QTL) are unrelated to G, we expect many of them to beempty. Similarly, as genes are added to G, we expect higheraccuracy because selected genes are found in more locations.", + "This comparison gives information about the reliability of the observed genotypeinformation: The more the marker locations differ between the two maps (which signifiesvariation in marker positions), the higher the possibility of genotyping errors. QTL mapping was done in several stages to identify loci acting individually and QTL thatinteracted, either additively or epistatically. To determine individually-acting QTL, a singleQTL genome scan was conducted with the function scanone.", + "Importantly, whereasthese studies required substantial labor, time, and resources, X-QTL is a quick and easyapproach to achieve a comparable level of genetic dissection. The levels of complexityobserved here (e.g. 14 loci explaining 70% of the genetic variance for 4-NQO resistance) arestill dramatically lower than those seen in for some human traits in GWAS (e.g. 40 lociexplaining 5% of the variance for height 2,5). One obvious explanation is the difference inexperimental designs (line crosses vs. 
population association studies), but differences ingenetic architectures among species and traits may also contribute.", + "Forward genetics approaches that combine traditional QTLmapping with expression quantitative trait mapping (eQTL; in which case transcript abundance is the quantitative trait) [32] are increasingly being used to successfully transition fromQTL to QTG [3335]. Traditional QTL analysis will identify the genomic regions affectingtrait variation, while eQTL analysis can help in understanding which genes, pathways, and biological processes are also under the influence of a given QTL.", + "On the onehand, the genomic location that are in suspicion to be involved in the trait can still involvelarge genomic segments, e.g. , millions of basepairs that include many genes within the segment. On the other hand, GWAS may point toseveral or even many genomic locations for thetrait of interest, complicating further functionalanalysis. Analysis of Quantitative Trait Loci (QTL)QTL analysis reveals statistically signicantlinkage between phenotypes and genotypes,thereby providing explanation for the geneticbasis of variation in complex traits (Falconerand Mackay, 1996; Lynch and Walsh, 1998).", + "This feature of eQTL mapping alone can often make theeffort worthwhile; even with inexpensive genotyping, it is oftencost effective in the long term to generate a TDM map of severalthousand markers, which usually ensures that any two recombination events have a marker between them and that locations ofgenetic cross-overs are well-defined. TDMs can be integrated withGenotyping by Sequencing (GBS) approaches to anchor geneticmaps to physical maps (Poland et al. , 2012; Sonah et al. , 2013). 
Surprisingly, eQTL mapping is actually the simple portionof eQTL analysis work.", + ", we can detect a QTL, butdo not know which of tens or hundreds of genes is causal), whereas human GWAS has lowpower but high precision (tens or hundreds of thousands of individuals are needed, butGenes 2022, 13, 61411 of 17candidate regions are often smaller). By combining the power of mouse QTL mapping andthe precision of human PheWAS, we can do more than both individually.", + ", we can detect a QTL, butdo not know which of tens or hundreds of genes is causal), whereas human GWAS has lowpower but high precision (tens or hundreds of thousands of individuals are needed, butGenes 2022, 13, 61411 of 17candidate regions are often smaller). By combining the power of mouse QTL mapping andthe precision of human PheWAS, we can do more than both individually.", + "The remarkable success in mappinggenes linked to a number of disease traits using genomewide association studies (GWAS) in human cohorts hasrenewed interest in applying this same technique in modelorganisms such as inbred laboratory mice (Su et al. 2010). Unlike classical phenotypic traits, gene expression traitsgiving rise to cis-acting eQTL provide us with a prioriknowledge of the true QTL location (Doss et al. 2005),which can be used to empirically estimate the power of aGWAS performed at a similar scale (Hao et al. 2008;Schadt et al. 2008).", + "QTLmapping has been highly successful in determining causative loci underlying severaldisease phenotypes (Wang et al. 2004; Cervino et al. 2005; Abboud and Kaplowitz 2007)and can broadly be subdivided into two classes: linkage mapping and association mapping. For standard linkage mapping in experimental crosses, likelihood or regression approachesare used to map QTL, with flanking markers used to infer genotypes in the intervalsbetween widely spaced markers (i.e. 
> 1cM) (Lander and Botstein 1989; Haley and Knott1992).38Quantitative Trait Locus (QTL) mapping has been used to associate a specificgenotype with the variation in a single measured phenotype like high density lipoproteins(Wang and Paigen 2005) and ethanol tolerance (Grisel et al. 2002). At each locus in asegregating population, a model is fit which estimates the likelihood that this locus explainsthe variation in phenotype versus the likelihood that there is no genotypic effect on thephenotype. Interval mapping (Lander and Botstein 1989) is a variation on QTL mapping whichuses maximum likelihood estimation.", + "QTL mapping studies thenseek to detect the polymorphisms underlying the complex traits of interest byscanning for alleles that co-vary withthe traits. Similar experiments also can be conducted with special derivatives of inbredstrains known as recombinant inbred(RI) mice. These animals are derivedby cross-breeding two or more distinctparental strains (which often divergewidely for the trait of interest), followedby inbreeding of the offspring for severalgenerations (Bailey 1971). Given thecorrect breeding strategy, this method1This is an issue faced by GWASs researchers when classifyingsamples as cases or controls." + ], + [ + "Prior belief or knowledge about the number of true causal and trueindependent links that might be expected in a typical QTL, depending on the studydesign, should be considered to safeguard against high false-positive rates (lowpositive predictive values). 
In studies that involve mapping gene expression (eQTL),protein (pQTL) or metabolite (mQTL) traits, information about co-localization ofQTL and genes that are functionally linked to the trait provides information aboutthe likelihood of causal links.", + "The next step is to investigate whether the expression of these genes correlates with the phenotype(s) of interest.This would suggest a chain of causality: a variant within a gene causes a change in its expression, and the expression of that gene correlates with expression of a phenotypic trait of interest.To do this, we created a correlation matrix between all genes within a QTL with a cis-eQTL in any brain tissue as well as the phenotypes that contributed to the QTL (Supplementary Table S6).Any gene with a cis-eQTL and a significantly correlated expression was considered a good candidate.If the gene only had a cis-eQTL and correlation in a single brain region, then it suggested that this brain region might also be of interest for the phenotype (adding another link to this chain).", + "One possible approach to facilitate this endeavor is to identify quantitative trait loci(QTL) that contribute to the phenotype and consequently unravel the candidategenes within these loci. Each proposed candidate locus contains multiple genes and,therefore, further analysis is required to choose plausible candidate genes. One ofsuch methods is to use comparative genomics in order to narrow down the QTL to aregion containing only a few genes. We illustrate this strategy by applying it togenetic findings regarding physical activity (PA) in mice and human.", + "Network analysesWe now have two QTL, and we have picked potentially interesting genes within each, but nowwe want to build up more evidence for which gene in our QTL interval is causal. The first, andmost obvious way, is to see what genes our trait of interest correlates with, in tissues that weexpect to be related to the trait. 
We calculated the Spearmans correlation between the traitBXD_17850 and all probes with expression data in T helper cells (GN319).", + "Anotherapproach to help to determine if a gene located near the mapped QTL wouldhave effects to influence the quantitative trait will be to use genetically engineered mice to determine if altering the expression of a candidate gene will alterthe phenotype of interest (38). However, it is possible that a quantitative trait isa combined effect of multiple genes located near the QTL (39).", + "With a known QTL and abody of evidence suggesting possible roles for the affected gene,phenotypes can be predicted that may be modulated as a resultof this sequence variation. If this phenotype is of interest, itcan be directly measured and a traditional forward QTL analysis carried out to confirm the prediction. Such an approach isextremely attractive when the enormous cost and time requiredfor phenotyping a large panel is considered.", + "The firststep is to narrow down the list ofcandidate causal genes within aFig1. IntervalmappingofoviductgrosspathologyacrosstheBXDstrainsQuantitative Trait Locus (QTL)arevealsaQTLondistalChr3. TheL RSvaluesareplottedinblueacrossthechromosomal region containinggenomeandmeasurethestrengthoftheassociationbetweensequence variants stronglychromosomeandMbposition(topandbottomX-axis,respectively)andassociated with phenotypicphenotypeexpression. Allelecontributionisshownbythered(C57BL/6J)andgreen(DBA/2J)lines. Redandgreyhorizontallinesindicategenome-variation.", + "QTL mapping of traits in mouse cohorts often ends up with a genetic locus, composed of a list of candidategenes. Several studies proposed the use of mediation analysis to identify the causal gene (mediator) betweenthe genetic variant (independent variable) and the trait-of-interest (dependent variable) (Figure 1.4B) [7, 47,61, 77]. 
Mediation analysis can be used either on gene expression levels to identify the regulatory mechanisms[7, 47, 61], or on phenotypic traits to discover the potential causal drivers contributing to the phenotypicvariances [77] (Figure 1.4C upper).", + "1a). Second-generation offspring are thenphenotyped and genotyped, and linkage analysis is carried out to identify a region that isassociated with the trait1. This approach has led to the identification of thousands of quantitative trait loci (QTLs) forvarious phenotypes and diseases. However, each QTL region is large, often tens ofmegabases, and contains hundreds of genes. The process of identifying the causal variantand the gene involved is therefore difficult and costly. Of the thousands of QTLs identified,only a small fraction of genes has been identified. NIH-PA Author Manuscript 2012 Macmillan Publishers Limited.", + "Network analysesWe now have two QTL, and we have picked potentially interesting genes within each, but nowwe want to build up more evidence for which gene in our QTL interval is causal. The first, andmost obvious way, is to see what genes our trait of interest correlates with, in tissues that weexpect to be related to the trait. We calculated the Spearmans correlation between the traitBXD_17850 and all probes with expression data in T helper cells (GN319).", + "10 JUNE 2016 VOL 352 ISSUE 6291aad0189-5R ES E A RC H | R E S EA R C H A R T I C LESolving QTLs: Finding the quantitativetrait geneFor cis-QTLs, the causal factors can be quicklyidentified: With few exceptions, they will be driven by variants within the gene itself or immediately adjacent. For trans-QTLs, mQTLs, andcQTLs, the identification of the causal quantitative trait gene (QTG) is challenging due to thewidth of the QTLs.", + "Once the QTL interval is reduced to a reasonable size,the next step in the process involves sorting through thegenes within the interval and attempting to determinewhich is the QTG. 
This step is daunting because more thanone gene may be involved and the function of some geneswithin the interval may be unknown. Until recently, thisstep emphasized the detection of polymorphisms withincoding sequence (reviewed in Korstanje and Paigen, 2002and Glazier et al. 2002); for a polymorphism that producesan amino acid substitution, one can often infer and thentest for a functional consequence.", + "To understand the genetic networks that underliequantitative variation in the trait, it is also very important todiscover genes whose expression is correlated with the traitafter accounting for the known effects of the QTL on thetrait. Many of these genes may have expression that isassociated with QTL genotype, and would therefore beidentified as important via the tests described above. Othergenes, however, may have expression values that are correlated with the trait but unassociated with genotype at theQTL.Theapproach is motivated by the fact that a research project isoften focused on a specific classical quantitative trait. If amajor QTL for this classical trait has been identified, it isoften desirable to test whether this QTL is also associatedwith the transcription level of any genes, which will provide clues as to which genes belong to the pathway that theQTL uses to modulate the classical trait.", + "Quantitative trait loci (QTLs) can be identified in several ways, but isthere a definitive test of whether a candidate locus actually corresponds to a specific QTL? NIH-PA Author ManuscriptMuch of the genetic variation that underlies disease susceptibility and morphology is complexand is governed by loci that have quantitative effects on the phenotype. Gene-gene and geneenvironment interactions are common and make these loci difficult to analyse. 
Here, we presenta communitys view on the steps that are necessary to identify genetic loci that governquantitative traits, along with a set of interpretive guidelines.", + "Confirmation of Candidate GenesThe next step is to prove that a particular gene is involved in the quantitative traitunder study. This is done by complementation of a QTL, which can be achieved inseveral ways (911,40). In principle, transgenic complementation is the most straightforward. This approach has been used successfully to demonstrate that Pla2g2a wasthe correct candidate gene for Mom1, a modifier of the apcmin allele that causesadenomatous polyposis coli (41).", + "So, how do you go about planning and performing a QTL study, and howdo you identify the responsible gene within a QTL that you have identified? Generally, one starts by performing a strain survey to find two parental inbredstrains that have a markedly different trait. One can now look up many differenttraits of inbred mice online at the Mouse Phenome Database (http://phenome. jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). However, the trait you maywant to study may not be present in wild type mice, so you may want to crossa mutant (or genetically engineered) strain onto several inbred strains.", + "Along with correlations, this tool also derives new traits representing theprincipal components (Figure 2d). The user can add these principal components to their TraitCollection and proceed to perform QTL mapping, as in the case of a single trait QTLmapping. The R/QTL (Broman et al. 2003) and R/CAPE (Tyler et al. 2013) packages can beused for deeper analysis of epistasis and pleiotropy for multiple traits and multipleregulatory loci. 
Prioritizing Candidate Genes7Author ManuscriptFollowing the identification of a significant QTL, focus shifts to identifying theparticular gene(s) that cause the QTL.", + "The investigatorsfirst identified all QTLs associated witha classical phenotype and then winnowed the list of potentially associatedgene-expression traits on the basis oftheir correlation or eQTL overlap withthe phenotype of interest. Candidategenes then were ranked by applyingthe LCMS technique, which uses theeQTL data to establish causal relationships between DNA loci and transcripts as well as between transcriptsand phenotypes and finally identifiesa model that best fits the data.", + "The goal of QTL mapping is clearly theidentification and eventual confirmation of candidate genes(QTGs) underlying the phenotype. The evidence required forsuch confirmation has engendered much discussion (ComplexTrait Consortium 2003; Glazier et al. 2002) and is likely to varydepending on the nature of the trait and specific resourcesavailable to pin down underlying genes (e.g. availability ofknock-in or knock-outs, specific antibodies, siRNA, etc.). Thepaucity of QTGs meeting such multifaceted standards is testament to the difficulty of narrowing the confidence intervalsufficiently to identify and test suitable candidate genes (Flintet al." 
+ ], + [ + "Y chromosome in peripheral blood cells increases with age in men (6) and is correlated with increased risk of cancer mortality and Alzheimer's disease (6,7).X chromosome mosaicism in women also increases with age (8), as does autosomal mosaicism in both sexes (9,10).Recent studies have shown that the prevalence of age-related mosaic abnormalities is greater in men than women (9,10); however, mechanisms underlying the sex differences observed in chromosomal mosaicism in humans are unknown.", + "Recent reports suggested a role of Y chromosome loss in risk for all-cause mortality and common age-related disease such as cancer, Alzheimer disease as well as severe atherosclerosis [12][13][14][15][16][17][18][19][20].Building on such reports, we aimed to evaluate the contribution of male Y chromosome mosaicism to the risk for late-stage AMD.", + "Box 1. Sex-specific cytonuclear interactionsSeveral predictions about the nature of cytonuclear conflicts follow from the patterns of chromosomal inheritance (Table I).In a mated pair of animals, mtDNA is co-transmitted with half of the autosomal genes, two-thirds of the X-linked genes and none of the Y-linked genes [76].This predicts that, relative to the autosomal case, positive nuclear-mitochondrial interactions are more likely to evolve for X-linked loci whereas deleterious interactions between Y-linked genes and mtDNA should accumulate (or cannot be purged efficiently).", + "In addition to genetic data, the 9p Network Cohort dataset also lists the gender for all 719 individuals.Of these individuals, 406 individuals are female and 313 are male, indicating a female bias (Binomial test p 0.0006).This result was surprising considering that no female bias has been previously reported in 9p deletion and duplication syndromes.A possible explanation for the significant bias in the 9p Network Cohort dataset is the XY sex reversal phenotype, which is commonly observed in individuals with 9p deletion syndrome.This phenotype 
could lead to individuals with XY sex chromosomes being listed in the dataset as having a female gender.To further examine this hypothesis, we subset our dataset to include only the 236 individuals whose sex chromosomes are listed in their genetic information.For this much smaller subset, 125 individuals had female sex chromosomes and 111 had male sex chromosomes, indicating no significant sex bias (Binomial test p 0.4).We also found no significant gender bias in this group (Binomial test p 0.2), although we did confirm that four of the individuals with XY sex chromosomes had a gender of female.This comparison suggests that the XY sex reversal phenotype may be responsible for a female gender bias, but not a sex bias, in 9p deletion and duplication syndrome cohorts.", + "Duplicated variants with multiple alternative alleles and variants in sex chromosomes X and Y", + "Autosome-One of the numbered, or nonsex, chromosomes (1 through 22).X and Y are the sex chromosomes.", + "Given such a high abundance of young male-biased genes, we asked whether their parental genes are also male-biased.We found that fewer parental genes of X-linked male-biased duplicates were also male-biased (20%, 2/10) compared to the parental genes of autosomal young male-biased duplicates (32%, 12/37).These data, despite the small sample sizes and being statistically not significant, may suggest that compared to autosomal young genes, X-linked young genes more often evolved novel male-biased expression.However, as the majority of young genes are the result of intrachromosomal duplication events, the pattern might also reflect the fact that X-linked old genes are less likely to be male-biased.A slight excess of X-linked female-biased genes was also detected (Fig. 2).Although most of them are old, a few recently arose on the X chromosome over 4 to 6 Myr in the common ancestor of the D. melanogaster and D. 
simulans clade (branch 5).This can be interpreted in the context of the dominance model of the sexual antagonism hypothesis.In this case, a dominant, X-linked gene that is favorable to females but disadvantageous for males can become fixed.The slow accumulation of female-biased genes in the X reflects an overall low rate of female gene origination, either due to a small dominance effect (the degree of dominance h!1/2), or a minor disadvantageous effect on males (the ratio of fitness effects of male relative to female k!0) along with a favorable effect on females (Vicoso and Charlesworth 2006, Equation 10).Regarding the second step in the evolution of male-biased genes, namely X!A transposition, sexual antagonism favorable for autosomal fixation (Vicoso and Charlesworth 2006) and/or MSCI (Lifschytz and Lindsley 1972;Betran et al. 2002) may play a role in this process.On the other hand, the within-chromosomal duplication rate is higher than the between-chromosomal duplication rate (Emerson et al. 2008), which may contribute to the slow pace of X!A transposition.It has been observed that male-biased genes in Drosophila are overrepresented on autosomes (Parisi et al. 2003;Ranz et al. 2003).Consistent with this result, a dynamic process that can explain the nonrandom autosomal distribution has also been observed, in which autosomal new genes with X-linked parental genes are often male-biased.Specifically, a significant excess of autosomal testisexpressed retrogenes were identified as RNA-duplicates of X-linked parental genes (Betran et al. 2002).Recently, similar X!A gene traffic was observed in the DNA-level duplication and relocation data set of the Drosophila genus (Vibranovski et al. 2009b), and was further confirmed for DNA-level duplications in the D. pseudoobscura neo-X chromosome (Meisel et al. 2009).In addition, selective extinction of neo-X linked male-biased genes also occurred in D. pseudoobscura (Sturgill et al. 
2007).These three lines of genome-wide investigation support a common pattern of outof-X traffic for male-biased genes, resulting in an enrichment of these genes on autosomes in the long term.It has been reported that the initial manifestations of new gene emergence, namely polymorphic duplicates, occur at a lower frequency on the X chromosome, thus indicating that these duplicates are subject to stronger purifying selection (Emerson et al. 2008).Therefore, the excessive fixation of X-linked duplicates might not occur via neutral processes.Positive selection could have facilitated the fixation of X-linked young genes in addition to driving their subsequent sequence evolution.", + "Occasionally, Y chromosome DNA is detected in the maternal plasma, and the fetus appears to have female genitalia on sonographic examination.The underlying mechanisms for this include a twin demise, a maternal disorder of sexual differentiation, such as Swyer syndrome, or that the mother has undergone a bone marrow or solid organ transplant from a male donor (Bianchi, 2018;Hartwig, Ambye, Sorensen, & Jorgensen, 2017).", + "Becauseof the differences in sex chromosome number, the sexunmatched comparison contains internal controls, i.e. ,in this comparison, genes on the X-chromosome andY-chromosome (but not those on the autosomes) shouldshow copy number imbalances reective of a single copychange. We showed that the sample that is not sexmatched had readily detectable differences in aCGHsignals for genes on the X and Y chromosomes. 
No suchpatterns were evident for the autosomes of the sexunmatched individuals or for the sex chromosomes of thesex matched samples.", + "Sex chromosome:The X or Y chromosome in human beings that determines the sex of an individual.Females have two X chromosomes in diploid cells; males have an X and a Y chromosome.The sex chromosomes comprise the 23rd chromosome pair in a karyotype.See also: autosome Sex-linked: Traits or diseases associated with the X or Y chromosome; generally seen in males.X chromosome: One of the two sex chromosomes, X and Y. See also: Y chromosome, sex chromosome Y chromosome: One of the two sex chromosomes, X and Y. See also; X chromosome, sex chromosome", + "The male heterogamety (XY) is the mostcommon reported system, but many specieshave female heterogamety (ZW), and moreoccasionally, multiple chromosome systems(Almeida-Toledo and Foresti, 2001; Devlinand Nagahama, 2002; Penman and Piferrer,2008). Given the low resolution of optical microscopy to differentiate sex chromosomes insh, researchers have looked for an alternativein the tenfold longer meiotic chromosomes todetect mispairing tracts at the synaptonemalcomplex as an indication of the sex differentiated region with variable success.The exclusive femaleconstitution of gynogenetic genomes providesinformation on the SD system, especially in aXX/XY system, where all female progenies areexpected. If ZZ/ZW is the underlying system,male offspring always will be present, but theinterpretation is more complex and will dependon the distance of the SD region to centromereand on the viability of WW offspring (Devlinand Nagahama, 2002; Penman and Piferrer,2008). 
Induced triploids, on the other hand, areconstituted by the combination of two femaleand one male genomes (Piferrer et al.", + "The existence of a maternally silenced X-linked imprinted locus playing a role in social cognition could explain why males (X m Y) are more vulnerable to disorders of social cognition such as autism spectrum disorders than are females (X m X p ).The absence of the expression of this gene would not lead to autism itself, but would eliminate a putative protective factor, making an individual more susceptible to the effects of other ASD-predisposing genetic mutations or environmental factors.", + "When meiosis takes place, a pair of chromosomes may fail to separate properly, creating a sperm or egg that has either two copies or no copy of a specific chromosome.This is a sporadic event and it is called nondisjunction.Nondisjunction can lead to an extra chromosome, called trisomy, or a missing chromosome, called monosomy (GHR, 2008l).Down syndrome is an example of trisomy.Individuals who have Down syndrome have an extra chromosome number 21. Turner syndrome is an example of monosomy.Girls who have Turner syndrome have only one X chromosome.This causes them to have short stature and be infertile (NHGRI, 2008l).X-Linked InheritanceX-linked genetic disorders (also called sex-linked) are caused by gene mutations on the X chromosome.Most often X-linked genetic disorders are seen in males.Males inherit the X chromosome from their mother and the Y chromosome from their father.Because males have only one X chromosome, if they inherit a gene mutation on the X chromosome from their mother, they will have the disorder.Examples of X-linked genetic disorders occurring in males include hemophilia and Duchenne muscular dystrophy (GHR, 2008o)." + ] + ], + "task_id": [ + "19EBC6CA7E425D0C1279475BD0B411B6", + "9CFE932D7898C83E473E590BC77B4FCB", + "6DBC070B2E4DC2FE8036E5BA7480B755", + "5594EA025D9631328071B6A1A7EF1375", + "C6B9A982C9283DE065A3371F1264095C" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_general_4.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_4.json new file mode 100644 index 0000000..567ccf4 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_4.json @@ -0,0 +1,111 @@ +{ + "question": [ + "How does one tell the difference between X and Y DNA, with respect to DNA tracing and determining QTLs", + "Once a sperm combines with an egg, what determines how traits are passed onto the resulting lifeform?", + "How can I add a new species to the GeneNetwork database?", + "which genes are typically associated with diabetes in QTL analyses?", + "In which diseases is the gene TCF7L2 involved?" + ], + "answer": [ + "The text does not provide specific information on how to differentiate between X and Y DNA in the context of DNA tracing and determining Quantitative Trait Loci (QTLs). However, it discusses the process of identifying QTLs, which involves using mapping data from crosses involving more than two inbred strains and sequence variants in the progenitor strains. The strain distribution pattern in the progenitor strains is tested for consistency with the observed genetic effect of the QTL. This process helps assign a probability that any sequence variant is a Quantitative Trait Nucleotide (QTN). The text does not mention the specific roles of X and Y DNA in this process.", + "Traits are passed onto the resulting lifeform through the process of meiosis. Each parent contributes 23 unpaired chromosomes, which combine to form the full complement of 23 chromosome pairs in the zygote. These chromosomes carry the genetic material that determines the traits of the offspring. The process of recombination or crossing over during meiosis also contributes to genetic variation in the offspring.", + "The text does not provide specific instructions on how to add a new species to the GeneNetwork database. 
It is recommended to contact the administrators or operators of the GeneNetwork database for guidance on this process.", + "The genes typically associated with diabetes in QTL analyses include TCF7L2, HHEX-IDE, EXT2, FTO, SLC30A8, IGF2BP2, CDKAL1, CDKN2A-CDKN2B, JAZF1, TSPAN8-LGR5, THADA, ADAMTS9, NOTCH2-ADAM30, CDC123-CAMK1D, KCNQ1, PPARG, and KCNJ11.", + "The background text does not provide information on the involvement of the TCF7L2 gene in any diseases." + ], + "contexts": [ + [ + "Genetic mapping inmouse strains enhances the power of detecting modifier genes and identifying complexgenetic interactions. Genomewide quantitative trait locus (QTL) analysis, as described inmore detail below, represents a promising approach to detect genetic variants that areassociated with specific phenotypes and interact with each other. 16ACCEPTED MANUSCRIPTIn experimental crosses of two (inbred) strains the first generation (F1) ofoffsprings is genetically heterozygous but equal. Then in the next generation (F2) thePTstrain-specific genetic information is distributed across the genomes of their progeny andRIeach offspring is genetically unique.", + "This contrast can be exploited to identify subregions that underlie the trans-QTLs [67]. SNPs were counted for all four pairs of parental haplotypesBvs D, B vs H, B vs C, and L vs Sand SNP profiles for the fourcrosses were compared (figure 6). Qrr1 is a highly polymorphicPLoS Genetics | www.plosgenetics.org8November 2008 | Volume 4 | Issue 11 | e1000260QTL Hotspot on Mouse Distal Chromosome 1Figure 5. QTL for aminoacyl-tRNA synthetases in distal Qrr1.", + "The traditional approach to QTL mapping is to usetwo strains that differ maximally in the phenotype asparental strains for genetic crosses, with the followingcaveats. 
QTL analysis based on a single cross will mostlikely reflect only a small portion of the net geneticvariation, and QTL detection will be limited to regionswhere the two progenitor strains have functional polymorphisms. Data from multiple crosses, or from an HS,will overcome this limitation and can also be used toreduce QTL intervals [5,30].", + "These candidate genes are then sequenced in the two parental inbredstrains looking for sequence dierences in coding or regulatory regions. After ne mapping the QTL interval and shortening the list of plausiblecandidate polymorphisms, the major challenge remains \u0001 proving denitivelywhich nucleotide polymorphism underlies the QTL. The most direct proofwould be replacing one strains allele with another strains allele (creating aFIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). On the right, the parental, F1 hybrid, and intercross (F2) mousegenerations are depicted.", + "Furthermore, splicing QTLs(sQTLs) rather than eQTLs could comprise the molecular mechanism linking DNA variants with YFP53; thus, sQTL analysis could uncover genes that would not normally bedetected at the level of differential gene expression (DGE),53 and thus, a differentially181182Molecular-Genetic and Statistical Techniques for Behavioral and Neural ResearchFigure 8.5 Schematic for immediate, rapid ne mapping in select F2 recombinants of the RCC-F2cross. Top panel: Genome-wide signicant QTL (green trace; red dashed line signicance threshold;blue vertical lines Bayes credible interval).", + "Interval-specific haplotype analysisApproximately 97% of the genetic variation betweeninbred mouse strains is ancestral [22], so regions ofidentity by descent (IBD) between two strains used todetect a QTL are highly unlikely to contain the causalgenetic polymorphism underlying the QTL [28]. 
Forexample, a cross between C57BL/6J and A/J mice detectedwww.sciencedirect.coma blood pressure QTL on Chr 1 [7].", + "Interval-specific haplotype analysisApproximately 97% of the genetic variation betweeninbred mouse strains is ancestral [22], so regions ofidentity by descent (IBD) between two strains used todetect a QTL are highly unlikely to contain the causalgenetic polymorphism underlying the QTL [28]. Forexample, a cross between C57BL/6J and A/J mice detectedwww.sciencedirect.coma blood pressure QTL on Chr 1 [7].", + "At present, the BXD panel is composed of 80 different strains that all have beenfully genotyped.26 Variation in any quantifiable trait can be associated with thesegregation of parental alleles, and linkage genetics can map this variation toquantitative trait loci (QTLs), thereby identifying the genomic region(s) affectingthat trait. An overview of the QTL mapping approach is depicted in Figure 2. Classical QTL analysis has permitted the identification of loci that areassociated with variation in HSC traits.", + "This comparison gives information about the reliability of the observed genotypeinformation: The more the marker locations differ between the two maps (which signifiesvariation in marker positions), the higher the possibility of genotyping errors. QTL mapping was done in several stages to identify loci acting individually and QTL thatinteracted, either additively or epistatically. To determine individually-acting QTL, a singleQTL genome scan was conducted with the function scanone.In general,linking genetic variation with trait variation identifies QTL and a significant linkage ofphenotype and genotype suggest that the DNA status helps to determine trait expression. As stated above, mouse QTL studies provide distinct advantages over human studiesin the examination of genetic causes of a quantitative trait (e.g. 
alcoholism), even in theabsence of specific hypotheses regarding its aetiology or candidate genes.The progenitor mouse strainsshould have sufficient variation for the traits of interest and they should be genetically diverseenough to enable genetic mapping (BENNETT et al. 2006; FLINT 2003; GRISEL 2000). Thesample size required for the identification of QTL depends largely on the effect size that aQTL contributes to phenotypes on interest. Inference about QTL can be made if one or moregenetic markers are over- or underrepresented in the analysed individuals. Genotyping isoften done by means of microsatellite markers, which contains mono, di-, tri-, ortetranucleotide tandem repeats flanked by specific sequences (Figure 4a).", + "Importantly, whereasthese studies required substantial labor, time, and resources, X-QTL is a quick and easyapproach to achieve a comparable level of genetic dissection. The levels of complexityobserved here (e.g. 14 loci explaining 70% of the genetic variance for 4-NQO resistance) arestill dramatically lower than those seen in for some human traits in GWAS (e.g. 40 lociexplaining 5% of the variance for height 2,5). One obvious explanation is the difference inexperimental designs (line crosses vs. population association studies), but differences ingenetic architectures among species and traits may also contribute.", + "The method uses two pieces of information: mapping data from crosses thatinvolve more than two inbred strains and sequence variants in the progenitor strains within the intervalcontaining a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that anysequence variant is a quantitative trait nucleotide (QTN). 
It is not necessary to genotype the animals exceptat a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.", + "The method uses two pieces of information: mapping data from crosses thatinvolve more than two inbred strains and sequence variants in the progenitor strains within the intervalcontaining a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that anysequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals exceptat a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis.", + "which allows the the estimation of the degree of dominance of detected QTLs.but point out that various designs or linkage analysis methods have advantages and disadvantages and need to be chosen for the question at hand. In general, there are two common statistical approaches for detecting and/or locating QTL. Both approaches involve moving along the chromosome and considering data for one or several markers at a time and relating these to the traits of interest.", + "Genotyping all the individual progeny formarkers that show allelic variation between the parental strains (either single nucleotide polymorphisms or simple sequence repeats) will allow the detection of associations between trait values and marker genotype, and in this way demonstrate to whichset of markers a QTL is linked. To reduce the genotyping effort, selective genotypingof the individuals at the extremes of the phenotypic spectrum can be performed (20,23). 
Although these three approaches are in general considered to be the best to detect andmap QTL, they have several disadvantages for quantitative traits involving HSC.", + "So, how do you go about planning and performing a QTL study, and howdo you identify the responsible gene within a QTL that you have identified? Generally, one starts by performing a strain survey to find two parental inbredstrains that have a markedly different trait. One can now look up many differenttraits of inbred mice online at the Mouse Phenome Database (http://phenome. jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). However, the trait you maywant to study may not be present in wild type mice, so you may want to crossa mutant (or genetically engineered) strain onto several inbred strains.By using a segregating populationit is possible to determine whether the observed difference in mRNA abundance betweentwo parental lines is caused by a sequence difference within the gene itself or close to it(cis-eQTL) or to a factor from a different location (trans or distal eQTL). If it is possibleto map both the gene itself (a binary trait) and the variation in the abundance of mRNAexpressed from that gene (a quantitative trait) onto the chromosomes (as above) then cisand trans-regulation can be easily differentiated.", + "These candidate genes are then sequenced in the two parental inbredstrains looking for sequence dierences in coding or regulatory regions. After ne mapping the QTL interval and shortening the list of plausiblecandidate polymorphisms, the major challenge remains \u0001 proving denitivelywhich nucleotide polymorphism underlies the QTL. The most direct proofwould be replacing one strains allele with another strains allele (creating aFIG. 1. Intercross breeding strategy for mapping quantitative trait loci (QTLs). 
On the right, the parental, F1 hybrid, and intercross (F2) mousegenerations are depicted.", + "QTL mapping studies thenseek to detect the polymorphisms underlying the complex traits of interest byscanning for alleles that co-vary withthe traits. Similar experiments also can be conducted with special derivatives of inbredstrains known as recombinant inbred(RI) mice. These animals are derivedby cross-breeding two or more distinctparental strains (which often divergewidely for the trait of interest), followedby inbreeding of the offspring for severalgenerations (Bailey 1971). Given thecorrect breeding strategy, this method1This is an issue faced by GWASs researchers when classifyingsamples as cases or controls." + ], + [ + "(1234567890)Complex trait variation in natural and experimental populations is due to specific DNA sequence polymorphisms, environmental effects, and the interactions between these factors(Johannes et al. 2009). Testis weight is a complex trait thatholds direct implications for reproductive success, as developmental abnormalities can lead to irregular sperm production and infertility in adulthood (Sharpe 2001). 
Variation intestis size has been linked to environmental factors such associal dominance, social organization, and seasonal changesacross numerous species.", + "Selection could occur at multiple levels, from germ cell generation and propagation to fertilization and early embryonic growth.Chromosomal abnormalities, including aneuploidy, were found in 10-20% of spermatozoa and oocytes (20) and in the cleaved embryo, with a 21% rate of abnormalities in preimplantation embryos (21).These findings led to a model for natural selection against chromosome abnormalities (21).Selection extends to the end of gestation: Only approximately 30% of all conceptions result in a live birth, with more than half of aborted fetuses containing chromosomal abnormalities (22), a number likely to be an underestimate because of technological limitations in measuring all possible mutations.But even in the very small fraction of germ cell duos that survive this withering genome attack and result in a live birth, a number of severe de novo mutations will still be found (23).The data on gross chromosomal alterations suggest that overall, mutation frequency early in life is very high.The functional consequence, however, is limited because of selection.Somewhat surprisingly, this picture points toward an initial decline in genomic alterations, allowing the adult individual to acquire a somatic genome optimally equipped to provide function.", + "We now need to investigate the candidates identified here and how their effects on parental and offspring traits are integrated into thegene networks determining individual development. 
By controlling for genetic variation in eithermothers or offspring we have been able to show that levels of maternal provisioning and offspringsolicitation are unique to specific genotypes (here each BXD line) and that solicitation is costly.", + "Sex brings harmful alleles together into thesame genetic background, allowing selection to more efficiently purge them fromthe population and potentially producing some offspring that are fitter than eitherparent. However, the benefit of recombining deleterious mutations may depend on thenature of the epistatic interactions between them. The mutational deterministic hypothesis(Kondrashov 1988) depends partly on this epistasis.In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. In one of the experimentsdescribed next, evolutionary geneticists have nevertheless devised a way to compareevolution with and without recombination in the obligately sexual fruit fly.This disparity in investment is the basis for the twofold cost: asexualfemales hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. Butsex usually also involves the basic process of physical recombination: the breakage andreunion of two different DNA or RNA molecules. Of these two processes, recombinationis clearly the more widespread feature of sexual reproduction. A variety of reproductivesystems, such as selfing and automixis, involve recombination but not outcrossing. Incontrast, relatively few reproductive systems have outcrossing without recombination.Longago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. 
Similarly, if alleles at different loci werejointly responsible for the production of phenotypes, sex has the potential to break apartcoadapted gene complexes, as it moves alleles away from genetic backgrounds wherebeneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?", + "The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population.In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population.For example, the HbS mutation in the HBB gene (which produces the subunit of hemoglobin) causes sickle cell disease when present in both alleles, a detrimental consequence, but protects against malaria when present in 1 allele, a beneficial consequence, ensuring that the mutation persists in populations in areas of the world where malaria is endemic.Genes are passed from parents to offspring via the process of meiosis by which gametes, the egg cells in the mother and the sperm cells in the father, are generated.Ordinarily, each cell has 23 pairs of chromosomes; the gametes have 23 unpaired chromosomes.In meiosis, the 23 pairs are split so that each gamete receives 1 chromosome from each pair (Figures 8 and 9).Two gametes (egg and sperm) ultimately join into a single cell, the zygote, which has the full complement of 23 chromosome pairs restored.If all goes well, the zygote gives rise to a live offspring.Crossing over-The swapping of genetic material that occurs in the germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes cross over one another.Crossing 
over results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring.This process is also known as meiotic recombination.Recombination (meiotic recombination)-The swapping of genetic material that occurs in the germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes recombine with one another.Recombination results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring.Also known as crossing over.", + "Sex brings harmful alleles together into thesame genetic background, allowing selection to more efficiently purge them fromthe population and potentially producing some offspring that are fitter than eitherparent. However, the benefit of recombining deleterious mutations may depend on thenature of the epistatic interactions between them. The mutational deterministic hypothesis(Kondrashov 1988) depends partly on this epistasis.In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. In one of the experimentsdescribed next, evolutionary geneticists have nevertheless devised a way to compareevolution with and without recombination in the obligately sexual fruit fly.This disparity in investment is the basis for the twofold cost: asexualfemales hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. Butsex usually also involves the basic process of physical recombination: the breakage andreunion of two different DNA or RNA molecules. Of these two processes, recombinationis clearly the more widespread feature of sexual reproduction. A variety of reproductivesystems, such as selfing and automixis, involve recombination but not outcrossing. 
Incontrast, relatively few reproductive systems have outcrossing without recombination.Longago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci werejointly responsible for the production of phenotypes, sex has the potential to break apartcoadapted gene complexes, as it moves alleles away from genetic backgrounds wherebeneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?", + "However, we expect thatonly at this level, the most signicant contributions brought by integrating epigenetics will bemade. Concluding Remarks and FutureProspectsFish sex ratios are the result of a complex interaction between genetic, biochemical, and environmental interactions. The ultimate resultof these interactions at the individual level isgender: male or female. However, at the population level, the combination of sex determination and differentiation sets the sex ratio. Inturn, sex ratios dene the reproductive capacityof populations and, if sex growth dimorphismexists, also the growth characteristics, something very important in an aquaculture context.Traditionally, it has been agreed that thenal sex of an individual (phenotypic sex)depends on two sequential processes: the sexdetermination system of the species and thegonad differentiation process (Valenzuela,2008). However, recently, these two seeminglydistinct processes are viewed as part of a general process leading to gonad formation andsex ratios (Sarre et al. , 2004; Quinn et al. , 2011;Uller and Helantera, 2011).", + "Obehav is, in turn, influenced by offspring genesand environment (Ogene and Oenvir respectively). Hence, indirect genetic effects (blue arrows)and direct genetic effects (red arrow) are important influencers of behaviour. 
B) Parentoffspring conflict theory predicts that parental resource investment and offspring solicitationbehaviours are influenced by the fitness benefit to a focal individual (O), cost to a socialpartner such as a sibling (S1 and S2) or parent (P), and by their coefficient of relatedness(black arrows). 42Figure 2: Genomic imprinting can result in divergent phenotypes from the samegenotype. A) A paternally imprinted gene, i.e. maternally expressed.", + "Therefore, the resulting phenotypic patterns lag a generationbehind the genetic transmission of the causal variants. The most well-studied parental genetic effectsare caused by deposition of maternal transcripts into the egg prior to fertilization, resulting indifferences in early embryonic development depending on the genotype of the mother. Certain geneshave also been shown to respond to maternal influence after birth through genetically definedmaternal behaviors (Weaver et al. , 2004).Because of the small contribution, through the sperm, ofthe paternal transcriptome to the fertilized zygote, and because of the stronger maternal contributionto child rearing in most model organisms, parental effects are typically thought of as synonymous withmaternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading todifferences in the birth weight of mice depending on the genotype of the mother (Cowley et al. ,1989; Wolf et al. , 2011).", + "It was believed by many that for each trait variant we should expect to find acorresponding genetic change, or gene for that trait. Through historical happenstance therelationship between genes and traits was set up and treated as if it were one-to-one. But theproduction of a trait involves not only genes, but also their interactions with each other and theenvironment, and chance." 
+ ], + [ + "The hierarchical organization of GNs main Select and Search menu issimple and makes it relatively easy to find relevant data sets (Fig. 1). To get data, afteropening the browser, select the most appropriate Species from the dropdown menu. For anopen-ended search of phenotypes you can also select All Species at the bottom of the menu. The next steps are to select the Group, Type, and Data Set from the drop-down menus. Formany groups, a combination of phenotypes, genotypes, and molecular data are available.", + "GeneNetwork contains data from awide range of species, from humans to soybeans, but most of the available phenotypic data isfrom mice. Within the mouse dataset there are groups of families, crosses, non-geneticgroupings, and individual data. The type of dataset must be selected after defining the speciesand sample population. While genotypes, mRNA, methylated DNA, protein, metagenomic, and2bioRxiv preprint doi: https://doi.org/10.1101/2020.12.23.424047; this version posted December 24, 2020. The copyright holder for this preprint(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. metabolome datasets are available (i.e.", + "The hierarchical organization of GNs main Select and Search menu issimple and makes it relatively easy to find relevant data sets (Fig. 1). To get data, afteropening the browser, select the most appropriate Species from the dropdown menu. For anopen-ended search of phenotypes you can also select All Species at the bottom of the menu. The next steps are to select the Group, Type, and Data Set from the drop-down menus. Formany groups, a combination of phenotypes, genotypes, and molecular data are available.", + "Search and Data RetrievalPoint your browser to www.genenetwork.org. This brings you by default tothe Search page, from which you can retrieve data from many GN data sets. 
We will focus on the default data set, defined by Species: Mouse, Group: BXD,Type: Whole Brain, Database: INIA Brain mRNA M430 (Apr05) PDNNEnter Kcnj* into the ALL or ANY field and click the Search button. Notethe location and annotation of available potassium channel genes in the SearchResults page that opens. Use the browser Back button to return to previous page.", + "Addinformation on data provenance by giving details in Investigation, Protocols and ProtocolApplicationsCustomize Customize my XGAP database with extended variants of Trait and Subject. In the online XGAP demonstrator, Probe traits have asequence and genome location and Strain subjects have parent strains and (in)breeding method. Describe extensions using MOLGENISlanguage and the generator automatically changes XGAP database software to your researchUploadUpload data from measurement devices, public databases, collaborating XGAP databases, or a public XGAP repository with communitydata.However, a suitable and customizable integration ofthese elements to support high throughput genotype-tophenotype experiments is still needed [34]: dbGaP, GeneNetwork and the model organism databases aredesigned as international repositories and not to serveas general data infrastructure for individual projects;many of the existing bespoke data models are too complicated and specialized, hard to integrate between profiling technologies, or lack software support to easilyconnect to new analysis tools; and customization of theexisting infrastructures dbGaP, GeneNetwork or otherinternational repositories [35,36] or assembly of Bioconductor and generic model organism database components to suit particular experimental designs, organismsand biotechnologies still requires many minor andsometimes major manual changes in the software codethat go beyond what individual lab bioinformaticianscan or should do, and result in duplicated effortsbetween labs if attempted.", + ", 2014; see Section 9). 
GeneNetwork is a database that enables searching for 4000 phenotypes from multiple studies in the BXD, HXB, and in other recombinant inbred rodent families, as well as in other model organismsand even humans (Mulligan et al. , 2017). GeneNetwork employed asomewhat dierent strategy than MPD in that it did not rely solely onresearchers submitting their data. Instead the database operators extracted the data from the scientic literature and integrated them into auniform format (Chesler et al. , 2003).", + "GeneNetwork contains data from awide range of species, from humans to soybeans, but most of the available phenotypic data isfrom mice. Within the mouse dataset there are groups of families, crosses, non-geneticgroupings, and individual data. The type of dataset must be selected after defining the speciesand sample population. While genotypes, mRNA, methylated DNA, protein, metagenomic, and2bioRxiv preprint doi: https://doi.org/10.1101/2020.12.23.424047; this version posted December 24, 2020. The copyright holder for this preprint(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
metabolome datasets are available (i.e.", + "However, a suitable and customizable integration of these elementsto support high throughput genotype-to-phenotype experiments is stillneeded[340]: dbGaP, GeneNetwork and the model organism databasesare designed as international repositories and not to serve as generaldata infrastructure for individual projects; many of the existing bespokedata models are too complicated and specialized, hard to integrate between proling technologies, or lack software support to easily connectto new analysis tools; and customization of the existing infrastructuresdbGaP, GeneNetwork or other international repositories[384, 154] orassembly of Bioconductor and generic model organism database components to suit particular experimental designs, organisms and biotechnologies still requires many minor and sometimes major manual changes382.1.", + "All data presented in this paper were deposited in the online databaseGeneNetwork (www.genenetwork.org), an open web resource that containsgenotypic, gene expression, and phenotypic data from several genetic referencepopulations of multiple species (e.g. mouse, rat and human) and various celltypes and tissues.35;36 It provides a valuable tool to integrate gene networks andphenotypic traits, and also allows cross-cell type and cross-species comparativegene expression and eQTL analyses.", + "There is a good chance that you will be able to apply these newtechniques to specific problems, even while you read. If you have a computer with anInternet connectionso much the better, and you can read and work along at the same time. This short review and primer will take you on a tour of a web site called GeneNetwork thatembeds many large data sets that are relevant to studies of behavioral variation. 
GeneNetwork is an unusual site because it contains a coherent \"universe\" of data, as well asmany powerful analytic tools.", + "The GeneNetwork database provides open accessto BXD and other RI strain derived microarray data, single nucleotide polymorphism (SNP) data,and phenotypic data for quantitative trait loci analysis and gene expression correlation analyses. Gene expression data were exported for manually selected probes in the PDNN hippocampusdatabase (Hippocampus Consortium M430v2), and the PDNN whole brain database (INIA BrainmRNA M430). The Hippocampus database was chosen as one of the most elaborate brain databases,as well as most highly recommended dataset on GeneNetwork itself (http://www.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112).", + "2016) and canalso be accessed in GeneNetwork by entering Record ID 18494 in the Get Anyspace on the Search page and clicking on the Search button. Alternatively, enterdata by hand into the designated boxes provided by GeneNetwork. These latteroptions also allow for the inclusion of trait variance. It is a good idea to namethe trait in the box provided. Then click Next, and manually enter the data foreach RI strain, F1, and founder strain. 3Author ManuscriptAfter entering the data, click on the blue plus sign button called Add.To submit multiple phenotypes at the sametime, select the option for Batch Submission under the Home tab. This allowsusers to submit up to 100 traits for analysis by GeneNetwork. Here, select BXDas the cross or RI set to analyze from the first pull-down menu. The phenotypefile should follow the format described in the Sample text (http://genenetwork.org/sample.txt). After uploading the appropriate file using theBrowse button, enter a name for the file in the Dataset space. The data will bestored in the GeneNetwork server for 24 hours. 
Click Next.Author ManuscriptMaterialsHere we will provide detailed instructions for using GeneNetwork along with someworked examples taken from the recent study of intravenous cocaine self-administrationby Dickson et al. (2016) in BXD RI mice. A complete overview of GeneNetwork is beyondthe scope of this protocol, but is extensively covered in elsewhere (see Mulligan et al. 2016;Williams & Mulligan 2012 for excellent reviews on GeneNetwork). A computer with an internet connection and current web browser. See the GeneNetwork.orgsite for information on supported browser versions. Author ManuscriptMethodEntering DataAuthor Manuscript1Link to http://www.genenetwork.org.", + "Species in GenAge model organisms", + "Data are reviewed before entry inGeneNetwork by the senior author. Phenotypes are currently split into 15 broadphenotypic categories (Supplementary Data 1). Phenome curation and descriptionwas initiated by R.W.W. and Dr Elissa Chesler in 2002 by literature review and dataextraction. The early work is described briey in Chesler et al.51,52. Most work overthe past 5 years has been performed by two of the coauthors (R.W.W. andM.K.M.). We have used a controlled vocabulary and set of rules described here(http://www.genenetwork.org/faq.html#Q-22).", + "9) To bring your data to GeneWeaver,click on the GeneWeaver icon, making sure to be previouslylogin to your GeneWeaver account. You will be brought to theGeneSet upload page with the Genes Uploaded and theGeneweaver Analysis Platform139Fig. 5 Default settings at GeneNetwork.org are set to search Mouse, Phenotypes, from among the BXDPublished Phenotypes data set. Here the term nociception was searched forFig. 6 The search results page in GeneNetwork showing the 33 records retrieved from the phenotype searchfor nociception.Users may also share their data with other users selectively,make it public, or keep it restricted to a private account. 
Data can be imported by users, uploading their gene set data directly or exporting to GeneWeaver from within another online resource such as Neuro Informatics Framework (NIF) [8], Grappa [9], Mouse Phenome Database (MPD) [10] or GeneNetwork [11]. These datasets can then be added to your collection to be analyzed together with other gene sets retrieved from the GeneWeaver database. To begin a GeneWeaver analysis a user must collect GeneSets together in a Project. Alternatively the spreadsheet can be saved as a .txt file and uploaded by clicking on Switch to file upload. Once complete click on upload GeneSet. 7. Once completed you are taken to the GeneSet detail page. If there are errors in your uploaded data you can correct them by clicking on Edit. 8. Use the Add Selected to Project, and create a new project, e.g. Chronic Cocaine. 9. Now using the Search function populate this project with additional gene sets related to this study trying Queries such as Cocaine Addiction, Chronic Cocaine." + ], + [ + "Figure 3 | Association statistics from one of the five type 2 diabetes genome-wide association studies 20. The y axis represents the -log10 p value and the x axis represents each of the ~400,000 SNPs used in this scan. The point of each arrow indicates the location of the most strongly associated SNP in each of nine known type 2 diabetes gene regions. Two signals, in SLC30A8 and TCF2, were not captured on the Affymetrix chip. The plot was generated using Haploview. CDKAL1, CDK5 regulatory subunit-associated protein 1-like 1; CDKN2, cyclin-dependent kinase inhibitor 2A; FTO, fat mass and obesity-associated; HHEX, haematopoietically expressed homeobox; IDE, insulin-degrading enzyme; IGF2BP2, insulin-like growth factor 2 mRNA-binding protein 2; KCNJ11, potassium inwardly-rectifying channel, subfamily J, member 11; PPARG, peroxisome proliferator-activated receptor-γ gene; SLC30A8, solute carrier family 30 (zinc transporter), member 8; TCF2, transcription factor 2, hepatic; TCF7L2, 
transcription factor 7-like 2 (T-cell specific, HMG-box).", + ", for the Diabetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium 9 Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) [1][2][3][4][5][6][7][8][9][10][11]. Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and ~2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975. We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P = 5.0 × 10^-14), CDC123-CAMK1D (P = 1.2 × 10^-10), TSPAN8-LGR5 (P = 1.1 × 10^-9), THADA (P = 1.1 × 10^-9), ADAMTS9 (P = 1.2 × 10^-8) and NOTCH2 (P = 4.1 × 10^-8) gene regions. Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D. 
[3][4][5]7,10 , for the Diabetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium 9 Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) [1][2][3][4][5][6][7][8][9][10][11]. Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and ~2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975. We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P = 5.0 × 10^-14), CDC123-CAMK1D (P = 1.2 × 10^-10), TSPAN8-LGR5 (P = 1.1 × 10^-9), THADA (P = 1.1 × 10^-9), ADAMTS9 (P = 1.2 × 10^-8) and NOTCH2 (P = 4.1 × 10^-8) gene regions. Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D.", + "Although these are considered to be loci convincingly associated with susceptibility to type 2 diabetes in populations of European descent, other genes related to susceptibility to the disease are probably still unidentified, particularly those for populations of other ancestries. In order to uncover genetic variants that increase the risk of type 2 diabetes, we conducted a genome-wide association study in Japanese individuals with type 2 diabetes and unrelated controls. We first genotyped 268,068 SNPs, which covered approximately 56% of common SNPs in the Japanese, in 194 individuals with type 2 diabetes and diabetic retinopathy (case 1) and in 1,558 controls (control 1) collected in the BioBank Japan. We compared the allele frequencies of 207,097 successfully genotyped SNPs and selected the 8,323 SNPs showing the lowest P 
values.We then attempted to genotype these 8,323 SNPs in 1,367 individuals with type 2 diabetes and diabetic retinopathy (case 2) and for 1,266 controls (control 2) (stage 2), and successfully obtained data for 6,731 SNPs (the P value distribution in the second test is shown in Supplementary Fig. 1a online).The results of principal component analysis 8 in the stage 1 and 2 samples and HapMap samples revealed that there was no evidence for population stratification between the case and control groups throughout the present tests (Supplementary Fig. 1b,c).We selected the 9 SNP loci showing P values o0.0001 (additive model in stage 2, Table 1) and genotyped a third set of cases and controls comprising 3,557 Japanese individuals with type 2 diabetes (cases 3,4,5) and 1,352 controls (controls 3,4).We evaluated the differences in the population structure among these three sets of case and two sets of control groups by Wright's F test.As the results indicated that there was no difference in the population structure among these groups (Supplementary Table 1b online), we combined these populations for the third test of case-control study.The third set of analysis identified the significant associations for six SNPs (Table 1), including the CDKAL1 locus at 6p22.3 (rs4712524, rs9295475 and rs9460546), the IGF2BP2 locus at 3q27.2 (rs6769511 and rs4376068) and the KCNQ1 locus at 11p15.5 (rs2283228).The remaining three SNPs (rs13259803, rs612774 and rs10836097) had P values of 40.05 in the third test and were not further examined.CDKAL1 and IGF2BP2 were previously reported as susceptibility genes for type 2 diabetes in the Japanese population 9 .Therefore, we focused on the KCNQ1 locus, which was highly associated with type 2 diabetes.", + "Genome-wide association studies (GWAS) have recently revealed many novel SNPs associated with type 2 diabetes.These include SNPs located in the regions near TCF7L2, HHEX-IDE, EXT2, FTO, SLC30A8, IGF2BP2, CDKAL1, and CDKN2A-CDKN2B 
[8][9][10][11][12][13].A second phase of studies identified many additional variants, including those near JAZF1, TSPAN8-LGR5, THADA, ADAMTS9, NOTCH2-ADAM30, CDC123-CAMK1D, and KCNQ1 [14,15].The two genes in which common variants were previously convincingly associated with type 2 diabetes, PPARG and KCNJ11, were also identified in these GWAS [12,16,17].More recently, numerous other SNPs have been identified in additional GWAS and meta-analyses [18].", + "As noted by Below et al. and others [34,44], there is a significant enrichment of eQTLs among top type 2 diabetesassociated loci.Genetic heritability estimates for type 2 diabetes are markedly higher than can be explained by the variation identified to date; to characterize this Bmissing^heritability, Torres et al. composed multiple SNP subsets by partitioning interrogated maker sets into groups by status as eQTL in several insulin-responsive peripheral tissues [45].They discovered that these subsets explain a greater portion of type 2 diabetes risk than expected by chance, suggesting a significant role of regulatory variation in diabetes susceptibility.Several reasons have been suggested as to why so much of the genetic heritability of type 2 diabetes remains unmapped to risk loci [46].Conclusive identification of less common (0.5-5 % MAF) variation of modest effect will require investments in extremely large sample sizes.The heterogeneous nature of Hispanic populations increases the challenge because to detect variation or effects specific to groups or environments may require sample sizes beyond what exist to be collected.There is evidence that parent of origin may influence effects of variants on type 2 diabetes risk [47].Studies in mouse models also demonstrate that some genetic effects on type 2 diabetes and related traits are modified by sex, diet, and epigenetic effects, indicating that careful environmental modelling and stratification will be necessary to identify some loci subject to interaction effects 
[48].Genetic characterizations of larger Hispanic samples are underway, but especially in the case of extremely rare or private variation, a return to family-based study designs will improve power through enrichment of allelic observations and increased environmental and genetic homogeneity [49].] performed a GWA study for lipid traits in a cohort of 3642 Hispanic participants from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) and reported genome-wide significant signals within or near the genes GCKR, LPL, and APOA/APOC for TG and CETP and APOA/APOC for HDL-C.These authors also showed that there is a substantial overlap in the genes associated with lipid traits in different population groups.When testing the markers showing genome-wide significance or suggestive evidence of association (p 10 5 ) in European GWA studies in the Hispanic and African American WHI cohorts, a strong enrichment of small p values was observed in both cohorts.Additionally, there was a significant correlation of the allelic effects of markers with p 10 5 identified in Europeans in the Hispanic and African American cohorts.The genomic regions showing association in Europeans accounted for a disproportionate amount of variance in both cohorts.", + "Attempts to apply similar approaches to families in which either common forms of diabetes or obesity is segregating have proved to be largely unrewarding, 5 and the second wave of discovery involved a switch to tests of association.Although intrinsically more powerful than linkage analysis, association analysis suffers from the disadvantage that the signal can be detected only if one examines the causal variant itself or a nearby marker with which it is tightly correlated.Until the advent of methods that enabled genomewide surveys of association, researchers were therefore obliged to direct their attention to specific candidate variants or genes of interest. 
6In retrospect, it is obvious that most such studies were seriously underpowered or focused on inappropriate candidates. 6Nevertheless, by accruing data over the course of multiple studies, some genuine susceptibility variants were identified.Common coding variants in PPARG and KCNJ11 (each of which encodes a protein that acts as a target for classes of therapeutic agents widely used in diabetes management) were shown to have modest effects on the risk of type 2 diabetes. 7,8Resequencing of the gene encoding the melanocortin-4 receptor (MC4R) resulted in the identification of low-frequency coding variants that explain approximately 2 to 3% of cases of severe obesity. 9he third, and most successful, wave of discovery has been driven by systematic, large-scale surveys of association between common DNA sequence variants and disease.The first demonstration that unbiased discovery efforts could reveal new insights into the pathogenesis of type 2 diabetes resulted from identification of the association between type 2 diabetes and variants within TCF7L2 (encoding transcription factor 7-like 2, a protein not previously identified as a biologic candidate). 10TCF7L2 has now been shown to modulate pancreatic islet function. 
113][14][15][16][17][18] Together, these studies revealed six new associations, including variants near CDKAL1, CDKN2A, and CDKN2B (which encode putative or known regulators of cyclin-dependent kinases) and HHEX (which is transcribed into a homeobox protein implicated in beta-cell development).Typically each copy of a susceptibility allele at one of these loci is associated with a 15 to 20% increase in the risk of diabetes.Since then, the dominant approach to discovery has involved ever-larger aggregations of genomewide association data from multiple samples so as to improve the power to identify variants of modest effect: these studies have revealed more than 20 additional confirmed signals of susceptibility to type 2 diabetes [19][20][21][22] (Table 1 and Fig. 1).Though early studies were restricted to samples obtained from persons of European descent, genomewide association analyses conducted in other ethnic groups are now emerging. 23,24,29he current total of approximately 40 confirmed type 2 diabetes loci includes variants in or near WFS1 (wolframin) and the hepatocyte nuclear factors HNF1A and HNF1B (genes that also harbor rare mutations responsible for monogenic forms of diabetes) [30][31][32][33] ; the melatonin-receptor gene MTNR1B (which highlights the link between circadian and metabolic regulation) [26][27][28] ; and IRS1 (encoding insulin-receptor substrate 1), one of a limited number of type 2 diabetes loci with a primary effect on insulin action rather than on secretion. 25enomewide association studies of genetic variants influencing body-mass index (BMI) and obesity have been similarly productive, with three main strategies being adopted (Table 2 and Fig. 2).Genomewide association studies of population-based samples to examine the full range of BMI values have identified approximately 30 loci influencing BMI and the risk of obesity.The strongest signal remains the association with variants within FTO (the fat-mass and obesity-related gene). 
13,34,45Other signals near BDNF, SH2B1, and NEGR1 (all implicated in aspects of neuronal function) reinforce the view of obesity as a disorder of hypothalamic function. 35,37,38,43A second approach, focusing on case-control analysis of persons selected from the extremes of the BMI distribution, has delivered a complementary, only partly overlapping, set of loci. 39,42,46,47Finally, genomewide analyses of patterns of fat distribution, prompted by the particularly deleterious health effects of visceral fat accumulation, have characterized approximately 15 loci that are largely distinct from those influencing overall adiposity 36,40,41,44 : many of the 15 display markedly stronger associations in women than in men.", + "Recent large genome-wide association studies (GWAS) have identified multiple loci which harbor genetic variants associated with type 2 diabetes mellitus (T2D), many of which encode proteins not previously suspected to be involved in the pathogenesis of T2D.Most GWAS for T2D have focused on populations of European descent, and GWAS conducted in other populations with different ancestry offer a unique opportunity to study the genetic architecture of T2D.We performed genome-wide association scans for T2D in 3,955 Chinese (2,010 cases, 1,945 controls), 2,034 Malays (794 cases, 1,240 controls), and 2,146 Asian Indians (977 cases, 1,169 controls).In addition to the search for novel variants implicated in T2D, these multi-ethnic cohorts serve to assess the transferability and relevance of the previous findings from European descent populations in the three major ethnic populations of Asia, comprising half of the world's population.Of the SNPs associated with T2D in previous GWAS, only variants at CDKAL1 and HHEX/IDE/KIF11 showed the strongest association with T2D in the meta-analysis including all three ethnic groups.However, consistent direction of effect was observed for many of the other SNPs in our study and in those carried out in European 
populations.Close examination of the associations at both the CDKAL1 and HHEX/IDE/KIF11 loci provided some evidence of locus and allelic heterogeneity in relation to the associations with T2D.We also detected variation in linkage disequilibrium between populations for most of these loci that have been previously identified.These factors, combined with limited statistical power, may contribute to the failure to detect associations across populations of diverse ethnicity.These findings highlight the value of surveying across diverse racial/ethnic groups towards the fine-mapping efforts for the casual variants and also of the search for variants, which may be population-specific.", + "Discovery of Novel Genes Associated With Type 1 DiabetesMellitus A total of 24,984 genes were analyzed in the initial gene-based GWAS.Three quantile-quantile plots for gene-based P-values, SNP-based P-values inside genes and SNP-based P-values outside genes are shown in Figure 1.We observed dramatic deviations at the tails of the distributions for the three plots.The deviation was much stronger for the plot of gene-based P-values than the other two plots, suggesting relatively higher power for gene-based association analysis.In short, through a gene-based association study, we identified 183 type 1 diabetes mellitus-associated genes that were insignificant in the original SNP-based association tests.Among the 183 genes, 171 genes are 'novel' genes identified for type 1 diabetes mellitus.Replication studies and/or differential expression studies further supported the significance of 53 genes to type 1 diabetes mellitus.In particular, four non-HLA genes (RASIP1, STRN4, BCAR1 and MYL2) and three HLA genes (FYN, HLA-J and PPP1R11) were validated by both replication and differential expression studies.", + "Genome-wide association studies (GWAS) have identified >100 independent SNPs that modulate the risk of type 2 diabetes (T2D) and related traits.However, the pathogenic mechanisms of most 
of these SNPs remain elusive.Here, we examined genomic, epigenomic, and transcriptomic profiles in human pancreatic islets to understand the links between genetic variation, chromatin landscape, and gene expression in the context of T2D.We first integrated genome and transcriptome variation across 112 islet samples to produce dense cis-expression quantitative trait loci (cis-eQTL) maps.Additional integration with chromatin-state maps for islets and other diverse tissue types revealed that cis-eQTLs for islet-specific genes are specifically and significantly enriched in islet stretch enhancers.High-resolution chromatin accessibility profiling using assay for transposase-accessible chromatin sequencing (ATACseq) in two islet samples enabled us to identify specific transcription factor (TF) footprints embedded in active regulatory elements, which are highly enriched for islet cis-eQTL.Aggregate allelic bias signatures in TF footprints enabled us de novo to reconstruct TF binding affinities genetically, which support the high-quality nature of the TF footprint predictions.Interestingly, we found that T2D GWAS loci were strikingly and specifically enriched in islet Regulatory Factor X (RFX) footprints.Remarkably, within and across independent loci, T2D risk alleles that overlap with RFX footprints uniformly disrupt the RFX motifs at high-information content positions.Together, these results suggest that common regulatory variations have shaped islet TF footprints and the transcriptome and that a confluent RFX regulatory grammar plays a significant role in the genetic component of T2D predisposition.", + "Attention turned instead to association approaches in larger, unrelated samples sets (Merikangas and Risch 2003).Association analyses, however, rely upon typing the causal variant or a closely correlated proxy, and hence, initial efforts were constrained by practical limitations of genotyping cost and capacity to the evaluation of variants within pre-defined candidate 
genes.Nonetheless, this approach heralded the first wave of robustly associated variants.For T2D, non-synonymous variants in genes encoding the targets of two drugs widely used in T2D management [P12A in PPARG (Altshuler et al. 2000) for thiazolidinediones and E23K in KCNJ11 (Gloyn et al. 2003) for sulfonylureas] showed consistent, though modest (per-allele odds ratios of *1.2), evidence of association with disease risk.For obesity, variants within two genes already known to harbour mutations implicated in monogenic obesity-MC4R (V103I, I251L) and PCSK1 (N221D, Q665E-S690T)-were shown to be associated with common obesity risk (Heid et al. 2005;Geller et al. 2004;Benzinou et al. 2008).However, the candidate gene approach is restricted by its intrinsic reliance upon prior knowledge and expectation.When, as with T2D and obesity, our understanding of disease pathogenesis is imperfect, there is a manifest need to extend the search for susceptibility variants across the entire genome in an unbiased, hypothesis-free manner.The first gene to be implicated in T2D susceptibility without prior biological candidacy was TCF7L2, discovered following systematic association analysis across a region of previously identified linkage (Grant et al. 
2006).The most strongly associated variants at this locus have the greatest effect on T2D susceptibility of any common variant so far identified.", + "Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge.Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs.We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls.We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk.This brings the number of T2D loci now confidently identified to at least 10.", + "Large-scale association studies conducted by DIAGRAM, in individuals overwhelmingly of European descent, have reported 65 lead SNPs associated with susceptibility to type 2 diabetes (1). 
Figure 1 illustrates how these SNPs and closely correlated proxy SNPs were systematically selected for allelic expression analysis.In brief, 1,525 proxy SNPs (r 2 .0.8, CEU, 1,000 Genomes Phase 1) were found.Of these SNPs (lead + proxies), 45/1,590 (2.8%) map to exons of 23 human RefSeq genes.For 18 of these genes, TaqMan SNP genotyping assays could be designed to map entirely to exonic sequence, thus allowing for amplification and measurement of mature (i.e., spliced) mRNA species and normalization of allelic expression using genomic DNA from the same individual.After exclusion of SNPs with ,4 heterozygotes (rs1801282, PPARG; rs3734621, KIF6) and assays where .50%cDNA samples yielded Ct values .36(rs2793823, ADAM30; rs7377, SRGN), indicating very low levels of gene expression, allelic expression could be determined for 14 genes in samples from 36 white nondiabetic donors.", + "Associations of seven genes with type 2 diabetes in Chinese and Korean populationsRESULTSWe genotyped 13 representative SNPs from 7 genes implicated in type 2 diabetes in recent GWA studies in 3,041 type 2 diabetic case subjects and 3,678 nondiabetic control subjects from a Chinese population in Hong Kong and two Korean populations.The clinical characteristics of the subjects are summarized in Table 1.Table 2 showed the meta-analyses of type 2 diabetes association under a log additive model.There was no heterogeneity of ORs among the three study populations except for CDKN2A/B (rs10811661) (Q statistic P 0.03), with a random effect OR of 1.32 (1.15-1.52).Apart from two SNPs at CDKN2A/B (rs564398 and rs1333040), all other 11 SNPs were significantly associated with type 2 diabetes, with ORs ranging from 1.09 to 1.35 (1.3 10 12 P 0.016) in the combined samples (Table 2).Eight of the 11 SNPs remained significant after adjustment for multiple comparison by permutation (1.0 10 4 P empirical 0.012) (Table 2) despite nonsignificance of CDKN2A/B (rs10757278), TCF7L2 (rs7903146), and FTO 
(rs8050136).Because multiple SNPs with little or moderate linkage disequilibrium at CDKAL1 (r 2 0.56), CDKN2A/B (r 2 0.002-0.31),and HHEX (r 2 0.25-0.55)were studied (Supplementary Table 2), we examined haplotype associations but did not reveal more significant association than single marker analyses (Supplementary Table 3).Further haplotype analyses by conditioning rs7756992 on CDKAL1 haplotypes and rs7923837 on HHEX haplotypes revealed no significant residual associations (P 0.05; data not shown), suggesting that these two SNPs are sufficient to explain the respective multiple associations at CDKAL1 and HHEX.Although residual association was observed after conditioning rs10811661 on CDKN2A/B haplotypes (P 0.023), the much stronger single marker association of rs10811661 compared with rs10757278 (P 1.3 10 12 vs. 0.015; Table 2) suggests the former is the key associated SNP.Taken together, seven key SNPs from these genes were significant without correction for multiple comparisons.In this regard, TCF7L2 (rs7903146) showed the strongest effect on type 2 diabetes risk (OR 1.35), followed by CDKN2A/B (rs10811661), CDKAL1 (rs7756992), HHEX (rs7923837), IGF2BP2 (rs4402960), SLC30A8 (rs13266634), and FTO (rs8050136).These seven SNPs were further examined in the subsequent analyses.DISCUSSIONOur study provides important insights for the impact of the new type 2 diabetes genes identified through GWA studies.To our knowledge, this is the largest replication study in Asians up to now.We confirm the type 2 diabetes association of seven representative risk alleles for these seven genes found in Europeans (3)(4)(5)(6)(7)(8), suggesting many of the variants associated with type 2 diabetes in Europeans are also associated in Asians.These genetic effects seem to be additive.Despite differences in effect size of each gene, a crude estimate suggests up to 3.3-fold increased type 2 diabetes risk in subjects carrying eight or more risk alleles compared with those carrying two or fewer 
risk alleles (Supplementary Fig. 1).Two adjacent regions near CDKN2A/B have been reported to be associated with type 2 diabetes and cardiovascular diseases.Our data confirm the association of type 2 diabetes for rs10811661, found in the European type 2 diabetes studies (3,4,8), but not rs564398, found only in the Wellcome Trust Case Control Consortium Study (8).In addition, we found that the cardiovascular disease risk loci (rs1333040 and rs10757278) (14 -16) were not associated with type 2 diabetes.", + "Finally, we examined whether genes identified using our association studies were enriched within diabetes-related pathways.We collated a list of 42 genes to which 53 CpG sites associated with T2D traits (CS score 1.77, combined P < 0.017) mapped.Even in this small dataset, pathway analysis (Supplementary Material, Table S12) indicated significant enrichment in 31 pathways (Fisher's exact P < 0.05), including those related to circadian clock (P = 0.005), adipocytokine signaling (P = 0.009), leptin pathway (P = 0.023), HDL-mediated lipid transport (P = 0.031) and insulin signaling (P = 0.033).", + "These associations were subsequently confirmed in three reports simultaneously published in Science in April 2007 [13, 14. 
23]. Although several GWAS had been performed on T2DM in recent years, these latter studies are particularly important because of: the unprecedented joint collaborative effort to combine findings and to perform replication and meta-analysis; the large number of cases examined (each had 1,900 or more cases and controls with a combined number of 14,586 patients and 17,968 controls); and the common European ancestry of all subjects (reviewed in [23]). Despite some differences in the selection of phenotypes, there was remarkable consistency in the genes identified in these studies as linked to T2DM. In addition to replicating positive associations for TCF7L2, KCNJ11, PPARG, HHEX-IDE and SLC30A8, new variants were found in an intron (non-coding, spacer, section of a gene) of cyclin-dependent kinase 5 (CDK5)-regulatory subunit associated protein 1-like 1 (CDKAL1), in an intron of insulin-like growth factor binding protein 2 (IGF2BP2), in non-coding regions near the genes for cyclin-dependent kinase-inhibitor A and B (CDKN2A/B) on chromosome 9, and in the fat mass and obesity associated (FTO) region.", + "By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5 × 10^-8. These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing 
apparently unrelated complex traits." + ], + [ + "Indicative diseases associated with the candidate aging genes", + "In our study, genes regulated in NF1 blood samples appear to be potentially interesting for understanding the pathogenesis of NF1 and the function of NF1 gene.Adhesion molecules and genes involved in matrix remodeling may provide insights into the mechanism of bone malformations seen in some NF1 patients.Interestingly, six genes down-regulated in NF1 are implicated in craniofacial dysplasia (defects in neural crest development) and long bone dysplasia (defects in mesoderm development).The Armadillo repeat gene, deleted in velocardiofacial syndrome (ARVCF), is involved in some phenotypes associated with velocardiofacial syndrome, an autosomal dominant disorder characterized by cleft palate and facial dysmorphology and conotruncal heart defects [24].Chondroitin 6 sulfotransferase catalyzes the formation of choidroitin sulfate and a deficiency in this gene has been associated with a heritable form of spondyloepiphyseal dysplasia [28].Domain-specific mutations in transforming growth factor beta 1 have been implicated in Camurati -Engelmann disease, an autosomal dominant, progressive diaphyseal dysplasia characterized by hyperostosis and sclerosis of the diaphyses of long bones [12].Core-binding factor alpha1 subunit (CBFA1) is a runtdomain containing gene, mutations of which are responsible for cleidocranial dysplasia, a disease characterized by hypoplasia/aplasia of clavicles, patent fontanelles, supernumerary teeth, short stature, and other changes in skeletal patterning and growth [16].Mutations of TCOF1, are responsible for Treacher Collins Syndrome, one of the most common mandibulofacial dysostosis disorders [34].Finally, mutation of Pax 9 is associated with hypodontia [5,26].", + "A Syndromic Intellectual Disability Disorder Caused by Variants in TELO2, a Gene Encoding a Component of the TTT Complex. Am J Hum Genet. 2016; 98: 909918. 
https://doi.org/10.1016/j.ajhg.2016.03.014 PMID: 2713259397. Moosa S, Altmuller J, Lyngbye T, Christensen R, Li Y, Nurnberg P, et al. Novel compound heterozygous mutations in TELO2 in a patient with severe expression of You-Hoover-Fong syndrome. MolGenet Genomic Med. 2017; 5: 580584. https://doi.org/10.1002/mgg3.287 PMID: 2894424098. Choy KR, Watters DJ. Neurodegeneration in ataxia-telangiectasia: Multiple roles of ATM kinase in cellular homeostasis. Developmental Dynamics. John Wiley and Sons Inc.; 2018. pp. 3346. https://doi. org/10.1002/dvdy.24522 PMID: 2854393599.", + "Human Immunodeficiency-Centromeric Instability-Facial Anomalies (ICF) SyndromeICF syndrome is a rare genetic disorder caused by mutations in one of four identified genes: DNMT3b [57], HELLS [58], CDCA7 [58], and ZBTB24 [59].Most cases of ICF syndrome (55%) are caused by mutation in DNMT3b, the gene coding for the human de novo DNA methyltransferase [60].This disease causes severe immunodeficiency, increased susceptibility to infection, abnormal facial features, and cognitive disabilities [61,62].ICF syndrome is often diagnosed by the presence of stretched and fragile juxtacentromeric heterochromatin on chromosomes 1 and 16 in activated lymphocytes.Perhaps as a result of this phenotype, these chromosomes are more susceptible to breakage, missegregation resulting in aneuploidy, and micronuclei formation [62].Mutation in CDCA7, HELLS, and ZBTB24 also results in DNA methylation defects at alpha-satellites and satellite II DNA that is enriched on juxtacentromeric heterochromatin on chromosomes 1 and 16 [58,63].HELLS and CDCA7 work together in a complex known as CHIRRC (CDCA7-HELLS ICF-Related nucleosome Remodeling Complex) to catalyze nucleosome remodeling, which could modulate the accessibility of DNA for methylation [64].DNA methylation profiles in ICF patients with mutations in any of these three genes are different from that in ICF patients with DNMT3b mutation, indicating that these proteins may 
not all work in the same pathway [65].All observed ICF patients, however, have hypomethylation of the juxtacentromeric satellite II repeats, leading to the hypothesis that the chromosome fragility and disease symptoms are directly linked to DNA hypomethylation.", + "A genetic factorsuch as M5ahl8/Tmc1m4 may be involved in the samepathway and could modify the deleterious effects associated with defects in other pathway components (such asFSCN2 and TMC1). D2 mice also have a deleteriousvariant of CDH23 (Noben-Trauth et al. 2003) that is epistatic to the effects of the FSCN2 variant (Johnson et al. 2008). CDH23 is a component of the stereocilia tip link,supporting the view that the polygenic nature of the progressive hearing loss of D2 mice is at least partially due todeficiencies in multiple stereocilia components conferringfunctional integrity.", + "TCM Gene Disease", + "To further understand these genes and potential phenotypic consequences, we looked for enrichment of deletions or duplications in a dataset of 29,085 individuals with NDDs and 19,584 controls. 
50Six of the pLI >0.9 genes were enriched for deletions in individuals with NDDs (CDC37L1, NFIB, PTPRD, RFX3, SMARCA2, UHRF2), and all 27 were enriched for duplications in individuals with NDDs (BNC2, CDC37L1, CLTA, CNTFR, ELAVL2, MLLT3, NFIB, NOL6, PAX5, PSIP1, PTPRD, RFX3, RNF38, RPS6, RUSC2, SHB, SMARCA2, SMU1, TAF1L, TEK, TESK1, TLN1, TOPORS, UBAP1, UBE2R2, UHRF2, VCP).This observation suggests that the dosage of these genes may play a role in NDDs.The mappability of most of 9p is quite high for short-read WGS data, indicating that the detection of CNV should be robust (Figure S2).Copy-number assessments generated from short-read WGS data in individuals from the 1000 Genomes Project 54 reveal that the copy number of the majority of 9p is not variable in the population (Figure S2).expand beyond NDDs, a search for other gene/disease associations was carried out (Table S3; Figure 2).This analysis revealed two genes in blood phenotypes 115501], TPM2).Importantly, 29 of these genes are known to be involved in autosomal recessive conditions, including DNAI1 in primary ciliary dyskinesia and GALT in galactosemia.Fourteen of these autosomal recessive genes are asso-ciated with neurological phenotypes (e.g., KANK1 in cerebral palsy and MPDZ in congenital hydrocephalus), which may contribute to atypical or severe NDD phenotypes in some patients with 9p CNVs.Disruption of these genes can thus potentially unmask recessive traits and contribute to phenotypic variability and should be explored in patients with complex presentations.", + "CTCF has also been suggested to play a role in two human syndromes, Silver-Russell (SRS) and Beckwith-Wiedeman Syndrome (BWS) (Sparago et al. 2004;Eggermann et al. 
2008).BWS is a developmental disorder with variable clinical symptoms including increased frequency of tumors, macroglossia, and overgrowth.It is caused by aberrant activation of the normally maternally imprinted H19/Igf2 locus on chromosome 11.Several sporadic and familial mutations have been characterized in the H19/Igf2 region, which lead to the loss of CTCF binding sites, directly implicating CTCF in the disease (Sparago et al. 2004;Eggermann et al. 2008).Similarly, in SRS, a developmental disorder characterized by severe growth retardation and body asymmetry caused by activation of the paternal allele of Igf2, mutations in CTCF binding sites have been identified (Scho nherr et al. 2008).CTCF has been implicated in various diseases.Trinucleotide repeat expansion diseases including Huntington's disease, fragile X mental retardation, and myotonic dystrophy are caused by excessive lengthening of microsatellite repeat sequences (reviewed in Orr and Zoghbi 2007).For example, although a healthy individual has a stretch of fewer than 27 CAG repeats in their HTT gene, a Huntington's disease patient has typically more than 35.The trigger for repeat expansion is unknown.Recent evidence suggests that CTCF might contribute to generating fragile sites within repeats, thus facilitating their expansion (Libby et al. 2008).Mutation of a CTCF-binding site near a repeat leads to increased genomic instability and increased repeat length, similar to that seen in disease situations.Although not tested, it is possible that interference with CTCF binding, either by mutation of its target site or mutations in an interacting partner, may contribute to trinucleotide repeat diseases (Libby et al. 
2008).Defects in CTCF, and other genome organizers, may also play a key role in destabilizing expanded microsatellite repeats in other trinucleotide repeat diseases (reviewed in Dion and Wilson 2009).", + "This is a particular problem for very rare diseases where only one or twofamilies are diagnosed with the condition. An example of the latter is tylosis withoesophageal cancer (TOC). This is an autosomal dominant single gene disorder thatoccurs in three families, with two of the families related by disease haplotype analysis. The entire minimal region (34 kb) has been sequenced (except for highly repetitiveregions) and no obvious disease-causing mutation has been identified, but a diseasemechanism has been postulated (MacDonald et al. , 2006).", + "9(HLA-DR4) [108], cystic fibrosis (CFTR) [22], and thrombophilia (FV, FII, MTHFR)[109-111] have been widely used to guide preventive care. For example, prophylacticmastectomy or oophorectomy is recommended to predisposed individuals and has shownto reduce the risk of cancer by 90-95% in women [112]. Similarly, genetic screenings areavailable for prenatal and newborns to detect birth defects and genetic diseases includingcystic fibrosis, severe combined immunodeficiencies, phenylketonuria, tyrosinemia,sickle cell anemia, hearing loss, and congenital heart defects.The firstsuccessful application of this approach identified genomic loci responsible for an Xlinked phagocytic disorderchronic granulomatous disease (CYBB) [20]. This was soonfollowed by identification of the loci and ultimately genes responsible for other geneticdisorders including Duchene muscular dystrophy (DMD) [21], cystic fibrosis (CFTR)[22], Huntington disease (HTT) [23,24], polycystic kidney disease (PKD1, PKD2 andPKHD1) [25-27] , phenylketonuria (PAH)[28], albinism (TYR) [29] and many more. 
Currently, Online Mendelian Inheritance in Man (OMIM) catalogues 4,500 humandisorders for which the underlying genetic mutations are known(http://omim.org/statistics/entry).", + "In addition, Mendelian disorders showing certain symptoms are extremely helpful in identifying genes influencing the same symptoms: the gene polymorphism for Netherton's syndrome (ichthyotic dermatosis, severe atopy) was associated with the development of asthma and eczema in children without Netherton's syndrome (Cookson and Moffatt 2000).As the deficiency of the FOXP3/Scurfin gene causes intractable diarrhea, eczema, hemolytic anemia, diabetes mellitus, or thyroid autoimmunity in humans (Patel 2001), dysregulation of FOXP3/ Scurfin gene expression may promote the development of autoimmune diseases, especially type 1 diabetes, thyroid autoimmunity and hemolytic anemia under various effects from circumstances.From an immunogenetical point of view, the FOXP3/Scurfin gene is located on chromosome (Chr) Xp11.23, which includes one of the type 1 diabetes susceptiblity loci (Cucca et al. 1998;Nerup and Pociot 2001).", + "Studies of syndromes that include CTDs, such as the 22q11 deletion syndrome, have provided some clues regarding the specific genes that may be involved in determining the risk of CTDs (e.g.TBX1 [13]).In addition, studies of rare, presumably pathogenic, copy number variants [14][15][16], and inherited [17] and de novo [17,18] single nucleotide variants have identified genes that may contribute to the risk of CTDs [18,19].Yet, most affected patients do not carry a confirmed or suspected rare, causative variant.Moreover, rare variants, in particular rare de novo variants, do not account for the observed increase in risk of CTDs among the relatives of affected patients.", + "This is a particular problem for very rare diseases where only one or twofamilies are diagnosed with the condition. An example of the latter is tylosis withoesophageal cancer (TOC). 
This is an autosomal dominant single gene disorder thatoccurs in three families, with two of the families related by disease haplotype analysis. The entire minimal region (34 kb) has been sequenced (except for highly repetitiveregions) and no obvious disease-causing mutation has been identified, but a diseasemechanism has been postulated (MacDonald et al. , 2006).", + "Examplesof this application include the detection of over 98% of deletions occurring onthe dystrophin gene for the diagnosis of Duchenne muscular dystrophy (9,10),;an 8-bp deletion in exon 3 of the P450c21B gene in individuals affected by21-hydroxylase deficiency, a recessively inherited disease (11), and the F508mutation, a 3-bp deletion in the gene CFTR that is the most frequently mutation found in individuals affected with cystic fibrosis (12). Another exampleis detection of the internal tandem duplication (ITD) in the juxtamembranedomain-coding sequence of the FLT3 gene in acute leukemias.", + "This is a particular problem for very rare diseases where only one or twofamilies are diagnosed with the condition. An example of the latter is tylosis withoesophageal cancer (TOC). This is an autosomal dominant single gene disorder thatoccurs in three families, with two of the families related by disease haplotype analysis. The entire minimal region (34 kb) has been sequenced (except for highly repetitiveregions) and no obvious disease-causing mutation has been identified, but a diseasemechanism has been postulated (MacDonald et al. , 2006).", + "The location of this gene in a genomic regionresponsible for the disorder, as well as its strong coexpression with other known mitochondrial genesacross tissues, hinted that it may be involved in thedisease. Additional experiments proved that the genedid indeed cause the disorder and was most likely amitochondrial gene. 
Tissue gene expression databases, along with other databases, have also been used tocategorize, at a whole-genome level, genes potentiallyinvolved in a particular type of disease category (Calvoet al. 2006).", + "A Syndromic Intellectual Disability Disorder Caused by Variants in TELO2, a Gene Encoding a Component of the TTT Complex. Am J Hum Genet. 2016; 98: 909918. https://doi.org/10.1016/j.ajhg.2016.03.014 PMID: 2713259397. Moosa S, Altmuller J, Lyngbye T, Christensen R, Li Y, Nurnberg P, et al. Novel compound heterozygous mutations in TELO2 in a patient with severe expression of You-Hoover-Fong syndrome. MolGenet Genomic Med. 2017; 5: 580584. https://doi.org/10.1002/mgg3.287 PMID: 2894424098. Choy KR, Watters DJ. Neurodegeneration in ataxia-telangiectasia: Multiple roles of ATM kinase in cellular homeostasis. Developmental Dynamics. John Wiley and Sons Inc.; 2018. pp. 3346. https://doi. org/10.1002/dvdy.24522 PMID: 2854393599.", + "This is a particular problem for very rare diseases where only one or twofamilies are diagnosed with the condition. An example of the latter is tylosis withoesophageal cancer (TOC). This is an autosomal dominant single gene disorder thatoccurs in three families, with two of the families related by disease haplotype analysis. The entire minimal region (34 kb) has been sequenced (except for highly repetitiveregions) and no obvious disease-causing mutation has been identified, but a diseasemechanism has been postulated (MacDonald et al. , 2006)." + ] + ], + "task_id": [ + "C0015BEE5FE41769A65126B79BB1E40D", + "3A16235DA1E02B9148B9288A06EE567E", + "E94FFD042BB146E8A429200590A6792D", + "FCFCE5BBA2A8B3D8818890B9D2308C5A", + "FFA6EADA5502933C0C30C9D16DCAA073" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_general_5.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_5.json new file mode 100644 index 0000000..1fa087a --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_5.json @@ -0,0 +1,89 @@ +{ + "question": [ + "Once a sperm combines with an egg, what determines how traits are passed onto the resulting lifeform?", + "Can you explain what a ribosomal binding site at a high level and make it accessable to a non-expert?", + "Once a sperm combines with an egg, what determines how traits are passed onto the resulting lifeform?", + "Can you explain the difference between sequencing with short reads vs long reads? Please make you answer accessible to a non-expert.", + "Can you explain why using a pangenome-based reference might be more useful than simply using a single linear reference? Please make you answer accessible to a non-expert." + ], + "answer": [ + "The traits are determined by the combination of genes from both the sperm and the egg. These genes are located on the chromosomes that each parent contributes. The process of meiosis ensures that each gamete (sperm or egg) contributes one chromosome from each pair, resulting in the offspring having a full set of 23 chromosome pairs. Additionally, certain parental genetic effects, such as maternal transcripts deposited into the egg prior to fertilization, can influence early embryonic development and result in differences in traits depending on the genotype of the mother.", + "A ribosomal binding site is a specific sequence on a molecule of mRNA (messenger RNA) that the ribosome recognizes and binds to when it's time to make a protein. Think of it like a 'start' sign that tells the ribosome where to begin translating the mRNA into a protein. 
This process is part of how our bodies use genetic information to build the proteins we need to function.", + "Traits are passed onto the resulting lifeform through the process of meiosis, where each gamete (sperm and egg) contributes one chromosome to each pair in the zygote. This process results in a shuffling of genetic material, known as recombination or crossing over, which is a significant cause of genetic variation among offspring. Additionally, certain traits can be influenced by maternal transcripts deposited into the egg prior to fertilization, leading to differences in early embryonic development depending on the genotype of the mother.", + "Sequencing with short reads and long reads refers to the length of DNA segments that are sequenced in one go. Short-read sequencing, like that done by Illumina, generates many small pieces of DNA sequence, typically around 100-150 base pairs long. This method is relatively inexpensive and produces a high volume of data, but can struggle with complex parts of the genome.\n\nOn the other hand, long-read sequencing, like that done by Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT), sequences much longer pieces of DNA, sometimes up to 100,000 base pairs. This can provide more complete information about the genome and can handle complex regions better. However, it tends to have a higher error rate and is more expensive.\n\nIn simple terms, imagine trying to solve a jigsaw puzzle: short-read sequencing gives you many small pieces, which can be harder to put together, especially in complex regions. Long-read sequencing gives you fewer, but much larger pieces, which can make the puzzle easier to solve, but might be more costly and have more mistakes.", + "Using a single linear reference, like a specific genome, can limit the scope of genetic variation we can study. It assumes that all genetic variations align neatly with this reference, which isn't always the case in reality. 
A pangenome-based reference, on the other hand, includes multiple genomes and thus captures a wider range of genetic variations. This can help us better understand and study the complexity of genetic diversity." + ], + "contexts": [ + [ + "Selection could occur at multiple levels, from germ cell generation and propagation to fertilization and early embryonic growth.Chromosomal abnormalities, including aneuploidy, were found in 10-20% of spermatozoa and oocytes (20) and in the cleaved embryo, with a 21% rate of abnormalities in preimplantation embryos (21).These findings led to a model for natural selection against chromosome abnormalities (21).Selection extends to the end of gestation: Only approximately 30% of all conceptions result in a live birth, with more than half of aborted fetuses containing chromosomal abnormalities (22), a number likely to be an underestimate because of technological limitations in measuring all possible mutations.But even in the very small fraction of germ cell duos that survive this withering genome attack and result in a live birth, a number of severe de novo mutations will still be found (23).The data on gross chromosomal alterations suggest that overall, mutation frequency early in life is very high.The functional consequence, however, is limited because of selection.Somewhat surprisingly, this picture points toward an initial decline in genomic alterations, allowing the adult individual to acquire a somatic genome optimally equipped to provide function.", + "We now need to investigate the candidates identified here and how their effects on parental and offspring traits are integrated into thegene networks determining individual development. 
By controlling for genetic variation in eithermothers or offspring we have been able to show that levels of maternal provisioning and offspringsolicitation are unique to specific genotypes (here each BXD line) and that solicitation is costly.", + "In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. In one of the experimentsdescribed next, evolutionary geneticists have nevertheless devised a way to compareevolution with and without recombination in the obligately sexual fruit fly.Sex brings harmful alleles together into thesame genetic background, allowing selection to more efficiently purge them fromthe population and potentially producing some offspring that are fitter than eitherparent. However, the benefit of recombining deleterious mutations may depend on thenature of the epistatic interactions between them. The mutational deterministic hypothesis(Kondrashov 1988) depends partly on this epistasis.Longago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci werejointly responsible for the production of phenotypes, sex has the potential to break apartcoadapted gene complexes, as it moves alleles away from genetic backgrounds wherebeneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?This disparity in investment is the basis for the twofold cost: asexualfemales hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. Butsex usually also involves the basic process of physical recombination: the breakage andreunion of two different DNA or RNA molecules. 
Of these two processes, recombinationis clearly the more widespread feature of sexual reproduction. A variety of reproductivesystems, such as selfing and automixis, involve recombination but not outcrossing. Incontrast, relatively few reproductive systems have outcrossing without recombination.", + "The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population.In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population.For example, the HbS mutation in the HBB gene (which produces the subunit of hemoglobin) causes sickle cell disease when present in both alleles, a detrimental consequence, but protects against malaria when present in 1 allele, a beneficial consequence, ensuring that the mutation persists in populations in areas of the world where malaria is endemic.Genes are passed from parents to offspring via the process of meiosis by which gametes, the egg cells in the mother and the sperm cells in the father, are generated.Ordinarily, each cell has 23 pairs of chromosomes; the gametes have 23 unpaired chromosomes.In meiosis, the 23 pairs are split so that each gamete receives 1 chromosome from each pair (Figures 8 and 9).Two gametes (egg and sperm) ultimately join into a single cell, the zygote, which has the full complement of 23 chromosome pairs restored.If all goes well, the zygote gives rise to a live offspring.Crossing over-The swapping of genetic material that occurs in the germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes cross over one another.Crossing over results in a shuffling of genetic material and is 
an important cause of the genetic variation seen among offspring.This process is also known as meiotic recombination.Recombination (meiotic recombination)-The swapping of genetic material that occurs in the germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes recombine with one another.Recombination results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring.Also known as crossing over.", + "In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. In one of the experimentsdescribed next, evolutionary geneticists have nevertheless devised a way to compareevolution with and without recombination in the obligately sexual fruit fly.Sex brings harmful alleles together into thesame genetic background, allowing selection to more efficiently purge them fromthe population and potentially producing some offspring that are fitter than eitherparent. However, the benefit of recombining deleterious mutations may depend on thenature of the epistatic interactions between them. The mutational deterministic hypothesis(Kondrashov 1988) depends partly on this epistasis.Longago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci werejointly responsible for the production of phenotypes, sex has the potential to break apartcoadapted gene complexes, as it moves alleles away from genetic backgrounds wherebeneficial epistatic interactions have evolved through natural selection. 
Why should sex therefore be so common, given the obvious costs?This disparity in investment is the basis for the twofold cost: asexualfemales hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. Butsex usually also involves the basic process of physical recombination: the breakage andreunion of two different DNA or RNA molecules. Of these two processes, recombinationis clearly the more widespread feature of sexual reproduction. A variety of reproductivesystems, such as selfing and automixis, involve recombination but not outcrossing. Incontrast, relatively few reproductive systems have outcrossing without recombination.", + "Aberrant recombination patterns on chromosomes that have missegregated have also been identified as an important factor, in both male and female gametes (Table I).This is because recombination together with cohesion of sister chromatids establish the unique 'bivalent' chromosome structure where homologous partner chromosomes are tethered together, a configuration that is critical for their accurate segregation in meiosis I (Fig. 2A).The remarkable feature is that recombination occurs in foetal oocytes whereas chromosome segregation takes place decades later (Fig. 2A).Since mammalian oocytes are arrested at the G2/M transition (or dictyate stage), this raises the intriguing question of how the bivalent is maintained until the meiotic divisions.", + "However, we expect thatonly at this level, the most signicant contributions brought by integrating epigenetics will bemade. Concluding Remarks and FutureProspectsFish sex ratios are the result of a complex interaction between genetic, biochemical, and environmental interactions. The ultimate resultof these interactions at the individual level isgender: male or female. However, at the population level, the combination of sex determination and differentiation sets the sex ratio. 
Inturn, sex ratios dene the reproductive capacityof populations and, if sex growth dimorphismexists, also the growth characteristics, something very important in an aquaculture context.Traditionally, it has been agreed that thenal sex of an individual (phenotypic sex)depends on two sequential processes: the sexdetermination system of the species and thegonad differentiation process (Valenzuela,2008). However, recently, these two seeminglydistinct processes are viewed as part of a general process leading to gonad formation andsex ratios (Sarre et al. , 2004; Quinn et al. , 2011;Uller and Helantera, 2011).", + "Obehav is, in turn, influenced by offspring genesand environment (Ogene and Oenvir respectively). Hence, indirect genetic effects (blue arrows)and direct genetic effects (red arrow) are important influencers of behaviour. B) Parentoffspring conflict theory predicts that parental resource investment and offspring solicitationbehaviours are influenced by the fitness benefit to a focal individual (O), cost to a socialpartner such as a sibling (S1 and S2) or parent (P), and by their coefficient of relatedness(black arrows). 42Figure 2: Genomic imprinting can result in divergent phenotypes from the samegenotype. A) A paternally imprinted gene, i.e. maternally expressed.", + "Therefore, the resulting phenotypic patterns lag a generationbehind the genetic transmission of the causal variants. The most well-studied parental genetic effectsare caused by deposition of maternal transcripts into the egg prior to fertilization, resulting indifferences in early embryonic development depending on the genotype of the mother. Certain geneshave also been shown to respond to maternal influence after birth through genetically definedmaternal behaviors (Weaver et al. 
, 2004).Because of the small contribution, through the sperm, ofthe paternal transcriptome to the fertilized zygote, and because of the stronger maternal contributionto child rearing in most model organisms, parental effects are typically thought of as synonymous withmaternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading todifferences in the birth weight of mice depending on the genotype of the mother (Cowley et al. ,1989; Wolf et al. , 2011).", + "It was believed by many that for each trait variant we should expect to find acorresponding genetic change, or gene for that trait. Through historical happenstance therelationship between genes and traits was set up and treated as if it were one-to-one. But theproduction of a trait involves not only genes, but also their interactions with each other and theenvironment, and chance." + ], + [ + "SpliceAid2, a tool that allows us to know which proteins can bind to the desired RNA sequence, was used to investigate the effects of studied SNPs in protein binding affinity.This tool is entirely based on the human true splicing site and experimentally assessed target motif.The results of mutation effects on splicing binding in spliceAid2 database demonstrate information on creation of splice sites or strengthening of cryptic splice sites and provide information on intron retention, appearance, and disappearance of new alternative splice site forms (Piva et al., 2012).", + "The variations of binding sites can be collected from knowntarget genes, mutagenesis studies (Hallikas et al. , 2006), phylogenetic shadowing (orthologous binding sites in different species) (Ostrin et al. , 2006), and in vitro SELEXexperiments (Liu and Stormo, 2005). Several recent technologies, such as SELEXSAGE (Roulet et al. , 2002) and protein-binding microarray (PBM) (Mukherjee et al. 
,2004), allow the determination of binding specificity in a high throughput manner. The profile or motif of binding sites can then be described with a consensussequence.An alternative mechanism for translation initiation has been identified thatdoes not obey the first AUG rule; this involves cap-independent internal264CH 11 PREDICTIVE FUNCTIONAL ANALYSIS OF POLYMORPHISMSribosome binding mediated by a Y-shaped secondary structure, denoted the internal ribosome entry site (IRES), located in the 5 UTR of 510 per cent ofhuman mRNA molecules (see Le and Maizel, 1997, for a review of these elements). IRES elements are complex stem loop structures, and there is no reliablesequence consensus to allow prediction of the possible functional effects of polymorphisms in these elements.", + "Based on structural modeling of the observedprotein sequences, the authors suggest that the glycine-rich conserved regions offerflexibility to the protein in the form of a wobble (Riede et al. 1987; Drexler et al. 1989). A number of recent studies have explicitly investigated the issue of specialist-generalisttrade-offs. For example, Turner and Elena (2000) evolved replicate populations of anRNA virus on novel hosts using a single novel host or alternating novel hosts. Theauthors observed improvements in fitness on the novel hosts.", + "5.5.3 Candidates in Qrr1d and Possible Link with Local Protein SynthesisTransfer RNAs are direct biological partners of the ARS, and the cluster oftRNAs in the highly polymorphic intergenic region of Qrr1d is an enticing candidate. Inaddition to their role in shuttling amino acids, tRNAs also act as sensors of cellular aminoacid levels and regulate transcription of genes involved in amino acid metabolism and theARS (Ryckelynck et al. , 2005). 
There is tissue specificity in the expression of differenttRNA isoforms (Dittmar et al.", + "The elucidation of the structure of the ribonuclease inhibitor, a protein containing 15 LRRs,revealed that the LRR sequence is a structural motif,each consisting of a b-strand connected by variableloops to a helical or extended part of about 24 aminoacid residues in length with a conserved 11-residuesegment corresponding to the b-strand and adjacentCorrespondence: Dr Antoni Matilla Duenas, Institute of Child Health, University College London, 30 Guilford Street, London WC1N 1EH, UK. Fax: +44(0)207 905 2301.", + "The elucidation of the structure of the ribonuclease inhibitor, a protein containing 15 LRRs,revealed that the LRR sequence is a structural motif,each consisting of a b-strand connected by variableloops to a helical or extended part of about 24 aminoacid residues in length with a conserved 11-residuesegment corresponding to the b-strand and adjacentCorrespondence: Dr Antoni Matilla Duenas, Institute of Child Health, University College London, 30 Guilford Street, London WC1N 1EH, UK. Fax: +44(0)207 905 2301.", + "Based on structural modeling of the observedprotein sequences, the authors suggest that the glycine-rich conserved regions offerflexibility to the protein in the form of a wobble (Riede et al. 1987; Drexler et al. 1989). A number of recent studies have explicitly investigated the issue of specialist-generalisttrade-offs. For example, Turner and Elena (2000) evolved replicate populations of anRNA virus on novel hosts using a single novel host or alternating novel hosts. Theauthors observed improvements in fitness on the novel hosts.", + "Figure 4. Structure of the ribosomal DNA gene cluster in S. 
cerevisiae.The location of the rDNA cluster on chromosome XII is shown at the top, with the telomere (TEL) and centromere (CEN) indicated.A detailed view of an rDNA repeat unit is shown below.The 35S and 5S rRNA gene-coding regions are indicated, as is the rDNA origin of replication (rARS).The RFB (red box) is bound by Fob1p (pink).The locations of the 35S promoter and the bidirectional noncoding promoter E-pro (blue box), silenced by Sir2p, are indicated.", + "An alternative mechanism for translation initiation has been identified thatdoes not obey the first AUG rule; this involves cap-independent internal264CH 11 PREDICTIVE FUNCTIONAL ANALYSIS OF POLYMORPHISMSribosome binding mediated by a Y-shaped secondary structure, denoted the internal ribosome entry site (IRES), located in the 5 UTR of 510 per cent ofhuman mRNA molecules (see Le and Maizel, 1997, for a review of these elements). IRES elements are complex stem loop structures, and there is no reliablesequence consensus to allow prediction of the possible functional effects of polymorphisms in these elements.The variations of binding sites can be collected from knowntarget genes, mutagenesis studies (Hallikas et al. , 2006), phylogenetic shadowing (orthologous binding sites in different species) (Ostrin et al. , 2006), and in vitro SELEXexperiments (Liu and Stormo, 2005). Several recent technologies, such as SELEXSAGE (Roulet et al. , 2002) and protein-binding microarray (PBM) (Mukherjee et al. ,2004), allow the determination of binding specificity in a high throughput manner. The profile or motif of binding sites can then be described with a consensussequence.", + "Schluenzen F, Tocilj A, Zarivach R, Harms J, Glyehmann M, Janell D, Bashan A,Bartels H, Agmon I, Franceschi F, Yonath A (2000) Structure of functionally activated small ribosomal subunit at 3.3 Angstroms resolution. Cell 102:61562319. 
Hendrickson WA (1991) Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 254:515820. Yang C, Pflugrath JW, Courville DA, Stence CN, Ferrara JD (2003) Away fromthe edge: SAD phasing from the sulfur anomalous signal measured in-house withchromium radiation. Acta Crystallogr D59(11):1943195721. Kissinger CR, Gehlhaar DK, Smith BA, Bouzida D (2001) Molecular replacementby evolutionary search.Using this technique, the 6xHis-tagged ExoS ADP-ribosyltransferaseprotein was shown to bind 14-3-3 in a direct fashion (56). The two purifiedproteins were mixed together, and were subjected to nickel-charged affinitychromatography to isolate 6xHis-protein complexes. To quantify the interaction, the amount of 14-3-3 eluted from the 6xHis-ExoS resin was determined. It was found that essentially molar equivalents of ExoS and 14-3-3 werereversibly bound to the affinity matrix while a ligand binding mutant of 14-3-3or egg albumin was detected primarily in the unbound phase (56).Proc Natl Acad Sci USA91:9022902630. Wilson DS, Keefe AD, Szostak JW (2001) The use of mRNA display to selecthigh-affinity protein-binding peptides. Proc Natl Acad Sci USA 98:3750375531. Hanes J, Jermutus L, Weber-Bornhauser S, Bosshard HR, Plckthun A (1998)Ribosome display efficiently selects and evolves high-affinity antibodies in vitrofrom immune libraries. Proc Natl Acad Sci USA 95:141301413532. Fields S, Song O (1989) A novel genetic system to detect protein-protein interactions. Nature 340:24524633. Ma J, Ptashne M (1988) Converting a eukaryotic transcriptional inhibitor into anactivator. Cell 55:44344634.Khler F, Mller KM (2003) Adaptation of the Ras-recruitment system to the analysisof interactions between membrane-associated proteins. Nucleic Acids Res 31:e2841. Osborne MA, Dalton S, Kochan JP (1995) The yeast tribrid system-geneticdetection of trans-phosphorylated ITAM-SH2-interactions. Biotechnology (NY)13:1474147842. 
Zhang J, Lautar S (1996) A yeast three-hybrid method to clone ternary proteincomplex components. Anal Biochem 242:687243. Licitra EJ, Liu JO (1996) A three-hybrid system for detecting small lig-and-proteinreceptor interactions. Proc Natl Acad Sci USA 93:128171282144. Tafelmeyer P, Johnsson N, Johnsson K (2004) Transforming a (beta/alpha)8-barrelenzyme into a split-protein sensor through directed evolution.", + "The variations of binding sites can be collected from knowntarget genes, mutagenesis studies (Hallikas et al. , 2006), phylogenetic shadowing (orthologous binding sites in different species) (Ostrin et al. , 2006), and in vitro SELEXexperiments (Liu and Stormo, 2005). Several recent technologies, such as SELEXSAGE (Roulet et al. , 2002) and protein-binding microarray (PBM) (Mukherjee et al. ,2004), allow the determination of binding specificity in a high throughput manner. The profile or motif of binding sites can then be described with a consensussequence.An alternative mechanism for translation initiation has been identified thatdoes not obey the first AUG rule; this involves cap-independent internal264CH 11 PREDICTIVE FUNCTIONAL ANALYSIS OF POLYMORPHISMSribosome binding mediated by a Y-shaped secondary structure, denoted the internal ribosome entry site (IRES), located in the 5 UTR of 510 per cent ofhuman mRNA molecules (see Le and Maizel, 1997, for a review of these elements). IRES elements are complex stem loop structures, and there is no reliablesequence consensus to allow prediction of the possible functional effects of polymorphisms in these elements.", + "Gribskov M, Robinson NL: Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Computerand Chemistry 1996, 20:25-33. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP,Young RA: Genome-wide location and function of DNA binding proteins. Science 2000, 290(5500):2306-2309. 
Iyer VR, Horak CE, Scafe CS, Botstein D, Snyder M, Brown PO:Genomic binding sites of the yeast cell-cycle transcriptionfactors SBF and MBF. Nature 2001, 409(6819):533-538.", + "The variations of binding sites can be collected from knowntarget genes, mutagenesis studies (Hallikas et al. , 2006), phylogenetic shadowing (orthologous binding sites in different species) (Ostrin et al. , 2006), and in vitro SELEXexperiments (Liu and Stormo, 2005). Several recent technologies, such as SELEXSAGE (Roulet et al. , 2002) and protein-binding microarray (PBM) (Mukherjee et al. ,2004), allow the determination of binding specificity in a high throughput manner. The profile or motif of binding sites can then be described with a consensussequence.An alternative mechanism for translation initiation has been identified thatdoes not obey the first AUG rule; this involves cap-independent internal264CH 11 PREDICTIVE FUNCTIONAL ANALYSIS OF POLYMORPHISMSribosome binding mediated by a Y-shaped secondary structure, denoted the internal ribosome entry site (IRES), located in the 5 UTR of 510 per cent ofhuman mRNA molecules (see Le and Maizel, 1997, for a review of these elements). IRES elements are complex stem loop structures, and there is no reliablesequence consensus to allow prediction of the possible functional effects of polymorphisms in these elements." + ], + [ + "(1234567890)Complex trait variation in natural and experimental populations is due to specific DNA sequence polymorphisms, environmental effects, and the interactions between these factors(Johannes et al. 2009). Testis weight is a complex trait thatholds direct implications for reproductive success, as developmental abnormalities can lead to irregular sperm production and infertility in adulthood (Sharpe 2001). 
Variation intestis size has been linked to environmental factors such associal dominance, social organization, and seasonal changesacross numerous species.", + "Selection could occur at multiple levels, from germ cell generation and propagation to fertilization and early embryonic growth.Chromosomal abnormalities, including aneuploidy, were found in 10-20% of spermatozoa and oocytes (20) and in the cleaved embryo, with a 21% rate of abnormalities in preimplantation embryos (21).These findings led to a model for natural selection against chromosome abnormalities (21).Selection extends to the end of gestation: Only approximately 30% of all conceptions result in a live birth, with more than half of aborted fetuses containing chromosomal abnormalities (22), a number likely to be an underestimate because of technological limitations in measuring all possible mutations.But even in the very small fraction of germ cell duos that survive this withering genome attack and result in a live birth, a number of severe de novo mutations will still be found (23).The data on gross chromosomal alterations suggest that overall, mutation frequency early in life is very high.The functional consequence, however, is limited because of selection.Somewhat surprisingly, this picture points toward an initial decline in genomic alterations, allowing the adult individual to acquire a somatic genome optimally equipped to provide function.", + "It is a cornerstone of modern biology that a purposeful genetic program drives all biological processes that occur from conception to reproductive maturation.But, once reproductive maturation is reached, thought is divided in respect to whether the aging process is a continuation of the genetic program or whether it is the result of random losses in molecular fidelity.Despite the claim by many researchers to the contrary, there is no direct evidence that genes drive age changes.I will discuss how genes are involved in the finitude of life subsequently.", + 
"In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. In one of the experimentsdescribed next, evolutionary geneticists have nevertheless devised a way to compareevolution with and without recombination in the obligately sexual fruit fly.Sex brings harmful alleles together into thesame genetic background, allowing selection to more efficiently purge them fromthe population and potentially producing some offspring that are fitter than eitherparent. However, the benefit of recombining deleterious mutations may depend on thenature of the epistatic interactions between them. The mutational deterministic hypothesis(Kondrashov 1988) depends partly on this epistasis.Longago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci werejointly responsible for the production of phenotypes, sex has the potential to break apartcoadapted gene complexes, as it moves alleles away from genetic backgrounds wherebeneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?This disparity in investment is the basis for the twofold cost: asexualfemales hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. Butsex usually also involves the basic process of physical recombination: the breakage andreunion of two different DNA or RNA molecules. Of these two processes, recombinationis clearly the more widespread feature of sexual reproduction. A variety of reproductivesystems, such as selfing and automixis, involve recombination but not outcrossing. 
Incontrast, relatively few reproductive systems have outcrossing without recombination.", + "Crossing over-The swapping of genetic material that occurs in the germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes cross over one another.Crossing over results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring.This process is also known as meiotic recombination.The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population.In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population.For example, the HbS mutation in the HBB gene (which produces the subunit of hemoglobin) causes sickle cell disease when present in both alleles, a detrimental consequence, but protects against malaria when present in 1 allele, a beneficial consequence, ensuring that the mutation persists in populations in areas of the world where malaria is endemic.Genes are passed from parents to offspring via the process of meiosis by which gametes, the egg cells in the mother and the sperm cells in the father, are generated.Ordinarily, each cell has 23 pairs of chromosomes; the gametes have 23 unpaired chromosomes.In meiosis, the 23 pairs are split so that each gamete receives 1 chromosome from each pair (Figures 8 and 9).Two gametes (egg and sperm) ultimately join into a single cell, the zygote, which has the full complement of 23 chromosome pairs restored.If all goes well, the zygote gives rise to a live offspring.Recombination (meiotic recombination)-The swapping of genetic material that occurs in the 
germline.During the formation of egg and sperm cells, also known as meiosis, paired chromosomes from each parent align so that similar DNA sequences from the paired chromosomes recombine with one another.Recombination results in a shuffling of genetic material and is an important cause of the genetic variation seen among offspring.Also known as crossing over.", + "In most plants and animals, sexis a necessary component of reproduction, and the question for evolutionary biologistsis why reproductive mechanisms have evolved that way. In one of the experimentsdescribed next, evolutionary geneticists have nevertheless devised a way to compareevolution with and without recombination in the obligately sexual fruit fly.Sex brings harmful alleles together into thesame genetic background, allowing selection to more efficiently purge them fromthe population and potentially producing some offspring that are fitter than eitherparent. However, the benefit of recombining deleterious mutations may depend on thenature of the epistatic interactions between them. The mutational deterministic hypothesis(Kondrashov 1988) depends partly on this epistasis.Longago, Wright (1931) noted that sex may destroy adaptation because a successful combination of characteristics is attained in individuals only to be broken up in the next generation by the mechanisms of meiosis itself. Similarly, if alleles at different loci werejointly responsible for the production of phenotypes, sex has the potential to break apartcoadapted gene complexes, as it moves alleles away from genetic backgrounds wherebeneficial epistatic interactions have evolved through natural selection. Why should sex therefore be so common, given the obvious costs?This disparity in investment is the basis for the twofold cost: asexualfemales hypothetically could transmit twice as many alleles at the same cost. In most plants and animals, mates tend to be unrelated, leading to outcrossing. 
Butsex usually also involves the basic process of physical recombination: the breakage andreunion of two different DNA or RNA molecules. Of these two processes, recombinationis clearly the more widespread feature of sexual reproduction. A variety of reproductivesystems, such as selfing and automixis, involve recombination but not outcrossing. Incontrast, relatively few reproductive systems have outcrossing without recombination.", + "However, we expect thatonly at this level, the most signicant contributions brought by integrating epigenetics will bemade. Concluding Remarks and FutureProspectsFish sex ratios are the result of a complex interaction between genetic, biochemical, and environmental interactions. The ultimate resultof these interactions at the individual level isgender: male or female. However, at the population level, the combination of sex determination and differentiation sets the sex ratio. Inturn, sex ratios dene the reproductive capacityof populations and, if sex growth dimorphismexists, also the growth characteristics, something very important in an aquaculture context.Traditionally, it has been agreed that thenal sex of an individual (phenotypic sex)depends on two sequential processes: the sexdetermination system of the species and thegonad differentiation process (Valenzuela,2008). However, recently, these two seeminglydistinct processes are viewed as part of a general process leading to gonad formation andsex ratios (Sarre et al. , 2004; Quinn et al. , 2011;Uller and Helantera, 2011).", + "Obehav is, in turn, influenced by offspring genesand environment (Ogene and Oenvir respectively). Hence, indirect genetic effects (blue arrows)and direct genetic effects (red arrow) are important influencers of behaviour. 
B) Parentoffspring conflict theory predicts that parental resource investment and offspring solicitationbehaviours are influenced by the fitness benefit to a focal individual (O), cost to a socialpartner such as a sibling (S1 and S2) or parent (P), and by their coefficient of relatedness(black arrows). 42Figure 2: Genomic imprinting can result in divergent phenotypes from the samegenotype. A) A paternally imprinted gene, i.e. maternally expressed.", + "Because of the small contribution, through the sperm, ofthe paternal transcriptome to the fertilized zygote, and because of the stronger maternal contributionto child rearing in most model organisms, parental effects are typically thought of as synonymous withmaternal effects, although true paternal effects are known to exist (Rando, 2012). Maternal effects have been shown to be important during embryonic development, leading todifferences in the birth weight of mice depending on the genotype of the mother (Cowley et al. ,1989; Wolf et al. , 2011).Therefore, the resulting phenotypic patterns lag a generationbehind the genetic transmission of the causal variants. The most well-studied parental genetic effectsare caused by deposition of maternal transcripts into the egg prior to fertilization, resulting indifferences in early embryonic development depending on the genotype of the mother. Certain geneshave also been shown to respond to maternal influence after birth through genetically definedmaternal behaviors (Weaver et al. , 2004).", + "It was believed by many that for each trait variant we should expect to find acorresponding genetic change, or gene for that trait. Through historical happenstance therelationship between genes and traits was set up and treated as if it were one-to-one. But theproduction of a trait involves not only genes, but also their interactions with each other and theenvironment, and chance." 
+ ], + [ + "One of the major drawbacks in NGS technologies is the relatively small reads obtained, which are up to 500pb length in pyrosequencing (Roche 454 genome sequencer) and roughly 50pb for Illumina Gene Analyzer.In order to generate new sequencing technologies that boost the size of sequencing reads, great efforts have been made to develop single-molecule sequencing (SMS) platforms.In SMS sequencing, synthesis is accomplished by single DNA molecule arrays without the amplification step performed in NGS sequencing platforms.This no-amplification procedure allows an increase in the number of DNA molecules tested to improve the throughput, the absence of PCR reduces the price of sequencing, and this technology also increases the length of sequencing reads [12][13][14].", + "However, the quality and completeness of sequencesshould be considered when undertaking an analysis. For a finished sequence, theaccuracy is expected to be high; with less than one nucleotide error per 100 000 nucleotides and fewer than one insertion/deletion error per 200 000 nucleotides, the vastmajority of which are located in tandemly repetitive sequence (IHGSC, 2004), andthere should be no gaps in sequence coverage. The quality of draft sequences dependsto a large degree on the depth of coverage.", + "Thesignificantly longer error-corrected SMRT reads as comparedto other available technologies are expected to enable sequencing through complex repeats, close genomic gaps, allowhaplotype phasing, and catalog alternative splicing isoformsthat are generally out of reach for second-generation sequencing technologies. There are still many challenges ahead forthis third-generation sequencing technology particularly inincreasing sequencing accuracy and throughput. Lee Hood focused the challenge of high throughputdata centers on signal to noise issues as he addressedthe advances that have occurred in dealing with analyzing the complexity of biological systems (Tian et al. 
2012).Michael Schatz and Tim Smith reported on the use of SMRTDNA along with other second-generation technologies tosequence and assemble various small genomes includingEscherichia coli strains and budding yeast. SMRT long readshave a mean length of 12 kb for most genomes tested withthe longest reads reaching up to 15 kb. The read accuracy isabout 85% with the vast majority of the errors correspondingto base insertions, ranging from one to several bases, followedby deletions and mismatches.", + "The improvements associated with 3rdGS technologies have led to advances in read quality but significant advances are required to overcome some limitations, such as fresh starting material or intact cells and, moreover, issues associated with high sequencing error rate Long-read sequencing technologies are particularly efficient in de novo genome assembly [62] or complex structural genomic sequencing.SMRT, for example, allowed the sequencing of long stretches of CGG as a Mycobacterium tuberculosis genome, characterized by high redundancy (65.6% of GC bases) or short tandem repeats (STPs) implicated in X fragile syndrome (FXS) [63,64].", + "On the other hand, short-insertpaired-end capability enables high-resolutionre-sequencing, transcriptome analysis (RANSeq) and miRNA discovery. Comparison of Illumina Sequencingand 454 PyrosequencingBoth Illumina and 454 Roche/Pyrosequencinghave been used in different rainbow troutgenome projects, which showed the pros andcons of each technology. The main differences are read length and throughput: Illumina sequencing generates relatively short sequence reads of approximately 100 bp, but withenormously high throughput of approximately20 Gb; in contrast, 454 sequencing generatesrelatively long reads of approximately 400 bp,but its throughput is only approximately 0.4 Gb.Based upon different choices ofsequencing technology, the sequencing yieldsand read lengths vary. 
Currently, three main next-generationsequencing platforms are widely used in theRNA-Seq, the 454, Illumina and ABI SOLiD. Among these platforms, the throughput variesfrom hundreds of thousands of reads for the454 system to hundreds of millions of readsfor the Illumina and ABI SOLiD systems(Marguerat and Bahler, 2010). The readlengths typically range from 30100 bp forIllumina and SOLiD to 200500 bp for 454.In general, Illumina and SOLiD platforms arerelatively inexpensive, while the 454 technologyoffers longer reads, but is more expensive perrun. Illumina, SOLiD and 454 technologiescan be combined in a hybrid assemblystrategy: short reads that are sequenced at agreater depth are assembled into contigs, andlong reads are subsequently used to scaffoldthe contigs and resolve variants (Martin andWang, 2011). Two main approaches can be used for RNASeq data analysis. One way is to map the resulting reads to a reference genome or referencetranscriptome. This is usually taken in wellstudied species with sequenced genome.The most problematic compromise of NGS isthe relatively shorter read length of 35400 bpas compared with 7501000 bp of thecapillary-based sequencing. Discussions on thischapter will mainly focus on the 454 Roche/pyrosequencing and the Illumina sequencingtechnologies that have been used in rainbowtrout studies. Detailed description of NGStechnologies can be found in the literature(Mardis, 2008b, 2008a; Shendure and Ji, 2008;Zhou et al. , 2010; Kircher and Kelso, 2010). 454 Roche/PyrosequencingSequencing workow of the 454 Roche GSFLX and GS Junior platforms consists of thefollowing basic steps (Figure 11.1A).These technologiesallow sequencing of millions of reads in a relatively short time (Voelkerding et al. , 2009). The assembly and sequencing analysis, as wellas the whole process, require signicant computational and economic investment, but thehigh volume of data allows a full genome tobe sequenced in a relatively short time. 
Severaldifferent platforms are available, including 454Life Sciences (Roche), Solexa (Illumina), Solid(Applied Biosystems), and single molecule sequencing (Helicos Biosciences). All platformsshare a common approach, involving massiveparallel sequencing of DNA molecules thatare spatially separated. (Brenner et al.Technologically, 454 Roche/Pyrosequencing ismore suitable for de novo characterization ofnonmodel genomes/transcriptomes, and Illumina is more suitable for re-sequencing knowngenomes, digital gene expression, and miRNAdiscovery. However, such technological capability difference is complicated by the cost associated with the sequencing platforms. Forinstance, Illumina HiSeq 2000 instrument cangenerate 20 Gb sequence per sequencing lanethat cost only $30004000, while one run of 454sequencing can only generate 400 million bpthat cost over $10,000.With theseNGS technologies, libraries are constructedand sequenced from amplied fragments ofsingle-stranded DNA. The most signicantadvantage of NGS technology is saving thetime-consuming, laborious, and equipmentassociated steps of cloning and colony pickingof the Sanger-based capillary sequencing. Thesecond most important advantage of NGS isthe tremendous increase in sequencing outputin terms of the number of reads (14800million reads) and total bases per run (0.4300 Gbp) compared with 96 reads of 7501000 bp each in the capillary-based sequencing.Although shorter reads produced by Illumina or SOLiD compared with the 454 technology may be more challenging for de novosequence assembly, the preexisting ESTs produced by Sanger sequencing can be used tofacilitate the assembly (Liu et al. , 2011), andthe algorithms for short reads de novo assembly are being developed (e.g. , Grabherret al. , 2011). Xiang et al. assembled the shortreads from Illumina RNA-Seq deep sequencing to generate the nonredundant consensuswhich is subsequently used as references forDGE prole analysis (Xiang et al. 
, 2010).", + "There are currently three general WGS strategies (Figure 2): (1) short-read WGS using the Illumina technology, which currently yields paired-end 150 bp reads with low error rates in the range of 0.1%-0.5%;(2) long-read WGS using singlemolecule technologies from Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT), which yield 10-100 kb reads-and occasionally much longer-with high error rates in the range of 10%-15%; and (3) linked-read WGS using the technology from 10X Genomics, which generates barcoded Illumina short-reads from longer molecules (e.g., 50 kb).Due to considerations of cost, ease of use, and accuracy, the overwhelming majority of human genetics studies employ short-read WGS using the Illumina HiSeq or NovaSeq platform, and we therefore focus primarily on analysis of this data type.", + "Currently, there are two main types of long-read technologies: single-molecule real-time sequencing approaches and synthetic approaches that rely on existing shortread technologies to construct long reads in silico.The single-molecule approaches differ from short-read approaches in that they do not rely on a clonal population of amplified DNA fragments to generate detectable Following cluster generation or bead deposition onto a slide, fragments are sequenced by ligation, in which a fluorophore-labelled two-base-encoded probe, which is composed of known nucleotides in the first and second positions (dark blue), followed by degenerate or universal bases (pink), is added to the DNA library.The two-base probe is ligated onto an anchor (light purple) that is complementary to an adapter (red), and the slide is imaged to identify the first two bases in each fragment.Unextended strands are capped by unlabelled probes or phosphatase to maintain cycle synchronization.Finally, the terminal degenerate bases and the fluorophore are cleaved off the probe, leaving a 5 bp extended fragment.The process is repeated ten times until two out of every five 
bases are identified.At this point, the entire strand is reset by removing all of the ligated probes and the process of probe binding, ligation, imaging and cleavage is repeated four times, each with an n + 1, n + 2, n + 3 or n + 4 offset anchor.b | Complete Genomics.DNA is sequenced using the combinatorial probe -anchor ligation (cPAL) approach.After DNA nanoball deposition, an anchor complementary to one of four adapter sequences and a fluorophore-labelled probe are bound to each nanoball.The probe is degenerate at all but the first position.The anchor and probe are then ligated into position and imaged to identify the first base on either the 3 or the 5 side of the anchor.Next, the probe-anchor complex is removed and the process begins again with the same anchor but a different probe with the known base at the n + 1 position.This is repeated until five bases from the 3 end of the anchor and five bases from the 5 end of the anchor are identified.Another round of hybridization occurs, this time using anchors with a five-base offset identifying an additional five bases on either side of the anchor.Finally, this whole process is repeated for each of the remaining three adapter sequences in the nanoball, generating 100 bp paired-end reads.Figure 5 | Real-time and synthetic long-read sequencing approaches.A | Real-time long-read sequencing platforms.Aa | Single-molecule real-time (SMRT) sequencing from Pacific Biosciences (PacBio).Template fragments are processed and ligated to hairpin adapters at each end, resulting in a circular DNA molecule with constant single-stranded DNA (ssDNA) regions at each end with the double-stranded DNA (dsDNA) template in the middle.The resulting 'SMRTbell' template undergoes a size-selection protocol in which fragments that are too large or too small are removed to ensure efficient sequencing.Primers and an efficient 29 DNA polymerase are attached to the ssDNA regions of the SMRTbell.The prepared library is then added to the zero-mode 
waveguide (ZMW) SMRT cell, where sequencing can take place. To visualize sequencing, a mixture of labelled nucleotides is added; as the polymerase-bound DNA library sits in one of the wells in the SMRT cell, the polymerase incorporates a fluorophore-labelled nucleotide into an elongating DNA strand. During incorporation, the nucleotide momentarily pauses through the activity of the polymerase at the bottom of the ZMW, which is being monitored by a camera. Ab | Oxford Nanopore Technologies (ONT). DNA is initially fragmented to 8-10 kb. Two different adapters, a leader and a hairpin, are ligated to either end of the fragmented dsDNA. Currently, there is no method to direct the adapters to a particular end of the DNA molecule, so there are three possible library conformations: leader-leader, leader-hairpin and hairpin-hairpin. The leader adapter is a double-stranded adapter containing a sequence required to direct the DNA into the pore and a tether sequence to help direct the DNA to the membrane surface. Without this leader adapter, there is minimal interaction of the DNA with the pore, which prevents any hairpin-hairpin fragments from being sequenced. The ideal library conformation is the leader-hairpin. In this conformation the leader sequence directs the DNA fragment to the pore with current passing through. As the DNA translocates through the pore, a characteristic shift in voltage through the pore is observed. Various parameters, including the magnitude and duration of the shift, are recorded and can be interpreted as a particular k-mer sequence. As the next base passes into the pore, a new k-mer modulates the voltage and is identified. At the hairpin, the DNA continues to be translocated through the pore adapter and onto the complement strand. This allows the forward and reverse strands to be used to create a consensus sequence called a '2D' read. B | Synthetic long-read sequencing platforms. Ba | Illumina. Genomic DNA templates are fragmented to 8-10 kb pieces. They are then 
partitioned into a microtitre plate such that there are around 3,000 templates in a single well. Within the plate, each fragment is sheared to around 350 bp and barcoded with a single barcode per well. The DNA can then be pooled and sent through standard short-read pipelines. Bb | 10X Genomics' emulsion-based sequencing. With as little as 1 ng of starting material, the GemCode can partition arbitrarily large DNA fragments, up to ~100 kb, into micelles (also called 'GEMs') along with gel beads containing adapter and barcode sequences. The GEMs typically contain ~0.3 copies of the genome and 1 unique barcode out of 750,000. Within each GEM, the gel bead dissolves and smaller fragments of DNA are amplified from the original large fragments, each with a barcode identifying the source GEM. After sequencing, the reads are aligned and linked together to form a series of anchored fragments across a span of ~50 kb. Unlike the Illumina system, this approach does not attempt to get full end-to-end coverage of a single DNA fragment. Instead, the reads from a single GEM are dispersed across the original DNA fragment and the cumulative coverage is derived from multiple GEMs with dispersed, but linked, reads. Part Aa is adapted from REF. 18, Nature Publishing Group. Part Ba is adapted from REF. 62. Synthetic long-reads. Unlike true sequencing platforms, synthetic long-read technology relies on a system of barcoding to associate fragments that are sequenced on existing short-read sequencers 61. These approaches partition large DNA fragments into either microtitre wells or an emulsion such that very few molecules exist in each partition. Within each partition the template fragments are sheared and barcoded. This approach allows for sequencing on existing short-read instrumentation, after which data are split by barcode and reassembled with the knowledge that fragments sharing barcodes Barcodes A series of known bases added to a template molecule either through ligation or amplification. After 
sequencing, these barcodes can be used to identify which sample a particular read is derived from.There are currently two systems available for generating synthetic long-reads: the Illumina synthetic long-read sequencing platform (FIG.5c) and the 10X Genomics emulsion-based system (FIG.5d).The Illumina system (formerly Moleculo) partitions DNA into a microtitre plate and does not require specialized instrumentation.However, the 10X Genomics instruments (GemCode and Chromium) use emulsion to partition DNA and require the use of a microfluidic instrument to perform pre-sequencing reactions.With as little as 1 ng of starting material, the 10X Genomics instruments can partition arbitrarily large DNA fragments, up to ~100 kb, into micelles called 'GEMs' , which typically contain 0.3 copies of the genome and one unique barcode.Within each GEM, a gel bead dissolves and smaller fragments of DNA are amplified from the original large fragments, each with a barcode identifying the source GEM.After sequencing, the reads are aligned and linked together to form a series of anchored fragments across the span of the original fragment.Unlike the Illumina system, this approach does not attempt gapless, end-to-end coverage of a single DNA fragment.Instead it relies on linked reads, in which dispersed, small fragments that are derived from a single long molecule share a communal barcode.Although these fragments leave segments of the original large molecule without any coverage, the gaps are overcome by ensuring that there are many long fragments from the same genomic region in the initial preparation, thus generating a read cloud wherein linked reads from each long fragment can be stacked, combining their individual coverage into an overall map (FIG.5d).Comparison of single-molecule and synthetic longread sequencing. 
There is growing interest in the field of long-read sequencing, and each system has its own advantages and drawbacks (TABLE 1). Currently, the most widely used instrument in long-read sequencing is the PacBio RS II instrument. This device is capable of generating single polymerase reads in excess of 50 kb with average read lengths of 10-15 kb for a long-insert library. Such properties are ideal for de novo genome assembly applications 63, for revealing complex long-range genomic structures 64 and for full-length transcript sequencing. There are, however, several notable limitations. The single-pass error rate for long reads is as high as 15% with indel errors dominating 65, raising concerns about the utility of the instrument 66. Fortunately, these errors are randomly distributed within each read and hence sufficiently high coverage can overcome the high error rate 67. The use of a circular template by PacBio also provides a level of error correction. The more frequently a single molecule is sequenced, the higher the resulting accuracy, up to ~99.999% for insert sequences derived from at least 10 subreads 59,68. This high accuracy rivals that of Sanger sequencing, leading researchers to speculate that this technology can be used in a manner analogous to Sanger-based SNP validation 65. The runtimes and throughput of this instrument can be tuned by controlling the length of time for which the sensor monitors the ZMW; longer templates require longer times. For example, a 1 kb library that is run for 1 hour will generate around 7,500 bases of sequence per molecule, with an average of 8 passes, whereas a 4-hour run will generate around 30,000 bases per molecule and ~30 passes. Conversely, a 10 kb library requires a 4-hour run to generate ~30,000 bases with ~3 passes. The limited throughput and high costs of PacBio RS II (around $1,000 per Gb), in addition to the need for high coverage, place this instrument out of reach of many small laboratories. However, in an attempt to 
ameliorate these concerns, PacBio has launched the Sequel System, which reportedly has a throughput 7× that of the RS II, thus halving the cost of sequencing a human genome at 30× coverage 69. Single-end and paired-end sequencing: In single-end sequencing, a DNA template is sequenced only in one direction. In paired-end sequencing, a DNA template is sequenced from both sides; the forward and reverse reads may or may not overlap. A deviation in the expected genome alignment between two ends of a paired-end read can indicate a structural variation." + ], + [ + "A good starting point is with the databases called Integrated Microbial Genomes (IMG; http://img.jgi.doe.gov/) or National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/). Many genomes identified in IMG as belonging to the Genome Encyclopedia of Bacteria and Archaea (GEBA; http://jgi.doe.gov/our-science/science-programs/microbial-genomics/phylogenetic-diversity/) project are relatively unstudied. In Step 2, either the instructor or the student chooses an annotation approach. A few of the annotation questions that one might ask using GENI-ACT are these: An initial automated gene call is made following sequencing. 
Is the automated gene call accurate?", + "The GO Consortium coordinates an effort to maximize the utility of a large and representative set of key genomes, which we refer to as reference genomes. The Reference Genome project has two aspects: (i) to encourage complete and precise annotations of the proteins for the species widely used as model organisms; and (ii) to provide inferred annotations for proteins for which no experimental data are available [4]. We describe here the homology-based method and software we have developed to achieve those goals.", + "Although comparative immunogenomic analyses clearly benefit from high-quality WG assemblies, costs may still prevent large-scale analyses involving many species. Nevertheless, the availability of high-quality reference genomes for select species within target clades can enable much more accurate assembly and annotation of other species using SRseq, providing a way forward whilst minimising costs (e.g., [65], Figure 2, Key figure).", + "Each of these has advantages and disadvantages, and, although the main genome portals are generally consistent, they may not give the same answers in every case. Bearing in mind that only the first two of these gene location methods are based on stable (almost) unambiguous information, it is better to use at least two ways to define and store the results: for example, a section of raw sequence and a gene name, or a primary accession number and a set of genome coordinates. The BACE1 gene will be used as an example of a known gene to locate. This problem can be overcome by generating several multiple sequence alignments, one with each of the sequences under study as the reference. This solution is time-consuming, raises the additional problem of integrating results between alignments, and exposes the second major drawback to the reference sequence approach; that is, the potential for inconsistencies when using alternate sequences as the reference. 
A solution to the problems presented by reference sequence-based alignment and analysis has been proposed in the form of a threaded blockset (Blanchette et al., 2004).", + "To achieve this goal, we integrated and make available big reference data in chapters 2 and 3, bridged model organism to human data in chapter 4, translated generic methods into clinical applications in chapters 5 and 6, and developed a platform to bring innovations into practice in chapter 7. The resources currently available are already plentiful, and both the amount and types of molecular life science data is growing at a tremendous pace. Interpretation can be sped up using the huge amount of useful information collected by laboratories, public databases and biobanks. Unfortunately, for now, all these sources of useful data cannot be easily integrated and explored in unison. Further, while many innovative analysis methods emerge from research on a regular basis, a lack of standardization makes it difficult to adopt, share, compare and validate them in practice. Here we report a lightweight framework for genome interpretation pipelines that aims to enable rapid implementation and adaptation of analysis protocols that integrate reference annotation data (e.g. ClinVar, ExAC, GoNL), run best-practice analysis tools (e.g.", + "Links to external resources, including genome browsers, sequence databases, protein structure databases, and many other web resources, make it possible to place results in a rich bioinformatic context. It is practical to integrate data across all of these domains and scales for the simple reason that data have been collected from a common genetic reference panel of strains. The integration of data across domains in the RI strains also allows us to easily study pleiotropic effects of variability in gene expression.", + ", 2012), or the 1001 Genomes Project for A. thaliana (1001 Genomes Consortium, 2016), may even be better suited for similar analyses. 
In the case of humans, however, it is almost impossible to simultaneously phenotype individuals and sample multi-tissue and multi-omic data, while controlling the environmental sources of variation. Assessing the use of these tools may require cohorts that have extensive multi-omics datasets available or have relevant samples biobanked, e.g., the Framingham Heart Study (Mahmood et al., 2014). Imputation of gene expression in deep tissues from either reference transcriptome datasets (Gamazon et al.", + "The use of true strain-specific genomes for read alignment, rather than the reference genome or imputed pseudogenomes, will pose new analytical challenges. It will also offer the opportunity to capture biological signals which are not apparent in the present framework. One remaining gap in the CC infrastructure is the lack of a centralized, public platform for sharing and integrating phenotype data on CC lines. The Mouse Phenome Database (http://phenome.jax.org/) (Grubb et al. The genome assembly and annotations are made available to the community via many online genome browsers, the most popular of which are hosted by the University of California at Santa Cruz [UCSC, (Karolchik et al. 2014)] and Ensembl (Flicek et al. 2013). Use of a single haploid reference sequence as an anchor for all studies of genetic variation in mouse offers many practical advantages. But the dependency on a reference genome requires several assumptions about the nature of genetic variation which may be violated in practice, the strongest of which is that of genomic collinearity (i.e.", + "Each of these has advantages and disadvantages, and, although the main genome portals are generally consistent, they may not give the same answers in every case. 
Bearing in mind that only the first two of these gene location methods are based on stable (almost) unambiguous information, it is better to use at least two ways to define and store the results: for example, a section of raw sequence and a gene name, or a primary accession number and a set of genome coordinates. The BACE1 gene will be used as an example of a known gene to locate. This problem can be overcome by generating several multiple sequence alignments, one with each of the sequences under study as the reference. This solution is time-consuming, raises the additional problem of integrating results between alignments, and exposes the second major drawback to the reference sequence approach; that is, the potential for inconsistencies when using alternate sequences as the reference. A solution to the problems presented by reference sequence-based alignment and analysis has been proposed in the form of a threaded blockset (Blanchette et al., 2004).", + "The large number of bioinformatic tools that have been made available to scientists during the last few years has presented the problem of which to use and how best to obtain scientifically valid answers (3). In this chapter, we will provide a guide for the most efficient way to analyze a given sequence or to collect information regarding a gene, protein, structure, or interaction of interest by applying current publicly available software and databases that mainly use the World Wide Web.", + "The genome assembly and annotations are made available to the community via many online genome browsers, the most popular of which are hosted by the University of California at Santa Cruz [UCSC, (Karolchik et al. 2014)] and Ensembl (Flicek et al. 2013). Use of a single haploid reference sequence as an anchor for all studies of genetic variation in mouse offers many practical advantages. 
But the dependency on a reference genome requires several assumptions about the nature of genetic variation which may be violated in practice, the strongest of which is that of genomic collinearity (i.e. Detailed knowledge of the subspecies contributions to CC genomes, obtained by integrating CC lines haplotype mosaics with data from the Mouse Phylogeny Viewer, will be critical to this effort. Most of the resources discussed in this review ultimately depend on the mouse reference genome. A high-quality, well-annotated reference assembly for any model organism is extremely valuable for the research community. In addition to the genomic sequence itself, a reference genome provides a backbone for annotation and a common coordinate system to anchor genetic maps.", + "Each of these has advantages and disadvantages, and, although the main genome portals are generally consistent, they may not give the same answers in every case. Bearing in mind that only the first two of these gene location methods are based on stable (almost) unambiguous information, it is better to use at least two ways to define and store the results: for example, a section of raw sequence and a gene name, or a primary accession number and a set of genome coordinates. The BACE1 gene will be used as an example of a known gene to locate. This problem can be overcome by generating several multiple sequence alignments, one with each of the sequences under study as the reference. This solution is time-consuming, raises the additional problem of integrating results between alignments, and exposes the second major drawback to the reference sequence approach; that is, the potential for inconsistencies when using alternate sequences as the reference. A solution to the problems presented by reference sequence-based alignment and analysis has been proposed in the form of a threaded blockset (Blanchette et al.
, 2004).", + "The general conclusion of this comparison is thus that co-assembly and co-binning approaches would be useful for retrieving substantially more genomes in relatively long (>5) subject-specific time series, whereas the potential advantage of retrieving more low-abundance species in a cross-sectional co-assembly is overcome by the disadvantage of having to use more complex approaches such as DESMAN to resolve the strain variation. That is perhaps more appropriate where the aim is to extract as much information as possible from a single study rather than to produce a single comprehensive high fidelity strain catalog. Because time series comprising more than 5 samples from the same subject and body site are very rare in the available cohorts (only 70 individuals, i.e., 1.0%, in our database), co-assembly is not considered in the present work as it would not provide advantages.", + "Each of these has advantages and disadvantages, and, although the main genome portals are generally consistent, they may not give the same answers in every case. Bearing in mind that only the first two of these gene location methods are based on stable (almost) unambiguous information, it is better to use at least two ways to define and store the results: for example, a section of raw sequence and a gene name, or a primary accession number and a set of genome coordinates. The BACE1 gene will be used as an example of a known gene to locate." + ] + ], + "task_id": [ + "847F1E1599EECDE92F99B7581728FFE8", + "037BAB6CB2DE7A42AAAA73CE5DA8DB73", + "B2AA6DE557D652A0A660C4E0FAC1124D", + "7EC697DE62C0C57E601EC3F5B295DF61", + "0A6673A0B69F0FF9C9657FB797DD1FE2" + ] +}
\ No newline at end of file diff --git a/gnqa/data/study1/datasets/human/dataset_domainexpert_general_6.json b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_6.json new file mode 100644 index 0000000..c2a25f9 --- /dev/null +++ b/gnqa/data/study1/datasets/human/dataset_domainexpert_general_6.json @@ -0,0 +1,65 @@ +{ + "question": [ + "Is all genetic regulation done through DNA (e.g., promoters, repressors, activators) or are there other forms of genetic regulation? Please make your answer accessible to a non-expert.", + "What are the different relationships between traits?", + "Can the landscape of QTL and GWAS hits be used to find relationships between traits?" + ], + "answer": [ + "Genetic regulation is not only done through DNA elements like promoters, repressors, and activators. It also involves other components such as transcription factors, which are proteins that control the rate of transcription of genetic information from DNA to messenger RNA. Additionally, non-coding RNAs, which do not code for proteins, play a significant role in gene regulation. There are also epigenetic factors, which influence gene expression without changing the DNA sequence. These include chemical modifications to the DNA or proteins associated with it. So, genetic regulation is a complex process involving multiple elements and layers of control.", + "The relationships between traits can be described by four basic models: one-to-one, where one gene gives rise to one trait; one-to-many, where one gene affects many traits (pleiotropy); many-to-one, where many genes affect one trait (polygeny); and many-to-many, where multiple genes interact to influence multiple traits. Additionally, traits can also be related through genetic correlation, where the directions of effect are consistently aligned. 
Furthermore, traits can be interconnected through complex developmental processes and environmental interactions.", + "Yes, the landscape of QTL and GWAS hits can be used to find relationships between traits. This is done by mapping genome regions to variation in a large number of traits, thereby inferring biological relationships between those traits and connecting them into networks. This approach can help identify the genetic basis of variation in complex traits." + ], + "contexts": [ + [ + "At the intermediate level, there are regulatory units made up of multiple components, such as gene-promoter pairs. At the highest level, regulatory units interact to create a particular gene circuit, e.g., two gene-promoter pairs can be arranged in a mutually inhibitory network to create a genetic toggle switch. At each of these levels, one can identify sequence representations that define certain aspects of regulation and control, as well as compositional relationships (e.g., spatial arrangement and orientation) and interactions between biomolecules, molecular components, and/or sub-components that impact functional outputs and behaviors.", + "These regulatory programs are apparent across a variety of joint contributions, from the independent contribution of each of the regulatory mechanisms to a cooperative contribution of several mechanisms. A regulatory program may include a variety of mechanisms such as transcription factors, chromatin remodeling complexes, and promoter regulatory elements. Natural genetic variations may provide important insights into regulatory programs. In particular, transcription profiles can be integrated with genotypic data across a population to identify genomic loci that have an effect on gene expression (Mackay et al.
, 2009), and hence it is possible to use these loci as potential regulatory mechanisms.", + "During the development, genes are turned on and off in a pre-programmed fashion, a process orchestrated by TFs, whose binding sites aggregate in the promoters near their controlled genes. A combinatorial control is achieved via different combinations of ubiquitous and cell-specific regulatory factors. Moreover, genes can initiate transcription at multiple loci (alternative promoters), creating RNA isoforms with different 5′ regions. Alternative promoters are potentially important for gene-expression regulation or generating different protein products. Complex regulation in vivo can also involve many more features, such as enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs). Tightly regulated gene expression for specific cell types and developmental stages in response to different physiological conditions is driven by the orchestration of complex and multilayered gene regulatory networks (GRNs) (Maniatis and Reed, 2002). Inferring GRNs is of fundamental importance and a great challenge for molecular biologists and geneticists. Mutations, including point mutations, insertions and deletions, translocations, and duplications, play critical roles in determining biological phenotypes and disease susceptibilities by perturbing the GRNs. Among them, single nucleotide polymorphisms (SNPs) generated by point mutations occur approximately one per 1000 bases and are the predominant variations in man.", + "Gene expression directs the process of cellular differentiation, in which specialized cells are generated for the different tissue types. The regulation of gene expression (i.e. gene regulation) controls the amount and timing of changes to the gene product. This is the basic mechanism for modifying cell function and thereby the versatility and adaptability of an organism. 
Therefore, gene expression and regulation function as a bridge between genetic makeup and expression of observable traits. Despite its vital importance, determining the precise roles of given transcripts remains a fundamental challenge.", + "INTRODUCTION: The field of gene regulation is currently undergoing a renaissance. With the successful annotation of most of the protein-coding portion of the human genome [1], the focus of much research has shifted toward deciphering the regulatory logic governing the temporal, spatial and quantitative aspects of gene expression that is embedded in the remaining 98% of DNA that does not encode for protein [2]. A flurry of papers stemming, in large part, from two broad areas of investigation has recently made a significant impact on the field of gene regulation. The first revolves around the genetic basis of human disease. Fueled by the power of linkage and genome-wide association studies, an ever-expanding list of human diseases has been associated with single nucleotide polymorphisms (SNPs) residing in noncoding regions of the genome [3]. These disease-associated SNPs are thought to directly control some aspect of target gene expression, or are linked to other DNA variants that possess regulatory activity. In a small but growing number of cases, the regulatory SNPs identified in human genetic studies have led to the identification of disease susceptibility loci and have served as useful entry points for unraveling the complexities of the gene regulatory landscape (Table 1) [3]. The second line of investigation that has revitalized gene expression research relates to the development of functional genomic approaches to screen noncoding DNA for regulatory potential. Genome-wide surveys of sequence conservation [4][5][6], histone modifications [7][8], DNAse I hypersensitivity [9] and DNA structure [10], have all significantly improved the detection of functional cis-acting regulatory sequences. This review will highlight recent examples from 
the literature that have successfully integrated genetic and genomic approaches to uncover the molecular basis by which cis-regulatory mutations alter gene expression and contribute to human disease.", + "Complexity of gene regulation: Gene regulation is a complex multi-layered process involving numerous proteins and non-coding RNAs which may act at a great distance from their target gene. Elaborate multi-protein/RNA complexes must be assembled at the site of regulation. The regulatory mechanism may be intricate and variable, potentially involving transcript rearrangement and mRNA degradation. It is now clear that RNA has a diverse set of functions and is more than just a messenger between gene and protein. The mammalian genome is extensively transcribed, giving rise to thousands of RNA transcripts that are never translated into proteins. Whether all of these transcripts are functional is currently debatable, but it is evident that these include families of RNA molecules with a regulatory function [34]. The presence of a gene expression change, which is strongly correlated with relevant physiological changes, in the absence of proximate significant GWAS signals, suggests that relatively distant regulatory variants (and potentially many such variants) may act in combination to regulate the expression of the target gene of interest. Such putative gene expression-modulating variants could potentially act upon target gene expression through the mediation of non-protein-coding regulatory RNAs. For example, recent studies have shown that the expression of many genes is modulated by small interfering RNAs (siRNAs) and micro-RNAs, e.g. reviews by [10,30], which do not encode proteins. In addition to microRNAs, many non-protein-coding RNA species (or \"RNA genes\"), such as long noncoding RNAs [42], are transcribed from the genome. Thus, there is compelling evidence that most of the genome may be transcribed [5,6,9,19,38,53,58,59,62] and the potential role of non-protein coding RNA genes 
in the modulation of protein-coding gene expression remains to be fully evaluated.", + "Transcription factors that bind to DNA recognize this sequence and use it to correctly position RNA polymerase, the enzyme that actually generates the transcript.Other sequences, called enhancers and repressors, speed up and slow down, respectively, the rate of transcription.Enhancer and repressor sequences can be quite distant from the gene's coding region.Other transcription factors recognize these sequences and further control how much and how fast mRNA is generated.All of these sequences are part of a gene and are required to generate the many proteins that control the overall maintenance and general metabolism of all of our cells.Genes that are expressed in all cell types, such as RNA polymerase and transcription factors, are called housekeeping genes.Concepts in the 21st Century: Genetic and Epigenetic Regulation of Gene ExpressionWe now know that only about 1% of our genome encodes proteins.Alternative splicing is the primary mechanism by which our approximately 20,000 genes can code for hundreds of thousands of proteins.Alternative splicing refers to modification of the primary mRNA produced during transcription (Figure 8).Only a portion of the transcript contains sequences that are translated into a protein.Introns, or intervening sequences, are removed after transcription, and the remaining sequences, known as exons, are spliced together.One transcript can be processed in multiple ways, such that different combinations of exons can be spliced together, producing many different proteins from the same primary transcript.The discovery of alternative splicing has changed our thinking about the central dogma because we now know that the concept of one gene encoding one protein is not true.", + "Of the total 20,000-25,000 protein-coding genes, occupying only 1.2% of the human genome, about six percent are functionally classified as TFs [8].However, some 93% of our genome is 
transcribed, by far the greatest part expressed as non-protein-coding RNAs (ncRNA), including the miRNAs [9]. An order of magnitude more numerous than all the proteins which make up living organisms are the transcription start sites (TSSs), located in promoter-proximal element regions, as well as an increasing number of putative promoter-distal elements, identified by the pilot ENCODE project [9]. These recent findings, together with the fact that nonprotein-coding genomic sequence elements, such as miRNAs, predominate and are evolutionarily conserved in our genome, challenge our traditional understanding of the definition of a gene, which has been generally considered a unit of genome sequence that is transcribed to produce a protein product for a given cellular function. Nevertheless, as the ENCODE consortium suggests, a gene may be defined as \"a union of genomic sequences encoding a coherent set of potentially overlapping functional products\" that eventually orchestrate the complex regulation and function of the host organism's cellular activities [10]. An even bolder scenario is proposed by John S. 
Mattick, who suggests that the genome may consist largely of massively embedded RNA coding sequences directing regulatory networks, which may have co-evolved with proteins.These two complementary genomic sets may ultimately form the interacting RNAprotein regulatory networks which control the complex layers of signaling communication within all cells [11,12].Thus, the intriguing notion of epigenomic regulation of essential processes such as cell proliferation, differentiation, apoptosis, etc., characterized by feed-forward RNA regulatory networks, is becoming increasingly important in our appreciation of the epigenetic information required for the development of multi-cellular organisms [11].In this report, we focus our discussion on the suggestion that derailment of the RNA-protein interaction, and its subsequent impact on the regulatory networks which they direct, may constitute a significant fraction of the molecular mechanisms controlling the aging process.", + "During the development,genes are turned on and off in a pre-programmed fashion, a process orchestratedby TFs, whose binding sites aggregate in the promoters near their controlled genes. A combinatorial control is achieved via different combinations of ubiquitous andcell-specific regulatory factors. Moreover, genes can initiate transcription at multiple loci (alternative promoters), creating RNA isoforms with different 5 regions. Alternative promoters are potentially important for gene-expression regulation orgenerating different protein products. Complex regulation in vivo can also involvemany more features, such as enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs).Tightly regulated gene expression for specific cell types and developmental stages inresponse to different physiological conditions is driven by the orchestration of complex and multilayered gene regulatory networks (GRNs) (Maniatis and Reed, 2002). 
Inferring GRNs is of fundamental importance and a great challenge for molecular biologists and geneticists. Mutations, including point mutations, insertions and deletions, translocations, and duplications, play critical roles in determining biological phenotypes and disease susceptibilities by perturbing the GRNs. Among them, single nucleotide polymorphisms (SNPs) generated by point mutations occur approximately one per 1000 bases and are the predominant variations in man. Gene expression regulation can take place at any step during the path of expression, including transcription, mRNA splicing and processing, export and subcellular localization, translation and post-translational modifications. These steps are often coupled with each other (Maniatis and Reed, 2002). Currently, it is still too early to build comprehensive and accurate dynamic models for truly realistic GRNs. The majority of computational methods attempt to detect cis-trans relationships, the basic building blocks of GRNs, by modern statistical or machine learning approaches.", + "Other possible regulatory regions include enhancers, silencers, etc. In the coding regions of a gene, triplets of nucleotides, known as codons, each encode one of 20 amino acids or a signal. The process by which ribonucleic acid (RNA) is synthesized from DNA is called transcription. One strand of DNA serves as the template during transcription. The RNA transcribed from the template DNA is identical in sequence to the other strand of the DNA, which is called the coding strand.", + "Third, instructions encoded within the embryo's DNA can directly control if, and when, a nearby gene becomes activated; this is known as cis-regulation. Finally, similar instructions can also control genes that are situated elsewhere in the embryo's DNA through indirect mechanisms; this is known as trans-regulation. Now, Spies, Smith et al. 
have investigated these four processes in the offspring of two different strains of mice, one originally from Europe and the other from Southeast Asia. The two strains were crossbred and the resulting embryos were analyzed to see which of the four processes affected gene activity.", + "During development, genes are turned on and off in a pre-programmed fashion, a process orchestrated by TFs, whose binding sites aggregate in the promoters near their controlled genes. A combinatorial control is achieved via different combinations of ubiquitous and cell-specific regulatory factors. Moreover, genes can initiate transcription at multiple loci (alternative promoters), creating RNA isoforms with different 5' regions. Alternative promoters are potentially important for gene-expression regulation or generating different protein products. Complex regulation in vivo can also involve many more features, such as enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs). Tightly regulated gene expression for specific cell types and developmental stages in response to different physiological conditions is driven by the orchestration of complex and multilayered gene regulatory networks (GRNs) (Maniatis and Reed, 2002). Inferring GRNs is of fundamental importance and a great challenge for molecular biologists and geneticists. Mutations, including point mutations, insertions and deletions, translocations, and duplications, play critical roles in determining biological phenotypes and disease susceptibilities by perturbing the GRNs. 
Among them, single nucleotide polymorphisms (SNPs) generated by point mutations occur approximately one per 1000bases and are the predominant variations in man.", + "REGULATION OF GENE EXPRESSIONApart from the protein coding sequences, there are other biologically relevant nucleic acid sequences that play other important roles in the genome such as regulation of gene expression and maintenance of the chromatin structure (Pique-Regis et al., 2011).Regulation of gene expression involves a process that leads to increase or decrease in the production of specific proteins (Jacob and Monod, 1961).It is an important aspect of the cell because it increases the versatility and adaptability of an organism by allowing the cell to produce proteins only when they are needed (Payankaulam, 2010;Jacob and Monod, 1961).Gene expression is regulated at the level of transcription (described in 2.8), which can only occur if transcription factors bind to the DNA.Binding occurs within special nucleotide sequences called regulatory regions that are usually several hundred base pairs long (Lodish et al., 2000).Regulatory regions surround transcription start sites (TSSs) of genes apart from some sequences called enhancers that are located far upstream or downstream of their target gene (Birney et al., 2007;Dineen et al., 2007).", + "During the development,genes are turned on and off in a pre-programmed fashion, a process orchestratedby TFs, whose binding sites aggregate in the promoters near their controlled genes. A combinatorial control is achieved via different combinations of ubiquitous andcell-specific regulatory factors. Moreover, genes can initiate transcription at multiple loci (alternative promoters), creating RNA isoforms with different 5 regions. Alternative promoters are potentially important for gene-expression regulation orgenerating different protein products. 
Complex regulation in vivo can also involvemany more features, such as enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs).Tightly regulated gene expression for specific cell types and developmental stages inresponse to different physiological conditions is driven by the orchestration of complex and multilayered gene regulatory networks (GRNs) (Maniatis and Reed, 2002). Inferring GRNs is of fundamental importance and a great challenge for molecularbiologists and geneticists. Mutations, including point mutations, insertions and deletions, translocations,and duplications, play critical roles in determining biological phenotypes and disease susceptibilities by perturbing the GRNs. Among them, single nucleotide polymorphisms (SNPs) generated by point mutations occur approximately one per 1000bases and are the predominant variations in man." + ], + [ + "Examples of economically important traits, their heritabilities, and relative economic values.", + "Genetic correlation is different from pleiotropy.Two traits have a pleiotropic relationship if many variants affect both.Genetic correlation is a stronger condition than pleiotropy: to exhibit genetic correlation, the directions of effect must also be consistently aligned.", + "This means that it is the developmentalbasis of trait integration, not simply the strength of the genetic correlations and observable patterns of covariation among traits, that will affect how components of a scalingrelationship can evolve. Although these powerful phenotype landscape models have generated important insights into the evolution of complex traits such as scaling relationships, they are difficult totest empirically (see Rice 2008).A, Shape variation within a group of organisms isshown by a line fit to a data cloud representing the size of two traits for a group of organisms. 
Shapevariation within a group of organisms is shown by a line (dark line) fit to a data cloud (gray elipse)representing the size of two traits for a group of organisms, in this case the brain-body size relationshipin humans (data from Koh 2005). B, Scaling relationships are divided into three classes based on thepattern of variation they describe.At the phenotypic level, detailed studies of physiology, morphology, and biochemistry canelucidate whether a higher-level trait has evolved via changes in different subordinate traits. At the genetic level, a first-pass black box approach to determine whether different genesunderlie the response to selection in replicate lines is to cross those lines and examine thetraits of interest in the F1, F2, and/or backcross populations (see also Rhodes and Kaweckithis volume).Particularly relevant for the evolution of scaling relationships,these models have revealed that the developmental basis of genetic correlations (e.g. , thedegree to which a given genetic correlation results from additive or nonadditive epistaticinteractions among traits) can profoundly affect the evolutionary malleability of the correlation, trait covariation, and the evolutionary trajectory of the complex phenotype (Wolfet al. 2001, 2004; Rice 2002, 2004a, 2008).The phenotype landscapeapproach has been extended to connect with existing quantitative genetic treatments ofmultivariate evolution, yielding an emergent theory exploring how developmental integration, or entanglement, among traits affects the symmetry and rates of trait evolution;the evolution of heritabilities; the impact of genetic correlations on evolutionary trajectoriesacross different time scales; the evolutionary relationships among trait means, variances,THE EVOLUTION OF ANIMAL FORM437and covariances; and the distribution of traits in phenotypic space (Wolf et al. 
2001,2004; Rice 2004b, 2008).", + "In contrast, and consistently with our goal of identifying novel relationships among traits, module nos. 3, 4 and 5 suggest previously unknown connections between traits. We next characterized pairs of traits within each group of traits (trait pairs) to show that thequality of these pairs is not lower than in existing methods. We focused on three main properties oftrait pairs: the correlation among traits in a pair; the correlation between a trait pair and thetranscripts; and the knowledge-based relationships among traits.However, in most cases the genes and molecular mechanisms involved are not yet known so it ismore difficult to work out how the traits are connected. Computing techniques make it possible to assess the relationships between hundreds orthousands of traits at the same time. These high volume analyses can also allow scientists to identifyless obvious relationships that might be missed in more traditional types of study. Here, Oren et al. created a new computer algorithm to identify related traits, their shared geneticbasis, and the molecular mechanisms behind them.", + "This means that it is the developmentalbasis of trait integration, not simply the strength of the genetic correlations and observable patterns of covariation among traits, that will affect how components of a scalingrelationship can evolve. Although these powerful phenotype landscape models have generated important insights into the evolution of complex traits such as scaling relationships, they are difficult totest empirically (see Rice 2008).A, Shape variation within a group of organisms isshown by a line fit to a data cloud representing the size of two traits for a group of organisms. Shapevariation within a group of organisms is shown by a line (dark line) fit to a data cloud (gray elipse)representing the size of two traits for a group of organisms, in this case the brain-body size relationshipin humans (data from Koh 2005). 
B, Scaling relationships are divided into three classes based on thepattern of variation they describe.At the phenotypic level, detailed studies of physiology, morphology, and biochemistry canelucidate whether a higher-level trait has evolved via changes in different subordinate traits. At the genetic level, a first-pass black box approach to determine whether different genesunderlie the response to selection in replicate lines is to cross those lines and examine thetraits of interest in the F1, F2, and/or backcross populations (see also Rhodes and Kaweckithis volume).Particularly relevant for the evolution of scaling relationships,these models have revealed that the developmental basis of genetic correlations (e.g. , thedegree to which a given genetic correlation results from additive or nonadditive epistaticinteractions among traits) can profoundly affect the evolutionary malleability of the correlation, trait covariation, and the evolutionary trajectory of the complex phenotype (Wolfet al. 2001, 2004; Rice 2002, 2004a, 2008).The phenotype landscapeapproach has been extended to connect with existing quantitative genetic treatments ofmultivariate evolution, yielding an emergent theory exploring how developmental integration, or entanglement, among traits affects the symmetry and rates of trait evolution;the evolution of heritabilities; the impact of genetic correlations on evolutionary trajectoriesacross different time scales; the evolutionary relationships among trait means, variances,THE EVOLUTION OF ANIMAL FORM437and covariances; and the distribution of traits in phenotypic space (Wolf et al. 2001,2004; Rice 2004b, 2008).", + "As outlined by Lewontin (2011), the relationship between genotype and phenotype can be described by four basic models that have been, and still are, used in genetics: one-to-one, one-to-many, many-to-one, and many-to-many (see Fig. 
1). The first goes back to the unit factor theory at the beginning of the twentieth century, i.e., one gene gives rise to one trait (Mayr 1982). The second model describes one gene affecting many traits (pleiotropy), while the third model accounts for many genes affecting one trait (polygeny). It is undoubtedly correct that every part of the genome is connected causally with the phenome (a set of phenotypes) by at least some molecular mechanistic pathways, but there is variation in this relation, which can make all of these four models valid at least for some cases. But generally for most eukaryotic organisms, model 4 (many-to-many) is the most acceptable description for most cases of the relationship between phenotype and genotype (Lewontin 2011). And often, the many-to-many model is insufficient, since genes and environment are usually both involved in the development of phenotypes, as captured by the norm-of-reaction concept (see, e.g., Falk 2001).", + "At the other end are traits, such as growth, which are likely to be affected by many genes that each contribute a small portion to the overall phenotype. Between these two extremes are traits that are regulated by more than one genetic locus (and are possibly also influenced by environmental factors), which show several intermediate phenotypes. Generally, the more loci that are involved in determining a quantitative trait, the more difficult it is to map and identify all of the causative QTLs.", + "Genetic Correlations Among Multiple Traits. When a sufficient number of traits have been tested in the same inbred strains, the genetic relationships among the traits can be determined and a genetic framework developed using multivariate statistical methods. A growing literature of SI and RI strain surveys exists, with online resources to search these data and to directly compare previous and new strain surveys (e.g. 
, http://www.jax.org/phenome, http://www.genenetwork.org).", + "However, common practice in genetics treats this relationship as a straightforward one-to-one mapping from genotype to phenotype. The roots of this practice can be traced to Mendel who chose traits with a direct relationship between genetic variation and phenotypic variation in formulating his particulate theory of inheritance. It has been further solidified by the successes of modern genetics in identifying genes involved in many simple traits, such as rare human diseases. However, most traits are not simple and to understand complex traits it is necessary to decipher the developmental processes that occur between genes and traits. It was believed by many that for each trait variant we should expect to find a corresponding genetic change, or gene for that trait. Through historical happenstance the relationship between genes and traits was set up and treated as if it were one-to-one. But the production of a trait involves not only genes, but also their interactions with each other and the environment, and chance. Two approaches to understanding the genotype-phenotype relationship are described and examples given of how both lead to a many-to-many relationship. First, cellular and genetic mechanisms, such as alternative splicing, DNA and chromatin modification, cellular gene choice, and gene regulation, which lead from DNA sequence to protein structure, are discussed. And, second, examples of variation in the genotype-phenotype relationship which can produce variable phenotypes from the same genetic information and stable phenotypes despite genetic variation are presented. To examine how normal variation in complex repeated traits such as the mammalian dentition is produced two experimental approaches are taken."
+ ], + [ + "Another striking finding has been the revelation of the existence ofgenome regions to which variation in large number of traits can be mapped [29];such regions have been designated as QTL hotspots. This genetic information wasthen used to try to infer biological relationships between those traits and to connectthem into networks [30] (for example transcriptional networks). In more recentstudies, efforts have been devoted to the integration of phenotypes from differentlevels, jointly studying gene expression, proteome, metabolome and sometimesclassical traits such as diseases [31, 32].", + "First, it is possible to map Mendelian traitsand even quantitative traits with modest LOD scores with good precision, even whenusing a small numbers of strains7577. Second, a good way to transition from QTLs tospecific genes, variants, and mechanisms is often to use complementary resources suchas panels of common inbred strains, Collaborative Cross (CC), or Diversity Outbred (DO)cases, efficient screens of candidate genes using in vitro and in vivo assays 48,76, and evenhuman genome-wide association study (GWAS) data 7882.", + "For example, in comparative genomics, QTLs coming from different species andassociated with a given complex phenotype are aligned based on the syntenybetween these species. The overlapping genetic region is considered very likely tocontain the causal gene for this complex trait. In Chapter 9, we wondered whether it197Chapter 10is possible to apply this approach to the currently available data regarding thegenetic basis of physical activity in mice and humans in order to discover novelcandidate genes for this phenotype.", + "It is now widely appreciated that even when an association can be localized to a singlegene, that gene may not be the cause of the association [Smemo-2014], meaning that proximity tothe peak SNP is not sufficient to identify the causal gene. Therefore, a major goal of our study was tointegrate behavioral QTL and eQTL data. 
eQTLs can provide the crucial link between a regionimplicated by GWAS and the biological processes that underlie that association. We exploited theeasy access to tissue, which is a critical advantage of model organisms, to map eQTLs.Theseexamples illustrate the utility of combining GWAS with eQTL data to identify the molecularmechanism by which a chromosomal region influences a complex trait. DiscussionWe performed a GWAS in a commercially available outbred mouse population, which identifiednumerous physiological, behavioral, and expression QTLs. In several cases the implicated loci weresmaller than 1 Mb and contained just a handful of genes that included an obvious candidate. Inaddition, we used the eQTL results to further parse among the genes in the intervals that wereimplicated in the behavioral traits.", + "The authors analyzed GWAS data to confirm that annotating SNPs with a scorereflecting the strength of the evidence that the SNP is an eQTL can improve the ability todiscover true associations and may further clarify the nature of the mechanism driving theassociations. This raises the possibility that eQTL data may increase the proportion ofheritability explained by identifiable genetic factors, and be used to gain a betterunderstanding of the biology underlying complex traits.", + "Network analysesWe now have two QTL, and we have picked potentially interesting genes within each, but nowwe want to build up more evidence for which gene in our QTL interval is causal. The first, andmost obvious way, is to see what genes our trait of interest correlates with, in tissues that weexpect to be related to the trait. 
We calculated the Spearman's correlation between the trait BXD_17850 and all probes with expression data in T helper cells (GN319).", + "The advent of larger panels and denser marker maps, in conjunction with high quality gene expression data, now means that expression QTLs are statistically robust enough to be considered starting points for further study in their own right. This can be used to great effect in reverse complex trait analysis, a powerful new approach in which segregating genetic variation, as evidenced by a strong QTL, is mapped to other potentially interacting genes, and ultimately back to candidate phenotypes.", + "Since our driving application is to identify the genes that cause variation in complex traits, it is necessary to show the relationship or distance between genes and QTLs. For that, we need an additional relational table describing the exact location of QTLs in the unit of megabases. Graph theoretic algorithms provide valuable information that is otherwise hard to discern about the data. However, many such algorithms incur long compute times and are far from being interactive.", + "Using this tool, a QTL analysis may also shed light on whether differences in phenotype are due to one or two large-effect genes or many loci of small effect (Stapley et al., 2010). A model constructed by Malcom (2011) highlights the importance of considering the genetic architecture when attempting to predict evolutionary trajectories by suggesting that a trait controlled by a small gene network will adapt more rapidly but reach a less than optimal endpoint, whereas a trait controlled by a large gene network will evolve more slowly but more accurately.", + "Network analyses. We now have two QTL, and we have picked potentially interesting genes within each, but now we want to build up more evidence for which gene in our QTL interval is causal. 
The first, and most obvious way, is to see what genes our trait of interest correlates with, in tissues that we expect to be related to the trait. We calculated the Spearman's correlation between the trait BXD_17850 and all probes with expression data in T helper cells (GN319).", + "We [16,18], and others [19,20] have indicated that the combined use of gene expression data together with QTL (quantitative trait locus) analysis can provide for a better understanding of the genetics of complex traits.", + "These relationships provide important information for biologists to understand and search for the genetic basis of eQTL. An eQTL can physically span a large genomic region, depending on the mapping experimental design. Due to the limitations of linkage studies it is difficult to pin down which gene within an eQTL is the source of eTrait variation [20]. By relating eTraits and genetic markers to their corresponding genes, our eQTL Viewer organizes each eQTL as a list of pairwise relationships between an eTrait gene and the multiple candidate genes in the eQTL region.", + "On the one hand, the genomic locations suspected to be involved in the trait can still involve large genomic segments, e.g., millions of base pairs that include many genes within the segment. On the other hand, GWAS may point to several or even many genomic locations for the trait of interest, complicating further functional analysis. Analysis of Quantitative Trait Loci (QTL). QTL analysis reveals statistically significant linkage between phenotypes and genotypes, thereby providing explanation for the genetic basis of variation in complex traits (Falconer and Mackay, 1996; Lynch and Walsh, 1998).", + "It is now widely appreciated that even when an association can be localized to a single gene, that gene may not be the cause of the association [Smemo-2014], meaning that proximity to the peak SNP is not sufficient to identify the causal gene. 
Therefore, a major goal of our study was tointegrate behavioral QTL and eQTL data. eQTLs can provide the crucial link between a regionimplicated by GWAS and the biological processes that underlie that association. We exploited theeasy access to tissue, which is a critical advantage of model organisms, to map eQTLs.Theseexamples illustrate the utility of combining GWAS with eQTL data to identify the molecularmechanism by which a chromosomal region influences a complex trait. DiscussionWe performed a GWAS in a commercially available outbred mouse population, which identifiednumerous physiological, behavioral, and expression QTLs. In several cases the implicated loci weresmaller than 1 Mb and contained just a handful of genes that included an obvious candidate. Inaddition, we used the eQTL results to further parse among the genes in the intervals that wereimplicated in the behavioral traits.", + "The remarkable success in mappinggenes linked to a number of disease traits using genomewide association studies (GWAS) in human cohorts hasrenewed interest in applying this same technique in modelorganisms such as inbred laboratory mice (Su et al. 2010). Unlike classical phenotypic traits, gene expression traitsgiving rise to cis-acting eQTL provide us with a prioriknowledge of the true QTL location (Doss et al. 2005),which can be used to empirically estimate the power of aGWAS performed at a similar scale (Hao et al. 2008;Schadt et al. 2008).", + "Genomic regions linked to complex traits can be identified by genetic mappingand quantitative trait locus (QTL) analysis (Shehzad and Okuno 2014). 7QTL mappingQTL mapping with molecular markers is the first strategy in genetic studies. In plantbreeding, QTL mapping is an essential step required for marker-assisted selection(Mohan et al. 1997; Shehzad and Okuno 2014). 
The fundamental idea underlying QTL analysis is to associate genotype and phenotype in a population exhibiting a genetic variation (Broman and Sen 2009).", + "QTL mapping studies then seek to detect the polymorphisms underlying the complex traits of interest by scanning for alleles that co-vary with the traits. Similar experiments also can be conducted with special derivatives of inbred strains known as recombinant inbred (RI) mice. These animals are derived by cross-breeding two or more distinct parental strains (which often diverge widely for the trait of interest), followed by inbreeding of the offspring for several generations (Bailey 1971). Given the correct breeding strategy, this method 1 This is an issue faced by GWAS researchers when classifying samples as cases or controls. The investigators first identified all QTLs associated with a classical phenotype and then winnowed the list of potentially associated gene-expression traits on the basis of their correlation or eQTL overlap with the phenotype of interest. Candidate genes then were ranked by applying the LCMS technique, which uses the eQTL data to establish causal relationships between DNA loci and transcripts as well as between transcripts and phenotypes and finally identifies a model that best fits the data." + ] + ], + "task_id": [ + "44B088326CD80B4980D810738D88A284", + "BF1705D2C26044038FF1483258548167", + "68AB7A78543D5B36206274837824091B" + ] +}
\ No newline at end of file |