Diffstat (limited to 'gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_5')
-rw-r--r-- gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_5 | 65
1 files changed, 65 insertions, 0 deletions
diff --git a/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_5 b/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_5
new file mode 100644
index 0000000..3407a58
--- /dev/null
+++ b/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_5
@@ -0,0 +1,65 @@
+{
+ "titles": [
+ "2020 - Precision and Personalized Medicine How Genomic.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2015 -Pandey- Functional Analysis of Genomic Variation and Impact on Molecular.pdf",
+ "2015 - Functional Analysis of Genomic Variation and Impact on Molecular and Higher Order Phenotypes.pdf",
+ "2017 - Infection control in the new age of genomic epidemiology.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2008 - Gene Expression Profiling.pdf"
+ ],
+ "extraction_id": [
+ "de09f30d-e9ba-5379-8c7a-85b2cd2ed6c8",
+ "5da5fc5d-1fe6-58f0-9141-72b9b2996fff",
+ "f4e989e5-c3d0-5d5c-b8c3-95894a14630b",
+ "fa426831-7c04-56c1-a191-1ebbc35342ed",
+ "04f06fb6-b2ff-57d4-bac0-de5cf4782ff3",
+ "4cdf13c0-c505-5ff9-9a6e-b10e5d1c8819",
+ "4cdf13c0-c505-5ff9-9a6e-b10e5d1c8819",
+ "5da5fc5d-1fe6-58f0-9141-72b9b2996fff",
+ "cad1dcca-621d-5003-ba3a-81950819bc52",
+ "c14d1c74-a14a-5037-8d3f-f32a60faa9a5"
+ ],
+ "document_id": [
+ "cd11028a-933b-52a0-9534-c173323056ef",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "93381e23-494b-5bc2-9d09-ef315506601f",
+ "263d327b-f5db-54e4-a215-b3f8a51cd7d6",
+ "8f028916-b990-5e95-b2a6-e69f451cc291",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "59f3b969-089b-5258-93ad-892dbc9ffa9c"
+ ],
+ "id": [
+ "chatcmpl-ADZ9nZqF5q344Dbv0zjyqijEIpDdi",
+ "8477a774-dddb-5541-b8d7-d51a7e56b0af",
+ "13a6b6f9-4a9a-5eb9-ac79-d986d9e613f0",
+ "d1158643-3625-5855-a03d-eec4ac96eb4d",
+ "cdf2b80f-1509-50a2-9cb2-a36dd6f3f2cc",
+ "f8ae01ae-cea8-5b8b-95c0-7147055de596",
+ "d2540614-9397-5e3e-8b5f-ad328ca973b2",
+ "199e1929-dc7c-58d4-8c8d-1c931e658e9c",
+ "1e324977-2ca5-5062-8a09-7659d516e899",
+ "98010acc-fd11-5d33-bced-626ef29f2896",
+ "3e782f01-a06e-51b6-ac8a-0e0a56939d08"
+ ],
+ "contexts": [
+ "36. Sequencing, H.G. Finishing the euchromatic sequence of the human genome. Nature 2004, 431, 931-945. 37. Heather, J.M.; Chain, B. The sequence of sequencers: The history of sequencing DNA. Genomics 2016, 107, 1-8. [CrossRef] 38. Rothberg, J.M.; Leamon, J.H. The development and impact of 454 sequencing. Nat. Biotechnol. 2008, 26, 1117-1124. [CrossRef] [PubMed] 39. Shendure, J.; Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 2008, 26, 1135-1145. [CrossRef] [PubMed]",
+ "22. Karow, J. Qiagen launches GeneReader NGS System at AMP; presents performance evaluation by broad. GenomeWeb [online], https://www.genomeweb.com/molecular-diagnostics/qiagen-launches-genereader-ngs-system-amp-presents-performance-evaluation (4 Nov 2015). 23. Smith, D.R. & McKernan, K. Methods of producing and sequencing modified polynucleotides. US Patent 8058030 (2011). 24. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380 (2005).",
+ "11 BIOINFORMATIC CHALLENGES FOR GENOMIC MEDICINE Processing and managing of high-throughput sequence data High throughput sequencing offers several advantages relative to array-based genotyping or expression assays. First, unlike genotyping arrays, whole genome sequencing is not limited to interrogating only known sequence variants. Similarly, RNA-sequencing (RNA-seq) enables expression quantification of novel transcripts that are not",
+ "11 BIOINFORMATIC CHALLENGES FOR GENOMIC MEDICINE Processing and managing of high-throughput sequence data High throughput sequencing offers several advantages relative to array-based genotyping or expression assays. First, unlike genotyping arrays, whole genome sequencing is not limited to interrogating only known sequence variants. Similarly, RNA-sequencing (RNA-seq) enables expression quantification of novel transcripts that are not",
+ "High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nat Rev Microbiol 2012;10:599-606. 11. Croucher NJ, Didelot X. The application of genomics to tracing bacterial pathogen transmission. Curr Opin Microbiol 2015;23:62-7. 12. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26:1135-45. 13. Miller JR, Koren S, Sutton G. Assembly algorithms for next-generation sequencing data. Genomics 2010;95:315-27. 14. Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, et al. Best practices",
+ "sequencing. Genome Res. 20, 1165-1173 (2010). 64. English, A.C. et al. Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics 16, 286 (2015). 65. Carneiro, M.O. et al. Pacific Biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics 13, 375 (2012). 66. Quail, M.A. et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.",
+ "Nat. Biotechnol. 30, 1033-1036 (2012). 111. Chrystoja, C.C. & Diamandis, E.P. Whole genome sequencing as a diagnostic test: challenges and opportunities. Clin. Chem. 60, 724-733 (2014). 112. McGuire, A.L. et al. Point-counterpoint. Ethics and genomic incidental findings. Science 340, 1047-1048 (2013). 113. Bowers, J. et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nat. Methods 6, 593-595 (2009). 114. Heger, M. China's Direct Genomics unveils new",
+ "sequencing. Bioinformatics 31, 2040-2042 (2015). 46. Qiagen. Oncology insights enabled by knowledge base-guided panel design and the seamless workflow of the GeneReader NGS system Press Release. Qiagen [online], http://www.genereaderngs.com/PROM-9192-001_1100403_WP_GeneReader_NGS_0116_NA.pdf (2016). 47. Forgetta, V. et al. Sequencing of the Dutch elm disease fungus genome using the Roche/454 GS-FLX Titanium System in a comparison of multiple genomics core",
+ "FURTHER INFORMATION 10X Genomics: http://www.10xgenomics.com 454 Sequencing: http://www.454.com Advances in Genome Biology and Technology (AGBT): http://www.agbt.org BGISEQ500: http://seq500.com/en/portal/Sequencer.shtml Illumina: http://www.illumina.com Ion Torrent: https://www.thermofisher.com/us/en/home/brands/ion-torrent.html Oxford Nanopore Technologies: https://www.nanoporetech.com Pacific Biosciences: http://www.pacb.com Personal Genome Project: http://www.personalgenomes.org",
+ "DNA), and provide the means to link sequences containing applications. First, base-callers like Phred (4,5) extract raw sequences from raw data. There are also contig assemblers like Phrap (University of Washington, http://bozeman.mbt.washington.edu/phrap.docs/phrap.html) or CAP3 (6) that assemble fragments to contigs and packages like consed (7) or GAP4 (8), which are used to finish sequencing projects. These programs are not explained in detail here."
+ ],
+ "answer": "### How-to Guide for Genetic Sequencing Genetic sequencing is a powerful tool used to determine the order of nucleotides in DNA. This guide will walk you through the basic steps involved in genetic sequencing, referencing key resources for further information. #### Step 1: Sample Preparation 1. **Collect DNA Sample**: Obtain a sample from the organism or tissue of interest. This could be blood, saliva, or tissue biopsy. 2. **Extract DNA**: Use a DNA extraction kit to isolate DNA from the sample. This involves cell lysis, removal of proteins and other contaminants, and purification of DNA. #### Step 2: Library Preparation 1. **Fragment DNA**: Break the DNA into smaller fragments. This can be done using mechanical shearing or enzymatic digestion. 2. **Add Adapters**: Ligate specific adapter sequences to the ends of the DNA fragments. These adapters are necessary for the fragments to bind to the sequencing platform. #### Step 3: Amplification 1. **PCR Amplification**: Amplify the DNA fragments using Polymerase Chain Reaction (PCR). This step increases the quantity of DNA to ensure there is enough material for sequencing. #### Step 4: Sequencing 1. **Choose a Sequencing Platform**: Select an appropriate sequencing technology based on your needs. Common platforms include Illumina, Ion Torrent, and Pacific Biosciences [6], [9]. 2. **Load DNA onto the Sequencer**: Follow the manufacturer's instructions to load your prepared DNA library onto the sequencing machine. 3. **Run the Sequencer**: Initiate the sequencing run. The machine will read the nucleotide sequences of the DNA fragments and generate raw sequence data. #### Step 5: Data Analysis 1. **Base Calling**: Use software like Phred to convert raw data into nucleotide sequences [10]. 2. **Sequence Assembly**: Assemble the short DNA sequences into longer contiguous sequences (contigs) using tools like Phrap or CAP3 [10]. 3. **Variant Calling**: Identify genetic variants by comparing the assembled sequences to a reference genome. This can be done using various bioinformatics tools. #### Step 6: Interpretation 1. **Annotation**: Annotate the sequences to identify genes, regulatory elements, and other functional regions. 2. **Data Management**: Store and manage the high-throughput sequence data efficiently. This is crucial for handling large datasets generated by next-generation sequencing technologies [3], [4]. #### Additional Resources - **Next-Generation Sequencing (NGS)**: For a detailed understanding of NGS technologies and their applications, refer to Shendure and Ji's review on next-generation DNA sequencing [1], [5]. - **Sequencing Platforms**: For comparisons of different sequencing platforms, see the work by Quail et al. [6]. - **Bioinformatics Tools**: Explore various bioinformatics tools and software for sequence analysis and data management [10]. By following these steps and utilizing the referenced resources, you can successfully perform genetic sequencing and analyze the resulting data.",
+ "question": "Create a how-to guide for genetic sequencing."
+}
\ No newline at end of file
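
The record added by this diff stores retrieved chunks as parallel arrays ("titles", "extraction_id", "document_id", "id", "contexts"), and the "answer" cites them with bracketed numbers such as [6] or [10]. The following is a minimal sketch, not part of the commit, of how such a record might be sanity-checked. It assumes the bracketed numbers are 1-based positions into "titles"/"contexts" (which matches the content here, e.g. [10] pointing at the Phred/Phrap chunk) and that the extra leading entry in "id" is the completion id. The helper names `array_lengths` and `cited_titles` and the toy record are hypothetical.

```python
import re

# Keys that hold one entry per retrieved chunk in the record.
CHUNK_KEYS = ["titles", "extraction_id", "document_id", "id", "contexts"]


def array_lengths(record):
    """Return the length of each per-chunk array so mismatches are easy to spot."""
    return {key: len(record[key]) for key in CHUNK_KEYS}


def cited_titles(record):
    """Map each bracketed citation [n] in the answer to the n-th title.

    Assumption: citation numbers are 1-based indices into "titles".
    Numbers outside the valid range are skipped rather than guessed.
    """
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", record["answer"])})
    titles = record["titles"]
    return {n: titles[n - 1] for n in numbers if 1 <= n <= len(titles)}


# Toy record with the same shape as the file in this diff (values are made up).
toy_record = {
    "titles": ["a.pdf", "b.pdf", "c.pdf"],
    "extraction_id": ["e1", "e2", "e3"],
    "document_id": ["d1", "d2", "d3"],
    "id": ["chatcmpl-example", "c1", "c2", "c3"],  # one extra leading entry
    "contexts": ["ctx 1", "ctx 2", "ctx 3"],
    "answer": "Use a base caller [3]. See also [1], [3].",
    "question": "toy question",
}

if __name__ == "__main__":
    print(array_lengths(toy_record))
    print(cited_titles(toy_record))
```

To run this against the real file, the record could be loaded with `json.load` from the standard library and passed to the same two functions.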