author ShelbySolomonDarnell 2024-10-17 12:24:26 +0300
committer ShelbySolomonDarnell 2024-10-17 12:24:26 +0300
commit 00cba4b9a1e88891f1f96a1199320092c1962343
tree 270fd06daa18b2fc5687ee72d912cad771354bb0 /gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_21
parent e0b2b0e55049b89805f73f291df1e28fa05487fe
Docker image built to run code, all evals run using R2R
Diffstat (limited to 'gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_21')
-rw-r--r-- gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_21 65
1 files changed, 65 insertions, 0 deletions
diff --git a/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_21 b/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_21
new file mode 100644
index 0000000..1cd69f5
--- /dev/null
+++ b/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_21
@@ -0,0 +1,65 @@
+{
+ "titles": [
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2020 - Precision and Personalized Medicine How Genomic.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2008 - Gene Expression Profiling.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2015 -Pandey- Functional Analysis of Genomic Variation and Impact on Molecular.pdf"
+ ],
+ "extraction_id": [
+ "cad1dcca-621d-5003-ba3a-81950819bc52",
+ "5da5fc5d-1fe6-58f0-9141-72b9b2996fff",
+ "de09f30d-e9ba-5379-8c7a-85b2cd2ed6c8",
+ "4cdf13c0-c505-5ff9-9a6e-b10e5d1c8819",
+ "5da5fc5d-1fe6-58f0-9141-72b9b2996fff",
+ "c91e328e-4a01-5952-85b8-d7b5b47237c5",
+ "cad1dcca-621d-5003-ba3a-81950819bc52",
+ "c14d1c74-a14a-5037-8d3f-f32a60faa9a5",
+ "4cdf13c0-c505-5ff9-9a6e-b10e5d1c8819",
+ "f4e989e5-c3d0-5d5c-b8c3-95894a14630b"
+ ],
+ "document_id": [
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "cd11028a-933b-52a0-9534-c173323056ef",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "59f3b969-089b-5258-93ad-892dbc9ffa9c",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "93381e23-494b-5bc2-9d09-ef315506601f"
+ ],
+ "id": [
+ "chatcmpl-ADZLdc2V8rjlDwihUfGh20lox5Tad",
+ "98010acc-fd11-5d33-bced-626ef29f2896",
+ "13a6b6f9-4a9a-5eb9-ac79-d986d9e613f0",
+ "8477a774-dddb-5541-b8d7-d51a7e56b0af",
+ "d2540614-9397-5e3e-8b5f-ad328ca973b2",
+ "1e324977-2ca5-5062-8a09-7659d516e899",
+ "a2d9c614-903d-513a-ad88-5a40f3534988",
+ "97f2aa12-623b-53ec-9793-5834311a37dd",
+ "3e782f01-a06e-51b6-ac8a-0e0a56939d08",
+ "199e1929-dc7c-58d4-8c8d-1c931e658e9c",
+ "d1158643-3625-5855-a03d-eec4ac96eb4d"
+ ],
+ "contexts": [
+ "FURTHER INFORMATION 10X Genomics: http://www.10xgenomics.com 454 Sequencing: http://www.454.com Advances in Genome Biology and Technology (AGBT): http://www.agbt.org BGISEQ500: http://seq500.com/en/portal/Sequencer.shtml Illumina: http://www.illumina.com Ion Torrent: https://www.thermofisher.com/us/en/home/brands/ion-torrent.html Oxford Nanopore Technologies: https://www.nanoporetech.com Pacific Biosciences: http://www.pacb.com Personal Genome Project: http://www.personalgenomes.org",
+ "22. Karow, J. Qiagen launches GeneReader NGS System at AMP; presents performance evaluation by Broad. GenomeWeb [online], https://www.genomeweb.com/molecular-diagnostics/qiagen-launches-genereader-ngs-system-amp-presents-performance-evaluation (4 Nov 2015). 23. Smith, D.R. & McKernan, K. Methods of producing and sequencing modified polynucleotides. US Patent 8058030 (2011). 24. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380 (2005).",
+ "36. Sequencing, H.G. Finishing the euchromatic sequence of the human genome. Nature 2004, 431, 931-945. 37. Heather, J.M.; Chain, B. The sequence of sequencers: The history of sequencing DNA. Genomics 2016, 107, 1-8. [CrossRef] 38. Rothberg, J.M.; Leamon, J.H. The development and impact of 454 sequencing. Nat. Biotechnol. 2008, 26, 1117-1124. [CrossRef] [PubMed] 39. Shendure, J.; Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 2008, 26, 1135-1145. [CrossRef] [PubMed]",
+ "sequencing. Genome Res. 20, 1165-1173 (2010). 64. English, A.C. et al. Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics 16, 286 (2015). 65. Carneiro, M.O. et al. Pacific Biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics 13, 375 (2012). 66. Quail, M.A. et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.",
+ "sequencing. Bioinformatics 31, 2040-2042 (2015). 46. Qiagen. Oncology insights enabled by knowledge base-guided panel design and the seamless workflow of the GeneReader NGS system. Press Release. Qiagen [online], http://www.genereaderngs.com/PROM-9192-001_1100403_WP_GeneReader_NGS_0116_NA.pdf (2016). 47. Forgetta, V. et al. Sequencing of the Dutch elm disease fungus genome using the Roche/454 GS-FLX Titanium System in a comparison of multiple genomics core",
+ "for sequencing on existing short-read instrumentation, after which data are split by barcode and reassembled with the knowledge that fragments sharing barcodes Barcodes: A series of known bases added to a template molecule either through ligation or amplification. After sequencing, these barcodes can be used to identify which sample a particular read is derived from. Figure 5 | Real-time and synthetic long-read sequencing approaches.",
+ "160. Glenn, T.C. Field guide to next-generation DNA sequencers. Mol. Ecol. Resour. 11, 759-769 (2011). 161. Karow, J. At AGBT, 10X Genomics launches GemCode platform; shipments slated for Q2 as firm battles IP lawsuits. GenomeWeb [online], https://www.genomeweb.com/sample-prep/agbt-10x-genomics-launches-gemcode-platform-shipments-slated-q2-firm-battles-ip-lawsuits (2 Mar 2015). Competing interests statement The authors declare competing interests: see Web version for details. FURTHER INFORMATION",
+ "DNA), and provide the means to link sequences containing applications. First, base-callers like Phred (4,5) extract raw sequences from raw data. There are also contig assemblers like Phrap (University of Washington, http://bozeman.mbt.washington.edu/phrap.docs/phrap.html) or CAP3 (6) that assemble fragments to contigs and packages like consed (7) or GAP4 (8), which are used to finish sequencing projects. These programs are not explained in detail here.",
+ "Nat. Biotechnol. 30, 1033-1036 (2012). 111. Chrystoja, C.C. & Diamandis, E.P. Whole genome sequencing as a diagnostic test: challenges and opportunities. Clin. Chem. 60, 724-733 (2014). 112. McGuire, A.L. et al. Point-counterpoint. Ethics and genomic incidental findings. Science 340, 1047-1048 (2013). 113. Bowers, J. et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nat. Methods 6, 593-595 (2009). 114. Heger, M. China's Direct Genomics unveils new",
+ "11 BIOINFORMATIC CHALLENGES FOR GENOMIC MEDICINE Processing and managing of high-throughput sequence data High throughput sequencing offers several advantages relative to array-based genotyping or expression assays. First, unlike genotyping arrays, whole genome sequencing is not limited to interrogating only known sequence variants. Similarly, RNA-sequencing (RNA-seq) enables expression quantification of novel transcripts that are not"
+ ],
+ "answer": "### How-to Guide for Genetic Sequencing Genetic sequencing determines the order of nucleotides in DNA. This guide walks through the basic steps of a typical next-generation sequencing workflow and points to the retrieved sources for further information. #### Step 1: Sample Preparation 1. **Collect a DNA Sample**: Obtain a sample containing DNA, such as blood, saliva, or tissue. 2. **Extract DNA**: Use a DNA extraction kit to isolate DNA from the sample, and check that the DNA is pure and of high quality. #### Step 2: Library Preparation 1. **Fragment DNA**: Break the DNA into smaller fragments by mechanical shearing or enzymatic methods. 2. **Add Adapters**: Ligate specific sequences (adapters) to the ends of the DNA fragments. Adapters are required for the fragments to bind to the sequencing platform, and they can carry barcodes that identify which sample each read came from [6]. #### Step 3: Amplification 1. **PCR Amplification**: Amplify the DNA fragments using the polymerase chain reaction (PCR) so that there is enough material for sequencing. Some workflows, such as PCR-free library preparation and many long-read protocols, skip or minimize this step. #### Step 4: Sequencing 1. **Choose a Sequencing Platform**: Select a sequencing technology that fits your read-length, throughput, and cost requirements. Widely used platforms include Illumina, Ion Torrent, Pacific Biosciences, and Oxford Nanopore Technologies [1], [4], [7]. 2. **Load DNA onto the Sequencer**: Load the prepared library onto the instrument, following the platform's specific protocol for loading and running the sequencing process. #### Step 5: Data Analysis 1. **Base Calling**: Use software to convert the raw instrument data into nucleotide sequences. Base-callers such as Phred perform this step [8]. 2. **Read Alignment or Assembly**: Align the reads to a reference genome or, when no suitable reference exists, assemble them de novo into longer sequences (contigs). Assemblers such as Phrap or CAP3 can be used for assembly [8]. 3. **Variant Calling**: Identify genetic variants by comparing the aligned reads or assembled sequences with a reference genome. This step reveals the genetic differences and mutations present in the sample. #### Step 6: Interpretation 1. **Bioinformatics Analysis**: Use bioinformatics tools to analyze the sequencing data, for example by identifying genes, predicting functions, and characterizing genetic variants [10]. 2. **Report Findings**: Summarize the results in a report that lists the identified variants, their potential implications, and any recommendations for further analysis or validation. #### Additional Resources - **10X Genomics**: Offers solutions for single-cell sequencing and spatial genomics [1]. - **Illumina**: Provides a wide range of sequencing platforms and reagents [1]. - **Pacific Biosciences**: Known for long-read sequencing technology [1], [4]. - **Oxford Nanopore Technologies**: Offers portable, real-time sequencing solutions [1]. Together with the platform-specific protocols, these steps cover the core workflow from sample collection to interpreted sequencing data.",
+ "question": "Create a how-to guide for genetic sequencing"
+}
\ No newline at end of file
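The record added in this commit keeps one entry per retrieved chunk in several parallel arrays (`titles`, `extraction_id`, `document_id`, `contexts`), and the `answer` cites chunks with bracketed numbers such as [8]. The following is a minimal Python sketch for reading such a file. It assumes that entry i of each array describes the same chunk and that [n] refers to the n-th context (1-based); both are inferences from this one record, not documented by the repository. The `id` array is left out because it holds one more entry than the others, so its alignment with the chunks is unclear. The function names are hypothetical.

```python
import json
import re

# Keys that appear to hold one entry per retrieved chunk (assumption based on this record).
PARALLEL_KEYS = ("titles", "extraction_id", "document_id", "contexts")


def pair_chunks(record):
    """Zip the parallel per-chunk arrays into one dict per chunk, numbered from 1."""
    n = len(record["contexts"])
    for key in PARALLEL_KEYS:
        if len(record[key]) != n:
            raise ValueError(f"{key!r} has {len(record[key])} entries, expected {n}")
    rows = zip(record["titles"], record["extraction_id"],
               record["document_id"], record["contexts"])
    return [
        {"n": i + 1, "title": title, "extraction_id": ext,
         "document_id": doc, "text": text}
        for i, (title, ext, doc, text) in enumerate(rows)
    ]


def cited_chunks(answer, chunks):
    """Return the chunks cited as [n] in the answer, each once, in ascending order."""
    cited = sorted({int(num) for num in re.findall(r"\[(\d+)\]", answer)})
    # Citations pointing past the end of the context list are ignored.
    return [chunks[k - 1] for k in cited if 1 <= k <= len(chunks)]


def load_record(path):
    """Read one intermediate record (a single JSON object) from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Applied to this file, the citations in the answer resolve to chunks 1, 4, 6, 7, 8 and 10. Chunk 8 is the passage on Phred, Phrap and CAP3, which matches how the answer uses [8] and supports the 1-based reading.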
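The answer names Phred as a base-caller [8]. Phred-style base-callers attach a quality score Q to each called base, defined as Q = -10 log10(P), where P is the estimated probability that the call is wrong; Q = 20 therefore means a 1 in 100 error chance and Q = 30 a 1 in 1,000 chance. The Python sketch below shows the conversion in both directions, plus decoding of a FASTQ quality string under the common Phred+33 (Sanger) encoding. The function names and the offset default are illustrative choices, not part of this repository or of Phred itself.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call is wrong, given its Phred quality Q."""
    # Inverts Q = -10 * log10(P), giving P = 10^(-Q/10).
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality Q for an error probability P in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_quality(qual, offset=33):
    """Turn a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    # Each character encodes one base's score as ASCII code minus the offset.
    return [ord(ch) - offset for ch in qual]
```

For example, `decode_quality("II5!")` returns `[40, 40, 20, 0]`, that is, two bases with a 1 in 10,000 error chance, one with a 1 in 100 chance, and one with no confidence at all.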