author ShelbySolomonDarnell 2024-10-17 12:24:26 +0300
committer ShelbySolomonDarnell 2024-10-17 12:24:26 +0300
commit 00cba4b9a1e88891f1f96a1199320092c1962343 (patch)
tree 270fd06daa18b2fc5687ee72d912cad771354bb0 /gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_18
parent e0b2b0e55049b89805f73f291df1e28fa05487fe (diff)
Docker image built to run code, all evals run using R2R (HEAD, master)
Diffstat (limited to 'gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_18')
-rw-r--r-- gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_18 | 65
1 files changed, 65 insertions, 0 deletions
diff --git a/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_18 b/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_18
new file mode 100644
index 0000000..cfe846d
--- /dev/null
+++ b/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_cs_gn_18
@@ -0,0 +1,65 @@
+{
+ "titles": [
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2020 - Precision and Personalized Medicine How Genomic.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2008 - Gene Expression Profiling.pdf",
+ "2016 - Coming of age ten years of next.pdf",
+ "2015 -Pandey- Functional Analysis of Genomic Variation and Impact on Molecular.pdf"
+ ],
+ "extraction_id": [
+ "cad1dcca-621d-5003-ba3a-81950819bc52",
+ "5da5fc5d-1fe6-58f0-9141-72b9b2996fff",
+ "de09f30d-e9ba-5379-8c7a-85b2cd2ed6c8",
+ "4cdf13c0-c505-5ff9-9a6e-b10e5d1c8819",
+ "5da5fc5d-1fe6-58f0-9141-72b9b2996fff",
+ "c91e328e-4a01-5952-85b8-d7b5b47237c5",
+ "cad1dcca-621d-5003-ba3a-81950819bc52",
+ "c14d1c74-a14a-5037-8d3f-f32a60faa9a5",
+ "4cdf13c0-c505-5ff9-9a6e-b10e5d1c8819",
+ "f4e989e5-c3d0-5d5c-b8c3-95894a14630b"
+ ],
+ "document_id": [
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "cd11028a-933b-52a0-9534-c173323056ef",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "59f3b969-089b-5258-93ad-892dbc9ffa9c",
+ "9dd6e4e9-d136-507b-b628-68c8e1461bd0",
+ "93381e23-494b-5bc2-9d09-ef315506601f"
+ ],
+ "id": [
+ "chatcmpl-ADZL3IjE1j3CPp4KvVU6JGEtG2NTv",
+ "98010acc-fd11-5d33-bced-626ef29f2896",
+ "13a6b6f9-4a9a-5eb9-ac79-d986d9e613f0",
+ "8477a774-dddb-5541-b8d7-d51a7e56b0af",
+ "d2540614-9397-5e3e-8b5f-ad328ca973b2",
+ "1e324977-2ca5-5062-8a09-7659d516e899",
+ "a2d9c614-903d-513a-ad88-5a40f3534988",
+ "97f2aa12-623b-53ec-9793-5834311a37dd",
+ "3e782f01-a06e-51b6-ac8a-0e0a56939d08",
+ "199e1929-dc7c-58d4-8c8d-1c931e658e9c",
+ "d1158643-3625-5855-a03d-eec4ac96eb4d"
+ ],
+ "contexts": [
+ "FURTHER INFORMATION 10X Genomics: http://www.10xgenomics.com 454 Sequencing: http://www.454.com Advances in Genome Biology and Technology (AGBT): http://www.agbt.org BGISEQ500: http://seq500.com/en/portal/Sequencer.shtml Illumina: http://www.illumina.com Ion Torrent: https://www.thermofisher.com/us/en/home/ brands/ion-torrent.html Oxford Nanopore Technologies: https://www.nanoporetech. com Pacific Biosciences: http://www.pacb.com Personal Genome Project: http://www.personalgenomes.org",
+ "22. Karow, J. Qiagen launches GeneReader NGS System atAMP; presents performance evaluation by broad. GenomeWeb [online], https:// www.genomeweb.com/ molecular-diagnostics/qiagen-launches-genereader- ngs-system-amp-presents-performance-evaluation (4Nov 2015). 23. Smith,D.R. & McKernan,K. Methods of producing and sequencing modified polynucleotides . US Patent 8058030 (2011). 24. Margulies,M. etal. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376380 (2005).",
+ "36. Sequencing, H.G. Finishing the euchromatic sequence of the human genome. Nature 2004 ,431, 931945. 37. Heather, J.M.; Chain, B. The sequence of sequencers: The history of sequencing DNA. Genomics 2016 ,107, 18. [CrossRef] 38. Rothberg, J.M.; Leamon, J.H. The development and impact of 454 sequencing. Nat. Biotechnol. 2008 ,26, 11171124. [CrossRef] [PubMed] 39. Shendure, J.; Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 2008 ,26, 11351145. [CrossRef] [PubMed]",
+ "sequencing. Genome Res. 20, 11651173 (2010). 64. English,A.C. etal. Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics 16, 286 (2015). 65. Carneiro,M.O. etal. Pacific Biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics 13, 375 (2012). 66. Quail,M.A. etal. A tale of three next generation sequencing platforms: comparison of Ion T orrent, Pacific Biosciences and Illumina MiSeq sequencers.",
+ "sequencing. Bioinformatics 31, 20402042 (2015). 46. Qiagen. Oncology insights enabled by knowledge base- guided panel design and the seamless workflow of the GeneReader NGS system Press Release. Qiagen [online], http://www.genereaderngs.com/PROM-9192- 001_1100403_WP_GeneReader_NGS_0116_NA.pdf (2016). 47. Forgetta,V. etal. Sequencing of the Dutch elm disease fungus genome using the Roche/454 GS-FLX Titanium System in a comparison of multiple genomics core",
+ "for sequencing on existing short-read instrumentation, after which data are split by barcode and reassembled with the knowledge that fragments sharing barcodes Barcodes A series of known bases addedto a template molecule either through ligation or amplification. After sequencing, these barcodes can be used to identify which sample a particular read is derived from. Figure 5 | Real-time and synthetic long-read sequencing approaches.",
+ "160. Glenn,T .C. Field guide to next-generation DNA sequencers. Mol. Ecol. Resour. 11, 759769 (2011). 161. Karow,J. At AGBT , 10X Genomics launches GemCode platform; shipments slated for Q2 as firm battles IP lawsuits. GenomeWeb [online], https://www. genomeweb.com/sample-prep/agbt-10x-genomics- launches-gemcode-platform-shipments-slated-q2-firm- battles-ip-lawsuits (2Mar 2015). Competing interests statement The authors declare competing interests: see Web version for details. FURTHER INFORMATION",
+ "DNA), and provide the means to link sequences containing applications. First, base- callers like Phred (4,5) extract raw sequences from raw data. There are also contig assemblers like Phrap (University of Washington, http://bozeman. mbt.washington.edu/phrap.docs/phrap.html ) or CAP3 (6) that assemble frag- ments to contigs and packages like consed (7) or GAP4 (8), which are used to finish sequencing projects. These programs are not explained in detail here.",
+ "Nat. Biotechnol. 30, 10331036 (2012). 111. Chrystoja,C.C. & Diamandis,E.P . Whole genome sequencing as a diagnostic test: challenges and opportunities. Clin. Chem. 60, 724733 (2014). 112. McGuire,A.L. etal. Point-counterpoint. Ethics and genomic incidental findings. Science 340, 10471048 (2013). 113. Bowers,J. etal. Virtual terminator nucleotides for next-generation DNA sequencing. Nat. Methods 6, 593595 (2009). 114. Heger,M. Chinas Direct Genomics unveils new",
+ "11 BIOINFORMATIC CHALLENGES FOR GENOMIC MEDICINE Processing and managing of high-throughput sequence data High throughput sequencing offers severa l advantages relative to array-based genotyping or expression assays. First, unlike genotyping arrays, whole genome sequencing is not limited to interrogating onl y known sequence variants. Similarly, RNA- sequencing (RNA-seq) enables expression quanti fication of novel transcripts that are not"
+ ],
+ "answer": "### How-to Guide for Genetic Sequencing Genetic sequencing is a powerful tool used to determine the order of nucleotides in DNA. This guide will walk you through the basic steps involved in genetic sequencing, referencing specific resources for further information. #### Step 1: Sample Preparation 1. **Collect DNA Sample**: Obtain a DNA sample from the organism or tissue of interest. This can be done using various methods such as blood draws, buccal swabs, or tissue biopsies. 2. **Extract DNA**: Use a DNA extraction kit to isolate DNA from the collected sample. Ensure the DNA is of high quality and free from contaminants. #### Step 2: Library Preparation 1. **Fragment DNA**: Break the DNA into smaller fragments. This can be done using mechanical shearing or enzymatic digestion. 2. **Add Adapters**: Ligate specific adapters to the ends of the DNA fragments. These adapters are necessary for the sequencing process and often contain barcodes for sample identification [6]. #### Step 3: Amplification 1. **PCR Amplification**: Amplify the DNA fragments using Polymerase Chain Reaction (PCR). This step increases the quantity of DNA to ensure there is enough material for sequencing. #### Step 4: Sequencing 1. **Choose a Sequencing Platform**: Select an appropriate sequencing platform based on your needs. Popular platforms include Illumina, Ion Torrent, Pacific Biosciences, and Oxford Nanopore Technologies [1], [4], [7]. 2. **Load the Sample**: Load the prepared DNA library onto the sequencing machine. Follow the manufacturer's instructions for your specific platform. #### Step 5: Data Acquisition 1. **Run the Sequencer**: Start the sequencing run. The machine will read the nucleotide sequences of the DNA fragments and generate raw data. 2. **Monitor the Run**: Keep an eye on the sequencing run to ensure it is proceeding correctly. Some platforms provide real-time monitoring capabilities [6]. #### Step 6: Data Analysis 1. **Base Calling**: Use base-calling software to convert raw sequencing data into nucleotide sequences. Tools like Phred are commonly used for this purpose [8]. 2. **Sequence Assembly**: Assemble the short reads into longer contiguous sequences (contigs) using assemblers like Phrap or CAP3 [8]. 3. **Data Interpretation**: Analyze the assembled sequences to identify genetic variants, structural variations, and other genomic features. This step may involve bioinformatics tools and databases [10]. #### Step 7: Validation and Reporting 1. **Validate Results**: Confirm the accuracy of your sequencing results using additional methods such as Sanger sequencing or quantitative PCR. 2. **Report Findings**: Compile your findings into a comprehensive report. Include details about the methods used, the quality of the data, and any significant genetic variations identified. #### Further Information For more detailed information on specific sequencing technologies and platforms, refer to the following resources: - [10X Genomics](http://www.10xgenomics.com) - [Illumina](http://www.illumina.com) - [Ion Torrent](https://www.thermofisher.com/us/en/home/brands/ion-torrent.html) - [Pacific Biosciences](http://www.pacb.com) - [Oxford Nanopore Technologies](https://www.nanoporetech.com) [1] By following these steps and utilizing the referenced resources, you can successfully perform genetic sequencing and analyze the resulting data.",
+ "question": "Create a how-to guide for genetic sequencing"
+}
\ No newline at end of file
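
The record added by this commit stores its retrieved chunks as parallel lists (`titles`, `extraction_id`, `document_id`, `contexts`), plus an `id` list that has one more entry than the others. The following is a minimal sketch of how such a record could be read back into per-chunk rows. It assumes the file on disk is valid JSON, and it assumes that a leading `id` entry starting with `chatcmpl-` is a response ID rather than a chunk ID; neither assumption is confirmed by the commit. The function names `load_record`, `to_rows`, and `sources_by_title` are hypothetical and not part of the repository.

```python
import json
from collections import Counter

# Keys that hold one entry per retrieved chunk in the record shown above.
PARALLEL_KEYS = ("titles", "extraction_id", "document_id", "contexts")


def load_record(path):
    """Read one intermediate file, assuming it is valid UTF-8 JSON."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def to_rows(record):
    """Turn the parallel lists into one dict per retrieved chunk.

    Returns (response_id, rows). response_id is the leading "chatcmpl-"
    entry of the "id" list if present (an assumption about its meaning),
    otherwise None.
    """
    lengths = {key: len(record[key]) for key in PARALLEL_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"parallel lists differ in length: {lengths}")

    ids = list(record.get("id", []))
    response_id = None
    if ids and ids[0].startswith("chatcmpl-"):
        response_id = ids.pop(0)

    columns = (record[key] for key in PARALLEL_KEYS)
    rows = [dict(zip(PARALLEL_KEYS, values)) for values in zip(*columns)]

    # Attach chunk IDs only when they line up one-to-one with the rows.
    if len(ids) == len(rows):
        for row, chunk_id in zip(rows, ids):
            row["id"] = chunk_id
    return response_id, rows


def sources_by_title(rows):
    """Count how many chunks came from each source PDF title."""
    return Counter(row["titles"] for row in rows)
```

Calling `to_rows(load_record(path))` on the file in this diff should yield ten rows; counting titles by hand in the diff shows seven of the ten contexts come from "2016 - Coming of age ten years of next.pdf", which is the kind of skew `sources_by_title` is meant to surface.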