Diffstat (limited to 'gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_17')
-rw-r--r-- gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_17 65
1 files changed, 65 insertions, 0 deletions
diff --git a/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_17 b/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_17
new file mode 100644
index 0000000..42b1aeb
--- /dev/null
+++ b/gnqa/paper2_eval/data/dataset/human/intermediate_files/human_de_gn_17
@@ -0,0 +1,65 @@
+{
+ "titles": [
+ "2007 - Bioinformatics_for_Genetices_MAZEN_SAEED.pdf",
+ "2007 - Bioinformatics_for_Geneticists.pdf",
+ "003 -Barnes- Bioinformatics_for_Geneticists.pdf",
+ "003 -Barnes- Bioinformatics_for_Geneticists.pdf",
+ "2007 - Bioinformatics_for_Genetices_MAZEN_SAEED.pdf",
+ "2007 - Bioinformatics_for_Geneticists.pdf",
+ "003 -Barnes- Bioinformatics_for_Geneticists.pdf",
+ "2007 - Bioinformatics_for_Genetices_MAZEN_SAEED.pdf",
+ "2007 - Bioinformatics_for_Geneticists.pdf",
+ "2007 - Bioinformatics_for_Genetices_MAZEN_SAEED.pdf"
+ ],
+ "extraction_id": [
+ "4ef84d22-b428-5386-bbc0-39dbd364d3d7",
+ "7524bdfa-63f8-57c1-b5fe-1edcf11c275e",
+ "c8e9c4b7-19c6-5426-83a2-6f8628b68ceb",
+ "a3ae6875-b0fc-5a4e-866f-4fee99c7d2a2",
+ "bb247bfe-333b-553a-94e6-2dc1b13b4723",
+ "9c89683f-aca5-57f9-b28d-62e9eb64377b",
+ "23bb58ad-7835-58f4-862f-dd17e1ec5140",
+ "52fc5fdb-48b5-5c1e-a8d2-1e67d7702c9f",
+ "20d0c226-76aa-5c1e-85cc-9d5c1bcce2f2",
+ "bb247bfe-333b-553a-94e6-2dc1b13b4723"
+ ],
+ "document_id": [
+ "139463d1-c63c-5c51-bf9c-9ccc356768e0",
+ "4ea8e1a8-e113-5f02-ad78-880b9c51a101",
+ "045edae8-468b-5725-be06-8cb4b8f6a92b",
+ "045edae8-468b-5725-be06-8cb4b8f6a92b",
+ "139463d1-c63c-5c51-bf9c-9ccc356768e0",
+ "4ea8e1a8-e113-5f02-ad78-880b9c51a101",
+ "045edae8-468b-5725-be06-8cb4b8f6a92b",
+ "139463d1-c63c-5c51-bf9c-9ccc356768e0",
+ "4ea8e1a8-e113-5f02-ad78-880b9c51a101",
+ "139463d1-c63c-5c51-bf9c-9ccc356768e0"
+ ],
+ "id": [
+ "chatcmpl-ADZAv1loJK3Vz78QdzvVvEogO2ngd",
+ "c36215f6-2230-58ef-b3eb-44d1799ba5c2",
+ "89a578c7-5961-5b88-9a6d-f338216702c3",
+ "81e589eb-aa51-5f2a-966f-31928fb31943",
+ "1bf9bb72-ebaa-51d1-82ce-aae2f16dd92b",
+ "f0c00edb-f07d-5975-a16b-16a072d0f2d4",
+ "e2e526cb-0ac3-51ff-a1c5-43ff032b5558",
+ "66294988-1566-5bec-8f63-658ea9011e26",
+ "b19972d1-7ec6-5f66-ac2f-518e69c5f22b",
+ "69fdc34c-c187-5c7a-973c-a629045841a7",
+ "7e15e9b2-c731-5ab0-85c0-b6b432623220"
+ ],
+ "contexts": [
+ "supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures. The total set of Ensembl genes should therefore be a much more accurate reflection of reality than ab initio predictions alone, but it is clear that some novel genes are missed (Hogenesch et al., 2001). Of the many novel genes that are detected, some are",
+ "supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures. The total set of Ensembl genes should therefore be a much more accurate reflection of reality than ab initio predictions alone, but it is clear that some novel genes are missed (Hogenesch et al., 2001). Of the many novel genes that are detected, some are",
+ "supported by a significant BLAST match to one or more expressed sequences or proteins. Ensembl also identifies the positions of known human genes from public sequence database entries, usually using GENEWISE to predict their exon structures. The total set of Ensembl genes should therefore be a much more accurate reflection of reality than ab initio predictions alone, but it is clear that some novel genes are missed (Hogenesch et al., 2001). Of the many novel genes that are detected, some are",
+ "Ostell/Spidey/ SSAHA at Sanger Institute http://www.sanger.ac.uk/Software/analysis/SSAHA/ human and mouse genomes, where there are large full-length cDNA collections to guide the hunt for genes, Ensembl should be very reliable. From the beginning, many genomic features other than predicted genes were included in Ensembl: different repeat classes, cytological bands, CpG island predictions, tRNA gene predictions, expressed sequence clusters from the UniGene database",
+ "Ostell/Spidey/ SSAHA at Sanger Institute http://www.sanger.ac.uk/Software/analysis/SSAHA/ human and mouse genomes, where there are large full-length cDNA collections to guide the hunt for genes, Ensembl should be very reliable. From the beginning, many genomic features other than predicted genes were included in Ensembl: different repeat classes, cytological bands, CpG island predictions, tRNA gene predictions, expressed sequence clusters from the UniGene database",
+ "Ostell/Spidey/ SSAHA at Sanger Institute http://www.sanger.ac.uk/Software/analysis/SSAHA/ human and mouse genomes, where there are large full-length cDNA collections to guide the hunt for genes, Ensembl should be very reliable. From the beginning, many genomic features other than predicted genes were included in Ensembl: different repeat classes, cytological bands, CpG island predictions, tRNA gene predictions, expressed sequence clusters from the UniGene database",
+ "database, which aims to compile a non-redundant, curated data set representing current knowledge of known genes (Wheeler et al., 2002; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene). Like the Ensembl protocol, many Acembly-predicted structures (the NCBI estimate 42 per cent) are incomplete. These structures can be displayed alongside ab initio gene models, Ensembl-predicted genes, and matching UniGene clusters to allow users to make their own conclusions about the likeliest gene structure.",
+ "database, which aims to compile a non-redundant, curated data set representing current knowledge of known genes (Wheeler et al., 2002; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene). Like the Ensembl protocol, many Acembly-predicted structures (the NCBI estimate 42 per cent) are incomplete. These structures can be displayed alongside ab initio gene models, Ensembl-predicted genes, and matching UniGene clusters to allow users to make their own conclusions about the likeliest gene structure.",
+ "database, which aims to compile a non-redundant, curated data set representing current knowledge of known genes (Wheeler et al., 2002; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene). Like the Ensembl protocol, many Acembly-predicted structures (the NCBI estimate 42 per cent) are incomplete. These structures can be displayed alongside ab initio gene models, Ensembl-predicted genes, and matching UniGene clusters to allow users to make their own conclusions about the likeliest gene structure.",
+ "populations as Ensembl reflects the progress of the International Haplotype Map Project (Thorisson et al., 2005). More speculative data, such as GENSCAN-predicted exons that have not been incorporated into Ensembl-confirmed genes, may also be viewed. This means that the display can be used as a workbench for the user to develop personalized annotation. For example, one may discover novel exons by finding GENSCAN exon predictions which coincide with good matches to a fragment of the draft mouse"
+ ],
+ "answer": "Ensembl is a genomic database that identifies the positions of known human genes from public sequence database entries, usually using tools like GENEWISE to predict their exon structures. It aims to provide a more accurate reflection of reality compared to ab initio predictions alone [1]. Ensembl also includes many genomic features other than predicted genes, such as different repeat classes, cytological bands, CpG island predictions, tRNA gene predictions, and expressed sequence clusters from the UniGene database [4].",
+ "question": "what is ensembl?"
+}
\ No newline at end of file
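
A minimal sketch of how a record with this shape might be read. It assumes that `titles`, `extraction_id`, `document_id`, and `contexts` are parallel arrays aligned by index, and that the bracketed numbers in `answer` (such as `[1]` and `[4]`) are 1-based indices into `contexts`; the file does not state either point, although both fit the record above. The `id` array is left out of the alignment because it has 11 entries where the others have 10, and its first entry looks like a chat completion id. The function names and the shortened `SAMPLE` record are hypothetical.

```python
import json
import re

# Shortened stand-in with the same keys as the added file (values are made up).
SAMPLE = json.loads("""
{
  "titles": ["a.pdf", "b.pdf", "a.pdf"],
  "extraction_id": ["e1", "e2", "e3"],
  "document_id": ["d1", "d2", "d1"],
  "contexts": ["first chunk", "second chunk", "first chunk"],
  "answer": "Claim one [1]. Claim two [2].",
  "question": "what is ensembl?"
}
""")

# Keys assumed to be aligned by index; "id" is deliberately excluded.
PARALLEL_KEYS = ("titles", "extraction_id", "document_id", "contexts")


def rows(record):
    """Zip the parallel arrays into one dict per retrieved chunk."""
    lengths = {key: len(record[key]) for key in PARALLEL_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"parallel arrays differ in length: {lengths}")
    columns = (record[key] for key in PARALLEL_KEYS)
    return [dict(zip(PARALLEL_KEYS, values)) for values in zip(*columns)]


def cited_contexts(record):
    """Map each [n] marker in the answer to contexts[n - 1] (assumed 1-based)."""
    markers = {int(n) for n in re.findall(r"\[(\d+)\]", record["answer"])}
    contexts = record["contexts"]
    return {n: contexts[n - 1] for n in sorted(markers) if 1 <= n <= len(contexts)}


def unique_contexts(record):
    """Return distinct context strings in first-seen order."""
    return list(dict.fromkeys(record["contexts"]))


if __name__ == "__main__":
    for row in rows(SAMPLE):
        print(row["titles"], "|", row["contexts"])
    print(cited_contexts(SAMPLE))
    print(unique_contexts(SAMPLE))
```

To apply this to the real file, replace `SAMPLE` with `json.load(open(path))` for the path shown in the diff header. `unique_contexts` is useful here because the same passage is retrieved from several PDF copies of the same book.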