Diffstat (limited to 'gnqa/paper2_eval')
-rw-r--r--  gnqa/paper2_eval/README.md                                                                                 27
-rw-r--r--  gnqa/paper2_eval/data/lists/gpt4o-queries.json (renamed from gnqa/paper2_eval/data/gpt4o-queries.json)      0
-rw-r--r--  gnqa/paper2_eval/data/lists/human-questions.json (renamed from gnqa/paper2_eval/data/human-questions.json)  0
3 files changed, 27 insertions, 0 deletions
diff --git a/gnqa/paper2_eval/README.md b/gnqa/paper2_eval/README.md
index 13cb113..8dff6f6 100644
--- a/gnqa/paper2_eval/README.md
+++ b/gnqa/paper2_eval/README.md
@@ -4,3 +4,30 @@
This directory contains the code created to evaluate questions submitted to GNQA.
Unlike the evaluation in paper 1, this work uses different LLMs and a different RAG engine.
RAGAS is still used to evaluate the queries.
+
+The RAG engine being used is [R2R](https://github.com/SciPhi-AI/R2R). It is open source and performs comparably to the engine we used for our first GNQA paper.
+
+The evaluation workflow is organized around reading questions that are grouped by two sets of categories: category 1 is who asked the question, and category 2 is the field to which the question belongs.
+In our initial work, category 1 consists of citizen scientists and domain experts, while category 2 consists of three fields or specializations: GeneNetwork.org systems genetics, the genetics of diabetes, and the genetics of aging.
+
+We will make the code more configurable by pulling the categories out of the source code and keeping them strictly in settings files.
+
+It is best to define a directory structure for your different types of data: datasets, lists, responses, and scores.
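+
+One possible layout, inferred from the paths used in the commands below (a sketch rather than the authoritative tree; adjust the names to match the scripts), is:
+
+```
+data/
+├── lists/       # question lists, e.g. gpt4o-queries.json and human-questions.json
+├── dataset/     # datasets assembled from question lists and parsed responses
+│   └── intermediate_files/
+├── responses/   # raw responses returned by R2R
+└── scores/      # RAGAS evaluation scores
+```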
+
+| File Operator | From directory | To directory | Command |
+|:---:|---:|---:|:--|
+| create_dataset | lists | dataset | `python create_dataset.py ../data/lists/list_catA_catB.json ../data/dataset/catA_catB.json` |
+| run_questions | lists | responses | `… ../data/lists/catA_question_list.json ../data/responses/resp_catA_catB.json` |
+| parse_r2r_result | responses | dataset | `… ../data/responses/resp_catA_catB.json ../data/dataset/intermediate_files/catA_catB_.json` |
+| ragas_eval | dataset | scores | `python3 ragas_eval.py ../data/datasets/catA/catB_1.json ../data/scores/catA/catB_1.json 3` (the final argument runs the evaluation 3 times) |
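+
+The table's commands can be chained into a single evaluation pass over one category pair. The sketch below follows the table's order; the script names for `run_questions` and `parse_r2r_result` are assumptions taken from the operator names (the table does not spell them out), so verify them and the argument order against the scripts in this directory before running anything.
+
+```sh
+# Hypothetical end-to-end run for one category pair (catA, catB).
+
+# Build a dataset skeleton from a question list
+python create_dataset.py \
+    ../data/lists/list_catA_catB.json \
+    ../data/dataset/catA_catB.json
+
+# Send the questions to the R2R engine and save its responses
+# (script name assumed from the operator name)
+python run_questions.py \
+    ../data/lists/catA_question_list.json \
+    ../data/responses/resp_catA_catB.json
+
+# Parse the raw R2R responses into a RAGAS-ready dataset
+# (script name assumed from the operator name)
+python parse_r2r_result.py \
+    ../data/responses/resp_catA_catB.json \
+    ../data/dataset/intermediate_files/catA_catB_.json
+
+# Score the dataset with RAGAS, running the evaluation 3 times
+python3 ragas_eval.py \
+    ../data/datasets/catA/catB_1.json \
+    ../data/scores/catA/catB_1.json \
+    3
+```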
+ \ No newline at end of file
diff --git a/gnqa/paper2_eval/data/gpt4o-queries.json b/gnqa/paper2_eval/data/lists/gpt4o-queries.json
index 74c18b0..74c18b0 100644
--- a/gnqa/paper2_eval/data/gpt4o-queries.json
+++ b/gnqa/paper2_eval/data/lists/gpt4o-queries.json
diff --git a/gnqa/paper2_eval/data/human-questions.json b/gnqa/paper2_eval/data/lists/human-questions.json
index 4142e5b..4142e5b 100644
--- a/gnqa/paper2_eval/data/human-questions.json
+++ b/gnqa/paper2_eval/data/lists/human-questions.json