# GeneNetwork Question Answer (GNQA) Study Evaluation


This directory contains the code used to evaluate questions submitted to GNQA.
Unlike the evaluation in our first GNQA paper, this work uses different LLMs and a different RAG engine.
RAGAS is still used to evaluate the queries.

The RAG engine in use is [R2R](https://github.com/SciPhi-AI/R2R). It is open source and performs similarly to the engine we used for our first GNQA paper.

The evaluation workflow is organized around reading questions grouped along two sets of categories: category 1 is who asked the question, and category 2 is the field to which the question belongs.
In our initial work, category 1 consists of citizen scientists and domain experts, while category 2 consists of three fields or specializations: GeneNetwork.org systems genetics, the genetics of diabetes, and the genetics of aging.

We will make the code more configurable by pulling the categories out of the source code and keeping them strictly in settings files.
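
As a rough illustration, a settings file could hold both category sets so that none of the scripts hard-code them. The file name, keys, and fallback values below are assumptions for illustration, not the project's actual configuration schema.

```python
# Hypothetical settings loader; the file name, keys, and values are
# illustrative only and not the project's actual configuration schema.
import json
from pathlib import Path

DEFAULT_SETTINGS = {
    "category_1": ["citizen_scientist", "domain_expert"],        # who asked the question
    "category_2": ["gn_systems_genetics", "diabetes", "aging"],  # field of the question
}

def load_settings(path="settings.json"):
    """Read categories from a settings file, falling back to the defaults above."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    return DEFAULT_SETTINGS
```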

It is best to define a clear structure for each type of data: question lists, responses, datasets, and scores.
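
One possible way to make those structures explicit is sketched below; the class and field names are illustrative assumptions, not the repository's actual types.

```python
# Minimal sketch of per-stage data structures; names and fields are assumptions.
from dataclasses import dataclass, field

@dataclass
class QuestionList:
    category_1: str                                    # e.g. "domain_expert"
    category_2: str                                    # e.g. "aging"
    questions: list[str] = field(default_factory=list)

@dataclass
class Response:
    question: str
    answer: str
    contexts: list[str] = field(default_factory=list)  # retrieved passages

@dataclass
class Scores:
    question: str
    ragas_metrics: dict[str, float] = field(default_factory=dict)
```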

## Tasks

1. Create list(s) of questions (not automated)
1. Run question list through RAG (automated)
1. Save responses (automated)
1. Create datasets from responses (automated)
1. Run datasets through evaluator to get scores (not automated)
1. Create plots of scores (not automated)

## Covering the tasks

*ID refers to the task number from the previous section*

| ID | Script | From directory | To directory | Command |
|:--|:---:|---:|---:|:--|
| 2 | run_questions | lists | responses | `python run_questions.py lists/catA_question_list.json responses/resp_catA_catB.json` |
| 3 | parse_r2r_result | responses | datasets | `python parse_r2r_result.py responses/resp_catA_catB.json datasets/intermediate_files/catA_catB_.json` |
| 4 | create_dataset | lists | datasets | `python create_dataset.py lists/list_catA_catB.json datasets/catA_catB.json` |
| 5 | ragas_eval | datasets | scores | `python3 ragas_eval.py datasets/catA/catB_1.json scores/catA/catB_1.json 3` (run the evaluation 3 times) |
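
The automated steps (2 through 4) can be chained for a single category pair. The sketch below simply shells out to the scripts above with the arguments from the table; the `catA`/`catB` names and file paths are placeholders and may not match the actual files in this repository.

```python
# Rough pipeline sketch for one category pair; paths are placeholders taken
# from the table above and may not correspond to real files in this repository.
import subprocess

cat_a, cat_b = "catA", "catB"

steps = [
    ["python", "run_questions.py",
     f"lists/{cat_a}_question_list.json", f"responses/resp_{cat_a}_{cat_b}.json"],
    ["python", "parse_r2r_result.py",
     f"responses/resp_{cat_a}_{cat_b}.json", f"datasets/intermediate_files/{cat_a}_{cat_b}_.json"],
    ["python", "create_dataset.py",
     f"lists/list_{cat_a}_{cat_b}.json", f"datasets/{cat_a}_{cat_b}.json"],
]

for cmd in steps:
    subprocess.run(cmd, check=True)  # stop at the first failing step
```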