| Field | Value | Timestamp |
|---|---|---|
| author | SidiBlak | 2023-06-23 18:35:15 +0300 |
| committer | GitHub | 2023-06-23 18:35:15 +0300 |
| commit | 61df9620396b423eb7e945ee5d0630f799d41b65 | |
| tree | 8d57f6dcf2c92684e1ec5f58fe3949591aa1fba8 /topics/lmms/llm-metadata.gmi | |
| parent | 2732639dfd2e96a70c824e00048455b61a4ec86c | |
Create llm-metadata.gmi
Diffstat (limited to 'topics/lmms/llm-metadata.gmi')
-rw-r--r-- | topics/lmms/llm-metadata.gmi | 40 |
1 files changed, 40 insertions, 0 deletions
```diff
diff --git a/topics/lmms/llm-metadata.gmi b/topics/lmms/llm-metadata.gmi
new file mode 100644
index 0000000..431be4e
--- /dev/null
+++ b/topics/lmms/llm-metadata.gmi
@@ -0,0 +1,40 @@
+# Large Language Models (LLMs) & Metadata
+
+* assigned: soloshelby (S. Solomon Darnell), priscilla
+* contact: bonfacekilz (Munyoki)
+* keywords: gnsoc, LLMs, metadata
+
+## Integrate an LLM Q&A system into gn.genenetwork.org
+This development will be done in stages:
+* [X] 1 - get API access to the FahamuAI GeneNetwork Q&A system
+* [ ] 2 - create a local Python Flask sandbox
+* [ ] 3 - create a UI for the Q&A window that fits into the current GN framework
+* [ ] 4 - create CI/CD tests for the new module
+* [ ] 5 - integrate the new functionality into GN1 & GN2
+
+## Add GN metadata
+* [ ] export GN RDF triples
+* [ ] convert the data in the triples into plain English sentences
+* [ ] submit the triple-based sentences to the Q&A LLM
+* [ ] submit RDF metadata to an Oracle to support Q&A system truthfulness
+
+## Set up system update protocol
+These are all living systems that must be kept up to date.
+GN is in constant use for research, and we continue to improve its design and functionality to keep it that way.
+To keep the Q&A system up to date, we must:
+* [ ] create a protocol for finding new publications
+* [ ] query the web for new publications that use GN
+* [ ] pull links to the newly found documents
+* [ ] acquire the documents
+* [ ] process the documents for the LLM
+
+PubMed, run by the National Library of Medicine at the National Institutes of Health (NIH), is one of the most widely used resources for researchers.
+PubMed is continually updated, so we must build a script to:
+* [ ] poll its API on a regular basis
+* [ ] download new citations
+* [ ] parse citations and metadata for input into the LLM
+* [ ] upload the new data into the LLM
+
+By keeping the main information sources for the GeneNetwork Q&A system up to date, the system grows with the knowledge base.
+
+Add functionality that allows users to submit documentation to the system; submissions are added only after review by a specialist.
```
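The step of converting exported GN RDF triples into plain English sentences could start from something like the minimal Python sketch below. It assumes the triples have already been parsed into (subject, predicate, object) string tuples, with URIs wrapped in angle brackets and literals left bare. The `TEMPLATES` mapping, the function names and the example.org URIs are hypothetical placeholders, not GN's actual RDF vocabulary.

```python
import re
from typing import Dict

# Hypothetical phrase templates keyed by predicate local name. GN's real RDF
# vocabulary would need its own mapping; these entries are placeholders.
TEMPLATES: Dict[str, str] = {
    "label": '{s} is labelled "{o}".',
    "species": "{s} belongs to the species {o}.",
    "chromosome": "{s} is located on chromosome {o}.",
}


def local_name(term: str) -> str:
    """Return the last path or fragment segment of a <URI>; literals pass through unchanged."""
    if term.startswith("<") and term.endswith(">"):
        return re.split(r"[/#]", term[1:-1])[-1]
    return term


def humanize(name: str) -> str:
    """Turn a camelCase or snake_case predicate name into lower-case words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name.replace("_", " "))
    return spaced.lower()


def triple_to_sentence(s: str, p: str, o: str) -> str:
    """Render one (subject, predicate, object) triple as a plain English sentence."""
    subj, pred, obj = local_name(s), local_name(p), local_name(o)
    template = TEMPLATES.get(pred)
    if template is not None:
        return template.format(s=subj, o=obj)
    # Fallback for unmapped predicates: "<subject> has <predicate words> <object>."
    return f"{subj} has {humanize(pred)} {obj}."
```

For example, `triple_to_sentence("<http://example.org/strain/BXD>", "<http://example.org/vocab/species>", "Mus musculus")` gives `BXD belongs to the species Mus musculus.`. Unmapped predicates fall back to a generic sentence, so new vocabulary produces slightly stilted text rather than being silently dropped.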
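For the document-processing step of the update protocol, one common approach (an assumption here; the plan does not say how documents should be prepared) is to split each document into overlapping word windows, so every piece fits the model's context and a passage that straddles a boundary still appears whole in at least one window. A minimal sketch; the `size` and `overlap` defaults are arbitrary illustrations.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            # This window already reaches the end; another would only repeat the tail.
            break
    return chunks
```

For example, `chunk_words("0 1 2 3 4 5 6 7 8 9", size=4, overlap=1)` returns `["0 1 2 3", "3 4 5 6", "6 7 8 9"]`. Counting words rather than model tokens keeps the sketch dependency-free; a tokenizer-based count would match the LLM's actual context limit more closely.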
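The PubMed polling script could be built on NCBI's E-utilities ESearch endpoint, which accepts `db`, `term`, `datetype`, `mindate`, `maxdate`, `retmax` and `retmode` parameters. The sketch below builds the query URL, extracts PubMed IDs (PMIDs) from the JSON reply, and filters out IDs already seen in earlier polls. The search term, the in-memory `seen` set and the function names are assumptions for illustration; a real poller would persist `seen`, fetch the full citations (for example with EFetch), and stay within NCBI's request-rate limits.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from typing import Iterable, List, Set

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term: str, since: date, until: date, retmax: int = 100) -> str:
    """ESearch URL for PubMed records whose Entrez date (edat) lies between since and until."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "edat",
        "mindate": since.strftime("%Y/%m/%d"),
        "maxdate": until.strftime("%Y/%m/%d"),
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_pmids(payload: str) -> List[str]:
    """Extract the PMID list from an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


def new_pmids(found: Iterable[str], seen: Set[str]) -> List[str]:
    """Keep only PMIDs not ingested in an earlier poll, preserving order."""
    return [pmid for pmid in found if pmid not in seen]


def poll(term: str, since: date, until: date, seen: Set[str]) -> List[str]:
    """One polling round: query ESearch over the network and return unseen PMIDs."""
    with urllib.request.urlopen(build_search_url(term, since, until), timeout=30) as resp:
        found = parse_pmids(resp.read().decode("utf-8"))
    return new_pmids(found, seen)
```

Run on a schedule (for example a daily cron job) with `since` set to the previous run's date, `poll("GeneNetwork", since, date.today(), seen)` returns only PMIDs not yet ingested. `retmax` caps how many IDs one request returns, so a busy window would need paging with ESearch's `retstart` parameter.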
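The requirement that user-submitted documentation be reviewed by a specialist before it enters the system amounts to a small state machine: each submission starts as pending, a reviewer moves it to approved or rejected exactly once, and only approved documents are handed on for ingestion into the LLM. A minimal in-memory sketch; the class and method names are hypothetical, and a real deployment would keep this state in GN's database and tie reviewers to authenticated accounts.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    doc_id: int
    title: str
    text: str
    submitter: str
    status: Status = Status.PENDING
    reviewer: str = ""


@dataclass
class ReviewQueue:
    submissions: Dict[int, Submission] = field(default_factory=dict)
    _next_id: int = field(default=1, repr=False)

    def submit(self, title: str, text: str, submitter: str) -> int:
        """Record a new pending submission and return its id."""
        doc_id = self._next_id
        self._next_id += 1
        self.submissions[doc_id] = Submission(doc_id, title, text, submitter)
        return doc_id

    def review(self, doc_id: int, reviewer: str, approve: bool) -> None:
        """Approve or reject a pending submission; each one can be reviewed only once."""
        sub = self.submissions[doc_id]
        if sub.status is not Status.PENDING:
            raise ValueError(f"submission {doc_id} was already reviewed")
        sub.status = Status.APPROVED if approve else Status.REJECTED
        sub.reviewer = reviewer

    def ready_for_ingestion(self) -> List[Submission]:
        """Only approved submissions are passed on to the LLM pipeline."""
        return [s for s in self.submissions.values() if s.status is Status.APPROVED]
```

Refusing a second review keeps an approval from being silently overturned; changing a decision would need an explicit, logged action.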