author: SidiBlak, 2023-06-23 18:35:15 +0300
committer: GitHub, 2023-06-23 18:35:15 +0300
commit: 61df9620396b423eb7e945ee5d0630f799d41b65
tree: 8d57f6dcf2c92684e1ec5f58fe3949591aa1fba8 (/topics)
parent: 2732639dfd2e96a70c824e00048455b61a4ec86c
Create llm-metadata.gmi
Diffstat (limited to 'topics')
-rw-r--r-- topics/lmms/llm-metadata.gmi | 40
1 files changed, 40 insertions, 0 deletions
diff --git a/topics/lmms/llm-metadata.gmi b/topics/lmms/llm-metadata.gmi
new file mode 100644
index 0000000..431be4e
--- /dev/null
+++ b/topics/lmms/llm-metadata.gmi
@@ -0,0 +1,40 @@
+# Large Language Models (LLMs) & Metadata
+
+* assigned: soloshelby (S. Solomon Darnell), priscilla
+* contact: bonfacekilz (Munyoki)
+* keywords: gnsoc, LLMs, metadata
+
+## Integrate an LLM Q&A system into gn.genenetwork.org
+This development will be done in stages:
+* [X] 1 - get API access to FahamuAI GeneNetwork Q&A system
+* [ ] 2 - create local Python Flask sandbox
+* [ ] 3 - create UI for Q&A window that fits into current GN framework
+* [ ] 4 - create CI/CD tests for new module
+* [ ] 5 - integrate new functionality into GN1 & GN2
+
+## Add GN metadata
+* [ ] export GN RDF triples
+* [ ] convert the triples into plain-English sentences
+* [ ] submit triples-based sentences to Q&A LLM
+* [ ] submit RDF metadata to an Oracle to support Q&A system truthfulness
+
+## Set up system update protocol
+These are all living systems that must be kept up-to-date.
+GN is in continuous use for research, and we keep improving its design and functionality so that it stays that way.
+To keep the Q&A system up to date, we must:
+* [ ] create protocol to get new publications
+* [ ] query web for new publications utilizing GN
+* [ ] pull links to the newly found documents
+* [ ] acquire the documents
+* [ ] process documents for LLM
+
+The National Library of Medicine's PubMed is a National Institutes of Health (NIH) system and one of the most widely used resources for researchers.
+PubMed is updated regularly by the NIH, so we must build a script to:
+* [ ] poll its API on a regular basis
+* [ ] download new citations
+* [ ] parse citations and metadata for input into the LLM
+* [ ] upload new data into the LLM
+
+Keeping the GeneNetwork Q&A system's main information sources up to date lets the system grow with the knowledge base.
+
+Add functionality that lets users submit documentation to the system; each submission is added only after review by a specialist.
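The PubMed polling step listed in the document could start from something like the sketch below. It is not part of the commit. It assumes the public NCBI E-utilities ESearch endpoint, and the function names, the search term "GeneNetwork" and the 7-day window are hypothetical choices made for illustration. It covers only the polling and ID-extraction steps; downloading the full citations (for example via EFetch) and loading them into the LLM are left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (public; clients are expected to rate-limit).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, days_back=7, max_results=100):
    """Build an ESearch URL for PubMed records added in the last `days_back` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "reldate": days_back,   # only records from the last N days
        "datetype": "edat",     # filter on the Entrez (entry) date
        "retmax": max_results,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


def parse_pmids(response_text):
    """Extract the list of PubMed IDs from an ESearch JSON response."""
    result = json.loads(response_text).get("esearchresult", {})
    return result.get("idlist", [])


def new_pmids(found, already_seen):
    """Return IDs not processed before, keeping the order PubMed returned."""
    seen = set(already_seen)
    return [pmid for pmid in found if pmid not in seen]


def poll_pubmed(term, already_seen, days_back=7):
    """One polling pass: search PubMed and return IDs we have not seen yet."""
    with urlopen(build_search_url(term, days_back), timeout=30) as resp:
        found = parse_pmids(resp.read().decode("utf-8"))
    return new_pmids(found, already_seen)
```

Calling `poll_pubmed("GeneNetwork", already_seen=[])` performs a live request, so a real deployment would run it on a schedule and persist the set of seen IDs between runs; how that is done is left open here.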