-rw-r--r--  genecup_synthesis_prompt.txt   12
-rwxr-xr-x  ratspub.py                    167
-rw-r--r--  requirements.txt               50
-rwxr-xr-x  server.py                      19
-rw-r--r--  templates/genenames.html       73
5 files changed, 113 insertions, 208 deletions
diff --git a/genecup_synthesis_prompt.txt b/genecup_synthesis_prompt.txt
index 75af3af..c8ee861 100644
--- a/genecup_synthesis_prompt.txt
+++ b/genecup_synthesis_prompt.txt
@@ -9,18 +9,18 @@ Organism:
 Phenotype:

 Candidate Gene: {{gene}}

 

-Goal: To critically evaluate [Candidate Gene] as a plausible causal gene for the [Phenotype] by analyzing the literature excerpts provided at the end of this prompt with appropriate scientific caution.

+Goal: To critically evaluate {{gene}} as a plausible causal gene for the [Phenotype] by analyzing the literature excerpts provided at the end of this prompt with appropriate scientific caution.

 

 2. Required Analysis:

 Please perform the following four-step analysis based on the Source Information provided at the end. Your evaluation must be rigorous and avoid overstating claims. Acknowledge the limitations of interpreting isolated sentences and prioritize a nuanced perspective.

 

 A. Term Disambiguation

 

-For each sentence provided in the "Source Information" section, confirm if the term "[Candidate Gene]" unambiguously refers to the intended gene. If the term is used ambiguously or refers to another scientific concept, state this and exclude the sentence from further analysis. Proceed only with the confirmed sentences.

+For each sentence provided in the "Source Information" section, confirm if the term "{{gene}}" unambiguously refers to the intended gene. If the term is used ambiguously or refers to another scientific concept, state this and exclude the sentence from further analysis. Proceed only with the confirmed sentences.

 

 B. Synthesis of Function and Experimental Context

 

-From the sentences confirmed in Step A, synthesize the known biological functions of [Candidate Gene]. Do not create a single, flattened narrative. Instead, structure your summary to reflect the nuance of the findings:

+From the sentences confirmed in Step A, synthesize the known biological functions of {{gene}}. Do not create a single, flattened narrative. Instead, structure your summary to reflect the nuance of the findings:

 Characterize Each Function: For each reported function, describe what the gene does.

 Note the Experimental System: Specify the context for each finding. Was it observed in vivo (e.g., in a mouse model), in vitro (e.g., in a specific cell line like HEK293), or is it a finding from a computational prediction? (Cite PMID/ID).

 Distinguish Strength of Claims: Differentiate between established, speculative, or indirect roles. For example, note if the source text uses cautious language like "may regulate," "is associated with," or "is thought to be involved in." (Cite PMID/ID).

@@ -28,7 +28,7 @@ Acknowledge Inconsistencies: If any sentences suggest conflicting or different r
 

 C. Critical Evaluation of Causal Gene Plausibility (with In-text Citations)

 

-Construct a detailed scientific evaluation of [Candidate Gene]'s plausibility for [Phenotype]. Your argument must be built cautiously, explicitly weighing the evidence for and against the gene's candidacy. Every claim you make must be immediately followed by its source (PMID/ID).

+Construct a detailed scientific evaluation of {{gene}}'s plausibility for [Phenotype]. Your argument must be built cautiously, explicitly weighing the evidence for and against the gene's candidacy. Every claim you make must be immediately followed by its source (PMID/ID).

 

 Start with an Initial Caveat that acknowledges the inherent limitations of this analysis, such as the small number of excerpts and the lack of full experimental details.

 

@@ -60,8 +60,8 @@ Evaluate the nature of these prior associations. Are they from robust genetic st
 D. Balanced Concluding Assessment

 

 Conclude with a brief, balanced summary that encapsulates the strength of the evidence. This conclusion must reflect the cautious and critical nature of your analysis.

-Summarize Supporting Evidence: Briefly state the strongest, most direct lines of evidence that support [Candidate Gene] as a plausible candidate, citing the key PMIDs.

+Summarize Supporting Evidence: Briefly state the strongest, most direct lines of evidence that support {{gene}} as a plausible candidate, citing the key PMIDs.

 Summarize Limitations and Gaps: Crucially, summarize the most significant weaknesses in the argument. This includes any identified knowledge gaps, lack of specificity, reliance on non-ideal experimental models, or speculative functional links.

-Final Judgment on Plausibility: Provide a final, nuanced statement on whether [Candidate Gene] is a weak, plausible, or strong candidate based only on the provided information. Avoid definitive conclusions and frame the outcome in terms of what further research would be needed to solidify the connection.

+Final Judgment on Plausibility: Provide a final, nuanced statement on whether {{gene}} is a weak, plausible, or strong candidate based only on the provided information. Avoid definitive conclusions and frame the outcome in terms of what further research would be needed to solidify the connection.

 

 3. Source Information:

diff --git a/ratspub.py b/ratspub.py
deleted file mode 100755
index 5621b5e..0000000
--- a/ratspub.py
+++ /dev/null
@@ -1,167 +0,0 @@
-#!/bin/env python3 
-from nltk.tokenize import sent_tokenize
-import os
-import re
-from ratspub_keywords import *
-from gene_synonyms import *
-
-global function_d, brain_d, drug_d, addiction_d, brain_query_term, pubmed_path, genes
-
-## turn dictionary (synonyms) to regular expression
-def undic(dic):
-    return "|".join(dic.values())
-
-def findWholeWord(w):
-    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search
-
-def getSentences(query, gene):
-    abstracts = os.popen("esearch -db pubmed -query " +  query + " | efetch -format uid |fetch-pubmed -path "+ pubmed_path + " | xtract -pattern PubmedArticle -element MedlineCitation/PMID,ArticleTitle,AbstractText|sed \"s/-/ /g\"").read()
-    out=str()
-    for row in abstracts.split("\n"):
-        tiab=row.split("\t")
-        pmid = tiab.pop(0)
-        tiab= " ".join(tiab)
-        sentences = sent_tokenize(tiab)
-        ## keep the sentence only if it contains the gene 
-        for sent in sentences:
-            if findWholeWord(gene)(sent):
-                sent=re.sub(r'\b(%s)\b' % gene, r'<strong>\1</strong>', sent, flags=re.I)
-                out+=pmid+"\t"+sent+"\n"
-    return(out)
-
-def gene_category(gene, cat_d, query, cat):
-    #e.g. BDNF, addiction_d, undic(addiction_d) "addiction"
-    q="\"(" + query.replace("|", " OR ")  + ") AND " + gene + "\""
-    sents=getSentences(q, gene)
-    out=str()
-    for sent in sents.split("\n"):
-        for key in cat_d:
-            if findWholeWord(cat_d[key])(sent) :
-                sent=sent.replace("<b>","").replace("</b>","") # remove other highlights
-                sent=re.sub(r'\b(%s)\b' % cat_d[key], r'<b>\1</b>', sent, flags=re.I) # highlight keyword
-                out+=gene+"\t"+ cat + "\t"+key+"\t"+sent+"\n"
-    return(out)
-
-def generate_nodes(nodes_d, nodetype):
-    # include all search terms even if there are no edges, just to show negative result 
-    json0 =str()
-    for node in nodes_d:
-        json0 += "{ data: { id: '" + node +  "', nodecolor: '" + nodecolor[nodetype] + "', nodetype: '"+nodetype + "', url:'/shownode?nodetype=" + nodetype + "&node="+node+"' } },\n"
-    return(json0)
-
-def generate_nodes_json(nodes_d, nodetype):
-    # include all search terms even if there are no edges, just to show negative result 
-    nodes_json0 =str()
-    for node in nodes_d:
-        nodes_json0 += "{ \"id\": \"" + node +  "\", \"nodecolor\": \"" + nodecolor[nodetype] + "\", \"nodetype\": \"" + nodetype + "\", \"url\":\"/shownode?nodetype=" + nodetype + "&node="+node+"\" },\n"
-    return(nodes_json0)
-
-def generate_edges(data, filename):
-    pmid_list=[]
-    json0=str()
-    edgeCnts={}
-    for line in  data.split("\n"):
-        if len(line.strip())!=0:
-            (source, cat, target, pmid, sent) = line.split("\t")
-            edgeID=filename+"|"+source+"|"+target
-            if (edgeID in edgeCnts) and (pmid+target not in pmid_list):
-                edgeCnts[edgeID]+=1
-                pmid_list.append(pmid+target)
-            elif (edgeID not in edgeCnts) and (pmid+target not in pmid_list):
-                edgeCnts[edgeID]=1
-                pmid_list.append(pmid+target)
-    for edgeID in edgeCnts:
-        (filename, source,target)=edgeID.split("|")
-        json0+="{ data: { id: '" + edgeID + "', source: '" + source + "', target: '" + target + "', sentCnt: " + str(edgeCnts[edgeID]) + ",  url:'/sentences?edgeID=" + edgeID + "' } },\n"
-    return(json0)
-
-def generate_edges_json(data, filename):
-    pmid_list=[]
-    edges_json0=str()
-    edgeCnts={}
-    for line in  data.split("\n"):
-        if len(line.strip())!=0:
-            (source, cat, target, pmid, sent) = line.split("\t")
-            edgeID=filename+"|"+source+"|"+target
-            if (edgeID in edgeCnts) and (pmid+target not in pmid_list):
-                edgeCnts[edgeID]+=1
-                pmid_list.append(pmid+target)
-            elif (edgeID not in edgeCnts) and (pmid+target not in pmid_list):
-                edgeCnts[edgeID]=1
-                pmid_list.append(pmid+target)
-    for edgeID in edgeCnts:
-        (filename, source,target)=edgeID.split("|")
-        edges_json0+="{ \"id\": \"" + edgeID + "\", \"source\": \"" + source + "\", \"target\": \"" + target + "\", \"sentCnt\": \"" + str(edgeCnts[edgeID]) + "\",  \"url\":\"/sentences?edgeID=" + edgeID + "\" },\n"
-    return(edges_json0)
-
-def searchArchived(sets, query, filetype):
-    if sets=='topGene':
-        dataFile="topGene_addiction_sentences.tab"
-        nodes= "{ data: { id: '" + query +  "', nodecolor: '" + "#2471A3" + "', fontweight:700, url:'/progress?query="+query+"' } },\n"
-
-    elif sets=='GWAS':
-        dataFile="gwas_addiction.tab"
-        nodes=str()
-    with open(dataFile, "r") as sents:
-        pmid_list=[]
-        cat1_list=[]
-        catCnt={}
-        for sent in sents:
-            (symb, cat0, cat1, pmid, sent)=sent.split("\t")
-            if (symb.upper() == query.upper()) :
-                if (cat1 in catCnt.keys()) and (pmid+cat1 not in pmid_list):
-                    pmid_list.append(pmid+cat1)
-                    catCnt[cat1]+=1
-                elif (cat1 not in catCnt.keys()):
-                    catCnt[cat1]=1
-                    pmid_list.append(pmid+cat1)
-
-    nodes= "{ data: { id: '" + query +  "', nodecolor: '" + "#2471A3" + "', fontweight:700, url:'/progress?query="+query+"' } },\n"
-    edges=str()
-    gwas_json=str()
-    for key in catCnt.keys():
-        if sets=='GWAS':
-            nc=nodecolor["GWAS"]
-            nodes += "{ data: { id: '" + key +  "', nodecolor: '" + nc + "', url:'https://www.ebi.ac.uk/gwas/search?query="+key.replace("_GWAS","")+"' } },\n"
-        elif key in drug_d.keys():
-            nc=nodecolor["drug"]
-            nodes += "{ data: { id: '" + key +  "', nodecolor: '" + nc + "', url:'/shownode?node="+key+"' } },\n"
-        else:
-            nc=nodecolor["addiction"]
-            nodes += "{ data: { id: '" + key +  "', nodecolor: '" + nc + "', url:'/shownode?node="+key+"' } },\n"
-        edgeID=dataFile+"|"+query+"|"+key
-        edges+="{ data: { id: '" + edgeID+ "', source: '" + query + "', target: '" + key + "', sentCnt: " + str(catCnt[key]) + ",  url:'/sentences?edgeID=" + edgeID + "' } },\n"
-        gwas_json+="{ \"id\": \"" + edgeID + "\", \"source\": \"" + query + "\", \"target\": \"" + key + "\", \"sentCnt\": \"" + str(catCnt[key]) + "\",  \"url\":\"/sentences?edgeID=" + edgeID + "\" },\n"
-    if(filetype == 'cys'):
-        return(nodes+edges)
-    else:
-        return(gwas_json)
-# brain region has too many short acronyms to just use the undic function, so search PubMed using the following 
-brain_query_term="cortex|accumbens|striatum|amygadala|hippocampus|tegmental|mesolimbic|infralimbic|prelimbic|habenula"
-function=undic(function_d)
-addiction=undic(addiction_d)
-drug=undic(drug_d)
-
-gene_s=undic(genes)
-
-nodecolor={'function':"#A9CCE3", 'addiction': "#D7BDE2", 'drug': "#F9E79F", 'brain':"#A3E4D7", 'GWAS':"#AEB6BF", 'stress':"#EDBB99", 'psychiatric':"#F5B7B1"}
-#https://htmlcolorcodes.com/ third column down
-
-n0=generate_nodes(function_d, 'function')
-n1=generate_nodes(addiction_d, 'addiction')
-n2=generate_nodes(drug_d, 'drug')
-n3=generate_nodes(brain_d, 'brain')
-n4=generate_nodes(stress_d, 'stress')
-n5=generate_nodes(psychiatric_d, 'psychiatric')
-n6=''
-
-nj0=generate_nodes_json(function_d, 'function')
-nj1=generate_nodes_json(addiction_d, 'addiction')
-nj2=generate_nodes_json(drug_d, 'drug')
-nj3=generate_nodes_json(brain_d, 'brain')
-nj4=generate_nodes_json(stress_d, 'stress')
-nj5=generate_nodes_json(psychiatric_d, 'psychiatric')
-nj6=''
-
-pubmed_path=os.environ["EDIRECT_PUBMED_MASTER"]
-
diff --git a/requirements.txt b/requirements.txt
index c2ba0ba..5c15516 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,33 +1,29 @@
-pandas==1.2.1
-bcrypt==3.1.7
-cffi==1.13.2
-pycparser==2.19
-Flask-SQLAlchemy==2.4.4
+# Core Data Stack (Stable for 3.12)
+numpy>=1.26.0
+pandas==2.2.0
+
+# Core Application Dependencies
 Flask==1.1.2
+Flask-SQLAlchemy==2.4.4
+SQLAlchemy==1.3.23
+bcrypt==3.1.7
+python-dotenv
+pytz
+
+# Natural Language Processing
+nltk==3.5
+
+# Generative AI (Migrated to new SDK)
+google-genai
+
+# Utilities and Sub-dependencies
 Click==7.0
 itsdangerous==1.1.0
 Jinja2==2.11.3
-MarkupSafe==1.0
-Werkzeug==1.0.0
-SQLAlchemy==1.3.23
-Keras==2.4.3
-h5py==2.10.0
-numpy==1.19.5
-six==1.15.0
-Keras-Preprocessing==1.1.2
-PyYAML==5.3.1
-scipy==1.6.0
-nltk==3.5
-regex==2020.11.13
-tensorflow==2.4.1
-absl-py==0.11.0
-astunparse==1.6.3
-gast==0.3.3
-grpcio==1.32.0
-protobuf==3.14.0
-tensorboard==2.4.1
-Markdown==3.3.3
+MarkupSafe==2.0.1
 Werkzeug==1.0.1
+Markdown==3.3.3
+cffi==1.17.0
+pycparser==2.19
+six==1.17.0
 wheel==0.36.2
-tensorflow-estimator==2.4.0
-python==3.8.5
diff --git a/server.py b/server.py
index 19d7486..d9b4ef3 100755
--- a/server.py
+++ b/server.py
@@ -34,6 +34,17 @@ from datetime import datetime
 # Gemini API related imports
 import google.generativeai as genai
 
+# Removed TensorFlow and Keras related imports
+# import tensorflow
+# import tensorflow.keras
+# from nltk.corpus import stopwords # Removed
+# from nltk.stem.porter import PorterStemmer # Removed
+# from tensorflow.keras import backend as K # Removed
+# from tensorflow.keras import metrics, optimizers # Removed
+# from tensorflow.keras.layers import * # Removed (Dense, Embedding, Flatten, Conv1D, MaxPooling1D)
+# from tensorflow.keras.models import Model, Sequential # Removed
+# from tensorflow.keras.preprocessing.sequence import pad_sequences # Removed
+# from tensorflow.keras.preprocessing.text import Tokenizer # Removed
 import re
 import ast
 from more_functions import *
@@ -124,7 +135,7 @@ def classify_stress_with_gemini(sentence_text):
         return "error_no_prompt_template"
 
     try:
-        model_gemini = genai.GenerativeModel('gemini-3-flash-preview')
+        model_gemini = genai.GenerativeModel('gemini-2.5-pro')
         
         # Append the new sentence and the final instruction to the prompt template
         # This is safer than .format() when the template contains its own curly braces.
@@ -155,7 +166,7 @@ def classify_stress_with_gemini(sentence_text):
         return "error_no_api_key"
 
     try:
-        model_gemini = genai.GenerativeModel('gemini-3-flash-preview')
+        model_gemini = genai.GenerativeModel('gemini-2.5-pro')
         prompt = f"""Classify the following sentence based on whether it describes 'systemic stress' or 'cellular stress'.
 Please return ONLY the word 'systemic' if it describes systemic stress, or ONLY the word 'cellular' if it describes cellular stress. Do not add any other explanation or punctuation.
 
@@ -1585,7 +1596,7 @@ Here are the sentences to classify:
 {sentences_to_classify_str}
 """
                 # Call the API
-                model_gemini = genai.GenerativeModel('gemini-3-flash-preview')
+                model_gemini = genai.GenerativeModel('gemini-2.5-pro')
                 response = model_gemini.generate_content(batched_prompt)
 
                 # Step 3: Parse the JSON response
@@ -2039,4 +2050,4 @@ def top150genes():
 
 if __name__ == '__main__':
     # For production, consider using a more robust web server like Gunicorn or Waitress
-    app.run(debug=True, host='0.0.0.0', port=4200) # Changed to 0.0.0.0 for accessibility if needed
+    app.run(debug=True, host='0.0.0.0', port=4200) # Changed to 0.0.0.0 for accessibility if needed
\ No newline at end of file
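A comment in this hunk notes that appending the sentence to the prompt template "is safer than .format() when the template contains its own curly braces." A small demonstration of why (the template text below is illustrative, not the app's actual prompt):

```python
# .format() treats literal JSON braces in a template as replacement
# fields and raises; plain concatenation leaves them untouched.

template = 'Reply with JSON like {"label": "systemic"}.'
sentence = "Mice were restrained daily for two weeks."

try:
    template.format(sentence=sentence)
    format_ok = True
except (KeyError, IndexError, ValueError):
    format_ok = False

# Safe alternative: append the sentence instead of substituting it in.
prompt = template + "\n\nSentence to classify: " + sentence

print("format() succeeded:", format_ok)   # format() succeeded: False
print(prompt.endswith(sentence))          # True
```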
diff --git a/templates/genenames.html b/templates/genenames.html
index fe22d0b..d1e4960 100644
--- a/templates/genenames.html
+++ b/templates/genenames.html
@@ -18,11 +18,76 @@
     {%endfor%}
     </ul>
     <br>
-    {%else%}
+    {# --- Added Section for Gemini Prompt --- #}
+    {% if prompt %}
+        <div style="display: flex; align-items: center; gap: 10px;">
+            <h4>LLM Prompt for {{gene}}</h4>
+            <button id="copy-button" onclick="copyPromptToClipboard()" class="btn btn-secondary btn-sm">Copy Prompt</button>
+        </div>
+        <textarea id="prompt-textarea" rows="20" cols="100" readonly style="width:100%; font-family: monospace; white-space: pre-wrap; word-wrap: break-word;">
+{{ prompt }}
+        </textarea>
+        
+        <script>
+        function copyPromptToClipboard() {
+            // Get the textarea element
+            var textArea = document.getElementById("prompt-textarea");
+            var copyButton = document.getElementById("copy-button");
+
+            try {
+                // Modern browsers: Use the Clipboard API
+                navigator.clipboard.writeText(textArea.value).then(function() {
+                    // Success feedback
+                    copyButton.innerText = "Copied!";
+                    setTimeout(function() {
+                        copyButton.innerText = "Copy Prompt";
+                    }, 2000); // Revert back to "Copy" after 2 seconds
+                }, function(err) {
+                    // Error callback for modern API
+                    console.error('Async: Could not copy text: ', err);
+                    // If this fails, try the fallback method
+                    fallbackCopyTextToClipboard(textArea, copyButton);
+                });
+
+            } catch (err) {
+                // If navigator.clipboard is not supported at all, go directly to fallback
+                console.log('Clipboard API not available, using fallback.');
+                fallbackCopyTextToClipboard(textArea, copyButton);
+            }
+        }
+
+        function fallbackCopyTextToClipboard(element, button) {
+            // Select the text field
+            element.select();
+            
+            // --- THIS IS THE FIX ---
+            // Set the selection range to the full length of the content.
+            // This removes the arbitrary 99999 character limit.
+            element.setSelectionRange(0, element.value.length); 
+            
+            try {
+                var successful = document.execCommand('copy');
+                if (successful) {
+                    button.innerText = "Copied!";
+                    setTimeout(function() {
+                        button.innerText = "Copy Prompt";
+                    }, 2000);
+                } else {
+                    alert('Oops, unable to copy. Please copy manually.');
+                }
+            } catch (err) {
+                console.error('Fallback: Oops, unable to copy', err);
+                alert('Oops, unable to copy. Please copy manually.');
+            }
+        }
+        </script>
+    {% else %}
+        <p>Prompt generation failed or no sentences found.</p>
+    {% endif %}
+    {# --- END OF MODIFIED SECTION --- #}
+
+{%else%}
         No synonym for {{gene}} is found. 
 {%endif%} 
 <br>
 {% endblock %}
-
-
-