Diffstat (limited to 'Readme.md')
-rw-r--r-- Readme.md | 30
1 files changed, 21 insertions, 9 deletions
diff --git a/Readme.md b/Readme.md
index 3fd867c..81afd20 100644
--- a/Readme.md
+++ b/Readme.md
@@ -1,8 +1,10 @@
-# RatsPub: Relationship with Addiction Through Searches of PubMed
+# GeneCup: Mining gene relationships from PubMed using custom ontology
-http://rats.pub
+URL: [http://genecup.org](http://genecup.org)
-RatsPub searches PubMed to find sentences that contain the query terms (e.g., gene symbols) and a drug addiction related keyword. Over 300 of these keywords are organized into six categories:
+GeneCup automatically extracts information from PubMed and the NHGRI-EBI GWAS Catalog on the relationship of any gene with a custom list of keywords hierarchically organized into an ontology. Users create an ontology by identifying categories of concepts and a list of keywords for each concept.
+
+As an example, we created an ontology for drug addiction related concepts. Over 300 keywords are organized into six categories:
* names of abused drugs, e.g., opioids
* terms describing addiction, e.g., relapse
* key brain regions implicated in addiction, e.g., ventral striatum
@@ -10,9 +12,9 @@ RatsPub searches PubMed to find sentences that contain the query terms (e.g., ge
* synaptic plasticity, e.g., long term potentiation
* intracellular signaling, e.g., phosphorylation
-Live searches are conducted through PubMed to get relevant PMIDs, which are then used to retrieve the abstracts from a local archive. The relationships are presented as an interactive cytoscape graph. The nodes can be moved around to better reveal the connections. Clicking on the links will bring up the corresponding sentences in a new browser window. Stress related sentences are further classified into either systemic or cellular stress using a convolutional neural network.
+Live searches are conducted through PubMed to get relevant PMIDs, which are then used to retrieve the abstracts from a local archive. The relationships are presented as an interactive cytoscape graph. The nodes can be moved around to better reveal the connections. Clicking on the links will bring up the corresponding sentences in a new browser window. Stress related sentences for addiction keywords are further classified into either systemic or cellular stress using a convolutional neural network.
-## Top addiction related genes
+## Top addiction related genes for addiction ontology
0. extract gene symbol, alias and name from NCBI gene_info for taxid 9606.
1. search PubMed to get a count of these names/alias, with addiction keywords and drug name
@@ -23,8 +25,18 @@ Live searches are conducted through PubMed to get relevant PMIDs, which are then
## dependencies
* [local copy of PubMed](https://dataguide.nlm.nih.gov/edirect/archive.html)
-* python flask
-* python nltk
+* python == 3.8
+* see requirements.txt for list of packages and versions
+
+## Mini PubMed for testing
+
+For testing or code development, it is useful to have a small collection of PubMed abstracts in the same format as the local PubMed mirror. We provide 2473 abstracts that can be used to test four gene symbols (gria1, crhr1, drd2, and penk).
-## planned
-* NLP analysis of the senences (topic modeling, ranking, etc.)
+1. install [edirect](https://dataguide.nlm.nih.gov/edirect/install.html) (make sure you refresh your shell after install so the PATH is updated)
+2. unpack the minipubmed.tgz file
+3. test the installation by running:
+```
+cd minipubmed
cat pmid.list | fetch-pubmed -path PubMed/Archive/ > test.xml
+```
+You should see 2473 abstracts in the test.xml file.
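
One way to check that count is sketched below. This is not part of the commit; it is a minimal illustration that assumes each record written to test.xml begins with a `<PubmedArticle>` opening tag. The function name `count_abstracts` is hypothetical.

```python
import re

# Assumption: every abstract record in the fetched XML starts with a
# <PubmedArticle> opening tag (optionally carrying attributes).
# The [\s>] guard keeps <PubmedArticleSet> from being counted as a record.
RECORD_OPEN = re.compile(r"<PubmedArticle[\s>]")


def count_abstracts(xml_text: str) -> int:
    """Count record opening tags in the given XML text (hypothetical helper)."""
    return len(RECORD_OPEN.findall(xml_text))
```

Calling `count_abstracts(open("test.xml", encoding="utf-8").read())` should return 2473 if the test ran correctly and the assumption about the record tag holds.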