author Pjotr Prins 2024-06-23 04:36:18 -0500
committer Pjotr Prins 2024-06-23 04:36:18 -0500
commit 8d87f4db24ca5a36897fbd454527cbf54aff40cd
tree b4beba623da108dca56f200ec513d6f89b250af0
parent e51a02b508f7a19b5225f87c6daabffa4bf16910
README: more info
-rw-r--r-- README.md | 24
1 files changed, 22 insertions, 2 deletions
diff --git a/README.md b/README.md
index 0be306b..04677c7 100644
--- a/README.md
+++ b/README.md
@@ -36,6 +36,14 @@ The main genecup.org service is deployed deterministically (and self contained)
The source code and data are in a git repository: https://git.genenetwork.org/genecup/
+Unpack minipubmed and punkt (see below). And run, for example, using GNU Guix:
+
+```sh
+guix shell -C -N -F python python-flask coreutils-minimal python-bcrypt python-nltk python-numpy python-pandas python-regex python-flask-sqlalchemy edirect inetutils python-keras tensorflow sed -- env EDIRECT_PUBMED_MASTER=minipubmed/ NLTK_DATA=`pwd`/minipubmed ./server.py
+```
+
+and the service should be listening on port 4200.
+
## Mini PubMed for testing
For testing or code development, it is useful to have a small collection of PubMed abstracts in the same format as the local PubMed mirror. We provide 2473 abstracts that can be used to test four gene symbols (gria1, crhr1, drd2, and penk).
@@ -49,17 +57,29 @@ cat pmid.list |fetch-PubMed -path PubMed/Archive/ >test.xml
```
You should see 2473 abstracts in the test.xml file.
+## NLTK tokens
+
+You also need to fetch punkt.zip from https://www.nltk.org/nltk_data/
+
+```sh
+cd minipubmed
+mkdir tokenizers
+cd tokenizers
+wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip
+unzip punkt.zip
+```
+
## Source code
https://git.genenetwork.org/genecup/
## Support
-E-mail Pjotr Prins or Hao Chen.
+E-mail [Pjotr Prins](https://thebird.nl) or [Hao Chen](https://www.uthsc.edu/neuroscience-institute/about/faculty/chen.php).
## License
-GeneCup source code is published under the liberal MIT licence (aka expat license)
+GeneCup source code is published under the liberal free software MIT licence (aka expat license)
## Cite