| author | Pjotr Prins | 2026-03-27 11:38:42 +0100 |
|---|---|---|
| committer | Pjotr Prins | 2026-03-27 11:38:42 +0100 |
| commit | 088240be9ef1c014bf10fb64a8a80fdc278f19db (patch) | |
| tree | b6c7871632209cf5e15952ba39f1e2b2e8475432 /README.md | |
| parent | afa3fd534a558fb2ea11f8c40df968635d4291c7 (diff) | |
Add instruction (README) to install punkt
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 14 |
1 files changed, 13 insertions, 1 deletions
diff --git a/README.md b/README.md
index d824f9c..08676b3 100644
--- a/README.md
+++ b/README.md
@@ -44,12 +44,24 @@ cat pmid.list |fetch-pubmed -path PubMed/Archive/ >test.xml
 
 You should see 2473 abstracts in the test.xml file.
 
+## NLTK tokens
+
+You also need to fetch punkt.zip from https://www.nltk.org/nltk_data/
+
+```sh
+cd minipubmed
+mkdir tokenizers
+cd tokenizers
+wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip
+unzip punkt.zip
+```
+
 # Run the server
 
 You can use the [guix.scm](./guix.scm) container to run genecup:
 
 ```sh
-GeneCup$ guix shell -L . -C -N -F genecup-gemini coreutils edirect -- env EDIRECT_PUBMED_MASTER=./minipubmed GEMINI_API_KEY="AIza****" ./server.py --port 4201
+GeneCup$ guix shell -L . -C -N -F genecup-gemini coreutils edirect -- env EDIRECT_PUBMED_MASTER=./minipubmed NLTK_DATA=./minipubmed GEMINI_API_KEY="AIza****" ./server.py --port 4201
 ```
 
 ## Development
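The change works only if the unpacked tokenizer ends up where `NLTK_DATA` points. The following is a minimal sketch of a pre-flight check, not part of the GeneCup repository. The function name `check_punkt` is hypothetical, and the script assumes that `punkt.zip` unpacks into a `punkt/` directory containing `english.pickle`, and that NLTK looks for `tokenizers/punkt` under the directory named by `NLTK_DATA`.

```sh
#!/bin/sh
# Hypothetical helper, not part of the GeneCup repository.
# Reports whether an NLTK data directory holds the unpacked punkt tokenizer.
check_punkt() {
    # Use the first argument, else $NLTK_DATA, else the README's ./minipubmed.
    data_dir="${1:-${NLTK_DATA:-./minipubmed}}"
    # Assumption: punkt.zip unpacks into punkt/ with english.pickle inside.
    model="$data_dir/tokenizers/punkt/english.pickle"
    if [ -f "$model" ]; then
        echo "punkt found: $model"
    else
        echo "punkt missing: expected $model" >&2
        return 1
    fi
}
```

Running `check_punkt ./minipubmed` before starting the server should print the model path or return a non-zero status. If the archive layout differs from the assumption above, adjust the `model` path accordingly.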
