From 088240be9ef1c014bf10fb64a8a80fdc278f19db Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Fri, 27 Mar 2026 11:38:42 +0100
Subject: Add instruction (README) to install punkt

---
 README.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d824f9c..08676b3 100644
--- a/README.md
+++ b/README.md
@@ -44,12 +44,24 @@ cat pmid.list |fetch-pubmed -path PubMed/Archive/ >test.xml
 
 You should see 2473 abstracts in the test.xml file.
 
+## NLTK tokens
+
+You also need to fetch punkt.zip from https://www.nltk.org/nltk_data/
+
+```sh
+cd minipubmed
+mkdir tokenizers
+cd tokenizers
+wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip
+unzip punkt.zip
+```
+
 # Run the server
 
 You can use the [guix.scm](./guix.scm) container to run genecup:
 
 ```sh
-GeneCup$ guix shell -L . -C -N -F genecup-gemini coreutils edirect -- env EDIRECT_PUBMED_MASTER=./minipubmed GEMINI_API_KEY="AIza****" ./server.py --port 4201
+GeneCup$ guix shell -L . -C -N -F genecup-gemini coreutils edirect -- env EDIRECT_PUBMED_MASTER=./minipubmed NLTK_DATA=./minipubmed GEMINI_API_KEY="AIza****" ./server.py --port 4201
 ```
 
 ## Development
-- 
cgit 1.4.1
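
A minimal sketch for checking the directory layout that the README steps in this patch set up, before starting the server. It assumes that unpacking punkt.zip creates a `punkt/` directory inside `tokenizers/`, and that NLTK looks for `tokenizers/punkt` under each entry of the `NLTK_DATA` environment variable. The helper name `find_punkt` is hypothetical and not part of GeneCup or NLTK; it only inspects the filesystem and does not load the tokenizer.

```python
import os
from pathlib import Path


def find_punkt(nltk_data=None):
    """Return the first <entry>/tokenizers/punkt directory found, or None.

    nltk_data is a search path in the same form as the NLTK_DATA
    environment variable (entries separated by os.pathsep). When it is
    None, the value of NLTK_DATA is used. This is a hypothetical helper
    for illustration only.
    """
    if nltk_data is None:
        nltk_data = os.environ.get("NLTK_DATA", "")
    for entry in nltk_data.split(os.pathsep):
        if not entry:
            continue  # skip empty entries, e.g. when NLTK_DATA is unset
        candidate = Path(entry) / "tokenizers" / "punkt"
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    found = find_punkt()
    print(found if found else "punkt not found under NLTK_DATA")
```

Run with `NLTK_DATA=./minipubmed` set, as in the server command, this should print the path to the unpacked `punkt` directory if the download steps were followed.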