From a7d660f91002a81d7bcfdfcae54b2d879bf5100b Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sat, 22 Jun 2024 04:28:16 -0500
Subject: README: add deployment and Guix info

---
 Readme.md | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

(limited to 'Readme.md')

diff --git a/Readme.md b/Readme.md
index d0ef0b4..b6888ea 100644
--- a/Readme.md
+++ b/Readme.md
@@ -2,7 +2,7 @@
 
 URL: [https://genecup.org](https://genecup.org)
 
-GeneCup automatically extracts information from PubMed and NHGRI-EBI GWAS catalog on the relationship of any gene with a custom list of keywords hierarchically organized into an ontology. The users create an ontology by identifying categories of concepts and a list of keywords for each concept. 
+GeneCup automatically extracts information from PubMed and NHGRI-EBI GWAS catalog on the relationship of any gene with a custom list of keywords hierarchically organized into an ontology. The users create an ontology by identifying categories of concepts and a list of keywords for each concept.
 
 As an example, we created an ontology for drug addiction related concepts over 300 of these keywords are organized into six categories:
 * names of abused drugs, e.g., opioids
@@ -17,24 +17,32 @@ Live searches are conducted through PubMed to get relevant PMIDs, which are then
 ## Top addiction related genes for addiction ontology
 
 0. extract gene symbol, alias and name from NCBI gene_info for taxid 9606.
-1. search PubMed to get a count of these names/alias, with addiction keywords and drug name 
-2. sort the genes with top counts, retrieve the abstracts and extract sentences with the 1) symbols and alias and 2) one of the keywords. manually check if there are stop words need to be removed. 
+1. search PubMed to get a count of these names/alias, with addiction keywords and drug name
+2. sort the genes with top counts, retrieve the abstracts and extract sentences with the 1) symbols and alias and 2) one of the keywords. manually check if there are stop words need to be removed.
 3. sort the genes based on the number of abstracts with useful sentences.
 4. generate the final list, include symbol, alias, and name
 
-## dependencies
+## Dependencies
 
 * [local copy of PubMed](https://dataguide.nlm.nih.gov/edirect/archive.html)
 * python == 3.8
-* see requirements.txt for list of packages and versions 
+* see requirements.txt for list of packages and versions
+
+## Deploy with GNU Guix
+
+The main genecup.org service is deployed deterministically (and self contained) using GNU Guix. See https://issues.genenetwork.org/topics/deploy/genecup and https://git.genenetwork.org/guix-bioinformatics/
+
+## Development
+
+The source code and data are in a git repository: https://git.genenetwork.org/genecup/
 
 ## Mini PubMed for testing
 
 For testing or code development, it is useful to have a small collection of PubMed abstracts in the same format as the local PubMed mirror. We provide 2473 abstracts that can be used to test four gene symbols (gria1, crhr1, drd2, and penk).
 
-1. install [edirect](https://dataguide.nlm.nih.gov/edirect/install.html) (make sure you refresh your shell after install so the PATH is updated) 
+1. install [edirect](https://dataguide.nlm.nih.gov/edirect/install.html) (make sure you refresh your shell after install so the PATH is updated)
 2. unpack the minipubmed.tgz file
-3. test the installation by running: 
+3. test the installation by running:
 ```
 cd minipubmed
 cat pmid.list |fetch-PubMed -path PubMed/Archive/ >test.xml
-- 
cgit v1.2.3