genenetwork3

GeneNetwork3 REST API for data science and machine learning

Installation

GNU Guix packages

Install GNU Guix. This can be done on any running Linux system.

There are at least three ways to start GeneNetwork3 with GNU Guix:

Create an environment with guix shell
Create a container with guix shell -C
Use a profile and shell settings with source ~/opt/genenetwork3/etc/profile

Create an environment:

Simply load up the environment (for development purposes):

guix shell -Df guix.scm

Also, make sure you have the guix-bioinformatics channel set up.

guix shell --expose=$HOME/genotype_files/ -Df guix.scm
python3
  import redis
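
The `import redis` check above can be extended to several modules at once. The following is a minimal sketch and not part of GeneNetwork3: the function name `missing_modules` and the list of modules checked are assumptions chosen for illustration.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    Only top-level names are supported here; dotted names whose parent
    package is absent would raise ModuleNotFoundError instead.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list only; adjust it to the modules you care about.
    missing = missing_modules(["redis", "flask"])
    print("missing modules:", missing or "none")
```

An empty result suggests the environment exposes the modules; it does not prove that the installed versions match what GeneNetwork3 expects.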

Run a Guix container

guix shell -C --network --expose=$HOME/genotype_files/ -Df guix.scm

Using a Guix profile (or rolling back)

Create a new profile with

guix package -i genenetwork3 -p ~/opt/genenetwork3

and load the profile settings with

source ~/opt/genenetwork3/etc/profile
start server...

Note that GN2 profiles include the GN3 profile (!). To roll genenetwork3 back, you can use either kind of profile in the same fashion (it is probably best to start a new shell first):

bash
source ~/opt/genenetwork2-older-version/etc/profile
set|grep store
run tests, server etc...

Troubleshooting Guix packages

If you get a Guix error such as ice-9/boot-9.scm:1669:16: In procedure raise-exception: error: python-sqlalchemy-stubs: unbound variable, it typically means that Guix needs to be updated to the latest version (i.e., guix pull):

guix pull
source ~/.config/guix/current/etc/profile

and try again. Also make sure your ~/guix-bioinformatics is up to date.

Running Tests

(assuming you are in a guix container; otherwise use venv!)

To run tests:

pytest

To specify unit-tests:

pytest -k unit_test

Running pylint:

pylint *py tests gn3

Running mypy (type checker):

mypy .

Running the GN3 web service

To spin up the server on its own (for development):

env FLASK_DEBUG=1 FLASK_APP="main.py" flask run --port=8080

And test with

curl localhost:8080/api/version
"1.0"
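
The same check can be scripted from Python. This is a sketch under assumptions: it supposes the server is reachable at the address used in the curl example and that the response body is JSON, as the quoted "1.0" output suggests. The helper names `version_url` and `fetch_version` are illustrative and not part of GN3.

```python
import json
from urllib.request import urlopen


def version_url(base_url):
    """Build the version endpoint URL from a base URL."""
    return base_url.rstrip("/") + "/api/version"


def fetch_version(base_url="http://localhost:8080", timeout=5):
    """Request /api/version and decode the body, assuming it is JSON."""
    with urlopen(version_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_version()` while the development server is running should return the version reported by the server; it raises `urllib.error.URLError` if nothing is listening on that port.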

To run with gunicorn

gunicorn --bind 0.0.0.0:8080 wsgi:app

For development, consider the following options: --bind 0.0.0.0:$SERVER_PORT --workers=1 --timeout 180 --reload wsgi.

And for the scalable production version run

gunicorn --bind 0.0.0.0:8080 --workers 8 --keep-alive 6000 --max-requests 10 --max-requests-jitter 5 --timeout 1200 wsgi:app

(see also the .guix_deploy script)

Using python-pip

IMPORTANT NOTE: we do not recommend using pip tools; use Guix instead.

Prepare your system. You need to make sure you have python > 3.8 and the ability to install modules.
Create and enter your virtualenv:

virtualenv --python python3 venv
. venv/bin/activate

Install the required packages

# The --ignore-installed flag forces packages to
# get installed in the venv even if they existed
# in the global env
pip install -r requirements.txt --ignore-installed

A note on dependencies

Make sure that the dependencies in the requirements.txt file match those in guix. To freeze dependencies:

# Consistent way to ensure you don't capture globally
# installed packages
pip freeze --path venv/lib/python3.8/site-packages > requirements.txt

Genotype Files

You can get the genotype files from http://ipfs.genenetwork.org/ipfs/QmXQy3DAUWJuYxubLHLkPMNCEVq1oV7844xWG2d1GSPFPL and save them on your host machine at, say, $HOME/genotype_files, with something like:

$ mkdir -p $HOME/genotype_files
$ cd $HOME/genotype_files
$ yes | 7z x genotype_files.tar.7z
$ tar xf genotype_files.tar

The genotype_files.tar.7z file seems to only contain the BXD.geno genotype file.

QTLReaper (rust-qtlreaper) and Trait Files

To run QTL computations, this system makes use of the rust-qtlreaper utility.

To do this, the system needs to export the trait data into a tab-separated file that can then be passed to the utility using the --traits option. For more information about the available options, please see the rust-qtlreaper repository.

Traits File Format

The traits file begins with a header row with the column headers. The first column in the file has the header "Trait". Every other column has a header for one of the strains under consideration.

Under the "Trait" column, the traits are numbered from T1 to Tn, where n is the total number of traits under consideration.

As an example, you could end up with a trait file like the following:

Trait   BXD27   BXD32   DBA/2J  BXD21   ...
T1  10.5735 9.27408 9.48255 9.18253 ...
T2  6.4471  6.7191  5.98015 6.68051 ...
...

It is very important that the column header names for the strains correspond to the genotype file used.
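
The layout described above can be sketched in Python as follows. This is an illustration only, not the export code GeneNetwork3 uses: the function names `write_traits_file` and `strains_missing_from_genotype` are hypothetical, and the list of genotype strain names is assumed to be supplied by the caller rather than parsed from a genotype file.

```python
import csv
import io


def write_traits_file(handle, strains, traits):
    """Write a header row plus one tab-separated row per trait.

    `strains` is the ordered list of strain names for the header.
    `traits` is a list of rows, each holding one value per strain.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Trait", *strains])
    for number, values in enumerate(traits, start=1):
        if len(values) != len(strains):
            raise ValueError(
                f"trait T{number} has {len(values)} values, "
                f"expected {len(strains)}"
            )
        # Traits are labelled T1, T2, ... in the order given.
        writer.writerow([f"T{number}", *values])


def strains_missing_from_genotype(strains, genotype_strains):
    """Return trait-file strains absent from the genotype strain names."""
    known = set(genotype_strains)
    return [strain for strain in strains if strain not in known]


if __name__ == "__main__":
    # Values taken from the example trait file above.
    strains = ["BXD27", "BXD32", "DBA/2J", "BXD21"]
    traits = [
        [10.5735, 9.27408, 9.48255, 9.18253],
        [6.4471, 6.7191, 5.98015, 6.68051],
    ]
    buffer = io.StringIO()
    write_traits_file(buffer, strains, traits)
    print(buffer.getvalue(), end="")
```

Running `strains_missing_from_genotype` before invoking rust-qtlreaper is one way to catch header names that do not correspond to the genotype file.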

Partial Correlations

The partial correlations feature depends on the following external systems to run correctly:

Redis: Acts as a communications broker between the webserver and external processes
sheepdog/worker.py: Actually runs the external processes that do the computations

These two systems should be running in the background for the partial correlations feature to work correctly.
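
A quick way to see whether something is listening where Redis is expected is a plain TCP connection test. The sketch below assumes Redis runs on localhost at its default port 6379, which may differ from your configuration. The function name `is_listening` is hypothetical, and the check does not cover sheepdog/worker.py.

```python
import socket


def is_listening(host="localhost", port=6379, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

A True result only shows that some process accepts connections on that port; it does not confirm that the process is Redis or that the worker is running.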