author | Pjotr Prins | 2023-06-18 14:30:50 +0200 |
---|---|---|
committer | Pjotr Prins | 2023-06-18 14:30:50 +0200 |
commit | 24e3cc15aed86b6922a9c1c2c52f60d160f36e18 (patch) | |
tree | 8a369086bbb6f2d4ffe04be5955fc614b706eae4 | |
parent | 4e5bd4c2af89885b1fc4ceb33d54427898b5eed1 (diff) | |
download | gn-gemtext-24e3cc15aed86b6922a9c1c2c52f60d160f36e18.tar.gz |
Started work on installation docs again -- moved out of GN README
-rw-r--r-- | topics/deployment.gmi | 7 |
-rw-r--r-- | topics/developing-against-gn.gmi | 198 |
-rw-r--r-- | topics/installation.gmi | 347 |
3 files changed, 549 insertions, 3 deletions
diff --git a/topics/deployment.gmi b/topics/deployment.gmi
index 92a2c01..7cf6dec 100644
--- a/topics/deployment.gmi
+++ b/topics/deployment.gmi
@@ -2,9 +2,7 @@
 # Description
-This page attempts to document the deployment process we have for GeneNetwork.
-We use Guix system containers for deployment of CI/CD and
-the Guix configuration for the CI/CD container should be considered the authoritative reference.
+This page attempts to document the deployment process we have for GeneNetwork. We use Guix system containers for deployment of CI/CD and the Guix configuration for the CI/CD container should be considered the authoritative reference.
 => https://github.com/genenetwork/genenetwork-machines/blob/main/genenetwork-development.scm
@@ -14,7 +12,10 @@ See also
 ## genenetwork2
+To install GN2 by hand for development we also track
+=> ./developing-against-gn.gmi
+=> ./installation.gmi
 ## genenetwork3
diff --git a/topics/developing-against-gn.gmi b/topics/developing-against-gn.gmi
new file mode 100644
index 0000000..b94b681
--- /dev/null
+++ b/topics/developing-against-gn.gmi
@@ -0,0 +1,198 @@
+# Developing against GeneNetwork
+
+## Configuration
+
+GeneNetwork2 comes with a [default configuration file](./etc/default_settings.py)
+which can be used as a starting point.
+
+The recommended way to deal with the configurations is to **copy** this default configuration file to a location outside of the repository, say,
+
+```sh
+.../genenetwork2$ cp etc/default_settings.py "${HOME}/configurations/gn2.py"
+```
+
+then change the appropriate values in the new file. You can then pass in the new
+file as the configuration file when launching the application,
+
+```sh
+.../genenetwork2$ bin/genenetwork2 "${HOME}/configurations/gn2.py" <command-to-run>
+```
+
+The other option is to override the configurations in `etc/default_settings.py`
+by setting the configuration you want to override as an environment variable e.g.
+to override the `SQL_URI` value, you could do something like:
+
+```sh
+.../genenetwork2$ env SQL_URI="mysql://<user>:<passwd>@<host>:<port>/<db_name>" \
+    bin/genenetwork2 "${HOME}/configurations/gn2.py" <command-to-run>
+```
+
+replacing the placeholders in the angle brackets with appropriate values.
+
+For a detailed breakdown of the configuration variables and their use, see the
+[configuration documentation](doc/configurations.org)
+
+## Run
+
+Once GN2 is installed, it can be started and then used through a browser
+interface
+
+```sh
+genenetwork2
+```
+
+A quick example is
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+    GENENETWORK_FILES=~/data/gn2_data/ \
+    GN_PROXY_URL="http://localhost:8080" \
+    GN3_LOCAL_URL="http://localhost:8081" \
+    SPARQL_ENDPOINT=http://localhost:8892/sparql \
+    ./bin/genenetwork2 ./etc/default_settings.py -gunicorn-dev
+```
+
+For full examples (you may need to set a number of environment
+variables), including running scripts and a Python REPL, also see the
+startup script [./bin/genenetwork2](https://github.com/genenetwork/genenetwork2/blob/testing/bin/genenetwork2).
+
+Also mariadb and redis need to be running, see
+[INSTALL](./doc/README.org).
+
+## Debugging
+
+To run the application under the pdb debugger, you can add the `--with-pdb`
+option when launching the application, for example:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+    GENENETWORK_FILES=~/data/gn2_data/ \
+    GN_PROXY_URL="http://localhost:8080" \
+    GN3_LOCAL_URL="http://localhost:8081" \
+    SPARQL_ENDPOINT=http://localhost:8892/sparql \
+    ./bin/genenetwork2 ./etc/default_settings.py --with-pdb
+```
+
+**NOTE**: This should only ever be run in development.
+**NOTE 2**: You will probably need to tell pdb to continue at least once before
+the system begins serving the pages.
+
+Now, you can add the `breakpoint()` call wherever you need to debug and the
+terminal where you started the application with `--with-pdb` will allow you to
+issue commands to pdb to debug your application.
+
+## Development
+
+It may be useful to pull in the GN3 python modules locally. For this
+use the `GN3_PYTHONPATH` environment variable, which gets injected in
+the ./bin/genenetwork2 startup.
+
+A continuously deployed instance of genenetwork2 is available at
+[https://cd.genenetwork.org/](https://cd.genenetwork.org/). This
+instance is redeployed on every commit provided that the [continuous
+integration tests](https://ci.genenetwork.org/jobs/genenetwork2) pass.
+
+## Testing
+
+To have tests pass, the redis and mariadb instances should be running, because of
+asserts sprinkled in the code base.
+
+Right now, the only tests running in CI are unit tests. Please make
+sure the existing unit tests are green when submitting a PR.
+
+From the root directory of the repository, you can run the tests with something
+like:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+    SQL_URI=<uri-to-override-the-default> \
+    ./bin/genenetwork2 ./etc/default_settings.py \
+    -c -m pytest -vv
+```
+
+In the case where you use the default `etc/default_settings.py` configuration file, you can override any setting as demonstrated with the `SQL_URI` setting in the command above.
+
+In order to avoid having to set up a whole host of settings every time with the `env` command, you could copy the `etc/default_settings.py` file to a new location (outside the repository is best), and pass that to `bin/genenetwork2` instead.
+
+See
+[./bin/genenetwork2](https://github.com/genenetwork/genenetwork2/blob/testing/doc/docker-container.org)
+for more details.
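As a minimal sketch of the `SQL_URI` override described above, the URI can be assembled from its parts in the shell before launching. The helper name `make_sql_uri` and the sample values are hypothetical; only the `mysql://<user>:<passwd>@<host>:<port>/<db_name>` layout comes from this document.

```sh
# Hypothetical helper (not part of GN2): assemble a SQL_URI from its parts.
# Assumes none of the values contain characters that need URL escaping.
make_sql_uri() {
    # usage: make_sql_uri USER PASSWD HOST PORT DB_NAME
    printf 'mysql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example values only; replace them with those of your own database.
SQL_URI="$(make_sql_uri gn2 secret 127.0.0.1 3306 db_webqtl_s)"
export SQL_URI
echo "$SQL_URI"
```

With `SQL_URI` exported this way, the `env SQL_URI=...` prefix can be dropped from the test command, since the startup script inherits the variable from the shell.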
+
+#### Mechanical Rob
+
+We are building 'Mechanical Rob' automated testing using Python
+[requests](https://github.com/genenetwork/genenetwork2/tree/testing/test/requests)
+which can be run with:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest \
+    GN_PROXY_URL="http://localhost:8080" \
+    GN3_LOCAL_URL="http://localhost:8081" \
+    ./bin/genenetwork2 ./etc/default_settings.py -c \
+    ../test/requests/test-website.py -a http://localhost:5003
+```
+
+The GN2_PROFILE is the Guix profile that contains all
+dependencies. The ./bin/genenetwork2 script sets up the environment
+and executes test-website.py in a Python interpreter. The -a switch
+says to run all tests and the URL points to the running GN2 http
+server.
+
+#### Unit tests
+
+To run unit tests, first `cd` into the genenetwork2 directory:
+
+```sh
+# You can use the coverage tool to run the tests
+# You could omit the -v which makes the output verbose
+runcmd coverage run -m unittest discover -v
+
+# Alternatively, you could run the unittests using:
+runpython -m unittest discover -v
+
+# To generate a report in wqflask/coverage_html_report/:
+runcmd coverage html
+```
+
+The `runcmd` and `runpython` are shell aliases defined in the following way:
+
+```sh
+# Single quotes keep the inner double quotes intact until the alias is used
+alias runpython='env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ GN_PROXY_URL="http://localhost:8080" GN3_LOCAL_URL="http://localhost:8081" ./bin/genenetwork2'
+
+alias runcmd='time env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ GN_PROXY_URL="http://localhost:8080" GN3_LOCAL_URL="http://localhost:8081" ./bin/genenetwork2 ./etc/default_settings.py -cli'
+```
+
+Replace some of the env variables as per your use case.
+
+### Troubleshooting
+
+If the menu does not pop up check your `GN2_BASE_URL`. E.g.
+
+```
+curl http://gn2-pjotr.genenetwork.org/api/v_pre1/gen_dropdown
+```
+
+check the logs.
If there is ERROR 1054 (42S22): Unknown column
+'InbredSet.Family' in 'field list' it may be that you are using the small
+database.
+
+### Run Scripts
+
+As part of the profiling effort, some scripts are added to run specific parts of the system under a profiler without running the entire web-server - as such, to run the script, you could do something like:
+
+```
+env HOME=/home/frederick \
+    GN2_PROFILE=~/opt/gn2-latest \
+    GN3_DEV_REPO_PATH=~/genenetwork/genenetwork3 \
+    SQL_URI="mysql://username:password@host-ip:host-port/db_webqtl" \
+    SPARQL_ENDPOINT=http://localhost:8892/sparql \
+    SERVER_PORT=5001 \
+    bin/genenetwork2 ../gn2_settings.py \
+    -cli python3 -m scripts.profile_corrs \
+    ../performance_$(date +"%Y%m%dT%H:%M:%S").profile
+```
+
+and you can find the performance metrics at the file specified, in this case, a file starting with `performance_` with the date and time of the run, and ending with `.profile`.
+
+Please replace the environment variables in the sample command above with the appropriate values for your environment.
diff --git a/topics/installation.gmi b/topics/installation.gmi
new file mode 100644
index 0000000..74bfe91
--- /dev/null
+++ b/topics/installation.gmi
@@ -0,0 +1,347 @@
+# Installation
+
+* Introduction
+
+Large system deployments can get very [[http://genenetwork.org/environments/][complex]]. In this document we
+explain the GeneNetwork version 2 (GN2) reproducible deployment system
+which is based on GNU Guix (see also [[https://github.com/pjotrp/guix-notes/blob/master/README.md][Guix-notes]]). The Guix
+system can be used to install GN with all its files and dependencies.
+
+The official installation path is from a checked out version of the
+main Guix package tree and that of the Genenetwork package
+tree. Current supported versions can be found as the SHA values of
+'gn-latest' branches of [[https://gitlab.com/genenetwork/guix-bioinformatics][Guix bioinformatics]] and [[https://gitlab.com/genenetwork/guix][GNU Guix]].
+
+For a full view of runtime dependencies as defined by GNU Guix, see
+an example of the [[#gn2-dependency-graph][GN2 Dependency Graph]].
+
+* Check list
+
+To run GeneNetwork the following services need to function:
+
+1. [ ] GNU Guix with a guix profile for genenetwork2
+1. [ ] A path to the (static) genotype files
+1. [ ] Gn-proxy for authentication
+1. [ ] The genenetwork3 service
+1. [ ] Redis
+1. [ ] Mariadb
+
+* Installing Guix packages
+
+Make sure to install GNU Guix using the binary download instructions
+on the main website. Follow the instructions on
+[[GUIX-Reproducible-from-source.org]] to download pre-built binaries. Note
+the download amounts to several GBs of data. Debian-derived distros
+may support
+
+: apt-get install guix
+
+* Creating a GNU Guix profile
+
+We run a GNU Guix channel with packages at [[https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics][guix-bioinformatics]]. The README has instructions for hosting a channel, but typically we use the GUIX_PACKAGE_PATH instead. First upgrade to a recent guix with
+
+: mkdir ~/opt
+: guix pull -p ~/opt/guix-pull
+
+It should upgrade (ignore the locales warnings). You can optionally specify the specific git checkout of guix with
+
+: guix pull -p ~/opt/guix-pull --commit=f04883d
+
+which is useful when you need to roll back to an earlier version (sometimes our channel goes out of sync).
Next, we install GeneNetwork2 with
+
+: source ~/opt/guix-pull/etc/profile
+: git clone https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git ~/guix-bioinformatics
+: cd ~/guix-bioinformatics
+: env GUIX_PACKAGE_PATH=$HOME/guix-bioinformatics guix package -i genenetwork2 -p ~/opt/genenetwork2
+
+You probably also need guix-past (the upstream channel for older packages):
+
+: git clone https://gitlab.inria.fr/guix-hpc/guix-past.git ~/guix-past
+: cd ~/guix-past
+: env GUIX_PACKAGE_PATH=$HOME/guix-bioinformatics:$HOME/guix-past/modules ~/opt/guix-pull/bin/guix package -i genenetwork2 -p ~/opt/genenetwork2
+
+Ignore the warnings. Guix should install the software without trying
+to build everything. If your system insists on building all packages,
+try the `--dry-run` switch and fix the [[https://guix.gnu.org/manual/en/html_node/Substitute-Server-Authorization.html][substitutes]]. You may add the
+`--substitute-urls="http://guix.genenetwork.org https://ci.guix.gnu.org https://mirror.hydra.gnu.org"` switch.
+
+The guix.genenetwork.org server has most of our packages pre-built(!). To use
+it on your own machine the public key is
+
+#+begin_src scheme
+(public-key
+ (ecc
+  (curve Ed25519)
+  (q #E50F005E6DA2F85749B9AA62C8E86BB551CE2B541DC578C4DBE613B39EC9E750#)))
+#+end_src
+
+Once we have a GNU Guix profile, a running database (see below) and the file storage,
+we should be ready to fire up GeneNetwork:
+
+* Running GN2
+
+Check out the source with git:
+
+: git clone git@github.com:genenetwork/genenetwork2.git
+: cd genenetwork2
+
+Run GN2 with the above Guix profile
+
+: export GN2_PROFILE=$HOME/opt/genenetwork2
+: env TMPDIR=$HOME/tmp WEBSERVER_MODE=DEBUG LOG_LEVEL=DEBUG SERVER_PORT=5012 GENENETWORK_FILES=/export/data/genenetwork/genotype_files SQL_URI=mysql://webqtlout:webqtlout@localhost/db_webqtl ./bin/genenetwork2 etc/default_settings.py -gunicorn-dev
+
+The debug and logging switches can be particularly useful when
+developing GN2.
Location and files are the current ones for Penguin2.
+
+It may be useful to tunnel the web server to your local browser with
+an ssh tunnel:
+
+If you want to test a service running on the server on a certain
+port (say 8202) use
+
+    ssh -L 8202:127.0.0.1:8202 -f -N myname@penguin2.genenetwork.org
+
+And browse on your local machine to http://localhost:8202/
+
+* Run gn-proxy
+
+GeneNetwork requires a separate gn-proxy server which handles
+authorisation and access control. For instructions see the
+[[https://github.com/genenetwork/gn-proxy][README]]. Note it may already be running on our servers!
+
+* Run Redis
+
+Redis is part of the GN2 deployment and will be started by the ./bin/genenetwork2
+startup script.
+
+* Run MariaDB server
+** Install MariaDB with GNU Guix
+
+These are the steps you can take to install a fresh installation of
+mariadb (which comes as part of the GNU Guix genenetwork2 install).
+
+As root configure the Guix profile
+
+: . ~/opt/genenetwork2/etc/profile
+
+and run for example
+
+#+BEGIN_SRC bash
+adduser mariadb && addgroup mariadb
+mkdir -p /export2/mariadb/database
+chown mariadb:mariadb -R /export2/mariadb/
+mkdir -p /var/run/mysqld
+chown mariadb:mariadb /var/run/mysqld
+su mariadb
+mysql --version
+# mysql Ver 15.1 Distrib 10.1.45-MariaDB, for Linux (x86_64) using readline 5.1
+mysql_install_db --user=mariadb --datadir=/export2/mariadb/database
+mysqld -u mariadb --datadir=/exportdb/mariadb/database/mariadb --explicit_defaults_for_timestamp -P 12048
+#+END_SRC
+
+If you want to run as root you may have to set
+
+: /etc/my.cnf
+: [mariadbd]
+: user=root
+
+You also need to set
+
+: ft_min_word_len = 3
+
+to make sure short-word text searches (e.g. shh) work, and rebuild the tables if
+required.
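A minimal sketch of the full-text setting above as a configuration fragment. It assumes the setting belongs in the same `[mariadbd]` group shown earlier; the function name `print_fulltext_cnf` is hypothetical, and the fragment is only printed so it can be reviewed before being merged into `/etc/my.cnf`.

```sh
# Hypothetical helper: print a my.cnf fragment for short-word full-text search.
# Assumption: your server reads the [mariadbd] group (older servers may
# expect [mysqld] instead).
print_fulltext_cnf() {
    cat <<'EOF'
[mariadbd]
ft_min_word_len = 3
EOF
}

print_fulltext_cnf
```

After changing the value and restarting the server, existing full-text indexes still reflect the old minimum length. For MyISAM tables, `REPAIR TABLE <table> QUICK;` is the usual way to rebuild them; check this against the documentation for your MariaDB version.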
+
+To check error output in a file on start-up run with something like
+
+: mariadbd -u mariadb --console --explicit_defaults_for_timestamp --datadir=/gnu/mariadb --log-error=~/test.log
+
+Another tip is that Guix installs mariadbd in your profile, so this may work
+
+: /home/user/.guix-profile/bin/mariadbd -u mariadb --explicit_defaults_for_timestamp --datadir=/gnu/mariadb
+
+When you get errors like:
+
+: sqlalchemy.exc.IntegrityError: (_mariadb_exceptions.IntegrityError) (1215, 'Cannot add foreign key constraint')
+
+you may need to set
+
+: set foreign_key_checks=0
+
+** Load the small database in MySQL
+
+At this point we require the underlying distribution to install and
+run mysqld (see next section for GNU Guix). Currently we have two databases for deployment:
+'db_webqtl_s' is the small testing database containing experiments
+from BXD mice and 'db_webqtl_plant' contains all plant related
+material.
+
+Download one database from
+
+http://ipfs.genenetwork.org/ipfs/QmRUmYu6ogxEdzZeE8PuXMGCDa8M3y2uFcfo4zqQRbpxtk
+
+After installation unzip the database binary in the MySQL directory
+
+#+BEGIN_SRC sh
+cd ~/mysql
+p7zip -d db_webqtl_s.7z
+chown -R mysql:mysql db_webqtl_s/
+chmod 700 db_webqtl_s/
+chmod 660 db_webqtl_s/*
+#+END_SRC
+
+Restart the MySQL service (mysqld). Log in as root
+
+: mysql_upgrade -u root --force
+
+: mysql -u root
+
+and
+
+: mysql> show databases;
+: +--------------------+
+: | Database           |
+: +--------------------+
+: | information_schema |
+: | db_webqtl_s        |
+: | mysql              |
+: | performance_schema |
+: +--------------------+
+
+Set permissions and match password in your settings file below:
+
+: mysql> grant all privileges on db_webqtl_s.* to gn2@"localhost" identified by 'webqtl';
+
+You may need to change "localhost" to whatever domain you are
+connecting from (mysql will give an error).
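When the client connects from another host, the same grant has to be repeated with that host name. A small sketch that prints the statement for a given host follows; the function name `grant_statement` and the address `192.168.1.10` are hypothetical, and the statement itself mirrors the `grant ... identified by` form used above.

```sh
# Hypothetical helper: print a GRANT statement for a given database, user,
# host and password. Assumes the values contain no single quotes.
grant_statement() {
    # usage: grant_statement DB USER HOST PASSWORD
    printf "grant all privileges on %s.* to '%s'@'%s' identified by '%s';\n" \
        "$1" "$2" "$3" "$4"
}

# Example values only.
grant_statement db_webqtl_s gn2 192.168.1.10 webqtl
```

The output can be reviewed and then pasted into the `mysql>` prompt, or piped into `mysql -u root`.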
+
+Note that if the mysql connection is not working, try connecting to
+the IP address and check server firewall, hosts.allow and mysql IP
+configuration (see below).
+
+Note for the plant database you can rename it to db_webqtl_s, or
+change the settings in etc/default_settings.py to match your path.
+
+* Get genotype files
+
+The script looks for genotype files. You can find them in
+http://ipfs.genenetwork.org/ipfs/QmXQy3DAUWJuYxubLHLkPMNCEVq1oV7844xWG2d1GSPFPL
+
+#+BEGIN_SRC sh
+mkdir -p $HOME/genotype_files
+cd $HOME/genotype_files
+
+#+END_SRC
+
+* GN2 Dependency Graph
+
+List of all runtime dependencies for GN2 as installed by GNU Guix.
+
+https://genenetwork.org/environments/
+
+* Working with the GN2 source code
+
+See [[development.org]].
+
+* Read more
+
+If you want to understand the architecture of GN2 read
+[[Architecture.org]]. The rest of this document is mostly on deployment
+of GN2.
+
+* Troubleshooting
+
+** ImportError: No module named jinja2
+
+If you have all the Guix packages installed this error points out that
+the environment variables are not set. Copy-paste the paths into your
+terminal (mainly so PYTHONPATH and R_LIBS_SITE are set) from the
+information given by guix:
+
+: guix package --search-paths
+
+On one system:
+
+: export PYTHONPATH="$HOME/.guix-profile/lib/python3.8/site-packages"
+: export R_LIBS_SITE="$HOME/.guix-profile/site-library/"
+: export GEM_PATH="$HOME/.guix-profile/lib/ruby/gems/2.2.0"
+
+and perhaps a few more.
+** ERROR: 'can not find directory $HOME/gn2_data' or 'can not find directory $HOME/genotype_files/genotype'
+
+The default settings file looks in your $HOME/gn2_data. Since these
+files come with a Guix installation you should take a hint from the
+values in the installed version of default_settings.py (see above in
+this document).
+
+You can use the GENENETWORK_FILES switch to set the datadir, for example
+
+: env GN2_PROFILE=~/opt/gn-latest GENENETWORK_FILES=/gnu/data/gn2_data ./bin/genenetwork2
+
+** Can't run a module
+
+In rare cases, development modules are not brought in with Guix
+because no source code is available. This can lead to missing modules
+on a running server. Please check with the authors when a module
+is missing.
+** Rpy2 error 'show' not found
+
+This error
+
+: __show = rpy2.rinterface.baseenv.get("show")
+: LookupError: 'show' not found
+
+means that R was updated in your path, and that Rpy2 needs to be
+recompiled against this R - don't you love informative messages?
+
+In our case it means that GN's PYTHONPATH is not in sync with
+R_LIBS_SITE. Please check your GNU Guix GN2 installation paths,
+you may need to reinstall. Note that this may be the point you
+may want to start using profiles (see profile section).
+
+** Mysql can't connect server through socket ERROR
+
+The following error
+
+: sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2002, 'Can\'t connect to local MySQL server through socket \'/run/mysqld/mysqld.sock\' (2 "No such file or directory")')
+
+means that the client is trying to connect locally to a non-existent MySQL
+server, something you may see in a container. It can typically be reproduced with something like
+
+: mysql -h localhost
+
+Try to connect over the network interface instead, e.g.
+
+: mysql -h 127.0.0.1
+
+If that works, run genenetwork after setting SQL_URI to something like
+
+: export SQL_URI=mysql://gn2:mysql_password@127.0.0.1/db_webqtl_s
+
+* NOTES
+
+** Deploying GN2 official
+
+Let's see how fast we can deploy a second copy of GN2.
+
+- [ ] Base install
+  + [ ] First install a Debian server with GNU Guix on board
+  + [ ] Get Guix build going
+    - [ ] Build the correct version of Guix
+    - [ ] Check out the correct gn-stable version of guix-bioinformatics http://git.genenetwork.org/pjotrp/guix-bioinformatics
+    - [ ] guix package -i genenetwork2 -p /usr/local/guix-profiles/gn2-stable
+  + [ ] Create a gn2 user and home with space
+  + [ ] Install redis
+    - [ ] add to systemd
+    - [ ] update redis.cnf
+    - [ ] update database
+  + [ ] Install mariadb (currently debian mariadb-server)
+    - [ ] add to systemd
+    - [ ] system stop mysql
+    - [ ] update mysql.cnf
+    - [ ] update database (see gn-services/services/mariadb.md)
+    - [ ] check tables
+  + [ ] run gn2
+  + [ ] update nginx
+  + [ ] install genenetwork3
+    - [ ] add to systemd
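While working through the base-install checklist, it can help to see which command-line tools are still missing. The sketch below is one way to do that; the function name `missing_tools` is hypothetical, and the tool names passed to it are assumptions that may differ on your distribution.

```sh
# Hypothetical helper: print every command from the argument list that is
# not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Tool names are assumptions; adjust them to your distribution.
missing="$(missing_tools guix redis-server mariadbd nginx)"
if [ -n "$missing" ]; then
    echo "Not yet installed or not on PATH:"
    echo "$missing"
else
    echo "All listed tools were found."
fi
```

This only checks that the binaries exist. It does not confirm that the services are running or configured.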