author: Pjotr Prins, 2023-06-18 14:30:50 +0200
committer: Pjotr Prins, 2023-06-18 14:30:50 +0200
commit: 24e3cc15aed86b6922a9c1c2c52f60d160f36e18
tree: 8a369086bbb6f2d4ffe04be5955fc614b706eae4 /topics
parent: 4e5bd4c2af89885b1fc4ceb33d54427898b5eed1
Started work on installation docs again -- moved out of GN README
Diffstat (limited to 'topics')
-rw-r--r-- topics/deployment.gmi 7
-rw-r--r-- topics/developing-against-gn.gmi 198
-rw-r--r-- topics/installation.gmi 347
3 files changed, 549 insertions, 3 deletions
diff --git a/topics/deployment.gmi b/topics/deployment.gmi
index 92a2c01..7cf6dec 100644
--- a/topics/deployment.gmi
+++ b/topics/deployment.gmi
@@ -2,9 +2,7 @@
# Description
-This page attempts to document the deployment process we have for GeneNetwork.
-We use Guix system containers for deployment of CI/CD and
-the Guix configuration for the CI/CD container should be considered the authoritative reference.
+This page attempts to document the deployment process we have for GeneNetwork. We use Guix system containers for deployment of CI/CD and the Guix configuration for the CI/CD container should be considered the authoritative reference.
=> https://github.com/genenetwork/genenetwork-machines/blob/main/genenetwork-development.scm
@@ -14,7 +12,10 @@ See also
## genenetwork2
+For installing GN2 by hand for development, see also
+=> ./developing-against-gn.gmi
+=> ./installation.gmi
## genenetwork3
diff --git a/topics/developing-against-gn.gmi b/topics/developing-against-gn.gmi
new file mode 100644
index 0000000..b94b681
--- /dev/null
+++ b/topics/developing-against-gn.gmi
@@ -0,0 +1,198 @@
+# Developing against GeneNetwork
+
+## Configuration
+
+GeneNetwork2 comes with a [default configuration file](./etc/default_settings.py)
+which can be used as a starting point.
+
+The recommended way to deal with the configurations is to **copy** this default configuration file to a location outside of the repository, say,
+
+```sh
+.../genenetwork2$ mkdir -p "${HOME}/configurations"
+.../genenetwork2$ cp etc/default_settings.py "${HOME}/configurations/gn2.py"
+```
+
+then change the appropriate values in the new file. You can then pass in the new
+file as the configuration file when launching the application,
+
+```sh
+.../genenetwork2$ bin/genenetwork2 "${HOME}/configurations/gn2.py" <command-to-run>
+```
+
+Alternatively, you can override individual settings from `etc/default_settings.py`
+by setting an environment variable with the same name as the setting. For example,
+to override the `SQL_URI` value, you could do something like:
+
+```sh
+.../genenetwork2$ env SQL_URI="mysql://<user>:<passwd>@<host>:<port>/<db_name>" \
+ bin/genenetwork2 "${HOME}/configurations/gn2.py" <command-to-run>
+```
+
+replacing the placeholders in the angle brackets with appropriate values.
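+
+The precedence between the settings file and the environment can be sketched in
+Python. This is a minimal illustration, not GN2's actual loader: it assumes that
+settings are the UPPER_CASE names defined in the file and that an environment
+variable with the same name simply replaces the file's value. `load_settings` is
+a hypothetical helper.
+
+```python
+import os
+import runpy
+
+
+def load_settings(path, environ=os.environ):
+    """Hypothetical loader: read UPPER_CASE names from a Python settings file,
+    then let environment variables with the same name override them."""
+    namespace = runpy.run_path(path)  # executes the settings file
+    settings = {key: value for key, value in namespace.items() if key.isupper()}
+    for key in settings:
+        if key in environ:
+            # Environment values are plain strings; no type conversion here.
+            settings[key] = environ[key]
+    return settings
+```
+
+For example, `load_settings("etc/default_settings.py", {"SQL_URI": "mysql://..."})`
+would return the file's values with only `SQL_URI` replaced.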
+
+For a detailed breakdown of the configuration variables and their use, see the
+[configuration documentation](doc/configurations.org).
+
+## Run
+
+Once GN2 is installed, it can be run and accessed through a browser
+interface:
+
+```sh
+genenetwork2
+```
+
+A quick example is
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+ GENENETWORK_FILES=~/data/gn2_data/ \
+ GN_PROXY_URL="http://localhost:8080" \
+ GN3_LOCAL_URL="http://localhost:8081" \
+ SPARQL_ENDPOINT=http://localhost:8892/sparql \
+ ./bin/genenetwork2 ./etc/default_settings.py -gunicorn-dev
+```
+
+For full examples (you may need to set a number of environment
+variables), including running scripts and a Python REPL, also see the
+startup script [./bin/genenetwork2](https://github.com/genenetwork/genenetwork2/blob/testing/bin/genenetwork2).
+
+Also, mariadb and redis need to be running; see
+[INSTALL](./installation.gmi).
+
+## Debugging
+
+To run the application under the pdb debugger, you can add the `--with-pdb`
+option when launching the application, for example:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+ GENENETWORK_FILES=~/data/gn2_data/ \
+ GN_PROXY_URL="http://localhost:8080" \
+ GN3_LOCAL_URL="http://localhost:8081" \
+ SPARQL_ENDPOINT=http://localhost:8892/sparql \
+ ./bin/genenetwork2 ./etc/default_settings.py --with-pdb
+```
+
+**NOTE**: This should only ever be run in development.
+
+**NOTE 2**: You will probably need to tell pdb to continue at least once before
+the system begins serving the pages.
+
+Now, you can add the `breakpoint()` call wherever you need to debug and the
+terminal where you started the application with `--with-pdb` will allow you to
+issue commands to pdb to debug your application.
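+
+As an illustration, a `breakpoint()` call goes directly into the code path you
+want to inspect. The function below is hypothetical and only shows placement.
+Python's standard `PYTHONBREAKPOINT=0` environment variable turns every
+`breakpoint()` call into a no-op, a useful safety net if one is left in by
+accident, provided the variable reaches the Python process that
+`./bin/genenetwork2` starts.
+
+```python
+def mean_of_trait_values(values):
+    """Hypothetical helper, used only to show where breakpoint() goes."""
+    cleaned = [value for value in values if value is not None]
+    # Execution pauses here and hands control to pdb in the terminal where
+    # the application was started (unless PYTHONBREAKPOINT=0 is set).
+    breakpoint()
+    return sum(cleaned) / len(cleaned)
+```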
+
+## Development
+
+It may be useful to pull in the GN3 Python modules from a local checkout. For
+this, set the `GN3_PYTHONPATH` environment variable, which gets injected by the
+./bin/genenetwork2 startup script.
+
+A continuously deployed instance of genenetwork2 is available at
+[https://cd.genenetwork.org/](https://cd.genenetwork.org/). This
+instance is redeployed on every commit provided that the [continuous
+integration tests](https://ci.genenetwork.org/jobs/genenetwork2) pass.
+
+## Testing
+
+For the tests to pass, the redis and mariadb instances must be running, because of
+asserts sprinkled throughout the code base.
+
+Right now, the only tests running in CI are unittests. Please make
+sure the existing unittests are green when submitting a PR.
+
+From the root directory of the repository, you can run the tests with something
+like:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+ SQL_URI=<uri-to-override-the-default> \
+ ./bin/genenetwork2 ./etc/default_settings.py \
+ -c -m pytest -vv
+```
+
+In the case where you use the default `etc/default_settings.py` configuration file, you can override any setting as demonstrated with the `SQL_URI` setting in the command above.
+
+In order to avoid having to set up a whole host of settings every time with the `env` command, you could copy the `etc/default_settings.py` file to a new location (outside the repository is best), and pass that to `bin/genenetwork2` instead.
+
+See
+[./bin/genenetwork2](https://github.com/genenetwork/genenetwork2/blob/testing/bin/genenetwork2)
+for more details.
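+
+For orientation, a minimal test module of the kind `unittest discover` (and
+pytest) picks up is sketched below. By default `unittest discover` collects
+files matching `test*.py`, so this would live in a file such as `test_ports.py`;
+the function under test, `parse_port`, is hypothetical.
+
+```python
+import unittest
+
+
+def parse_port(value):
+    """Hypothetical function under test: parse a SERVER_PORT value."""
+    port = int(value)
+    if not 0 < port < 65536:
+        raise ValueError(f"port out of range: {port}")
+    return port
+
+
+class TestParsePort(unittest.TestCase):
+    def test_valid_port(self):
+        self.assertEqual(parse_port("5300"), 5300)
+
+    def test_out_of_range_port(self):
+        with self.assertRaises(ValueError):
+            parse_port("70000")
+```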
+
+### Mechanical Rob
+
+We are building 'Mechanical Rob' automated testing using Python
+[requests](https://github.com/genenetwork/genenetwork2/tree/testing/test/requests)
+which can be run with:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest \
+ ./bin/genenetwork2 \
+ GN_PROXY_URL="http://localhost:8080" \
+ GN3_LOCAL_URL="http://localhost:8081 "\
+ ./etc/default_settings.py -c \
+ ../test/requests/test-website.py -a http://localhost:5003
+```
+
+The GN2_PROFILE is the Guix profile that contains all
+dependencies. The ./bin/genenetwork2 script sets up the environment
+and executes test-website.py in a Python interpreter. The -a switch
+says to run all tests and the URL points to the running GN2 http
+server.
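+
+The idea behind these request-based checks can be sketched with the standard
+library alone. This is not the code in test-website.py: `check_page` is a
+hypothetical helper, and `urllib` stands in for the `requests` package that the
+real tests use.
+
+```python
+import urllib.request
+
+
+def check_page(url, expected_text=None, timeout=10):
+    """Hypothetical helper: return True if url answers with HTTP 200 and, when
+    expected_text is given, the body contains it. HTTP errors, connection
+    failures and timeouts return False."""
+    try:
+        with urllib.request.urlopen(url, timeout=timeout) as response:
+            if response.status != 200:
+                return False
+            body = response.read().decode("utf-8", errors="replace")
+    except OSError:  # URLError, HTTPError and timeouts are all OSErrors
+        return False
+    return expected_text is None or expected_text in body
+```
+
+For example, `check_page("http://localhost:5003/")` should return True against a
+GN2 server running on that port.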
+
+### Unit tests
+
+To run unittests, first `cd` into the genenetwork2 directory:
+
+```sh
+# You can use the coverage tool to run the tests
+# You could omit the -v which makes the output verbose
+runcmd coverage run -m unittest discover -v
+
+# Alternatively, you could run the unittests using:
+runpython -m unittest discover -v
+
+# To generate a report in wqflask/coverage_html_report/:
+runcmd coverage html
+```
+
+The `runcmd` and `runpython` are shell aliases defined in the following way:
+
+```sh
+alias runpython='env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ GN_PROXY_URL="http://localhost:8080" GN3_LOCAL_URL="http://localhost:8081" ./bin/genenetwork2'
+
+alias runcmd='time env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ GN_PROXY_URL="http://localhost:8080" GN3_LOCAL_URL="http://localhost:8081" ./bin/genenetwork2 ./etc/default_settings.py -cli'
+```
+
+Replace some of the env variables as per your use case.
+
+### Troubleshooting
+
+If the menu does not pop up, check your `GN2_BASE_URL`, e.g.
+
+```
+curl http://gn2-pjotr.genenetwork.org/api/v_pre1/gen_dropdown
+```
+
+Also check the logs. If they show ERROR 1054 (42S22): Unknown column
+'InbredSet.Family' in 'field list', you may be trying to use the small
+database.
+
+### Run Scripts
+
+As part of the profiling effort, some scripts were added to run specific parts of the system under a profiler without running the entire web server. To run such a script, you could do something like:
+
+```
+env HOME=/home/frederick \
+ GN2_PROFILE=~/opt/gn2-latest \
+ GN3_DEV_REPO_PATH=~/genenetwork/genenetwork3 \
+ SQL_URI="mysql://username:password@host-ip:host-port/db_webqtl" \
+ SPARQL_ENDPOINT=http://localhost:8892/sparql \
+ SERVER_PORT=5001 \
+ bin/genenetwork2 ../gn2_settings.py \
+ -cli python3 -m scripts.profile_corrs \
+ ../performance_$(date +"%Y%m%dT%H:%M:%S").profile
+```
+
+and you can find the performance metrics at the file specified, in this case, a file starting with `performance_` with the date and time of the run, and ending with `.profile`.
+
+Please replace the environment variables in the sample command above with the appropriate values for your environment.
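+
+Assuming the script writes a standard cProfile dump (check scripts/profile_corrs.py
+to confirm; the `.profile` extension alone does not guarantee it), the results can
+be read back with Python's `pstats` module. `summarise_profile` is a hypothetical
+helper.
+
+```python
+import io
+import pstats
+
+
+def summarise_profile(path, limit=20, sort_key="cumulative"):
+    """Hypothetical helper (assumes a cProfile/pstats dump): return the `limit`
+    most expensive entries as text, sorted by cumulative time by default."""
+    buffer = io.StringIO()
+    stats = pstats.Stats(path, stream=buffer)
+    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
+    return buffer.getvalue()
+```
+
+Calling `print(summarise_profile("../performance_<timestamp>.profile"))` prints the
+top 20 entries; `sort_key="tottime"` ranks by time spent inside each function instead.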
diff --git a/topics/installation.gmi b/topics/installation.gmi
new file mode 100644
index 0000000..74bfe91
--- /dev/null
+++ b/topics/installation.gmi
@@ -0,0 +1,347 @@
+# Installation
+
+* Introduction
+
+Large system deployments can get very [[http://genenetwork.org/environments/][complex]]. In this document we
+explain the GeneNetwork version 2 (GN2) reproducible deployment system
+which is based on GNU Guix (see also [[https://github.com/pjotrp/guix-notes/blob/master/README.md][Guix-notes]]). The Guix
+system can be used to install GN with all its files and dependencies.
+
+The official installation path is from checked-out versions of the
+main Guix package tree and of the GeneNetwork package
+tree. The currently supported versions can be found as the SHA values of the
+'gn-latest' branches of [[https://gitlab.com/genenetwork/guix-bioinformatics][Guix bioinformatics]] and [[https://gitlab.com/genenetwork/guix][GNU Guix]].
+
+For a full view of runtime dependencies as defined by GNU Guix, see
+an example of the [[#gn2-dependency-graph][GN2 Dependency Graph]].
+
+* Checklist
+
+To run GeneNetwork the following services need to function:
+
+1. [ ] GNU Guix with a guix profile for genenetwork2
+1. [ ] A path to the (static) genotype files
+1. [ ] Gn-proxy for authentication
+1. [ ] The genenetwork3 service
+1. [ ] Redis
+1. [ ] Mariadb
+
+* Installing Guix packages
+
+Make sure to install GNU Guix using the binary download instructions
+on the main website. Follow the instructions on
+[[GUIX-Reproducible-from-source.org]] to download pre-built binaries. Note
+that the download amounts to several GBs of data. Debian-derived distros
+may also support
+
+: apt-get install guix
+
+* Creating a GNU Guix profile
+
+We run a GNU Guix channel with packages at [[https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics][guix-bioinformatics]]. The README has instructions for hosting a channel, but typically we use the GUIX_PACKAGE_PATH instead. First upgrade to a recent guix with
+
+: mkdir ~/opt
+: guix pull -p ~/opt/guix-pull
+
+It should upgrade (ignore the locales warnings). You can optionally pin a specific git commit of guix with
+
+: guix pull -p ~/opt/guix-pull --commit=f04883d
+
+which is useful when you need to roll back to an earlier version (sometimes our channel goes out of sync). Next, we install GeneNetwork2 with
+
+: source ~/opt/guix-pull/etc/profile
+: git clone https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git ~/guix-bioinformatics
+: cd ~/guix-bioinformatics
+: env GUIX_PACKAGE_PATH=$HOME/guix-bioinformatics guix package -i genenetwork2 -p ~/opt/genenetwork2
+
+you probably also need guix-past (the upstream channel for older packages):
+
+: git clone https://gitlab.inria.fr/guix-hpc/guix-past.git ~/guix-past
+: cd ~/guix-past
+: env GUIX_PACKAGE_PATH=$HOME/guix-bioinformatics:$HOME/guix-past/modules ~/opt/guix-pull/bin/guix package -i genenetwork2 -p ~/opt/genenetwork2
+
+Ignore the warnings. Guix should install the software without trying
+to build everything. If your system insists on building all packages,
+try the `--dry-run` switch and fix the [[https://guix.gnu.org/manual/en/html_node/Substitute-Server-Authorization.html][substitutes]]. You may add the
+`--substitute-urls="http://guix.genenetwork.org https://ci.guix.gnu.org https://mirror.hydra.gnu.org"` switch.
+
+guix.genenetwork.org has most of our packages pre-built(!). To use
+it on your own machine, authorize its public key (as root, with `guix archive --authorize`):
+
+#+begin_src scheme
+(public-key
+ (ecc
+ (curve Ed25519)
+ (q #E50F005E6DA2F85749B9AA62C8E86BB551CE2B541DC578C4DBE613B39EC9E750#)))
+#+end_src
+
+Once we have a GNU Guix profile, a running database (see below) and the file storage,
+we should be ready to fire up GeneNetwork:
+
+* Running GN2
+
+Check out the source with git:
+
+: git clone git@github.com:genenetwork/genenetwork2.git
+: cd genenetwork2
+
+Run GN2 with the Guix profile created above:
+
+: export GN2_PROFILE=$HOME/opt/genenetwork2
+: env TMPDIR=$HOME/tmp WEBSERVER_MODE=DEBUG LOG_LEVEL=DEBUG SERVER_PORT=5012 GENENETWORK_FILES=/export/data/genenetwork/genotype_files SQL_URI=mysql://webqtlout:webqtlout@localhost/db_webqtl ./bin/genenetwork2 etc/default_settings.py -gunicorn-dev
+
+The debug and logging switches are particularly useful when
+developing GN2. The locations and files shown are the current ones for Penguin2.
+
+It may be useful to tunnel the web server to your local browser with
+an ssh tunnel. To test a service running on the server on a certain
+port (say 8202), use
+
+: ssh -L 8202:127.0.0.1:8202 -f -N myname@penguin2.genenetwork.org
+
+and browse on your local machine to http://localhost:8202/
+
+* Run gn-proxy
+
+GeneNetwork requires a separate gn-proxy server which handles
+authorisation and access control. For instructions see the
+[[https://github.com/genenetwork/gn-proxy][README]]. Note it may already be running on our servers!
+
+* Run Redis
+
+Redis is part of the GN2 deployment and is started by the ./bin/genenetwork2
+startup script.
+
+* Run MariaDB server
+** Install MariaDB with GNU Guix
+
+These are the steps you can take to install a fresh installation of
+mariadb (which comes as part of the GNU Guix genenetwork2 install).
+
+As root, load the Guix profile
+
+: . ~/opt/genenetwork2/etc/profile
+
+and run for example
+
+#+BEGIN_SRC bash
+adduser mariadb && addgroup mariadb
+mkdir -p /export2/mariadb/database
+chown mariadb:mariadb -R /export2/mariadb/
+mkdir -p /var/run/mysqld
+chown mariadb:mariadb /var/run/mysqld
+su mariadb
+mysql --version
+# mysql Ver 15.1 Distrib 10.1.45-MariaDB, for Linux (x86_64) using readline 5.1
+mysql_install_db --user=mariadb --datadir=/export2/mariadb/database
+mysqld -u mariadb --datadir=/export2/mariadb/database --explicit_defaults_for_timestamp -P 12048
+#+END_SRC
+
+If you want to run as root you may have to set, in /etc/my.cnf,
+
+: [mariadbd]
+: user=root
+
+You also need to set
+
+: ft_min_word_len = 3
+
+to make sure full-text searches for three-letter words such as 'shh' work,
+and rebuild the full-text indexes of the affected tables if required.
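+
+Because the option file is INI-style, settings such as `ft_min_word_len` can
+also be written from a provisioning script. A minimal Python sketch, assuming
+the `[mariadbd]` group shown above; `write_mariadb_options` is a hypothetical
+helper. configparser writes `key = value`, which MariaDB accepts, but it drops
+comments and cannot parse `!include` directives, so only use it on simple files.
+
+```python
+import configparser
+
+
+def write_mariadb_options(path, options, group="mariadbd"):
+    """Hypothetical helper: set options in one group of a MariaDB option
+    file, keeping the other groups already present in the file."""
+    config = configparser.ConfigParser(allow_no_value=True, interpolation=None)
+    config.optionxform = str  # keep option names exactly as written
+    config.read(path)  # a missing file is silently skipped
+    if not config.has_section(group):
+        config.add_section(group)
+    for key, value in options.items():
+        config.set(group, key, str(value))
+    with open(path, "w") as handle:
+        config.write(handle)
+```
+
+For example, `write_mariadb_options("/etc/my.cnf", {"ft_min_word_len": 3})`; the
+server must be restarted for the new value to take effect.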
+
+To check error output in a file on start-up run with something like
+
+: mariadbd -u mariadb --console --explicit_defaults_for_timestamp --datadir=/gnu/mariadb --log-error=~/test.log
+
+Guix also installs mariadbd in your profile, so this may work:
+
+: /home/user/.guix-profile/bin/mariadbd -u mariadb --explicit_defaults_for_timestamp --datadir=/gnu/mariadb
+
+When you get errors like:
+
+: sqlalchemy.exc.IntegrityError: (_mariadb_exceptions.IntegrityError) (1215, 'Cannot add foreign key constraint')
+
+you may need to set
+
+: set foreign_key_checks=0
+
+** Load the small database in MySQL
+
+At this point we require the underlying distribution to install and
+run mysqld (see the previous section for GNU Guix). Currently we have two databases for deployment:
+'db_webqtl_s' is the small testing database containing experiments
+from BXD mice, and 'db_webqtl_plant' contains all plant-related
+material.
+
+Download one database from
+
+http://ipfs.genenetwork.org/ipfs/QmRUmYu6ogxEdzZeE8PuXMGCDa8M3y2uFcfo4zqQRbpxtk
+
+After downloading, unpack the database binary files into the MySQL directory:
+
+#+BEGIN_SRC sh
+cd ~/mysql
+p7zip -d db_webqtl_s.7z
+chown -R mysql:mysql db_webqtl_s/
+chmod 700 db_webqtl_s/
+chmod 660 db_webqtl_s/*
+#+END_SRC
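+
+The two chmod steps can also be scripted, for example when unpacking several
+databases. A minimal Python sketch; `restrict_database_dir` is a hypothetical
+helper, and chown is left out because it needs root.
+
+```python
+import os
+import stat
+
+
+def restrict_database_dir(path):
+    """Hypothetical helper: chmod 700 on the database directory and chmod 660
+    on the regular files directly inside it (roughly db_webqtl_s/* above)."""
+    os.chmod(path, stat.S_IRWXU)  # 700: only the owner may read, write, enter
+    file_mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP  # 660
+    with os.scandir(path) as entries:
+        for entry in entries:
+            if entry.is_file(follow_symlinks=False):
+                os.chmod(entry.path, file_mode)
+```
+
+For example, `restrict_database_dir(os.path.expanduser("~/mysql/db_webqtl_s"))`
+after the chown step.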
+
+Restart the MySQL service (mysqld), then log in as root:
+
+: mysql_upgrade -u root --force
+
+: mysql -u root
+
+and
+
+: mysql> show databases;
+: +--------------------+
+: | Database |
+: +--------------------+
+: | information_schema |
+: | db_webqtl_s |
+: | mysql |
+: | performance_schema |
+: +--------------------+
+
+Grant permissions, making sure the password matches the one in your settings file:
+
+: mysql> grant all privileges on db_webqtl_s.* to gn2@"localhost" identified by 'webqtl';
+
+You may need to change "localhost" to whatever domain you are
+connecting from (mysql will give an error).
+
+Note that if the mysql connection is not working, try connecting to
+the IP address and check server firewall, hosts.allow and mysql IP
+configuration (see below).
+
+Note for the plant database you can rename it to db_webqtl_s, or
+change the settings in etc/default_settings.py to match your path.
+
+* Get genotype files
+
+The script looks for genotype files. You can find them in
+http://ipfs.genenetwork.org/ipfs/QmXQy3DAUWJuYxubLHLkPMNCEVq1oV7844xWG2d1GSPFPL
+
+#+BEGIN_SRC sh
+mkdir -p $HOME/genotype_files
+cd $HOME/genotype_files
+
+#+END_SRC
+
+* GN2 Dependency Graph
+
+List of all runtime dependencies for GN2 as installed by GNU Guix.
+
+https://genenetwork.org/environments/
+
+* Working with the GN2 source code
+
+See [[development.org]].
+
+* Read more
+
+If you want to understand the architecture of GN2 read
+[[Architecture.org]]. The rest of this document is mostly on deployment
+of GN2.
+
+* Troubleshooting
+
+** ImportError: No module named jinja2
+
+If you have all the Guix packages installed, this error points out that
+the environment variables are not set. Copy-paste the paths into your
+terminal (mainly so PYTHONPATH and R_LIBS_SITE are set) from the
+information given by guix:
+
+: guix package --search-paths
+
+On one system:
+
+: export PYTHONPATH="$HOME/.guix-profile/lib/python3.8/site-packages"
+: export R_LIBS_SITE="$HOME/.guix-profile/site-library/"
+: export GEM_PATH="$HOME/.guix-profile/lib/ruby/gems/2.2.0"
+
+and perhaps a few more.
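+
+To see whether the Python you are running can find a module, and which profile
+it would be loaded from, `importlib.util.find_spec` helps. A small sketch; the
+helper names are hypothetical. Run it with the same interpreter and environment
+variables that GN2 uses.
+
+```python
+import importlib.util
+import sys
+
+
+def module_origin(name):
+    """Hypothetical helper: return the file a top-level module would be
+    imported from, or None if it is not found on the current sys.path."""
+    spec = importlib.util.find_spec(name)
+    return None if spec is None else spec.origin
+
+
+def report(names=("jinja2", "rpy2")):
+    """Print where each module resolves from, followed by sys.path."""
+    for name in names:
+        print(f"{name}: {module_origin(name) or 'NOT FOUND'}")
+    print("sys.path:")
+    for entry in sys.path:
+        print(f"  {entry}")
+```
+
+With the GN2 profile active, `report()` should show paths inside that profile;
+NOT FOUND, or a path from another profile, points to the search-path problem above.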
+** ERROR: 'can not find directory $HOME/gn2_data' or 'can not find directory $HOME/genotype_files/genotype'
+
+The default settings file looks in your $HOME/gn2_data. Since these
+files come with a Guix installation you should take a hint from the
+values in the installed version of default_settings.py (see above in
+this document).
+
+You can use the GENENETWORK_FILES environment variable to set the data directory, for example
+
+: env GN2_PROFILE=~/opt/gn-latest GENENETWORK_FILES=/gnu/data/gn2_data ./bin/genenetwork2
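+
+Judging from the error messages above, GN2 expects a `genotype` subdirectory
+under its data directory. The sketch below checks for it before starting the
+server; the `genotype` layout is inferred from the error message and the
+$HOME/gn2_data fallback from the paragraph above, so treat both as assumptions.
+`genotype_dir` and `check_genotype_dir` are hypothetical helpers.
+
+```python
+import os
+from pathlib import Path
+
+
+def genotype_dir(environ=os.environ):
+    """Hypothetical helper. Assumed layout: <GENENETWORK_FILES>/genotype,
+    falling back to $HOME/gn2_data when the variable is unset."""
+    root = environ.get("GENENETWORK_FILES") or str(Path.home() / "gn2_data")
+    return Path(root) / "genotype"
+
+
+def check_genotype_dir(environ=os.environ):
+    """Return (path, exists) so a startup wrapper can fail early."""
+    path = genotype_dir(environ)
+    return path, path.is_dir()
+```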
+
+** Can't run a module
+
+In rare cases, development modules are not brought in with Guix
+because no source code is available. This can lead to missing modules
+on a running server. Please check with the authors when a module
+is missing.
+** Rpy2 error 'show' not found
+
+This error
+
+: __show = rpy2.rinterface.baseenv.get("show")
+: LookupError: 'show' not found
+
+means that R was updated in your path, and that Rpy2 needs to be
+recompiled against this R - don't you love informative messages?
+
+In our case it means that GN's PYTHONPATH is not in sync with
+R_LIBS_SITE. Please check your GNU Guix GN2 installation paths;
+you may need to reinstall. Note that this may be the point where you
+want to start using profiles (see the section on creating a GNU Guix profile).
+
+** MySQL can't connect to server through socket ERROR
+
+The following error
+
+: sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2002, 'Can\'t connect to local MySQL server through socket \'/run/mysqld/mysqld.sock\' (2 "No such file or directory")')
+
+means that the MySQL client is trying to connect through a local Unix socket to a
+MySQL server that is not there, something you may see in a container. You can typically reproduce it with something like
+
+: mysql -h localhost
+
+try to connect over the network interface instead, e.g.
+
+: mysql -h 127.0.0.1
+
+If that works, run GeneNetwork after setting SQL_URI to something like
+
+: export SQL_URI=mysql://gn2:mysql_password@127.0.0.1/db_webqtl_s
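+
+The MySQL/MariaDB client libraries treat the host name `localhost` as a request
+to use the Unix socket, while `127.0.0.1` forces TCP, so it helps to check which
+one your SQL_URI asks for. A minimal sketch using only the standard library;
+`describe_sql_uri` is a hypothetical helper, and the port falls back to the
+MySQL default 3306.
+
+```python
+from urllib.parse import unquote, urlsplit
+
+
+def describe_sql_uri(uri):
+    """Hypothetical helper: split a mysql://user:password@host:port/db URI
+    and note whether a MySQL client would use the Unix socket or TCP."""
+    parts = urlsplit(uri)
+    if parts.scheme != "mysql":
+        raise ValueError(f"expected a mysql:// URI, got {parts.scheme!r}")
+    host = parts.hostname or "localhost"
+    return {
+        "user": unquote(parts.username or ""),
+        "password": unquote(parts.password or ""),
+        "host": host,
+        "port": parts.port or 3306,
+        "database": parts.path.lstrip("/"),
+        # MySQL/MariaDB client libraries use the socket for "localhost".
+        "transport": "unix-socket" if host == "localhost" else "tcp",
+    }
+```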
+
+* NOTES
+
+** Deploying GN2 official
+
+Let's see how fast we can deploy a second copy of GN2.
+
+- [ ] Base install
+ + [ ] First install a Debian server with GNU Guix on board
+ + [ ] Get Guix build going
+ - [ ] Build the correct version of Guix
+ - [ ] Check out the correct gn-stable version of guix-bioinformatics http://git.genenetwork.org/pjotrp/guix-bioinformatics
+ - [ ] guix package -i genenetwork2 -p /usr/local/guix-profiles/gn2-stable
+ + [ ] Create a gn2 user and home with space
+ + [ ] Install redis
+ - [ ] add to systemd
+ - [ ] update redis.conf
+ - [ ] update database
+ + [ ] Install mariadb (currently debian mariadb-server)
+ - [ ] add to systemd
+ - [ ] systemctl stop mysql
+ - [ ] update mysql.cnf
+ - [ ] update database (see gn-services/services/mariadb.md)
+ - [ ] check tables
+ + [ ] run gn2
+ + [ ] update nginx
+ + [ ] install genenetwork3
+ - [ ] add to systemd