Diffstat (limited to 'doc')
-rw-r--r-- | doc/GUIX-Reproducible-from-source.org | 6
-rw-r--r-- | doc/README.org | 614
-rw-r--r-- | doc/database.org | 150
-rw-r--r-- | doc/development.org | 55
-rw-r--r-- | doc/docker-container.org | 82
-rw-r--r-- | doc/guix_profile_setup.org | 39
-rw-r--r-- | doc/heatmap-generation.org | 34
-rw-r--r-- | doc/images/gn2_header_collections.png | bin | 0 -> 7890 bytes
-rw-r--r-- | doc/images/heatmap_form.png | bin | 0 -> 9363 bytes
-rw-r--r-- | doc/images/heatmap_with_hover_tools.png | bin | 0 -> 42578 bytes
-rw-r--r-- | doc/rpy2-performance.org | 182
-rw-r--r-- | doc/testing.org | 66
12 files changed, 742 insertions, 486 deletions
diff --git a/doc/GUIX-Reproducible-from-source.org b/doc/GUIX-Reproducible-from-source.org
index 19e4d14f..fffa9571 100644
--- a/doc/GUIX-Reproducible-from-source.org
+++ b/doc/GUIX-Reproducible-from-source.org
@@ -167,7 +167,7 @@ the Guix suggested environment vars. Check the output of
 #+begin_src bash
 guix package --search-paths
-export PYTHONPATH="/root/.guix-profile/lib/python2.7/site-packages"
+export PYTHONPATH="/root/.guix-profile/lib/python3.8/site-packages"
 export R_LIBS_SITE="/root/.guix-profile/site-library/"
 #+end_src
@@ -265,7 +265,7 @@ software. If something is not working, take a hint from the settings file that comes in the Guix installation. It sits in something like
-: cat ~/.guix-profile/lib/python2.7/site-packages/genenetwork2-2.0-py2.7.egg/etc/default_settings.py
+: cat ~/.guix-profile/lib/python3.8/site-packages/genenetwork2-2.0-py2.7.egg/etc/default_settings.py
 ** Set up nginx port forwarding
@@ -380,7 +380,7 @@ After setting the paths for the server
 #+begin_src bash
 export PATH=~/.guix-profile/bin:$PATH
-export PYTHONPATH="$HOME/.guix-profile/lib/python2.7/site-packages"
+export PYTHONPATH="$HOME/.guix-profile/lib/python3.8/site-packages"
 export R_LIBS_SITE="$HOME/.guix-profile/site-library/"
 export GUIX_GTK3_PATH="$HOME/.guix-profile/lib/gtk-3.0"
 export GI_TYPELIB_PATH="$HOME/.guix-profile/lib/girepository-1.0"
diff --git a/doc/README.org b/doc/README.org
index c2ef2d57..1236016e 100644
--- a/doc/README.org
+++ b/doc/README.org
@@ -2,11 +2,16 @@
 * Table of Contents :TOC:
  - [[#introduction][Introduction]]
- - [[#install][Install]]
+ - [[#check-list][Check list]]
+ - [[#installing-guix-packages][Installing Guix packages]]
+ - [[#creating-a-gnu-guix-profile][Creating a GNU Guix profile]]
  - [[#running-gn2][Running GN2]]
+ - [[#run-gn-proxy][Run gn-proxy]]
+ - [[#run-redis][Run Redis]]
  - [[#run-mariadb-server][Run MariaDB server]]
  - [[#install-mariadb-with-gnu-guix][Install MariaDB with GNU GUIx]]
  - [[#load-the-small-database-in-mysql][Load the
small database in MySQL]]
+ - [[#get-genotype-files][Get genotype files]]
  - [[#gn2-dependency-graph][GN2 Dependency Graph]]
  - [[#working-with-the-gn2-source-code][Working with the GN2 source code]]
  - [[#read-more][Read more]]
@@ -16,7 +21,6 @@
  - [[#cant-run-a-module][Can't run a module]]
  - [[#rpy2-error-show-now-found][Rpy2 error 'show' now found]]
  - [[#mysql-cant-connect-server-through-socket-error][Mysql can't connect server through socket ERROR]]
- - [[#irc-session][IRC session]]
  - [[#notes][NOTES]]
  - [[#deploying-gn2-official][Deploying GN2 official]]
@@ -35,50 +39,135 @@ tree. Current supported versions can be found as the SHA values of For a full view of runtime dependencies as defined by GNU Guix, see an example of the [[#gn2-dependency-graph][GN2 Dependency Graph]].
-* Install
+* Check list
+
+To run GeneNetwork the following services need to function:
+
+1. [ ] GNU Guix with a guix profile for genenetwork2
+1. [ ] A path to the (static) genotype files
+1. [ ] Gn-proxy for authentication
+1. [ ] The genenetwork3 service
+1. [ ] Redis
+1. [ ] Mariadb
+
+* Installing Guix packages
 Make sure to install GNU Guix using the binary download instructions on the main website. Follow the instructions on [[GUIX-Reproducible-from-source.org]] to download pre-built binaries. Note
-the download amounts to several GBs of data.
+the download amounts to several GBs of data. Debian-derived distros
+may support
+
+: apt-get install guix
+
+* Creating a GNU Guix profile
+
+We run a GNU Guix channel with packages at [[https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics][guix-bioinformatics]]. The
+README has instructions for hosting a channel, but typically we use
+the GUIX_PACKAGE_PATH instead. First upgrade to a recent guix with
+
+: mkdir ~/opt
+: guix pull -p ~/opt/guix-pull
+
+It should upgrade (ignore the locales warnings).
You can optionally
+specify the specific git checkout of guix with
+
+: guix pull -p ~/opt/guix-pull --commit=f04883d
+
+which is useful when you ned to roll back to an earlier version
+(sometimes our channel goes out of sync). Next, we install
+GeneNetwork2 with
+
+: source ~/opt/guix-pull/etc/profile
+: git clone https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git ~/guix-bioinformatics
+: cd ~/guix-bioinformatics
+: git pull
+: env GUIX_PACKAGE_PATH=$HOME/guix-bioinformatics guix package -i genenetwork2 -p ~/opt/genenetwork2
+
+you probably also need guix-past (the upstream channel for older packages):
+
+: git clone https://gitlab.inria.fr/guix-hpc/guix-past.git ~/guix-past
+: cd ~/guix-past
+: git pull
+: env GUIX_PACKAGE_PATH=$HOME/guix-bioinformatics:$HOME/guix-past/modules ~/opt/guix-pull/bin/guix package -i genenetwork2 -p ~/opt/genenetwork2
+
+ignore the warnings. Guix should install the software without trying
+to build everything. If you system insists on building all packages,
+try the `--dry-run` switch and fix the [[https://guix.gnu.org/manual/en/html_node/Substitute-Server-Authorization.html][substitutes]]. You may add the
+`--substitute-urls="http://guix.genenetwork.org https://ci.guix.gnu.org https://mirror.hydra.gnu.org"` switch.
+
+The guix.genenetwork.org has most of our packages pre-built(!). To use
+it on your own machine the public key is
+
+#+begin_src scheme
+(public-key
+ (ecc
+ (curve Ed25519)
+ (q #E50F005E6DA2F85749B9AA62C8E86BB551CE2B541DC578C4DBE613B39EC9E750#)))
+#+end_src
+
+Once we have a GNU Guix profile, a running database (see below) and the file storage,
+we should be ready to fire up GeneNetwork:
 * Running GN2
-Default settings for GN2 are listed in a file called
-[[../etc/default_settings.py][default_settings.py]]. You can copy this file and pass it as a new
-parameter to the genenetwork2 command, e.g.
+Check out the source with git:
-: genenetwork2 mysettings.py
+: git clone git@github.com:genenetwork/genenetwork2.git
+: cd genenetwork2
-or you can set environment variables to override individual parameters, e.g.
+Run GN2 with above Guix profile
-: env SERVER_PORT=5004 SQL_URI=mysql://user:pwd@dbhostname/db_webqtl genenetwork2
+: export GN2_PROFILE=$HOME/opt/genenetwork2
+: env TMPDIR=$HOME/tmp WEBSERVER_MODE=DEBUG LOG_LEVEL=DEBUG SERVER_PORT=5012 GENENETWORK_FILES=/export/data/genenetwork/genotype_files SQL_URI=mysql://webqtlout:webqtlout@localhost/db_webqtl ./bin/genenetwork2 etc/default_settings.py -gunicorn-dev
 the debug and logging switches can be particularly useful when
-developing GN2.
+developing GN2. Location and files are the current ones for Penguin2.
+
+It may be useful to tunnel the web server to your local browser with
+an ssh tunnel:
+
+If you want to test a service running on the server on a certain
+port (say 8202) use
-* Running Redis
+ ssh -L 8202:127.0.0.1:8202 -f -N myname@penguin2.genenetwork.org
-Install redis. Make sure you add the setting:
+And browse on your local machine to http://localhost:8202/
-: appendonly yes
+* Run gn-proxy
+
+GeneNetwork requires a separate gn-proxy server which handles
+authorisation and access control. For instructions see the
+[[https://github.com/genenetwork/gn-proxy][README]]. Note it may already be running on our servers!
+
+* Run Redis
+
+Redis part of GN2 deployment and will be started by the ./bin/genenetwork2
+startup script.
 * Run MariaDB server
 ** Install MariaDB with GNU GUIx
-/Note: we moved to MariaDB/
-
 These are the steps you can take to install a fresh installation of mariadb (which comes as part of the GNU Guix genenetwork2 install).
-As root configure and run
+As root configure the Guix profile
+
+: .
~/opt/genenetwork2/etc/profile
+
+and run for example
 #+BEGIN_SRC bash
 adduser mariadb && addgroup mariadb
-mysqld --datadir=/home/mariadb/database --initialize-insecure
-mkdir -p /var/run/mariadbd
-chown mariadb.mariadb /var/run/mariadbd
-mysqld -u mariadb --datadir=/home/mariadb/database/mariadb --explicit_defaults_for_timestamp -P 12048"
+mkdir -p /export2/mariadb/database
+chown mariadb.mariadb -R /export2/mariadb/
+mkdir -p /var/run/mysqld
+chown mariadb.mariadb /var/run/mysqld
+su mariadb
+mysql --version
+ mysql Ver 15.1 Distrib 10.1.45-MariaDB, for Linux (x86_64) using readline 5.1
+mysql_install_db --user=mariadb --datadir=/export2/mariadb/database
+mysqld -u mariadb --datadir=/exportdb/mariadb/database/mariadb --explicit_defaults_for_timestamp -P 12048"
 #+END_SRC
 If you want to run as root you may have to set
@@ -120,21 +209,22 @@ material. Download one database from
-[[http://files.genenetwork.org/raw_database/]]
-
-[[https://s3.amazonaws.com/genenetwork2/db_webqtl_s.zip]]
-
-Check the md5sum.
+http://ipfs.genenetwork.org/ipfs/QmRUmYu6ogxEdzZeE8PuXMGCDa8M3y2uFcfo4zqQRbpxtk
-After installation inflate the database binary in the MySQL directory
+After installation unzip the database binary in the MySQL directory
-: cd ~/mysql
-: chown -R mysql:mysql db_webqtl_s/
-: chmod 700 db_webqtl_s/
-: chmod 660 db_webqtl_s/*
+#+BEGIN_SRC sh
+cd ~/mysql
+p7zip -d db_webqtl_s.7z
+chown -R mysql:mysql db_webqtl_s/
+chmod 700 db_webqtl_s/
+chmod 660 db_webqtl_s/*
+#+END_SRC
 restart MySQL service (mysqld). Login as root
+: mysql_upgrade -u root --force
+
 : myslq -u root
 and
@@ -151,7 +241,7 @@ and Set permissions and match password in your settings file below:
-: mysql> grant all privileges on db_webqtl_s.* to gn2@"localhost" identified by 'mysql_password';
+: mysql> grant all privileges on db_webqtl_s.* to gn2@"localhost" identified by 'webqtl';
 You may need to change "localhost" to whatever domain you are connecting from (mysql will give an error).
@@ -163,6 +253,17 @@ configuration (see below). Note for the plant database you can rename it to db_webqtl_s, or change the settings in etc/default_settings.py to match your path.
+* Get genotype files
+
+The script looks for genotype files. You can find them in
+http://ipfs.genenetwork.org/ipfs/QmXQy3DAUWJuYxubLHLkPMNCEVq1oV7844xWG2d1GSPFPL
+
+#+BEGIN_SRC sh
+mkdir -p $HOME/genotype_files
+cd $HOME/genotype_files
+
+#+END_SRC
+
 * GN2 Dependency Graph
 Graph of all runtime dependencies as installed by GNU Guix.
@@ -193,7 +294,7 @@ information given by guix: On one system:
-: export PYTHONPATH="$HOME/.guix-profile/lib/python2.7/site-packages"
+: export PYTHONPATH="$HOME/.guix-profile/lib/python3.8/site-packages"
 : export R_LIBS_SITE="$HOME/.guix-profile/site-library/"
 : export GEM_PATH="$HOME/.guix-profile/lib/ruby/gems/2.2.0"
@@ -249,439 +350,30 @@ if that works run genenetwork after setting SQL_URI to something like
 : export SQL_URI=mysql://gn2:mysql_password@127.0.0.1/db_webqtl_s
-
-* IRC session
-
-Here an IRC session where we installed GN2 from scratch using GNU Guix
-and a download of the test database.
-
-#+begin_src
-<pjotrp> time to get binary install sorted :) [07:03]
-<pjotrp> Guix is designed for distributed installation servers
-<pjotrp> we have one on guix.genenetwork.org
-<pjotrp> it contains all the prebuild packages
-<pjotrp> for GN
-<user01> okay [07:04]
-<pjotrp> let's step back however [07:05]
-<pjotrp> I presume the environment is set with all guix package --search-paths
-<pjotrp> right?
-<user01> yep -<user01> set to the ones in ~/.guix-profile/ -<pjotrp> good, and you are in gn-deploy-guix repo [07:06] -<user01> yep [07:07] -<pjotrp> git log shows - -Author: David Thompson <dthompson2@worcester.edu> -Date: Sun Mar 27 21:20:19 2016 -0400 - -<user01> yes -<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix - package -A genenetwork2 [07:08] -<pjotrp> shows - -genenetwork2 2.0-a8fcff4 out ../guix-bioinformatics/gn/packages/genenetwork.scm:144:2 -genenetwork2-database-small 1.0 out ../guix-bioinformatics/gn/packages/genenetwork.scm:270:4 -genenetwork2-files-small 1.0 out ../guix-bioinformatics/gn/packages/genenetwork.scm:228:4 - -<user01> yeah [07:09] -<pjotrp> OK, we are in sync. This means we should be able to install the exact - same software -<pjotrp> I need to start up my guix daemon - I usually run it in a screen -<pjotrp> screen -S guix-daemon -<user01> hah, I don't have screen installed yet [07:11] -<pjotrp> comes with guix ;) [07:12] -<pjotrp> no worries, you can run it any way you want -<pjotrp> $HOME/.guix-profile/bin/guix-daemon --build-users-group=guixbuild -<user01> then something's weird, because it says I don't have it -<pjotrp> oh, you need to install it first [07:13] -<pjotrp> guix package -A screen -<pjotrp> screen 4.3.1 out gnu/packages/screen.scm:34:2 -<pjotrp> but you can skip this install, for now -<user01> alright [07:14] -<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix - package -i genenetwork2 --dry-run -<pjotrp> substitute: updating list of substitutes from - 'https://mirror.hydra.gnu.org'... 79.1% -<pjotrp> you see that? -<pjotrp> followed by [07:15] -substitute: updating list of substitutes from -'https://hydra.gnu.org'... 
100.0% -The following derivations would be built: - /gnu/store/rk7nw0rjqqsha958m649wrykadx6mmhl-profile.drv - -/gnu/store/7b0qjybvfx8syzvfs7p5rdablwhbkbvs-module-import-compiled.drv - /gnu/store/cy9zahbbf23d3cqyy404lk9f50z192kp-module-import.drv - /gnu/store/ibdn603i8grf0jziy5gjsly34wx82lmk-gtk-icon-themes.drv - -<pjotrp> which should have the same HASH values /gnu/store/7b0qjybvf... etc. - [07:16] -<user01> profile has a different hash -<pjotrp> but the next ones? -<user01> they're the same -<pjotrp> not sure why profile differs. Do you see the contact with - mirror.hydra.org? [07:17] -<user01> yeah -<pjotrp> OK, that means you set the key correctly for that one :) -<pjotrp> alright we are at the same state now. You can see most packages need - to be rebuild because they are no longer cached as binaries on hydra - [07:18] -<pjotrp> things move fast... -<user01> hehe -<pjotrp> let me also do the same on my laptop - which I have staged before - [07:19] -<pjotrp> btw, to set the path I often do [07:20] -<pjotrp> export - PATH="/home/wrk/.guix-profile/bin:/home/wrk/.guix-profile/sbin":$PATH -<pjotrp> to keep things like 'screen' from Debian -<pjotrp> Once past building guix itself that is normally OK [07:21] -<user01> ah, okay -<user01> will do that -<pjotrp> the guix build requires certain versions of tools, so you don't want - to mix foreign tools in [07:23] -<user01> makes sense [07:24] -<pjotrp> On my laptop I am trying the main updating list of substitutes from - 'http://hydra.gnu.org'... 10.5% [07:27] -<pjotrp> it is a bit slow, but let's see if there is a difference with the - mirror -<pjotrp> you can see there are two servers here. 
Actually with recent daemons, - if the mirror fails it will try the main server [07:28] -<pjotrp> I documented the use of a caching server here [07:29] -<pjotrp> https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org -<pjotrp> this is exactly what we are doing now -<user01> alrighty [07:35] -<pjotrp> To see if a remote server has a guix server running it should respond - [07:36] -<pjotrp> lynx http://guix.genenetwork.org:8080 --dump -<pjotrp> Resource not found: / -<pjotrp> -<pjotrp> you see that? -<user01> yes [07:37] -<pjotrp> good. The main hydra server is too slow. So on my laptop I forced - using the mirror with [07:38] -<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics/ ./pre-inst-env guix - package -i genenetwork2 --dry-run - --substitute-urls="http://mirror.hydra.gnu.org" -<pjotrp> -<pjotrp> the list looks the same to me [07:40] -<user01> me too -<pjotrp> note that some packages will be built and some downloaded, right? - [07:41] -<user01> yes -<pjotrp> atlas is actually a binary on my system [07:43] -<pjotrp> I mean in that list -<pjotrp> so, it should not build. Same as yours? -<user01> yeah, atlas and r-gtable are the ones to be downloaded -<pjotrp> You should not have seen that error ;) -<pjotrp> we should try and install it this way, try [07:44] -<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix - package -i genenetwork2 --cores=4 --max-jobs=4 --keep-going [07:46] -<pjotrp> set CPUs and max-jobs to something sensible -<pjotrp> Does your VM have multiple cores? 
-<pjotrp> note you can always press Ctrl-C during install -<user01> it doesn't, I'll reboot it and give it another core [07:47] -<user02> Hey [07:48] -<user02> I'm here -<user02> Will be stepping away for some breakfast -<pjotrp> Can you do the same as us -<pjotrp> Can you see the irc log -<user02> Alright -<user02> Yes, I can -<user02> Please email me a copy in five minutes -<pjotrp> user01: so when I use the GN server [07:56] -<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix - package -i genenetwork2 --dry-run - --substitute-urls=http://guix.genenetwork.org:8080 -<pjotrp> I don't need to build anything [07:57] -<pjotrp> (this won't work for you, yet) -<pjotrp> to get it to work you need to 'trust' it [07:58] -<pjotrp> but, first get the build going -<pjotrp> I'll have a coffee while you and get building -<user01> yeah it's doing its thing now [08:01] -<pjotrp> cool [08:02] -<pjotrp> in a separate terminal you can try and install with the gn mirror - [08:05] -<pjotrp> I'll send you the public key and you can paste it as said - https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org - [08:06] -<user01> alright -<pjotrp> should be in the E-mail [08:09] -<pjotrp> getting it working it kinda nasty since the server gives no feedback -<pjotrp> it works when you see no more in the build list ;) [08:11] -<pjotrp> btw, you can install software in parallel. Guix does that. -<pjotrp> even the same packages -<pjotrp> so keep building ;) -<pjotrp> try and do this with Debian... -<pjotrp> coffee for me [08:12] -<user01> the first build failed [08:15] -<pjotrp> OK, Dennis fixed that one yesterday [08:27] -<pjotrp> the problem is that sometime source tarballs disappear [08:28] -<pjotrp> R is notorious for that -<user01> haha, that's inconvenient.. 
-<pjotrp> well, it is good that Guix catches them -<pjotrp> but we do not cache sources -<pjotrp> binaries are cached - to some degree - so we don't have to rebuild - those [08:29] -<pjotrp> time to use the guix cache at guix.genenetwork.org -<pjotrp> try and install the key (it is in the E-mail) -<pjotrp> and see what this lists [08:31] -<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix - package -i genenetwork2 - --substitute-urls=http://guix.genenetwork.org --dry-run -<pjotrp> should be all binary installs -<user01> it's not.. [08:32] -<user01> if I remove --substitute-urls, the list changes, does that mean I - have the key set up correctly at least? [08:33] -<pjotrp> dunno [08:35] -<pjotrp> how many packages does it want to build? -<pjotrp> should be zero -<user01> four -<pjotrp> Ah, that is OK - those are default profile things -<user01> genenetwork2 is among the ones to be downloaded so [08:36] -<pjotrp> remove --dry-run -<pjotrp> yeah, good sign :) -<pjotrp> we'll still hit a snag, but run it -<pjotrp> should be fast -<user01> doing it [08:37] -<user01> it worked! [08:38] -<user01> I think [08:39] -<pjotrp> heh [08:40] -<pjotrp> you mean it is finished? 
-<user01> yep -<pjotrp> type genenetwork2 -<user01> complains about not being able to connect to the database [08:41] -<pjotrp> last snag :) -<pjotrp> no database -<pjotrp> well, we succeeded in installing a same-byte install of a very - complex system :) [08:42] -<pjotrp> (always take time to congratulate yourself) -<pjotrp> now we need to install mysql -<user01> hehe :) -<pjotrp> this can be done throug guix or through debian [08:43] -<pjotrp> the latter is a bit easier here, so let's do that -<pjotrp> fun note: you can mix debian and guix -<pjotrp> Follow instructions on [08:44] -<pjotrp> - https://github.com/genenetwork/genenetwork2/tree/staging/doc#run-mysql-server -<pjotrp> apt-get install mysql-common [08:45] -<pjotrp> may do it -<pjotrp> You can also install with guix, but I need to document that -<pjotrp> btw your internet must be fast :) [08:46] -<user01> hehe it is ;) -<pjotrp> when the database is installed [08:48] -<pjotrp> be sure to set the password as instructed [08:50] -<pjotrp> when mysql is set the genenetwork2 command should fire up the web - server on localhost:5003 [08:58] -<pjotrp> btw my internet is way slower :) [09:00] -<user02> I'm back [09:04] -<user02> fixed router firmware upgrade problem -<user02> unbricking -<pjotrp> tssk [09:07] -<user02> I'll never leave routers to update themselves again [09:08] -<user02> self-brick highway -<user02> Resuming [09:09] -<pjotrp> auto-updates are evil -<pjotrp> always switch them off -<pjotrp> user02: can you install genenetwork like user has done? [09:10] -<pjotrp> pretty well documented here now :) -<user02> Yes I can [09:11] -<user02> Already installed key -<pjotrp> user02: you are getting binary packages only now? [09:13] -<user02> That's the sanest way to go now -<user02> seriously -<pjotrp> everything should be pre-built from guix.genenetwork.org -<pjotrp> you are downloading? -<user02> yes [09:15] -<pjotrp> cool. 
Maybe an idea to set up a server -<pjotrp> for your own use -<user02> Stuck at downloading preprocesscore -<pjotrp> should not [09:24] -<pjotrp> what does env GUIX_PACKAGE_PATH=../guix-bioinformatics/ - ./pre-inst-env guix package -i genenetwork2 - --substitute-urls="http://guix.genenetwork.org" --dry-run - [09:25] -<pjotrp> say for r-prepocesscore -<pjotrp> download or build? -<pjotrp> mine says download [09:26] -<user02> it only lists the derivatives to be built -<user02> nothing else happens [09:27] -<pjotrp> OK, so there is a problem -<pjotrp> your key may not be working -<pjotrp> everything should be listed as 'to be download' [09:28] -<user02> Hmm -<user02> Ah -<user02> I know where I messed up -<pjotrp> where? -<user02> I did add the key -<user02> However -<pjotrp> (I am documenting) -<user02> I did not tell guix to trust it -<pjotrp> yes -<pjotrp> and there is another potential problem -<user02> Remember the documentation on installing guix? -<user02> You have to tell guix to trust the default key [09:29] -<user02> Right? -<user02> So in this case -<pjotrp> read the IRC log -<user02> That step is mandatory -<pjotrp> user01: how are you doing? -<pjotrp> user02: - https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org#using-gnu-guix-archive - [09:30] -<user01> a little bit left on the db download -<pjotrp> user02: you should see no more building -<pjotrp> user02: another issue may be that you updated r-preprocesscore - package in guix-buinformatics [09:32] -<pjotrp> all downstream packages will want to rebuild -<user02> no, not really -<user02> It's not even installed -<pjotrp> checkout a branch of the the old version - make sure we are in synch -<pjotrp> should be at - /gnu/store/y1f3r2xs3fhyadd46nd2aqbr2p9qv2ra-r-biocpreprocesscore-1.32.0 - [09:33] -<pjotrp> -<user03> pjotrp: Possibly we should use the archive utility of Guix to do - deployment to avoid such out-of-sync differences :) [09:34] -<pjotrp> maybe. 
I did not get archive to update profiles properly [09:37] -<pjotrp> Also it is good that they get to understand guix - this way -<pjotrp> carved in stone, eh [09:38] -<user02> Yeah, all good [09:39] -<user02> My mistake was skipping the guix archive part -<user02> Can we begin with the install? -<user02> It's telling me of derivatives that will be downloaded [09:40] -<user02> So we're good -<user02> Here goes -<pjotrp> yeeha [09:42] -<user02> pjotrp, where is this guix.genenetwork.org located at? -<pjotrp> Tennessee -<user02> It's...it's....sloooooooowwwwwwwwwwwwww -<pjotrp> not from Europe -<pjotrp> is it downloading at all? -<user02> It should be extended -<user02> Yes...like at 100KB/s [09:43] -<user02> tear-jerker -<user02> Verizon problems -<user02> who's the host? -<pjotrp> I am getting 500Kb/s -<pjotrp> UT -<user02> Guix's servers can run off more than one server, right? -<user02> I'd like to host that particular server here -<user02> For speed -<pjotrp> yes -<user02> Sooner or later -<user02> It will be a necessity [09:45] -<pjotrp> exactly what I am doing - this is our server -<pjotrp> guix.genenetwork.org:8080 -<user02> All done installing [09:46] -<pjotrp> what? -<user02> Now the databases -<pjotrp> what do you mean by slow exactly? -<user02> Yes, it's installed -<pjotrp> can you run genenetwork2 -<user02> setting variables -<user02> If I try running it now, it will fail as I don't have the DBs [09:47] -<pjotrp> cool - you had a lot of prebuilt packages already -<pjotrp> OK, follow the instructions I wrote above -<user01> now everything seems to be working for me :) -<user02> OK -<pjotrp> user01: excellent! -<pjotrp> you see a webserver? 
-<user01> yep, can connect to localhost:5003 [09:48] -<pjotrp> So now you are running a guix copy of GN2 -<pjotrp> you can see where it lives with `which genenetwork2` or ls -l - ~/.guix-profile/bin/genenetwork2 [09:49] -<pjotrp> - /gnu/store/1kma5xszvzsvmbb4k699h7gvdncw901i-genenetwork2-2.0-a8fcff4/bin/genenetwork2 -<pjotrp> it is a script -<pjotrp> written by guix, open it [09:50] -<pjotrp> inside it points to paths and our script at -<pjotrp> - /gnu/store/1kma5xszvzsvmbb4k699h7gvdncw901i-genenetwork2-2.0-a8fcff4/bin/.genenetwork2-real -<pjotrp> if you open that you can see how the webserver is started [09:51] -<pjotrp> next step is to run a recent version of GN2 -<user01> okay [09:52] -<pjotrp> See - https://github.com/genenetwork/genenetwork2/tree/staging/doc#run-your-own-copy-of-gn2 -<pjotrp> but do not checkout that genetwork2_diet -<pjotrp> we reverted to the main tree -<pjotrp> clone git@github.com:genenetwork/genenetwork2.git [09:53] -<pjotrp> instead and checkout the staging branch -<pjotrp> that is effectively my branch [09:54] -<pjotrp> when that is done you should be able to fire up the webserver from - there [09:55] -<pjotrp> using ./bin/genenetwork2 -<user02> now installing DBs -<user02> Downloading -<pjotrp> annoyingly the source tree is ~700Mb [09:56] -<user02> Can it also be done by installing the guix package - genenetwork2-database-small? -<pjotrp> I changed it in the diet version to 8Mb, but I had to revert -<user01> I need to make my VM bigger... -<pjotrp> user02: not ready [09:57] -<user02> ok -<pjotrp> user01: sorry -<pjotrp> user01: you could mount a local dir inside the VM for development -<pjotrp> that would allow you to use MAC tools for editing -<pjotrp> just an idea -<user01> yeah, I figure I'll do something like that -<pjotrp> do you use emacs? 
[09:58] -<user01> yep -<pjotrp> that can also run on remote files over ssh -<pjotrp> that's an alternative -<pjotrp> kudos for using emacs :), wdyt user03 -<user02> 79 minutes to go downloading the db -<pjotrp> user02: sorry about that [09:59] -<pjotrp> it is 2GB -<user02> user, you can also mount the directory via sshfs -<user02> Mac OSX runs OpenSSH -<pjotrp> user02: sopa -<user02> You can therefore mount a directory outside the VM to the VM via - sshfs [10:00] -<pjotrp> yes, 3 options now -<user02> That way, you can set up a VM only for it's logic -<user02> Apps + the OS it runs [10:01] -<user02> For data, let it reside on physical host accessible via sshfs -<user02> Use this Arch wiki reference: - https://wiki.archlinux.org/index.php/SSHFS -<user02> I edited that last somewhere in 2015, may have been updated since - then -<user01> alright, cool! [10:04] -<pjotrp> user01: you are almost done [10:06] -<pjotrp> I wrote an elixir package for guix :) -<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics/ ./pre-inst-env guix - package -A elixir - --substitute-urls="http://guix.genenetwork.org" [10:08] -<pjotrp> elixir 1.2.3 out - ../guix-bioinformatics/gn/packages/elixir.scm:31:2 -<pjotrp> -<pjotrp> I am building it on guix.genenetwork.org right now [10:09] -<user01> nice [10:10] -#+end_src - * NOTES ** Deploying GN2 official Let's see how fast we can deploy a second copy of GN2. 
-- [-] Base install
- + [X] First install a Debian server with GNU Guix on board
- + [X] Get Guix build going
- - [X] Build the correct version of Guix
- - [X] Check out the correct gn-stable version of guix-bioinformatics http://git.genenetwork.org/pjotrp/guix-bioinformatics
- - [X] guix package -i genenetwork2 -p /usr/local/guix-profiles/gn2-stable
- + [X] Create a gn2 user and home with space
- + [X] Install redis (currently debian)
- - [X] add to systemd
- - [X] update redis.cnf
- - [X] update database
- + [X] Install mariadb (currently debian mariadb-server)
- - [X] add to systemd
- - [X] system stop mysql
- - [X] update mysql.cnf
- - [X] update database (see gn-services/services/mariadb.md)
- - [X] check tables
- + [ ] run gn2 (rust-qtlreaper not working)
- + [X] update nginx
+- [ ] Base install
+ + [ ] First install a Debian server with GNU Guix on board
+ + [ ] Get Guix build going
+ - [ ] Build the correct version of Guix
+ - [ ] Check out the correct gn-stable version of guix-bioinformatics http://git.genenetwork.org/pjotrp/guix-bioinformatics
+ - [ ] guix package -i genenetwork2 -p /usr/local/guix-profiles/gn2-stable
+ + [ ] Create a gn2 user and home with space
+ + [ ] Install redis
+ - [ ] add to systemd
+ - [ ] update redis.cnf
+ - [ ] update database
+ + [ ] Install mariadb (currently debian mariadb-server)
+ - [ ] add to systemd
+ - [ ] system stop mysql
+ - [ ] update mysql.cnf
+ - [ ] update database (see gn-services/services/mariadb.md)
+ - [ ] check tables
+ + [ ] run gn2
+ + [ ] update nginx
  + [ ] install genenetwork3
  - [ ] add to systemd
diff --git a/doc/database.org b/doc/database.org
index 5107b660..d5462d4e 100644
--- a/doc/database.org
+++ b/doc/database.org
@@ -1339,7 +1339,8 @@ The SNP count info for the BXD is calculated like this
 startMb += stepMb
 #+end_src
-select * from BXDSnpPosition limit 5;
+: select * from BXDSnpPosition limit 5;
+
 +------+-----------+-----------+----------+
 | Chr | StrainId1 | StrainId2 | Mb |
 +------+-----------+-----------+----------+
@@ -1368,3 +1369,150 @@ mysql> select * from SnpSource limit 5; Empty set (0.00 sec) Hmmm. This is the test database. Then there are the plink files and VCF files.
+
+* Optimize SQL?
+
+We were facing some issues with slow queries. A query
+was really slow on Penguin2:
+
+: time mysql -u webqtlout -pwebqtlout db_webqtl < ~/chunk.sql > /dev/null
+: real 0m13.082s
+: user 0m0.292s
+: sys 0m0.032s
+
+Runs in 1s on Tux01 and 13s on P2, why is that? The gist of it
+was increasing an InnoDB cache size(!)
+
+Interestingly, Penguin2 is running InnoDB on a much slower storage.
+It has more indices that Tux01(?!). Probably due to things we have
+been trying to make the datatables faster.
+
+Meanwhile the query is one with many joins:
+
+#+begin_src sql
+SELECT ProbeSet.Name,ProbeSetXRef.DataId, T4.value, T5.value, T6.value, T7.value, T8.value, T9.value, T10.value, T11.value, T12.value, T14.value, T15.value, T17.value, T18.value, T19.value, T20.value, T21.value, T22.value, T24.value, T25.value, T26.value, T28.value, T29.value, T30.value, T31.value, T35.value, T36.value, T37.value, T39.value, T98.value, T99.value, T100.value, T103.value, T487.value, T105.value, T106.value, T110.value FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze)
+ left join ProbeSetData as T4 on T4.Id = ProbeSetXRef.DataId
+ and T4.StrainId=4
+ (...)
+ left join ProbeSetData as T110 on T110.Id = ProbeSetXRef.DataId
+ and T110.StrainId=110
+ WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id
+ and ProbeSetFreeze.Name = 'HC_M2_0606_P'
+ and ProbeSet.Id = ProbeSetXRef.ProbeSetId
+ order by ProbeSet.Id
+#+end_src
+
+And is blazingly fast on Tux01 and (now) fast enough on Penguin2.
+
+First I checked the tables for indices and storage type. Next I
+checked the difference in configuration.
+ +** Check tables + +Tables (ProbeSetData, ProbeSet, ProbeSetXRef, ProbeSetFreeze) + +*** ProbeSetData + +Same on Tux01 and P2: + +: show indexes from ProbeSetData ; + +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +|--------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------| +| ProbeSetData | 0 | DataId | 1 | Id | A | 47769944 | NULL | NULL | | BTREE | | | +| ProbeSetData | 0 | DataId | 2 | StrainId | A | 5111384047 | NULL | NULL | | BTREE | | | + +*** ProbeSetFreeze + +Tux01 has less indexes than P2(!): + +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +|----------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------| +| ProbeSetFreeze | 0 | PRIMARY | 1 | Id | A | 911 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 0 | FullName | 1 | FullName | A | 911 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 0 | Name | 1 | Name | A | 911 | NULL | NULL | YES | BTREE | | | +| ProbeSetFreeze | 1 | NameIndex | 1 | Name2 | A | 911 | NULL | NULL | | BTREE | | | +: 4 rows in set (0.000 sec) + +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +|----------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------| +| ProbeSetFreeze | 0 | PRIMARY | 1 | Id | A | 883 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 0 | FullName | 1 | FullName | A | 883 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 0 | Name | 1 | Name | A | 883 | NULL | NULL | YES | BTREE | | 
| +| ProbeSetFreeze | 1 | NameIndex | 1 | Name2 | A | 883 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 1 | ShortName | 1 | ShortName | A | 883 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 1 | ProbeFreezeId | 1 | ProbeFreezeId | A | 441 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 1 | conf_and_public | 1 | confidentiality | A | 3 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 1 | conf_and_public | 2 | public | A | 4 | NULL | NULL | | BTREE | | | +: 8 rows in set (0.00 sec) + +*** ProbeSet + +Identical indexes + +*** ProbeSetXRef + +Tux01 has less indexes than P2(!): + +: MariaDB [db_webqtl]> show indexes from ProbeSetXRef ; +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +|--------------+------------+------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------| +| ProbeSetXRef | 0 | ProbeSetId | 1 | ProbeSetFreezeId | A | 885 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 0 | ProbeSetId | 2 | ProbeSetId | A | 47713039 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 0 | DataId_IDX | 1 | DataId | A | 47713039 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | Locus_IDX | 1 | Locus | A | 15904346 | NULL | NULL | YES | BTREE | | | +: 4 rows in set (0.000 sec) + + + +: MariaDB [db_webqtl]> show indexes from ProbeSetXRef ; +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +|--------------+------------+-------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------| +| ProbeSetXRef | 0 | ProbeSetId | 1 | ProbeSetFreezeId | A | 856 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 0 | ProbeSetId | 2 | ProbeSetId | A | 46412145 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 0 | DataId_IDX | 1 | 
DataId | A | 46412145 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | ProbeSetId1 | 1 | ProbeSetId | A | 5156905 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | Locus | 1 | Locus | A | 23206072 | NULL | NULL | YES | BTREE | | | +: 5 rows in set (0.00 sec) + +** Check storage + +The database in Tux01 is mounted on NVME. On Penguin2 it +is slower SATA with RAID5. + +Also on Penguin2 the following tables are using InnoDB instead of +MyISAM + +#+begin_src sh +-rw-rw---- 1 mysql mysql 79691776 Oct 15 2019 AccessLog.ibd +-rw-rw---- 1 mysql mysql 196608 Oct 24 2019 Docs.ibd +-rw-rw---- 1 mysql mysql 63673729024 Jul 10 2020 GenoData.ibd +-rw-rw---- 1 mysql mysql 34787557376 Jul 9 2020 ProbeData.ibd +-rw-rw---- 1 mysql mysql 254690721792 Jul 10 2020 ProbeSetData.ibd +-rw-rw---- 1 mysql mysql 32103202816 Jul 9 2020 SnpAll.ibd +-rw-rw---- 1 mysql mysql 98304 May 6 2020 TraitMetadata.ibd +#+end_src + +This [[https://www.liquidweb.com/kb/mysql-performance-myisam-vs-innodb/][article]] suggests that myISAM will be faster for our use case. + +** Configuration + +There was one setting on Tux01 missing on P2 + +: +innodb_buffer_pool_size=1024M + +Running the same query twice (so you can see the warmup after +a restart of MariaDB) + +#+begin_src sh +penguin2:/etc$ time mysql -u webqtlout -pwebqtlout db_webqtl < ~/chunk.sql > ~/test.out +real 0m4.253s +user 0m0.276s +sys 0m0.040s +penguin2:/etc$ time mysql -u webqtlout -pwebqtlout db_webqtl < ~/chunk.sql > ~/test.out +real 0m2.633s +user 0m0.296s +sys 0m0.028s +#+end_src + +That is much better :) diff --git a/doc/development.org b/doc/development.org index 5e6e318b..cd3beea3 100644 --- a/doc/development.org +++ b/doc/development.org @@ -41,3 +41,58 @@ JS_GN_PATH (3) is for development purposes. By default is is set to $HOME/genenetwork/javascript. Say you are working on an updated version of a JS module not yet in (1) you can simply check out that module in that path and it should show up. 
+ +* Python modules + +Python modules are automatically found in the Guix profile. + +For development purposes it may be useful to try some Python package. +Obviously this is only a temporary measure and when you decide to +include the package it should be packaged in [[http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics][our GNU Guix software +stack]]! + +To add packages you need to make sure the correct Python is used (currently +Python 2.7) to install a package. E.g.. + +#+BEGIN_SRC sh +python --version + Python 2.7.16 +pip --version + pip 18.1 from /usr/lib/python2.7/dist-packages/pip (python 2.7) +#+END_SRC + +You can install a Python package locally with pip, e.g. + +#+BEGIN_SRC sh +pip install hjson +#+END_SRC + +This installed in ~$HOME/.local/lib/python3.8/site-packages~. To add +the search path for GeneNetwork use the environment variable + +#+BEGIN_SRC sh +export PYTHON_GN_PATH=$HOME/.local/lib/python3.8/site-packages +#+END_SRC + +Now you should be able to do + +#+BEGIN_SRC python +import hjson +#+END_SRC + +In fact you can kick off a Python shell with something like + +#+BEGIN_SRC python +env SERVER_PORT=5013 WEBSERVER_MODE=DEBUG LOG_LEVEL=DEBUG \ + SQL_URI=mysql://gn2:webqtl@localhost/db_webqtl_s \ + GN2_PROFILE=~/opt/genenetwork2 \ + ./bin/genenetwork2 ./etc/default_settings.py -c +Python 2.7.17 (default, Jan 1 1970, 00:00:01) +[GCC 7.5.0] on linux2 +Type "help", "copyright", "credits" or "license" for more information. +>>> import hjson +#+END_SRC + +It should now also work in GN2. 
+ +* TODO External tools diff --git a/doc/docker-container.org b/doc/docker-container.org new file mode 100644 index 00000000..79b8272f --- /dev/null +++ b/doc/docker-container.org @@ -0,0 +1,82 @@ +#+TITLE: Genenetwork2 Dockerized + +* Table of Contents :TOC: +- [[#introduction][Introduction]] +- [[#creating-the-docker-images][Creating the Docker Images]] +- [[#pushing-to-dockerhub][Pushing to DockerHub]] + +* Introduction + +The CI(Continuous Integration) system for Genenetwork2 uses [[https://github.com/features/actions][Github +Actions]]. As such, it's important to have a way to run tests using +facilities provided by GUIX in a reproducible way. This project +leverages GUIX to generate a docker container from which the unittests +are ran from. + +Find instructions on how to set docker up inside GUIX [[https://github.com/pjotrp/guix-notes/blob/master/CONTAINERS.org#run-docker][here]]. This +document will not get into that. It's assumed that you have a working +docker setup. + +The rest of this document outlines how the docker container used in +the CI builds was created. + +* Creating the Docker Images + +The general idea is that GUIX is used to generate a set of binaries, +which will be added to a base mariaDB image. + +First create the gn2 tar archive by running: + +#+begin_src sh +env GUIX_PACKAGE_PATH="/home/bonface/projects/guix-bioinformatics:/home/bonface/projects/guix-past/modules" \ + ./pre-inst-env guix pack --no-grafts\ + -S /gn2-profile=/ \ + screen genenetwork2 + #+end_src + +The output will look something similar to: + +: /gnu/store/x3m77vwaqcwba24p5s4lrb7w2ii16lj9-tarball-pack.tar.gz + +Now create a folder from which will host the following dockerfile. You +can name this file Dockerfile. Note that mariadb is the base image +since it already has mariadb installed for us. 
+ +#+begin_src conf :mkdirp yes :tangle ~/docker/Dockerfile +FROM mariadb:latest + +COPY ./gn2.tar.gz /tmp/gn2.tar.gz +RUN tar -xzf /tmp/gn2.tar.gz -C / && rm -f /tmp/gn2.tar.gz && \ + mkdir -p /usr/local/mysql /genotype_files/genotype/json +#+end_src + +Build the image(Note the fullstop at the end): + +: sudo docker build -t genenetwork2:latest -f Dockerfile . + +To load the image interactively you've just created: + +: docker run -ti "genenetwork2:latest" bash + +Assuming you have a docker instance running, you could always run +commands in it e.g: + +: docker run "genenetwork2:latest" python --version + +* Pushing to DockerHub + +We use DockerHub to store the docker images from which we use on our +CI environment using Github Actions. + +To push to dockerhub, first get the image name by running =docker +images=. Push to dockerhub using a command similar to: + +: docker push bonfacekilz/genenetwork2:latest + +Right now, we have 2 images on DockerHub: + +- https://hub.docker.com/repository/docker/bonfacekilz/python2-genenetwork2: + Contains the python2 version of gn2. Don't use this. Please use the + python3 image! +- https://hub.docker.com/repository/docker/bonfacekilz/python3-genenetwork2: + Contains the python3 version of gn2. diff --git a/doc/guix_profile_setup.org b/doc/guix_profile_setup.org new file mode 100644 index 00000000..c397377c --- /dev/null +++ b/doc/guix_profile_setup.org @@ -0,0 +1,39 @@ +* Setting up GUIX profile for GN
+
+First create a guix profile with the latest packages:
+
+: ~/opt/guix/bin/guix pull
+
+This creates a profile with the latest packages under =~/.config/guix/current=.
+
+You now have the latest Guix. Check with =$HOME/.config/guix/current/bin/guix --version=.
+
+At this point, installing python3-genenetwork2 using
+=$HOME/.config/guix/current/bin/guix= should already work. The steps
+below build the development version of Guix instead, since that may
+come in handy later and is a nice thing to know.
+
+Next, ensure that the appropriate Guile load paths are set:
+
+: export GUILE_LOAD_PATH=$HOME/.config/guix/current/share/guile/site/3.0/
+: export GUILE_LOAD_COMPILED_PATH=$HOME/.config/guix/current/lib/guile/3.0/site-ccache/
+
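The two exports above can be wrapped in a small helper that also warns when a directory is missing. This is only a sketch: `set_guile_paths` is a hypothetical name, and the default profile root is assumed to be the `guix pull` profile used throughout this document.

```shell
# Hypothetical helper (not part of Guix): export the Guile paths for a
# given profile root and warn on stderr when a directory is missing.
set_guile_paths() {
    # Assumption: default to the profile created by `guix pull`.
    guile_profile="${1:-$HOME/.config/guix/current}"
    GUILE_LOAD_PATH="$guile_profile/share/guile/site/3.0/"
    GUILE_LOAD_COMPILED_PATH="$guile_profile/lib/guile/3.0/site-ccache/"
    export GUILE_LOAD_PATH GUILE_LOAD_COMPILED_PATH
    for guile_dir in "$GUILE_LOAD_PATH" "$GUILE_LOAD_COMPILED_PATH"; do
        [ -d "$guile_dir" ] || echo "warning: $guile_dir does not exist" >&2
    done
}
```

Calling `set_guile_paths` with no argument uses the default profile; passing another profile root switches both variables at once.
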
+From the Guix source checkout, get into the container and configure the build:
+
+: $HOME/.config/guix/current/bin/guix environment -C guix --ad-hoc bash gcc-toolchain
+: ./bootstrap
+: ./configure --localstatedir=/var --sysconfdir=/etc
+
+Check that everything works:
+
+: make check
+
+Clean up and build:
+
+: make clean-go
+: make -j 4
+: exit
+
+Install python3-genenetwork2 (substitute paths where necessary):
+
+: env GUIX_PACKAGE_PATH='/home/zas1024/guix-bioinformatics:/home/zas1024/guix-past/modules' $HOME/.config/guix/current/bin/guix install python3-genenetwork2 -p ~/opt/python3-genenetwork2 --substitute-urls="http://guix.genenetwork.org https://berlin.guixsd.org https://ci.guix.gnu.org https://mirror.hydra.gnu.org"
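The long install command above may be easier to adapt when its parts are held in variables. A minimal sketch, assuming both checkouts live under one directory: `join_paths` and `CHANNELS_DIR` are hypothetical names introduced here, and the final `guix install` call is left commented out so the snippet can be sourced without side effects.

```shell
# Hypothetical helper: join its arguments into a colon-separated list.
join_paths() {
    (IFS=:; printf '%s\n' "$*")
}

# Assumption: guix-bioinformatics and guix-past are checked out here.
CHANNELS_DIR="${CHANNELS_DIR:-$HOME}"

GUIX_PACKAGE_PATH="$(join_paths \
    "$CHANNELS_DIR/guix-bioinformatics" \
    "$CHANNELS_DIR/guix-past/modules")"
export GUIX_PACKAGE_PATH

# Substitute servers copied from the command above.
SUBSTITUTE_URLS="http://guix.genenetwork.org https://berlin.guixsd.org https://ci.guix.gnu.org https://mirror.hydra.gnu.org"

# The install itself, unchanged apart from the variables:
# "$HOME/.config/guix/current/bin/guix" install python3-genenetwork2 \
#     -p ~/opt/python3-genenetwork2 --substitute-urls="$SUBSTITUTE_URLS"
```

Setting `CHANNELS_DIR` before sourcing the snippet is then the only change needed on a machine with a different layout.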
diff --git a/doc/heatmap-generation.org b/doc/heatmap-generation.org new file mode 100644 index 00000000..a697c70b --- /dev/null +++ b/doc/heatmap-generation.org @@ -0,0 +1,34 @@ +#+STARTUP: inlineimages +#+TITLE: Heatmap Generation +#+AUTHOR: Muriithi Frederick Muriuki + +* Generating Heatmaps + +Like a lot of other features, the heatmap generation requires an existing collection. If none exists, see [[][Creating a new collection]] for how to create a new collection. + +Once you have a collection, you can navigate to the collections page by clicking on the "Collections" link in the header + + +[[./images/gn2_header_collections.png]] + +From that page, pick the collection that you want to work with by clicking on its name on the collections table. + +That takes you to that collection's page, where you can select the data that you want to use to generate the heatmap. + +** Selecting Orientation + +Once you have selected the data, select the orientation of the heatmap you want generated. You do this by selecting either *"vertical"* or *"horizontal"* in the heatmaps form: + +[[./images/heatmap_form.png]] + +Once you have selected the orientation, click on the "Generate Heatmap" button as in the image above. + +The heatmap generation might take a while, but once it is done, an image shows up above the data table. + +** Downloading the PNG copy of the Heatmap + +Once the heatmap image is shown, hovering over it, displays some tools to interact with the image. + +To download, hover over the heatmap image, and click on the "Download plot as png" icon as shown. 
+ +[[./images/heatmap_with_hover_tools.png]] diff --git a/doc/images/gn2_header_collections.png b/doc/images/gn2_header_collections.png Binary files differnew file mode 100644 index 00000000..ac23f9c1 --- /dev/null +++ b/doc/images/gn2_header_collections.png diff --git a/doc/images/heatmap_form.png b/doc/images/heatmap_form.png Binary files differnew file mode 100644 index 00000000..163fbb60 --- /dev/null +++ b/doc/images/heatmap_form.png diff --git a/doc/images/heatmap_with_hover_tools.png b/doc/images/heatmap_with_hover_tools.png Binary files differnew file mode 100644 index 00000000..4ab79f99 --- /dev/null +++ b/doc/images/heatmap_with_hover_tools.png diff --git a/doc/rpy2-performance.org b/doc/rpy2-performance.org new file mode 100644 index 00000000..8f917ca0 --- /dev/null +++ b/doc/rpy2-performance.org @@ -0,0 +1,182 @@ +* Python-Rpy2 performance issues with genenetwork2 + +At one point, genenetwork2 was down. A possible cause was that it +wrote into the log file in an infinite loop due to rpy2(v3.4.4), so a +solution was to empty it. Currently, as a work around, rpy2 is +disabled by removing it's imports. This affects WGCNA/ CTL imports and +commenting out Biweight Midcorrelation option in the trait page. See: + +- [[https://github.com/genenetwork/genenetwork2/commit/1baf5f7611909c651483208184c5fbf7d4a7a088][1baf5f7]] +- [[https://github.com/genenetwork/genenetwork2/commit/afee4d625248565857df98d3510f680ae6204864][afee4d6]] +- [[https://github.com/genenetwork/genenetwork2/commit/c458bf0ad731e5e5fd9cbd0686936b3a441bae63][c458bf0]] +- [[https://github.com/genenetwork/genenetwork2/commit/d31f3f763471b19559ca74e73b52b3cb5e7153ce][d31f3f7]] + +** Reproducing the problem + +I went back to commit #b8408cea. With regards to logs, I never +experienced any log issue. 
Perhaps it's because of how I start my +server: + +: env SERVER_PORT=5004 TMPDIR=/home/bonface/tmp WEBSERVER_MODE=DEBUG LOG_LEVEL=DEBUG GENENETWORK_FILES=/home/bonface/data/genotype_files/ GN2_PROFILE=/home/bonface/opt/python3-genenetwork2 ./scripts/run_debug.sh + +However, when loading the homepage, I occasionally ran into this trace: + +#+begin_src +DEBUG:wqflask.views:.check_access_permissions: @app.before_request check_access_permissions +DEBUG:wqflask.views:.shutdown_session: remove db_session +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Error: ignoring SIGPIPE signal + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: In addition: +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Warning messages: + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: 1: +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, : +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: library '/home/bonface/R/x86_64-unknown-linux-gnu-library/4.0' contains no packages + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: 2: +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, : +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: library '/home/bonface/R/x86_64-unknown-linux-gnu-library/4.0' contains no packages + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Fatal error: unable to initialize the JIT + + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: + *** caught segfault *** + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: address (nil), cause 'memory not mapped' + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: +Possible actions: +1: abort (with core dump, if enabled) +2: normal R 
exit +3: exit R without saving workspace +4: exit R saving workspace + +Selection: + +#+end_src + +This blocks the flask service. Seems to be related to: [[https://github.com/rpy2/rpy2/issues/769][rpy2-issue#769]] +and [[https://github.com/rpy2/rpy2/issues/809][rpy2-issue#809]]. I tried to reproduce this problem using some endpoint: + +#+begin_src python +@app.route("/test") + def test(): + from rpy2 import robjects as ro + from rpy2 import rinterface + from threading import Thread + + def rpy2_init_simple(): + rinterface.initr_simple() + + thread = Thread(target=rpy2_init_simple) + thread.start() + return "This is a test after importing rpy2" +#+end_src + +which generates this trace: + +#+begin_src +/home/bonface/opt/python3-genenetwork2/lib/python3.8/site-packages/rpy2/rinterface.py:955: UserWarning: R is not initialized by the main thread. + Its taking over SIGINT cannot be reversed here, and as a + consequence the embedded R cannot be interrupted with Ctrl-C. + Consider (re)setting the signal handler of your choice from + the main thread. 
+warnings.warn( +DEBUG:wqflask.views:.shutdown_session: remove db_session + +#+end_src + +Modifying the endpoint to: + +#+begin_src python +@app.route("/test") + def test(): + import wqflask.correlation.show_corr_results + import wqflask.ctl.ctl_analysis + import time + from wqflask.correlation.correlation_functions import cal_zero_order_corr_for_tiss + + print("Sleeping for 3 seconds") + time.sleep(3) + return "This is a test after importing rpy2" +#+end_src + +and refreshing the page a couple of times, I get: + +#+begin_src +DEBUG:wqflask.views:.check_access_permissions: @app.before_request check_access_ +permissions +Sleeping for 3 seconds +DEBUG:wqflask.views:.shutdown_session: remove db_session +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Error: ignoring SIGPI +PE signal + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: In addition: +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Warning messages: + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: 1: +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: In (function (package +, help, pos = 2, lib.loc = NULL, character.only = FALSE, : +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: library '/home/bonfa +ce/R/x86_64-unknown-linux-gnu-library/4.0' contains no packages + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: 2: +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: In (function (package +, help, pos = 2, lib.loc = NULL, character.only = FALSE, : +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: library '/home/bonfa +ce/R/x86_64-unknown-linux-gnu-library/4.0' contains no packages + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: +\*** caught segfault *** + +WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: address (nil), cause +'memory not mapped' + 
+WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: +Possible actions: +1: abort (with core dump, if enabled) +2: normal R exit +3: exit R without saving workspace +4: exit R saving workspace + +Selection: [2021-06-16 13:11:00 +0300] [18657] [INFO] Handling signal: winch +[2021-06-16 13:11:00 +0300] [18657] [INFO] Handling signal: winch +[2021-06-16 13:13:02 +0300] [18657] [INFO] Handling signal: winch +#+end_src + +However, this seems to be non-deterministic, in the sense that I can't +really pin what causes the above. I've tried to write a Locust Test +that simulates users hitting that endpoint: + +#+begin_src python +"""Load test a single trait page""" +from locust import HttpUser, task, between + + + class LoadTest(HttpUser): + wait_time = between(1, 2.5) + + @task + def fetch_trait(self): + """Fetch a single trait""" + self.client.get("/test") +#+end_src + + +** A possible solution + +From this [[https://github.com/rpy2/rpy2/issues/809#issuecomment-845923975][comment]], a possible reason for the above traces, is that +from Flask's end, a [[https://tldp.org/LDP/lpg/node20.html][SIGPIPE]] is somehow generated by our Python +code. However, at this particular point, the R thread just happens to +be running, and R can't handle this correctly. This seems to have been +fixed in this [[https://github.com/rpy2/rpy2/pull/810][PR]] with a this [[https://github.com/rpy2/rpy2/issues/809#issuecomment-851618215][explanation]]. On our end, to have these +changes, we have to update our python-rpy2 version. 
diff --git a/doc/testing.org b/doc/testing.org index 1d5cc8b8..d5ab117d 100644 --- a/doc/testing.org +++ b/doc/testing.org @@ -1,43 +1,67 @@ #+TITLE: Testing GN2 * Table of Contents :TOC: - - [[#introduction][Introduction]] - - [[#run-tests][Run tests]] - - [[#setup][Setup]] - - [[#running][Running]] +- [[#introduction][Introduction]] +- [[#run-tests][Run tests]] + - [[#setup][Setup]] + - [[#running][Running]] * Introduction -For integration testing we currently use the brilliant Ruby Mechanize -gem against the small database; a setup we call mechanical Rob because -it emulates someone clicking through the website and checking results. +For integration testing, we currently use [[https://github.com/genenetwork/genenetwork2/tree/testing/test/requests][Mechanica Rob]] against the +small [[https://github.com/genenetwork/genenetwork2/blob/testing/doc/database.org][database]]; a setup we call Mechanical Rob because it emulates +someone clicking through the website and checking results. -These scripts invoke calls to a running webserver and test the -response. If a page changes or is broken tests will break and we are -informed. In principle, Mechanical Rob is run before code merges are -committed to the main server. +These scripts invoke calls to a running webserver and test the response. +If a page changes or breaks, tests will fail. In principle, Mechanical +Rob runs before code merges get committed to the main server. -In the future we may move to Python mechanize - it'll be easy to mix -the Ruby and Python versions. +For unit tests, we use python's =unittest= framework. Coverage reports +get generated using [[https://coverage.readthedocs.io/en/coverage-5.2.1/][coverage.py]] which you could also use to run +unit tests. When adding new functionality, it is advisable to add +unit tests. * Run tests ** Setup -Mechanize is not yet included in Guix deployment. 
+Everything required for testing is already package with guix: +: ./pre-ins-env guix package -i genenetwork2 -p ~/opt/genenetwork2 ** Running -Run the tests from the root of the genenetwork2 source tree as, for -example, +Run the tests from the root of the genenetwork2 source tree as. Ensure +that Redis and Mariadb are running. -: ./bin/test-website http://localhost:5003/ (default) +To run Mechanical Rob: +: time env GN2_PROFILE=~/opt/genenetwork2 TMPDIR=~/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ ./bin/genenetwork2 ./etc/default_settings.py -c ~/projects/genenetwork2/test/requests/test-website.py -a http://localhost:5004 -If you are using the small deployment database you can use +Use these aliases for the following examples. -: ./bin/test-website --skip -n +#+begin_src sh +alias runpython="env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ ./bin/genenetwork2 -To run individual tests on localhost you can do +alias runcmd="time env GN2_PROFILE=~/opt/gn-latest TMPDIR=//tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ ./bin/genenetwork2 ./etc/default_settings.py -cli" +#+end_src -: ruby -Itest -Itest/lib test/lib/mapping.rb --name="/Mapping/" +You could use them in your =.bashrc= or =.zshrc= file. + +To run unit tests: + +: runpython -m unittest discover -v + +Or alternatively using the coverage tool: + +: runcmd coverage run -m unittest discover -v + +To generate a html coverage report in =wqflask/coverage_html_report/= + +: runcmd coverage html + +To output the report to =STDOUT=: + +: runcmd coverage report + +All the configs for running the coverage tool are in +=wqflask/.coveragerc= |