From 7cc37bf2efba6873fccd0f1756c89d25400afd47 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Fri, 9 Sep 2016 08:34:36 +0200 Subject: Doc: note on guix paths --- doc/README.org | 46 ++++++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 20 deletions(-) (limited to 'doc') diff --git a/doc/README.org b/doc/README.org index b3c78f29..aa05654f 100644 --- a/doc/README.org +++ b/doc/README.org @@ -117,7 +117,7 @@ cd guix-gn-latest ** Step 3: Authorize the GN Guix server GN2 has its own GNU Guix binary distribution server. To trust it you have -to add the following key +to add the following key #+begin_src scheme (public-key @@ -136,9 +136,9 @@ guix archive --authorize and hit Ctrl-D. -Now you can use the substitute server to install GN2 binaries. +Now you can use the substitute server to install GN2 binaries. -** Step 4: Install and run GN2 +** Step 4: Install and run GN2 Since this is a quick and dirty install we are going to override the GNU Guix package path by pointing the package path to our repository: @@ -208,7 +208,7 @@ https://s3.amazonaws.com/genenetwork2/db_webqtl_s.zip Check the md5sum. After installation inflate the database binary in the MySQL directory -(this installation path is subject to change soon) +(this installation path is subject to change soon) : chown -R mysql:mysql db_webqtl_s/ : chmod 700 db_webqtl_s/ @@ -271,10 +271,10 @@ R_LIBS_SITE are set) from the information given by guix: Inside the repository: : cd genenetwork2 -: ./bin/genenetwork2 +: ./bin/genenetwork2 -Will fire up your local repo http://localhost:5003/ using the -settings in ./etc/default_settings.py. These settings may +Will fire up your local repo http://localhost:5003/ using the +settings in ./etc/default_settings.py. These settings may not reflect your system. 
To override settings create your own from a copy of default_settings.py and pass it into GN2 with @@ -348,7 +348,7 @@ Make dirs Add users -: adduser nobody ; addgroup nobody +: adduser nobody ; addgroup nobody Run nginx @@ -392,6 +392,12 @@ Make a note of the paths with ./pre-inst-env guix package --search-paths #+end_src bash +or this should also work if guix is installed + +#+begin_src bash +guix package --search-paths +#+end_src bash + After setting the paths for the server #+begin_src bash @@ -413,7 +419,7 @@ genenetwork2 will start the default server which listens on port 5003, i.e., http://localhost:5003/. -OK, we are where we were before with step 4. Only difference is that we +OK, we are where we were before with step 4. Only difference is that we used our own compiled guix server. * Trouble shooting @@ -433,7 +439,7 @@ On one system: : export R_LIBS_SITE="$HOME/.guix-profile/site-library/" : export GEM_PATH="$HOME/.guix-profile/lib/ruby/gems/2.2.0" -and perhaps a few more. +and perhaps a few more. ** ERROR: can not find directory $HOME/gn2_data The default settings file looks in your $HOME/gn2_data. Since these @@ -466,7 +472,7 @@ and a download of the test database. 
set to the ones in ~/.guix-profile/ good, and you are in gn-latest-guix repo [07:06] yep [07:07] - git log shows + git log shows Author: David Thompson Date: Sun Mar 27 21:20:19 2016 -0400 @@ -488,7 +494,7 @@ genenetwork2-files-small 1.0 out ../guix-bioinformatics/gn/packages/g hah, I don't have screen installed yet [07:11] comes with guix ;) [07:12] no worries, you can run it any way you want - $HOME/.guix-profile/bin/guix-daemon --build-users-group=guixbuild + $HOME/.guix-profile/bin/guix-daemon --build-users-group=guixbuild then something's weird, because it says I don't have it oh, you need to install it first [07:13] guix package -A screen @@ -546,11 +552,11 @@ The following derivations would be built: https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org this is exactly what we are doing now alrighty [07:35] - To see if a remote server has a guix server running it should respond + To see if a remote server has a guix server running it should respond [07:36] lynx http://guix.genenetwork.org:8080 --dump Resource not found: / - + you see that? yes [07:37] good. The main hydra server is too slow. So on my laptop I forced @@ -558,7 +564,7 @@ The following derivations would be built: env GUIX_PACKAGE_PATH=../guix-bioinformatics/ ./pre-inst-env guix package -i genenetwork2 --dry-run --substitute-urls="http://mirror.hydra.gnu.org" - + the list looks the same to me [07:40] me too note that some packages will be built and some downloaded, right? @@ -688,7 +694,7 @@ The following derivations would be built: everything should be pre-built from guix.genenetwork.org you are downloading? yes [09:15] - cool. Maybe an idea to set up a server + cool. 
Maybe an idea to set up a server for your own use Stuck at downloading preprocesscore should not [09:24] @@ -735,7 +741,7 @@ The following derivations would be built: should be at /gnu/store/y1f3r2xs3fhyadd46nd2aqbr2p9qv2ra-r-biocpreprocesscore-1.32.0 [09:33] - + pjotrp: Possibly we should use the archive utility of Guix to do deployment to avoid such out-of-sync differences :) [09:34] maybe. I did not get archive to update profiles properly [09:37] @@ -802,7 +808,7 @@ The following derivations would be built: but do not checkout that genetwork2_diet we reverted to the main tree clone git@github.com:genenetwork/genenetwork2.git [09:53] - instead and checkout the staging branch + instead and checkout the staging branch that is effectively my branch [09:54] when that is done you should be able to fire up the webserver from there [09:55] @@ -825,7 +831,7 @@ The following derivations would be built: yep that can also run on remote files over ssh that's an alternative - kudos for using emacs :), wdyt user03 + kudos for using emacs :), wdyt user03 79 minutes to go downloading the db user02: sorry about that [09:59] it is 2GB @@ -850,7 +856,7 @@ The following derivations would be built: --substitute-urls="http://guix.genenetwork.org:8080" [10:08] elixir 1.2.3 out ../guix-bioinformatics/gn/packages/elixir.scm:31:2 - + I am building it on guix.genenetwork.org right now [10:09] nice [10:10] #+end_src -- cgit 1.4.1 From 0621666fba97b3646271bb037b6c43503e981abf Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Sat, 10 Sep 2016 10:03:44 +0200 Subject: Doc: Rpy2 note --- doc/README.org | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) (limited to 'doc') diff --git a/doc/README.org b/doc/README.org index aa05654f..2b27d562 100644 --- a/doc/README.org +++ b/doc/README.org @@ -6,7 +6,7 @@ - [[#step-1-install-gnu-guix][Step 1: Install GNU Guix]] - [[#step-2-checkout-the-gn2-git-repositories][Step 2: Checkout the GN2 git repositories]] - 
[[#step-3-authorize-the-gn-guix-server][Step 3: Authorize the GN Guix server]] - - [[#step-4-install-and-run-gn2-][Step 4: Install and run GN2 ]] + - [[#step-4-install-and-run-gn2][Step 4: Install and run GN2]] - [[#run-mysql-server][Run MySQL server]] - [[#gn2-dependency-graph][GN2 Dependency Graph]] - [[#source-deployment][Source deployment]] @@ -20,6 +20,7 @@ - [[#importerror-no-module-named-jinja2][ImportError: No module named jinja2]] - [[#error-can-not-find-directory-homegn2_data][ERROR: can not find directory $HOME/gn2_data]] - [[#cant-run-a-module][Can't run a module]] + - [[#rpy2-error-show-not-found][Rpy2 error 'show' not found]] - [[#irc-session][IRC session]] * Introduction @@ -453,6 +454,21 @@ In rare cases, development modules are not brought in with Guix because no source code is available. This can lead to missing modules on a running server. Please check with the authors when a module is missing. +** Rpy2 error 'show' not found + +This error + +: __show = rpy2.rinterface.baseenv.get("show") +: LookupError: 'show' not found + +means that R was updated in your path, and that Rpy2 needs to be +recompiled against this R - don't you love informative messages? + +In our case it means that GN's PYTHONPATH is not in sync with +R_LIBS_SITE. Please check your GNU Guix GN2 installation paths; +you may need to reinstall. Note that this may be the point where you +may want to start using profiles (see profile section).
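The PYTHONPATH and R_LIBS_SITE check described above can be sketched as a small shell function. This is only an illustration under assumptions: the helper name in_profile is hypothetical (it is not part of GN2 or GNU Guix), and it assumes both variables are meant to point into a single profile directory such as $HOME/.guix-profile.

```shell
# Sketch only: in_profile is a hypothetical helper, not part of GN2 or Guix.
# It succeeds when every entry of each colon-separated path list lies
# inside the given profile directory. An empty list counts as a failure.
# The body runs in a subshell so the IFS and globbing changes do not leak.
in_profile() (
    profile="${1%/}"    # drop one trailing slash, if any
    shift
    set -f              # no filename expansion while splitting
    IFS=:
    for list in "$@"; do
        [ -n "$list" ] || exit 1
        for entry in $list; do
            case "$entry" in
                "$profile"/*) ;;     # entry is inside the profile
                *) exit 1 ;;         # entry points somewhere else
            esac
        done
    done
    exit 0
)

# Assumed usage: compare both variables against the default user profile.
if in_profile "$HOME/.guix-profile" "$PYTHONPATH" "$R_LIBS_SITE"; then
    echo "PYTHONPATH and R_LIBS_SITE both point into $HOME/.guix-profile"
else
    echo "paths look out of sync; compare with: guix package --search-paths"
fi
```

If the check fails, the values printed by guix package --search-paths are the ones to compare against. The function only compares path prefixes; it does not verify that Rpy2 was actually built against the R found in that profile.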
+ * IRC session Here an IRC session where we installed GN2 from scratch using GNU Guix -- cgit 1.4.1 From ae1a7f0c8bed6b1a3445a4fac26a578851715629 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Sat, 24 Sep 2016 06:59:37 +0000 Subject: Doc: Section on reproducibility - fixed SVG URLs --- doc/Architecture.org | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++ doc/README.org | 4 ++-- 2 files changed, 63 insertions(+), 2 deletions(-) (limited to 'doc') diff --git a/doc/Architecture.org b/doc/Architecture.org index 04e05e40..ec56f9a9 100644 --- a/doc/Architecture.org +++ b/doc/Architecture.org @@ -2,6 +2,7 @@ * Table of Contents :TOC: - [[#introduction][Introduction]] + - [[#reproducibility-and-interoperability][Reproducibility and interoperability]] - [[#webserver][Webserver]] - [[#gnserver-rest][GnServer (REST)]] - [[#gnexec][GnExec]] @@ -14,6 +15,66 @@ This document describes the architecture of GN2. Because GN2 is evolving, only a high-level overview is given here. +* Reproducibility and interoperability + +Reproducible data analysis and software interoperability should be key +goals for any system that aims to bring research groups +together. These goals are increasingly relevant with growing data +sizes and increasingly complex analysis pipelines. Rigor, +reproducibility, and robustness start with data that should abide by +Findable, Accessible, Interoperable, and Re-usable (FAIR) principles +(see the Wilkinson Nature paper on [[http://www.nature.com/articles/sdata201618][FAIR Guiding Principles for +scientific data management and stewardship]]). + +With GN2 we are meeting these requirements by assigning unique +identifiers (cryptographic HASH values calculated over immutable data +content and including that value in the file names or directories) and +making these identifiers available through web interfaces (e.g., +through a REST API).
This means that at any point in the future the +exact same data can be retrieved using a known non-changeable +identifier (see also +https://github.com/pjotrp/genenetwork2/blob/staging/doc/submit-data.org). + +Synchronisation, integrity checking and backups become trivial using +these HASH values, even for very large datasets. Since everything is +managed at the file system level we can also use Unix authorisation +systems. HIPAA compliance is achieved by using HASH values and +bringing the software into the controlled HIPAA environment. + +In the context of GeneNetwork we are using git and GitHub for version +control of software source code +(https://github.com/genenetwork/). Software can be treated just like +data, i.e., git uses HASH identifiers to retrieve specific versions of +source. That is, versions of source code are identifiable and retrievable +and can be matched with data into an analysis pipeline. The +combination of software and data, again, makes a unique HASH value +which identifies the analysis pipeline. + +For combining runnable software and data into an analysis pipeline we +use GNU Guix which, yet again, turns everything into a unique HASH +value which allows for exact retrieval and reproducibility. Not only +that, GNU Guix gives control of the software and all its dependencies, +calculating a HASH value for all dependencies, all the way down to +versions of R, BLAS and glibc. This way of packaging software +ensures that identical software pipelines are easily set up on +different systems or in the Cloud, meaning that everyone ends up using +the exact same combination of software versions in a pipeline. + +For software development we use GNU Guix for integration testing and +deployment (described in the JOSS paper).
We also use automated test tools +(Ruby mechanize) for integration testing of the web services and we +use unit testing of all backend services. All our software source code +is published as `free and open source software' (FOSS) which means +that anyone can view code on GitHub, comment on it, or even +contribute. GeneNetwork is becoming increasingly modular and has a +growing number of contributors who, in principle, abide by the +SMALL TOOLS MANIFESTO FOR BIOINFORMATICS which we wrote up +(https://github.com/pjotrp/bioinformatics) and which was signed by 51 +bioinformaticians. + * Webserver The main [[https://github.com/genenetwork/genenetwork2][GN2 webserver]] is built on [[http://flask.pocoo.org/][Python flask]] and this GN2 source diff --git a/doc/README.org b/doc/README.org index 2b27d562..0f56914a 100644 --- a/doc/README.org +++ b/doc/README.org @@ -29,7 +29,7 @@ If you want to understand the architecture of GN2 read [[Architecture.org]]. The rest of this document is mostly on deployment of GN2. -Large system deployments can get very [[http://biobeat.org/gn2.svg][complex]]. In this document we +Large system deployments can get very [[http://biogems.info/contrib/genenetwork/gn2.svg][complex]]. In this document we explain the GeneNetwork version 2 (GN2) reproducible deployment system which is based on GNU Guix (see also Pjotr's [[https://github.com/pjotrp/guix-notes/blob/master/README.md][Guix-notes]]). The Guix system can be used to install GN with all its files and dependencies. @@ -243,7 +243,7 @@ change the settings in etc/default_settings.py to match your path. Graph of all runtime dependencies as installed by GNU Guix. #+ATTR_HTML: :title GN2_graph -[[http://biobeat.org/gn2.svg]] +http://biogems.info/contrib/genenetwork/gn2.svg * Source deployment -- cgit 1.4.1