Diffstat (limited to 'doc')
-rw-r--r-- doc/Architecture.org | 59
-rw-r--r-- doc/README.org | 68
-rw-r--r-- doc/testing.org | 43
3 files changed, 147 insertions, 23 deletions
diff --git a/doc/Architecture.org b/doc/Architecture.org
index 04e05e40..c5876196 100644
--- a/doc/Architecture.org
+++ b/doc/Architecture.org
@@ -2,6 +2,7 @@
 
 * Table of Contents                                                     :TOC:
  - [[#introduction][Introduction]]
+ - [[#reproducibility-and-interoperability][Reproducibility and interoperability]]
  - [[#webserver][Webserver]]
  - [[#gnserver-rest][GnServer (REST)]]
  - [[#gnexec][GnExec]]
@@ -14,6 +15,64 @@
 This document describes the architecture of GN2. Because GN2 is
 evolving, only a high-level overview is given here.
 
+* Reproducibility and interoperability
+
+Reproducible data analysis and software interoperability should be key
+goals for any system that aims to bring research groups
+together. These goals are increasingly relevant with growing data
+sizes and increasingly complex analysis pipelines. Rigor,
+reproducibility, and robustness start with data that should abide by
+Findable, Accessible, Interoperable, and Re-usable (FAIR) principles
+(see the Wilkinson Nature paper on [[http://www.nature.com/articles/sdata201618][FAIR Guiding Principles for
+scientific data management and stewardship]]).
+
+GeneNetwork (GN2) solves this by assigning unique identifiers
+(cryptographic HASH values calculated over immutable data content),
+including these values in file or directory names, and making them
+available through web interfaces (e.g., through a REST
+API). This means that at any point in the future the exact same data
+can be retrieved using a known non-changeable identifier (see also
+https://github.com/pjotrp/genenetwork2/blob/staging/doc/submit-data.org).
+
+Synchronisation, integrity checking and backups become trivial using
+these HASH values, even for very large datasets. Since everything is
+managed at the file system level we can also use Unix authorisation
+systems. HIPAA compliance is achieved by using HASH references and
+bringing the software into the controlled HIPAA environment.
+
+In the context of GeneNetwork we are using git for version control of
+software source code (https://github.com/genenetwork/). Software can
+be treated just like data, i.e., git uses HASH identifiers to retrieve
+specific versions of source. That is, versions of source code are
+identifiable and retrievable and can be matched with data into an
+analysis pipeline. The combination of software and data, again, makes
+a unique HASH value which identifies the analysis pipeline.
+
+For combining runnable software and data into an analysis pipeline we
+use GNU Guix which, yet again, turns everything into a unique HASH
+value which allows for exact retrieval and reproducibility. Not only
+that, GNU Guix gives control of the software and all its dependencies,
+use GNU Guix which, yet again, turns everything into a unique HASH
+value which allows for exact retrieval and reproducibility. Not only
+that, GNU Guix gives control of the software and all its dependencies,
+calculating a HASH value for all dependencies, all the way down to
+versions of R, BLAS and glibc. This way of packaging software
+ensures that identical software pipelines are easily set up on
+different systems or in the Cloud, so that everyone ends up using
+the exact same combination of software versions in a pipeline.
+
+For software development we use GNU Guix for integration testing and
+deployment (described in the JOSS paper). We also use automated test tools
+(Ruby mechanize) for integration testing of the web services and we
+use unit testing of all backend services. All our software source code
+is published as `free and open source software' (FOSS) which means
+that anyone can view the code on GitHub, comment on it, or even
+contribute to it. GeneNetwork is becoming increasingly modular and has
+a growing number of contributors who subscribe to the principles of
+THE SMALL TOOLS MANIFESTO FOR BIOINFORMATICS
+(https://github.com/pjotrp/bioinformatics) which we drew up and which
+was signed by over fifty bioinformaticians.
+
 * Webserver
 
 The main [[https://github.com/genenetwork/genenetwork2][GN2 webserver]] is built on [[http://flask.pocoo.org/][Python flask]] and this GN2 source
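As an illustration of the content-derived identifiers described in the Reproducibility section above, here is a minimal Python sketch. It is not taken from the GN2 code base: the choice of SHA-256, the function names `content_hash` and `hashed_name`, and the `<hash-prefix>-<filename>` naming scheme are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    SHA-256 is an assumption; the text only says "cryptographic HASH".
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so very large datasets never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashed_name(path, prefix_len=12):
    """Build a file name that embeds a prefix of the content hash.

    The naming scheme is hypothetical and only illustrates the idea of
    including the hash value in a file or directory name.
    """
    path = Path(path)
    return f"{content_hash(path)[:prefix_len]}-{path.name}"
```

Because the identifier depends only on the bytes of the file, two copies with the same name can be compared for integrity by comparing their hashes, which is what makes synchronisation and backup checks cheap.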
diff --git a/doc/README.org b/doc/README.org
index b3c78f29..0f56914a 100644
--- a/doc/README.org
+++ b/doc/README.org
@@ -6,7 +6,7 @@
    - [[#step-1-install-gnu-guix][Step 1: Install GNU Guix]]
    - [[#step-2-checkout-the-gn2-git-repositories][Step 2: Checkout the GN2 git repositories]]
    - [[#step-3-authorize-the-gn-guix-server][Step 3: Authorize the GN Guix server]]
-   - [[#step-4-install-and-run-gn2-][Step 4: Install and run GN2 ]]
+   - [[#step-4-install-and-run-gn2][Step 4: Install and run GN2]]
  - [[#run-mysql-server][Run MySQL server]]
  - [[#gn2-dependency-graph][GN2 Dependency Graph]]
  - [[#source-deployment][Source deployment]]
@@ -20,6 +20,7 @@
    - [[#importerror-no-module-named-jinja2][ImportError: No module named jinja2]]
    - [[#error-can-not-find-directory-homegn2_data][ERROR: can not find directory $HOME/gn2_data]]
    - [[#cant-run-a-module][Can't run a module]]
+   - [[#rpy2-error-show-now-found][Rpy2 error 'show' now found]]
  - [[#irc-session][IRC session]]
 
 * Introduction
@@ -28,7 +29,7 @@ If you want to understand the architecture of GN2 read
 [[Architecture.org]].  The rest of this document is mostly on deployment
 of GN2.
 
-Large system deployments can get very [[http://biobeat.org/gn2.svg][complex]]. In this document we
+Large system deployments can get very [[http://biogems.info/contrib/genenetwork/gn2.svg][complex]]. In this document we
 explain the GeneNetwork version 2 (GN2) reproducible deployment system
 which is based on GNU Guix (see also Pjotr's [[https://github.com/pjotrp/guix-notes/blob/master/README.md][Guix-notes]]). The Guix
 system can be used to install GN with all its files and dependencies.
@@ -117,7 +118,7 @@ cd guix-gn-latest
 ** Step 3: Authorize the GN Guix server
 
 GN2 has its own GNU Guix binary distribution server. To trust it you have
-to add the following key 
+to add the following key
 
 #+begin_src scheme
 (public-key
@@ -136,9 +137,9 @@ guix archive --authorize
 
 and hit Ctrl-D.
 
-Now you can use the substitute server to install GN2 binaries. 
+Now you can use the substitute server to install GN2 binaries.
 
-** Step 4: Install and run GN2 
+** Step 4: Install and run GN2
 
 Since this is a quick and dirty install we are going to override the
 GNU Guix package path by pointing the package path to our repository:
@@ -208,7 +209,7 @@ https://s3.amazonaws.com/genenetwork2/db_webqtl_s.zip
 Check the md5sum.
 
 After installation inflate the database binary in the MySQL directory
-(this installation path is subject to change soon) 
+(this installation path is subject to change soon)
 
 : chown -R mysql:mysql db_webqtl_s/
 : chmod 700 db_webqtl_s/
@@ -242,7 +243,7 @@ change the settings in etc/default_settings.py to match your path.
 Graph of all runtime dependencies as installed by GNU Guix.
 
 #+ATTR_HTML: :title GN2_graph
-[[http://biobeat.org/gn2.svg]]
+http://biogems.info/contrib/genenetwork/gn2.svg
 
 * Source deployment
 
@@ -271,10 +272,10 @@ R_LIBS_SITE are set) from the information given by guix:
 Inside the repository:
 
 : cd genenetwork2
-: ./bin/genenetwork2 
+: ./bin/genenetwork2
 
-Will fire up your local repo http://localhost:5003/ using the  
-settings in ./etc/default_settings.py. These settings may 
+Will fire up your local repo http://localhost:5003/ using the
+settings in ./etc/default_settings.py. These settings may
 not reflect your system. To override settings create your own from a copy of
 default_settings.py and pass it into GN2 with
 
@@ -348,7 +349,7 @@ Make dirs
 
 Add users
 
-: adduser nobody ; addgroup nobody 
+: adduser nobody ; addgroup nobody
 
 Run nginx
 
@@ -392,6 +393,12 @@ Make a note of the paths with
 ./pre-inst-env guix package --search-paths
 #+end_src bash
 
+or this should also work if guix is installed
+
+#+begin_src bash
+guix package --search-paths
+#+end_src bash
+
 After setting the paths for the server
 
 #+begin_src bash
@@ -413,7 +420,7 @@ genenetwork2
 will start the default server which listens on port 5003, i.e.,
 http://localhost:5003/.
 
-OK, we are where we were before with step 4. Only difference is that we 
+OK, we are where we were before with step 4. Only difference is that we
 used our own compiled guix server.
 
 * Trouble shooting
@@ -433,7 +440,7 @@ On one system:
 : export R_LIBS_SITE="$HOME/.guix-profile/site-library/"
 : export GEM_PATH="$HOME/.guix-profile/lib/ruby/gems/2.2.0"
 
-and perhaps a few more. 
+and perhaps a few more.
 ** ERROR: can not find directory $HOME/gn2_data
 
 The default settings file looks in your $HOME/gn2_data. Since these
@@ -447,6 +454,21 @@ In rare cases, development modules are not brought in with Guix
 because no source code is available. This can lead to missing modules
 on a running server. Please check with the authors when a module
 is missing.
+** Rpy2 error 'show' now found
+
+This error
+
+: __show = rpy2.rinterface.baseenv.get("show")
+: LookupError: 'show' not found
+
+means that R was updated in your path, and that Rpy2 needs to be
+recompiled against this R - don't you love informative messages?
+
+In our case it means that GN's PYTHONPATH is not in sync with
+R_LIBS_SITE. Please check your GNU Guix GN2 installation paths;
+you may need to reinstall. Note that this may be the point at which
+you want to start using profiles (see profile section).
+
 * IRC session
 
 Here an IRC session where we installed GN2 from scratch using GNU Guix
@@ -466,7 +488,7 @@ and a download of the test database.
 <user01> set to the ones in ~/.guix-profile/
 <pjotrp> good, and you are in gn-latest-guix repo  [07:06]
 <user01> yep  [07:07]
-<pjotrp> git log shows 
+<pjotrp> git log shows
 
 Author: David Thompson <dthompson2@worcester.edu>
 Date:   Sun Mar 27 21:20:19 2016 -0400
@@ -488,7 +510,7 @@ genenetwork2-files-small        1.0     out ../guix-bioinformatics/gn/packages/g
 <user01> hah, I don't have screen installed yet  [07:11]
 <pjotrp> comes with guix ;)  [07:12]
 <pjotrp> no worries, you can run it any way you want
-<pjotrp> $HOME/.guix-profile/bin/guix-daemon --build-users-group=guixbuild 
+<pjotrp> $HOME/.guix-profile/bin/guix-daemon --build-users-group=guixbuild
 <user01> then something's weird, because it says I don't have it
 <pjotrp> oh, you need to install it first  [07:13]
 <pjotrp> guix package -A screen
@@ -546,11 +568,11 @@ The following derivations would be built:
 <pjotrp> https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org
 <pjotrp> this is exactly what we are doing now
 <user01> alrighty  [07:35]
-<pjotrp> To see if a remote server has a guix server running it should respond 
+<pjotrp> To see if a remote server has a guix server running it should respond
                                                                         [07:36]
 <pjotrp> lynx http://guix.genenetwork.org:8080 --dump
 <pjotrp> Resource not found: /
-<pjotrp> 
+<pjotrp>
 <pjotrp> you see that?
 <user01> yes  [07:37]
 <pjotrp> good. The main hydra server is too slow. So on my laptop I forced
@@ -558,7 +580,7 @@ The following derivations would be built:
 <pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics/ ./pre-inst-env guix
          package -i genenetwork2 --dry-run
          --substitute-urls="http://mirror.hydra.gnu.org"
-<pjotrp> 
+<pjotrp>
 <pjotrp> the list looks the same to me  [07:40]
 <user01> me too
 <pjotrp> note that some packages will be built and some downloaded, right?
@@ -688,7 +710,7 @@ The following derivations would be built:
 <pjotrp> everything should be pre-built from guix.genenetwork.org
 <pjotrp> you are downloading?
 <user02> yes  [09:15]
-<pjotrp> cool. Maybe an idea to set up a server 
+<pjotrp> cool. Maybe an idea to set up a server
 <pjotrp> for your own use
 <user02> Stuck at downloading preprocesscore
 <pjotrp> should not  [09:24]
@@ -735,7 +757,7 @@ The following derivations would be built:
 <pjotrp> should be at
          /gnu/store/y1f3r2xs3fhyadd46nd2aqbr2p9qv2ra-r-biocpreprocesscore-1.32.0
                                                                         [09:33]
-<pjotrp> 
+<pjotrp>
 <user03> pjotrp: Possibly we should use the archive utility of Guix to do
         deployment to avoid such out-of-sync differences :)  [09:34]
 <pjotrp> maybe. I did not get archive to update profiles properly  [09:37]
@@ -802,7 +824,7 @@ The following derivations would be built:
 <pjotrp> but do not checkout that genetwork2_diet
 <pjotrp> we reverted to the main tree
 <pjotrp> clone git@github.com:genenetwork/genenetwork2.git  [09:53]
-<pjotrp> instead and checkout the staging branch 
+<pjotrp> instead and checkout the staging branch
 <pjotrp> that is effectively my branch  [09:54]
 <pjotrp> when that is done you should be able to fire up the webserver from
          there  [09:55]
@@ -825,7 +847,7 @@ The following derivations would be built:
 <user01> yep
 <pjotrp> that can also run on remote files over ssh
 <pjotrp> that's an alternative
-<pjotrp> kudos for using emacs :), wdyt user03 
+<pjotrp> kudos for using emacs :), wdyt user03
 <user02> 79 minutes to go downloading the db
 <pjotrp> user02: sorry about that  [09:59]
 <pjotrp> it is 2GB
@@ -850,7 +872,7 @@ The following derivations would be built:
          --substitute-urls="http://guix.genenetwork.org:8080"   [10:08]
 <pjotrp> elixir  1.2.3   out
          ../guix-bioinformatics/gn/packages/elixir.scm:31:2
-<pjotrp> 
+<pjotrp>
 <pjotrp> I am building it on guix.genenetwork.org right now  [10:09]
 <user01> nice  [10:10]
 #+end_src
diff --git a/doc/testing.org b/doc/testing.org
new file mode 100644
index 00000000..1d5cc8b8
--- /dev/null
+++ b/doc/testing.org
@@ -0,0 +1,43 @@
+#+TITLE: Testing GN2
+
+* Table of Contents                                                     :TOC:
+ - [[#introduction][Introduction]]
+ - [[#run-tests][Run tests]]
+   - [[#setup][Setup]]
+   - [[#running][Running]]
+
+* Introduction
+
+For integration testing we currently use the brilliant Ruby Mechanize
+gem against the small database; a setup we call Mechanical Rob because
+it emulates someone clicking through the website and checking results.
+
+These scripts invoke calls to a running webserver and test the
+response.  If a page changes or is broken, tests will break and we are
+informed.  In principle, Mechanical Rob is run before code merges are
+committed to the main server.
+
+In the future we may move to Python mechanize - it'll be easy to mix
+the Ruby and Python versions.
+
+* Run tests
+
+** Setup
+
+Mechanize is not yet included in the Guix deployment.
+
+
+** Running
+
+Run the tests from the root of the genenetwork2 source tree as, for
+example,
+
+:  ./bin/test-website http://localhost:5003/ (default)
+
+If you are using the small deployment database you can use
+
+:  ./bin/test-website --skip -n
+
+To run individual tests on localhost you can do
+
+:  ruby -Itest -Itest/lib test/lib/mapping.rb --name="/Mapping/"
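The idea behind these tests (call a running webserver and check the response) can be sketched in Python using only the standard library. This is not Mechanical Rob and not part of `bin/test-website`; the function name `check_page` and the expected page text used in the commented example are assumptions made for illustration.

```python
import urllib.request


def check_page(url, expected_text, timeout=10):
    """Return True if url answers with HTTP 200 and the body contains expected_text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected_text in body
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts all derive from
        # OSError, so an unreachable or failing server counts as a failed check.
        return False


# Example against the default local server; the expected text is a guess:
# check_page("http://localhost:5003/", "GeneNetwork")
```

A check like this only confirms that a page is served and contains some marker text; it does not replace the form-filling and click-through behaviour that Mechanize provides.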