From 7cc37bf2efba6873fccd0f1756c89d25400afd47 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Fri, 9 Sep 2016 08:34:36 +0200
Subject: Doc: note on guix paths

---
 doc/README.org | 46 ++++++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 20 deletions(-)

(limited to 'doc')
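
Note on the `guix package --search-paths` hunk: the README asks the reader to "make a note of the paths". A minimal sketch of collecting them programmatically follows, assuming the command prints one `export NAME="value"` line per variable (the same shape as the `export R_LIBS_SITE=...` lines in the trouble shooting section). The function name `parse_search_paths` and the sample text are hypothetical and are not part of GN2 or Guix.

```python
import re

# Matches lines such as: export PATH="/home/user/.guix-profile/bin"
_EXPORT_RE = re.compile(r'^export\s+([A-Za-z_][A-Za-z0-9_]*)="(.*)"$')


def parse_search_paths(text):
    """Return {variable: value} for every `export NAME="value"` line."""
    paths = {}
    for line in text.splitlines():
        match = _EXPORT_RE.match(line.strip())
        if match:
            paths[match.group(1)] = match.group(2)
    return paths


# Hypothetical sample output; real values depend on the installed profile.
SAMPLE = '''export PATH="/home/user/.guix-profile/bin"
export R_LIBS_SITE="/home/user/.guix-profile/site-library/"
export GEM_PATH="/home/user/.guix-profile/lib/ruby/gems/2.2.0"
'''

if __name__ == "__main__":
    for name, value in sorted(parse_search_paths(SAMPLE).items()):
        print("%s=%s" % (name, value))
```

In practice the text would come from running the command and capturing its standard output; lines that do not match the assumed shape are skipped.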
diff --git a/doc/README.org b/doc/README.org
index b3c78f29..aa05654f 100644
--- a/doc/README.org
+++ b/doc/README.org
@@ -117,7 +117,7 @@ cd guix-gn-latest
 ** Step 3: Authorize the GN Guix server
 
 GN2 has its own GNU Guix binary distribution server. To trust it you have
-to add the following key 
+to add the following key
 
 #+begin_src scheme
 (public-key
@@ -136,9 +136,9 @@ guix archive --authorize
 
 and hit Ctrl-D.
 
-Now you can use the substitute server to install GN2 binaries. 
+Now you can use the substitute server to install GN2 binaries.
 
-** Step 4: Install and run GN2 
+** Step 4: Install and run GN2
 
 Since this is a quick and dirty install we are going to override the
 GNU Guix package path by pointing the package path to our repository:
@@ -208,7 +208,7 @@ https://s3.amazonaws.com/genenetwork2/db_webqtl_s.zip
 Check the md5sum.
 
 After installation inflate the database binary in the MySQL directory
-(this installation path is subject to change soon) 
+(this installation path is subject to change soon)
 
 : chown -R mysql:mysql db_webqtl_s/
 : chmod 700 db_webqtl_s/
@@ -271,10 +271,10 @@ R_LIBS_SITE are set) from the information given by guix:
 Inside the repository:
 
 : cd genenetwork2
-: ./bin/genenetwork2 
+: ./bin/genenetwork2
 
-Will fire up your local repo http://localhost:5003/ using the  
-settings in ./etc/default_settings.py. These settings may 
+Will fire up your local repo http://localhost:5003/ using the
+settings in ./etc/default_settings.py. These settings may
 not reflect your system. To override settings create your own from a copy of
 default_settings.py and pass it into GN2 with
 
@@ -348,7 +348,7 @@ Make dirs
 
 Add users
 
-: adduser nobody ; addgroup nobody 
+: adduser nobody ; addgroup nobody
 
 Run nginx
 
@@ -392,6 +392,12 @@ Make a note of the paths with
 ./pre-inst-env guix package --search-paths
 #+end_src bash
 
+or, if guix is already installed, this should also work
+
+#+begin_src bash
+guix package --search-paths
+#+end_src bash
+
 After setting the paths for the server
 
 #+begin_src bash
@@ -413,7 +419,7 @@ genenetwork2
 will start the default server which listens on port 5003, i.e.,
 http://localhost:5003/.
 
-OK, we are where we were before with step 4. Only difference is that we 
+OK, we are where we were before with step 4. Only difference is that we
 used our own compiled guix server.
 
 * Trouble shooting
@@ -433,7 +439,7 @@ On one system:
 : export R_LIBS_SITE="$HOME/.guix-profile/site-library/"
 : export GEM_PATH="$HOME/.guix-profile/lib/ruby/gems/2.2.0"
 
-and perhaps a few more. 
+and perhaps a few more.
 ** ERROR: can not find directory $HOME/gn2_data
 
 The default settings file looks in your $HOME/gn2_data. Since these
@@ -466,7 +472,7 @@ and a download of the test database.
 <user01> set to the ones in ~/.guix-profile/
 <pjotrp> good, and you are in gn-latest-guix repo  [07:06]
 <user01> yep  [07:07]
-<pjotrp> git log shows 
+<pjotrp> git log shows
 
 Author: David Thompson <dthompson2@worcester.edu>
 Date:   Sun Mar 27 21:20:19 2016 -0400
@@ -488,7 +494,7 @@ genenetwork2-files-small        1.0     out ../guix-bioinformatics/gn/packages/g
 <user01> hah, I don't have screen installed yet  [07:11]
 <pjotrp> comes with guix ;)  [07:12]
 <pjotrp> no worries, you can run it any way you want
-<pjotrp> $HOME/.guix-profile/bin/guix-daemon --build-users-group=guixbuild 
+<pjotrp> $HOME/.guix-profile/bin/guix-daemon --build-users-group=guixbuild
 <user01> then something's weird, because it says I don't have it
 <pjotrp> oh, you need to install it first  [07:13]
 <pjotrp> guix package -A screen
@@ -546,11 +552,11 @@ The following derivations would be built:
 <pjotrp> https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org
 <pjotrp> this is exactly what we are doing now
 <user01> alrighty  [07:35]
-<pjotrp> To see if a remote server has a guix server running it should respond 
+<pjotrp> To see if a remote server has a guix server running it should respond
                                                                         [07:36]
 <pjotrp> lynx http://guix.genenetwork.org:8080 --dump
 <pjotrp> Resource not found: /
-<pjotrp> 
+<pjotrp>
 <pjotrp> you see that?
 <user01> yes  [07:37]
 <pjotrp> good. The main hydra server is too slow. So on my laptop I forced
@@ -558,7 +564,7 @@ The following derivations would be built:
 <pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics/ ./pre-inst-env guix
          package -i genenetwork2 --dry-run
          --substitute-urls="http://mirror.hydra.gnu.org"
-<pjotrp> 
+<pjotrp>
 <pjotrp> the list looks the same to me  [07:40]
 <user01> me too
 <pjotrp> note that some packages will be built and some downloaded, right?
@@ -688,7 +694,7 @@ The following derivations would be built:
 <pjotrp> everything should be pre-built from guix.genenetwork.org
 <pjotrp> you are downloading?
 <user02> yes  [09:15]
-<pjotrp> cool. Maybe an idea to set up a server 
+<pjotrp> cool. Maybe an idea to set up a server
 <pjotrp> for your own use
 <user02> Stuck at downloading preprocesscore
 <pjotrp> should not  [09:24]
@@ -735,7 +741,7 @@ The following derivations would be built:
 <pjotrp> should be at
          /gnu/store/y1f3r2xs3fhyadd46nd2aqbr2p9qv2ra-r-biocpreprocesscore-1.32.0
                                                                         [09:33]
-<pjotrp> 
+<pjotrp>
 <user03> pjotrp: Possibly we should use the archive utility of Guix to do
         deployment to avoid such out-of-sync differences :)  [09:34]
 <pjotrp> maybe. I did not get archive to update profiles properly  [09:37]
@@ -802,7 +808,7 @@ The following derivations would be built:
 <pjotrp> but do not checkout that genetwork2_diet
 <pjotrp> we reverted to the main tree
 <pjotrp> clone git@github.com:genenetwork/genenetwork2.git  [09:53]
-<pjotrp> instead and checkout the staging branch 
+<pjotrp> instead and checkout the staging branch
 <pjotrp> that is effectively my branch  [09:54]
 <pjotrp> when that is done you should be able to fire up the webserver from
          there  [09:55]
@@ -825,7 +831,7 @@ The following derivations would be built:
 <user01> yep
 <pjotrp> that can also run on remote files over ssh
 <pjotrp> that's an alternative
-<pjotrp> kudos for using emacs :), wdyt user03 
+<pjotrp> kudos for using emacs :), wdyt user03
 <user02> 79 minutes to go downloading the db
 <pjotrp> user02: sorry about that  [09:59]
 <pjotrp> it is 2GB
@@ -850,7 +856,7 @@ The following derivations would be built:
          --substitute-urls="http://guix.genenetwork.org:8080"   [10:08]
 <pjotrp> elixir  1.2.3   out
          ../guix-bioinformatics/gn/packages/elixir.scm:31:2
-<pjotrp> 
+<pjotrp>
 <pjotrp> I am building it on guix.genenetwork.org right now  [10:09]
 <user01> nice  [10:10]
 #+end_src
-- 
cgit 1.4.1


From 0621666fba97b3646271bb037b6c43503e981abf Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sat, 10 Sep 2016 10:03:44 +0200
Subject: Doc: Rpy2 note

---
 doc/README.org | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

(limited to 'doc')
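
Note on the Rpy2 section: the added text attributes the error to PYTHONPATH and R_LIBS_SITE being out of sync. One rough way to spot that is to check whether every entry of both variables lives under the same profile directory. This is only a heuristic sketch: the function name, the choice of variables, and the example paths are assumptions, and it does not prove that Rpy2 was built against the R in that profile.

```python
import os


def paths_outside_profile(env, profile, names=("PYTHONPATH", "R_LIBS_SITE")):
    """List (name, entry) pairs whose entry does not live under `profile`."""
    profile = os.path.normpath(profile)
    stray = []
    for name in names:
        # Unset variables yield no entries and are therefore not reported.
        for entry in env.get(name, "").split(os.pathsep):
            if not entry:
                continue
            entry_norm = os.path.normpath(entry)
            inside = (entry_norm == profile
                      or entry_norm.startswith(profile + os.sep))
            if not inside:
                stray.append((name, entry))
    return stray


if __name__ == "__main__":
    # Assumed default profile location, as used elsewhere in the README.
    default_profile = os.path.join(os.path.expanduser("~"), ".guix-profile")
    for name, entry in paths_outside_profile(os.environ, default_profile):
        print("%s entry outside profile: %s" % (name, entry))
```

An empty result means both variables point into the one profile; any reported entry is a candidate for the mismatch described in the section.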

diff --git a/doc/README.org b/doc/README.org
index aa05654f..2b27d562 100644
--- a/doc/README.org
+++ b/doc/README.org
@@ -6,7 +6,7 @@
    - [[#step-1-install-gnu-guix][Step 1: Install GNU Guix]]
    - [[#step-2-checkout-the-gn2-git-repositories][Step 2: Checkout the GN2 git repositories]]
    - [[#step-3-authorize-the-gn-guix-server][Step 3: Authorize the GN Guix server]]
-   - [[#step-4-install-and-run-gn2-][Step 4: Install and run GN2 ]]
+   - [[#step-4-install-and-run-gn2][Step 4: Install and run GN2]]
  - [[#run-mysql-server][Run MySQL server]]
  - [[#gn2-dependency-graph][GN2 Dependency Graph]]
  - [[#source-deployment][Source deployment]]
@@ -20,6 +20,7 @@
    - [[#importerror-no-module-named-jinja2][ImportError: No module named jinja2]]
    - [[#error-can-not-find-directory-homegn2_data][ERROR: can not find directory $HOME/gn2_data]]
    - [[#cant-run-a-module][Can't run a module]]
+   - [[#rpy2-error-show-now-found][Rpy2 error 'show' now found]]
  - [[#irc-session][IRC session]]
 
 * Introduction
@@ -453,6 +454,21 @@ In rare cases, development modules are not brought in with Guix
 because no source code is available. This can lead to missing modules
 on a running server. Please check with the authors when a module
 is missing.
+** Rpy2 error 'show' now found
+
+This error
+
+: __show = rpy2.rinterface.baseenv.get("show")
+: LookupError: 'show' not found
+
+means that R was updated in your path, and that Rpy2 needs to be
+recompiled against this R - don't you love informative messages?
+
+In our case it means that GN's PYTHONPATH is not in sync with
+R_LIBS_SITE. Please check your GNU Guix GN2 installation paths;
+you may need to reinstall. Note that this may be the point where
+you want to start using profiles (see profile section).
+
 * IRC session
 
 Here an IRC session where we installed GN2 from scratch using GNU Guix
-- 
cgit 1.4.1


From ae1a7f0c8bed6b1a3445a4fac26a578851715629 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sat, 24 Sep 2016 06:59:37 +0000
Subject: Doc: Section on reproducibility      - fixed SVG URLs

---
 doc/Architecture.org | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 doc/README.org       |  4 ++--
 2 files changed, 63 insertions(+), 2 deletions(-)

(limited to 'doc')
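
Note on the reproducibility section: the new text describes identifiers that are hash values calculated over immutable content and embedded in file or directory names, and a combined hash that identifies a software plus data pipeline. A minimal sketch of that idea follows. It assumes SHA-256 and a `name-hash.ext` naming scheme, neither of which is stated in the document, and it is not how GNU Guix or git compute their own identifiers.

```python
import hashlib
import os


def content_hash(data):
    """Hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def hashed_name(filename, data, length=16):
    """Insert a truncated content hash before the file extension."""
    stem, ext = os.path.splitext(filename)
    return "%s-%s%s" % (stem, content_hash(data)[:length], ext)


def pipeline_hash(software_id, data_id):
    """Derive one identifier from a software identifier and a data identifier."""
    joined = ("%s\n%s" % (software_id, data_id)).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()


if __name__ == "__main__":
    payload = b"abc"  # stand-in for the bytes of an immutable data file
    print(hashed_name("geno.tsv", payload))
```

Because the name is derived from the content, two copies with the same name can be assumed identical after re-hashing, which is what makes the synchronisation and integrity checks mentioned in the section cheap.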

diff --git a/doc/Architecture.org b/doc/Architecture.org
index 04e05e40..ec56f9a9 100644
--- a/doc/Architecture.org
+++ b/doc/Architecture.org
@@ -2,6 +2,7 @@
 
 * Table of Contents                                                     :TOC:
  - [[#introduction][Introduction]]
+ - [[#reproducibility-and-interoperability][Reproducibility and interoperability]]
  - [[#webserver][Webserver]]
  - [[#gnserver-rest][GnServer (REST)]]
  - [[#gnexec][GnExec]]
@@ -14,6 +15,66 @@
 This document describes the architecture of GN2. Because GN2 is
 evolving, only a high-level overview is given here.
 
+* Reproducibility and interoperability
+
+Reproducible data analysis and software interoperability should be key
+goals for any system that aims to bring research groups
+together. These goals are increasingly relevant with growing data
+sizes and increasingly complex analysis pipelines. Rigor,
+reproducibility, and robustness start with data that should abide by
+Findable, Accessible, Interoperable, and Re-usable (FAIR) principles
+(see the Wilkinson Nature paper on [[http://www.nature.com/articles/sdata201618][FAIR Guiding Principles for
+scientific data management and stewardship]]).
+
+With GN2 we are meeting these requirements by assigning unique
+identifiers (cryptographic HASH values calculated over immutable data
+content, with that value included in the file or directory names) and
+making these identifiers available through web interfaces (e.g.,
+through a REST API). This means that at any point in the future the
+exact same data can be retrieved using a known, unchanging
+identifier (see also
+https://github.com/pjotrp/genenetwork2/blob/staging/doc/submit-data.org).
+
+Synchronisation, integrity checking and backups become trivial using
+these HASH values, even for very large datasets. Since everything is
+managed at the file system level we can also use Unix authorisation
+systems. HIPAA compliance is achieved by using HASH values and
+bringing the software into the controlled HIPAA environment.
+
+In the context of GeneNetwork we are using git and github for version
+control of software source code
+(https://github.com/genenetwork/). Software can be treated just like
+data, i.e., git uses HASH identifiers to retrieve specific versions of
+source. Versions of source code are therefore identifiable and
+retrievable and can be matched with data into an analysis pipeline. The
+combination of software and data, again, makes a unique HASH value
+which identifies the analysis pipeline.
+
+For combining runnable software and data into an analysis
+pipeline we use GNU Guix which, yet again, turns everything
+into a unique HASH value which allows for exact retrieval and
+reproducibility. Not only that, GNU Guix gives control of the
+software and all its dependencies, calculating a HASH value
+for all dependencies, all the way down to versions of R, BLAS
+and glibc.
+
+This way of packaging software ensures that identical software
+pipelines are easily set up on different systems or in the
+Cloud, so that everyone ends up using the exact same
+combination of software versions in a pipeline.
+
+For software development we use GNU Guix for integration testing and
+deployment (described in the JOSS paper). We also use automated test tools
+(Ruby mechanize) for integration testing of the web services and we
+use unit testing of all backend services. All our software source code
+is published as `free and open source software' (FOSS) which means
+that anyone can view code on github, comment on it, or even
+contribute. GeneNetwork is becoming increasingly modular and has a
+growing number of contributors who, in principle, abide by the
+SMALL TOOLS MANIFESTO FOR BIOINFORMATICS which we wrote up
+(https://github.com/pjotrp/bioinformatics) and which was signed by 51
+bioinformaticians.
+
 * Webserver
 
 The main [[https://github.com/genenetwork/genenetwork2][GN2 webserver]] is built on [[http://flask.pocoo.org/][Python flask]] and this GN2 source
diff --git a/doc/README.org b/doc/README.org
index 2b27d562..0f56914a 100644
--- a/doc/README.org
+++ b/doc/README.org
@@ -29,7 +29,7 @@ If you want to understand the architecture of GN2 read
 [[Architecture.org]].  The rest of this document is mostly on deployment
 of GN2.
 
-Large system deployments can get very [[http://biobeat.org/gn2.svg][complex]]. In this document we
+Large system deployments can get very [[http://biogems.info/contrib/genenetwork/gn2.svg][complex]]. In this document we
 explain the GeneNetwork version 2 (GN2) reproducible deployment system
 which is based on GNU Guix (see also Pjotr's [[https://github.com/pjotrp/guix-notes/blob/master/README.md][Guix-notes]]). The Guix
 system can be used to install GN with all its files and dependencies.
@@ -243,7 +243,7 @@ change the settings in etc/default_settings.py to match your path.
 Graph of all runtime dependencies as installed by GNU Guix.
 
 #+ATTR_HTML: :title GN2_graph
-[[http://biobeat.org/gn2.svg]]
+http://biogems.info/contrib/genenetwork/gn2.svg
 
 * Source deployment
 
-- 
cgit 1.4.1