Diffstat (limited to 'doc')
-rw-r--r-- doc/API_readme.md | 169
-rw-r--r-- doc/GUIX-Reproducible-from-source.org | 355
-rw-r--r-- doc/README.org | 907
-rw-r--r-- doc/database.org | 165
-rw-r--r-- doc/development.org | 98
-rw-r--r-- doc/elasticsearch.org | 247
-rw-r--r-- doc/testing.org | 66
7 files changed, 1231 insertions, 776 deletions
diff --git a/doc/API_readme.md b/doc/API_readme.md
new file mode 100644
index 00000000..be6668dc
--- /dev/null
+++ b/doc/API_readme.md
@@ -0,0 +1,169 @@
+# API Query Documentation #
+---
+# Fetching Dataset/Trait info/data #
+---
+## Fetch Species List ##
+
+To get a list of species with data available in GN (and their associated names and ids):
+```
+curl http://genenetwork.org/api/v_pre1/species
+[ { "FullName": "Mus musculus", "Id": 1, "Name": "mouse", "TaxonomyId": 10090 }, ... { "FullName": "Populus trichocarpa", "Id": 10, "Name": "poplar", "TaxonomyId": 3689 } ]
+```
+
+Or to get info for a single species:
+```
+curl http://genenetwork.org/api/v_pre1/species/mouse
+```
+OR
+```
+curl http://genenetwork.org/api/v_pre1/species/mouse.json
+```
+
+*For all queries where the last field is a user-specified name/ID, there will be the option to append a file format type. Currently there is only JSON (and it will default to JSON if none is provided), but other formats will be added later*
+
+## Fetch Groups/RISets ##
+
+This query can optionally filter by species:
+
+```
+curl http://genenetwork.org/api/v_pre1/groups  # for all species
+```
+OR
+```
+curl http://genenetwork.org/api/v_pre1/groups/mouse  # for just mouse groups/RISets
+[ { "DisplayName": "BXD", "FullName": "BXD RI Family", "GeneticType": "riset", "Id": 1, "MappingMethodId": "1", "Name": "BXD", "SpeciesId": 1, "public": 2 }, ... { "DisplayName": "AIL LGSM F34 and F39-43 (GBS)", "FullName": "AIL LGSM F34 and F39-43 (GBS)", "GeneticType": "intercross", "Id": 72, "MappingMethodId": "2", "Name": "AIL-LGSM-F34-F39-43-GBS", "SpeciesId": 1, "public": 2 } ]
+```
+
+## Fetch Genotypes for Group/RISet ##
+```
+curl http://genenetwork.org/api/v_pre1/genotypes/bimbam/BXD
+curl http://genenetwork.org/api/v_pre1/genotypes/BXD.bimbam
+```
+Returns a group's genotypes in one of several formats - bimbam, rqtl2, or geno (a format used by qtlreaper which is just a CSV file consisting of marker positions and genotypes)
+
+Rqtl2 genotype queries can also include the dataset name and will return a zip of the genotypes, phenotypes, and gene map (marker names/positions). For example:
+```
+curl http://genenetwork.org/api/v_pre1/genotypes/rqtl2/BXD/HC_M2_0606_P.zip
+```
+
+## Fetch Datasets ##
+```
+curl http://genenetwork.org/api/v_pre1/datasets/bxd
+```
+OR
+```
+curl http://genenetwork.org/api/v_pre1/datasets/mouse/bxd
+[ { "AvgID": 1, "CreateTime": "Fri, 01 Aug 2003 00:00:00 GMT", "DataScale": "log2", "FullName": "UTHSC/ETHZ/EPFL BXD Liver Polar Metabolites Extraction A, CD Cohorts (Mar 2017) log2", "Id": 1, "Long_Abbreviation": "BXDMicroArray_ProbeSet_August03", "ProbeFreezeId": 3, "ShortName": "Brain U74Av2 08/03 MAS5", "Short_Abbreviation": "Br_U_0803_M", "confidentiality": 0, "public": 0 }, ... { "AvgID": 3, "CreateTime": "Tue, 14 Aug 2018 00:00:00 GMT", "DataScale": "log2", "FullName": "EPFL/LISP BXD CD Liver Affy Mouse Gene 1.0 ST (Aug18) RMA", "Id": 859, "Long_Abbreviation": "EPFLMouseLiverCDRMAApr18", "ProbeFreezeId": 181, "ShortName": "EPFL/LISP BXD CD Liver Affy Mouse Gene 1.0 ST (Aug18) RMA", "Short_Abbreviation": "EPFLMouseLiverCDRMA0818", "confidentiality": 0, "public": 1 } ]
+```
+(I added the option to specify species just in case we end up with the same group name across multiple species at some point, though it's currently unnecessary)
+
+## Fetch Individual Dataset Info ##
+### For mRNA Assay/"ProbeSet" ###
+
+```
+curl http://genenetwork.org/api/v_pre1/dataset/HC_M2_0606_P
+```
+OR
+```
+curl http://genenetwork.org/api/v_pre1/dataset/bxd/HC_M2_0606_P
+{ "confidential": 0, "data_scale": "log2", "dataset_type": "mRNA expression", "full_name": "Hippocampus Consortium M430v2 (Jun06) PDNN", "id": 112, "name": "HC_M2_0606_P", "public": 2, "short_name": "Hippocampus M430v2 BXD 06/06 PDNN", "tissue": "Hippocampus mRNA", "tissue_id": 9 }
+```
+(This also has the option to specify group/riset)
+
+### For "Phenotypes" (basically non-mRNA Expression; stuff like weight, sex, etc) ###
+For these traits, the query fetches publication info and takes the group and phenotype 'ID' as input. For example:
+```
+curl http://genenetwork.org/api/v_pre1/dataset/bxd/10001
+{ "dataset_type": "phenotype", "description": "Central nervous system, morphology: Cerebellum weight, whole, bilateral in adults of both sexes [mg]", "id": 10001, "name": "CBLWT2", "pubmed_id": 11438585, "title": "Genetic control of the mouse cerebellum: identification of quantitative trait loci modulating size and architecture", "year": "2001" }
+```
+
+## Fetch Sample Data for Dataset ##
+```
+curl http://genenetwork.org/api/v_pre1/sample_data/HSNIH-PalmerPublish.csv
+```
+
+Returns a CSV file with sample/strain names as the columns and trait IDs as rows
+
+## Fetch Sample Data for Single Trait ##
+```
+curl http://genenetwork.org/api/v_pre1/sample_data/HC_M2_0606_P/1436869_at
+[ { "data_id": 23415463, "sample_name": "129S1/SvImJ", "sample_name_2": "129S1/SvImJ", "se": 0.123, "value": 8.201 }, { "data_id": 23415463, "sample_name": "A/J", "sample_name_2": "A/J", "se": 0.046, "value": 8.413 }, { "data_id": 23415463, "sample_name": "AKR/J", "sample_name_2": "AKR/J", "se": 0.134, "value": 8.856 }, ... ]
+```
+
+## Fetch Trait List for Dataset ##
+```
+curl http://genenetwork.org/api/v_pre1/traits/HXBBXHPublish.json
+[ { "Additive": 0.0499967532467532, "Id": 10001, "LRS": 16.2831307029479, "Locus": "rs106114574", "PhenotypeId": 1449, "PublicationId": 319, "Sequence": 1 }, ... ]
+```
+
+Both JSON and CSV formats can be specified, with JSON as the default. There are also optional "ids_only" and "names_only" parameters that return only a list of trait IDs or names, respectively.
+
+## Fetch Trait Info (Name, Description, Location, etc) ##
+### For mRNA Expression/"ProbeSet" ###
+```
+curl http://genenetwork.org/api/v_pre1/trait/HC_M2_0606_P/1436869_at
+{ "additive": -0.214087568058076, "alias": "HHG1; HLP3; HPE3; SMMCI; Dsh; Hhg1", "chr": "5", "description": "sonic hedgehog (hedgehog)", "id": 99602, "locus": "rs8253327", "lrs": 12.7711275309832, "mb": 28.457155, "mean": 9.27909090909091, "name": "1436869_at", "p_value": 0.306, "se": null, "symbol": "Shh" }
+```
+
+### For "Phenotypes" ###
+For phenotypes this just gets the max LRS, its location, and additive effect (as calculated by qtlreaper)
+
+Since each group/riset only has one phenotype "dataset", this query takes either the group/riset name or the group/riset name + "Publish" (for example "BXDPublish", which is the dataset name in the DB) as input
+```
+curl http://genenetwork.org/api/v_pre1/trait/BXD/10001
+{ "additive": 2.39444435069444, "id": 4, "locus": "rs48756159", "lrs": 13.4974911471087 }
+```
+
+---
+
+# Analyses #
+---
+## Mapping ##
+Currently two mapping tools can be used: GEMMA and R/qtl. qtlreaper will be added later with Christian Fischer's Rust implementation - https://github.com/chfi/rust-qtlreaper
+
+Each method's query takes the following parameters respectively (more will be added):
+### GEMMA ###
+* trait_id (*required*) - ID for trait being mapped
+* db (*required*) - DB name for trait above (Short_Abbreviation listed when you query for datasets)
+* use_loco - Whether to use LOCO (leave one chromosome out) method (default = false)
+* maf - minor allele frequency (default = 0.01)
+
+Example query:
+```
+curl "http://genenetwork.org/api/v_pre1/mapping?trait_id=10015&db=BXDPublish&method=gemma&use_loco=true"
+```
+
+### R/qtl ###
+(See the R/qtl guide for information on some of these options - http://www.rqtl.org/manual/qtl-manual.pdf)
+* trait_id (*required*) - ID for trait being mapped
+* db (*required*) - DB name for trait above (Short_Abbreviation listed when you query for datasets)
+* rqtl_method - hk (default) | ehk | em | imp | mr | mr-imp | mr-argmax ; Corresponds to the "method" option for the R/qtl scanone function.
+* rqtl_model - normal (default) | binary | 2-part | np ; corresponds to the "model" option for the R/qtl scanone function
+* num_perm - number of permutations; 0 by default
+* control_marker - Name of marker to use as control; this relies on the user knowing the name of the marker they want to use as a covariate
+* interval_mapping - Whether to use interval mapping; "false" by default
+* pair_scan - *NYI*
+
+Example query:
+```
+curl "http://genenetwork.org/api/v_pre1/mapping?trait_id=1418701_at&db=HC_M2_0606_P&method=rqtl&num_perm=100"
+```
+
+Some combinations of methods/models may not make sense. The R/qtl manual should be referred to for any questions on its use (specifically the scanone function in this case)
+
+## Calculate Correlation ##
+Currently only Sample and Tissue correlations are implemented
+
+This query currently takes the following parameters (though more will be added):
+* trait_id (*required*) - ID for trait used for correlation
+* db (*required*) - DB name for the trait above (this is the Short_Abbreviation listed when you query for datasets)
+* target_db (*required*) - Target DB name to be correlated against
+* type - sample (default) | tissue
+* method - pearson (default) | spearman
+* return_count - Number of results to return (default = 500)
+
+Example query:
+```
+curl "http://genenetwork.org/api/v_pre1/correlation?trait_id=1427571_at&db=HC_M2_0606_P&target_db=BXDPublish&type=sample&return_count=100"
+[ { "#_strains": 6, "p_value": 0.004804664723032055, "sample_r": -0.942857142857143, "trait": 20511 }, { "#_strains": 6, "p_value": 0.004804664723032055, "sample_r": -0.942857142857143, "trait": 20724 }, { "#_strains": 12, "p_value": 1.8288943424888848e-05, "sample_r": -0.9233615170820528, "trait": 13536 }, { "#_strains": 7, "p_value": 0.006807187408935392, "sample_r": 0.8928571428571429, "trait": 10157 }, { "#_strains": 7, "p_value": 0.006807187408935392, "sample_r": -0.8928571428571429, "trait": 20392 }, ... ]
+```
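
Note on the API examples above: the query strings contain `&`, so they are easy to get wrong when typed by hand. The following is a minimal Python sketch of how the same URLs could be assembled programmatically. It assumes only the base URL and parameter names shown in the curl examples; the helper name `api_url` is hypothetical and is not part of GeneNetwork.

```python
from urllib.parse import urlencode

# Base URL taken from the curl examples in API_readme.md.
API_ROOT = "http://genenetwork.org/api/v_pre1"


def api_url(*path, **params):
    """Join path segments onto the API root and append an encoded query string.

    `api_url` is an illustrative helper, not a GeneNetwork function.
    """
    url = "/".join([API_ROOT, *path])
    if params:
        # urlencode escapes values and joins the pairs with '&'.
        url += "?" + urlencode(params)
    return url


# Single-species lookup with an explicit format suffix.
species_url = api_url("species", "mouse.json")

# GEMMA mapping query, same parameters as the curl example.
mapping_url = api_url("mapping", trait_id="10015", db="BXDPublish",
                      method="gemma", use_loco="true")

# Sample correlation query, same parameters as the curl example.
correlation_url = api_url("correlation", trait_id="1427571_at",
                          db="HC_M2_0606_P", target_db="BXDPublish",
                          type="sample", return_count=100)

if __name__ == "__main__":
    for url in (species_url, mapping_url, correlation_url):
        print(url)
```

The resulting strings can be passed to any HTTP client, for example `urllib.request.urlopen`; the sketch deliberately stops short of making a network request.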
diff --git a/doc/GUIX-Reproducible-from-source.org b/doc/GUIX-Reproducible-from-source.org
index 4399ea26..19e4d14f 100644
--- a/doc/GUIX-Reproducible-from-source.org
+++ b/doc/GUIX-Reproducible-from-source.org
@@ -2,19 +2,187 @@
* Table of Contents :TOC:
- [[#introduction][Introduction]]
- - [[#binary-deployment][Binary deployment]]
+ - [[#binary-deployment-through-gnu-guix][Binary deployment through GNU Guix]]
+ - [[#quick-installation-recipe][Quick installation recipe]]
+ - [[#step-1-install-gnu-guix][Step 1: Install GNU Guix]]
+ - [[#step-2-checkout-the-gn2-git-repositories][Step 2: Checkout the GN2 git repositories]]
+ - [[#step-3-authorize-the-gn-guix-server][Step 3: Authorize the GN Guix server]]
+ - [[#step-4-install-and-run-gn2][Step 4: Install and run GN2]]
- [[#from-source-deployment][From source deployment]]
- [[#create-archive][Create archive]]
+ - [[#source-deployment][Source deployment]]
+ - [[#run-your-own-copy-of-gn2][Run your own copy of GN2]]
+ - [[#set-up-nginx-port-forwarding][Set up nginx port forwarding]]
+ - [[#source-deployment-and-other-information-on-reproducibility][Source deployment and other information on reproducibility]]
+ - [[#update-to-recent-guix][Update to recent guix]]
+ - [[#install-gn2][Install GN2]]
+ - [[#run-gn2][Run GN2]]
* Introduction
Large system deployments tend to get very complex. In this document we
explain the GeneNetwork deployment system which is based on GNU Guix
-(see Pjotr's [[https://github.com/pjotrp/guix-notes/blob/master/README.md][Guix-notes]]).
+(see Pjotr's [[https://github.com/pjotrp/guix-notes/blob/master/README.md][Guix-notes]] and the main [[README.org]] doc).
-* Binary deployment
+* Binary deployment through GNU Guix
+** Quick installation recipe
-NYA (will go to README)
+This is a recipe for quick and dirty installation of GN2. For
+convenience everything is installed as root, though in reality only
+GNU Guix has to be installed as root. I tested this recipe on a fresh
+install of Debian 8.3.0 (in KVM) though it should work on any modern
+Linux distribution (including CentOS).
+
+Note that GN2 consists of an approx. 5 GB installation including
+database. If you use a virtual machine we recommend allocating at least
+double that.
+
+** Step 1: Install GNU Guix
+
+Fetch the GNU Guix binary from [[https://www.gnu.org/software/guix/download/][here]] (middle panel) and follow
+[[https://www.gnu.org/software/guix/manual/html_node/Binary-Installation.html][instructions]]. Essentially, download and unpack the tar ball (which
+creates directories in /gnu and /var/guix), add build users and group
+(Guix builds software as unprivileged users) and run the Guix daemon
+after fixing the paths (also known as the 'profile').
+
+Once you have succeeded, you have to [[https://github.com/pjotrp/guix-notes/blob/master/INSTALL.org#set-the-key][set the key]] (getting permission
+to download binaries from the GNU server) and you should be able to
+install the hello package using binary packages (no building)
+
+#+begin_src bash
+export PATH=~/.guix-profile/bin:$PATH
+guix pull
+guix package -i hello --dry-run
+#+end_src
+
+Which should show something like
+
+: The following files would be downloaded:
+: /gnu/store/zby49aqfbd9w9br4l52mvb3y6f9vfv22-hello-2.10
+: ...
+
+This means the package is installed from binaries. The actual installation command for 'hello' is
+
+#+begin_src bash
+guix package -i hello
+hello
+ Hello, world!
+#+end_src
+
+If you actually see things building it means that Guix is not yet
+properly installed and up-to-date, i.e., the key is missing or you
+need to do a 'guix pull'. Press Ctrl-C to interrupt.
+
+If you need more help we have another writeup in [[https://github.com/pjotrp/guix-notes/blob/master/INSTALL.org#binary-installation][guix-notes]]. To get
+rid of the locale warning see [[https://github.com/pjotrp/guix-notes/blob/master/INSTALL.org#set-locale][set-locale]].
+
+** Step 2: Checkout the GN2 git repositories
+
+To pin the software dependency graph, GN2 uses git repositories of
+Guix packages. First install git if it is missing
+
+#+begin_src bash
+guix package -i git
+export GIT_SSL_CAINFO=/etc/ssl/certs/ca-certificates.crt
+#+end_src
+
+Then check out the git repositories (gn-deploy branch)
+
+#+begin_src bash
+cd ~
+mkdir genenetwork
+cd genenetwork
+git clone --branch gn-deploy https://github.com/genenetwork/guix-bioinformatics
+git clone --branch gn-deploy --recursive https://github.com/genenetwork/guix guix-gn-deploy
+cd guix-gn-deploy
+#+end_src bash
+
+To test whether this is working try:
+
+#+begin_src bash
+#+end_src bash
+
+** Step 3: Authorize the GN Guix server
+
+GN2 has its own GNU Guix binary distribution server. To trust it you have
+to add the following key
+
+#+begin_src scheme
+(public-key
+ (ecc
+ (curve Ed25519)
+ (q #11217788B41ADC8D5B8E71BD87EF699C65312EC387752899FE9C888856F5C769#)
+ )
+)
+#+end_src
+
+by pasting it into the command
+
+#+begin_src bash
+guix archive --authorize
+#+end_src
+
+and hit Ctrl-D.
+
+Now you can use the substitute server to install GN2 binaries.
+
+** Step 4: Install and run GN2
+
+Since this is a quick and dirty install we are going to override the
+GNU Guix package path by pointing the package path to our repository:
+
+#+begin_src bash
+rm /root/.config/guix/latest
+ln -s ~/genenetwork/guix-gn-deploy/ /root/.config/guix/latest
+#+end_src
+
+Now check whether you can find the GN2 package with
+
+#+begin_src bash
+env GUIX_PACKAGE_PATH=~/genenetwork/guix-bioinformatics/ guix package -A genenetwork2
+ genenetwork2 2.0-a8fcff4 out gn/packages/genenetwork.scm:144:2
+#+end_src
+
+(ignore the source file newer than ... messages, this is caused by the
+/root/.config/guix/latest override).
+
+And install with
+
+#+begin_src bash
+env GUIX_PACKAGE_PATH=~/genenetwork/guix-bioinformatics/ \
+ guix package -i genenetwork2 \
+ --substitute-urls="http://guix.genenetwork.org"
+#+end_src
+
+Note: the order of the substitute URLs may make a difference in speed
+(put the one first that is fastest for your location and time of day).
+
+Note: if your system starts building or gives an error it may well be
+that Step 3 did not succeed. The installation should actually be smooth at
+this point and only do binary installs (no compiling).
+
+After installation you should be able to run genenetwork2 after updating
+the Guix suggested environment vars. Check the output of
+
+#+begin_src bash
+guix package --search-paths
+export PYTHONPATH="/root/.guix-profile/lib/python2.7/site-packages"
+export R_LIBS_SITE="/root/.guix-profile/site-library/"
+#+end_src
+
+and copy-paste the listed exports into the terminal before running:
+
+#+begin_src bash
+genenetwork2
+#+end_src
+
+It will complain that the database is missing. See the next section on
+running MySQL server for downloading and installing a MySQL GN2
+database. After installing the database restart genenetwork2 and point
+your browser at [[http://localhost:5003/]].
+
+End of the GN2 installation recipe!
* From source deployment
@@ -52,3 +220,182 @@ gn-stable-guix$ env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix
* Create archive
: env GUIX_PACKAGE_PATH=../../genenetwork/guix-bioinformatics/ ./pre-inst-env guix archive --export -r genenetwork2 > guix_gn2-2.0-9e9475053.nar
+
+
+* Source deployment
+
+This section gives a more elaborate instruction for installing GN2
+from source.
+
+First execute the four steps above:
+
+ - [[#step-1-install-gnu-guix][Step 1: Install GNU Guix]]
+ - [[#step-2-checkout-the-gn2-git-repositories][Step 2: Checkout the GN2 git repositories]]
+ - [[#step-3-authorize-the-gn-guix-server][Step 3: Authorize the GN Guix server]]
+ - [[#step-4-install-and-run-gn2][Step 4: Install and run GN2]]
+
+
+** Run your own copy of GN2
+
+At some point you may want to fix the source code. Assuming you have
+Guix and Genenetwork2 installed (as described above) clone the GN2
+repository from https://github.com/genenetwork/genenetwork2.
+
+Copy-paste the paths into your terminal (mainly so PYTHONPATH and
+R_LIBS_SITE are set) from the information given by guix:
+
+: guix package --search-paths
+
+Inside the repository:
+
+: cd genenetwork2
+: ./bin/genenetwork2
+
+This will fire up your local copy at http://localhost:5003/ using the
+settings in ./etc/default_settings.py. These settings may
+not reflect your system. To override settings create your own from a copy of
+default_settings.py and pass it into GN2 with
+
+: ./bin/genenetwork2 $HOME/my_settings.py
+
+and everything *should* work (note the full path to the settings
+file). This way we develop against the exact same dependency graph of
+software.
+
+If something is not working, take a hint from the settings file
+that comes in the Guix installation. It sits in something like
+
+: cat ~/.guix-profile/lib/python2.7/site-packages/genenetwork2-2.0-py2.7.egg/etc/default_settings.py
+
+** Set up nginx port forwarding
+
+nginx can be used as a reverse proxy for GN2. For example, we want to
+expose GN2 on port 80 while it is running on port 5003. Essentially
+the configuration looks like
+
+#+begin_src js
+ server {
+ listen 80;
+ server_name test-gn2.genenetwork.org;
+ access_log logs/test-gn2.access.log;
+
+ proxy_connect_timeout 3000;
+ proxy_send_timeout 3000;
+ proxy_read_timeout 3000;
+ send_timeout 3000;
+
+ location / {
+ proxy_set_header Host $http_host;
+ proxy_set_header Connection keep-alive;
+ proxy_set_header X-Real-IP $remote_addr;
+ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+ proxy_set_header X-Forwarded-Host $server_name;
+ proxy_pass http://127.0.0.1:5003;
+ }
+}
+#+end_src js
+
+Install the nginx webserver (as root)
+
+: guix package -i nginx
+
+The nginx example configuration files can be found in the Guix
+store through
+
+: ls -l /root/.guix-profile/sbin/nginx
+: lrwxrwxrwx 3 root guixbuild 66 Dec 31 1969 /root/.guix-profile/sbin/nginx -> /gnu/store/g0wrcl5z27rmk5b52rldzvk1bzzbnz2l-nginx-1.8.1/sbin/nginx
+
+Use that path
+
+: ls /gnu/store/g0wrcl5z27rmk5b52rldzvk1bzzbnz2l-nginx-1.8.1/share/nginx/conf/
+: fastcgi.conf koi-win scgi_params
+: fastcgi.conf.default mime.types scgi_params.default
+: fastcgi_params mime.types.default uwsgi_params
+: fastcgi_params.default nginx.conf uwsgi_params.default
+: koi-utf nginx.conf.default win-utf
+
+And copy any relevant files to /etc/nginx. A configuration file for
+GeneNetwork (reverse proxy) port forwarding can be found in the source
+repository under ./etc/nginx-genenetwork.conf. Copy this file to /etc/nginx
+(still as root):
+: cp ./etc/nginx-genenetwork.conf /etc/nginx/
+
+Make dirs
+
+: mkdir -p /var/spool/nginx/logs
+
+Add users
+
+: adduser nobody ; addgroup nobody
+
+Run nginx
+
+: /root/.guix-profile/sbin/nginx -c /etc/nginx/nginx-genenetwork.conf -p /var/spool/nginx
+
+* Source deployment and other information on reproducibility
+
+See the document [[GUIX-Reproducible-from-source.org]].
+
+** Update to recent guix
+
+We now compile Guix from scratch.
+
+Create, install and run a recent version of the guix-daemon by
+compiling the guix repository you have installed with git in
+step 2. Follow [[https://github.com/pjotrp/guix-notes/blob/master/INSTALL.org#building-gnu-guix-from-source-using-guix][these]] steps carefully after
+
+: cd ~/genenetwork/guix-gn-deploy
+
+Make sure to restart the guix daemon and run guix client from this
+directory.
+
+** Install GN2
+
+Reinstall genenetwork2 using the new tree
+
+#+begin_src bash
+env GUIX_PACKAGE_PATH=~/genenetwork/guix-bioinformatics/ ./pre-inst-env guix package -i genenetwork2 --substitute-urls="http://guix.genenetwork.org https://mirror.guixsd.org"
+#+end_src bash
+
+Note the use of ./pre-inst-env here!
+
+Actually, it should be the same installation as in step 4, so nothing
+gets downloaded.
+
+** Run GN2
+
+Make a note of the paths with
+
+#+begin_src bash
+./pre-inst-env guix package --search-paths
+#+end_src bash
+
+or this should also work if guix is installed
+
+#+begin_src bash
+guix package --search-paths
+#+end_src bash
+
+After setting the paths for the server
+
+#+begin_src bash
+export PATH=~/.guix-profile/bin:$PATH
+export PYTHONPATH="$HOME/.guix-profile/lib/python2.7/site-packages"
+export R_LIBS_SITE="$HOME/.guix-profile/site-library/"
+export GUIX_GTK3_PATH="$HOME/.guix-profile/lib/gtk-3.0"
+export GI_TYPELIB_PATH="$HOME/.guix-profile/lib/girepository-1.0"
+export XDG_DATA_DIRS="$HOME/.guix-profile/share"
+export GIO_EXTRA_MODULES="$HOME/.guix-profile/lib/gio/modules"
+#+end_src bash
+
+run the main script (in ~/.guix-profile/bin)
+
+#+begin_src bash
+genenetwork2
+#+end_src bash
+
+will start the default server which listens on port 5003, i.e.,
+http://localhost:5003/.
+
+OK, we are where we were before with step 4. Only difference is that we
+used our own compiled guix server.
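
Note on the search-paths step above (Step 4 and "Run GN2"): instead of copy-pasting the `export` lines by hand, they could be collected into an environment mapping. The following is a minimal Python sketch under stated assumptions: it handles only plain `export NAME="value"` lines like the ones quoted in this document, it does not expand shell variables such as `$HOME` or `$PATH`, and the helper name `parse_exports` is hypothetical.

```python
import shlex


def parse_exports(text):
    """Return {NAME: value} for every `export NAME="value"` line in text.

    Illustrative helper only; lines that are not simple exports are skipped
    and shell variables inside values are left unexpanded.
    """
    env = {}
    for line in text.splitlines():
        # shlex strips the quotes, so NAME="value" becomes one token NAME=value.
        tokens = shlex.split(line, comments=True)
        if len(tokens) == 2 and tokens[0] == "export" and "=" in tokens[1]:
            name, _, value = tokens[1].partition("=")
            env[name] = value
    return env


# Sample text copied from the Step 4 listing in this document.
SAMPLE = '''\
guix package --search-paths
export PYTHONPATH="/root/.guix-profile/lib/python2.7/site-packages"
export R_LIBS_SITE="/root/.guix-profile/site-library/"
'''

guix_env = parse_exports(SAMPLE)

if __name__ == "__main__":
    for name, value in guix_env.items():
        print(name, "=", value)
```

One possible use is to merge `guix_env` over `os.environ` and pass the result as the `env` argument of `subprocess.run` when starting `genenetwork2`, so the server process sees the same paths without changing the calling shell.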
diff --git a/doc/README.org b/doc/README.org
index b38ea664..46df03c7 100644
--- a/doc/README.org
+++ b/doc/README.org
@@ -2,33 +2,28 @@
* Table of Contents :TOC:
- [[#introduction][Introduction]]
- - [[#quick-installation-recipe][Quick installation recipe]]
- - [[#step-1-install-gnu-guix][Step 1: Install GNU Guix]]
- - [[#step-2-checkout-the-gn2-git-repositories][Step 2: Checkout the GN2 git repositories]]
- - [[#step-3-authorize-the-gn-guix-server][Step 3: Authorize the GN Guix server]]
- - [[#step-4-install-and-run-gn2][Step 4: Install and run GN2]]
- - [[#run-mysql-server][Run MySQL server]]
+ - [[#install][Install]]
+ - [[#running-gn2][Running GN2]]
+ - [[#run-gn-proxy][Run gn-proxy]]
+ - [[#run-redis][Run Redis]]
+ - [[#run-mariadb-server][Run MariaDB server]]
+ - [[#install-mariadb-with-gnu-guix][Install MariaDB with GNU Guix]]
+ - [[#load-the-small-database-in-mysql][Load the small database in MySQL]]
+ - [[#get-genotype-files][Get genotype files]]
- [[#gn2-dependency-graph][GN2 Dependency Graph]]
- - [[#source-deployment][Source deployment]]
- - [[#run-your-own-copy-of-gn2][Run your own copy of GN2]]
- - [[#set-up-nginx-port-forwarding][Set up nginx port forwarding]]
- - [[#source-deployment-and-other-information-on-reproducibility][Source deployment and other information on reproducibility]]
- - [[#update-to-recent-guix][Update to recent guix]]
- - [[#install-gn2][Install GN2]]
- - [[#run-gn2][Run GN2]]
+ - [[#working-with-the-gn2-source-code][Working with the GN2 source code]]
+ - [[#read-more][Read more]]
- [[#trouble-shooting][Trouble shooting]]
- [[#importerror-no-module-named-jinja2][ImportError: No module named jinja2]]
- - [[#error-can-not-find-directory-homegn2_data][ERROR: can not find directory $HOME/gn2_data]]
+ - [[#error-can-not-find-directory-homegn2_data-or-can-not-find-directory-homegenotype_filesgenotype][ERROR: 'can not find directory $HOME/gn2_data' or 'can not find directory $HOME/genotype_files/genotype']]
- [[#cant-run-a-module][Can't run a module]]
- [[#rpy2-error-show-now-found][Rpy2 error 'show' now found]]
- - [[#irc-session][IRC session]]
+ - [[#mysql-cant-connect-server-through-socket-error][Mysql can't connect server through socket ERROR]]
+ - [[#notes][NOTES]]
+ - [[#deploying-gn2-official][Deploying GN2 official]]
* Introduction
-If you want to understand the architecture of GN2 read
-[[Architecture.org]]. The rest of this document is mostly on deployment
-of GN2.
-
Large system deployments can get very [[http://biogems.info/contrib/genenetwork/gn2.svg ][complex]]. In this document we
explain the GeneNetwork version 2 (GN2) reproducible deployment system
which is based on GNU Guix (see also [[https://github.com/pjotrp/guix-notes/blob/master/README.md][Guix-notes]]). The Guix
@@ -37,195 +32,126 @@ system can be used to install GN with all its files and dependencies.
The official installation path is from a checked out version of the
main Guix package tree and that of the Genenetwork package
tree. Current supported versions can be found as the SHA values of
-'gn-latest' branches of [[https://github.com/genenetwork/guix-bioinformatics/tree/gn-latest][Guix bioinformatics]] and [[https://github.com/genenetwork/guix/tree/gn-latest][GNU Guix main]].
+'gn-latest' branches of [[https://gitlab.com/genenetwork/guix-bioinformatics][Guix bioinformatics]] and [[https://gitlab.com/genenetwork/guix][GNU Guix]].
For a full view of runtime dependencies as defined by GNU Guix, see
-the [[#gn2-dependency-graph][GN2 Dependency Graph]].
-
-* Quick installation recipe
-
-This is a recipe for quick and dirty installation of GN2. For
-convenience everything is installed as root, though in reality only
-GNU Guix has to be installed as root. I tested this recipe on a fresh
-install of Debian 8.3.0 (in KVM) though it should work on any modern
-Linux distribution (including CentOS). For more elaborate installation
-instructions see [[#source-deployment][Source deployment]].
-
-Note that GN2 consists of an approx. 5 GB installation including
-database. If you use a virtual machine we recommend to use at least
-double.
-
-** Step 1: Install GNU Guix
-
-Fetch the GNU Guix binary from [[https://www.gnu.org/software/guix/download/][here]] (middle panel) and follow
-[[https://www.gnu.org/software/guix/manual/html_node/Binary-Installation.html][instructions]]. Essentially, download and unpack the tar ball (which
-creates directories in /gnu and /var/guix), add build users and group
-(Guix builds software as unpriviliged users) and run the Guix daemon
-after fixing the paths (also known as the 'profile').
-
-Once you have succeeded, you have to [[https://github.com/pjotrp/guix-notes/blob/master/INSTALL.org#set-the-key][set the key]] (getting permission
-to download binaries from the GNU server) and you should be able to
-install the hello package using binary packages (no building)
-
-#+begin_src bash
-export PATH=~/.guix-profile/bin:$PATH
-guix pull
-guix package -i hello --dry-run
-#+end_src
-
-Which should show something like
+an example of the [[#gn2-dependency-graph][GN2 Dependency Graph]].
-: The following files would be downloaded:
-: /gnu/store/zby49aqfbd9w9br4l52mvb3y6f9vfv22-hello-2.10
-: ...
-#+end_src
+* Install
-means binary installs. The actual installation command of 'hello' is
+Make sure to install GNU Guix using the binary download instructions
+on the main website. Follow the instructions on
+[[GUIX-Reproducible-from-source.org]] to download pre-built binaries. Note
+the download amounts to several GBs of data.
-#+begin_src bash
-guix package -i hello
-hello
- Hello, world!
-#+end_src
+* Running GN2
-If you actually see things building it means that Guix is not yet
-properly installed and up-to-date, i.e., the key is missing or you
-need to do a 'guix pull'. Press Ctrl-C to interrupt.
+Default settings for GN2 are listed in a file called
+[[../etc/default_settings.py][default_settings.py]]. You can copy this file and pass it as a new
+parameter to the genenetwork2 command, e.g.
-If you need more help we have another writeup in [[https://github.com/pjotrp/guix-notes/blob/master/INSTALL.org#binary-installation][guix-notes]]. To get
-rid of the locale warning see [[https://github.com/pjotrp/guix-notes/blob/master/INSTALL.org#set-locale][set-locale]].
+: genenetwork2 mysettings.py
-** Step 2: Checkout the GN2 git repositories
+or you can set environment variables to override individual parameters, e.g.
-To fixate the software dependency graph GN2 uses git repositories of
-Guix packages. First install git if it is missing
+: env SERVER_PORT=5004 SQL_URI=mysql://user:pwd@dbhostname/db_webqtl genenetwork2
-#+begin_src bash
-guix package -i git
-export GIT_SSL_CAINFO=/etc/ssl/certs/ca-certificates.crt
-#+end_src
+the debug and logging switches can be particularly useful when
+developing GN2.
-check out the git repositories (gn-deploy branch)
+* Run gn-proxy
-#+begin_src bash
-cd ~
-mkdir genenetwork
-cd genenetwork
-git clone --branch gn-deploy https://github.com/genenetwork/guix-bioinformatics
-git clone --branch gn-deploy --recursive https://github.com/genenetwork/guix guix-gn-deploy
-cd guix-gn-deploy
-#+end_src bash
+GeneNetwork requires a separate gn-proxy server which handles
+authorisation and access control. For instructions see the [[https://github.com/genenetwork/gn-proxy][README]].
-To test whether this is working try:
+* Run Redis
-#+begin_src bash
-#+end_src bash
+Redis is part of the GN2 deployment and will be started by the ./bin/genenetwork2
+startup script.
+* Run MariaDB server
+** Install MariaDB with GNU Guix
-** Step 3: Authorize the GN Guix server
+These are the steps you can take to install a fresh installation of
+mariadb (which comes as part of the GNU Guix genenetwork2 install).
-GN2 has its own GNU Guix binary distribution server. To trust it you have
-to add the following key
+As root configure the Guix profile
-#+begin_src scheme
-(public-key
- (ecc
- (curve Ed25519)
- (q #11217788B41ADC8D5B8E71BD87EF699C65312EC387752899FE9C888856F5C769#)
- )
-)
-#+end_src
+: . ~/opt/genenetwork2/etc/profile
-by pasting it into the command
+and run for example
-#+begin_src bash
-guix archive --authorize
-#+end_src
+#+BEGIN_SRC bash
+adduser mariadb && addgroup mariadb
+mkdir -p /export2/mariadb/database
+chown mariadb.mariadb -R /export2/mariadb/
+mkdir -p /var/run/mysqld
+chown mariadb.mariadb /var/run/mysqld
+su mariadb
+mysql --version
+ mysql Ver 15.1 Distrib 10.1.45-MariaDB, for Linux (x86_64) using readline 5.1
+mysql_install_db --user=mariadb --datadir=/export2/mariadb/database
+mysqld -u mariadb --datadir=/export2/mariadb/database --explicit_defaults_for_timestamp -P 12048
+#+END_SRC
-and hit Ctrl-D.
+If you want to run as root you may have to set
-Now you can use the substitute server to install GN2 binaries.
+: /etc/my.cnf
+: [mariadbd]
+: user=root
-** Step 4: Install and run GN2
+You also need to set
-Since this is a quick and dirty install we are going to override the
-GNU Guix package path by pointing the package path to our repository:
+: ft_min_word_len = 3
-#+begin_src bash
-rm /root/.config/guix/latest
-ln -s ~/genenetwork/guix-gn-deploy/ /root/.config/guix/latest
-#+end_src
+to make sure that full-text searches for three-letter words (such as shh)
+work. Rebuild the tables afterwards if required.
-Now check whether you can find the GN2 package with
+To check error output in a file on start-up run with something like
-#+begin_src bash
-env GUIX_PACKAGE_PATH=~/genenetwork/guix-bioinformatics/ guix package -A genenetwork2
- genenetwork2 2.0-a8fcff4 out gn/packages/genenetwork.scm:144:2
-#+end_src
+: mariadbd -u mariadb --console --explicit_defaults_for_timestamp --datadir=/gnu/mariadb --log-error=$HOME/test.log
-(ignore the source file newer then ... messages, this is caused by the
-/root/.config/guix/latest override).
+Another tip: Guix installs mariadbd in your profile, so this may work
-And install with
+: /home/user/.guix-profile/bin/mariadbd -u mariadb --explicit_defaults_for_timestamp --datadir=/gnu/mariadb
-#+begin_src bash
-env GUIX_PACKAGE_PATH=~/genenetwork/guix-bioinformatics/ \
- guix package -i genenetwork2 \
- --substitute-urls="http://guix.genenetwork.org"
-#+end_src
+When you get errors like:
-Note: the order of the substitute url's may make a difference in speed
-(put the one first that is fastest for your location and time of day).
+: sqlalchemy.exc.IntegrityError: (_mariadb_exceptions.IntegrityError) (1215, 'Cannot add foreign key constraint')
-Note: if your system starts building or gives an error it may well be
-Step 3 did not succeed. The installation should actually be smooth at
-this point and only do binary installs (no compiling).
+you may need to set
-After installation you should be able to run genenetwork2 after updating
-the Guix suggested environment vars. Check the output of
+: set foreign_key_checks=0
-#+begin_src bash
-guix package --search-paths
-export PYTHONPATH="/root/.guix-profile/lib/python2.7/site-packages"
-export R_LIBS_SITE="/root/.guix-profile/site-library/"
-#+end_src
-
-and copy-paste the listed exports into the terminal before running:
-
-#+begin_src bash
-genenetwork2
-#+end_src
-
-It will complain that the database is missing. See the next section on
-running MySQL server for downloading and installing a MySQL GN2
-database. After installing the database restart genenetwork2 and point
-your browser at [[http://localhost:5003/]].
-
-End of the GN2 installation recipe!
-
-* Run MySQL server
+** Load the small database in MySQL
At this point we require the underlying distribution to install and
-run mysqld. Currently we have two databases for deployment,
+run mysqld (see the previous section for GNU Guix). Currently we have two databases for deployment,
'db_webqtl_s' is the small testing database containing experiments
from BXD mice and 'db_webqtl_plant' which contains all plant related
material.
Download one database from
-http://files.genenetwork.org/raw_database/
-https://s3.amazonaws.com/genenetwork2/db_webqtl_s.zip
+http://ipfs.genenetwork.org/ipfs/QmRUmYu6ogxEdzZeE8PuXMGCDa8M3y2uFcfo4zqQRbpxtk
+
+After installation, unpack the database binary in the MySQL directory
-Check the md5sum.
+#+BEGIN_SRC sh
+cd ~/mysql
+p7zip -d db_webqtl_s.7z
+chown -R mysql:mysql db_webqtl_s/
+chmod 700 db_webqtl_s/
+chmod 660 db_webqtl_s/*
+#+END_SRC
-After installation inflate the database binary in the MySQL directory
-(this installation path is subject to change soon)
+Restart the MySQL service (mysqld). Log in as root
-: chown -R mysql:mysql db_webqtl_s/
-: chmod 700 db_webqtl_s/
-: chmod 660 db_webqtl_s/*
+: mysql_upgrade -u root --force
-restart MySQL service (mysqld). Login as root and
+: mysql -u root
+
+and
: mysql> show databases;
: +--------------------+
@@ -239,199 +165,45 @@ restart MySQL service (mysqld). Login as root and
Set permissions and match password in your settings file below:
-: mysql> grant all privileges on db_webqtl_s.* to gn2@"localhost" identified by 'mysql_password';
+: mysql> grant all privileges on db_webqtl_s.* to gn2@"localhost" identified by 'webqtl';
+
+You may need to change "localhost" to whatever domain you are
+connecting from (mysql will report an error if the host does not match).
Note that if the mysql connection is not working, try connecting to
the IP address and check server firewall, hosts.allow and mysql IP
-configuration.
+configuration (see below).
Note for the plant database you can rename it to db_webqtl_s, or
change the settings in etc/default_settings.py to match your path.
-* GN2 Dependency Graph
-
-Graph of all runtime dependencies as installed by GNU Guix.
-
-#+ATTR_HTML: :title GN2_graph
-http://biogems.info/contrib/genenetwork/gn2.svg
-
-* Source deployment
-
-This section gives a more elaborate instruction for installing GN2
-from source.
-
-First execute above 4 steps:
-
- - [[#step-1-install-gnu-guix][Step 1: Install GNU Guix]]
- - [[#step-2-checkout-the-gn2-git-repositories][Step 2: Checkout the GN2 git repositories]]
- - [[#step-3-authorize-the-gn-guix-server][Step 3: Authorize the GN Guix server]]
- - [[#step-4-install-and-run-gn2-][Step 4: Install and run GN2 ]]
-
-
-** Run your own copy of GN2
-
-At some point you may want to fix the source code. Assuming you have
-Guix and Genenetwork2 installed (as described above) clone the GN2
-repository from https://github.com/genenetwork/genenetwork2.
-
-Copy-paste the paths into your terminal (mainly so PYTHON_PATH and
-R_LIBS_SITE are set) from the information given by guix:
-
-: guix package --search-paths
-
-Inside the repository:
-
-: cd genenetwork2
-: ./bin/genenetwork2
-
-Will fire up your local repo http://localhost:5003/ using the
-settings in ./etc/default_settings.py. These settings may
-not reflect your system. To override settings create your own from a copy of
-default_settings.py and pass it into GN2 with
-
-: ./bin/genenetwork2 $HOME/my_settings.py
-
-and everything *should* work (note the full path to the settings
-file). This way we develop against the exact same dependency graph of
-software.
-
-If something is not working, take a hint from the settings file
-that comes in the Guix installation. It sits in something like
-
-: cat ~/.guix-profile/lib/python2.7/site-packages/genenetwork2-2.0-py2.7.egg/etc/default_settings.py
-
-** Set up nginx port forwarding
-
-nginx can be used as a reverse proxy for GN2. For example, we want to
-expose GN2 on port 80 while it is running on port 5003. Essentially
-the configuration looks like
-
-#+begin_src js
- server {
- listen 80;
- server_name test-gn2.genenetwork.org;
- access_log logs/test-gn2.access.log;
-
- proxy_connect_timeout 3000;
- proxy_send_timeout 3000;
- proxy_read_timeout 3000;
- send_timeout 3000;
-
- location / {
- proxy_set_header Host $http_host;
- proxy_set_header Connection keep-alive;
- proxy_set_header X-Real-IP $remote_addr;
- proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
- proxy_set_header X-Forwarded-Host $server_name;
- proxy_pass http://127.0.0.1:5003;
- }
-}
-#+end_src js
+* Get genotype files
-Install the nginx webserver (as root)
+The script looks for genotype files. You can find them in
+http://ipfs.genenetwork.org/ipfs/QmXQy3DAUWJuYxubLHLkPMNCEVq1oV7844xWG2d1GSPFPL
-: guix package -i nginx
+#+BEGIN_SRC sh
+mkdir -p $HOME/genotype_files
+cd $HOME/genotype_files
-The nginx example configuration examples can be found in the Guix
-store through
+#+END_SRC
-: ls -l /root/.guix-profile/sbin/nginx
-: lrwxrwxrwx 3 root guixbuild 66 Dec 31 1969 /root/.guix-profile/sbin/nginx -> /gnu/store/g0wrcl5z27rmk5b52rldzvk1bzzbnz2l-nginx-1.8.1/sbin/nginx
-
-Use that path
-
-: ls /gnu/store/g0wrcl5z27rmk5b52rldzvk1bzzbnz2l-nginx-1.8.1/share/nginx/conf/
-: fastcgi.conf koi-win scgi_params
-: fastcgi.conf.default mime.types scgi_params.default
-: fastcgi_params mime.types.default uwsgi_params
-: fastcgi_params.default nginx.conf uwsgi_params.default
-: koi-utf nginx.conf.default win-utf
-
-And copy any relevant files to /etc/nginx. A configuration file for
-GeneNetwork (reverse proxy) port forwarding can be found in the source
-repository under ./etc/nginx-genenetwork.conf. Copy this file to /etc
-(still as root)
-: cp ./etc/nginx-genenetwork.conf /etc/nginx/
-
-Make dirs
-
-: mkdir -p /var/spool/nginx/logs
-
-Add users
-
-: adduser nobody ; addgroup nobody
-
-Run nginx
-
-: /root/.guix-profile/sbin/nginx -c /etc/nginx/nginx-genenetwork.conf -p /var/spool/nginx
-
-* Source deployment and other information on reproducibility
-
-See the document [[GUIX-Reproducible-from-source.org]].
-
-** Update to recent guix
-
-We now compile Guix from scratch.
-
-Create, install and run a recent version of the guix-daemon by
-compiling the guix repository you have installed with git in
-step 2. Follow [[https://github.com/pjotrp/guix-notes/blob/master/INSTALL.org#building-gnu-guix-from-source-using-guix][these]] steps carefully after
-
-: cd ~/genenetwork/guix-gn-deploy
-
-Make sure to restart the guix daemon and run guix client from this
-directory.
-
-** Install GN2
-
-Reinstall genenetwork2 using the new tree
-
-#+begin_src bash
-env GUIX_PACKAGE_PATH=~/genenetwork/guix-bioinformatics/ ./pre-inst-env guix package -i genenetwork2 --substitute-urls="http://guix.genenetwork.org https://mirror.guixsd.org"
-#+end_src bash
-
-Note the use of ./pre-inst-env here!
-
-Actually, it should be the same installation as in step 4, so nothing
-gets downloaded.
-
-** Run GN2
-
-Make a note of the paths with
-
-#+begin_src bash
-./pre-inst-env guix package --search-paths
-#+end_src bash
-
-or this should also work if guix is installed
-
-#+begin_src bash
-guix package --search-paths
-#+end_src bash
+* GN2 Dependency Graph
-After setting the paths for the server
+Graph of all runtime dependencies as installed by GNU Guix.
-#+begin_src bash
-export PATH=~/.guix-profile/bin:$PATH
-export PYTHONPATH="$HOME/.guix-profile/lib/python2.7/site-packages"
-export R_LIBS_SITE="$HOME/.guix-profile/site-library/"
-export GUIX_GTK3_PATH="$HOME/.guix-profile/lib/gtk-3.0"
-export GI_TYPELIB_PATH="$HOME/.guix-profile/lib/girepository-1.0"
-export XDG_DATA_DIRS="$HOME/.guix-profile/share"
-export GIO_EXTRA_MODULES="$HOME/.guix-profile/lib/gio/modules"
-#+end_src bash
+#+ATTR_HTML: :title GN2_graph
+http://biogems.info/contrib/genenetwork/gn2.svg
-run the main script (in ~/.guix-profile/bin)
+* Working with the GN2 source code
-#+begin_src bash
-genenetwork2
-#+end_src bash
+See [[development.org]].
-will start the default server which listens on port 5003, i.e.,
-http://localhost:5003/.
+* Read more
-OK, we are where we were before with step 4. Only difference is that we
-used our own compiled guix server.
+If you want to understand the architecture of GN2 read
+[[Architecture.org]]. The rest of this document is mostly on deployment
+of GN2.
* Trouble shooting
@@ -451,13 +223,17 @@ On one system:
: export GEM_PATH="$HOME/.guix-profile/lib/ruby/gems/2.2.0"
and perhaps a few more.
-** ERROR: can not find directory $HOME/gn2_data
+** ERROR: 'can not find directory $HOME/gn2_data' or 'can not find directory $HOME/genotype_files/genotype'
The default settings file looks in your $HOME/gn2_data. Since these
files come with a Guix installation you should take a hint from the
values in the installed version of default_settings.py (see above in
this document).
+You can use the GENENETWORK_FILES switch to set the datadir, for example
+
+: env GN2_PROFILE=~/opt/gn-latest GENENETWORK_FILES=/gnu/data/gn2_data ./bin/genenetwork2
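Before starting GN2 it can help to check that the data directories named in the error messages actually exist. The following is a minimal sketch, not part of GN2: the helper name `check_gn2_dirs` is hypothetical, and the two default paths are simply the ones quoted in the heading above.

```shell
# Hypothetical helper (not part of GN2): report which of the given
# directories are missing. Returns 0 if all exist, 1 otherwise.
check_gn2_dirs() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "can not find directory $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumed default locations, taken from the error messages in this section.
if check_gn2_dirs "$HOME/gn2_data" "$HOME/genotype_files/genotype"; then
    echo "data directories found"
else
    echo "create the directories or point GENENETWORK_FILES at your data"
fi
```

If the check fails, either create the directories or pass GENENETWORK_FILES on the command line as shown above.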
+
** Can't run a module
In rare cases, development modules are not brought in with Guix
@@ -479,410 +255,49 @@ R_LIBS_SITE. Please check your GNU Guix GN2 installation paths,
you may need to reinstall. Note that this may be the point you
may want to start using profiles (see profile section).
-* IRC session
-
-Here an IRC session where we installed GN2 from scratch using GNU Guix
-and a download of the test database.
-
-#+begin_src
-<pjotrp> time to get binary install sorted :) [07:03]
-<pjotrp> Guix is designed for distributed installation servers
-<pjotrp> we have one on guix.genenetwork.org
-<pjotrp> it contains all the prebuild packages
-<pjotrp> for GN
-<user01> okay [07:04]
-<pjotrp> let's step back however [07:05]
-<pjotrp> I presume the environment is set with all guix package --search-paths
-<pjotrp> right?
-<user01> yep
-<user01> set to the ones in ~/.guix-profile/
-<pjotrp> good, and you are in gn-deploy-guix repo [07:06]
-<user01> yep [07:07]
-<pjotrp> git log shows
-
-Author: David Thompson <dthompson2@worcester.edu>
-Date: Sun Mar 27 21:20:19 2016 -0400
-
-<user01> yes
-<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix
- package -A genenetwork2 [07:08]
-<pjotrp> shows
-
-genenetwork2 2.0-a8fcff4 out ../guix-bioinformatics/gn/packages/genenetwork.scm:144:2
-genenetwork2-database-small 1.0 out ../guix-bioinformatics/gn/packages/genenetwork.scm:270:4
-genenetwork2-files-small 1.0 out ../guix-bioinformatics/gn/packages/genenetwork.scm:228:4
-
-<user01> yeah [07:09]
-<pjotrp> OK, we are in sync. This means we should be able to install the exact
- same software
-<pjotrp> I need to start up my guix daemon - I usually run it in a screen
-<pjotrp> screen -S guix-daemon
-<user01> hah, I don't have screen installed yet [07:11]
-<pjotrp> comes with guix ;) [07:12]
-<pjotrp> no worries, you can run it any way you want
-<pjotrp> $HOME/.guix-profile/bin/guix-daemon --build-users-group=guixbuild
-<user01> then something's weird, because it says I don't have it
-<pjotrp> oh, you need to install it first [07:13]
-<pjotrp> guix package -A screen
-<pjotrp> screen 4.3.1 out gnu/packages/screen.scm:34:2
-<pjotrp> but you can skip this install, for now
-<user01> alright [07:14]
-<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix
- package -i genenetwork2 --dry-run
-<pjotrp> substitute: updating list of substitutes from
- 'https://mirror.hydra.gnu.org'... 79.1%
-<pjotrp> you see that?
-<pjotrp> followed by [07:15]
-substitute: updating list of substitutes from
-'https://hydra.gnu.org'... 100.0%
-The following derivations would be built:
- /gnu/store/rk7nw0rjqqsha958m649wrykadx6mmhl-profile.drv
-
-/gnu/store/7b0qjybvfx8syzvfs7p5rdablwhbkbvs-module-import-compiled.drv
- /gnu/store/cy9zahbbf23d3cqyy404lk9f50z192kp-module-import.drv
- /gnu/store/ibdn603i8grf0jziy5gjsly34wx82lmk-gtk-icon-themes.drv
-
-<pjotrp> which should have the same HASH values /gnu/store/7b0qjybvf... etc.
- [07:16]
-<user01> profile has a different hash
-<pjotrp> but the next ones?
-<user01> they're the same
-<pjotrp> not sure why profile differs. Do you see the contact with
- mirror.hydra.org? [07:17]
-<user01> yeah
-<pjotrp> OK, that means you set the key correctly for that one :)
-<pjotrp> alright we are at the same state now. You can see most packages need
- to be rebuild because they are no longer cached as binaries on hydra
- [07:18]
-<pjotrp> things move fast...
-<user01> hehe
-<pjotrp> let me also do the same on my laptop - which I have staged before
- [07:19]
-<pjotrp> btw, to set the path I often do [07:20]
-<pjotrp> export
- PATH="/home/wrk/.guix-profile/bin:/home/wrk/.guix-profile/sbin":$PATH
-<pjotrp> to keep things like 'screen' from Debian
-<pjotrp> Once past building guix itself that is normally OK [07:21]
-<user01> ah, okay
-<user01> will do that
-<pjotrp> the guix build requires certain versions of tools, so you don't want
- to mix foreign tools in [07:23]
-<user01> makes sense [07:24]
-<pjotrp> On my laptop I am trying the main updating list of substitutes from
- 'http://hydra.gnu.org'... 10.5% [07:27]
-<pjotrp> it is a bit slow, but let's see if there is a difference with the
- mirror
-<pjotrp> you can see there are two servers here. Actually with recent daemons,
- if the mirror fails it will try the main server [07:28]
-<pjotrp> I documented the use of a caching server here [07:29]
-<pjotrp> https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org
-<pjotrp> this is exactly what we are doing now
-<user01> alrighty [07:35]
-<pjotrp> To see if a remote server has a guix server running it should respond
- [07:36]
-<pjotrp> lynx http://guix.genenetwork.org:8080 --dump
-<pjotrp> Resource not found: /
-<pjotrp>
-<pjotrp> you see that?
-<user01> yes [07:37]
-<pjotrp> good. The main hydra server is too slow. So on my laptop I forced
- using the mirror with [07:38]
-<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics/ ./pre-inst-env guix
- package -i genenetwork2 --dry-run
- --substitute-urls="http://mirror.hydra.gnu.org"
-<pjotrp>
-<pjotrp> the list looks the same to me [07:40]
-<user01> me too
-<pjotrp> note that some packages will be built and some downloaded, right?
- [07:41]
-<user01> yes
-<pjotrp> atlas is actually a binary on my system [07:43]
-<pjotrp> I mean in that list
-<pjotrp> so, it should not build. Same as yours?
-<user01> yeah, atlas and r-gtable are the ones to be downloaded
-<pjotrp> You should not have seen that error ;)
-<pjotrp> we should try and install it this way, try [07:44]
-<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix
- package -i genenetwork2 --cores=4 --max-jobs=4 --keep-going [07:46]
-<pjotrp> set CPUs and max-jobs to something sensible
-<pjotrp> Does your VM have multiple cores?
-<pjotrp> note you can always press Ctrl-C during install
-<user01> it doesn't, I'll reboot it and give it another core [07:47]
-<user02> Hey [07:48]
-<user02> I'm here
-<user02> Will be stepping away for some breakfast
-<pjotrp> Can you do the same as us
-<pjotrp> Can you see the irc log
-<user02> Alright
-<user02> Yes, I can
-<user02> Please email me a copy in five minutes
-<pjotrp> user01: so when I use the GN server [07:56]
-<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix
- package -i genenetwork2 --dry-run
- --substitute-urls=http://guix.genenetwork.org:8080
-<pjotrp> I don't need to build anything [07:57]
-<pjotrp> (this won't work for you, yet)
-<pjotrp> to get it to work you need to 'trust' it [07:58]
-<pjotrp> but, first get the build going
-<pjotrp> I'll have a coffee while you and get building
-<user01> yeah it's doing its thing now [08:01]
-<pjotrp> cool [08:02]
-<pjotrp> in a separate terminal you can try and install with the gn mirror
- [08:05]
-<pjotrp> I'll send you the public key and you can paste it as said
- https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org
- [08:06]
-<user01> alright
-<pjotrp> should be in the E-mail [08:09]
-<pjotrp> getting it working it kinda nasty since the server gives no feedback
-<pjotrp> it works when you see no more in the build list ;) [08:11]
-<pjotrp> btw, you can install software in parallel. Guix does that.
-<pjotrp> even the same packages
-<pjotrp> so keep building ;)
-<pjotrp> try and do this with Debian...
-<pjotrp> coffee for me [08:12]
-<user01> the first build failed [08:15]
-<pjotrp> OK, Dennis fixed that one yesterday [08:27]
-<pjotrp> the problem is that sometime source tarballs disappear [08:28]
-<pjotrp> R is notorious for that
-<user01> haha, that's inconvenient..
-<pjotrp> well, it is good that Guix catches them
-<pjotrp> but we do not cache sources
-<pjotrp> binaries are cached - to some degree - so we don't have to rebuild
- those [08:29]
-<pjotrp> time to use the guix cache at guix.genenetwork.org
-<pjotrp> try and install the key (it is in the E-mail)
-<pjotrp> and see what this lists [08:31]
-<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix
- package -i genenetwork2
- --substitute-urls=http://guix.genenetwork.org --dry-run
-<pjotrp> should be all binary installs
-<user01> it's not.. [08:32]
-<user01> if I remove --substitute-urls, the list changes, does that mean I
- have the key set up correctly at least? [08:33]
-<pjotrp> dunno [08:35]
-<pjotrp> how many packages does it want to build?
-<pjotrp> should be zero
-<user01> four
-<pjotrp> Ah, that is OK - those are default profile things
-<user01> genenetwork2 is among the ones to be downloaded so [08:36]
-<pjotrp> remove --dry-run
-<pjotrp> yeah, good sign :)
-<pjotrp> we'll still hit a snag, but run it
-<pjotrp> should be fast
-<user01> doing it [08:37]
-<user01> it worked! [08:38]
-<user01> I think [08:39]
-<pjotrp> heh [08:40]
-<pjotrp> you mean it is finished?
-<user01> yep
-<pjotrp> type genenetwork2
-<user01> complains about not being able to connect to the database [08:41]
-<pjotrp> last snag :)
-<pjotrp> no database
-<pjotrp> well, we succeeded in installing a same-byte install of a very
- complex system :) [08:42]
-<pjotrp> (always take time to congratulate yourself)
-<pjotrp> now we need to install mysql
-<user01> hehe :)
-<pjotrp> this can be done throug guix or through debian [08:43]
-<pjotrp> the latter is a bit easier here, so let's do that
-<pjotrp> fun note: you can mix debian and guix
-<pjotrp> Follow instructions on [08:44]
-<pjotrp>
- https://github.com/genenetwork/genenetwork2/tree/staging/doc#run-mysql-server
-<pjotrp> apt-get install mysql-common [08:45]
-<pjotrp> may do it
-<pjotrp> You can also install with guix, but I need to document that
-<pjotrp> btw your internet must be fast :) [08:46]
-<user01> hehe it is ;)
-<pjotrp> when the database is installed [08:48]
-<pjotrp> be sure to set the password as instructed [08:50]
-<pjotrp> when mysql is set the genenetwork2 command should fire up the web
- server on localhost:5003 [08:58]
-<pjotrp> btw my internet is way slower :) [09:00]
-<user02> I'm back [09:04]
-<user02> fixed router firmware upgrade problem
-<user02> unbricking
-<pjotrp> tssk [09:07]
-<user02> I'll never leave routers to update themselves again [09:08]
-<user02> self-brick highway
-<user02> Resuming [09:09]
-<pjotrp> auto-updates are evil
-<pjotrp> always switch them off
-<pjotrp> user02: can you install genenetwork like user has done? [09:10]
-<pjotrp> pretty well documented here now :)
-<user02> Yes I can [09:11]
-<user02> Already installed key
-<pjotrp> user02: you are getting binary packages only now? [09:13]
-<user02> That's the sanest way to go now
-<user02> seriously
-<pjotrp> everything should be pre-built from guix.genenetwork.org
-<pjotrp> you are downloading?
-<user02> yes [09:15]
-<pjotrp> cool. Maybe an idea to set up a server
-<pjotrp> for your own use
-<user02> Stuck at downloading preprocesscore
-<pjotrp> should not [09:24]
-<pjotrp> what does env GUIX_PACKAGE_PATH=../guix-bioinformatics/
- ./pre-inst-env guix package -i genenetwork2
- --substitute-urls="http://guix.genenetwork.org" --dry-run
- [09:25]
-<pjotrp> say for r-prepocesscore
-<pjotrp> download or build?
-<pjotrp> mine says download [09:26]
-<user02> it only lists the derivatives to be built
-<user02> nothing else happens [09:27]
-<pjotrp> OK, so there is a problem
-<pjotrp> your key may not be working
-<pjotrp> everything should be listed as 'to be download' [09:28]
-<user02> Hmm
-<user02> Ah
-<user02> I know where I messed up
-<pjotrp> where?
-<user02> I did add the key
-<user02> However
-<pjotrp> (I am documenting)
-<user02> I did not tell guix to trust it
-<pjotrp> yes
-<pjotrp> and there is another potential problem
-<user02> Remember the documentation on installing guix?
-<user02> You have to tell guix to trust the default key [09:29]
-<user02> Right?
-<user02> So in this case
-<pjotrp> read the IRC log
-<user02> That step is mandatory
-<pjotrp> user01: how are you doing?
-<pjotrp> user02:
- https://github.com/pjotrp/guix-notes/blob/master/REPRODUCIBLE.org#using-gnu-guix-archive
- [09:30]
-<user01> a little bit left on the db download
-<pjotrp> user02: you should see no more building
-<pjotrp> user02: another issue may be that you updated r-preprocesscore
- package in guix-buinformatics [09:32]
-<pjotrp> all downstream packages will want to rebuild
-<user02> no, not really
-<user02> It's not even installed
-<pjotrp> checkout a branch of the the old version - make sure we are in synch
-<pjotrp> should be at
- /gnu/store/y1f3r2xs3fhyadd46nd2aqbr2p9qv2ra-r-biocpreprocesscore-1.32.0
- [09:33]
-<pjotrp>
-<user03> pjotrp: Possibly we should use the archive utility of Guix to do
- deployment to avoid such out-of-sync differences :) [09:34]
-<pjotrp> maybe. I did not get archive to update profiles properly [09:37]
-<pjotrp> Also it is good that they get to understand guix
- this way
-<pjotrp> carved in stone, eh [09:38]
-<user02> Yeah, all good [09:39]
-<user02> My mistake was skipping the guix archive part
-<user02> Can we begin with the install?
-<user02> It's telling me of derivatives that will be downloaded [09:40]
-<user02> So we're good
-<user02> Here goes
-<pjotrp> yeeha [09:42]
-<user02> pjotrp, where is this guix.genenetwork.org located at?
-<pjotrp> Tennessee
-<user02> It's...it's....sloooooooowwwwwwwwwwwwww
-<pjotrp> not from Europe
-<pjotrp> is it downloading at all?
-<user02> It should be extended
-<user02> Yes...like at 100KB/s [09:43]
-<user02> tear-jerker
-<user02> Verizon problems
-<user02> who's the host?
-<pjotrp> I am getting 500Kb/s
-<pjotrp> UT
-<user02> Guix's servers can run off more than one server, right?
-<user02> I'd like to host that particular server here
-<user02> For speed
-<pjotrp> yes
-<user02> Sooner or later
-<user02> It will be a necessity [09:45]
-<pjotrp> exactly what I am doing - this is our server
-<pjotrp> guix.genenetwork.org:8080
-<user02> All done installing [09:46]
-<pjotrp> what?
-<user02> Now the databases
-<pjotrp> what do you mean by slow exactly?
-<user02> Yes, it's installed
-<pjotrp> can you run genenetwork2
-<user02> setting variables
-<user02> If I try running it now, it will fail as I don't have the DBs [09:47]
-<pjotrp> cool - you had a lot of prebuilt packages already
-<pjotrp> OK, follow the instructions I wrote above
-<user01> now everything seems to be working for me :)
-<user02> OK
-<pjotrp> user01: excellent!
-<pjotrp> you see a webserver?
-<user01> yep, can connect to localhost:5003 [09:48]
-<pjotrp> So now you are running a guix copy of GN2
-<pjotrp> you can see where it lives with `which genenetwork2` or ls -l
- ~/.guix-profile/bin/genenetwork2 [09:49]
-<pjotrp>
- /gnu/store/1kma5xszvzsvmbb4k699h7gvdncw901i-genenetwork2-2.0-a8fcff4/bin/genenetwork2
-<pjotrp> it is a script
-<pjotrp> written by guix, open it [09:50]
-<pjotrp> inside it points to paths and our script at
-<pjotrp>
- /gnu/store/1kma5xszvzsvmbb4k699h7gvdncw901i-genenetwork2-2.0-a8fcff4/bin/.genenetwork2-real
-<pjotrp> if you open that you can see how the webserver is started [09:51]
-<pjotrp> next step is to run a recent version of GN2
-<user01> okay [09:52]
-<pjotrp> See
- https://github.com/genenetwork/genenetwork2/tree/staging/doc#run-your-own-copy-of-gn2
-<pjotrp> but do not checkout that genetwork2_diet
-<pjotrp> we reverted to the main tree
-<pjotrp> clone git@github.com:genenetwork/genenetwork2.git [09:53]
-<pjotrp> instead and checkout the staging branch
-<pjotrp> that is effectively my branch [09:54]
-<pjotrp> when that is done you should be able to fire up the webserver from
- there [09:55]
-<pjotrp> using ./bin/genenetwork2
-<user02> now installing DBs
-<user02> Downloading
-<pjotrp> annoyingly the source tree is ~700Mb [09:56]
-<user02> Can it also be done by installing the guix package
- genenetwork2-database-small?
-<pjotrp> I changed it in the diet version to 8Mb, but I had to revert
-<user01> I need to make my VM bigger...
-<pjotrp> user02: not ready [09:57]
-<user02> ok
-<pjotrp> user01: sorry
-<pjotrp> user01: you could mount a local dir inside the VM for development
-<pjotrp> that would allow you to use MAC tools for editing
-<pjotrp> just an idea
-<user01> yeah, I figure I'll do something like that
-<pjotrp> do you use emacs? [09:58]
-<user01> yep
-<pjotrp> that can also run on remote files over ssh
-<pjotrp> that's an alternative
-<pjotrp> kudos for using emacs :), wdyt user03
-<user02> 79 minutes to go downloading the db
-<pjotrp> user02: sorry about that [09:59]
-<pjotrp> it is 2GB
-<user02> user, you can also mount the directory via sshfs
-<user02> Mac OSX runs OpenSSH
-<pjotrp> user02: sopa
-<user02> You can therefore mount a directory outside the VM to the VM via
- sshfs [10:00]
-<pjotrp> yes, 3 options now
-<user02> That way, you can set up a VM only for it's logic
-<user02> Apps + the OS it runs [10:01]
-<user02> For data, let it reside on physical host accessible via sshfs
-<user02> Use this Arch wiki reference:
- https://wiki.archlinux.org/index.php/SSHFS
-<user02> I edited that last somewhere in 2015, may have been updated since
- then
-<user01> alright, cool! [10:04]
-<pjotrp> user01: you are almost done [10:06]
-<pjotrp> I wrote an elixir package for guix :)
-<pjotrp> env GUIX_PACKAGE_PATH=../guix-bioinformatics/ ./pre-inst-env guix
- package -A elixir
- --substitute-urls="http://guix.genenetwork.org" [10:08]
-<pjotrp> elixir 1.2.3 out
- ../guix-bioinformatics/gn/packages/elixir.scm:31:2
-<pjotrp>
-<pjotrp> I am building it on guix.genenetwork.org right now [10:09]
-<user01> nice [10:10]
-#+end_src
+** ERROR: MySQL can't connect to server through socket
+
+The following error
+
+: sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2002, 'Can\'t connect to local MySQL server through socket \'/run/mysqld/mysqld.sock\' (2 "No such file or directory")')
+
+means that the MySQL client is trying to connect through a local socket
+that does not exist, something you may see in a container. It can typically be reproduced with something like
+
+: mysql -h localhost
+
+Try to connect over the network interface instead, e.g.
+
+: mysql -h 127.0.0.1
+
+If that works, run genenetwork after setting SQL_URI to something like
+
+: export SQL_URI=mysql://gn2:mysql_password@127.0.0.1/db_webqtl_s
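As a sketch of how the URI is put together, the snippet below assembles SQL_URI from its four parts. The function name `build_sql_uri` is hypothetical, and the user, password, host and database values are the placeholders from the example above; substitute the ones that match your grant statement.

```shell
# Hypothetical helper: assemble a mysql:// URI from user, password,
# host and database name. All values below are placeholders.
build_sql_uri() {
    user=$1
    password=$2
    host=$3
    database=$4
    printf 'mysql://%s:%s@%s/%s\n' "$user" "$password" "$host" "$database"
}

# 127.0.0.1 is used instead of localhost so the client connects over
# TCP rather than looking for the local socket file.
SQL_URI=$(build_sql_uri gn2 mysql_password 127.0.0.1 db_webqtl_s)
export SQL_URI
echo "$SQL_URI"
```

Passwords containing characters such as `@` or `/` would need URL-encoding first, which this sketch does not attempt.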
+
+* NOTES
+
+** Deploying GN2 official
+
+Let's see how fast we can deploy a second copy of GN2.
+
+- [ ] Base install
+ + [ ] First install a Debian server with GNU Guix on board
+ + [ ] Get Guix build going
+ - [ ] Build the correct version of Guix
+ - [ ] Check out the correct gn-stable version of guix-bioinformatics http://git.genenetwork.org/pjotrp/guix-bioinformatics
+ - [ ] guix package -i genenetwork2 -p /usr/local/guix-profiles/gn2-stable
+ + [ ] Create a gn2 user and home with space
+ + [ ] Install redis
+ - [ ] add to systemd
+ - [ ] update redis.cnf
+ - [ ] update database
+ + [ ] Install mariadb (currently debian mariadb-server)
+ - [ ] add to systemd
+ - [ ] systemctl stop mysql
+ - [ ] update mysql.cnf
+ - [ ] update database (see gn-services/services/mariadb.md)
+ - [ ] check tables
+ + [ ] run gn2
+ + [ ] update nginx
+ + [ ] install genenetwork3
+ - [ ] add to systemd
diff --git a/doc/database.org b/doc/database.org
index 624174a4..5107b660 100644
--- a/doc/database.org
+++ b/doc/database.org
@@ -1,9 +1,19 @@
-- github Document reduction issue
+* Database Information
+
+WARNING: This document contains information on the GN databases which
+will change over time. The GN database is currently MySQL based and,
+while efficient, contains a number of design choices we want to grow
+'out' of. Especially with an eye on reproducibility we want to
+introduce versioning.
+
+So do not treat the information in this document as a final way of
+accessing data. It is better to use the
+[[https://github.com/genenetwork/gn_server/blob/master/doc/API.md][REST API]].
* The small test database (2GB)
The default install comes with a smaller database which includes a
-number of the BSD's and the Human liver dataset (GSE9588).
+number of the BXD's and the Human liver dataset (GSE9588).
* GeneNetwork database
@@ -750,9 +760,30 @@ show indexes from ProbeSetFreeze;
| 1 | 5 | 0.303492 |
+--------+----------+----------+
-** Publication and publishdata (all pheno)
+** Publication
+
+Publication:
+
+| Id | PubMed_ID | Abstract | Title | Pages | Month | Year |
+
-Phenotype pubs
+** PublishData (all pheno)
+
+One of three phenotype tables.
+
+mysql> select * from PublishData limit 5;
++---------+----------+-------+
+| Id | StrainId | value |
++---------+----------+-------+
+| 8966353 | 349 | 29.6 |
+| 8966353 | 350 | 27.8 |
+| 8966353 | 351 | 26.6 |
+| 8966353 | 352 | 28.5 |
+| 8966353 | 353 | 24.6 |
++---------+----------+-------+
+5 rows in set (0.25 sec)
+
+See below for phenotype access.
** QuickSearch
@@ -1073,7 +1104,37 @@ select * from ProbeSetXRef limit 5;
i.e., for Strain Id 1 (DataId) 1, the locus '10.095.400' has a
phenotype value of 5.742.
-GeneNetwork1 already has a limited REST interface, if you do
+Interestingly ProbeData and PublishData have the same layout as
+ProbeSetData. ProbeData is only in use for Affy assays - and not used
+for computations. PublishData contains trait values. ProbeSetData.id
+matches ProbeSetXRef.DataId while PublishData.id matches
+PublishXRef.DataId.
+
+select * from PublishXRef limit 3;
++-------+-------------+-------------+---------------+---------+----------------+------------------+-----------+----------+-------------------------------------------------------+
+| Id | InbredSetId | PhenotypeId | PublicationId | DataId | Locus | LRS | additive | Sequence | comments |
++-------+-------------+-------------+---------------+---------+----------------+------------------+-----------+----------+-------------------------------------------------------+
+| 10001 | 8 | 1 | 1 | 8966353 | D2Mit5 | 10.18351644706 | -1.20875 | 1 | |
+| 10001 | 7 | 2 | 53 | 8966813 | D7Mit25UT | 9.85534330983917 | -2.86875 | 1 | |
+| 10001 | 4 | 3 | 81 | 8966947 | CEL-6_57082524 | 11.7119505898121 | -23.28875 | 1 | elissa modified Abstract at Tue Jun 7 11:38:00 2005 |
++-------+-------------+-------------+---------------+---------+----------------+------------------+-----------+----------+-------------------------------------------------------+
+3 rows in set (0.00 sec)
+
+ties the trait data (PublishData) with the inbredsetid (matching
+PublishFreeze.InbredSetId), locus and publication.
+
+select * from PublishFreeze;
++----+------------+--------------------------+-------------+------------+--------+-------------+-----------------+-----------------+
+| Id | Name | FullName | ShortName | CreateTime | public | InbredSetId | confidentiality | AuthorisedUsers |
++----+------------+--------------------------+-------------+------------+--------+-------------+-----------------+-----------------+
+| 1 | BXDPublish | BXD Published Phenotypes | BXDPublish | 2004-07-17 | 2 | 1 | 0 | NULL |
+| 18 | HLCPublish | HLC Published Phenotypes | HLC Publish | 2012-02-20 | 2 | 34 | 0 | NULL |
++----+------------+--------------------------+-------------+------------+--------+-------------+-----------------+-----------------+
+2 rows in set (0.02 sec)
+
+which gives us the datasets.
+
+GeneNetwork1 has a limited REST interface, if you do
: curl "http://robot.genenetwork.org/webqtl/main.py?cmd=get&probeset=1443823_s_at&db=HC_M2_0606_P"
@@ -1082,6 +1143,9 @@ we get
: ProbeSetID B6D2F1 C57BL/6J DBA/2J BXD1 BXD2 BXD5 BXD6 BXD8 BXD9 BXD11 BXD12 BXD13 BXD15 BXD16 BXD19 BXD20 BXD21 BXD22 BXD23 BXD24 BXD27 BXD28 BXD29 BXD31 BXD32 BXD33 BXD34 BXD38 BXD39 BXD40 BXD42 BXD67 BXD68 BXD43 BXD44 BXD45 BXD48 BXD50 BXD51 BXD55 BXD60 BXD61 BXD62 BXD63 BXD64 BXD65 BXD66 BXD69 BXD70 BXD73 BXD74 BXD75 BXD76 BXD77 BXD79 BXD73a BXD83 BXD84 BXD85 BXD86 BXD87 BXD89 BXD90 BXD65b BXD93 BXD94 A/J AKR/J C3H/HeJ C57BL/6ByJ CXB1 CXB2 CXB3 CXB4 CXB5 CXB6 CXB7 CXB8 CXB9 CXB10 CXB11 CXB12 CXB13 BXD48a 129S1/SvImJ BALB/cJ BALB/cByJ LG/J NOD/ShiLtJ PWD/PhJ BXD65a BXD98 BXD99 CAST/EiJ KK/HlJ WSB/EiJ NZO/HlLtJ PWK/PhJ D2B6F1
: 1443823_s_at 15.251 15.626 14.716 15.198 14.918 15.057 15.232 14.968 14.87 15.084 15.192 14.924 15.343 15.226 15.364 15.36 14.792 14.908 15.344 14.948 15.08 15.021 15.176 15.14 14.796 15.443 14.636 14.921 15.22 15.62 14.816 15.39 15.428 14.982 15.05 15.13 14.722 14.636 15.242 15.527 14.825 14.416 15.125 15.362 15.226 15.176 15.328 14.895 15.141 15.634 14.922 14.764 15.122 15.448 15.398 15.089 14.765 15.234 15.302 14.774 14.979 15.212 15.29 15.012 15.041 15.448 14.34 14.338 14.809 15.046 14.816 15.232 14.933 15.255 15.21 14.766 14.8 15.506 15.749 15.274 15.599 15.673 14.651 14.692 14.552 14.563 14.164 14.546 15.044 14.695 15.162 14.772 14.645 15.493 14.75 14.786 15.003 15.148 15.221
+(see https://github.com/genenetwork/gn_server/blob/master/doc/API.md
+for the latest REST API).
+
getTraitData is defined in the file [[https://github.com/genenetwork/genenetwork/blob/master/web/webqtl/textUI/cmdClass.py#L134][web/webqtl/textUI/cmdClass.py]].
probe is None, so the code at line 199 is run
@@ -1165,6 +1229,97 @@ select * from ProbeSetData limit 5;
5 rows in set (0.00 sec)
linked by ProbeSetXRef.dataid.
+
+*** For PublishData:
+
+List datasets for BXD (InbredSetId=1):
+
+select * from PublishXRef where InbredSetId=1 limit 3;
++-------+-------------+-------------+---------------+---------+-----------+------------------+------------------+----------+--------------------------------------------------------------------------------+
+| Id | InbredSetId | PhenotypeId | PublicationId | DataId | Locus | LRS | additive | Sequence | comments |
++-------+-------------+-------------+---------------+---------+-----------+------------------+------------------+----------+--------------------------------------------------------------------------------+
+| 10001 | 1 | 4 | 116 | 8967043 | rs8253516 | 13.4974914158039 | 2.39444444444444 | 1 | robwilliams modified post_publication_description at Mon Jul 30 14:58:10 2012
+ |
+| 10002 | 1 | 10 | 116 | 8967044 | rs3666069 | 22.0042692151629 | 2.08178571428572 | 1 | robwilliams modified phenotype at Thu Oct 28 21:43:28 2010
+ |
+| 10003 | 1 | 15 | 116 | 8967045 | D18Mit4 | 15.5929163293343 | 19.0882352941176 | 1 | robwilliams modified phenotype at Mon May 23 20:52:19 2011
+ |
++-------+-------------+-------------+---------------+---------+-----------+------------------+------------------+----------+--------------------------------------------------------------------------------+
+
+where ID is the 'record' or, effectively, dataset.
+
+select distinct(publicationid) from PublishXRef where InbredSetId=1 limit 3;
++---------------+
+| publicationid |
++---------------+
+| 116 |
+| 117 |
+| 118 |
++---------------+
+
+select distinct
+PublishXRef.id,publicationid,phenotypeid,Phenotype.post_publication_description
+from PublishXRef,Phenotype where InbredSetId=1 and
+phenotypeid=Phenotype.id limit 3;
++-------+---------------+-------------+----------------------------------------------------------------------------------------------------------------------------+
+| id | publicationid | phenotypeid | post_publication_description |
++-------+---------------+-------------+----------------------------------------------------------------------------------------------------------------------------+
+| 10001 | 116 | 4 | Central nervous system, morphology: Cerebellum weight [mg] |
+| 10002 | 116 | 10 | Central nervous system, morphology: Cerebellum weight after adjustment for covariance with brain size [mg] |
+| 10003 | 116 | 15 | Central nervous system, morphology: Brain weight, male and female adult average, unadjusted for body weight, age, sex [mg] |
++-------+---------------+-------------+----------------------------------------------------------------------------------------------------------------------------+
+
+The id field is the same that is used in the GN2 web interface and the
+PublicationID ties the datasets together.
+
+To list trait values:
+
+SELECT Strain.Name, PublishData.id, PublishData.value from
+(Strain,PublishData, PublishXRef) Where PublishData.StrainId =
+Strain.id limit 3;
+
++------+---------+-------+
+| Name | id | value |
++------+---------+-------+
+| CXB1 | 8966353 | 29.6 |
+| CXB1 | 8966353 | 29.6 |
+| CXB1 | 8966353 | 29.6 |
++------+---------+-------+
+
+The rows repeat because PublishXRef is not yet constrained in the
+query. Here id should match dataid again:
+
+SELECT Strain.Name, PublishData.id, PublishData.value from
+(Strain,PublishData, PublishXRef) Where PublishData.StrainId =
+Strain.id and PublishXRef.dataid=8967043 and
+PublishXRef.dataid=PublishData.id limit 3;
++------+---------+-------+
+| Name | id | value |
++------+---------+-------+
+| BXD1 | 8967043 | 61.4 |
+| BXD2 | 8967043 | 49 |
+| BXD5 | 8967043 | 62.5 |
++------+---------+-------+
+
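The join above can be tried without access to a GN database. The following is a minimal sketch in Python that uses an in-memory SQLite database. The three tables are cut down to the columns the query needs and filled with the example rows shown in this document; the StrainId values 1, 2 and 3 are made up for illustration. It is not how GN2 itself talks to MySQL.

```python
import sqlite3

def trait_values(conn, data_id):
    """Return (strain name, value) pairs for one PublishXRef.DataId."""
    sql = """
        SELECT Strain.Name, PublishData.value
        FROM Strain, PublishData, PublishXRef
        WHERE PublishData.StrainId = Strain.Id
          AND PublishXRef.DataId = PublishData.Id
          AND PublishXRef.DataId = ?
        ORDER BY Strain.Id
    """
    return conn.execute(sql, (data_id,)).fetchall()

def demo_connection():
    """Build a toy database with only the columns used in the join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Strain (Id INTEGER, Name TEXT);
        CREATE TABLE PublishData (Id INTEGER, StrainId INTEGER, value REAL);
        CREATE TABLE PublishXRef (Id INTEGER, InbredSetId INTEGER, DataId INTEGER);
        INSERT INTO Strain VALUES (1, 'BXD1'), (2, 'BXD2'), (3, 'BXD5');
        INSERT INTO PublishData VALUES
            (8967043, 1, 61.4), (8967043, 2, 49), (8967043, 3, 62.5);
        INSERT INTO PublishXRef VALUES (10001, 1, 8967043), (10002, 1, 8967044);
    """)
    return conn

rows = trait_values(demo_connection(), 8967043)
for name, value in rows:
    print(name, value)
```

Binding the DataId as a parameter (=?=) rather than pasting it into the SQL string keeps the query reusable for any trait.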
+*** Datasets
+
+The REST API aims to present a unified interface for genotype and
+phenotype data. Phenotype datasets appear in two major forms in the
+database and we want to present them as one resource.
+
+Dataset names are defined in ProbeSetFreeze.name and Published.id ->
+publication (we'll ignore the probe dataset that uses
+ProbeFreeze.name). These tables should be meshed. It looks like the
+ids are non-overlapping with the publish record IDs starting at 10,001
+(someone has been smart, though it sets the limit of probesets now to
+10,000).
+
+The datasets are organized differently in these tables. All published
+BXD data is grouped under BXDPublish with the publications as
+'datasets'. So, that is how we list them in the REST API.
+
+To fetch all the datasets we first list ProbeSetFreeze entries. Then
+we list the Published entries.
+
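The non-overlapping ID ranges suggest a simple way to tell the two kinds of record apart. A minimal sketch, assuming the observation above holds (publish record IDs start at 10,001 and probeset dataset IDs stay below that). The function name and the constant are hypothetical and not part of GN2 or the REST API.

```python
FIRST_PUBLISH_ID = 10001  # assumption taken from the observation above

def record_kind(record_id):
    """Classify a record ID as 'probeset' or 'publish' by its range."""
    if record_id < 1:
        raise ValueError("record IDs are expected to be positive")
    return "publish" if record_id >= FIRST_PUBLISH_ID else "probeset"

print(record_kind(112))
print(record_kind(10001))
```

If the probeset IDs ever grow past 10,000 this shortcut stops working, which is the limit mentioned above.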
** Fetch genotype information
*** SNPs
diff --git a/doc/development.org b/doc/development.org
new file mode 100644
index 00000000..e65ccd58
--- /dev/null
+++ b/doc/development.org
@@ -0,0 +1,98 @@
+* Development
+
+** Using GN2_PROFILE
+
+After cloning the git source tree you can run the contained GN2 using
+an existing GN2_PROFILE, i.e., use a profile that was created to run a
+binary installation of GN2. This profile may be found by typing
+
+: which genenetwork2
+: /home/wrk/opt/gn-latest-guix/bin/genenetwork2
+
+An example of running the development version would be
+
+: env GN2_PROFILE=/home/wrk/opt/gn-latest-guix ./bin/genenetwork2
+
+Profiles are stored in /gnu/store, so you may pick one up there
+
+: readlink -f $(dirname $(dirname `which genenetwork2`))
+: /gnu/store/dvckpaw770b00l6rv4ijql8wrk11iypv-profile
+
+and use that instead.
+
+Note that the genenetwork2 script sets up the environment for running
+the webserver. This includes the paths to R and Python modules. These
+are output on startup. To make sure there is no environment pollution you can
+
+** Javascript modules
+
+As of release 2.10-pre4, Javascript modules are installed in three places:
+
+1. JS_GUIX_PATH: the Guix store - these are Guix pre-packaged modules
+2. The git source tree (./wqflask/wqflask/static/packages/)
+3. JS_GN_PATH: a local directory containing (temporary) development modules
+
+Packages currently in git (2) will move to JS_GUIX_PATH (1) over
+time. This is to keep better track of origin updates. Putting packages
+in git (2) is actively discouraged(!), unless there are GN2 specific
+adaptations to the original Javascript modules.
+
+JS_GN_PATH (3) is for development purposes. By default it is set to
+$HOME/genenetwork/javascript. Say you are working on an updated
+version of a JS module not yet in (1) you can simply check out that
+module in that path and it should show up.
+
+* Python modules
+
+Python modules are automatically found in the Guix profile.
+
+For development purposes it may be useful to try some Python package.
+Obviously this is only a temporary measure and when you decide to
+include the package it should be packaged in [[http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics][our GNU Guix software
+stack]]!
+
+To add packages you need to make sure the correct Python is used (currently
+Python 2.7) to install a package. For example:
+
+#+BEGIN_SRC sh
+python --version
+ Python 2.7.16
+pip --version
+ pip 18.1 from /usr/lib/python2.7/dist-packages/pip (python 2.7)
+#+END_SRC
+
+You can install a Python package locally with pip, e.g.
+
+#+BEGIN_SRC sh
+pip install hjson
+#+END_SRC
+
+This installs the package in ~$HOME/.local/lib/python2.7/site-packages~.
+To add this directory to the search path for GeneNetwork, set the
+environment variable
+
+#+BEGIN_SRC sh
+export PYTHON_GN_PATH=$HOME/.local/lib/python2.7/site-packages
+#+END_SRC
+
+Now you should be able to do
+
+#+BEGIN_SRC python
+import hjson
+#+END_SRC
+
+In fact you can kick off a Python shell with something like
+
+#+BEGIN_SRC sh
+env SERVER_PORT=5013 WEBSERVER_MODE=DEBUG LOG_LEVEL=DEBUG \
+ SQL_URI=mysql://gn2:webqtl@localhost/db_webqtl_s \
+ GN2_PROFILE=~/opt/genenetwork2 \
+ ./bin/genenetwork2 ./etc/default_settings.py -c
+Python 2.7.17 (default, Jan 1 1970, 00:00:01)
+[GCC 7.5.0] on linux2
+Type "help", "copyright", "credits" or "license" for more information.
+>>> import hjson
+#+END_SRC
+
+It should now also work in GN2.
+
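How a search-path variable such as PYTHON_GN_PATH can end up on Python's module path is sketched below. This is an illustration only: the genenetwork2 startup script may wire it up differently (for example through PYTHONPATH), and the helper name is made up.

```python
import os
import sys

def add_gn_path(environ=os.environ, path=sys.path):
    """Append the directory named by PYTHON_GN_PATH to the module search path."""
    extra = environ.get("PYTHON_GN_PATH")
    if extra and extra not in path:
        path.append(extra)
    return path

# Example with an explicit environment and list, so nothing global is modified
demo = add_gn_path(
    {"PYTHON_GN_PATH": "/home/user/.local/lib/python2.7/site-packages"}, [])
print(demo)
```

Appending rather than prepending means packages from the Guix profile still take precedence over local experiments; that ordering is a design choice of this sketch, not a documented GN2 behaviour.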
+* TODO External tools
diff --git a/doc/elasticsearch.org b/doc/elasticsearch.org
new file mode 100644
index 00000000..864a8363
--- /dev/null
+++ b/doc/elasticsearch.org
@@ -0,0 +1,247 @@
+* Elasticsearch
+
+** Introduction
+
+GeneNetwork uses elasticsearch (ES) for all things considered
+'state'. One example is user collections, another is user management.
+
+** Example
+
+To get the right environment, first you can get a python REPL with something like
+
+: env GN2_PROFILE=~/opt/gn-latest ./bin/genenetwork2 ../etc/default_settings.py -cli python
+
+(make sure to use the correct GN2_PROFILE!)
+
+Next try
+
+#+BEGIN_SRC python
+
+from elasticsearch import Elasticsearch, TransportError
+
+es = Elasticsearch([{ "host": 'localhost', "port": '9200' }])
+
+# Dump all data
+
+es.search("*")
+
+# To fetch an E-mail record from the users index
+
+record = es.search(
+ index = 'users', doc_type = 'local', body = {
+ "query": { "match": { "email_address": "myname@email.com" } }
+ })
+
+# It is also possible to do wild card matching
+
+q = { "query": { "wildcard" : { "full_name" : "pjot*" } }}
+es.search(index = 'users', doc_type = 'local', body = q)
+
+# To get elements from that record:
+
+record['hits']['hits'][0][u'_source']['full_name']
+u'Pjotr'
+
+record['hits']['hits'][0][u'_source']['email_address']
+u"myname@email.com"
+
+#+END_SRC
+
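Picking fields out of the nested result gets repetitive. A small helper along the following lines may help. It only assumes the response layout used above (=hits=, then =hits=, then =_source=) and is exercised here on a hand-written dictionary rather than a live ES server. The function name is hypothetical.

```python
def first_hit_field(response, field, default=None):
    """Return `field` from the first hit's _source, or `default` if absent."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return default
    return hits[0].get("_source", {}).get(field, default)

# Hand-written stand-in for the result of es.search(...)
record = {"hits": {"hits": [
    {"_source": {"full_name": "Pjotr", "email_address": "myname@email.com"}}
]}}

print(first_hit_field(record, "full_name"))
print(first_hit_field({"hits": {"hits": []}}, "full_name", "unknown"))
```

Returning a default instead of raising avoids an =IndexError= when a query matches nothing.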
+** Health
+
+ES provides support for checking its health:
+
+: curl -XGET http://localhost:9200/_cluster/health?pretty=true
+
+#+BEGIN_SRC json
+
+
+ {
+ "cluster_name" : "asgard",
+ "status" : "yellow",
+ "timed_out" : false,
+ "number_of_nodes" : 1,
+ "number_of_data_nodes" : 1,
+ "active_primary_shards" : 5,
+ "active_shards" : 5,
+ "relocating_shards" : 0,
+ "initializing_shards" : 0,
+ "unassigned_shards" : 5
+ }
+
+#+END_SRC
+
+Yellow means the replica shards are unassigned, which is expected when
+just one instance is running (no worries).
+
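The same check can be scripted. The sketch below interprets the JSON document returned by the health endpoint. It parses a copy of the output shown above instead of calling the server, and the function name is made up for this example.

```python
import json

def health_summary(health):
    """Summarise a _cluster/health document in one line."""
    status = health["status"]
    ok = status in ("green", "yellow")  # red means primary shards are missing
    return "%s: %s (%d node(s), %d unassigned shard(s))" % (
        "OK" if ok else "PROBLEM", status,
        health["number_of_nodes"], health["unassigned_shards"])

sample = json.loads("""
{ "cluster_name": "asgard", "status": "yellow", "timed_out": false,
  "number_of_nodes": 1, "number_of_data_nodes": 1,
  "active_primary_shards": 5, "active_shards": 5,
  "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 5 }
""")

print(health_summary(sample))
```

Treating yellow as acceptable matches the single-node setup described here; a multi-node deployment might want to flag it.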
+To get full cluster info
+
+: curl -XGET "localhost:9200/_cluster/stats?human&pretty"
+
+#+BEGIN_SRC json
+{
+ "_nodes" : {
+ "total" : 1,
+ "successful" : 1,
+ "failed" : 0
+ },
+ "cluster_name" : "elasticsearch",
+ "timestamp" : 1529050366452,
+ "status" : "yellow",
+ "indices" : {
+ "count" : 3,
+ "shards" : {
+ "total" : 15,
+ "primaries" : 15,
+ "replication" : 0.0,
+ "index" : {
+ "shards" : {
+ "min" : 5,
+ "max" : 5,
+ "avg" : 5.0
+ },
+ "primaries" : {
+ "min" : 5,
+ "max" : 5,
+ "avg" : 5.0
+ },
+ "replication" : {
+ "min" : 0.0,
+ "max" : 0.0,
+ "avg" : 0.0
+ }
+ }
+ },
+ "docs" : {
+ "count" : 14579,
+ "deleted" : 0
+ },
+ "store" : {
+ "size" : "44.7mb",
+ "size_in_bytes" : 46892794
+ },
+ "fielddata" : {
+ "memory_size" : "0b",
+ "memory_size_in_bytes" : 0,
+ "evictions" : 0
+ },
+ "query_cache" : {
+ "memory_size" : "0b",
+ "memory_size_in_bytes" : 0,
+ "total_count" : 0,
+ "hit_count" : 0,
+ "miss_count" : 0,
+ "cache_size" : 0,
+ "cache_count" : 0,
+ "evictions" : 0
+ },
+ "completion" : {
+ "size" : "0b",
+ "size_in_bytes" : 0
+ },
+ "segments" : {
+ "count" : 24,
+ "memory" : "157.3kb",
+ "memory_in_bytes" : 161112,
+ "terms_memory" : "122.6kb",
+ "terms_memory_in_bytes" : 125569,
+ "stored_fields_memory" : "15.3kb",
+ "stored_fields_memory_in_bytes" : 15728,
+ "term_vectors_memory" : "0b",
+ "term_vectors_memory_in_bytes" : 0,
+ "norms_memory" : "10.8kb",
+ "norms_memory_in_bytes" : 11136,
+ "points_memory" : "111b",
+ "points_memory_in_bytes" : 111,
+ "doc_values_memory" : "8.3kb",
+ "doc_values_memory_in_bytes" : 8568,
+ "index_writer_memory" : "0b",
+ "index_writer_memory_in_bytes" : 0,
+ "version_map_memory" : "0b",
+ "version_map_memory_in_bytes" : 0,
+ "fixed_bit_set" : "0b",
+ "fixed_bit_set_memory_in_bytes" : 0,
+ "max_unsafe_auto_id_timestamp" : -1,
+ "file_sizes" : { }
+ }
+ },
+ "nodes" : {
+ "count" : {
+ "total" : 1,
+ "data" : 1,
+ "coordinating_only" : 0,
+ "master" : 1,
+ "ingest" : 1
+ },
+ "versions" : [
+ "6.2.1"
+ ],
+ "os" : {
+ "available_processors" : 16,
+ "allocated_processors" : 16,
+ "names" : [
+ {
+ "name" : "Linux",
+ "count" : 1
+ }
+ ],
+ "mem" : {
+ "total" : "125.9gb",
+ "total_in_bytes" : 135189286912,
+ "free" : "48.3gb",
+ "free_in_bytes" : 51922628608,
+ "used" : "77.5gb",
+ "used_in_bytes" : 83266658304,
+ "free_percent" : 38,
+ "used_percent" : 62
+ }
+ },
+ "process" : {
+ "cpu" : {
+ "percent" : 0
+ },
+ "open_file_descriptors" : {
+ "min" : 415,
+ "max" : 415,
+ "avg" : 415
+ }
+ },
+ "jvm" : {
+ "max_uptime" : "1.9d",
+ "max_uptime_in_millis" : 165800616,
+ "versions" : [
+ {
+ "version" : "9.0.4",
+ "vm_name" : "OpenJDK 64-Bit Server VM",
+ "vm_version" : "9.0.4+11",
+ "vm_vendor" : "Oracle Corporation",
+ "count" : 1
+ }
+ ],
+ "mem" : {
+ "heap_used" : "1.1gb",
+ "heap_used_in_bytes" : 1214872032,
+ "heap_max" : "23.8gb",
+ "heap_max_in_bytes" : 25656426496
+ },
+ "threads" : 110
+ },
+ "fs" : {
+ "total" : "786.4gb",
+ "total_in_bytes" : 844400918528,
+ "free" : "246.5gb",
+ "free_in_bytes" : 264688160768,
+ "available" : "206.5gb",
+ "available_in_bytes" : 221771468800
+ },
+ "plugins" : [ ],
+ "network_types" : {
+ "transport_types" : {
+ "netty4" : 1
+ },
+ "http_types" : {
+ "netty4" : 1
+ }
+ }
+ }
+}
+#+END_SRC
diff --git a/doc/testing.org b/doc/testing.org
index 1d5cc8b8..d5ab117d 100644
--- a/doc/testing.org
+++ b/doc/testing.org
@@ -1,43 +1,67 @@
#+TITLE: Testing GN2
* Table of Contents :TOC:
- - [[#introduction][Introduction]]
- - [[#run-tests][Run tests]]
- - [[#setup][Setup]]
- - [[#running][Running]]
+- [[#introduction][Introduction]]
+- [[#run-tests][Run tests]]
+ - [[#setup][Setup]]
+ - [[#running][Running]]
* Introduction
-For integration testing we currently use the brilliant Ruby Mechanize
-gem against the small database; a setup we call mechanical Rob because
-it emulates someone clicking through the website and checking results.
+For integration testing, we currently use [[https://github.com/genenetwork/genenetwork2/tree/testing/test/requests][Mechanical Rob]] against the
+small [[https://github.com/genenetwork/genenetwork2/blob/testing/doc/database.org][database]]; a setup we call Mechanical Rob because it emulates
+someone clicking through the website and checking results.
-These scripts invoke calls to a running webserver and test the
-response. If a page changes or is broken tests will break and we are
-informed. In principle, Mechanical Rob is run before code merges are
-committed to the main server.
+These scripts invoke calls to a running webserver and test the response.
+If a page changes or breaks, tests will fail. In principle, Mechanical
+Rob runs before code merges get committed to the main server.
-In the future we may move to Python mechanize - it'll be easy to mix
-the Ruby and Python versions.
+For unit tests, we use Python's =unittest= framework. Coverage reports
+get generated using [[https://coverage.readthedocs.io/en/coverage-5.2.1/][coverage.py]] which you could also use to run
+unit tests. When adding new functionality, it is advisable to add
+unit tests.
* Run tests
** Setup
-Mechanize is not yet included in Guix deployment.
+Everything required for testing is already packaged with Guix:
+: ./pre-ins-env guix package -i genenetwork2 -p ~/opt/genenetwork2
** Running
-Run the tests from the root of the genenetwork2 source tree as, for
-example,
+Run the tests from the root of the genenetwork2 source tree. Ensure
+that Redis and MariaDB are running.
-: ./bin/test-website http://localhost:5003/ (default)
+To run Mechanical Rob:
+: time env GN2_PROFILE=~/opt/genenetwork2 TMPDIR=~/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ ./bin/genenetwork2 ./etc/default_settings.py -c ~/projects/genenetwork2/test/requests/test-website.py -a http://localhost:5004
-If you are using the small deployment database you can use
+Use these aliases for the following examples.
-: ./bin/test-website --skip -n
+#+begin_src sh
+alias runpython="env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ ./bin/genenetwork2"
-To run individual tests on localhost you can do
+alias runcmd="time env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ ./bin/genenetwork2 ./etc/default_settings.py -cli"
+#+end_src
-: ruby -Itest -Itest/lib test/lib/mapping.rb --name="/Mapping/"
+You could use them in your =.bashrc= or =.zshrc= file.
+
+To run unit tests:
+
+: runpython -m unittest discover -v
+
+Or alternatively using the coverage tool:
+
+: runcmd coverage run -m unittest discover -v
+
+To generate an HTML coverage report in =wqflask/coverage_html_report/=:
+
+: runcmd coverage html
+
+To output the report to =STDOUT=:
+
+: runcmd coverage report
+
+All the configs for running the coverage tool are in
+=wqflask/.coveragerc=
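
For reference, a minimal test module of the kind =unittest discover= picks up might look like the sketch below. The helper function is hypothetical and stands in for whatever code is under test. Saving the file under a name that matches =test*.py= (the default discovery pattern) is enough for it to be found.

```python
import unittest

def normalise_group_name(name):
    """Hypothetical helper: trim whitespace and upper-case a group name."""
    return name.strip().upper()

class TestNormaliseGroupName(unittest.TestCase):
    def test_strips_and_uppercases(self):
        self.assertEqual(normalise_group_name(" bxd "), "BXD")

    def test_empty_string(self):
        self.assertEqual(normalise_group_name(""), "")
```

Keeping each test method to one behaviour makes the verbose (=-v=) output of the discover run easier to read.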