Diffstat (limited to 'topics')
-rw-r--r-- topics/better-logging.gmi | 27
-rw-r--r-- topics/coding-guidelines.gmi | 8
-rw-r--r-- topics/queries-and-prepared-statements-in-python.gmi | 95
-rw-r--r-- topics/setting-up-local-development-database.gmi | 186
-rw-r--r-- topics/systems/dns-changes.gmi | 19
-rw-r--r-- topics/systems/migrate-p2.gmi | 12
-rw-r--r-- topics/systems/orchestration.gmi | 35
-rw-r--r-- topics/use-exceptions-to-indicate-errors.gmi | 16
-rw-r--r-- topics/uthsc-vpn-with-free-software.gmi | 37
9 files changed, 429 insertions, 6 deletions
diff --git a/topics/better-logging.gmi b/topics/better-logging.gmi
new file mode 100644
index 0000000..8de3fb3
--- /dev/null
+++ b/topics/better-logging.gmi
@@ -0,0 +1,27 @@
+# Improving Logging in GN2
+
+## What Are We Trying To Solve?
+
+We prioritise maintaining user functionality over speed in GN [with time this speed will be improved]. As such, we should pay more attention to not breaking any currently working GN2 functionality. And when/if we do, troubleshooting should be easy. On this front, one way forward is to streamline logging in both GN2/GN3 and make it more script-friendly (only report when something fails, rather than instrumenting variables) and in so doing make monitoring easier.
+
+## Goals
+
+* Have script-friendly error/info logs.
+* Remove noise from GN2.
+* Separate logging into different files: error logs, info logs. Add this somewhere with Flask itself instead of re-directing STDOUT to a file.
+
+### Non-goals
+
+* Logging in GN3.
+* Parsing logs to extract goals.
+* Getting rid of "gn.db" global object and in so doing removing "MySqlAlchemy" [that we really shouldn't be using].
+* Adding log messages to existing functions.
+
+## Actual Design
+
+* Get rid of "utility.logger" module and replace it with Flask's or Python's in-built logging.
+* Configure the logging system to automatically add the module name, line number, time-stamps etc.
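A minimal sketch of the design above, using only Python's built-in logging module. The function name 'configure_logging', the logger name "gn2" and the file paths are hypothetical, chosen for illustration; they are not taken from the GN2 code base. It writes INFO and WARNING records to one file and ERROR and above to another, and lets the formatter add the time-stamp, module name and line number.

```python
import logging


class MaxLevelFilter(logging.Filter):
    """Let through only records at or below max_level."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno <= self.max_level


def configure_logging(info_path, error_path, logger_name="gn2"):
    """Send INFO/WARNING to info_path and ERROR and above to error_path.

    All names here are illustrative assumptions, not existing GN2 names.
    """
    # The formatter adds time-stamp, level, module name and line number.
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(module)s:%(lineno)d %(message)s")

    info_handler = logging.FileHandler(info_path)
    info_handler.setLevel(logging.INFO)
    info_handler.addFilter(MaxLevelFilter(logging.WARNING))
    info_handler.setFormatter(formatter)

    error_handler = logging.FileHandler(error_path)
    error_handler.setLevel(logging.ERROR)
    error_handler.setFormatter(formatter)

    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on reconfiguration
    logger.propagate = False  # keep records out of the root logger
    logger.addHandler(info_handler)
    logger.addHandler(error_handler)
    return logger
```

Since Flask's 'app.logger' is a standard 'logging.Logger', the same two handlers could presumably be attached to it instead of a separately named logger.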
+
+## Resources
+
+=> https://realpython.com/python-logging/ Logging in Python
diff --git a/topics/coding-guidelines.gmi b/topics/coding-guidelines.gmi
new file mode 100644
index 0000000..47cb697
--- /dev/null
+++ b/topics/coding-guidelines.gmi
@@ -0,0 +1,8 @@
+# Coding guidelines
+
+We aim to adhere to the following coding guidelines.
+
+=> /topics/use-exceptions-to-indicate-errors Exceptions, not None return values
+=> /topics/better-logging Log messages
+
+This document is an index of other documents describing coding guidelines. Add more here as you write/discover them.
diff --git a/topics/queries-and-prepared-statements-in-python.gmi b/topics/queries-and-prepared-statements-in-python.gmi
new file mode 100644
index 0000000..642ed96
--- /dev/null
+++ b/topics/queries-and-prepared-statements-in-python.gmi
@@ -0,0 +1,95 @@
+# Queries and Prepared Statements in Python
+
+String interpolation when writing queries is a really bad idea; it exposes the code to SQL injection attacks. To mitigate this, write queries using placeholders for values, then pass the values in as arguments to the **execute** function.
+
+As a demonstration, using some existing code, do not write a query like this:
+
+```
+curr.execute(
+ """
+ SELECT Strain.Name, Strain.Id FROM Strain, Species
+ WHERE Strain.Name IN {}
+ and Strain.SpeciesId=Species.Id
+ and Species.name = '{}'
+ """.format(
+ create_in_clause(list(sample_data.keys())),
+ *mescape(dataset.group.species)))
+```
+
+In the query above, we interpolate the keys from 'sample_data.keys()' and the value of 'dataset.group.species' directly into the query string.
+
+The code above can be rewritten to something like:
+
+```
+sample_data_keys = tuple(sample_data.keys())
+
+# Build one %s placeholder per key; the driver fills in the values.
+curr.execute(
+ """
+ SELECT Strain.Name, Strain.Id FROM Strain, Species
+ WHERE Strain.Name IN ({})
+ and Strain.SpeciesId=Species.Id
+ and Species.name = %s
+ """.format(", ".join(["%s"] * len(sample_data_keys))),
+ (sample_data_keys + (dataset.group.species,)))
+```
+
+In this new query, the IN clause ends up being a string of the form
+
+> %s, %s, %s, ...
+
+with one '%s' for each item in the 'sample_data_keys' tuple.
+
+There is one more '%s' placeholder for the 'Species.name' value, so the final tuple we provide as an argument to execute must also include the 'dataset.group.species' value.
+
+**IMPORTANT 01**: the total number of placeholders (%s) must be the same as the total number of arguments passed into the 'execute' function.
+
+**IMPORTANT 02**: the order of the values must correspond to the order of the placeholders.
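The two rules above can be checked mechanically before the query reaches the driver. The sketch below is one way to do that; the helper name 'strain_query' is hypothetical and not part of the existing code. It builds the query and the argument tuple together, so the count and order of values always match the placeholders.

```python
def strain_query(sample_names, species_name):
    """Return (query, params) with exactly one %s placeholder per value.

    'strain_query' is an illustrative helper, not an existing GN function.
    """
    sample_names = tuple(sample_names)
    if not sample_names:
        raise ValueError("need at least one sample name for the IN clause")
    placeholders = ", ".join(["%s"] * len(sample_names))
    query = (
        "SELECT Strain.Name, Strain.Id FROM Strain, Species "
        f"WHERE Strain.Name IN ({placeholders}) "
        "AND Strain.SpeciesId=Species.Id "
        "AND Species.name = %s")
    # Order matters: the sample names first, then the species name.
    params = sample_names + (species_name,)
    # One argument per placeholder.
    assert query.count("%s") == len(params)
    return query, params


# With a cursor 'curr' this would be used as:
#   curr.execute(*strain_query(sample_data.keys(), dataset.group.species))
```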
+
+### Aside
+
+The functions 'create_in_clause' and 'mescape' are defined as below:
+
+```
+from MySQLdb import escape_string as escape_
+
+def create_in_clause(items):
+ """Create an in clause for mysql"""
+ in_clause = ', '.join("'{}'".format(x) for x in mescape(*items))
+ in_clause = '( {} )'.format(in_clause)
+ return in_clause
+
+def mescape(*items):
+ """Multiple escape"""
+ return [escape_(str(item)).decode('utf8') for item in items]
+
+def escape(string_):
+ return escape_(string_).decode('utf8')
+```
+
+
+## Parameter Style
+
+The section above shows the most commonly used parameter style.
+
+If you want to use a mapping object (dict), you have the option of using the '%(<text>)s' format for the query. In that case, we could rewrite the query above into something like:
+
+```
+sample_data_dict = {f"sample_{idx}": key for idx, key in enumerate(sample_data.keys())}
+
+curr.execute(
+ """
+ SELECT Strain.Name, Strain.Id FROM Strain, Species
+ WHERE Strain.Name IN ({})
+ and Strain.SpeciesId=Species.Id
+ and Species.name = %(species_name)s
+ """.format(", ".join([f"%({key})s" for key in sample_data_dict.keys()])),
+ {**sample_data_dict, "species_name": dataset.group.species})
+```
+
+## Final Note
+
+While this has dealt mostly with the MySQLdb driver for Python3, the idea is the same for the psycopg2 (PostgreSQL) driver and others (with some minor variation in the details).
+
+The concept is also similar in many other languages.
+
+The main takeaway is that you really should not be manually escaping the values - instead, you should let the driver do that for you, by providing placeholders in the query, and the values to use separately.
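To illustrate that the idea carries over to other drivers, here is a small runnable sketch using Python's standard-library 'sqlite3' module. It is chosen only because it needs no server; it is not a driver used in the text above, and its placeholder is '?' rather than '%s'. The table contents and the 'find_strains' helper are made up for the example.

```python
import sqlite3


def find_strains(conn, names):
    """Look up strains by name, letting the driver bind the values."""
    placeholders = ", ".join("?" for _ in names)
    query = (
        f"SELECT Name, Id FROM Strain WHERE Name IN ({placeholders}) "
        "ORDER BY Id")
    return conn.execute(query, tuple(names)).fetchall()


# Illustrative in-memory table, not the real GN schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Strain (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Strain (Id, Name) VALUES (?, ?)",
    [(1, "BXD1"), (2, "BXD2")])

# A hostile value is bound as plain data, so it matches no row
# instead of altering the query.
hostile = "BXD1') OR ('1'='1"
print(find_strains(conn, ["BXD1"]))   # [('BXD1', 1)]
print(find_strains(conn, [hostile]))  # []
```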
diff --git a/topics/setting-up-local-development-database.gmi b/topics/setting-up-local-development-database.gmi
new file mode 100644
index 0000000..67dd88d
--- /dev/null
+++ b/topics/setting-up-local-development-database.gmi
@@ -0,0 +1,186 @@
+# Setting up Local Development Database
+
+## Introduction
+
+You need to set up a quick local database for development without needing root permissions or polluting your environment.
+
+* ${HOME} is the path to your home directory
+* An assumption is made that the GeneNetwork2 profile is in ${HOME}/opt/gn_profiles/gn2_latest for the purposes of this documentation. Please replace as appropriate.
+* We install the database files under ${HOME}/genenetwork/mariadb. Change as appropriate.
+
+## Setup Database Server
+
+Set up directories
+
+```
+mkdir -pv ${HOME}/genenetwork/mariadb/var/run
+mkdir -pv ${HOME}/genenetwork/mariadb/var/lib/data
+mkdir -pv ${HOME}/genenetwork/mariadb/var/lib/mysql
+```
+
+Set up default my.cnf
+
+```
+cat <<EOF > ${HOME}/genenetwork/mariadb/my.cnf
+[client-server]
+socket=${HOME}/genenetwork/mariadb/var/run/mysqld/mysqld.sock
+port=3307
+
+[server]
+user=$(whoami)
+socket=${HOME}/genenetwork/mariadb/var/run/mysqld/mysqld.sock
+basedir=${HOME}/opt/gn_profiles/gn2_latest
+datadir=${HOME}/genenetwork/mariadb/var/lib/data
+ft_min_word_len=3
+EOF
+```
+
+Install the database
+
+```
+${HOME}/opt/gn_profiles/gn2_latest/bin/mysql_install_db \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf
+```
+
+Running the daemon:
+
+```
+${HOME}/opt/gn_profiles/gn2_latest/bin/mysqld_safe \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf
+```
+
+Connect to daemon
+
+```
+${HOME}/opt/gn_profiles/gn2_latest/bin/mysql \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf
+```
+
+Set up password for user
+
+```
+MariaDB [(none)]> USE mysql;
+MariaDB [mysql]> ALTER USER '<your-username>'@'localhost' IDENTIFIED BY '<the-new-password>';
+MariaDB [mysql]> FLUSH PRIVILEGES;
+```
+
+Now log out and log in again with
+
+```
+$ ${HOME}/opt/gn_profiles/gn2_latest/bin/mysql \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf --password mysql
+```
+
+Enter the newly set password and voila, you are logged in and your user has the password set up.
+
+Now, set up a new user, say webqtlout, and a default database they can connect to
+
+```
+MariaDB [mysql]> CREATE DATABASE webqtlout;
+MariaDB [mysql]> CREATE USER 'webqtlout'@'localhost' IDENTIFIED BY '<some-password>';
+MariaDB [mysql]> GRANT ALL PRIVILEGES ON webqtlout.* TO 'webqtlout'@'localhost';
+```
+
+Now log out, and log back in as the new webqtlout user:
+
+```
+${HOME}/opt/gn_profiles/gn2_latest/bin/mysql \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf \
+ --user=webqtlout --host=localhost --password webqtlout
+```
+
+and enter the password you provided.
+
+
+## Setting up the Small Database
+
+Download the database from
+
+=> http://ipfs.genenetwork.org/ipfs/QmRUmYu6ogxEdzZeE8PuXMGCDa8M3y2uFcfo4zqQRbpxtk
+
+Assuming you downloaded the file to ${HOME}/Downloads, you can now add the database to your server.
+
+First stop the server:
+
+```
+$ ps aux | grep mysqld # get the process ids
+$ kill -s SIGTERM <pid-of-mysqld> <pid-of-mysqld_safe>
+```
+
+Now extract the database archive in the mysql data directory:
+
+```
+$ cd ${HOME}/genenetwork/mariadb/var/lib/data
+$ p7zip -k -d ${HOME}/Downloads/db_webqtl_s.7z
+```
+
+Now restart the server:
+
+```
+${HOME}/opt/gn_profiles/gn2_latest/bin/mysqld_safe \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf
+```
+
+Then upgrade the databases
+
+```
+$ ${HOME}/opt/gn_profiles/gn2_latest/bin/mysql_upgrade \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf \
+ --user=$(whoami) --password --force
+```
+
+and log in as the administrative user:
+
+```
+$ ${HOME}/opt/gn_profiles/gn2_latest/bin/mysql \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf \
+ --user=$(whoami) --password
+```
+
+and grant the privileges to your normal user:
+
+```
+MariaDB [mysql]> GRANT ALL PRIVILEGES ON db_webqtl_s.* TO 'webqtlout'@'localhost';
+```
+
+Now log out as the administrative user and log back in as the normal user
+
+```
+${HOME}/opt/gn_profiles/gn2_latest/bin/mysql \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf \
+ --user=webqtlout --host=localhost --password db_webqtl_s
+
+MariaDB [db_webqtl_s]> SELECT * FROM ProbeSetData LIMIT 20;
+```
+
+Verify that you see some data.
+
+### A Note on Connection to the Server
+
+So far, we have been connecting to the server by specifying the --defaults-file option, e.g.
+
+```
+${HOME}/opt/gn_profiles/gn2_latest/bin/mysql \
+ --defaults-file=${HOME}/genenetwork/mariadb/my.cnf \
+ --user=webqtlout --host=localhost --password db_webqtl_s
+```
+
+which allows connection via the unix socket.
+
+We could drop that specification and connect via the port with:
+
+```
+${HOME}/opt/gn_profiles/gn2_latest/bin/mysql \
+ --user=webqtlout --host=127.0.0.1 --port=3307 --password db_webqtl_s
+```
+
+In this version, the host specification was changed from
+```
+--host=localhost
+```
+to
+```
+--host=127.0.0.1
+```
+
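+The host changes because the client treats 'localhost' as a request to connect over the unix socket, whereas 127.0.0.1 forces a TCP connection. In addition, the **--defaults-file** specification was dropped and a new **--port** specification was added.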
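The same socket-versus-port choice comes up when connecting from Python. The sketch below assumes the MySQLdb (mysqlclient) driver and the paths, port, user and database names used in this guide; the helper names 'connection_kwargs' and 'connect' are hypothetical. Treat the keyword names as an assumption to check against the installed driver version.

```python
import os


def connection_kwargs(user, password, database, use_socket=True):
    """Build keyword arguments for a local development connection.

    Assumes the layout from this guide: the socket under
    ~/genenetwork/mariadb/var/run/mysqld and the server on port 3307.
    """
    base = {"user": user, "password": password, "database": database}
    if use_socket:
        socket_path = os.path.join(
            os.path.expanduser("~"), "genenetwork", "mariadb",
            "var", "run", "mysqld", "mysqld.sock")
        return {**base, "host": "localhost", "unix_socket": socket_path}
    # 127.0.0.1 (not 'localhost') so that the connection goes over TCP.
    return {**base, "host": "127.0.0.1", "port": 3307}


def connect(**kwargs):
    """Open the connection; requires the mysqlclient package."""
    import MySQLdb  # imported lazily so the sketch loads without the driver
    return MySQLdb.connect(**kwargs)


# Example (needs a running server and the driver installed):
#   conn = connect(**connection_kwargs("webqtlout", "<some-password>",
#                                      "db_webqtl_s", use_socket=False))
```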
diff --git a/topics/systems/dns-changes.gmi b/topics/systems/dns-changes.gmi
index 7c42589..a535cab 100644
--- a/topics/systems/dns-changes.gmi
+++ b/topics/systems/dns-changes.gmi
@@ -9,15 +9,22 @@ We are moving thing to a new DNS hosting service. We have accounts on both. To m
* Sign in to your GoDaddy account.
* Export the DNS record to a file
* Print the DNS settings to a PDF
-* Start a transfer from DNSsimple to get an auth code
+* On GoDaddy disable WHOIS privacy protection (on the domains table)
+* On GoDaddy start a transfer from DNSimple to get an auth code
+ Click your username at the top right of the page.
+ Select My Products.
+ Click Manage next to the relevant domain.
+ Scroll down to Additional Settings.
+ Click Get authorization code. Note: If you have more than 6 domains in your account, click Email my code
- + Set transfer on DNSsimple - tick DNS box
- + Check DNS on switch - it may not be completely automatic
- + Cherk record on DNSsimple
- + Check transfer with `dig systemsgenetics.org NS`
* On DNSimple add the authorisation code under Tamara
-* Import DNS settings on DNSimple
+ + Set transfer on DNSimple - tick DNS box
+ + Check the `DNS on' switch - it may not be completely automatic
+ + Check record on DNSimple
+ + Check transfer with `dig systemsgenetics.org NS`
+* Import DNS settings on DNSimple (cut-N-paste)
+ + Edit delegation - make sure the delegation box is set
+=> https://support.dnsimple.com/articles/delegating-dnsimple-registered
+* Test
+ + dig systemsgenetics.org [NS]
+ + dig systemsgenetics.org @ns1.dnsimple.com NS
+ + whois systemsgenetics.org
diff --git a/topics/systems/migrate-p2.gmi b/topics/systems/migrate-p2.gmi
new file mode 100644
index 0000000..c7fcb90
--- /dev/null
+++ b/topics/systems/migrate-p2.gmi
@@ -0,0 +1,12 @@
+* Penguin2 crash
+
+This week the boot partition of P2 crashed. We have a few lessons here, not least having a fallback for all services ;)
+
+* Tasks
+
+- [ ] setup space.uthsc.edu for GN2 development
+- [ ] update DNS to tux02 128.169.4.52 and space 128.169.5.175
+- [ ] move CI/CD to tux02
+
+
+* Notes
diff --git a/topics/systems/orchestration.gmi b/topics/systems/orchestration.gmi
new file mode 100644
index 0000000..5e0a298
--- /dev/null
+++ b/topics/systems/orchestration.gmi
@@ -0,0 +1,35 @@
+* Orchestration and fallbacks
+
+After the Penguin2 crash in Aug. 2022 it has become increasingly clear how hard it is to deploy GeneNetwork. GNU Guix helps a great deal with dependencies, but it does not handle orchestration between machines/services well. Also we need to look at the future.
+
+What is GN today in terms of services
+
+ 1. Main GN2 server (Python, 20+ processes, 3+ instances: depends on all below)
+ 2. Matching GN3 server and REST endpoint (Python: fewer dependencies)
+ 3. Mariadb
+ 4. redis
+ 5. virtuoso
+ 6. GN-proxy (Racket, authentication handler: redis, mariadb)
+ 7. Alias proxy (Racket, gene aliases wikidata)
+ 8. Jupyter R and Julia notebooks
+ 9. BNW server (Octave)
+10. UCSC browser
+11. GN1 instances (older python, 12 instances in principle, 2 running today)
+12. Access to HPC for GEMMA (coming)
+13. Backup services (sheepdog, rsync, borg)
+14. monitoring services (incl. systemd, gunicorn, shepherd, sheepdog)
+15. mail server
+16. https certificates
+17. http(s) proxy (nginx)
+18. CI/CD server (with github webhooks)
+
+I am still missing a few! All run by a man and his diligent dog.
+
+For the future the orchestration needs to be more robust and resilient. This means:
+
+ 1. A fallback for every service on a separate machine
+ 2. Improved privacy protection for (future) human data
+ 3. Separate servers serving different data sources
+ 4. Partial synchronization between data sources
+
+The only way we *can* scale is by adding machines. But the system is not yet ready for that. Also getting rid of monolithic primary databases in favor of files helps synchronization.
diff --git a/topics/use-exceptions-to-indicate-errors.gmi b/topics/use-exceptions-to-indicate-errors.gmi
new file mode 100644
index 0000000..e302dd3
--- /dev/null
+++ b/topics/use-exceptions-to-indicate-errors.gmi
@@ -0,0 +1,16 @@
+# Use exceptions to indicate errors
+
+Often, we indicate that a function has encountered an error by returning a None value. Here's why this is a bad idea and why you should use exceptions instead.
+
+When we return None values to indicate errors, we have to take care to check the return value of every function call and propagate errors higher and higher up the function call stack until we reach a point where the error is handled. This clutters up the code, and is one reason why writing correct code in languages like C that don't have exceptions is a pain.
+
+With exceptions, we only have to create an exception handler (try/except block in Python) at the highest level. Any exceptions raised by functions below that level are automatically passed on to the except block with no additional programmer effort.
+
+Here's an example where we run mapping, and if there's an error, we return an error page. Else, we return the results page. Notice that we do not check the return value template_vars.
+```
+try:
+ template_vars = run_mapping.RunMapping(start_vars, temp_uuid)
+ return render_template("mapping_results.html", **template_vars)
+except Exception:
+ return render_template("mapping_error.html")
+```
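To make the propagation concrete, here is a self-contained sketch with made-up names ('MappingError', 'load_samples', 'compute_mapping', 'mapping_page'); none of them are from the GN2 code base. The intermediate function does no error checking at all, yet the error raised at the bottom still reaches the single handler at the top.

```python
class MappingError(Exception):
    """Raised when a (hypothetical) mapping step cannot complete."""


def load_samples(samples):
    # Lowest level: raise instead of returning None.
    if not samples:
        raise MappingError("no samples provided")
    return list(samples)


def compute_mapping(samples):
    # No error checking here: a MappingError from load_samples
    # simply propagates to the caller.
    loaded = load_samples(samples)
    return {"n_samples": len(loaded)}


def mapping_page(samples):
    """Top-level handler: the only place that catches the error."""
    try:
        result = compute_mapping(samples)
        return "results: {n_samples} samples".format(**result)
    except MappingError as exc:
        return f"error: {exc}"


print(mapping_page(["BXD1", "BXD2"]))  # results: 2 samples
print(mapping_page([]))                # error: no samples provided
```

Catching a specific exception class, as here, keeps unrelated bugs from being silently turned into an error page.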
diff --git a/topics/uthsc-vpn-with-free-software.gmi b/topics/uthsc-vpn-with-free-software.gmi
new file mode 100644
index 0000000..1593c3a
--- /dev/null
+++ b/topics/uthsc-vpn-with-free-software.gmi
@@ -0,0 +1,37 @@
+# UTHSC VPN with free software
+
+It is possible to connect to the UTHSC VPN using only free software. For this, you need the openconnect-sso package. openconnect-sso is a wrapper around openconnect that handles the web-based single sign-on and runs openconnect with the right arguments.
+=> https://github.com/vlaci/openconnect-sso/ openconnect-sso
+=> https://www.infradead.org/openconnect/ openconnect
+
+To connect, run openconnect-sso as follows. A browser window will pop up for you to complete the Duo authentication. Once done, you will be connected to the VPN.
+```
+$ openconnect-sso --server uthscvpn1.uthsc.edu --authgroup UTHSC
+```
+Note that openconnect-sso should be run as a regular user, not as root. After passing Duo authentication, openconnect-sso will try to gain root privileges to set up the network routes. At that point, it will prompt you for your password using sudo.
+
+## Avoid tunneling all your network traffic through the VPN (aka Split Tunneling)
+
+openconnect, by default, tunnels all your traffic through the VPN. This is not good for your privacy. It is better to tunnel only the traffic destined to the specific hosts that you want to access. This can be done using the vpn-slice script.
+=> https://github.com/dlenski/vpn-slice/ vpn-slice
+
+For example, to connect to the UTHSC VPN but only access the hosts tux01 and tux02e through the VPN, run the following command.
+```
+$ openconnect-sso --server uthscvpn1.uthsc.edu --authgroup UTHSC -- --script 'vpn-slice tux01 tux02e'
+```
+The vpn-slice script looks up the hostnames tux01 and tux02e on the VPN DNS and adds /etc/hosts entries and routes to your system. vpn-slice can also set up more complicated routes. To learn more, read the vpn-slice documentation.
+
+## qtwebengine text rendering bug
+
+There is currently a bug in Guix with qtwebengine text rendering.
+=> https://issues.guix.gnu.org/52672
+This causes text to not render in the Duo authentication browser window. Until this bug is fixed, work around it by setting the following environment variable.
+```
+export QTWEBENGINE_CHROMIUM_FLAGS=--disable-seccomp-filter-sandbox
+```
+
+## Acknowledgement
+
+Many thanks to Pjotr Prins and Erik Garrison without whose earlier work this guide would not be possible.
+=> https://github.com/pjotrp/linux-at-university-of-tennessee
+=> https://github.com/ekg/openconnect-sso-docker