summaryrefslogtreecommitdiff
path: root/issues/systems/gn2-time-machines.gmi
blob: 513a91a0437cb8fc2f4d56c5a1933f81132e5e0f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
# GN2 Time Machines

GN1 time machines are pretty straightforward. With GN2 the complexity has increased a lot because of interacting services and a larger dependency graph.

Here I track what it takes today to install a fallback instance of GN2 that is 'frozen' in time.

## Tags

* assigned: pjotrp
* status: in progress
* priority: medium
* type: system administration
* keywords: systems, production

## Tasks

Also a time line:

* [X] Install machine software and physical (est. 4-8 hours)
* [X] Sync backups on a daily basis and add monitoring (2 hours)
* [X] Set up Mariadb and sync from backup (4 hours)
* [X] GN2 production environment with nginx & genotype_files (2 hours)
* [ ] GN3 Genenetwork3 service (Python)
* [ ] GN3 aliases server (Racket)
* [ ] GN3 auth proxy (Racket)
* [ ] set up https and letsencrypt
* [ ] setup logrotate for production log files
* [ ] Check performance and install monitors

## INFO

### Setting up Guix

We tend to install software in a guix profile. E.g.

```
guix pull -p ~/opt/guix-pull
. /home/wrk/opt/guix-pull/etc/profile
guix package -i mariadb -p /usr/local/guix-profiles/mariadb
```

To get to genenetwork we use a channel. The last working channel on the CI can be downloaded from https://ci.genenetwork.org/channels.scm. Now do

```
guix pull -C channels.scm -p ~/opt/guix-gn-channel
. ~/opt/guix-gn-channel/etc/profile
guix package -i genenetwork2 -p ~/opt/genenetwork2
```

That sets the profile to ~/opt/genenetwork2.

Note that these commands may take a while. And when guix starts building lots of software it may be necessary to configure a substitute server (we use guix.genenetwork.org) adding --substitute-urls="http://guix.genenetwork.org https://ci.guix.info".

### Mariadb (est. 1-2 hours)

Set up a global Mariadb

```
guix package -i mariadb -p /usr/local/guix-profiles/mariadb
```

Usually I use the Debian version to set up defaults

```
apt-get install mariadb
cd /etc/systemd/system
cp /lib/systemd/system/mariadb.service .
systemctl disable mariadb
```

Add  to systemd

```diff
+Type=simple
+CapabilityBoundingSet=CAP_IPC_LOCK CAP_DAC_OVERRIDE CAP_AUDIT_WRITE
+PrivateDevices=false
+ProtectHome=false
+ExecStart=/usr/local/guix-profiles/mariadb/bin/mariadbd --pid-file=/var/run/mysqld/mariadb.pid $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION
+PIDFile=/usr/local/mysql/data/mysqld.pid
+# ExecStartPost=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION"
-ExecStartPost=/etc/mysql/debian-start
+RestartSec=15s
+TimeoutStartSec=infinity
+TimeoutStopSec=infinity
```

comment out the galera ExecStart too.

```
systemctl enable mariadb-guix.service
```

Make sure all symlinks point to our configuration file.

Before starting systemd you may want to make sure the database is running.

```
/usr/local/guix-profiles/mariadb/bin/mariadbd --pid-file=/var/run/mysqld/mariadb.pid --verbose (--help)
```

as root you should be able to login with

```
mysql -e 'show databases'
```

### Mariadb database from backup

We have daily incremental backups on P2, Tux02 and Epysode. First restore the files with

```
. ~/.borg-pass
cd /export2/tux01-restore
borg extract --progress /export2/backup/tux01/borg-tux01::borg-backup-mariadb-20220815-03:13-Mon
```

Extracting 430Gb takes about 90 minutes.

Now make sure mariadb is stopped. Copy the database to fast storage. Set permissions correctly:

```
chown mysql.mysql -R /var/lib/mysql
```

Check them and symlink the DB dir:

```
root@epysode:/export/tux01-mirror#
cp -vau /export2/tux01-restore/home/backup/tux01_mariadb_new .
systemctl stop mariadb
ln -s /export/tux01-mirror/tux01_mariadb_new/latest /var/lib/mysql
systemctl start mariadb
/usr/local/guix-profiles/guix-profiles/mariadb/bin/mysql_upgrade -u webqtlout -pwebqtlout
/export/backup/scripts/tux02/system_check.sh
```

In the process I discover that ibdata1 file has grown to 100GB. Not a problem yet, but we should purge that on production at some point

=> https://www.percona.com/blog/2013/08/20/why-is-the-ibdata1-file-continuously-growing-in-mysql/

(obviously we don't want to use mysqldump right now, but I'll need to do some future work).

### Setting up GN2

Create a gn2 user and checkout the git repo in /home/gn2/production/gene. Note that there exists also a backup of gn2 in borg which has a 'run_production.sh' script.

Running the script will give feedback

```
su gn2
cd /home/gn2/production/
sh run_production.sh
```

You'll find you need the Guix install of gn2. Starting with guix section above.

### Genotype files

GN2 requires a set of files that is in the backup

```
borg extract borg-genenetwork::borg-ZACH-home-20220819-04:04-Fri home/zas1024/gn2-zach/genotype_files/
```

move the genotype_files and update the path in `gn2_settings.py` which is in the same dir as the run_production.sh script.

### Configure Nginx

You'll need to tell Nginx to forward to the web server. Something like:

```
server {
    listen 80;
    server_name gn2-fallback.genenetwork.org;

    access_log  /var/log/nginx/gn2-danny-access.log;
    error_log  /var/log/nginx/gn2-danny-error.log;

    location / {
            proxy_pass         http://127.0.0.1:5000/;
            proxy_redirect     off;

            proxy_set_header   Host             $host;
            proxy_set_header   X-Real-IP        $remote_addr;
            proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

      client_max_body_size 8050m;
      proxy_read_timeout 300;
      proxy_connect_timeout 300;                                                                                 proxy_send_timeout 300;

    }
}
```

### Setting up GN3

Without gn3 the menu will not show on the main page and you see 'There was an error retrieving and setting the menu. Try again later.'

GN3 is a separate REST server that has its own dependencies. A bit confusingly it is also a Python module dependency for GN2. So we need to set up both 'routes'.

First checkout the genenetwork3 repo as gn2 user

```
su gn2
cd /home/gn2
mkdir -p gn3_production
cd gn3_production
git clone https://github.com/genenetwork/genenetwork3.git
```

Check the genenetwork3 README for latest instructions on starting the service as a Guix container. Typically

```
guix shell -C --network --expose=$HOME/production/genotype_files/ -Df guix.scm
```

where genotype_files is the dir you installed earlier.

Run it with, for example

```
export FLASK_APP="main.py"
flask run --port=8081
```

I.e., the same port as GN2 expects in gn2_settings.py. Test with

```
curl localhost:8081/api/version
"1.0"
```

Next set up the external API with nginx by adding the following path to above definition:

```
    location /gn3 {
            rewrite /gn3/(.*) /$1  break;
            proxy_pass         http://127.0.0.1:8081/;
            proxy_redirect     off;
            proxy_set_header   Host $host;
    }
```

and if DNS is correct you should get

```
curl gn2-fallback.genenetwork.org/gn3/api/version
"1.0"
```

To generate the main menu the server does a request to
$.ajax(gn_server_url +'api/menu/generate/json. On production that is
https://genenetwork.org/api3/api/menu/generate/json which is actually gn3(!)

```
curl http://gn2-fallback.genenetwork.org/gn3/api/menu/generate/json
```

If this gives an error check the gn3 output log.

Perhaps obviously, on a production server GN3 should be running as a proper service.

### Alias service

There is another GN3 service that resolves wikidata Gene aliases

```
su gn2
cd ~/gn3_production
git clone https://github.com/genenetwork/gn3.git
```

follow the instructions in the README and you should get

```
curl localhost:8000/gene/aliases/Shh
["Hx","ShhNC","9530036O11Rik","Dsh","Hhg1","Hxl3","M100081","ShhNC"]
```

### Authentication proxy

The proxy also needs to run.

```
su gn2
cd ~/gn3_production
git clone https://github.com/genenetwork/gn-proxy.git
```

See README

### Trouble shooting

Check the server log for errors from the server. There should be one in /home/gn2/production/tmp/. For example you may see

```
ERROR:wqflask:404: Not Found:  7:20AM UTC Aug 20, 2022: http://gn2-fallback.genenetwork.org/api/api/menu/generate/json
```

pointing out the setting in gn2_settings.py is wrong.

Use the console bar of the browse to see what JS error you get.

If you get CORS errors it is because you are using a server that is not genenetwork.org and this is usually a configuration issue.