topics/data-uploads/editing-data.gmi


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145

## Tags

* assigned: bonfacem

### Introduction

At the moment, you can edit metadata related to a published phenotype and a probeset. When an edit is done, the diff data is stored in a table, `metadata_audit` in json format that looks something like:

```js
{"Probeset": {"id_": {"old": "108845", "new": "108845"}, "description": {"old": "EST BG068973 (olfactory receptor related sequence).", "new": "EST BG068973 (olfactory receptor related sequence). Test"}}, "probeset_name": "1446112_at", "author": "Bonface Munyoki K.", "timestamp": "2021-07-10 08:31:18"}

{"Probeset": {"id_": {"old": "108845", "new": "108845"}, "description": {"old": "EST BG068973 (olfactory receptor related sequence). Test", "new": "EST BG068973 (olfactory receptor related sequence). Testing"}}, "probeset_name": "1446112_at", "author": "Bonface Munyoki K.", "timestamp": "2021-07-10 08:31:36"}

{"Probeset": {"id_": {"old": "108845", "new": "108845"}, "description": {"old": "EST BG068973 (olfactory receptor related sequence). Testing", "new": "EST BG068973 (olfactory receptor related sequence)."}}, "probeset_name": "1446112_at", "author": "Bonface Munyoki K.", "timestamp": "2021-07-10 10:15:37"}

{"Phenotype": {"pre_pub_description": {"old": "Testing the description", "new": "Testing the description"}, "post_pub_description": {"old": "Central nervous system, morphology: Internal granule layer (IGL) of the cerebellum volume, adjusted for sex, age, body and brain weight [mm3]", "new": "Central nervous system, morphology: Internal granule layer (IGL) of the cerebellum volume, adjusted for sex, age, body and brain weight [mm3]"}, "original_description": {"old": "Original post publication description: Central nervous system, morphology: Internal granule layer (IGL) of the cerebellum volume, adjusted for sex, age, body and brain weight [mm3]", "new": "Original post publication description: Central nervous system, morphology: Internal granule layer (IGL) of the cerebellum volume, adjusted for sex, age, body and brain weight [mm3]"}, "units": {"old": "mm3", "new": "mm3"}, "pre_pub_abbreviation": {"old": "", "new": ""}, "post_pub_abbreviation": {"old": "AdjIGLVol", "new": "AdjIGLVol"}, "lab_code": {"old": "", "new": ""}, "submitter": {"old": "robwilliams", "new": "robwilliams"}, "owner": {"old": "", "new": ""}, "authorized_users": {"old": "robwilliams", "new": "robwilliams"}}, "dataset_id": "10007", "author": "Bonface Munyoki K.", "timestamp": "2021-07-16 16:35:56"}

{"Phenotype": {"pre_pub_description": {"old": "Testing the description", "new": "Testing the descriptiony(test)"}}, "dataset_id": "10007", "author": "Bonface Munyoki K.", "timestamp": "2021-07-16 18:25:15"}
```

Note how we use one table to store different diffs from different data types. This is important, since if the need ever arises to use a different schema or db altogether, we only use one column.


### Setup

First, run this sql script to add the "metadata<sub>audit</sub>" table:

=> https://github.com/genenetwork/genenetwork3/blob/main/sql/metadata_audit.sql Sql script to run

Example

```
mysql -u webqtlout db_webqtl < metadata_audit.sql
```

And check this works

```
select * FROM information_schema.COLUMNS WHERE table_schema=DATABASE() AND TABLE_NAME='metadata_audit';
```

For everything to run, use the latest commit on the main branch from gn3. One way to make sure that you are running the latest changes, make sure that you have genenetwork3 cloned somewhere in your path; and keep it updated. Thereafter set, a PATH in "bin/genenetwork2" which should look something similar to `export PYTHONPATH=/home/bonface/projects/genenetwork3:$PYTHON_GN_PATH:$GN2_BASE_DIR/wqflask:$PYTHONPATH`.

Alternatively, install gn3 in it's own profile. This can be sometimes rinconvenient since the latest changes from gn3 can be ahead of the the package definition in our channel.


### How to edit a trait Editing a trait/ probeset

These are the steps you can do to edit a phenotype/ probeset:

- Log in to a trait page or a probeset (Make sure that your are logged in). From the "Trait Data and Analysis page" for the probeset/ trait, there's an "Edit" button under "Details and Links". Use this to edit the metadata.

- An example of editing a published phenotype is: "*trait/10007/edit/inbredset-id/10007*"

After editing traits, you could see changes in a diff format at the very top of the update page.


### Notes

- Atm, creds are hard-coded. See:

=> https://github.com/genenetwork/genenetwork2/blob/testing/wqflask/wqflask/decorators.py

That needs to be improved.

- For Probeset data, atm some tables aren't yet updated(unfortunately those tables aren't used anywhere). I need to figure that out.

#### Tue 23 Nov 2021

- Fixed excel issue described here:

=> https://github.com/genenetwork/genenetwork2/compare/testing...BonfaceKilz:bug/fix-excel-adding-new-line?expand=1

### Wed 22 Dec 2021

Consider the case where a user deletes, updates, or modifies data from
Genenetwork using the csv download they got from the data-upload page.
In the event they do not approve the data, and they upload that csv
file again, we have a case in gn2 where data needs to be deleted,
inserted, or modified twice.  In the case of deletions, or insertions,
an error will be generated because you cannot remove or insert that
same data twice.  Also, the count that's displayed as a flash message
will not be correct (visually) since-- at the time of this writing--
we count the number of edits from a CSV file.  A working patch will be
sent later today(or tomorrow).

Also, the GN2 build is not working.  With that in mind below I outline
how I've plugged in pudb, inspired by jgarte's demo:

=> https://git.genenetwork.org/jgart/qc_crlf_demo

- First, in *wqflask/runserver*, I removed Flask DebugToolbar(this
  somehow made Flask crash, and I haven't investigated it).  Here is
  how the diff looks:

```
modified   wqflask/runserver.py
@@ -23,9 +23,7 @@ app_config()
 werkzeug_logger = logging.getLogger('werkzeug')
 
 if WEBSERVER_MODE == 'DEBUG':
-    from flask_debugtoolbar import DebugToolbarExtension
     app.debug = True
-    toolbar = DebugToolbarExtension(app)
     app.run(host='0.0.0.0',
             port=SERVER_PORT,
             debug=True,

```

- As a work-around for the failing gn2 build, install pudb in it's own profile

```
guix install pudb -p ~/opt/pudb
```

- Add pudb to your python path in *bin/genenetwork2* as show in this
  diff:

```
-# We may change this one:
-export PYTHONPATH=$PYTHON_GN_PATH:$GN2_BASE_DIR/wqflask:$GN3_PYTHONPATH:$PYTHONPATH
+# We may change this one: 
+export PYTHONPATH=$GN2_BASE_DIR/wqflask:/home/bonface/opt/pudb/lib/python3.9/site-packages/:/home/bonface/projects/genenetwork3:$PYTHONPATH
```

- To use pudb, just stick `import pudb; pu.db` wherever you fancy.
  Happy debugging!

### Wed 26 Jan 2022

Fixed precision point errors.  During file upload, excel appended
extra values; so instead of `BXD1,18.8,x,x`, you'd have
`BXD1,18.8000001,20,x`.  There are 2 parts to fixing this:

- During generating the json file(when creating the diff), ignore
  modifications where |ε| > 0.0001.

- When listing diffs, ignore modification entries where |ε| > 0.0001.

For this fix, the second bit was ignored, because it'll lead to
unnecessarily complicated code, given that only one person has really
been testing the system.