# ProbeSetData
This is by far the largest table (~100 GB). I need to add disk space to be able to host it on the NVMe.
Protocol from
=> ./move-to-innodb.gmi
Every table update has to follow the template:
### Check recent backups
* [X] ascertain there is a backup
* [ ] copy original files
### Make temporary table
If you have enough space, you can create a copy of the actual table and do the work on that:
```
CREATE TABLE mytest AS SELECT * FROM ProbeSetData;
```
Then you can change the column as desired:
```
ALTER TABLE tbl_name MODIFY COLUMN col_name BIGINT AUTO_INCREMENT;
```
Once the process is done, you can swap the tables:
```
DROP TABLE ProbeSetData;
RENAME TABLE mytest TO ProbeSetData;
```
### Check table structure
```
SHOW CREATE TABLE ProbeSetData;
```
A really simple table
```
| ProbeSetData | CREATE TABLE `ProbeSetData` (
  `Id` int(10) unsigned NOT NULL DEFAULT 0,
  `StrainId` int(20) NOT NULL,
  `value` float NOT NULL,
  UNIQUE KEY `DataId` (`Id`,`StrainId`),
  KEY `strainid` (`StrainId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
```
```
MariaDB [db_webqtl]> select * from ProbeSetData limit 2;
+----+----------+-------+
| Id | StrainId | value |
+----+----------+-------+
| 1 | 1 | 5.742 |
| 1 | 2 | 5.006 |
+----+----------+-------+
2 rows in set (0.000 sec)
```
But large
```
MariaDB [db_webqtl]> select count(*) from ProbeSetData;
+------------+
| count(*) |
+------------+
| 5141631478 |
+------------+
1 row in set (0.000 sec)
```
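As a sanity check on the ~100 GB figure, a quick estimate from the row count and the column widths in the schema above (raw column data only; MyISAM row and index overhead excluded):

```python
# Back-of-envelope: is ~100 GB plausible for 5.14 billion rows?
# Assumed column widths: Id INT = 4 bytes, StrainId INT = 4, value FLOAT = 4.
ROWS = 5_141_631_478
BYTES_PER_ROW = 4 + 4 + 4  # raw column data only, no row/index overhead

data_gb = ROWS * BYTES_PER_ROW / 1e9
print(f"raw column data ~= {data_gb:.0f} GB")  # indices come on top of this
```

The two keys store their own copies of `Id` and `StrainId` for every row, which is why narrowing the integer types (below) shrinks both the data and the indices.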
Do this also on Penguin2.
```
| ProbeSetData | CREATE TABLE `ProbeSetData` (
  `Id` int(10) unsigned NOT NULL DEFAULT 0,
  `StrainId` smallint(5) unsigned NOT NULL DEFAULT 0,
  `value` float NOT NULL,
  UNIQUE KEY `DataId` (`Id`,`StrainId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
```
The table layout is a bit puzzling.
```
MariaDB [db_webqtl]> select max(StrainId) from ProbeSetData;
+---------------+
| max(StrainId) |
+---------------+
| 65777 |
+---------------+
MariaDB [db_webqtl]> select max(Id) from ProbeSetData;
+----------+
| max(Id) |
+----------+
| 91406791 |
+----------+
```
For StrainId, MEDIUMINT is a better fit (unsigned range 0 to 16,777,215), and Id should be an INT (instead of BIGINT) for the time being. That will make the table and its indices smaller and faster.
When a value overflows, MariaDB throws an error, so even if we grow out of these ranges we can quickly resize the table in, say, 10 years. That is, if we are still using SQL.
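The fit can be checked directly against MariaDB's unsigned integer ranges. Note in passing that the observed max StrainId (65,777) already exceeds SMALLINT unsigned's 65,535, so the Penguin2 layout shown earlier could not hold this data unchanged:

```python
# Unsigned ranges for the candidate MariaDB integer types.
TYPES = {
    "SMALLINT":  2,  # bytes
    "MEDIUMINT": 3,
    "INT":       4,
}
for name, nbytes in TYPES.items():
    print(f"{name:9} unsigned max = {2**(8 * nbytes) - 1:>10}")

# Observed maxima from the queries above:
max_strain_id = 65_777      # already past SMALLINT unsigned (65,535)
max_id = 91_406_791

assert max_strain_id <= 2**24 - 1   # fits MEDIUMINT unsigned
assert max_id <= 2**32 - 1          # fits INT unsigned
```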
### Check GN1,2,3 code for use of table
```
rg ProbeSetData --color=always |less -R
rg ProbeSetData --type=py -l|fzf --preview="rg --color=always -A 20 ProbeSetData {}" --preview-window=right:85%:wrap
```
No surprises from the code scan.
### Create test
Some select statement and maybe a page of GN2.
The tests we can share from ProbeSetXRef.
Page of GN2 is the mapping page, e.g.
=> http://genenetwork.org/show_trait?trait_id=1427571_at&dataset=HC_M2_0606_P
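One way to write the select-statement test is to fingerprint a deterministic query result once before and once after the migration and require the digests to match. A sketch (the helper name and sampling query are assumptions, not existing GN code); in practice the rows would come from something like `SELECT Id, StrainId, value FROM ProbeSetData ORDER BY Id, StrainId LIMIT 10000`:

```python
import hashlib

def fingerprint(rows):
    """Stable digest of an ordered query result, for before/after comparison."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode())
    return h.hexdigest()[:16]

# Demonstrated on the two sample rows shown earlier in this document.
sample = [(1, 1, 5.742), (1, 2, 5.006)]
print(fingerprint(sample))
```

Any change in row order, row count, or values (e.g. a float mangled by the conversion) changes the digest.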
### Create primary key
```
ALTER TABLE ProbeSetData
ADD PRIMARY KEY(col1,col2);
```
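Assuming the key columns are Id and StrainId (matching the UNIQUE KEY `DataId` shown above), the unique key becomes redundant once those columns are the primary key, so both steps can go in one statement (a sketch):

```
ALTER TABLE ProbeSetData
  ADD PRIMARY KEY (Id, StrainId),
  DROP INDEX DataId;
```

InnoDB clusters the table on the primary key, so keeping a separate unique index over the same columns would only cost space.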
### Create indices
```
SHOW INDEX FROM ProbeSetData;
DROP INDEX ProbeSetId ON ProbeSetData;
```
### Convert to innodb
```
ALTER TABLE ProbeSetData ENGINE = InnoDB;
```
### Change charset
```
ALTER TABLE ProbeSetData CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```
### Update fulltext
For those cases see bottom of move-to-innodb.gmi.
### Run optimiser
```
OPTIMIZE NO_WRITE_TO_BINLOG TABLE ProbeSetData;
```
After running the optimiser, rename the tables:
```
RENAME TABLE orig TO orig_old, mytest TO orig;
```
### Check test