# ProbeSetData

This is by far the largest table (~100Gb). I need to add disk space to be able to host it on the NVME.

Prototocol from

=> ./move-to-innodb.gmi

Every table update has to follow the template:

### Check recent backups

* [X] ascertain there is a backup
* [ ] copy original files

### Make temporary table

If you have enough space, you can create a copy of the actual table and do the work on that:

```
CREATE TABLE mytest AS SELECT * FROM ProbeSetData;
```

Then you can change the column as desired:

```
ALTER TABLE tbl_name MODIFY COLUMN col_name BIGINT AUTO_INCREMENT;
```

Once the process is done, you can rename the tables:

```
DROP ProbeSetData;
RENAME TABLE tbl_name TO new_tbl_name, tbl_name2 TO ProbeSetData;
```

### Check table structure

```
SHOW CREATE TABLE ProbeSetData;
```

A really simple table

```
| ProbeSetData | CREATE TABLE 'ProbeSetData' (
  'Id' int(10) unsigned NOT NULL DEFAULT 0,
  'StrainId' int(20) NOT NULL,
  'value' float NOT NULL,
  UNIQUE KEY 'DataId' ('Id','StrainId'),
  KEY 'strainid' ('StrainId')
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
```

```
MariaDB [db_webqtl]> select * from ProbeSetData limit 2;
+----+----------+-------+
| Id | StrainId | value |
+----+----------+-------+
|  1 |        1 | 5.742 |
|  1 |        2 | 5.006 |
+----+----------+-------+
2 rows in set (0.000 sec)
```

But large

```
MariaDB [db_webqtl]> select count(*) from ProbeSetData;
+------------+
| count(*)   |
+------------+
| 5141631478 |
+------------+
1 row in set (0.000 sec)

```

Do this also on Penguin2.

```
| ProbeSetData | CREATE TABLE 'ProbeSetData' (
  'Id' int(10) unsigned NOT NULL DEFAULT 0,
  'StrainId' smallint(5) unsigned NOT NULL DEFAULT 0,
  'value' float NOT NULL,
  UNIQUE KEY 'DataId' ('Id','StrainId')
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
```

The table layout is a bit puzzling.

```
MariaDB [db_webqtl]> select max(StrainId) from ProbeSetData;
+---------------+
| max(StrainId) |
+---------------+
|         65777 |
+---------------+
MariaDB [db_webqtl]> select max(Id) from ProbeSetData;
+----------+
| max(Id)  |
+----------+
| 91406791 |
+----------+
```

For StrainID MediumInt is a better fit with unsigned range is 0 to 16777215 and Id should be a INT (instead of BIGINT for the time being). That will make the table+indices smaller and faster.

### Check GN1,2,3 code for use of table

```
rg ProbeSetData --color=always |less -R
rg ProbeSetData --type=py -l|fzf --preview="rg --color=always -A 20 ProbeSetData {}" --preview-window=right:85%:wrap
```

No surprises from the code scan.

### Create test

Some select statement and maybe a page of GN2.

The tests we can share from ProbeSetXRef.

Page of GN2 is the mapping page, e.g.

=> http://genenetwork.org/show_trait?trait_id=1427571_at&dataset=HC_M2_0606_P

### Create primary key

```
ALTER TABLE ProbeSetData
  ADD PRIMARY KEY(col1,col2);
```

### Create indices

```
SHOW INDEX FROM ProbeSetData;
DROP INDEX ProbeSetId ON ProbeSetData;
```

### Convert to innodb

```
ALTER TABLE ProbeSetData ENGINE = InnoDB;
```

### Change charset

```
ALTER TABLE ProbeSetSE CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```

### Update fulltext

For those cases see bottom of move-to-innodb.gmi.

### Run optimiser

```
OPTIMIZE NO_WRITE_TO_BINLOG TABLE ProbeSetData;
```

After running the optimiser rename the tables

```
RENAME TABLE orig TO orig_old, mytest TO orig;
```

### Check test