# ProbeSetData

This is by far the largest table (~100GB). I need to add disk space to be able to host it on the NVMe.

Protocol from
=> ./move-to-innodb.gmi

Every table update has to follow the template:

### Check recent backups

* [X] ascertain there is a backup
* [ ] copy original files

### Make temporary table

If you have enough space, you can create a copy of the actual table and do the work on that:

```
CREATE TABLE mytest AS SELECT * FROM ProbeSetData;
```

Then you can change the columns as desired:

```
ALTER TABLE tbl_name MODIFY COLUMN col_name BIGINT AUTO_INCREMENT;
```

Once the process is done, you can rename the tables:

```
DROP TABLE ProbeSetData;
RENAME TABLE tbl_name TO new_tbl_name, tbl_name2 TO ProbeSetData;
```

### Check table structure

```
SHOW CREATE TABLE ProbeSetData;
```

A really simple table:

```
| ProbeSetData | CREATE TABLE `ProbeSetData` (
  `Id` int(10) unsigned NOT NULL DEFAULT 0,
  `StrainId` int(20) NOT NULL,
  `value` float NOT NULL,
  UNIQUE KEY `DataId` (`Id`,`StrainId`),
  KEY `strainid` (`StrainId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
```

```
MariaDB [db_webqtl]> select * from ProbeSetData limit 2;
+----+----------+-------+
| Id | StrainId | value |
+----+----------+-------+
|  1 |        1 | 5.742 |
|  1 |        2 | 5.006 |
+----+----------+-------+
2 rows in set (0.000 sec)
```

But large:

```
MariaDB [db_webqtl]> select count(*) from ProbeSetData;
+------------+
| count(*)   |
+------------+
| 5141631478 |
+------------+
1 row in set (0.000 sec)
```

Do this also on Penguin2.

```
| ProbeSetData | CREATE TABLE `ProbeSetData` (
  `Id` int(10) unsigned NOT NULL DEFAULT 0,
  `StrainId` smallint(5) unsigned NOT NULL DEFAULT 0,
  `value` float NOT NULL,
  UNIQUE KEY `DataId` (`Id`,`StrainId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
```

The table layout is a bit puzzling.

```
MariaDB [db_webqtl]> select max(StrainId) from ProbeSetData;
+---------------+
| max(StrainId) |
+---------------+
|         65777 |
+---------------+
MariaDB [db_webqtl]> select max(Id) from ProbeSetData;
+----------+
| max(Id)  |
+----------+
| 91406791 |
+----------+
```

For StrainId, MEDIUMINT is a better fit (unsigned range 0 to 16,777,215), and Id should be an INT (instead of BIGINT) for the time being. That will make the table and its indices smaller and faster. When a value overflows, MariaDB will throw an error, so even if we grow out of these ranges we can quickly resize the table in, say, 10 years. That is, if we are still using SQL.

### Check GN1,2,3 code for use of table

```
rg ProbeSetData --color=always |less -R
rg ProbeSetData --type=py -l|fzf --preview="rg --color=always -A 20 ProbeSetData {}" --preview-window=right:85%:wrap
```

No surprises from the code scan.

### Create test

Some select statement and maybe a page of GN2. We can share the tests from ProbeSetXRef. The GN2 page is the mapping page, e.g.
=> http://genenetwork.org/show_trait?trait_id=1427571_at&dataset=HC_M2_0606_P

### Create primary key

```
ALTER TABLE ProbeSetData ADD PRIMARY KEY(col1,col2);
```

### Create indices

```
SHOW INDEX FROM ProbeSetData;
DROP INDEX ProbeSetId ON ProbeSetData;
```

### Convert to InnoDB

```
ALTER TABLE ProbeSetData ENGINE = InnoDB;
```

### Change charset

```
ALTER TABLE ProbeSetData CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```

### Update fulltext

For those cases, see the bottom of move-to-innodb.gmi.

### Run optimiser

```
OPTIMIZE NO_WRITE_TO_BINLOG TABLE ProbeSetData;
```

After running the optimiser, rename the tables:

```
RENAME TABLE orig TO orig_old, mytest TO orig;
```

### Check test
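A minimal sketch of the final check, assuming the statements recorded above serve as the shared test: re-run them against the converted table and compare the results with the values captured before the migration.

```
-- Re-run the recorded queries; results should match the
-- pre-conversion output captured earlier in this protocol.
SELECT COUNT(*) FROM ProbeSetData;
SELECT * FROM ProbeSetData LIMIT 2;
SELECT MAX(Id), MAX(StrainId) FROM ProbeSetData;
```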
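It is also worth confirming that the engine, character set and key changes actually took effect; one way to do that (a sketch):

```
-- Expect ENGINE=InnoDB, CHARSET=utf8mb4 and the new primary key.
SHOW CREATE TABLE ProbeSetData;
SHOW TABLE STATUS LIKE 'ProbeSetData';
SHOW INDEX FROM ProbeSetData;
```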