summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--issues/slow-sql-query-for-xapian-indexing.gmi (renamed from issues/slow-sql-queary-for-xapian-indexing.gmi)40
-rw-r--r--issues/systems/mariadb/move-to-innodb.gmi6
-rw-r--r--tasks/pjotrp.gmi2
3 files changed, 42 insertions, 6 deletions
diff --git a/issues/slow-sql-queary-for-xapian-indexing.gmi b/issues/slow-sql-query-for-xapian-indexing.gmi
index 5ced977..914d1d1 100644
--- a/issues/slow-sql-queary-for-xapian-indexing.gmi
+++ b/issues/slow-sql-query-for-xapian-indexing.gmi
@@ -65,6 +65,12 @@ for t in 'cat tables.lst' ; do mysql -uwebqtlout -pwebqtlout db_webqtl -e "show
which shows that ProbeSet and Geno have utf8 and the others utf8mb4. That may be worth checking. Different text types add overheads to searches (it translates from one to the other).
+In MySQL utf8 is an alias for utf8mb3 which is deprecated and will be removed in a future MySQL release. At that point utf8 will become a reference to utf8mb4. utf8mb4 is a UTF-8 encoding of the Unicode character set using one to four bytes per character.
+
+So we should be using utf8mb4.
+
+This also fixed the global search query.
+
## Indices
ProbeSet has a large number of columns, indices and full text indices(!)
@@ -74,9 +80,41 @@ ProbeSet has a large number of columns, indices and full text indices(!)
The obvious thing to try is to reduce the query and see what speeds it up again. So, let's try removing the utf8 tables and indices first. E.g.
```
-ALTER TABLE ProbeSet CONVERT TO CHARACTER SET utf8mb4 ;
+ALTER TABLE ProbeSet CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
+ALTER TABLE Geno CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
+```
+
+Unfortunately I was getting 'Specified key was too long; max key length is 3072 bytes' again as in
+
+=> ./move-to-innodb.gmi
+
+We use the full length of the text fields to create the index in
+
+```
+FULLTEXT KEY `ft_ProbeSet_full` (`Name`,`description`,`Symbol`,`alias`,`GenbankId`,`UniGeneId`,`Probe_Target_Description`),
+```
+
+BLOB and TEXT columns also can be indexed, but a prefix length must be given not to exceed the maximum length. In global search we used the match function - that has to scan the full text.
+
+```
+MATCH (ProbeSet.Name,ProbeSet.description,ProbeSet.symbol,ProbeSet.alias,Probe Set.GenbankId, ProbeSet.UniGeneId, ProbeSet.Probe_Target_Description) AGAINST ('brca2' IN BOOLEAN MODE)
+```
+
+With xapian search that is no longer done through SQL. We'll need to toy a bit with this index. We need to check what 'menu' search does, though we want to consolidate that with xapian (anyhow).
+
+```
+DROP INDEX ft_ProbeSet_full ON ProbeSet;
+```
+
+and recreate
+
+```
+CREATE FULLTEXT INDEX ft_ProbeSet_Alias ON ProbeSet (alias);
+CREATE FULLTEXT INDEX ft_ProbeSet_Alias ON ProbeSet (alias);
+CREATE FULLTEXT INDEX ft_ProbeSet_full ON ProbeSet (Name,description,Symbol,alias,GenbankId,UniGeneId,Probe_Target_Description);
```
+This made the query fast again (100K results in 2s).
## Further reading
diff --git a/issues/systems/mariadb/move-to-innodb.gmi b/issues/systems/mariadb/move-to-innodb.gmi
index fdd3c28..4218c4f 100644
--- a/issues/systems/mariadb/move-to-innodb.gmi
+++ b/issues/systems/mariadb/move-to-innodb.gmi
@@ -583,20 +583,16 @@ Turned out simple utf8 worked:
ALTER TABLE ProbeSet CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```
-and utf8mb is not recommended. We'll need fix that FIXME. See
+and utf8mb3 is not recommended. We'll need fix that FIXME. See
=> https://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
-pjotrp 16 Oct 2019 at 09:24 (edited)
-
Full text search is the least straightforward, see
and
Now for ProbeSet - one of the critical locked tables we need this.
-pjotrp 15 Oct 2019 at 16:38 (edited)
-
```
time mysqlcheck -c -u webqtlout -pwebqtlout db_webqtl
db_webqtl.Docs
diff --git a/tasks/pjotrp.gmi b/tasks/pjotrp.gmi
index f9bc57a..998f391 100644
--- a/tasks/pjotrp.gmi
+++ b/tasks/pjotrp.gmi
@@ -41,6 +41,8 @@ Later
=> ./machine-room.gmi machine room
+* [ ] Fix mariadb index search - see
+=> slow-sq
* [ ] Frontend for GN4MSK
* [.] Check Tony's list and improve search for SNPs and Hs
* [ ] GeneNetwork consortium (paper & CTC)