summaryrefslogtreecommitdiff
path: root/issues/database-long-query-after-innodb-migration.gmi
blob: 65d6b0d0e3d6a1f63543aee73ad9ad7dbbe247f1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
# slow text search query

A slow query turned out to do a join on latin1 and utf8 columns. That was
very slow!

The query contains

```
WHERE (((Phenotype.Post_publication_description
LIKE "%liver%" OR Phenotype.Pre_publication_description LIKE "%liver%" OR
Phenotype.Pre_publication_abbreviation LIKE "%liver%" OR
Phenotype.Post_publication_abbreviation LIKE "%liver%" OR
Phenotype.Lab_code LIKE "%liver%" OR Publication.PubMed_ID LIKE "%liver%"
OR Publication.Abstract LIKE "%liver%" OR Publication.Title LIKE "%liver%"
OR Publication.Authors LIKE "%liver%" OR PublishXRef.Id LIKE "%liver%") ))
```

Below page describes the issue. Essentially an index won't help and
mariadb will scan the whole file for every query. Not good.

=> https://stackoverflow.com/questions/2042269/how-to-speed-up-select-like-queries-in-mysql-on-multiple-columns

This is a typical candidate for FULLTEXT searches where we do a multi
match against the larger fields, e.g.

    Add a full text index on the columns that you need:

        ALTER TABLE table ADD FULLTEXT INDEX index_table_on_x_y_z (x, y, z);

    Then query those columns:

        SELECT * FROM table WHERE MATCH(x,y,z) AGAINST("text")

I think we can try creating a fulltext for index for Abstract, Title
and Authors - since these are longer strings.

Again, I note we are doing this the wrong way. We'll unify xapian -
have you seen how fast that is? But Arun and I need more time to get
the menu search in place.

So, let's try some things.

```
ALTER TABLE Publication ADD FULLTEXT INDEX index_table (Title, Abstract, Authors);
SELECT * FROM Publication WHERE MATCH(Title, Abstract, Authors) AGAINST("diabetes");
```

renders 23 rows in 0.001 seconds. The combined is still slow, so let's check the Phenotype table too. It has

```
Phenotype.Post_publication_description
Phenotype.Pre_publication_description
Phenotype.Pre_publication_abbreviation
Phenotype.Post_publication_abbreviation
Phenotype.Lab_code
Publication.PubMed_ID
```

not sure why we need most of these, but let's create an index

```
ALTER TABLE Phenotype ADD FULLTEXT INDEX index_table (Post_publication_description,Pre_publication_description,Pre_publication_abbreviation,Post_publication_abbreviation,Lab_code);
SELECT * FROM Phenotype WHERE MATCH(Post_publication_description,Pre_publication_description,Pre_publication_abbreviation,Post_publication_abbreviation,Lab_code) AGAINST("liver");
```

and that is fast too. Let's combine these. Still slow (darn!). So it must be on the joins.

```
                INNER JOIN InbredSet ON InbredSet.'SpeciesId' =
 Species.'Id'
                 INNER JOIN PublishXRef ON PublishXRef.'InbredSetId' =
 InbredSet.'Id'
                 INNER JOIN PublishFreeze ON PublishFreeze.'InbredSetId' =
 InbredSet.'Id'
                 INNER JOIN Publication ON Publication.'Id' =
 PublishXRef.'PublicationId'
                 INNER JOIN Phenotype ON Phenotype.'Id' =
 PublishXRef.'PhenotypeId'
                 LEFT JOIN Geno ON PublishXRef.Locus = Geno.Name AND
 Geno.SpeciesId = Species.Id
```

when I remove the final left join the query is fast. That means we can focus on Geno and PublishXRef tables.

First for some reason Geno was still latin1:

```
ALTER TABLE Geno CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```

After that the search is fast.

A nice search now:

```
SELECT PublishXRef.Id,
                 CAST(Phenotype.'Pre_publication_description' AS BINARY),
                 CAST(Phenotype.'Post_publication_description' AS BINARY),
                 Publication.'Authors',
                 Publication.'Year',
                 Publication.'PubMed_ID',
                 PublishXRef.'mean',
                 PublishXRef.'LRS',
                 PublishXRef.'additive',
                 PublishXRef.'Locus',
                 InbredSet.'InbredSetCode',
                 Geno.'Chr',
                 Geno.'Mb'
                 FROM Species
                 INNER JOIN InbredSet ON InbredSet.'SpeciesId' =
 Species.'Id'
                 INNER JOIN PublishXRef ON PublishXRef.'InbredSetId' =
 InbredSet.'Id'
                 INNER JOIN PublishFreeze ON PublishFreeze.'InbredSetId' =
 InbredSet.'Id'
                 INNER JOIN Publication ON Publication.'Id' =
 PublishXRef.'PublicationId'
                 INNER JOIN Phenotype ON Phenotype.'Id' =
 PublishXRef.'PhenotypeId'
                 LEFT JOIN Geno ON PublishXRef.Locus = Geno.Name AND
 Geno.SpeciesId = Species.Id
                         WHERE (((
 MATCH(Post_publication_description,Pre_publication_description,Pre_publication_abbreviation,Post_publication_abbreviat
ion,Lab_code) AGAINST("liver")
 OR Publication.PubMed_ID LIKE "%liver%"
 OR MATCH(Title, Abstract, Authors) AGAINST("liver")
 OR PublishXRef.Id LIKE "%liver%") ))
                         and PublishXRef.InbredSetId = 1
                         and PublishXRef.PhenotypeId = Phenotype.Id
                         and PublishXRef.PublicationId = Publication.Id
                         and PublishFreeze.Id = 1
                         ORDER BY PublishXRef.Id
```