diff options
author | Pjotr Prins | 2023-03-13 11:56:24 +0100 |
---|---|---|
committer | Pjotr Prins | 2023-03-13 11:56:24 +0100 |
commit | 6197e350b2696ab77f76f3b29dfaef253961c241 (patch) | |
tree | c3528118a39f1d8a4043b9e8132fc8b8d74f6573 /docs | |
parent | 8d26fd22a01950ef2f00626649639a1ec00c55ce (diff) | |
download | genenetwork3-6197e350b2696ab77f76f3b29dfaef253961c241.tar.gz |
Move doc to issue tracker
Diffstat (limited to 'docs')
-rw-r--r-- | docs/performance/slow-probesetdata-sql.org | 703 |
1 files changed, 0 insertions, 703 deletions
diff --git a/docs/performance/slow-probesetdata-sql.org b/docs/performance/slow-probesetdata-sql.org deleted file mode 100644 index 919eb37..0000000 --- a/docs/performance/slow-probesetdata-sql.org +++ /dev/null @@ -1,703 +0,0 @@ -* Slow ProbeSetData SQL query - -** The case - -We were looking to optimize a query for correlations. On Penguin2 (standard -drives, RAID5) the query took 1 hour. On Tux01 (solid state NVME) it took 6 -minutes. Adding an index for StrainId (see explorations below) reduced that -query to 3 minutes - which is kinda acceptable. The real problem, however is -that this is a quadratic search - so it will get worse quickly - and we need -to solve it. This table has doubled in size in the last 5 years. - -So what does ProbeSetData contain? - -#+BEGIN_SRC SQL -select * from ProbeSetData limit 5; -+----+----------+-------+ -| Id | StrainId | value | -+----+----------+-------+ -| 1 | 1 | 5.742 | -| 1 | 2 | 5.006 | -| 1 | 3 | 6.079 | -| 1 | 4 | 6.414 | -| 1 | 5 | 4.885 | -+----+----------+-------+ -#+END_SRC - -You can see Id is sectioned in the file (and there are not that many Ids) but -StrainId is *distributed* through the database file and some 'StrainIds' match -many data points. Id stands for Dataset (item) and StrainId really means -measurement type or trait measured(!) - -Our query looked for 1,236,088 measurement distributed over a 53Gb file (and -an even larger index file). Turns out the full table is read many many times -over for one particular query pivoting on strainid... - -We have the following options: - -1. Reorder the table -2. Use column based storage -3. Use compression -4. Use a different storage layout - -*** Reorder the table - -We could reorder the table on StrainID which would make this search much faster -but it would many common (dataset) queries slower. So, that is not a great -idea. One thing we could try is add a copy of the first table. Not exactly -elegant but a quick fix for sure. We'll need an embedded procedure to keep it -up-to-data. - -*** Use column based storage - -Column-based storage works when you need a subset of the data in the table. In -this case it won't help much because, even though the pivot itself would be -faster, we still traverse all data to get IDs and values. - -*** Use compression - -Compression reduces the size on disk and may be beneficial. Real life metrics -on the internet don't show that much improvement, but we could try native -compression and/or ZFS. - -*** Use a different storage layout - -My prediction is that we can not get around this. - -** Result - -The final query is shown at the bottom of 'Exploration'. It takes 2 minutes -on Penguin2 and Tux01. Essentially I took out the joins (which parsed the same -table repeatedly) and added an index. The trick is to keep minimizing the query. - -The 2 minute query will do for now and it probably is no longer quadratic. - -We can probably improve things in the future by changing the way ProbeSetData -is stored. - - -** Exploration - -In the following steps I reduce the complex case to a simple case explaining -the performance bottleneck we are seeing today. I did not add comments, but -you can see what I did. - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select * from ProbeSetXRef limit 2; -+------------------+------------+--------+------------+--------------------+------------+------------------+---------------------+------------+------------------+--------+--------------------+------+ -| ProbeSetFreezeId | ProbeSetId | DataId | Locus_old | LRS_old | pValue_old | mean | se | Locus | LRS | pValue | additive | h2 | -+------------------+------------+--------+------------+--------------------+------------+------------------+---------------------+------------+------------------+--------+--------------------+------+ -| 1 | 1 | 1 | 10.095.400 | 13.3971627898894 | 0.163 | 5.48794285714286 | 0.08525787814808819 | rs13480619 | 12.590069931048 | 0.269 | -0.28515625 | NULL | -| 1 | 2 | 2 | D15Mit189 | 10.042057464356201 | 0.431 | 9.90165714285714 | 0.0374686634976217 | rs29535974 | 10.5970737900941 | 0.304 | -0.116783333333333 | NULL | -+------------------+------------+--------+------------+--------------------+------------+------------------+---------------------+------------+------------------+--------+--------------------+------+ -2 rows in set (0.001 sec) -#+END_SRC - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> SELECT ProbeSet.Name,T4.value FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) left join ProbeSetData as T4 on T4.Id = ProbeSetXRef.DataId - -> and T4.StrainId=4 - -> limit 5; -+------+-------+ -| Name | value | -+------+-------+ -| NULL | 6.414 | -| NULL | 6.414 | -| NULL | 6.414 | -| NULL | 6.414 | -| NULL | 6.414 | -+------+-------+ -#+END_SRC - - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> SELECT ProbeSet.Name,T4.value FROM (ProbeSet, ProbeSetXRef) - left join ProbeSetData as T4 on T4.StrainId=4 and T4.Id = ProbeSetXRef.DataId - WHERE ProbeSet.Id = ProbeSetXRef.ProbeSetId order by ProbeSet.Id limit 5; -+-----------+-------+ -| Name | value | -+-----------+-------+ -| 100001_at | 6.414 | -| 100001_at | 6.414 | -| 100001_at | 6.414 | -| 100001_at | 8.414 | -| 100001_at | 8.414 | -+-----------+-------+ -5 rows in set (20.064 sec) -#+END_SRC SQL - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> SELECT ProbeSet.Name,T4.value,T5.value FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) -left join ProbeSetData as T4 on T4.Id = ProbeSetXRef.DataId and T4.StrainId=4 -left join ProbeSetData as T5 on T5.Id = ProbeSetXRef.DataId and T5.StrainId=5 -WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id -and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' -and ProbeSet.Id = ProbeSetXRef.ProbeSetId -order by ProbeSet.Id limit 5; - -+---------+---------+ -| Name | value | -+---------+---------+ -| 4331726 | 5.52895 | -| 5054239 | 6.29465 | -| 4642578 | 9.13706 | -| 4398221 | 6.77672 | -| 5543360 | 4.30016 | -+---------+---------+ -#+END_SRC - -#+BEGIN_SRC SQL -SELECT ProbeSet.Name,T4.value FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) -left join ProbeSetData as T4 on T4.Id = ProbeSetXRef.DataId and T4.StrainId=4 -WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id -and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' -and ProbeSet.Id = ProbeSetXRef.ProbeSetId -order by ProbeSet.Id ; - -1236087 rows in set (19.173 sec) -#+END_SRC - -#+BEGIN_SRC SQL -SELECT ProbeSet.Name,T4.value FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) -left join ProbeSetData as T4 on T4.StrainId=4 and T4.Id = ProbeSetXRef.DataId -WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id -and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' -and ProbeSet.Id = ProbeSetXRef.ProbeSetId -order by ProbeSet.Id ; - -1236087 rows in set (19.173 sec) -#+END_SRC - -#+BEGIN_SRC SQL -SELECT ProbeSet.Name FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) -WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id -and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' -and ProbeSet.Id = ProbeSetXRef.ProbeSetId -order by ProbeSet.Id ; -#+END_SRC - -Find all the probeset 'names' (probe sequence included) for one dataset: - -#+BEGIN_SRC SQL -SELECT count(DISTINCT ProbeSet.Name) FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' and ProbeSet.Id = ProbeSetXRef.ProbeSetId order by ProbeSet.Id; -+-------------------------------+ -| count(DISTINCT ProbeSet.Name) | -+-------------------------------+ -| 1236087 | -+-------------------------------+ -#+END_SRC - -Now for each of those probesets: - -#+BEGIN_SRC SQL -SELECT ProbeSet.Name,T4.value FROM (ProbeSet, ProbeSetXRef) -left join ProbeSetData as T4 on T4.StrainId=4 and T4.Id = ProbeSetXRef.DataId -WHERE ProbeSet.Id = ProbeSetXRef.ProbeSetId -order by ProbeSet.Id limit 5; -#+END_SRC - -ProbeSetXRef contains the p-values: - -#+BEGIN_SRC SQL -select * from ProbeSetXRef limit 5; -+------------------+------------+--------+------------+--------------------+------------+-------------------+---------------------+------------+------------------+--------+--------------------+------+ -| ProbeSetFreezeId | ProbeSetId | DataId | Locus_old | LRS_old | pValue_old | mean | se | Locus | LRS | pValue | additive | h2 | -+------------------+------------+--------+------------+--------------------+------------+-------------------+---------------------+------------+------------------+--------+--------------------+------+ -| 1 | 1 | 1 | 10.095.400 | 13.3971627898894 | 0.163 | 5.48794285714286 | 0.08525787814808819 | rs13480619 | 12.590069931048 | 0.269 | -0.28515625 | NULL | -| 1 | 2 | 2 | D15Mit189 | 10.042057464356201 | 0.431 | 9.90165714285714 | 0.0374686634976217 | rs29535974 | 10.5970737900941 | 0.304 | -0.116783333333333 | NULL | -#+END_SRC - - -#+BEGIN_SRC SQL -SELECT count(T4.value) FROM (ProbeSet, ProbeSetXRef) -left join ProbeSetData as T4 on T4.StrainId=4 and T4.Id = ProbeSetXRef.DataId -WHERE ProbeSet.Id = ProbeSetXRef.ProbeSetId ; -#+END_SRC - - -#+BEGIN_SRC SQL -SELECT count(T4.value) FROM (ProbeSet, ProbeSetXRef) left join ProbeSetData as T4 on T4.StrainId=4 limit 5; -#+END_SRC - -#+BEGIN_SRC SQL -select value from (ProbeSetData) where StrainId=4 limit 5; -#+END_SRC - -So, this is the sloooow baby: - -#+BEGIN_SRC SQL -select count(id) from (ProbeSetData) where StrainId=4; - -| ProbeSetData | 0 | DataId | 2 | StrainId | A | 4852908856 | NULL | NULL | | BTREE | | | - --rw-rw---- 1 mysql mysql 53G Mar 3 23:49 ProbeSetData.MYD --rw-rw---- 1 mysql mysql 66G Mar 4 03:00 ProbeSetData.MYI -#+END_SRC - -#+BEGIN_SRC SQL -create index strainid on ProbeSetData(StrainId); -Stage: 1 of 2 'Copy to tmp table' 8.77% of stage done -Stage: 2 of 2 'Enabling keys' 0% of stage done -#+END_SRC - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> create index strainid on ProbeSetData(StrainId); -Query OK, 5111384047 rows affected (2 hours 56 min 25.807 sec) -Records: 5111384047 Duplicates: 0 Warnings: 0 -#+END_SRC - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(id) from (ProbeSetData) where StrainId=4; - -+-----------+ -| count(id) | -+-----------+ -| 14267545 | -+-----------+ -1 row in set (19.707 sec) -#+END_SRC - - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(*) from ProbeSetData where strainid = 140; -+----------+ -| count(*) | -+----------+ -| 10717771 | -+----------+ -1 row in set (10.161 sec) -#+END_SRC - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(*) from ProbeSetData where strainid = 140 and id=4; -+----------+ -| count(*) | -+----------+ -| 0 | -+----------+ -1 row in set (0.000 sec) -#+END_SRC - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(*) from ProbeSetData where strainid = 4 and id=4; -+----------+ -| count(*) | -+----------+ -| 1 | -+----------+ -1 row in set (0.000 sec) -#+END_SRC - - -#+BEGIN_SRC SQL -select id from ProbeSetFreeze where id=1; - -WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id -and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' -and ProbeSet.Id = ProbeSetXRef.ProbeSetId -order by ProbeSet.Id limit 5; -#+END_SRC - -#+BEGIN_SRC SQL -select count(ProbeSetId) from ProbeSetXRef where ProbeSetFreezeId=1; -+-------------------+ -| count(ProbeSetId) | -+-------------------+ -| 12422 | -+-------------------+ -1 row in set (0.006 sec) -#+END_SRC - - -#+BEGIN_SRC SQL -select count(ProbeSetId) from (ProbeSetXRef,ProbeSetFreeze) where -ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id -and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA'; -#+END_SRC - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(ProbeSetId) from (ProbeSetXRef,ProbeSetFreeze) where - -> ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id - -> and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA'; -+-------------------+ -| count(ProbeSetId) | -+-------------------+ -| 1236087 | -+-------------------+ -1 row in set (0.594 sec) -#+END_SRC - -ProbeSetXRef.ProbeSetFreezeId is 206, so - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(ProbeSetId) from (ProbeSetXRef) where ProbeSetXRef.ProbeSetFreezeId = 206; -+-------------------+ -| count(ProbeSetId) | -+-------------------+ -| 1236087 | -+-------------------+ -1 row in set (0.224 sec) -#+END_SRC - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(*) from ProbeSetData where strainid = 1 and id=4; -+----------+ -| count(*) | -+----------+ -| 1 | -+----------+ -1 row in set (0.000 sec) -#+END_SRC - -Now this query is fast because it traverses the ProbeSetData table only once and uses id as a starting point: - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(*) from (ProbeSetData,ProbeSetXRef) where ProbeSetXRef.ProbeSetFreezeId = 206 and id=ProbeSetId; -+----------+ -| count(*) | -+----------+ -| 10699448 | -+----------+ -1 row in set (4.429 sec) -#+END_SRC - -#+BEGIN_SRC SQL -select id,strainid,value from (ProbeSetData,ProbeSetXRef) where - ProbeSetXRef.ProbeSetFreezeId = 206 and id=ProbeSetId -limit 5; -+--------+----------+-------+ -| id | strainid | value | -+--------+----------+-------+ -| 225088 | 1 | 7.33 | -| 225088 | 2 | 7.559 | -| 225088 | 3 | 7.84 | -| 225088 | 4 | 7.835 | -| 225088 | 5 | 7.652 | -+--------+----------+-------+ -5 rows in set (0.001 sec) -#+END_SRC - - -#+BEGIN_SRC SQL -select id,strainid,value from (ProbeSetData,ProbeSetXRef) where ProbeSetXRef.ProbeSetFreezeId = 206 and id=ProbeSetId and strainid=4 limit 5; -+--------+----------+-------+ -| id | strainid | value | -+--------+----------+-------+ -| 225088 | 4 | 7.835 | -| 225089 | 4 | 9.595 | -| 225090 | 4 | 8.982 | -| 225091 | 4 | 8.153 | -| 225092 | 4 | 7.111 | -+--------+----------+-------+ -5 rows in set (0.000 sec) -#+END_SRC - -#+BEGIN_SRC SQL -select ProbeSet.name,strainid,probesetfreezeid,value from (ProbeSet,ProbeSetData,ProbeSetFreeze,ProbeSetXRef) -where ProbeSetData.id=ProbeSetId - and (strainid=4 or strainid=5) - and ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' -limit 5; -+---------+----------+------------------+-------+ -| name | strainid | probesetfreezeid | value | -+---------+----------+------------------+-------+ -| 4331726 | 4 | 206 | 7.835 | -| 5054239 | 4 | 206 | 9.595 | -| 4642578 | 4 | 206 | 8.982 | -| 4398221 | 4 | 206 | 8.153 | -| 5543360 | 4 | 206 | 7.111 | -+---------+----------+------------------+-------+ -5 rows in set (2.174 sec) -#+END_SRC - -No more joins and super fast!! - -#+BEGIN_SRC SQL -SELECT ProbeSet.Name,T4.StrainID,Probesetfreezeid,T4.value -FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) - left join ProbeSetData as T4 on T4.StrainId=4 and T4.Id = ProbeSetXRef.DataId -WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id - and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' - and ProbeSet.Id = ProbeSetXRef.ProbeSetId -order by ProbeSet.Id limit 5; -+---------+----------+------------------+---------+ -| Name | StrainID | Probesetfreezeid | value | -+---------+----------+------------------+---------+ -| 4331726 | 4 | 206 | 5.52895 | -| 5054239 | 4 | 206 | 6.29465 | -| 4642578 | 4 | 206 | 9.13706 | -| 4398221 | 4 | 206 | 6.77672 | -| 5543360 | 4 | 206 | 4.30016 | -+---------+----------+------------------+---------+ -5 rows in set (0.000 sec) -#+END_SRC - -The difference is the use of ProbeSetId and DataId in ProbeSetFreeze - -: left join ProbeSetData as T4 on T4.StrainId=4 and T4.Id = ProbeSetXRef.DataId - -#+BEGIN_SRC SQL -select ProbeSetData.id,ProbeSet.name,strainid,probesetfreezeid,value from (ProbeSet,ProbeSetData,ProbeSetFreeze,ProbeSetXRef) -where ProbeSetData.id=ProbeSetId - and (strainid=4 or strainid=5) - and ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' -limit 5; -+---------+----------+------------------+-------+ -| name | strainid | probesetfreezeid | value | -+---------+----------+------------------+-------+ -| 4331726 | 4 | 206 | 7.835 | -| 5054239 | 4 | 206 | 9.595 | -| 4642578 | 4 | 206 | 8.982 | -| 4398221 | 4 | 206 | 8.153 | -| 5543360 | 4 | 206 | 7.111 | -+---------+----------+------------------+-------+ -5 rows in set (2.174 sec) -#+END_SRC - - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select ProbeSetData.id,ProbeSet.name,strainid,probesetfreezeid,value from (ProbeSet,ProbeSetData,ProbeSetFreeze,ProbeSetXRef) - where ProbeSetData.id=ProbeSetId - and (strainid=4 or strainid=5) - and ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' - limit 5; -+--------+---------+----------+------------------+-------+ -| id | name | strainid | probesetfreezeid | value | -+--------+---------+----------+------------------+-------+ -| 225088 | 4331726 | 4 | 206 | 7.835 | -| 225089 | 5054239 | 4 | 206 | 9.595 | -| 225090 | 4642578 | 4 | 206 | 8.982 | -| 225091 | 4398221 | 4 | 206 | 8.153 | -| 225092 | 5543360 | 4 | 206 | 7.111 | -+--------+---------+----------+------------------+-------+ -5 rows in set (2.085 sec) -#+END_SRC - - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> SELECT T4.id,ProbeSet.Name,T4.StrainID,Probesetfreezeid,T4.value - -> FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) - -> left join ProbeSetData as T4 on T4.StrainId=4 and T4.Id = ProbeSetXRef.DataId - -> WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id - -> and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' - -> and ProbeSet.Id = ProbeSetXRef.ProbeSetId - -> order by ProbeSet.Id limit 5; -+----------+---------+----------+------------------+---------+ -| id | Name | StrainID | Probesetfreezeid | value | -+----------+---------+----------+------------------+---------+ -| 38574432 | 4331726 | 4 | 206 | 5.52895 | -| 39254882 | 5054239 | 4 | 206 | 6.29465 | -| 38867352 | 4642578 | 4 | 206 | 9.13706 | -| 38637053 | 4398221 | 4 | 206 | 6.77672 | -| 39715382 | 5543360 | 4 | 206 | 4.30016 | -+----------+---------+----------+------------------+---------+ -5 rows in set (0.001 sec) -#+END_SRC - -Now you can see the difference. - -#+BEGIN_SRC SQL -select ProbeSetData.id,ProbeSet.name,strainid,probesetfreezeid,value from (ProbeSet,ProbeSetData,ProbeSetFreeze,ProbeSetXRef) - where ProbeSetData.id=ProbeSetXRef.DataId - and (strainid=4 or strainid=5) - and ProbeSetXRef.ProbeSetFreezeId = 206 - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSet.name = '4331726' - limit 5; - -+----------+---------+----------+------------------+---------+ -| id | name | strainid | probesetfreezeid | value | -+----------+---------+----------+------------------+---------+ -| 38574432 | 4331726 | 4 | 206 | 5.52895 | -+----------+---------+----------+------------------+---------+ -#+END_SRC - -It worked! - -BUT IT IS SLOW for the full query. This is now due to this table being huge -and DataID distributed through the table (sigh!). - -#+BEGIN_SRC SQL -MariaDB [db_webqtl]> select count(*) from ProbeSetXRef; -+----------+ -| count(*) | -+----------+ -| 47713039 | -+----------+ -1 row in set (0.000 sec) -#+END_SRC - - -#+BEGIN_SRC SQL -select ProbeSetData.id,ProbeSet.name,strainid,probesetfreezeid,value from (ProbeSet,ProbeSetData,ProbeSetFreeze,ProbeSetXRef) - where - ProbeSetXRef.ProbeSetFreezeId = 206 - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetFreeze.Id = 206 - and ProbeSetData.id=ProbeSetXRef.DataId - and (strainid=4 or strainid=5) - limit 5; - -+----------+---------+----------+------------------+---------+ -| id | name | strainid | probesetfreezeid | value | -+----------+---------+----------+------------------+---------+ -| 38549183 | 4304920 | 4 | 206 | 4.97269 | -| 38549184 | 4304921 | 4 | 206 | 6.25133 | -| 38549185 | 4304922 | 4 | 206 | 6.03701 | -| 38549186 | 4304923 | 4 | 206 | 9.10316 | -| 38549187 | 4304925 | 4 | 206 | 8.90826 | -+----------+---------+----------+------------------+---------+ -5 rows in set (23.995 sec) - -select count(ProbeSetId) from ProbeSetXRef where ProbeSetFreezeId = 206; -+-------------------+ -| count(ProbeSetId) | -+-------------------+ -| 1236087 | -+-------------------+ - -select count(ProbeSet.name) from (ProbeSet,ProbeSetXRef) - where ProbeSetFreezeId = 206 - and ProbeSet.Id = ProbeSetXRef.ProbeSetId; -+----------------------+ -| count(ProbeSet.name) | -+----------------------+ -| 1236087 | -+----------------------+ -1 row in set (7.126 sec) - -select ProbeSet.name,ProbeSetData.id from (ProbeSet,ProbeSetXRef,ProbeSetData) - where ProbeSetFreezeId = 206 - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetData.id = ProbeSetXRef.DataId -; - -MariaDB [db_webqtl]> select count(ProbeSetData.id) from (ProbeSet,ProbeSetXRef,ProbeSetData) where ProbeSetFreezeId = 206 and ProbeSet.Id = ProbeSetXRef.ProbeSetId and ProbeSetData.id = ProbeSetXRef.DataId; -+------------------------+ -| count(ProbeSetData.id) | -+------------------------+ -| 114956091 | -+------------------------+ -1 row in set (35.836 sec) - -select count(*) from (ProbeSet,ProbeSetXRef,ProbeSetData) where ProbeSetFreezeId = 206 and ProbeSet.Id = ProbeSetXRef.ProbeSetId and ProbeSetData.id = ProbeSetXRef.DataId ; -+-----------+ -| count(*) | -+-----------+ -| 114956091 | -+-----------+ -1 row in set (35.392 sec) - - -select ProbeSet.name,ProbeSetData.value from (ProbeSet,ProbeSetXRef,ProbeSetData) - where ProbeSetFreezeId = 206 - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetData.id = ProbeSetXRef.DataId -; - -select count(value) from (ProbeSet,ProbeSetXRef,ProbeSetData) where ProbeSetFreezeId = 206 and ProbeSet.Id = ProbeSetXRef.ProbeSetId and ProbeSetData.id = ProbeSetXRef.DataId and strainid=5 ; -+--------------+ -| count(value) | -+--------------+ -| 1236087 | -+--------------+ -1 row in set (14.819 sec) - - -select count(value) from (ProbeSet,ProbeSetXRef,ProbeSetData) -where ProbeSetFreezeId = 206 - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetData.id = ProbeSetXRef.DataId - and ((strainid>=4 and strainid<=31) or strainid in (33,35,36,37,39,98,100,103)) -; - -+--------------+ -| count(value) | -+--------------+ -| 43263045 | -+--------------+ -1 row in set (1 min 40.498 sec) - -select name,strainid,value from (ProbeSet,ProbeSetXRef,ProbeSetData) -where ProbeSetFreezeId = 206 - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetData.id = ProbeSetXRef.DataId - and ((strainid>=4 and strainid<=31) or strainid in (33,35,36,37,39,98,100,103)) -limit 5; -+---------+----------+---------+ -| name | strainid | value | -+---------+----------+---------+ -| 4331726 | 4 | 5.52895 | -| 4331726 | 5 | 6.76158 | -| 4331726 | 6 | 6.06911 | -| 4331726 | 7 | 6.24858 | -| 4331726 | 8 | 6.36076 | -+---------+----------+---------+ - -select name,strainid,value from (ProbeSet,ProbeSetXRef,ProbeSetData) -where ProbeSetFreezeId = 206 - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - and ProbeSetData.id = ProbeSetXRef.DataId - and ((strainid>=4 and strainid<=31) or strainid in (33,35,36,37,39,98,100,103)) -; - -#+END_SRC - -The final query works in 2.2 minutes on both Penguin2 and Tux01. - -** Original query - -This is the original query generated by GN2 that takes 1 hour on -Penguin2 and 3 minutes on Tux01. Note it fetches all values for these 'traits' so essentially -traverses the full 53GB database table (and even larger index) for each of -them. - -#+BEGIN_SRC SQL -SELECT ProbeSet.Name,T4.value, T5.value, T6.value, T7.value, T8.value, T9.value, T10.value, T11.value, T12.value, T13.value, T14.value, T15.value, T16.value, T17.value, T18.value, T19.value, T20.value, T21.value, T22.value, T23.value, T24.value, T25.value, T26.value, T28.value, - T29.value, T30.value, T31.value, T33.value, T35.value, T36.value, T37.value, T39.value, - T98.value, T100.value, T103.value FROM (ProbeSet, ProbeSetXRef, ProbeSetFreeze) - left join ProbeSetData as T4 on T4.Id = ProbeSetXRef.DataId and T4.StrainId=4 - left join ProbeSetData as T5 on T5.Id = ProbeSetXRef.DataId and T5.StrainId=5 - left join ProbeSetData as T6 on T6.Id = ProbeSetXRef.DataId and T6.StrainId=6 - left join ProbeSetData as T7 on T7.Id = ProbeSetXRef.DataId and T7.StrainId=7 - left join ProbeSetData as T8 on T8.Id = ProbeSetXRef.DataId and T8.StrainId=8 - left join ProbeSetData as T9 on T9.Id = ProbeSetXRef.DataId and T9.StrainId=9 - left join ProbeSetData as T10 on T10.Id = ProbeSetXRef.DataId and T10.StrainId=10 - left join ProbeSetData as T11 on T11.Id = ProbeSetXRef.DataId and T11.StrainId=11 - left join ProbeSetData as T12 on T12.Id = ProbeSetXRef.DataId and T12.StrainId=12 - left join ProbeSetData as T13 on T13.Id = ProbeSetXRef.DataId and T13.StrainId=13 - left join ProbeSetData as T14 on T14.Id = ProbeSetXRef.DataId and T14.StrainId=14 - left join ProbeSetData as T15 on T15.Id = ProbeSetXRef.DataId and T15.StrainId=15 - left join ProbeSetData as T16 on T16.Id = ProbeSetXRef.DataId and T16.StrainId=16 - left join ProbeSetData as T17 on T17.Id = ProbeSetXRef.DataId and T17.StrainId=17 - left join ProbeSetData as T18 on T18.Id = ProbeSetXRef.DataId and T18.StrainId=18 - left join ProbeSetData as T19 on T19.Id = ProbeSetXRef.DataId and T19.StrainId=19 - left join ProbeSetData as T20 on T20.Id = ProbeSetXRef.DataId and T20.StrainId=20 - left join ProbeSetData as T21 on T21.Id = ProbeSetXRef.DataId and T21.StrainId=21 - left join ProbeSetData as T22 on T22.Id = ProbeSetXRef.DataId and T22.StrainId=22 - left join ProbeSetData as T23 on T23.Id = ProbeSetXRef.DataId and T23.StrainId=23 - left join ProbeSetData as T24 on T24.Id = ProbeSetXRef.DataId and T24.StrainId=24 - left join ProbeSetData as T25 on T25.Id = ProbeSetXRef.DataId and T25.StrainId=25 - left join ProbeSetData as T26 on T26.Id = ProbeSetXRef.DataId and T26.StrainId=26 - left join ProbeSetData as T28 on T28.Id = ProbeSetXRef.DataId and T28.StrainId=28 - left join ProbeSetData as T29 on T29.Id = ProbeSetXRef.DataId and T29.StrainId=29 - left join ProbeSetData as T30 on T30.Id = ProbeSetXRef.DataId and T30.StrainId=30 - left join ProbeSetData as T31 on T31.Id = ProbeSetXRef.DataId and T31.StrainId=31 - left join ProbeSetData as T33 on T33.Id = ProbeSetXRef.DataId and T33.StrainId=33 - left join ProbeSetData as T35 on T35.Id = ProbeSetXRef.DataId and T35.StrainId=35 - left join ProbeSetData as T36 on T36.Id = ProbeSetXRef.DataId and T36.StrainId=36 - left join ProbeSetData as T37 on T37.Id = ProbeSetXRef.DataId and T37.StrainId=37 - left join ProbeSetData as T39 on T39.Id = ProbeSetXRef.DataId and T39.StrainId=39 - left join ProbeSetData as T98 on T98.Id = ProbeSetXRef.DataId and T98.StrainId=98 - left join ProbeSetData as T100 on T100.Id = ProbeSetXRef.DataId and T100.StrainId=100 - left join ProbeSetData as T103 on T103.Id = ProbeSetXRef.DataId and T103.StrainId=103 - WHERE ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id - and ProbeSetFreeze.Name = 'UMUTAffyExon_0209_RMA' - and ProbeSet.Id = ProbeSetXRef.ProbeSetId - order by ProbeSet.Id -#+END_SRC |