接:
mysql subquery cache vs oracle subquery cache子查询缓存_mysql部分_part2
下面研究下oracle里的subquery cache.
6.oracle具有全面的子查询cache优化功能
oracle同样的查询,不管是in还是exists,对于子查询不能unnest(这里使用no_unnest hints),都有subquery cache:
oracle里常见的unnest subquery的是标量子查询,update关联子查询等走filter操作的子查询,对应执行计划子查询部分
有系统绑定变量,比如:b1,有缓存优化,如果外层关联key基数小,则子查询执行次数少。
oracle建表语句:
-
create table subq_t1(a int,b int);
-
insert into subq_t1(a,b)
-
select 1,1
-
from dual
-
connect by level<=100000;
-
-
insert into subq_t1(a,b)
-
select 2,1level
-
from dual
-
connect by level<=100000;
-
-
insert into subq_t1(a,b)
-
select 3,1level
-
from dual
-
connect by level<=100000;
-
-
-
create table subq_t2(a int,b int);
-
insert into subq_t2(a,b)
-
select level,level
-
from dual
-
connect by level<=100000;
-
-
create index idx_subq_t2 on subq_t2(a);
-
-
subq_t1表的a只有3个值:
-
select count(*),count(distinct a) from subq_t1;
-
-
count(*) count(distincta)
-
---------- ----------------
-
300000 3
-
-
subq_t2的a都是不重复的:
-
select count(*),count(distinct a) from subq_t2;
-
-
count(*) count(distincta)
-
---------- ----------------
-
100000 100000
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
select/*gather_plan_statistics*/ *
-
from subq_t1 t1
-
where t1.a in
-
(select/*no_unnest*/ t2.a from subq_t2 t2);
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('0hrqfk46ss2ns',null,'allstats last'));
可以看到走filter,虽然subq_t1有30w行,但是子查询只执行3次(实际执行次数依赖subq_t1.a的相同值连续情况)
执行计划id=3对于谓词有系统绑定变量: 3 - access("t2"."a"=:b1)
-
plan hash value: 2995694926
-
-
--------------------------------------------------------------------------------------------
-
| id | operation | name | starts | e-rows | a-rows | a-time | buffers |
-
--------------------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | | 300k|00:00:00.17 | 20535 |
-
|* 1 | filter | | 1 | | 300k|00:00:00.17 | 20535 |
-
| 2 | table access full| subq_t1 | 1 | 327k| 300k|00:00:00.11 | 20529 |
-
|* 3 | index range scan | idx_subq_t2 | 3 | 1098 | 3 |00:00:00.01 | 6 |
-
--------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter( is not null)
-
3 - access("t2"."a"=:b1)
改为exists和in一样,因为在oracle里in/exists基本都可以相互转换:
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
select/*gather_plan_statistics*/ *
-
from subq_t1 t1
-
where exists
-
(select/*no_unnest*/ 1 from subq_t2 t2 where t1.a = t2.a);
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('f8uyx3hynta6x',null,'allstats last'));
-
-
plan_table_output
-
----------------------------------------------------------------------------------------------------
-
sql_id f8uyx3hynta6x, child number 0
-
-------------------------------------
-
select/*gather_plan_statistics*/ * from subq_t1 t1 where exists
-
(select/*no_unnest*/ 1 from subq_t2 t2 where t1.a = t2.a)
-
-
plan hash value: 2995694926
-
-
--------------------------------------------------------------------------------------------
-
| id | operation | name | starts | e-rows | a-rows | a-time | buffers |
-
--------------------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | | 300k|00:00:00.16 | 20535 |
-
|* 1 | filter | | 1 | | 300k|00:00:00.16 | 20535 |
-
| 2 | table access full| subq_t1 | 1 | 327k| 300k|00:00:00.09 | 20529 |
-
|* 3 | index range scan | idx_subq_t2 | 3 | 1098 | 3 |00:00:00.01 | 6 |
-
--------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter( is not null)
-
3 - access("t2"."a"=:b1)
oracle标量子查询,也有subquery cache优化:
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
select/*gather_plan_statistics*/ t1.a,(select t2.b from subq_t2 t2 where t1.a = t2.a) b
-
from subq_t1 t1;
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('cwvtbwcaytnz5',null,'allstats last'));
可以看到id=1和id=2只执行3次:
-
plan hash value: 438047693
-
-
-------------------------------------------------------------------------------------------------------------
-
| id | operation | name | starts | e-rows | a-rows | a-time | buffers |
-
-------------------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | | 300k|00:00:00.09 | 20529 |
-
| 1 | table access by index rowid batched| subq_t2 | 3 | 1098 | 3 |00:00:00.01 | 9 |
-
|* 2 | index range scan | idx_subq_t2 | 3 | 439 | 3 |00:00:00.01 | 6 |
-
| 3 | table access full | subq_t1 | 1 | 327k| 300k|00:00:00.09 | 20529 |
-
-------------------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("t2"."a"=:b1)
甚至在oracle里semi join/anti join走nested loops的(普通nested loops没有缓存优化),都有缓存优化,可以减少被驱动表回表次数,比如:
1.semi join走nested loops有缓存优化
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
-
--去掉no_unnest
-
select/*gather_plan_statistics leading(t1) use_nl(t2@subq) */ *
-
from subq_t1 t1
-
where t1.a in
-
(select/*qb_name(subq)*/ t2.a from subq_t2 t2);
nested loops semi,子查询只执行3次:
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('cacaags9gcy2a',null,'allstats last'));
-
plan hash value: 3871264342
-
-
--------------------------------------------------------------------------------------------
-
| id | operation | name | starts | e-rows | a-rows | a-time | buffers |
-
--------------------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | | 300k|00:00:00.18 | 20535 |
-
| 1 | nested loops semi | | 1 | 327k| 300k|00:00:00.18 | 20535 |
-
| 2 | table access full| subq_t1 | 1 | 327k| 300k|00:00:00.13 | 20529 |
-
|* 3 | index range scan | idx_subq_t2 | 3 | 109k| 3 |00:00:00.01 | 6 |
-
--------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
3 - access("t1"."a"="t2"."a")
2.anti join同样有subquery cache
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
-
--去掉no_unnest
-
select/*gather_plan_statistics leading(t1) use_nl(t2@subq) */ *
-
from subq_t1 t1
-
where t1.a not in
-
(select/*qb_name(subq)*/ t2.a from subq_t2 t2 where t2.a is not null) and t1.a is not null;
-
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('fs848pak6turs',null,'allstats last'));
因为有缓存,not in的执行效率非常高,没有找到数据:
-
plan hash value: 3776200174
-
-
--------------------------------------------------------------------------------------------
-
| id | operation | name | starts | e-rows | a-rows | a-time | buffers |
-
--------------------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | | 0 |00:00:00.05 | 558 |
-
| 1 | nested loops anti | | 1 | 327k| 0 |00:00:00.05 | 558 |
-
|* 2 | table access full| subq_t1 | 1 | 327k| 300k|00:00:00.06 | 553 |
-
|* 3 | index range scan | idx_subq_t2 | 3 | 1 | 3 |00:00:00.01 | 5 |
-
--------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
2 - filter("t1"."a" is not null)
-
3 - access("t1"."a"="t2"."a")
-
filter("t2"."a" is not null)
3.普通的nested loops join没有缓存优化,但是有批处理
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
select/*gather_plan_statistics leading(t1) use_nl(t2)*/ *
-
from subq_t1 t1,subq_t2 t2
-
where t1.a = t2.a;
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('2qa1g6az4hmk8',null,'allstats last'));
两个nested loops就是bulk获取rowid的优化:
-
plan hash value: 2089944506
-
-
------------------------------------------------------------------------------------------------------
-
| id | operation | name | starts | e-rows | a-rows | a-time | buffers |
-
------------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | | 300k|00:00:00.70 | 80537 |
-
| 1 | nested loops | | 1 | 327k| 300k|00:00:00.70 | 80537 |
-
| 2 | nested loops | | 1 | 327k| 300k|00:00:00.47 | 60536 |
-
| 3 | table access full | subq_t1 | 1 | 327k| 300k|00:00:00.06 | 20529 |
-
|* 4 | index range scan | idx_subq_t2 | 300k| 1 | 300k|00:00:00.25 | 40007 |
-
| 5 | table access by index rowid| subq_t2 | 300k| 1 | 300k|00:00:00.14 | 20001 |
-
------------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access("t1"."a"="t2"."a")
在oracle里,subquery cache优化比较全面,不能unnest走filter的有cache优化,可以减少执行次数,semi join/anti join同样有cache优化,普通nested loops join没有cache优化但是有批量获取rowid回表的优化。
总结:
1.mysql的子查询subquery cache,仅限于无法unnest subquery,走select_type:subquery,也就是物化方式,如果外部表与子查询关联的key重复值特别多,
则可以有效减少子查询执行次数,具体减少多少,要看这个key存储的连续程度。
至于dependent subquery是mysql子查询兜底招数,虽然改写和执行计划里对外部传递的列有,但是不能减少子查询执行次数,效果比较差。
另外mysql的标量子查询只能走dependent subquery,无subquery cache优化,效率不好。
mariadb是做了subquery cache优化的:见
mysql对于不能unnest的,如果外层表有过滤条件,如果不能走索引,则条件写在子查询前还是后对驱动子查询次数有影响,
能走索引先走索引过滤无影响。
不走索引的,写在前先过滤后然后驱动子查询,写在后则是先驱动子查询后过滤外层条件,可能执行次数会多很多。
2.oracle对无法unnest的子查询、标量子查询、update关联子查询是有缓存结果优化的,一般most缓存数目在1024个key左右,
如果outer table关联key的基数(distinct数目)很小,则可以大幅度减少子查询执行次数,具体也要看数据物理分布连续性是否强,
连续性强,减少的子查询次数越少,如果很离散存储,则子查询次数会增多。详细见:
filter operation中子查询的执行次数问题分析(http://blog.chinaunix.net/uid-7655508-id-5875810.html)。
另外oracle对能够unnest的semi/anti join,走nested loops的也有缓存优化,普通join走nested loops没有缓存优化,但是有批处理优化。
oracle这方面优化做的比较全面。
3.mysql子查询优化还是不够强大,所以在mysql里让子查询能够subquery unnest较好,这样可以转为join,semi join,anti join优化,避免unnest subquery只有2种方法,
完全依赖于外层结果驱动子查询,缓存优化也只有materialization才有,标量子查询在mysql里要慎用,和o不同的是,它没有缓存优化。
在mysql里经常会使用materialization来优化子查询,因为它会按照关联key剔除重复行并且创建索引,一定程度上能够弥补不能缓存的差距。
本系列结束.