本文主要研究下组合索引包含in条件(多个值),在单表查询,关联查询这两种sql查询结果在oracle和mysql里的区别。
oracle具有强大的优化器,一般来说,组合索引在oracle里不管是单表还是关联查询,都可以选择optimal的执行计划,只要统计信息等是准确的。
mysql的优化器相对来说,要弱不少,很多功能不够健全,单表对于组合索引包含in的处理貌似没有什么问题,但是join情况下,
包含in list组合索引的表是被驱动表,则问题比较严重,卖个关子,详细见本文mysql部分讲述。
1.oracle组合索引包含in list的执行计划研究
先上建表语句:
-
drop table t1;
-
drop table t2;
-
create table t1 as select * from dba_objects;
-
create table t2 as select * from dba_objects;
-
--建立组合索引
-
create index idx1_t1 on t1(object_name,owner);
-
create index idx1_t2 on t2(object_id,owner);
-
--收集统计信息
-
exec dbms_stats.gather_table_stats(ownname=>'dingjun123',tabname=>'t1',no_invalidate=>false);
-
exec dbms_stats.gather_table_stats(ownname=>'dingjun123',tabname=>'t2',no_invalidate=>false);
in (list)相当于多个or的组合,其实也是个范围条件,如果前导列是in条件,则排序键必须包含前导列才可能消除排序,排序列没有前导列,则必须前导列在where条件里是等值才可以消除排序。
但是in (list)又是个特殊的范围查询,因为可以转为or条件,这样与其他列组合,可以组合为多个等值条件,这样做索引扫描时,可以利用inlist iterator算子提高效率。
比如:
a=1 and b in (1,2) 可以转为 a=1 and (b=1 or b=2)则转为两个索引区间:
a=1 and b=1
a=1 and b=2
然后使用inlist iterator算子,两个列条件都可以参与index access。
a in (1,2) and b in (1,2)可以转为
(a=1 or a=2 ) and (b=1 or b=2),转为4个区间:
a=1 and b=1
a=1 and b=2
a=2 and b=1
a=2 and b=2
然后利用inlist iterator算子,都可以利用a,b条件index access。
oracle处理组合索引含有in条件时,对于in条件列是索引非前导列,也会考虑选择性:
1)如果前导列选择性很好,后面的列条件是in,可能index access只有前导列,in的作为index filter,这时候不用inlist iterator算子。
2)如果前导列选择性不好,则会调用inlist iterator算子,转为多个索引区间扫描,这样in条件也会index access。
如果in条件是索引前导列,因为索引前导列要参与index access,所以基本都会采用inlist iterator算子,转为多个索引区间,这样都可以参与index access。
1)索引idx1_t1(object_name,owner),对应的非前导列owner是in条件有多个值
这时候是否走inlist iterator要看前导列的选择性是不是很好,如果很好,index access
只需要前导列即可,非前导列in的index filter
如果前导列选择性不好,则会将in list转为or,然后合并前导列,扫描多个索引区间,
使用inlist iterator算子,则多个列都可以参与index access
sql和执行计划如下:
-
select * from t1 where t1.object_name='t1' and t1.owner in ('sys','dingjun123');
-
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 1182251260
-
-
---------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
---------------------------------------------------------------------------------------
-
| 0 | select statement | | 2 | 196 | 4 (0)| 00:00:01 |
-
| 1 | table access by index rowid| t1 | 2 | 196 | 4 (0)| 00:00:01 |
-
|* 2 | index range scan | idx1_t1 | 2 | | 3 (0)| 00:00:01 |
-
---------------------------------------------------------------------------------------
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
2 - access("t1"."object_name"='t1')
-
filter("t1"."owner"='dingjun123' or "t1"."owner"='sys')
可以从执行计划看到对于索引是(object_name,owner)的,owner非前导列用了in,没有参与index access,而是
index filter,原因是object_name选择性很好,等值查询选择性1/num_distinct=.000016423,不调用inlist iterator算子。
对应列基数和选择率如下:
-
select column_name,num_distinct,density from dba_tab_col_statistics where table_name='t1' and column_name in('owner','object_name');
-
-
column_name num_distinct density
-
-------------------- ------------ ----------
-
owner 27 .037037037
-
object_name 60892 .000016423
如果前导列object_name也是in,因为选择性好,这里也不用第二列,前导列in转为or,调用inlist iterator:
-
select * from t1 where t1.object_name in ('dba_objects','dba_tables') and t1.owner in ('sys','dingjun123');
-
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 1236450337
-
-
------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 4 | 528 | 5 (0)| 00:00:01 |
-
| 1 | inlist iterator | | | | | |
-
| 2 | table access by index rowid batched| t1 | 4 | 528 | 5 (0)| 00:00:01 |
-
|* 3 | index range scan | idx1_t1 | 1 | | 4 (0)| 00:00:01 |
-
------------------------------------------------------------------------------------------------
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
3 - access("t1"."object_name"='dba_objects' or "t1"."object_name"='dba_tables')
-
filter("t1"."owner"='dingjun123' or "t1"."owner"='sys')
-
statistics
-
----------------------------------------------------------
-
0 recursive calls
-
0 db block gets
-
9 consistent gets
-
0 physical reads
-
0 redo size
-
2864 bytes sent via sql*net to client
-
473 bytes received via sql*net from client
-
2 sql*net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
-
2 rows processed
如果都是等值的,则都参与index access:
-
select * from t1 where t1.object_name ='dba_objects' and t1.owner ='sys';
-
-
1 row selected.
-
-
elapsed: 00:00:00.01
-
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 2488322393
-
-
-----------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-----------------------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | 132 | 4 (0)| 00:00:01 |
-
| 1 | table access by index rowid batched| t1 | 1 | 132 | 4 (0)| 00:00:01 |
-
|* 2 | index range scan | idx1_t1 | 1 | | 3 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("t1"."object_name"='dba_objects' and "t1"."owner"='sys')
-
-
-
statistics
-
----------------------------------------------------------
-
2 recursive calls
-
0 db block gets
-
7 consistent gets
-
0 physical reads
-
0 redo size
-
2692 bytes sent via sql*net to client
-
435 bytes received via sql*net from client
-
2 sql*net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
-
1 rows processed
2)如果前导列的选择性不够好,组合的选择性才好,如果有in条件,则会转为索引扫描多个区间,
调用inlist iterator,这时候in条件可以参与index access
-
drop index idx1_t1;
-
--创建以owner为前导列的索引,owner选择性不好:
-
create index idx2_t1 on t1(owner,object_name);
sql如下:
-
select * from t1 where t1.object_name='dba_objects' and t1.owner in ('sys','dingjun123');
从执行计划可以看到,access里owner转为or然后 and object_name,两个列都可以index access,调用
inlist iterator算子:
-
elapsed: 00:00:00.00
-
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 3655239226
-
-
------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 2 | 264 | 5 (0)| 00:00:01 |
-
| 1 | inlist iterator | | | | | |
-
| 2 | table access by index rowid batched| t1 | 2 | 264 | 5 (0)| 00:00:01 |
-
|* 3 | index range scan | idx2_t1 | 2 | | 4 (0)| 00:00:01 |
-
------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
3 - access(("t1"."owner"='dingjun123' or "t1"."owner"='sys') and
-
"t1"."object_name"='dba_objects')
-
-
-
statistics
-
----------------------------------------------------------
-
2 recursive calls
-
0 db block gets
-
10 consistent gets
-
0 physical reads
-
0 redo size
-
2692 bytes sent via sql*net to client
-
451 bytes received via sql*net from client
-
2 sql*net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
-
1 rows processed
如果两个列都是in list,因为前导列owner选择性不好,组合选择性好,则会转为多个索引区间,这里索引区间对应2*2=4个,如下所示:
-
object_name='dba_objects' and owner='sys'
-
object_name='dba_objects' and owner='dingjun123'
-
object_name='dba_tables' and owner='sys'
-
object_name='dba_tables' and owner='dingjun123'
sql如下:
-
select * from t1 where t1.object_name in ('dba_objects','dba_tables') and t1.owner in ('sys','dingjun123');
从执行计划可以看到,可以看到都能参与index access,走inlist iterator:
-
elapsed: 00:00:00.00
-
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 3655239226
-
-
------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 4 | 528 | 8 (0)| 00:00:01 |
-
| 1 | inlist iterator | | | | | |
-
| 2 | table access by index rowid batched| t1 | 4 | 528 | 8 (0)| 00:00:01 |
-
|* 3 | index range scan | idx2_t1 | 4 | | 6 (0)| 00:00:01 |
-
------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
3 - access(("t1"."owner"='dingjun123' or "t1"."owner"='sys') and
-
("t1"."object_name"='dba_objects' or "t1"."object_name"='dba_tables'))
-
-
statistics
-
----------------------------------------------------------
-
2 recursive calls
-
0 db block gets
-
16 consistent gets
-
0 physical reads
-
0 redo size
-
2864 bytes sent via sql*net to client
-
469 bytes received via sql*net from client
-
2 sql*net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
-
2 rows processed
3)多表join情况下的inlist iterator
如果sql是join,t1表是被驱动的走nested loops:
先创建索引:让object_name在前,选择性好:
-
--第二个索引先删掉
-
drop index idx2_t1;
-
--创建object_name为前导列的索引:
-
create index idx1_t1 on t1(object_name,owner);
执行计划走nested loops,t1被驱动,t1表走idx1_t1索引,因为前导列object_name是等值条件,owner是in条件,相当于范围,
可以转为or。
在11g里这个id=6的访问谓词显示有问题,对应的owner in条件也是在index access中,单表查询时,11g里只有object_name,因为object_name
选择性好,join时都参与index access,这是算法上的问题,不过这个不影响性能。
和19c的谓词部分显示不一样,算法不一样,19c还是选择性选择性好的前导列访问:
11g都参与index access,有inlist iterator,object_name前导列是关联列,
sql和执行计划如下:
-
select *
-
from t1,t2
-
where t1.object_name = t2.object_name
-
and t1.owner in ('sys','dingjun123')
-
and t2.object_id <10;
-
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 3442654963
-
-
-----------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-----------------------------------------------------------------------------------------
-
| 0 | select statement | | 7 | 1372 | 38 (0)| 00:00:01 |
-
| 1 | nested loops | | | | | |
-
| 2 | nested loops | | 7 | 1372 | 38 (0)| 00:00:01 |
-
| 3 | table access by index rowid| t2 | 7 | 686 | 3 (0)| 00:00:01 |
-
|* 4 | index range scan | idx1_t2 | 7 | | 2 (0)| 00:00:01 |
-
| 5 | inlist iterator | | | | | |
-
|* 6 | index range scan | idx1_t1 | 2 | | 3 (0)| 00:00:01 |
-
| 7 | table access by index rowid | t1 | 1 | 98 | 5 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access("t2"."object_id"<10)
-
6 - access("t1"."object_name"="t2"."object_name" and
-
("t1"."owner"='dingjun123' or "t1"."owner"='sys'))
-
-
-
statistics
-
----------------------------------------------------------
-
1 recursive calls
-
0 db block gets
-
43 consistent gets
-
0 physical reads
-
0 redo size
-
3314 bytes sent via sql*net to client
-
520 bytes received via sql*net from client
-
2 sql*net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
-
8 rows processed
19c执行计划id=5显示的谓词对应owner in ('sys','dingjun123')变成filter("t1"."owner"='dingjun123' or "t1"."owner"='sys'),
19c没有inlist iterator
index access的列只有access("t1"."object_name"="t2"."object_name")
-
19c是比较准确的:
-
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 1792326996
-
-
-------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 8 | 2112 | 27 (0)| 00:00:01 |
-
| 1 | nested loops | | 8 | 2112 | 27 (0)| 00:00:01 |
-
| 2 | nested loops | | 8 | 2112 | 27 (0)| 00:00:01 |
-
| 3 | table access by index rowid batched| t2 | 8 | 1056 | 3 (0)| 00:00:01 |
-
|* 4 | index range scan | idx1_t2 | 8 | | 2 (0)| 00:00:01 |
-
|* 5 | index range scan | idx1_t1 | 1 | | 2 (0)| 00:00:01 |
-
| 6 | table access by index rowid | t1 | 1 | 132 | 3 (0)| 00:00:01 |
-
-------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access("t2"."object_id"<10)
-
5 - access("t1"."object_name"="t2"."object_name")
-
filter("t1"."owner"='dingjun123' or "t1"."owner"='sys')
如果19c里使用optimizer_features_enable('11.2.0.4'),则会显示和11g一样的计划,都走index access,有inlist iterator,
这么看对于in非前导列的join被驱动走nested loops的执行计划,有改进,感觉19c才是对的,因为使用inlist的cost更大,因为前导列选择性足够好:
-
select/*optimizer_features_enable('11.2.0.4')*/ *
-
from t1,t2
-
where t1.object_name = t2.object_name
-
and t1.owner in ('sys','dingjun123')
-
and t2.object_id <10;
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 3442654963
-
-
-----------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-----------------------------------------------------------------------------------------
-
| 0 | select statement | | 8 | 2112 | 43 (0)| 00:00:01 |
-
| 1 | nested loops | | | | | |
-
| 2 | nested loops | | 8 | 2112 | 43 (0)| 00:00:01 |
-
| 3 | table access by index rowid| t2 | 8 | 1056 | 3 (0)| 00:00:01 |
-
|* 4 | index range scan | idx1_t2 | 8 | | 2 (0)| 00:00:01 |
-
| 5 | inlist iterator | | | | | |
-
|* 6 | index range scan | idx1_t1 | 2 | | 3 (0)| 00:00:01 |
-
| 7 | table access by index rowid | t1 | 1 | 132 | 5 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access("t2"."object_id"<10)
-
6 - access("t1"."object_name"="t2"."object_name" and
-
("t1"."owner"='dingjun123' or "t1"."owner"='sys'))
-
-
-
statistics
-
----------------------------------------------------------
-
2 recursive calls
-
0 db block gets
-
45 consistent gets
-
0 physical reads
-
0 redo size
-
5598 bytes sent via sql*net to client
-
519 bytes received via sql*net from client
-
2 sql*net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
-
8 rows processed
下面删除idx1_t1,创建owner索引:
drop index idx1_t1;
--创建以owner为前导列的索引,owner选择性不好,且不是关联列,看能否全部列走索引。
drop index idx2_t1;
###这里创建三个列索引,将选择性好的放后面:
create index idx2_t1 on t1(owner,object_type,object_name);
sql如下:
-
select *
-
from t1,t2
-
where t1.object_name = t2.object_name
-
and t1.owner in ('sys','dingjun123')
-
and t1.object_type in('table','view')
-
and t2.object_id <10;
从执行计划看,使用inlist iterator,和单表测试一样:
这里相当于每一行t2传递给t1,然后扫描4个索引区间,如果将选择性好的object_name放前面则不需要很多区间,
比如索引(owner,object_name,object_type),则object_type in list可能不参与index access,而走index filter:
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 1219290223
-
-
-------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 8 | 2112 | 59 (0)| 00:00:01 |
-
| 1 | nested loops | | 8 | 2112 | 59 (0)| 00:00:01 |
-
| 2 | nested loops | | 32 | 2112 | 59 (0)| 00:00:01 |
-
| 3 | table access by index rowid batched| t2 | 8 | 1056 | 3 (0)| 00:00:01 |
-
|* 4 | index range scan | idx1_t2 | 8 | | 2 (0)| 00:00:01 |
-
| 5 | inlist iterator | | | | | |
-
|* 6 | index range scan | idx2_t1 | 4 | | 5 (0)| 00:00:01 |
-
| 7 | table access by index rowid | t1 | 1 | 132 | 7 (0)| 00:00:01 |
-
-------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access("t2"."object_id"<10)
-
6 - access(("t1"."owner"='dingjun123' or "t1"."owner"='sys') and
-
("t1"."object_type"='table' or "t1"."object_type"='view') and
-
"t1"."object_name"="t2"."object_name")
4)组合索引前导列是in多个值能否消除排序,取决于排序列是否有前导列
-
drop index idx1_t1;
-
drop index idx2_t1;
-
create index idx2_t1 on t1(owner,object_name);
-
select * from t1 where t1.object_name in ('dba_objects','dba_tables') and t1.owner in ('sys','dingjun123')
-
order by owner desc,object_name desc;
因为排序列按照组合索引顺序,且access两个列都用到,排序方向一致,可以消除排序:
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 3833075694
-
-
-----------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-----------------------------------------------------------------------------------------
-
| 0 | select statement | | 4 | 528 | 8 (0)| 00:00:01 |
-
| 1 | inlist iterator | | | | | |
-
| 2 | table access by index rowid | t1 | 4 | 528 | 8 (0)| 00:00:01 |
-
|* 3 | index range scan descending| idx2_t1 | 4 | | 6 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
3 - access(("t1"."owner"='dingjun123' or "t1"."owner"='sys') and
-
("t1"."object_name"='dba_objects' or "t1"."object_name"='dba_tables'))
排序列无前导列,只有第二个列,则必须前导列是等值条件才可以消除排序,这里前导列是in有2个分支,可以看到没有消除排序,
执行计划有:sort order by
sql和执行计划如下:
-
select * from t1 where t1.object_name in ('dba_objects','dba_tables') and t1.owner in ('sys','dingjun123')
-
order by object_name;
-
execution plan
-
----------------------------------------------------------
-
plan hash value: 4272791971
-
-
-------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 4 | 528 | 9 (12)| 00:00:01 |
-
| 1 | sort order by | | 4 | 528 | 9 (12)| 00:00:01 |
-
| 2 | inlist iterator | | | | | |
-
| 3 | table access by index rowid batched| t1 | 4 | 528 | 8 (0)| 00:00:01 |
-
|* 4 | index range scan | idx2_t1 | 4 | | 6 (0)| 00:00:01 |
-
-------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access(("t1"."owner"='dingjun123' or "t1"."owner"='sys') and
-
("t1"."object_name"='dba_objects' or "t1"."object_name"='dba_tables'))
总结:在oracle里组合索引inlist,不管是单表还是关联,不管多表join的inlist列是否是关联列,都可以充分利用索引扫描,有时会将非前导列放到filter里,也是因为前导列选择性好,不会影响效率。
对于排序来说,因为in相当于范围,如果where是in条件的前导列,排序里没有前导列,则不能消除排序,如果排序列包含此前导列,且索引列方向一致,也可以消除排序。
在oracle里组合索引in在单表、多表join、排序里没有什么问题,遵循索引的leftmost prefix访问规则,inlist组合不管是否是前导列,是否单表,多表join,都能转为or,与其他索引列组合转为多个索引扫描区间,使用inlist iterator算子实现走各区间索引扫描提高效率。
下一篇:
oracle vs mysql 对组合索引包含in list执行计划研究(mysql部分)_part2
阅读(2138) | 评论(0) | 转发(0) |