本部分主要研究默认选择率null数目影响、案例以及默认选择率系列总结。
4.默认选择率还要考虑null数目影响
对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows
其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters
对于列有统计信息的,选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。
对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls
很多的列,更加不准确。
也即有统计信息的默认选择率如果考虑null影响,要*adjust_sel.
比如subobject_name null较多:
-
select num_rows,sample_size from dba_tables where table_name='sel_test';
-
-
num_rows sample_size
-
---------- -----------
-
72800 72800
-
-
-
select column_name,num_distinct,density,num_nulls,histogram
-
from dba_tab_col_statistics
-
where table_name='sel_test'
-
and column_name='subobject_name';
-
column_name num_distinct density num_nulls histogram
-
-------------------- ------------ ---------- ---------- ------------------------------
-
subobject_name 564 .00177305 71801 none
则adjust_sel=(num_rows-num_nulls)/num_rows
=(72800-71801)/72800
=0.013722527
建立索引如下:
-
create index idx2_sel_test on sel_test(subobject_name);
like范围有索引走索引默认ix_sel=0.009,ix_sel_with_filters=0.05
-
explain plan for
-
select * from sel_test
-
where subobject_name like :var;
-
-
select * from table(dbms_xplan.display);
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 2441733028
-
-
-----------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-----------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 50 | 7500 | 5 (0)| 00:00:01 |
-
| 1 | table access by index rowid batched| sel_test | 50 | 7500 | 5 (0)| 00:00:01 |
-
|* 2 | index range scan | idx2_sel_test | 9 | | 2 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("subobject_name" like :var)
-
filter("subobject_name" like :var)
因为sub_object_name的null较多,还要考虑null影响选择率,所以ix_sel=
0.009*0.013722527,
所以索引扫描行数=round(72800*0.009*0.013722527)=9,
回表行数=round(72800*0.05*0.013722527)=50
5.示例分析
下面用join例子说简单分析:
删除扩展统计信息:
-
exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(substr("object_name",1,1))');
-
exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(substr("owner",1,3))');
-
exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(substr("object_name",1,1),substr("owner",1,3))');
-
explain plan for
-
select * from sel_test a,sel2 b
-
where substr(a.object_name,1,1) = 'd' and substr(a.owner,1,3)='sys'
-
and a.object_id=b.object_id;
-
select * from table(dbms_xplan.display(null,null,'adaptive'));
执行计划如下,因为是19c,这里是adaptive plan,很显然使用adaptive plan也是不准的,无法收集函数条件统计信息,走nested loops,
没有走hash join:
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 2436822295
-
-
--------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
--------------------------------------------------------------------------------------------
-
| 0 | select statement | | 7 | 1974 | 410 (1)| 00:00:01 |
-
|- * 1 | hash join | | 7 | 1974 | 410 (1)| 00:00:01 |
-
| 2 | nested loops | | 7 | 1974 | 410 (1)| 00:00:01 |
-
| 3 | nested loops | | 7 | 1974 | 410 (1)| 00:00:01 |
-
|- 4 | statistics collector | | | | | |
-
| * 5 | table access full | sel_test | 7 | 1050 | 396 (1)| 00:00:01 |
-
| * 6 | index range scan | idx_sel2 | 1 | | 1 (0)| 00:00:01 |
-
| 7 | table access by index rowid| sel2 | 1 | 132 | 2 (0)| 00:00:01 |
-
|- 8 | table access full | sel2 | 1 | 132 | 2 (0)| 00:00:01 |
-
--------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - access("a"."object_id"="b"."object_id")
-
5 - filter(substr("a"."object_name",1,1)='d' and substr("a"."owner",1,3)='sys')
-
6 - access("a"."object_id"="b"."object_id")
-
-
note
-
-----
-
- this is an adaptive plan (rows marked '-' are inactive)
如果使用dynamic_sampling(5),则执行计划走hash join:
-
explain plan for
-
select/*dynamic_sampling(a 5)*/ * from sel_test a,sel2 b
-
where substr(a.object_name,1,1) = 'd' and substr(a.owner,1,3)='sys'
-
and a.object_id=b.object_id;
-
select * from table(dbms_xplan.display(null,null,'adaptive'));
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 2990091768
-
-
--------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
--------------------------------------------------------------------------------------------
-
| 0 | select statement | | 2474 | 681k| 794 (2)| 00:00:01 |
-
| * 1 | hash join | | 2474 | 681k| 794 (2)| 00:00:01 |
-
|- 2 | nested loops | | 2474 | 681k| 794 (2)| 00:00:01 |
-
|- 3 | nested loops | | | | | |
-
|- 4 | statistics collector | | | | | |
-
| * 5 | table access full | sel_test | 2474 | 362k| 396 (1)| 00:00:01 |
-
|- * 6 | index range scan | idx_sel2 | | | | |
-
|- 7 | table access by index rowid| sel2 | 1 | 132 | 397 (2)| 00:00:01 |
-
| 8 | table access full | sel2 | 72802 | 9384k| 397 (2)| 00:00:01 |
-
--------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - access("a"."object_id"="b"."object_id")
-
5 - filter(substr("a"."object_name",1,1)='d' and substr("a"."owner",1,3)='sys')
-
6 - access("a"."object_id"="b"."object_id")
-
-
note
-
-----
-
- dynamic statistics used: dynamic sampling (level=2)
-
- this is an adaptive plan (rows marked '-' are inactive)
这里的hints使用:dynamic_sampling(a 5),只对a表动态采样5级别,但是note:dynamic statistics used: dynamic sampling (level=2)
将substr("a"."object_name",1,1)='d' and substr("a"."owner",1,3)='sys'估算为2474行。
使用dynamic_sampling(a 5),指定不准的表动态采样即可。
如果改成dynamic_sampling(5)则note:dynamic statistics used: dynamic sampling (level=5),这种是全部表采样,不好:
-
explain plan for
-
select/*dynamic_sampling(5)*/ * from sel_test a,sel2 b
-
where substr(a.object_name,1,1) = 'd' and substr(a.owner,1,3)='sys'
-
and a.object_id=b.object_id;
-
select * from table(dbms_xplan.display(null,null,'adaptive'));
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 2990091768
-
-
--------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
--------------------------------------------------------------------------------------------
-
| 0 | select statement | | 3005 | 827k| 794 (2)| 00:00:01 |
-
| * 1 | hash join | | 3005 | 827k| 794 (2)| 00:00:01 |
-
|- 2 | nested loops | | 3005 | 827k| 794 (2)| 00:00:01 |
-
|- 3 | nested loops | | | | | |
-
|- 4 | statistics collector | | | | | |
-
| * 5 | table access full | sel_test | 3005 | 440k| 396 (1)| 00:00:01 |
-
|- * 6 | index range scan | idx_sel2 | | | | |
-
|- 7 | table access by index rowid| sel2 | 1 | 132 | 397 (2)| 00:00:01 |
-
| 8 | table access full | sel2 | 72802 | 9384k| 397 (2)| 00:00:01 |
-
--------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - access("a"."object_id"="b"."object_id")
-
5 - filter(substr("a"."object_name",1,1)='d' and substr("a"."owner",1,3)='sys')
-
6 - access("a"."object_id"="b"."object_id")
-
-
note
-
-----
-
- dynamic statistics used: dynamic sampling (level=5)
-
- this is an adaptive plan (rows marked '-' are inactive)
可以看到note:dynamic statistics used: dynamic sampling (level=5)
sel_test估算为3005行。
下面使用扩展统计信息:
收集扩展统计信息如下:
-
--收集单列扩展
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',-
-
method_opt=>'for columns size auto (substr(object_name,1,1)),(substr(owner,1,3)),',no_invalidate=>false);
-
--收集组合扩展
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',-
-
method_opt=>'for columns size auto (substr("object_name",1,1),substr("owner",1,3))',no_invalidate=>false);
-
先创建扩展列,然后在收集全表的,更准:
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',no_invalidate=>false);
-
explain plan for
-
select * from sel_test a,sel2 b
-
where substr(a.object_name,1,1) = 'd' and substr(a.owner,1,3)='sys'
-
and a.object_id=b.object_id;
-
select * from table(dbms_xplan.display(null,null,'adaptive'));
可以看到执行计划从nested loops变为走hash join,sel_test对函数组合条件估算为2631行,准确。
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 2990091768
-
-
--------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
--------------------------------------------------------------------------------------------
-
| 0 | select statement | | 2631 | 724k| 794 (2)| 00:00:01 |
-
| * 1 | hash join | | 2631 | 724k| 794 (2)| 00:00:01 |
-
|- 2 | nested loops | | 2631 | 724k| 794 (2)| 00:00:01 |
-
|- 3 | nested loops | | | | | |
-
|- 4 | statistics collector | | | | | |
-
| * 5 | table access full | sel_test | 2631 | 385k| 396 (1)| 00:00:01 |
-
|- * 6 | index range scan | idx_sel2 | | | | |
-
|- 7 | table access by index rowid| sel2 | 1 | 132 | 397 (2)| 00:00:01 |
-
| 8 | table access full | sel2 | 72802 | 9384k| 397 (2)| 00:00:01 |
-
--------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - access("a"."object_id"="b"."object_id")
-
5 - filter(substr("object_name",1,1)='d' and substr("owner",1,3)='sys')
-
6 - access("a"."object_id"="b"."object_id")
-
-
note
-
-----
-
- this is an adaptive plan (rows marked '-' are inactive)
6.oracle cbo默认选择率总结
在oracle里,如果sql里的表无法获得需要的信息来计算选择率,可能就会使用默认选择率,常发生在以下几种情况下:
1)表有统计信息,列未收集统计信息,无法使用动态采样
2)表有统计信息,但是使用绑定变量,绑定变量无peeking
3) 表有统计信息,但是对列使用了函数等复杂条件,无法准确估算对应条件的选择率
默认选择率可能很多时候是不准的,会导致执行计划非optimal,这样需要针对不准情况对应修复:
针对1)情况,就是更新统计信息,补全没有收集或不准的统计信息。
针对第二种情况,为了避免修改_optim_peek_user_binds参数造成的全局影响,可以针对单个sql开启_optim_peek_user_binds,
使用sql profile绑定。
针对第三种情况,因为复杂条件无法计算准确选择率,利用使用扩展列统计信息、高级别动态采样、cardinality/opt_esimate hints、
sql profile等绑定执行计划。
另外默认选择率总结如下:
对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows
其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters
对于列有统计信息的,final选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。
对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls很多的列,更加不准确。
也即有统计信息的默认选择率如果考虑null影响,考虑adjust_sel,其final选择率=default_sel * adjust_sel
有统计信息默认选择率如下:
1)列等值条件:sel = max(1/num_distinct,density)*adjust_sel,不管是否走索引ix_sel和ix_sel_with_filters一样.
2) 列<>条件: sel = 1-sel(等值)= 1 - max(1/num_distinct,density)*adjust_sel ,其中ix_sel,ix_sel_with_filters一样
3) 索引列范围条件:(包括like) ix_sel = 0.009,其中ix_sel_with_filters=0.05,这样可能导致回表后的cardinality估算行数大于索引扫描行数
4) 索引列绑定变量范围条件: sel = 0.009 (无peeking),其中ix_sel = 0.009,ix_sel_with_filters=0.05
5)索引列多个范围and条件 ix_sel = 0.009/2 = 0.0045,其中ix_sel_with_filters还是按照(sel_1)*(sel_2)=0.05*0.05
6) 索引列or条件,如果or扩展走索引,按照单个条件计算 范围是ix_sel=0.009,等值是1/num_distinct.
回表ix_sel_with_filters=(sel_1)*(sel_2)-(sel_1*sel_2)=0.05*0.05-0.05*0.05,按照or数目计算。
7)函数条件等值 sel = 0.01
8) 函数条件范围 sel = 0.05
9) 非索引列范围 sel = 0.05
10) 回表选择率 sel = ix_sel_with_filters * condition_filters,如果回表没有额外条件=ix_sel_with_filters
回表除了等值的,非函数等值=max(1/num_distinct,density)*adjust_sel,函数等值=0.01
,其他均为0.05
无统计信息默认选择率如下:
无统计信息的不用考虑adjest_sel,如果null数目较多,对cbo来说也不可知,所以null数目多的时候,可能影响大。
1)等值列条件无统计信息选择率,列有索引等值条件 ix_sel=0.004,ix_sel_with_filters=0.01
2)等值列条件无统计信息,列无索引列默认选择率sel=0.01,包括绑定变量无具体值,函数条件的等值查询,不管有没有统计信息都是1%。
3) 对于非索引列列无统计信息的范围查询,和有统计信息无具体绑定变量值一样,非索引列(包括函数)是5%
4)列无统计信息,范围条件走索引的选择率是ix_sel=0.009,ix_sel_with_filters=0.05,和索引列有统计信息,使用绑定变量无具体值的范围查询一样。
5) 列无统计信息,<>条件选择率是5%,和列有统计信息按照1-sel(等值)的不一样。
oracle的默认选择率一般很小:
比如索引列density,0.009,0.004几种,有索引的偏向于走索引。
对于不走索引或回表的其他选择率:等值的0.01,其他0.05。
mysql对于列无索引,无直方图的,也会采用默认选择率,大部分使用
access path selection in a relational database management system论文里的选择率,
一般比较大,
比如:
等值是10%,
>,<等是33%,
like,between是11%,
<>是0.9
比oracle默认选择率大很多,
mysql默认选择率问题较多,
可以建索引或收集直方图解决,mysql没有动态采样,没有扩展统计信息,
没有cardinality hints。
本系列结束。