本文主要研究默认选择率,为什么要有默认选择率,因为没有存储计算对应条件选择率需要的数据,所以只能用默认的。
在oracle里,如果sql里的表无法获得需要的信息来计算选择率,可能就会使用默认选择率,常发生在以下几种情况下:
1)表有统计信息,列未收集统计信息,无法使用动态采样
2)表有统计信息,但是使用绑定变量,绑定变量无peeking
3) 表有统计信息,但是对列使用了函数等复杂条件,无法准确估算对应条件的选择率
针对1),就是更新统计信息,补全没有收集或不准的统计信息。
针对第二种情况,为了避免修改_optim_peek_user_binds参数造成的全局影响,可以针对单个sql开启_optim_peek_user_binds,使用sql profile绑定。
针对第三种情况,因为复杂条件无法计算准确选择率,利用使用扩展列统计信息、高级别动态采样、cardinality/opt_esimate hints、
sql profile等绑定执行计划。
1.oracle cbo 默认选择率
本节主要分析什么时候cbo使用默认选择率,各种场景下的默认选择率。
建表语句如下:
-
drop table sel_test;
-
create table sel_test as select * from dba_objects;
-
create index idx_sel_test on sel_test(object_name);
因为ctas会自动收集统计信息,这里不需要再收集(需要直方图的要收集)。
统计信息如下:
-
select num_rows,sample_size,last_analyzed
-
from dba_tables where table_name='sel_test';
-
num_rows sample_size last_analyze
-
---------- ----------- ------------
-
72800 72800 15-may-24
-
-
select column_name,num_distinct,num_nulls,density,histogram
-
from dba_tab_col_statistics
-
where table_name='sel_test';
-
column_name num_distinct num_nulls density histogram
-
-------------------- ------------ ---------- ---------- ------------------------------
-
temporary 2 0 .5 none
-
generated 2 0 .5 none
-
secondary 1 0 1 none
-
namespace 21 2 .047619048 none
-
edition_name 0 72800 0 none
-
sharing 4 0 .25 none
-
editionable 2 47685 .5 none
-
oracle_maintained 2 0 .5 none
-
application 1 0 1 none
-
default_collation 1 58503 1 none
-
duplicated 1 0 1 none
-
sharded 1 0 1 none
-
created_appid 0 72800 0 none
-
created_vsnid 0 72800 0 none
-
modified_appid 0 72800 0 none
-
modified_vsnid 0 72800 0 none
-
owner 27 0 .037037037 none
-
object_name 60896 0 .000016421 none
-
subobject_name 564 71801 .00177305 none
-
object_id 72798 2 .000013737 none
-
data_object_id 6252 66511 .000159949 none
-
object_type 45 0 .022222222 none
-
created 1479 0 .000676133 none
-
last_ddl_time 1634 2 .000611995 none
-
timestamp 1640 2 .000609756 none
-
status 1 0 1 none
注意点:
对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows
其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters
对于列有统计信息的,选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。
对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls很多的列,更加不准确。
注意:以下示例使用object_name的因为null行数较少,忽略其影响,
使用subobject_name null行数多,考虑adjust_sel,选择率=default_sel * adjust_sel
1.列有统计信息情况下的选择率计算
可以用explain测试,explain不会探测绑定变量值,用的是默认选择率
注意null数目多的,还要考虑adjuest_sel。
选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
1)列等值条件选择率计算(索引列,非索引列都一样,前提是有统计信息)
如果列有统计信息,这个不管比较条件是否有具体值,其实都不是默认选择率。
如果列有统计信息情况下:
-
explain plan for
-
select * from sel_test
-
where object_name = :var;
-
-
select * from table(dbms_xplan.display);
-
plan_table_output
-
----------------------------------------------------------------------------------------------------
-
plan hash value: 310337252
-
----------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | 132 | 4 (0)| 00:00:01 |
-
| 1 | table access by index rowid batched| sel_test | 1 | 132 | 4 (0)| 00:00:01 |
-
|* 2 | index range scan | idx_sel_test | 1 | | 3 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
2 - access("object_name"=:var)
列等值选择率,这个是正常选择率,直接按照max(1/num_distinct,density)计算,
无直方图情况下1/num_distinct和density一样(这里没有考虑null,直方图等),
object_name的num_nulls=0,
对应cardinality=round((1/60896)*72800)=1
不管这个列是否走索引,等值的选择率都是sel=1/num_distinct=density,如果有直方图要复杂点,
如果考虑null,sel=(1/num_distinct)*(num_rows-num_nulls)/num_rows
非索引列object_id等值也是(1/num_distinct)*(num_rows-num_nulls)/num_rows:
-
explain plan for
-
select * from sel_test
-
where object_id = :var;
-
select * from table(dbms_xplan.display);
-
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------
-
| 0 | select statement | | 1 | 132 | 396 (1)| 00:00:01 |
-
|* 1 | table access full| sel_test | 1 | 132 | 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("object_id"=to_number(:var))
2)列有统计信息,<>的选择率计算不是默认选择率,而是1-sel(等值条件)
-
explain plan for
-
select * from sel_test
-
where object_name <> :var;
-
-
select * from table(dbms_xplan.display);
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------
-
| 0 | select statement | | 72799 | 9384k| 398 (2)| 00:00:01 |
-
|* 1 | table access full| sel_test | 72799 | 9384k| 398 (2)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("object_name"<>:var)
<>选择率=1-sel(object_name=)=1-1/60896,结果行=round(72800-1)=82799
3)索引列like 绑定变量 走索引默认选择率计算(绑定变量无peeking) ix_sel=0.009,回表ix_sel_with_filters=0.05
like实际上是范围查询,所以对于使用绑定变量的范围查询选择率是一样的,比如
like :var, > :var,< :var都是一样的选择率,当然如果>,<是有多个组合的,默认选择率有所不同,见3)
如果有具体值,会按照具体值计算选择率,不是默认选择率。
-
explain plan for
-
select * from sel_test
-
where object_name like :var;
-
-
select * from table(dbms_xplan.display);
-
plan_table_output
-
----------------------------------------------------------------------------------------------------
-
plan hash value: 310337252
-
-
----------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 3640 | 469k| 335 (0)| 00:00:01 |
-
| 1 | table access by index rowid batched| sel_test | 3640 | 469k| 335 (0)| 00:00:01 |
-
|* 2 | index range scan | idx_sel_test | 655 | | 7 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
2 - access("object_name" like :var)
-
filter("object_name" like :var)
这里的rows=655,是怎么计算的?因为没有peeking,很显然是不准的,在实际应用中,
如果peeking关闭,则就是按照默认选择率,
这里的like :var,无peeking,不知道是前通配还是后通配,也就是走index range scan是
不准确的,如果采用这样的默认选择率,
可能传入的:var = '�%'也会走index range scan,这很有问题,所以生产上还是
建议打开peeking。
回头在看655是怎么产生的,因为这时候虽然object_name列有统计信息,但是因为like是
范围查询,范围查询需要知道具体的范围,
计算范围查询选择率sel=required_range/(high_value-low_value),
现在绑定变量没有具体值(和peeking关闭一样),所以不知道
required_range,因此采用默认选择率。
猜测下是多少?
sel= 655/num_rows=655/72800=.008997253,一般默认选择率到小数部分非0整数位结束,
也就是0.009
这是猜测,其实也可以通过10053查看:ix_sel: 0.009000
-
access path: index (rangescan)
-
index: idx_sel_test
-
resc_io: 335.000000 resc_cpu: 2943282
-
ix_sel: 0.009000 ix_sel_with_filters: 0.009000
-
cost: 335.220618 resp: 335.220618 degree: 1
如果like绑定变量列没有索引的话就是0.05,也就是5%的选择率。可以通过10053查看:
sel = 0.050000 ,如下:
-
cost of predicates:
-
io = nocost, cpu = 100.000000, sel = 0.050000 flag = 2048 ("sel_test"."object_name" like :b1)
也可以推导出来:sel=回表返回行/总行数=3640/72800=0.05,这就是为什么执行计划里走索引
估算行数是655,而回表行数=3640,
正常情况下索引行数是655,回表结果行数应该小于等于655,不会大这么多,主要原因就是
索引和回表的选择率不一样,索引是0.009,回表是0.05。
下面看下有索引的单个范围查询选择率:
-
explain plan for
-
select * from sel_test
-
where object_name > :var;
-
-
select * from table(dbms_xplan.display);
-
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 310337252
-
-
----------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 3640 | 469k| 335 (0)| 00:00:01 |
-
| 1 | table access by index rowid batched| sel_test | 3640 | 469k| 335 (0)| 00:00:01 |
-
|* 2 | index range scan | idx_sel_test | 655 | | 7 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("object_name">:var)
通过范围的基数看,还是655/72800=.008997253 = 0.009
4)索引列带绑定变量至于索引范围扫描,多个范围查询的选择率 and条件ix_sel选择率
=0.0045,相当于单个范围的的0.009/2,回表ix_sel_with_filters= 0.05*0.05
多个范围查询:
-
explain plan for
-
select * from sel_test
-
where object_name > :var and object_name < :var1;
-
-
select * from table(dbms_xplan.display);
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 2443607680
-
-
-----------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
-----------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 182 | 24024 | 169 (0)| 00:00:01 |
-
|* 1 | filter | | | | | |
-
| 2 | table access by index rowid batched| sel_test | 182 | 24024 | 169 (0)| 00:00:01 |
-
|* 3 | index range scan | idx_sel_test | 328 | | 5 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(:var1>:var)
-
3 - access("object_name">:var and "object_name"<:var1)
这个和一般的and选择率计算不一样,如果按照一般的选择率计算:
sel=0.009*0.009,结果行数=72800*0.009*0.009=6,很显然和328相差巨大,328对应单个
选择率计算的655的一般,
猜测选择率=0.0045,也可以通过10053看出 ix_sel: 0.004500 :
-
access path: index (rangescan)
-
index: idx_sel_test
-
resc_io: 169.000000 resc_cpu: 1482323
-
ix_sel: 0.004500 ix_sel_with_filters: 0.004500
-
cost: 169.111110 resp: 169.111110 degree: 1
这里的回表182行,还是按照回表选择率0.05*0.05计算,对应返回行数=
round(72800*0.05*0.5)=182.
5)索引列带绑定变量走索引范围扫描,or范围查询的选择率,单个分支ix_sel=0.009,
ix_sel_with_filters=0.05,
等值的单个分支ix_sel=1/num_distinct。
如果是走索引or,走索引单个条件还是按照0.009,回表按照0.05,计算返回行选择率
按照0.05计算,
sel(or)=sel(a) sel(b)-sel(a)*sel(b),回表选择绿按照0.05计算
-
explain plan for
-
select/*or_expand*/ * from sel_test
-
where object_name > :var or object_name <= :var1;
-
-
select * from table(dbms_xplan.display);
-
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 2487264663
-
-
---------------------------------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
---------------------------------------------------------------------------------------------------------
-
| 0 | select statement | | 7098 | 3334k| 670 (0)| 00:00:01 |
-
| 1 | view | vw_ore_67cae57e | 7098 | 3334k| 670 (0)| 00:00:01 |
-
| 2 | union-all | | | | | |
-
| 3 | table access by index rowid batched| sel_test | 3640 | 469k| 335 (0)| 00:00:01 |
-
|* 4 | index range scan | idx_sel_test | 655 | | 7 (0)| 00:00:01 |
-
| 5 | table access by index rowid batched| sel_test | 3458 | 445k| 335 (0)| 00:00:01 |
-
|* 6 | index range scan | idx_sel_test | 655 | | 7 (0)| 00:00:01 |
-
---------------------------------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access("object_name">:var)
-
6 - access("object_name"<=:var1)
-
filter(lnnvl("object_name">:var))
走索引返回655行,是按照0.009计算,回表按照0.05计算:
sel=sel(a) sel(b)-sel(a)*sel(b)=0.05 0.05-0.05*0.05,
返回行数=72800*(0.05 0.05-0.05*0.05)=7098
像第二个分支的返回行数=3458,因为要排除前面分支的条件,对应的选择率=0.05-0.05*0.05,
所以返回行数=72800*(0.05-0.05*0.05)=3458
如果不加or_expand hints:
根据不走索引的默认选择率0.05计算:
sel=sel(a) sel(b)-sel(a)*sel(b)=0.05 0.05-0.05*0.05,
返回行数=72800*(0.05 0.05-0.05*0.05)=7098
-
explain plan for
-
select * from sel_test
-
where object_name > :var or object_name <= :var1;
-
-
select * from table(dbms_xplan.display);
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------
-
| 0 | select statement | | 7098 | 914k| 395 (1)| 00:00:01 |
-
|* 1 | table access full| sel_test | 7098 | 914k| 395 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("object_name">:var or "object_name"<=:var1)
6)非索引列范围默认选择率0.05
通过2)可以得知,不走索引的like绑定变量默认选择率=0.05,和走索引回表的选择率一样,
比如:
-
explain plan for
-
select/*test*/ * from sel_test
-
where subobject_name like :var;
-
select * from table(dbms_xplan.display);
-
-
plan_table_output
-
----------------------------------------------------------------------------------------------------
-
plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------
-
| 0 | select statement | | 50 | 6600 | 395 (1)| 00:00:01 |
-
|* 1 | table access full| sel_test | 50 | 6600 | 395 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
1 - filter("subobject_name" like :var)
这里的rows=50,不是3640,为什么呢,因为num_nulls:71801
所以选择率=0.05*adjust_sel=0.05*(72800-71801)/72800=.000686126
所以返回行=round(72800*.000686126)=50
7)函数范围查询默认选择率0.05(函数范围,无具体值,有具体值都是5%)
-
create index idx1_sel_test on sel_test(object_id);
-
explain plan for
-
select * from sel_test
-
where object_id like :var;
-
select * from table(dbms_xplan.display);
-
-
plan_table_output
-
----------------------------------------------------------------------------------------------------
-
plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------
-
| 0 | select statement | | 3640 | 469k| 396 (1)| 00:00:01 |
-
|* 1 | table access full| sel_test | 3640 | 469k| 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(to_char("object_id") like :var)
like也是范围查询,object_id因为是number类型,隐式类型转换使用to_char,
对应选择率=3640/72800=0.05,object_id的num_nulls=2,可以忽略。
传入具体值也是5%:
-
explain plan for
-
select * from sel_test
-
where substr(object_id,1,2) > '123';
-
select * from table(dbms_xplan.display);
-
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------
-
| 0 | select statement | | 3640 | 469k| 396 (1)| 00:00:01 |
-
|* 1 | table access full| sel_test | 3640 | 469k| 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(substr(to_char("object_id"),1,2)>'123')
8)函数等值查询,选择率是0.01(函数等值,无具体值,有具体值都是1%)
-
explain plan for
-
select * from sel_test
-
where substr(object_id,1,2) = :var;
-
select * from table(dbms_xplan.display);
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------
-
| 0 | select statement | | 728 | 96096 | 396 (1)| 00:00:01 |
-
|* 1 | table access full| sel_test | 728 | 96096 | 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(substr(to_char("object_id"),1,2)=:var)
函数等值查询对应选择率0.01,所以返回行=72800*0.01=728
传入实际值也一样:
-
explain plan for
-
select * from sel_test
-
where substr(object_id,1,2) ='ta';
-
select * from table(dbms_xplan.display);
-
plan_table_output
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| id | operation | name | rows | bytes | cost (%cpu)| time |
-
------------------------------------------------------------------------------
-
| 0 | select statement | | 728 | 96096 | 396 (1)| 00:00:01 |
-
|* 1 | table access full| sel_test | 728 | 96096 | 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
predicate information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(substr(to_char("object_id"),1,2)='ta')
下一篇:
oracle cbo 默认选择率及其问题凯发app官方网站的解决方案_part2