oracle cbo 默认选择率及其问题凯发app官方网站的解决方案-凯发app官方网站

凯发app官方网站-凯发k8官网下载客户端中心 | | 凯发app官方网站-凯发k8官网下载客户端中心
  • 博客访问: 1156273
  • 博文数量: 166
  • 博客积分: 0
  • 博客等级: 民兵
  • 技术积分: 3760
  • 用 户 组: 普通用户
  • 注册时间: 2011-03-30 13:00
个人简介

about me:oracle ace pro,optimistic,passionate and harmonious. focus on oracle,mysql and other database programming,peformance tuning,db design, j2ee,linux/aix,architecture tech,etc

文章分类

全部博文(166)

文章存档

2024年(21)

2023年(28)

2022年(43)

2020年(62)

2014年(3)

2013年(9)

相关博文
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·

分类: oracle

2024-05-17 17:25:45

本部分主要研究默认选择率null数目影响、案例以及默认选择率系列总结。

4.默认选择率还要考虑null数目影响

对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows


其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters


对于列有统计信息的,选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。


对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls
很多的列,更加不准确。

也即有统计信息的默认选择率如果考虑null影响,要*adjust_sel.
比如subobject_name null较多:

点击(此处)折叠或打开

  1. select num_rows,sample_size from dba_tables where table_name='sel_test';

  2.   num_rows sample_size
  3. ---------- -----------
  4.      72800 72800


  5. select column_name,num_distinct,density,num_nulls,histogram
  6. from dba_tab_col_statistics
  7. where table_name='sel_test'
  8. and column_name='subobject_name';
  9. column_name num_distinct density num_nulls histogram
  10. -------------------- ------------ ---------- ---------- ------------------------------
  11. subobject_name 564 .00177305 71801 none

则adjust_sel=(num_rows-num_nulls)/num_rows
=(72800-71801)/72800
=0.013722527

建立索引如下:

点击(此处)折叠或打开

  1. create index idx2_sel_test on sel_test(subobject_name);
like范围有索引走索引默认ix_sel=0.009,ix_sel_with_filters=0.05


点击(此处)折叠或打开

  1. explain plan for
  2. select * from sel_test
  3. where subobject_name like :var;

  4. select * from table(dbms_xplan.display);
  5. plan_table_output
  6. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  7. plan hash value: 2441733028

  8. -----------------------------------------------------------------------------------------------------
  9. | id | operation | name | rows | bytes | cost (%cpu)| time |
  10. -----------------------------------------------------------------------------------------------------
  11. | 0 | select statement | | 50 | 7500 | 5 (0)| 00:00:01 |
  12. | 1 | table access by index rowid batched| sel_test | 50 | 7500 | 5 (0)| 00:00:01 |
  13. |* 2 | index range scan | idx2_sel_test | 9 | | 2 (0)| 00:00:01 |
  14. -----------------------------------------------------------------------------------------------------

  15. predicate information (identified by operation id):
  16. ---------------------------------------------------

  17.    2 - access("subobject_name" like :var)
  18.        filter("subobject_name" like :var)

因为sub_object_name的null较多,还要考虑null影响选择率,所以ix_sel=
0.009*0.013722527,
所以索引扫描行数=round(72800*0.009*0.013722527)=9,
回表行数=round(72800*0.05*0.013722527)=50

5.示例分析


下面用join例子说简单分析:
删除扩展统计信息:

点击(此处)折叠或打开

  1. exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(substr("object_name",1,1))');
  2. exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(substr("owner",1,3))');
  3. exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(substr("object_name",1,1),substr("owner",1,3))');

点击(此处)折叠或打开

  1. explain plan for
  2. select * from sel_test a,sel2 b
  3. where substr(a.object_name,1,1) = 'd' and substr(a.owner,1,3)='sys'
  4. and a.object_id=b.object_id;
  5. select * from table(dbms_xplan.display(null,null,'adaptive'));

执行计划如下,因为是19c,这里是adaptive plan,很显然使用adaptive plan也是不准的,无法收集函数条件统计信息,走nested loops,
没有走hash join:

点击(此处)折叠或打开

  1. plan_table_output
  2. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. plan hash value: 2436822295

  4. --------------------------------------------------------------------------------------------
  5. | id | operation | name | rows | bytes | cost (%cpu)| time |
  6. --------------------------------------------------------------------------------------------
  7. | 0 | select statement | | 7 | 1974 | 410 (1)| 00:00:01 |
  8. |- * 1 | hash join | | 7 | 1974 | 410 (1)| 00:00:01 |
  9. | 2 | nested loops | | 7 | 1974 | 410 (1)| 00:00:01 |
  10. | 3 | nested loops | | 7 | 1974 | 410 (1)| 00:00:01 |
  11. |- 4 | statistics collector | | | | | |
  12. | * 5 | table access full | sel_test | 7 | 1050 | 396 (1)| 00:00:01 |
  13. | * 6 | index range scan | idx_sel2 | 1 | | 1 (0)| 00:00:01 |
  14. | 7 | table access by index rowid| sel2 | 1 | 132 | 2 (0)| 00:00:01 |
  15. |- 8 | table access full | sel2 | 1 | 132 | 2 (0)| 00:00:01 |
  16. --------------------------------------------------------------------------------------------

  17. predicate information (identified by operation id):
  18. ---------------------------------------------------

  19.    1 - access("a"."object_id"="b"."object_id")
  20.    5 - filter(substr("a"."object_name",1,1)='d' and substr("a"."owner",1,3)='sys')
  21.    6 - access("a"."object_id"="b"."object_id")

  22. note
  23. -----
  24.    - this is an adaptive plan (rows marked '-' are inactive)

如果使用dynamic_sampling(5),则执行计划走hash join:


点击(此处)折叠或打开

  1. explain plan for
  2. select/*dynamic_sampling(a 5)*/ * from sel_test a,sel2 b
  3. where substr(a.object_name,1,1) = 'd' and substr(a.owner,1,3)='sys'
  4. and a.object_id=b.object_id;
  5. select * from table(dbms_xplan.display(null,null,'adaptive'));
  6. plan_table_output
  7. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  8. plan hash value: 2990091768

  9. --------------------------------------------------------------------------------------------
  10. | id | operation | name | rows | bytes | cost (%cpu)| time |
  11. --------------------------------------------------------------------------------------------
  12. | 0 | select statement | | 2474 | 681k| 794 (2)| 00:00:01 |
  13. | * 1 | hash join | | 2474 | 681k| 794 (2)| 00:00:01 |
  14. |- 2 | nested loops | | 2474 | 681k| 794 (2)| 00:00:01 |
  15. |- 3 | nested loops | | | | | |
  16. |- 4 | statistics collector | | | | | |
  17. | * 5 | table access full | sel_test | 2474 | 362k| 396 (1)| 00:00:01 |
  18. |- * 6 | index range scan | idx_sel2 | | | | |
  19. |- 7 | table access by index rowid| sel2 | 1 | 132 | 397 (2)| 00:00:01 |
  20. | 8 | table access full | sel2 | 72802 | 9384k| 397 (2)| 00:00:01 |
  21. --------------------------------------------------------------------------------------------

  22. predicate information (identified by operation id):
  23. ---------------------------------------------------

  24.    1 - access("a"."object_id"="b"."object_id")
  25.    5 - filter(substr("a"."object_name",1,1)='d' and substr("a"."owner",1,3)='sys')
  26.    6 - access("a"."object_id"="b"."object_id")

  27. note
  28. -----
  29.    - dynamic statistics used: dynamic sampling (level=2)
  30.    - this is an adaptive plan (rows marked '-' are inactive)

这里的hints使用:dynamic_sampling(a 5),只对a表动态采样5级别,但是note:dynamic statistics used: dynamic sampling (level=2)
将substr("a"."object_name",1,1)='d' and substr("a"."owner",1,3)='sys'估算为2474行。
使用dynamic_sampling(a 5),指定不准的表动态采样即可。


如果改成dynamic_sampling(5)则note:dynamic statistics used: dynamic sampling (level=5),这种是全部表采样,不好:

点击(此处)折叠或打开

  1. explain plan for
  2. select/*dynamic_sampling(5)*/ * from sel_test a,sel2 b
  3. where substr(a.object_name,1,1) = 'd' and substr(a.owner,1,3)='sys'
  4. and a.object_id=b.object_id;
  5. select * from table(dbms_xplan.display(null,null,'adaptive'));
  6. plan_table_output
  7. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  8. plan hash value: 2990091768

  9. --------------------------------------------------------------------------------------------
  10. | id | operation | name | rows | bytes | cost (%cpu)| time |
  11. --------------------------------------------------------------------------------------------
  12. | 0 | select statement | | 3005 | 827k| 794 (2)| 00:00:01 |
  13. | * 1 | hash join | | 3005 | 827k| 794 (2)| 00:00:01 |
  14. |- 2 | nested loops | | 3005 | 827k| 794 (2)| 00:00:01 |
  15. |- 3 | nested loops | | | | | |
  16. |- 4 | statistics collector | | | | | |
  17. | * 5 | table access full | sel_test | 3005 | 440k| 396 (1)| 00:00:01 |
  18. |- * 6 | index range scan | idx_sel2 | | | | |
  19. |- 7 | table access by index rowid| sel2 | 1 | 132 | 397 (2)| 00:00:01 |
  20. | 8 | table access full | sel2 | 72802 | 9384k| 397 (2)| 00:00:01 |
  21. --------------------------------------------------------------------------------------------

  22. predicate information (identified by operation id):
  23. ---------------------------------------------------

  24.    1 - access("a"."object_id"="b"."object_id")
  25.    5 - filter(substr("a"."object_name",1,1)='d' and substr("a"."owner",1,3)='sys')
  26.    6 - access("a"."object_id"="b"."object_id")

  27. note
  28. -----
  29.    - dynamic statistics used: dynamic sampling (level=5)
  30.    - this is an adaptive plan (rows marked '-' are inactive)

可以看到note:dynamic statistics used: dynamic sampling (level=5)
sel_test估算为3005行。


下面使用扩展统计信息:
收集扩展统计信息如下:

点击(此处)折叠或打开

  1. --收集单列扩展
  2. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',-
  3. method_opt=>'for columns size auto (substr(object_name,1,1)),(substr(owner,1,3)),',no_invalidate=>false);
  4. --收集组合扩展
  5. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',-
  6. method_opt=>'for columns size auto (substr("object_name",1,1),substr("owner",1,3))',no_invalidate=>false);
  7. 先创建扩展列,然后在收集全表的,更准:
  8. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',no_invalidate=>false);


点击(此处)折叠或打开

  1. explain plan for
  2. select * from sel_test a,sel2 b
  3. where substr(a.object_name,1,1) = 'd' and substr(a.owner,1,3)='sys'
  4. and a.object_id=b.object_id;
  5. select * from table(dbms_xplan.display(null,null,'adaptive'));
可以看到执行计划从nested loops变为走hash join,sel_test对函数组合条件估算为2631行,准确。


点击(此处)折叠或打开

  1. plan_table_output
  2. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. plan hash value: 2990091768

  4. --------------------------------------------------------------------------------------------
  5. | id | operation | name | rows | bytes | cost (%cpu)| time |
  6. --------------------------------------------------------------------------------------------
  7. | 0 | select statement | | 2631 | 724k| 794 (2)| 00:00:01 |
  8. | * 1 | hash join | | 2631 | 724k| 794 (2)| 00:00:01 |
  9. |- 2 | nested loops | | 2631 | 724k| 794 (2)| 00:00:01 |
  10. |- 3 | nested loops | | | | | |
  11. |- 4 | statistics collector | | | | | |
  12. | * 5 | table access full | sel_test | 2631 | 385k| 396 (1)| 00:00:01 |
  13. |- * 6 | index range scan | idx_sel2 | | | | |
  14. |- 7 | table access by index rowid| sel2 | 1 | 132 | 397 (2)| 00:00:01 |
  15. | 8 | table access full | sel2 | 72802 | 9384k| 397 (2)| 00:00:01 |
  16. --------------------------------------------------------------------------------------------

  17. predicate information (identified by operation id):
  18. ---------------------------------------------------

  19.    1 - access("a"."object_id"="b"."object_id")
  20.    5 - filter(substr("object_name",1,1)='d' and substr("owner",1,3)='sys')
  21.    6 - access("a"."object_id"="b"."object_id")

  22. note
  23. -----
  24.    - this is an adaptive plan (rows marked '-' are inactive)

6.oracle cbo默认选择率总结


在oracle里,如果sql里的表无法获得需要的信息来计算选择率,可能就会使用默认选择率,常发生在以下几种情况下:
1)表有统计信息,列未收集统计信息,无法使用动态采样
2)表有统计信息,但是使用绑定变量,绑定变量无peeking
3) 表有统计信息,但是对列使用了函数等复杂条件,无法准确估算对应条件的选择率


默认选择率可能很多时候是不准的,会导致执行计划非optimal,这样需要针对不准情况对应修复:


针对1)情况,就是更新统计信息,补全没有收集或不准的统计信息。
针对第二种情况,为了避免修改_optim_peek_user_binds参数造成的全局影响,可以针对单个sql开启_optim_peek_user_binds,
使用sql profile绑定。
针对第三种情况,因为复杂条件无法计算准确选择率,利用使用扩展列统计信息、高级别动态采样、cardinality/opt_esimate hints、
sql profile等绑定执行计划。


另外默认选择率总结如下:
对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows


其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters


对于列有统计信息的,final选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。


对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls很多的列,更加不准确。


也即有统计信息的默认选择率如果考虑null影响,考虑adjust_sel,其final选择率=default_sel * adjust_sel


有统计信息默认选择率如下:
1)列等值条件:sel = max(1/num_distinct,density)*adjust_sel,不管是否走索引ix_sel和ix_sel_with_filters一样.

2) 列<>条件: sel = 1-sel(等值)= 1 - max(1/num_distinct,density)*adjust_sel ,其中ix_sel,ix_sel_with_filters一样

3) 索引列范围条件:(包括like) ix_sel = 0.009,其中ix_sel_with_filters=0.05,这样可能导致回表后的cardinality估算行数大于索引扫描行数

4) 索引列绑定变量范围条件: sel = 0.009 (无peeking),其中ix_sel = 0.009,ix_sel_with_filters=0.05

5)索引列多个范围and条件 ix_sel = 0.009/2 = 0.0045,其中ix_sel_with_filters还是按照(sel_1)*(sel_2)=0.05*0.05

6) 索引列or条件,如果or扩展走索引,按照单个条件计算 范围是ix_sel=0.009,等值是1/num_distinct.
 回表ix_sel_with_filters=(sel_1)*(sel_2)-(sel_1*sel_2)=0.05*0.05-0.05*0.05,按照or数目计算。

7)函数条件等值 sel = 0.01

8) 函数条件范围 sel = 0.05

9) 非索引列范围 sel = 0.05

10) 回表选择率 sel = ix_sel_with_filters * condition_filters,如果回表没有额外条件=ix_sel_with_filters
回表除了等值的,非函数等值=max(1/num_distinct,density)*adjust_sel,函数等值=0.01
,其他均为0.05




无统计信息默认选择率如下:
无统计信息的不用考虑adjest_sel,如果null数目较多,对cbo来说也不可知,所以null数目多的时候,可能影响大。

1)等值列条件无统计信息选择率,列有索引等值条件 ix_sel=0.004,ix_sel_with_filters=0.01

2)等值列条件无统计信息,列无索引列默认选择率sel=0.01,包括绑定变量无具体值,函数条件的等值查询,不管有没有统计信息都是1%。

3) 对于非索引列列无统计信息的范围查询,和有统计信息无具体绑定变量值一样,非索引列(包括函数)是5%

4)列无统计信息,范围条件走索引的选择率是ix_sel=0.009,ix_sel_with_filters=0.05,和索引列有统计信息,使用绑定变量无具体值的范围查询一样。

5) 列无统计信息,<>条件选择率是5%,和列有统计信息按照1-sel(等值)的不一样。


oracle的默认选择率一般很小:
比如索引列density,0.009,0.004几种,有索引的偏向于走索引。
对于不走索引或回表的其他选择率:等值的0.01,其他0.05。


mysql对于列无索引,无直方图的,也会采用默认选择率,大部分使用
access path selection in a relational database management system论文里的选择率,
一般比较大,
比如:
等值是10%,
>,<等是33%,
like,between是11%,
<>是0.9
比oracle默认选择率大很多,mysql默认选择率问题较多,
可以建索引或收集直方图解决,mysql没有动态采样,没有扩展统计信息,
没有cardinality hints



本系列结束。








阅读(25) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~
")); function link(t){ var href= $(t).attr('href'); href ="?url=" encodeuricomponent(location.href); $(t).attr('href',href); //setcookie("returnouturl", location.href, 60, "/"); }
网站地图