oracle组合索引选择率计算及如何选择组合索引的列顺序-凯发app官方网站

凯发app官方网站-凯发k8官网下载客户端中心 | | 凯发app官方网站-凯发k8官网下载客户端中心
  • 博客访问: 1156192
  • 博文数量: 166
  • 博客积分: 0
  • 博客等级: 民兵
  • 技术积分: 3760
  • 用 户 组: 普通用户
  • 注册时间: 2011-03-30 13:00
个人简介

about me:oracle ace pro,optimistic,passionate and harmonious. focus on oracle,mysql and other database programming,peformance tuning,db design, j2ee,linux/aix,architecture tech,etc

文章分类

全部博文(166)

文章存档

2024年(21)

2023年(28)

2022年(43)

2020年(62)

2014年(3)

2013年(9)

相关博文
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·

分类: oracle

2024-05-08 17:28:28

接:oracle组合索引选择率计算及如何选择组合索引的列顺序_part1
        

2.前导列不是等值的选择率计算         

下面改下条件,语句如下:

点击(此处)折叠或打开

  1. select *
  2. from tab1 where id between 1000 and 40000 and code =159;

  3. 2 rows selected.

  4. elapsed: 00:00:00.00
执行计划如下:

点击(此处)折叠或打开

  1. elapsed: 00:00:00.00

  2. execution plan
  3. ----------------------------------------------------------
  4. plan hash value: 2211052296

  5. --------------------------------------------------------------------------
  6. | id | operation | name | rows | bytes | cost (%cpu)| time |
  7. --------------------------------------------------------------------------
  8. | 0 | select statement | | 1 | 20 | 103 (1)| 00:00:02 |
  9. |* 1 | table access full| tab1 | 1 | 20 | 103 (1)| 00:00:02 |
  10. --------------------------------------------------------------------------

  11. predicate information (identified by operation id):
  12. ---------------------------------------------------

  13.    1 - filter("code"=159 and "id"<=40000 and "id">=1000)


  14. statistics
  15. ----------------------------------------------------------
  16.           1 recursive calls
  17.           1 db block gets
  18.         384 consistent gets
  19.           0 physical reads
  20.           0 redo size
  21.         737 bytes sent via sql*net to client
  22.         520 bytes received via sql*net from client
  23.           2 sql*net roundtrips to/from client
  24.           0 sorts (memory)
  25.           0 sorts (disk)
  26.           2 rows processed
返回2行,全表扫描,为什么两个列都在索引里,从10w行里取2行,是全表扫描???
和leftmost prefix规则有关,前导列是范围,后面的列不能用于index access,只能是index filter,所以索引扫描是按照
id between 1000 and 40000,因为索引是按照前导列排序的,所以这个扫描了40%的数据量,走全表也正常。
通过cost=103也看出来全表的比索引的116小。


下面强制走索引分析,可以看到走索引的cost=116比全表扫描103大,所以未走索引,
从执行计划可以看到filter("code"=159),虽然access("id">=1000 and "code"=159 and "id"<=40000),但是这里的
"code"=159实际不参与索引access,而主要是索引filter:

点击(此处)折叠或打开

  1. select/*index(tab1 idx_tab1)*/ *
  2. from tab1 where id between 1000 and 40000 and code =159;
  3. 2 rows selected.

  4. elapsed: 00:00:00.00

  5. execution plan
  6. ----------------------------------------------------------
  7. plan hash value: 2722636538

  8. ----------------------------------------------------------------------------------------
  9. | id | operation | name | rows | bytes | cost (%cpu)| time |
  10. ----------------------------------------------------------------------------------------
  11. | 0 | select statement | | 1 | 20 | 116 (0)| 00:00:02 |
  12. | 1 | table access by index rowid| tab1 | 1 | 20 | 116 (0)| 00:00:02 |
  13. |* 2 | index range scan | idx_tab1 | 1 | | 115 (0)| 00:00:02 |
  14. ----------------------------------------------------------------------------------------

  15. predicate information (identified by operation id):
  16. ---------------------------------------------------

  17.    2 - access("id">=1000 and "code"=159 and "id"<=40000)
  18.        filter("code"=159)


  19. statistics
  20. ----------------------------------------------------------
  21.           0 recursive calls
  22.           0 db block gets
  23.         119 consistent gets
  24.           0 physical reads
  25.           0 redo size
  26.         745 bytes sent via sql*net to client
  27.         520 bytes received via sql*net from client
  28.           2 sql*net roundtrips to/from client
  29.           0 sorts (memory)
  30.           0 sorts (disk)
  31.           2 rows processed
id的列信息如下:

点击(此处)折叠或打开

  1. column_name num_distinct low_value real_low_value high_value real_high_value
  2. ------------------------------ ------------ -------------------- -------------- -------------------- ---------------
  3. id 100000 c102 1 c30b 100000
对应语句的索引访问条件是 id between 1000 and 40000,不包括"code"=159,所以选择率只是id条件的:         
ix_sel
=effective index selectivity
=sel(id between 1000 and 40000) 
=实际范围/总范围 1/num_distinct 1/num_distinct
=(40000-1000)/(100000-1) 1/100000 1/100000
= .3900239
=0.390024
sel(code=159)=1/63904=0.000015648


ix_sel_with_filter用于计算返回行数以及cost,所以要和将id和code条件选择率组合起来,因为是and条件,所以:
ix_sel_with_filter
= .3900239*0.000015648
=.0000061030939872
=.000006


对应总cost=
blevel
ceiling(leaf_blocks * effective index selectivity)
ceiling(clustering_factor * effective table selectivity)
=1
ceiling(290*0.390024)
ceiling(341*.000006)
=1 114 1
=116
注意这里的effective index selectivity按照索引访问条件只有id between 1000 and 40000,虽然执行计划里是:
access("id">=1000 and "code"=159 and "id"<=40000),这是显示的问题,要看后面的filter("code"=159),说明
"code"=159是index filtert条件,符合索引访问leftmost prefix规则,前导列code是非等值,后面的列id不能参与索引访问,
只能是index filter。
所以effective index selectivity按照id between 1000 and 40000计算,而不是前面的组合条件选择率,
所以索引访问cost=
blevel
ceiling(leaf_blocks * effective index selectivity)
=1
ceiling(290*0.390024)
=1 114
=115
这里的0.390024只是id选择率。
而执行计划里的rows则是对应条件组合的结果,包括了code过滤条件,所以返回行数=
rows=round(100000*.390125601*0.000015648)=1

对应10053:

点击(此处)折叠或打开

  1. access path: index (rangescan)
  2.     index: idx_tab1
  3.     resc_io: 116.00 resc_cpu: 8626877
  4.     ix_sel: 0.390024 ix_sel_with_filters: 0.000006
  5.     cost: 116.41 resp: 116.41 degree: 1
  6.   best:: accesspath: tablescan
  7.          cost: 103.17 degree: 1 resp: 103.17 card: 0.61 bytes: 0
注意cost和rows的计算方式,索引访问cost是按照索引访问条件计算的,不包括索引过滤条件,
而rows是按照整个条件计算的结果。

也就是说,执行计划里省略了真正扫描的行数(rows examined),这里扫描的行数是按照 id between 1000 and 40000 查找的,
因为按照id between 1000 and 40000行数接近总行数40%,如下:

点击(此处)折叠或打开

  1. select count(*) from tab1 where id between 1000 and 40000;
  2.  count(*)
  3. ----------
  4.      39001

返回3.9w行,扫描过程中按照code =159过滤,所以cost比前面id等值的大很多,所以索引效率要看真正参与index access的索引效率,
等值且选择性好的放前面,这样减少真正access访问的行数(也即减少io),减少index filter数目提高效率。

可以再看个例子,上面的2个例子回表之后没有额外条件,下面这个例子加上ext用于回表后的过滤
(不是ix_sel_with_filters,而是回表后的condition filter selectivity)。

sql如下:

点击(此处)折叠或打开

  1. select/*test*/ *
  2. from tab1 where id between 1000 and 40000 and code >=159 and ext like 'test5%';
对应的10053如下:

点击(此处)折叠或打开

  1. single table access path
  2.   single table cardinality estimation for tab1[tab1]

  3.   column (#3): ext(
  4.     avglen: 10 ndv: 100000 nulls: 0 density: 0.000010
  5.   column (#1): id(
  6.     avglen: 5 ndv: 100000 nulls: 0 density: 0.000010 min: 1 max: 100000
  7.   column (#2): code(
  8.     avglen: 5 ndv: 63904 nulls: 0 density: 0.000016 min: 2 max: 99998
  9.   table: tab1 alias: tab1
  10.     card: original: 100000.000000 rounded: 4736 computed: 4736.41 non adjusted: 4736.41
  11.   access path: tablescan
  12.     cost: 103.48 resp: 103.48 degree: 0
  13.       cost_io: 102.00 cost_cpu: 31119062
  14.       resp_io: 102.00 resp_cpu: 31119062
  15. kkofmx: index filter:"tab1"."code">=159

  16.   access path: index (skip-scan)
  17.     ss sel: 0.389418 andv (#skips): 39003.000000
  18.     ss io: 290.000000 vs. index scan io: 114.000000
  19.     skip scan rejected
  20.   access path: index (rangescan)
  21.     index: idx_tab1
  22.     resc_io: 248.00 resc_cpu: 20859874
  23.     ix_sel: 0.390024 ix_sel_with_filters: 0.389418
  24.     cost: 248.99 resp: 248.99 degree: 1
  25.   best:: accesspath: tablescan
  26.          cost: 103.48 degree: 1 resp: 103.48 card: 4736.41 bytes: 0
执行计划如下:

点击(此处)折叠或打开

  1. execution plan
  2. ----------------------------------------------------------
  3. plan hash value: 2211052296

  4. --------------------------------------------------------------------------
  5. | id | operation | name | rows | bytes | cost (%cpu)| time |
  6. --------------------------------------------------------------------------
  7. | 0 | select statement | | 4736 | 94720 | 103 (1)| 00:00:02 |
  8. |* 1 | table access full| tab1 | 4736 | 94720 | 103 (1)| 00:00:02 |
  9. --------------------------------------------------------------------------

  10. predicate information (identified by operation id):
  11. ---------------------------------------------------

  12.    1 - filter("id"<=40000 and "ext" like 'test5%' and "id">=1000 and
  13.               "code">=159)
强制索引:

点击(此处)折叠或打开

  1. select/*index(tab1)*/ *
  2. from tab1 where id between 1000 and 40000 and code >=159 and ext like 'test5%';


  3. execution plan
  4. ----------------------------------------------------------
  5. plan hash value: 2722636538

  6. ----------------------------------------------------------------------------------------
  7. | id | operation | name | rows | bytes | cost (%cpu)| time |
  8. ----------------------------------------------------------------------------------------
  9. | 0 | select statement | | 4736 | 94720 | 249 (1)| 00:00:03 |
  10. |* 1 | table access by index rowid| tab1 | 4736 | 94720 | 249 (1)| 00:00:03 |
  11. |* 2 | index range scan | idx_tab1 | 38942 | | 115 (0)| 00:00:02 |
  12. ----------------------------------------------------------------------------------------

  13. predicate information (identified by operation id):
  14. ---------------------------------------------------

  15.    1 - filter("ext" like 'test5%')
  16.    2 - access("id">=1000 and "code">=159 and "id"<=40000 and "code" is not null)
  17.        filter("code">=159)

点击(此处)折叠或打开

  1. code列统计信息如下:
  2. column_name num_distinct low_value real_low_value high_value real_high_value
  3. ------------------------------ ------------ -------------------- -------------- -------------------- ---------------
  4. code 63904 c103 2 c30a6463 99998
从执行计划以及索引访问leftmost prefix规则可以看出,这里的索引访问选择率是:"id">=1000 and "id"<=40000条件的,不包括code条件:


ix_sel
=effective index selectivity
=sel(id between 1000 and 40000) 
=实际范围/总范围 1/num_distinct 1/num_distinct
=(40000-1000)/(100000-1) 1/100000 1/100000
= .3900239
=0.390024


sel(code >=159)=实际范围/总范围 1/num_distinct
=(99998-159)/(99998-2) 1/63904
=  .998445586




则ix_sel_with_filter
= .3900239*.998445586
= 0.389418 




对应总cost=
blevel
ceiling(leaf_blocks * effective index selectivity)
ceiling(clustering_factor * effective table selectivity)
=1
ceiling(290*0.390024)
ceiling(341*0.389418 )
=1 114 133
=248


这里的248比执行计划里的249少了1,可能是没有考虑cpu cost的原因。

索引访问的cost还是按照实际index access的条件计算:
1 ceiling(290*0.390024)=115
回表的按照ix_sel_filters计算,貌似不包括回表的过滤条件"ext" like 'test5%',
也就是回表cost=ceiling(clustering_factor * effective table selectivity)
=ceiling(341*0.389418 )
=133


id=2返回行是包括index filter条件的,因此返回的行数=
ix_sel_with_filter*num_rows
=round(0.389418*100000)
=38942


回表过滤条件,不是ix_sel_filter,是单纯的索引回表后非索引访问条件的过滤,貌似cost没有包含这个过滤条件,是按照ix_sel_filter计算的。
但是对于返回行数,要考虑回表过滤条件选择率,也即"ext" like 'test5%'的选择率sel_ext,sel_ext*index_rows=返回rows


可以看下ext列统计信息:

点击(此处)折叠或打开

  1. column_name num_distinct low_value real_low_value high_value real_high_value
  2. -------------------- ------------ -------------------- -------------------- -------------------- --------------------
  3. ext 100000 7465737431 test1 746573743939393939 test99999

对于like后通配,实际上内部是将like转为>= and <来计算:
where "ext" like 'test5%'
按照下面条件计算:
"ext" >= 'test5' and "ext" < 'test6'
实际上不是转为test6,像mysql是转为>='test5\0\0\0\0\0\...'  and <=  'test5    ...'


然后选择率计算要将字符串转为数字,这里使用get_internal_value函数:

点击(此处)折叠或打开

  1. select get_internal_value('test5'),get_internal_value('test6') from dual;

  2. get_internal_value('test5')
  3. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  4. get_internal_value('test6')
  5. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  6. 604364106163032000000000000000000000
  7. 604364106164241000000000000000000000

具体的字符串如何转为数字的:
1)右侧补0,保留到15个字节
2)转为16进制
3)round左起21位
如果不用get_internal_value,也可以按照上面三点自己编写语句:

点击(此处)折叠或打开

  1. select to_char(round(to_number(utl_raw.cast_to_raw(rpad('test5',15,chr(0))),'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),-21)) dec_val from dual;
  2. dec_val
  3. ----------------------------------------
  4. 604364106163032000000000000000000000


点击(此处)折叠或打开

  1. dingjun123@oradb> select get_internal_value('test1'),get_internal_value('test9999') from dual;

  2. get_internal_value('test1')
  3. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  4. get_internal_value('test9999')
  5. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  6. 604364106158196000000000000000000000
  7. 604364106168138000000000000000000000
对应的
sel=
(604364106164241-604364106163032)/(604364106168138-604364106158196) 1/100000
= 0.121615311
所以返回行数是:round(38942*0.121615)=4736。


注意:如果是like前通配则不一样,可以看到估算返回5000行,对应5%的选择率,这个是默认值,无法转为> and <。

点击(此处)折叠或打开

  1. select/*index(tab1)*/ *
  2. from tab1 where ext like '%test5';
  3. execution plan
  4. ----------------------------------------------------------
  5. plan hash value: 2211052296

  6. --------------------------------------------------------------------------
  7. | id | operation | name | rows | bytes | cost (%cpu)| time |
  8. --------------------------------------------------------------------------
  9. | 0 | select statement | | 5000 | 97k| 104 (2)| 00:00:02 |
  10. |* 1 | table access full| tab1 | 5000 | 97k| 104 (2)| 00:00:02 |
  11. --------------------------------------------------------------------------

  12. predicate information (identified by operation id):
  13. ---------------------------------------------------

  14.    1 - filter("ext" like '%test5' and "ext" is not null)

总结:

综上所述,要想建的索引效率高,也就是要提高index access的访问效率,减少访问io,其次要减少index filter以及回表的io。具体来说,组合索引要考虑:

1)组合索引要将常用且等值访问的列作为前导列,如果均匀分布较好,如果选择性好,则更佳。
   因为等值条件后续列还可能参与索引访问(like前通配,<>不行,<,>可以),这样索引访问效率是多列组合结果


2)将选择性好的列放前面,这个指的是条件的选择率,如果列是均匀的且等值访问,可以直接利用列计算条件选择率,
   如果分布不均或非等值,则要考虑具体条件选择率,也即条件返回的行数占总行数比例。
  


3)范围查询一般放后面,比如>,<,between,like后通配,date/timestamp列放后面,因为date/timestamp列一般都是用于范围查询,而非等值查询。


4)创建的索引尽可能地能够消除排序。


5)组合索引的每个列前导列都要按照上面3点考虑,特别是1,2,然后要考虑leftmost prefix规则,让前面的列条件尽可能是等值的,
   才能让后续列也能参与索引访问。




阅读(99) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~
")); function link(t){ var href= $(t).attr('href'); href ="?url=" encodeuricomponent(location.href); $(t).attr('href',href); //setcookie("returnouturl", location.href, 60, "/"); }
网站地图