oracle vs mysql 对组合索引包含in list执行计划研究(mysql部分)-凯发app官方网站

凯发app官方网站-凯发k8官网下载客户端中心 | | 凯发app官方网站-凯发k8官网下载客户端中心
  • 博客访问: 1156193
  • 博文数量: 166
  • 博客积分: 0
  • 博客等级: 民兵
  • 技术积分: 3760
  • 用 户 组: 普通用户
  • 注册时间: 2011-03-30 13:00
个人简介

about me:oracle ace pro,optimistic,passionate and harmonious. focus on oracle,mysql and other database programming,peformance tuning,db design, j2ee,linux/aix,architecture tech,etc

文章分类

全部博文(166)

文章存档

2024年(21)

2023年(28)

2022年(43)

2020年(62)

2014年(3)

2013年(9)

相关博文
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·
  • ·

分类: mysql/postgresql

2024-05-13 17:31:18

上一篇:oracle vs mysql 对组合索引包含in list执行计划研究(oracle部分)_part1
本文主要研究下组合索引包含in条件(多个值),在单表查询,关联查询这两种sql查询结果在oracle和mysql里的区别。
oracle具有强大的优化器,一般来说,组合索引在oracle里不管是单表还是关联查询,只要符合leftmost prefix规则,都可以用于index access,只要统计信息等是准确的。
mysql的优化器相对来说,要弱不少,很多功能不够健全,单表对于组合索引包含in的处理貌似没有什么问题,但是join情况下,
包含in list组合索引的表是被驱动表,则问题比较严重,卖个关子,详细见本文mysql部分讲述。

下面看下mysql部分:

2.mysql里组合索引有in条件的情况


2.1)mysql单表组合索引in list情况:
mysql貌似没有inlist iterator算子,能否实现inlist iterator效果:
建表语句如下:

点击(此处)折叠或打开

  1. drop table t1;
  2. drop table t2;
  3. create table t1 as select * from employees;
  4. alter table t1 add status int default 0;
  5. update t1 set status=ceil(rand()*10);
  6. create table t2 as select * from t1;
t1,t2表结构:

点击(此处)折叠或打开

  1. t1,t2表结构:
  2. ------------------------------------------------------
  3. | field | type | null | key | default | extra |
  4. ------------------------------------------------------
  5. | emp_no | int | no | | null | |
  6. | birth_date | date | no | | null | |
  7. | first_name | varchar(14) | no | | null | |
  8. | last_name | varchar(16) | no | | null | |
  9. | gender | enum('m','f') | no | | null | |
  10. | hire_date | date | no | | null | |
  11. | status | int | yes | | 0 | |
  12. ------------------------------------------------------
创建索引,将选择性好的放前面:
create index idx1_t1 on t1(first_name,status);



###对于mysql,看组合索引用了几个列参与索引access,可以通过索引列key_len查看,
也可以通过树形执行计划sing idx1_t1 over 有几个列组合查看。
或者可以查看status变量handler确定索引扫描区间数:
handler_read_key和handler_read_next,
handler_read_next是按照索引顺序扫描下一行的次数,也可以代表回表次数(比如有icp可以减少这
个值),
所以handler_read_next也不能判断索引实际扫描的行数,而是索引扫描结果行数。

看下执行计划:有using index condition,说明第二个列status in (1,2,8)还是icp条件,
索引access用到几个列,可以通过key_len看:
first_name的key_len=14*3 2 (变长的 2,utf8mb3是3个字节)=44,
status是int型,对应的key_len=4 1(允许null 1)=5
可以看到执行计划的key_len=49,说明两个条件都用到。


explain select * from t1 
where t1.first_name='saniya' and  t1.status in (1,2,8);

点击(此处)折叠或打开

  1. -------------------------------------------------------------------------------------------------------------------------
  2. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
  3. -------------------------------------------------------------------------------------------------------------------------
  4. | 1 | simple | t1 | null | range | idx1_t1 | idx1_t1 | 49 | null | 78 | 100.00 | using index condition |
  5. -------------------------------------------------------------------------------------------------------------------------

从树形计划上看:using idx1_t1 over 转为两个or分支,也就是2个区间,说明两个条件都用到,
对应的还有icp,不过这个icp貌似作用不大:

点击(此处)折叠或打开

  1. explain format=tree
  2. select * from t1
  3. where t1.first_name='saniya' and t1.status in (1,2,8)\g
  4. *************************** 1. row ***************************
  5. explain: -> index range scan on t1 using idx1_t1 over
  6. (first_name = 'saniya' and status = 1) or (first_name = 'saniya' and status = 2) or (first_name = 'saniya' and status = 8),
  7. with index condition: ((t1.first_name = 'saniya') and (t1.`status` in (1,2,8))) (cost=42.35 rows=78)


可以查看实际执行信息:

点击(此处)折叠或打开

  1. flush status;
  2. explain analyze
  3. select * from t1
  4. where t1.first_name='saniya' and t1.status in (1,2,8)\g
  5. *************************** 1. row ***************************
  6. explain: -> index range scan on t1 using idx1_t1 over (first_name = 'saniya' and status = 1)
  7. or (first_name = 'saniya' and status = 2) or (first_name = 'saniya' and status = 8),
  8. with index condition: ((t1.first_name = 'saniya') and (t1.`status` in (1,2,8)))
  9. (cost=42.35 rows=78) (actual time=0.038..0.668 rows=78 loops=1)

通过show status查看handler信息,可以看到handler_read_key=3,对应三个索引区间。
说明first_name和status都参与索引扫描,执行计划里显示的icp基本用不着,这时候
handler_read_next=扫描的行数。

点击(此处)折叠或打开

  1. show status like '%handler%';
  2. -----------------------------------
  3. | variable_name | value |
  4. -----------------------------------
  5. | handler_commit | 1 |
  6. | handler_delete | 0 |
  7. | handler_discover | 0 |
  8. | handler_external_lock | 2 |
  9. | handler_mrr_init | 0 |
  10. | handler_prepare | 0 |
  11. | handler_read_first | 0 |
  12. | handler_read_key | 3 |
  13. | handler_read_last | 0 |
  14. | handler_read_next | 78 |
  15. | handler_read_prev | 0 |
  16. | handler_read_rnd | 0 |
  17. | handler_read_rnd_next | 0 |
  18. | handler_rollback | 0 |
  19. | handler_savepoint | 0 |
  20. | handler_savepoint_rollback | 0 |
  21. | handler_update | 0 |
  22. | handler_write | 0 |
  23. -----------------------------------

  24. select ps_thread_id(connection_id());
  25. -------------------------------
  26. | ps_thread_id(connection_id()) |
  27. -------------------------------
  28. | 50 |
  29. -------------------------------

  30. 通过检查examined_rows=78行,也符合索引扫描行数,因为使用explain analyze,rows_sent=1:
  31. ----------------------------------------------------
  32. | thread_id | rows_affected | rows_sent | rows_examined |
  33. ----------------------------------------------------
  34. | 50 | 0 | 1 | 78 |
  35. ----------------------------------------------------

下面将索引顺序换一下,貌似都和in非前导列的一样,都有icp,都是改成or条件(实际上这个icp没
有啥用filtered=100):
将选择性不好的列放前面:
create index idx2_t1 on t1(status,first_name);


通过key_len和树形计划的using idx2_t1 over,可以看到两个列条件都用到:
现在让两个列都是in,看是否能转为2*3=6个索引区间:

点击(此处)折叠或打开

  1. explain select/*index(t1 idx2_t1)*/ * from t1
  2. where t1.first_name in('saniya','aamer') and t1.status in (1,2,8);
  3. -------------------------------------------------------------------------------------------------------------------------
  4. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
  5. -------------------------------------------------------------------------------------------------------------------------
  6. | 1 | simple | t1 | null | range | idx2_t1 | idx2_t1 | 49 | null | 144 | 100.00 | using index condition |
  7. -------------------------------------------------------------------------------------------------------------------------

  8. select count(*) from t1 where status in (1,2,8);
  9. ----------
  10. | count(*) |
  11. ----------
  12. | 89751 |
  13. ----------

  14. select count(*) from t1
  15. where t1.first_name in('saniya','aamer')
  16. and t1.status in (1,2,8);
  17. ----------
  18. | count(*) |
  19. ----------
  20. | 144 |
  21. ----------

实际执行看:

点击(此处)折叠或打开

  1. flush status;
  2. explain analyze
  3. select/*index(t1 idx2_t1)*/ * from t1
  4. where t1.first_name in('saniya','aamer') and t1.status in (1,2,8)\g
  5. *************************** 1. row ***************************
  6. explain: -> index range scan on t1 using idx2_t1 over
  7. (status = 1 and first_name = 'aamer') or (status = 1 and first_name = 'saniya')
  8. or (4 more), with index condition: ((t1.first_name in ('saniya','aamer')) and (t1.`status` in (1,2,8)))
  9. (cost=78.34 rows=144) (actual time=0.055..2.067 rows=144 loops=1)
执行计划里转为的or超过3个后面变为or (n more)


可以看到转为6个扫描区间:handler_read_key=6,
icp的实际用不上,所以两个列都是index access,对应handler_read_next和rows_examined都是
扫描的行数144:

点击(此处)折叠或打开

  1. show status like '%handler%';
  2. -----------------------------------
  3. | variable_name | value |
  4. -----------------------------------
  5. | handler_commit | 1 |
  6. | handler_delete | 0 |
  7. | handler_discover | 0 |
  8. | handler_external_lock | 2 |
  9. | handler_mrr_init | 0 |
  10. | handler_prepare | 0 |
  11. | handler_read_first | 0 |
  12. | handler_read_key | 6 |
  13. | handler_read_last | 0 |
  14. | handler_read_next | 144 |
  15. | handler_read_prev | 0 |
  16. | handler_read_rnd | 0 |
  17. | handler_read_rnd_next | 0 |
  18. | handler_rollback | 0 |
  19. | handler_savepoint | 0 |
  20. | handler_savepoint_rollback | 0 |
  21. | handler_update | 0 |
  22. | handler_write | 0 |
  23. -----------------------------------

  24. select thread_id,rows_affected,rows_sent,rows_examined from performance_schema.events_statements_current where thread_id=50;
  25. ----------------------------------------------------
  26. | thread_id | rows_affected | rows_sent | rows_examined |
  27. ----------------------------------------------------
  28. | 50 | 0 | 1 | 144 |
  29. ----------------------------------------------------



总结:mysql单表访问,组合索引in不管是否是前导列,都能按照leftmost prefix规则,转为or,然后
转为多个索引区间(interval),
能够充分利用组合索引条件进行index access,执行计划里虽然有icp,基本用不着。
mysql没有inlist iterator算子,可以将in list转为多个区间从而实现单表访问inlist iterator效果。




2.2)多表join,inlist表作为被驱动表
看能否实现和单表一样的效果


先给t2的last_name建个索引,让t2做驱动表,t1被驱动:
create index idx_t2 on t2(last_name);




mysql组合索引有in的,如果in里只有一个值,和oracle一样,也会转为等值,则组合索引都是
等值的,可以直接access


mysql和oracle对于组合索引中条件in有多个值的的处理貌似不一样:


###mysql没有index inlist iterator算子,如果组合索引前导列是in并且有多个值,则不能转为多个
等值的组合条件,
这样只能in的列作为access,后续列作为icp,如果关联查询且被驱动走nl,则in的列作为单独条件,
作为索引access效率则可能很低,
所以在mysql里,关联列作为索引前导列较好,in的作为非前导列(用于icp条件,可能有bug将in
条件放到回表filter,则可能非常影响效率)。


1)不管in是否多个值,列是否是前导列,则都有icp
2)如果in的是非前导列,则in作为icp条件,这个和oracle一样
3)如果in是前导列,则in条件列对应的后续列作为icp条件,这个和oracle不一样,oracle有inlist
iterator算子,
  将in转为or,变为多个索引等值条件,然后避免icp,mysql没有inlist interator算子,in作为前导列
在关联查询里效率可能很低,将关联列作为前导列较好。


###在mysql里in条件避免不了icp,所以关联查询的被驱动表,将关联列作为索引前导列,这样
可以access条件,in条件列作为非前导列使用icp,
如果将in条件作为前导列,则关联列只能用于icp,可能效率很低。

驱动表730行,被驱动表循环730次:

点击(此处)折叠或打开

  1. select count(*) from t2 where last_name like 'ar%';
  2. ----------
  3. | count(*) |
  4. ----------
  5. | 730 |
  6. ----------


###1.先测试关联列是前导列,然后 in list组合索引,走nested loops,t1被驱动:

点击(此处)折叠或打开

  1. explain select * from t1,t2
  2. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  3. and t2.last_name like 'ar%';
通过执行计划看,多表join和单表不一样,t1被驱动,走idx1_t1(first_name,status),对应的
key_len是44,不是49(两个列都用上是49),
可以看到filtered=59.26,也就是status in (1,3,8)起类似过滤效果是真正icp,过滤了接近40%的
数据,这个和oracle的多表join不一样,
在mysql里貌似多表join的索引有in list,不能让多列都走index access:


点击(此处)折叠或打开

  1. ----------------------------------------------------------------------------------------------------------------------------------------------
  2. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
  3. ----------------------------------------------------------------------------------------------------------------------------------------------
  4. | 1 | simple | t2 | null | range | idx_t2 | idx_t2 | 50 | null | 730 | 100.00 | using index condition |
  5. | 1 | simple | t1 | null | ref | idx2_t1,idx1_t1 | idx1_t1 | 44 | employees.t2.first_name | 234 | 59.26 | using index condition |
  6. ----------------------------------------------------------------------------------------------------------------------------------------------
  7. 2 rows in set, 1 warning (0.00 sec)

  8. flush status;
  9. explain analyze
  10. select * from t1,t2
  11. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  12. and t2.last_name like 'ar%'\g
耗时0.22s,索引访问只能用到first_name,这里的first_name选择性比较好,这样status条件走不了
索引其实差别不大
改成union all效果不好,所以将关联列作为前导列较好,这样可以用到前导列索引

show status like '%handler%';

handler_read_key=驱动表索引区间1 被驱动表730次(每次一个区间)=731,
handler_read_next这时候就不是索引真正扫描的次数了,因为被驱动表t1只走first_name列索引
扫描,还有icp:status in (1,3,8),
这时候只是结果行数,相应的examined_rows也一样,这时候都是结果行数,结果行数包括驱动
表t2和t1的索引结果行数。

点击(此处)折叠或打开

  1. (root@localhost) [employees_15:21:17]> show status like '%handler%';
  2. -----------------------------------
  3. | variable_name | value |
  4. -----------------------------------
  5. | handler_commit | 1 |
  6. | handler_delete | 0 |
  7. | handler_discover | 0 |
  8. | handler_external_lock | 4 |
  9. | handler_mrr_init | 0 |
  10. | handler_prepare | 0 |
  11. | handler_read_first | 0 |
  12. | handler_read_key | 731 |
  13. | handler_read_last | 0 |
  14. | handler_read_next | 52626 |
  15. | handler_read_prev | 0 |
  16. | handler_read_rnd | 0 |
  17. | handler_read_rnd_next | 0 |
  18. | handler_rollback | 0 |
  19. | handler_savepoint | 0 |
  20. | handler_savepoint_rollback | 0 |
  21. | handler_update | 0 |
  22. | handler_write | 0 |
  23. -----------------------------------

  24. select thread_id,rows_affected,rows_sent,rows_examined from performance_schema.events_statements_current where thread_id=50;
  25. ----------------------------------------------------
  26. | thread_id | rows_affected | rows_sent | rows_examined |
  27. ----------------------------------------------------
  28. | 50 | 0 | 1 | 52626 |
  29. ----------------------------------------------------
  30. 1 row in set (0.01 sec)

  31. flush status;
  32. explain analyze
  33. select * from t1,t2
  34. where t1.first_name=t2.first_name and t1.status in (1)
  35. and t2.last_name like 'ar%'\g

如果in只有一个值,转为=,则访问条件是using idx1_t1 (first_name=t2.first_name, status=1),
没有问题:

点击(此处)折叠或打开

  1. *************************** 1. row ***************************
  2. explain: -> nested loop inner join (cost=7653.14 rows=17026) (actual time=0.078..67.316 rows=17234 loops=1)
  3.     -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'),
  4.      with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.049..6.793 rows=730 loops=1)
  5.     -> index lookup on t1 using idx1_t1 (first_name=t2.first_name, status=1)
  6.     (cost=7.70 rows=23) (actual time=0.009..0.081 rows=24 loops=730)

  7. 普通执行计划没有icp,key_len=49,两个列都用上:
  8. ------------------------------------------------------------------------------------------------------------------------------------------------------------
  9. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
  10. ------------------------------------------------------------------------------------------------------------------------------------------------------------
  11. | 1 | simple | t2 | null | range | idx_t2 | idx_t2 | 50 | null | 730 | 100.00 | using index condition |
  12. | 1 | simple | t1 | null | ref | idx2_t1,idx1_t1,idx3_t1 | idx1_t1 | 49 | employees.t2.first_name,const | 23 | 100.00 | null |
  13. ------------------------------------------------------------------------------------------------------------------------------------------------------------

  14. 扫描的行数没有问题,被驱动t1的真实rows=24,这个是平均值。 730*24=17520略大于真实行数,然后驱动表700行,实际扫描17964行。
  15. show status like 'handle%';
  16. -----------------------------------
  17. | variable_name | value |
  18. -----------------------------------
  19. | handler_commit | 1 |
  20. | handler_delete | 0 |
  21. | handler_discover | 0 |
  22. | handler_external_lock | 4 |
  23. | handler_mrr_init | 0 |
  24. | handler_prepare | 0 |
  25. | handler_read_first | 0 |
  26. | handler_read_key | 731 |
  27. | handler_read_last | 0 |
  28. | handler_read_next | 17964 |
  29. | handler_read_prev | 0 |
  30. | handler_read_rnd | 0 |
  31. | handler_read_rnd_next | 0 |
  32. | handler_rollback | 0 |
  33. | handler_savepoint | 0 |
  34. | handler_savepoint_rollback | 0 |
  35. | handler_update | 0 |
  36. | handler_write | 0 |
  37. -----------------------------------

  38. select thread_id,rows_affected,rows_sent,rows_examined from performance_schema.events_statements_current where thread_id=50;
  39. ----------------------------------------------------
  40. | thread_id | rows_affected | rows_sent | rows_examined |
  41. ----------------------------------------------------
  42. | 50 | 0 | 1 | 17964 |
  43. ----------------------------------------------------

###2:测试in list列是前导列,关联条件列非前导列,被驱动走索引
先删除first_name前导列索引:
drop index idx1_t1 on t1;


单个值同样没有问题:using idx2_t1 (status=1, first_name=t2.first_name).

点击(此处)折叠或打开

  1. flush status;
  2. explain analyze
  3. select * from t1,t2
  4. where t1.first_name=t2.first_name and t1.status in (1)
  5. and t2.last_name like 'ar%'\g

  6. *************************** 1. row ***************************
  7. explain: -> nested loop inner join (cost=7741.93 rows=17233) (actual time=0.097..77.225 rows=17234 loops=1)
  8.     -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.053..12.567 rows=730 loops=1)
  9.     -> index lookup on t1 using idx2_t1 (status=1, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.011..0.086 rows=24 loops=730)


  10. 前导列是in有多个值,关联列是非前导列,走了hash join,0.22s:
  11. explain analyze
  12. select * from t1,t2
  13. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  14. and t2.last_name like 'ar%'\g

  15. *************************** 1. row ***************************
  16. explain: -> inner hash join (t1.first_name = t2.first_name) (cost=1306095.83 rows=763905) (actual time=6.099..329.593 rows=51896 loops=1)
  17.     -> filter: (t1.`status` in (1,3,8)) (cost=25.42 rows=17657) (actual time=0.029..281.806 rows=89911 loops=1)
  18.         -> table scan on t1 (cost=25.42 rows=297939) (actual time=0.019..258.542 rows=300024 loops=1)
  19.     -> hash
  20.         -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'),
  21.         with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.035..5.303 rows=730 loops=1)

  22. explain analyze树形计划看的cost=1306095,和json,实际估算的costs不一样:12906876,json的和普通计划的一样

  23. show status like '%last_query_cost%';
  24. ----------------------------------
  25. | variable_name | value |
  26. ----------------------------------
  27. | last_query_cost | 12906876.294155 |
  28. ----------------------------------
  29. explain: {
  30.   "query_block": {
  31.     "select_id": 1,
  32.     "cost_info": {
  33.       "query_cost": "12906876.29"
  34.     },

禁用hash join,竟然t1被驱动走全表,没有选择组合索引,说明走索引效率低cost大,
因为删除了idx1_t1(first_name,status)索引,这里的关联列没有索引,而是在status列上有前导列
索引,
mysql基本不选择t1走索引,可以通过possible_keys看出,很显然,这里的cost计算不准才选择
这种不走索引的计划:   
"query_cost": "22313643.61"     < 走idx2_t1索引的68339620.40

点击(此处)折叠或打开

  1. explain
  2. select/*no_bnl(t1,t2) */ * from t1,t2
  3. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  4. and t2.last_name like 'ar%';

  5. --------------------------------------------------------------------------------------------------------------------------
  6. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
  7. --------------------------------------------------------------------------------------------------------------------------
  8. | 1 | simple | t2 | null | range | idx_t2 | idx_t2 | 50 | null | 730 | 100.00 | using index condition |
  9. | 1 | simple | t1 | null | all | idx2_t1 | null | null | null | 297939 | 5.93 | using where |
  10. --------------------------------------------------------------------------------------------------------------------------

通过optimizer_trace查看:
先计算单表走索引按照status查询的cost:

点击(此处)折叠或打开

  1. "analyzing_range_alternatives": {
  2.                     "range_scan_alternatives": [
  3.                       {
  4.                         "index": "idx2_t1",
  5.                         "ranges": [
  6.                           "status = 1",
  7.                           "status = 3",
  8.                           "status = 8"
  9.                         ],
  10.                         "index_dives_for_eq_ranges": true,
  11.                         "rowid_ordered": false,
  12.                         "using_mrr": false,
  13.                         "index_only": false,
  14.                         "in_memory": 0.111617,
  15.                         "rows": 176572,
  16.                         "cost": 75958.3,
  17.                         "chosen": false,
  18.                         "cause": "cost"
  19.                       }

mysql没有考虑t1走idx2_t1索引,直接按照全表扫描来,没有考虑走nested loops情况下,可以
走组合索引选择率,比前面计算的只走status要小很多的情况,
所以直接走全表扫描,也没有考虑hash join:  

点击(此处)折叠或打开

  1. "cost_for_plan": 328.76,
  2.                 "rest_of_plan": [
  3.                   {
  4.                     "plan_prefix": [
  5.                       "`t2`"
  6.                     ],
  7.                     "table": "`t1`",
  8.                     "best_access_path": {
  9.                       "considered_access_paths": [
  10.                         {
  11.                           "rows_to_scan": 297939,
  12.                           "filtering_effect": [
  13.                           ],
  14.                           "final_filtering_effect": 0.592645,
  15.                           "access_type": "scan",
  16.                           "resulting_rows": 176572,
  17.                           "cost": 2.23133e07,
  18.                           "chosen": true
  19.                         }
  20.                       ]
  21.                     },

所以t2作为被驱动表,如果关联列不是前导列,则必须要有前导列且等值条件(in一个值也转为
等值),才可以让t2走nested loops,走索引。


强制走索引,可以看到rows=176572,filtered=10,key_len=5,索引效率很低,只走了status列
索引,first_name是icp:
json显示的执行 "query_cost": "68339620.40",走索引的cost大,这个计算不准,所以默认走了
全表

点击(此处)折叠或打开

  1. explain
  2. select/*no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2
  3. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  4. and t2.last_name like 'ar%';
  5. ---------------------------------------------------------------------------------------------------------------------------
  6. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
  7. ---------------------------------------------------------------------------------------------------------------------------
  8. | 1 | simple | t2 | null | range | idx_t2 | idx_t2 | 50 | null | 730 | 100.00 | using index condition |
  9. | 1 | simple | t1 | null | range | idx2_t1 | idx2_t1 | 5 | null | 176572 | 10.00 | using index condition |
  10. ---------------------------------------------------------------------------------------------------------------------------

explain analyze结果:执行耗时16s

点击(此处)折叠或打开

  1. explain analyze select/*no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2
  2. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  3. and t2.last_name like 'ar%'\g
  4. *************************** 1. row ***************************
  5. explain: -> nested loop inner join (cost=16180534643.87 rows=128897560) (actual time=8.275..16590.253 rows=51896 loops=1)
  6.     -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.083..8.235 rows=730 loops=1)
  7.     -> index range scan on t1 using idx2_t1 over (status = 1) or (status = 3) or (status = 8), with index condition: ((t1.first_name = t2.first_name) and (t1.`status` in (1,3,8))) (cost=75958.29 rows=176572) (actual time=3.688..22.707 rows=71 loops=730)
  8.     

  9. 1 row in set (16.62 sec)


可以看到,不走关联列索引,走status in条件索引,效率比前面走first_name列索引差很多,因为
status列索引选择率不好。
可以看到,前导列in list 非前导列关联的,也不能全部走index access,和单表不一样。


这种前导列in list且是非关联列效率很低的,既然in list走不了多列索引扫描,那么改成等值就
可以了,
改为union all提升很明显,这样可以利用组合索引进行扫描:执行时间从16s到0.29s:

点击(此处)折叠或打开

  1. explain analyze
  2. select/*index(t1 idx2_t1)*/ * from t1,t2
  3. where t1.first_name=t2.first_name and t1.status in (1)
  4. and t2.last_name like 'ar%'
  5. union all
  6. select/*index(t1 idx2_t1)*/ * from t1,t2
  7. where t1.first_name=t2.first_name and t1.status in (3)
  8. and t2.last_name like 'ar%'
  9. union all
  10. select/*index(t1 idx2_t1)*/ * from t1,t2
  11. where t1.first_name=t2.first_name and t1.status in (8)
  12. and t2.last_name like 'ar%'\g
  13. *************************** 1. row ***************************
  14. explain: -> append (actual time=0.064..261.577 rows=51896 loops=1)
  15.     -> stream results (cost=7741.93 rows=17233) (actual time=0.063..89.491 rows=17234 loops=1)
  16.         -> nested loop inner join (cost=7741.93 rows=17233) (actual time=0.056..64.456 rows=17234 loops=1)
  17.             -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.035..3.081 rows=730 loops=1)
  18.             -> index lookup on t1 using idx2_t1 (status=1, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.082 rows=24 loops=730)
  19.     -> stream results (cost=7741.93 rows=17233) (actual time=0.048..83.117 rows=17265 loops=1)
  20.         -> nested loop inner join (cost=7741.93 rows=17233) (actual time=0.044..59.954 rows=17265 loops=1)
  21.             -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.031..2.674 rows=730 loops=1)
  22.             -> index lookup on t1 using idx2_t1 (status=3, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.077 rows=24 loops=730)
  23.     -> stream results (cost=7741.93 rows=17233) (actual time=0.043..85.437 rows=17397 loops=1)
  24.         -> nested loop inner join (cost=7741.93 rows=17233) (actual time=0.039..61.487 rows=17397 loops=1)
  25.             -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.028..2.909 rows=730 loops=1)
  26.             -> index lookup on t1 using idx2_t1 (status=8, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.078 rows=24 loops=730)

  27. 1 row in set (0.29 sec)
###将索引列in list里值用with构造中间表,然后关联,这样先与驱动表做无条件join,然后驱动
in list表(实际上扩大了驱动表行数*inlist数目),
将in list转为等值条件,这样充分利用组合索引的多列做index access提高效率(主要是索引前导
列效率不好的时候使用)。


想一种方法怎么避免union all,很显然in list里三个数字(1,3,8)可以用with构造个临时表,
然后join,效率变成0.19s:


点击(此处)折叠或打开

  1. explain analyze
  2. with tmp as
  3. (select 1 status union all select 3 status union all select 8 status)
  4. select/*no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2,tmp
  5. where t1.first_name=t2.first_name and t1.status = tmp.status
  6. and t2.last_name like 'ar%'\g

  7. explain: -> nested loop inner join (cost=22790.57 rows=51698) (actual time=3.096..171.256 rows=51896 loops=1)
  8.     -> inner hash join (no condition) (cost=551.06 rows=2190) (actual time=3.049..3.937 rows=2190 loops=1)
  9.         -> table scan on tmp (cost=1.15..2.84 rows=3) (actual time=0.025..0.031 rows=3 loops=1)
  10.             -> materialize union cte tmp (cost=0.30..0.30 rows=3) (actual time=0.022..0.022 rows=3 loops=1)
  11.                 -> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  12.                 -> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  13.                 -> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  14.         -> hash
  15.             -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.046..2.756 rows=730 loops=1)
  16.     -> index lookup on t1 using idx2_t1 (status=tmp.`status`, first_name=t2.first_name), with index condition: (t1.`status` = tmp.`status`) (cost=7.80 rows=24) (actual time=0.008..0.075 rows=24 loops=2190)
  17. 1 row in set (0.19 sec)

同样的原来走idx1_t1的以first_name为前导列的也可以改成两个列都走索引,只不过因为
first_name选择性本来就够好,所以效率差别不大:基本都是0.2s左右:


点击(此处)折叠或打开

  1. explain analyze
  2. with tmp as
  3. (select 1 status union all select 3 status union all select 8 status)
  4. select/*no_bnl(t1,t2) index(t1 idx1_t1) */ * from t1,t2,tmp
  5. where t1.first_name=t2.first_name and t1.status = tmp.status
  6. and t2.last_name like 'ar%'\g

  7. *************************** 1. row ***************************
  8. explain: -> nested loop inner join (cost=22524.20 rows=51079) (actual time=2.879..171.640 rows=51896 loops=1)
  9.     -> inner hash join (no condition) (cost=551.06 rows=2190) (actual time=2.838..3.779 rows=2190 loops=1)
  10.         -> table scan on tmp (cost=1.15..2.84 rows=3) (actual time=0.049..0.055 rows=3 loops=1)
  11.             -> materialize union cte tmp (cost=0.30..0.30 rows=3) (actual time=0.046..0.046 rows=3 loops=1)
  12.                 -> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  13.                 -> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  14.                 -> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  15.         -> hash
  16.             -> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.045..2.543 rows=730 loops=1)
  17.     -> index lookup on t1 using idx1_t1 (first_name=t2.first_name, status=tmp.`status`), with index condition: (t1.`status` = tmp.`status`) (cost=7.70 rows=23) (actual time=0.008..0.075 rows=24 loops=2190)

  18. 1 row in set (0.20 sec)

总结:
mysql里组合索引,如果有in多个值的,对于单表,可以转为or多个索引区间进行索引扫描,索引
扫描可以用上inlist列条件。

但是对于多表join的,如果被驱动表组合索引有in条件:
1)如果关联列是前导列且等值的,index access则是使用关联列,in非前导列多个值的,只能是icp
     ,这种关联列选择性好,一般问题不大
2) 如果关联列是非前导列,前导列是in条件,如果前导列in list选择性不好,一般走hash join,
    如果强制走索引,则走前导列in的,关联列则走icp,效率一般比关联列是前导列的差很多。
    针对这种,要么重新选择或建立以关联列(前提关联列选择性好)的索引,如果关联列选择
    性不好,组合索引选择性好,可以引入中间表,将in list里面的值用with构造为临时表,然后
    与原表关联,这样执行计划可以将构造的中间表先与驱动表关联(无条件join,实际上扩大
    了驱动表行数*inlist数目), 然后驱动in list表,这样in条件转为等值的(循环执行),则组
    合索引列都能用于index access。
         
所以对于多表join,子查询等,包含in条件的,in条件不要做前导列,而将关联列作为前导列(
前提选择性得好),这样可以利用关联列做nested loops扫描,in条件做icp,如果关联条件
选择性不好,可以使用with构造将in构造为临时表然后关联。
(效果没有oracle的inlist iterator算子好)。
      
另外注意一下,对于索引访问有icp的,examined_rows和handler_read_next并不能反应索引扫描
的行数,这两个变量如果是全表扫描或完全索引索引(无icp),而是实际扫描的行数,如果有
icp,则是索引访问返回的行数。

实际上examined_rows mysql设计的是server层扫描的行数,所以有icp的也是包含icp的结果行数
,非索引真正在stroage层扫描行数,其实大部分时候关心的是扫描索引的真实行数,mysql这么
设计,那么examined_rows的作用就小很多,因为大部分情况下,隐藏了真实索引扫描行数

        
阅读(119) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~
")); function link(t){ var href= $(t).attr('href'); href ="?url=" encodeuricomponent(location.href); $(t).attr('href',href); //setcookie("returnouturl", location.href, 60, "/"); }
网站地图