oracle具有强大的优化器,一般来说,组合索引在oracle里不管是单表还是关联查询,只要符合leftmost prefix规则,都可以用于index access,只要统计信息等是准确的。
建表语句如下:
-
drop table t1;
-
drop table t2;
-
create table t1 as select * from employees;
-
alter table t1 add status int default 0;
-
update t1 set status=ceil(rand()*10);
-
create table t2 as select * from t1;
t1,t2表结构:
-
t1,t2表结构:
-
------------------------------------------------------
-
| field | type | null | key | default | extra |
-
------------------------------------------------------
-
| emp_no | int | no | | null | |
-
| birth_date | date | no | | null | |
-
| first_name | varchar(14) | no | | null | |
-
| last_name | varchar(16) | no | | null | |
-
| gender | enum('m','f') | no | | null | |
-
| hire_date | date | no | | null | |
-
| status | int | yes | | 0 | |
-
------------------------------------------------------
创建索引,将选择性好的放前面:
create index idx1_t1 on t1(first_name,status);
###对于mysql,看组合索引用了几个列参与索引access,可以通过索引列key_len查看,
也可以通过树形执行计划sing idx1_t1 over 有几个列组合查看。
或者可以查看status变量handler确定索引扫描区间数:
handler_read_key和handler_read_next,
handler_read_next是按照索引顺序扫描下一行的次数,也可以代表回表次数(比如有icp可以减少这
个值),
所以handler_read_next也不能判断索引实际扫描的行数,而是索引扫描结果行数。
看下执行计划:有using index condition,说明第二个列status in (1,2,8)还是icp条件,
索引access用到几个列,可以通过key_len看:
first_name的key_len=14*3 2 (变长的 2,utf8mb3是3个字节)=44,
status是int型,对应的key_len=4 1(允许null 1)=5
可以看到执行计划的key_len=49,说明两个条件都用到。
explain select * from t1
where t1.first_name='saniya' and t1.status in (1,2,8);
-
-------------------------------------------------------------------------------------------------------------------------
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
-
-------------------------------------------------------------------------------------------------------------------------
-
| 1 | simple | t1 | null | range | idx1_t1 | idx1_t1 | 49 | null | 78 | 100.00 | using index condition |
-
-------------------------------------------------------------------------------------------------------------------------
从树形计划上看:using idx1_t1 over 转为两个or分支,也就是2个区间,说明两个条件都用到,
对应的还有icp,不过这个icp貌似作用不大:
-
explain format=tree
-
select * from t1
-
where t1.first_name='saniya' and t1.status in (1,2,8)\g
-
*************************** 1. row ***************************
-
explain: -> index range scan on t1 using idx1_t1 over
-
(first_name = 'saniya' and status = 1) or (first_name = 'saniya' and status = 2) or (first_name = 'saniya' and status = 8),
-
with index condition: ((t1.first_name = 'saniya') and (t1.`status` in (1,2,8))) (cost=42.35 rows=78)
可以查看实际执行信息:
-
flush status;
-
explain analyze
-
select * from t1
-
where t1.first_name='saniya' and t1.status in (1,2,8)\g
-
*************************** 1. row ***************************
-
explain: -> index range scan on t1 using idx1_t1 over (first_name = 'saniya' and status = 1)
-
or (first_name = 'saniya' and status = 2) or (first_name = 'saniya' and status = 8),
-
with index condition: ((t1.first_name = 'saniya') and (t1.`status` in (1,2,8)))
-
(cost=42.35 rows=78) (actual time=0.038..0.668 rows=78 loops=1)
通过show status查看handler信息,可以看到handler_read_key=3,对应三个索引区间。
说明first_name和status都参与索引扫描,执行计划里显示的icp基本用不着,这时候
handler_read_next=扫描的行数。
-
show status like '%handler%';
-
-----------------------------------
-
| variable_name | value |
-
-----------------------------------
-
| handler_commit | 1 |
-
| handler_delete | 0 |
-
| handler_discover | 0 |
-
| handler_external_lock | 2 |
-
| handler_mrr_init | 0 |
-
| handler_prepare | 0 |
-
| handler_read_first | 0 |
-
| handler_read_key | 3 |
-
| handler_read_last | 0 |
-
| handler_read_next | 78 |
-
| handler_read_prev | 0 |
-
| handler_read_rnd | 0 |
-
| handler_read_rnd_next | 0 |
-
| handler_rollback | 0 |
-
| handler_savepoint | 0 |
-
| handler_savepoint_rollback | 0 |
-
| handler_update | 0 |
-
| handler_write | 0 |
-
-----------------------------------
-
-
select ps_thread_id(connection_id());
-
-------------------------------
-
| ps_thread_id(connection_id()) |
-
-------------------------------
-
| 50 |
-
-------------------------------
-
-
通过检查examined_rows=78行,也符合索引扫描行数,因为使用explain analyze,rows_sent=1:
-
----------------------------------------------------
-
| thread_id | rows_affected | rows_sent | rows_examined |
-
----------------------------------------------------
-
| 50 | 0 | 1 | 78 |
-
----------------------------------------------------
下面将索引顺序换一下,貌似都和in非前导列的一样,都有icp,都是改成or条件(实际上这个icp没
有啥用filtered=100):
将选择性不好的列放前面:
create index idx2_t1 on t1(status,first_name);
通过key_len和树形计划的using idx2_t1 over,可以看到两个列条件都用到:
现在让两个列都是in,看是否能转为2*3=6个索引区间:
-
explain select/*index(t1 idx2_t1)*/ * from t1
-
where t1.first_name in('saniya','aamer') and t1.status in (1,2,8);
-
-------------------------------------------------------------------------------------------------------------------------
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
-
-------------------------------------------------------------------------------------------------------------------------
-
| 1 | simple | t1 | null | range | idx2_t1 | idx2_t1 | 49 | null | 144 | 100.00 | using index condition |
-
-------------------------------------------------------------------------------------------------------------------------
-
-
select count(*) from t1 where status in (1,2,8);
-
----------
-
| count(*) |
-
----------
-
| 89751 |
-
----------
-
-
select count(*) from t1
-
where t1.first_name in('saniya','aamer')
-
and t1.status in (1,2,8);
-
----------
-
| count(*) |
-
----------
-
| 144 |
-
----------
实际执行看:
-
flush status;
-
explain analyze
-
select/*index(t1 idx2_t1)*/ * from t1
-
where t1.first_name in('saniya','aamer') and t1.status in (1,2,8)\g
-
*************************** 1. row ***************************
-
explain: -> index range scan on t1 using idx2_t1 over
-
(status = 1 and first_name = 'aamer') or (status = 1 and first_name = 'saniya')
-
or (4 more), with index condition: ((t1.first_name in ('saniya','aamer')) and (t1.`status` in (1,2,8)))
-
(cost=78.34 rows=144) (actual time=0.055..2.067 rows=144 loops=1)
执行计划里转为的or超过3个后面变为or (n more)
可以看到转为6个扫描区间:handler_read_key=6,
icp的实际用不上,所以两个列都是index access,对应handler_read_next和rows_examined都是
扫描的行数144:
-
show status like '%handler%';
-
-----------------------------------
-
| variable_name | value |
-
-----------------------------------
-
| handler_commit | 1 |
-
| handler_delete | 0 |
-
| handler_discover | 0 |
-
| handler_external_lock | 2 |
-
| handler_mrr_init | 0 |
-
| handler_prepare | 0 |
-
| handler_read_first | 0 |
-
| handler_read_key | 6 |
-
| handler_read_last | 0 |
-
| handler_read_next | 144 |
-
| handler_read_prev | 0 |
-
| handler_read_rnd | 0 |
-
| handler_read_rnd_next | 0 |
-
| handler_rollback | 0 |
-
| handler_savepoint | 0 |
-
| handler_savepoint_rollback | 0 |
-
| handler_update | 0 |
-
| handler_write | 0 |
-
-----------------------------------
-
-
select thread_id,rows_affected,rows_sent,rows_examined from performance_schema.events_statements_current where thread_id=50;
-
----------------------------------------------------
-
| thread_id | rows_affected | rows_sent | rows_examined |
-
----------------------------------------------------
-
| 50 | 0 | 1 | 144 |
-
----------------------------------------------------
总结:mysql单表访问,组合索引in不管是否是前导列,都能按照leftmost prefix规则,转为or,然后
转为多个索引区间(interval),
能够充分利用组合索引条件进行index access,执行计划里虽然有icp,基本用不着。
mysql没有inlist iterator算子,可以将in list转为多个区间从而实现单表访问inlist iterator效果。
2.2)多表join,inlist表作为被驱动表
看能否实现和单表一样的效果
先给t2的last_name建个索引,让t2做驱动表,t1被驱动:
create index idx_t2 on t2(last_name);
mysql组合索引有in的,如果in里只有一个值,和oracle一样,也会转为等值,则组合索引都是
等值的,可以直接access
mysql和oracle对于组合索引中条件in有多个值的的处理貌似不一样:
###mysql没有index inlist iterator算子,如果组合索引前导列是in并且有多个值,则不能转为多个
等值的组合条件,
这样只能in的列作为access,后续列作为icp,如果关联查询且被驱动走nl,则in的列作为单独条件,
作为索引access效率则可能很低,
所以在mysql里,关联列作为索引前导列较好,in的作为非前导列(用于icp条件,可能有bug将in
条件放到回表filter,则可能非常影响效率)。
1)不管in是否多个值,列是否是前导列,则都有icp
2)如果in的是非前导列,则in作为icp条件,这个和oracle一样
3)如果in是前导列,则in条件列对应的后续列作为icp条件,这个和oracle不一样,oracle有inlist
iterator算子,
将in转为or,变为多个索引等值条件,然后避免icp,mysql没有inlist interator算子,in作为前导列
在关联查询里效率可能很低,将关联列作为前导列较好。
###在mysql里in条件避免不了icp,所以关联查询的被驱动表,将关联列作为索引前导列,这样
可以access条件,in条件列作为非前导列使用icp,
如果将in条件作为前导列,则关联列只能用于icp,可能效率很低。
驱动表730行,被驱动表循环730次:
-
select count(*) from t2 where last_name like 'ar%';
-
----------
-
| count(*) |
-
----------
-
| 730 |
-
----------
###1.先测试关联列是前导列,然后 in list组合索引,走nested loops,t1被驱动:
-
explain select * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1,3,8)
-
and t2.last_name like 'ar%';
通过执行计划看,多表join和单表不一样,t1被驱动,走idx1_t1(first_name,status),对应的
key_len是44,不是49(两个列都用上是49),
可以看到filtered=59.26,也就是status in (1,3,8)起类似过滤效果是真正icp,过滤了接近40%的
数据,这个和oracle的多表join不一样,
在mysql里貌似多表join的索引有in list,不能让多列都走index access:
-
----------------------------------------------------------------------------------------------------------------------------------------------
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
-
----------------------------------------------------------------------------------------------------------------------------------------------
-
| 1 | simple | t2 | null | range | idx_t2 | idx_t2 | 50 | null | 730 | 100.00 | using index condition |
-
| 1 | simple | t1 | null | ref | idx2_t1,idx1_t1 | idx1_t1 | 44 | employees.t2.first_name | 234 | 59.26 | using index condition |
-
----------------------------------------------------------------------------------------------------------------------------------------------
-
2 rows in set, 1 warning (0.00 sec)
-
-
flush status;
-
explain analyze
-
select * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1,3,8)
-
and t2.last_name like 'ar%'\g
耗时0.22s,索引访问只能用到first_name,这里的first_name选择性比较好,这样status条件走不了
索引其实差别不大
改成union all效果不好,所以将关联列作为前导列较好,这样可以用到前导列索引:
show status like '%handler%';
handler_read_key=驱动表索引区间1 被驱动表730次(每次一个区间)=731,
handler_read_next这时候就不是索引真正扫描的次数了,因为被驱动表t1只走first_name列索引
扫描,还有icp:status in (1,3,8),
这时候只是结果行数,相应的examined_rows也一样,这时候都是结果行数,结果行数包括驱动
表t2和t1的索引结果行数。
-
(root@localhost) [employees_15:21:17]> show status like '%handler%';
-
-----------------------------------
-
| variable_name | value |
-
-----------------------------------
-
| handler_commit | 1 |
-
| handler_delete | 0 |
-
| handler_discover | 0 |
-
| handler_external_lock | 4 |
-
| handler_mrr_init | 0 |
-
| handler_prepare | 0 |
-
| handler_read_first | 0 |
-
| handler_read_key | 731 |
-
| handler_read_last | 0 |
-
| handler_read_next | 52626 |
-
| handler_read_prev | 0 |
-
| handler_read_rnd | 0 |
-
| handler_read_rnd_next | 0 |
-
| handler_rollback | 0 |
-
| handler_savepoint | 0 |
-
| handler_savepoint_rollback | 0 |
-
| handler_update | 0 |
-
| handler_write | 0 |
-
-----------------------------------
-
-
select thread_id,rows_affected,rows_sent,rows_examined from performance_schema.events_statements_current where thread_id=50;
-
----------------------------------------------------
-
| thread_id | rows_affected | rows_sent | rows_examined |
-
----------------------------------------------------
-
| 50 | 0 | 1 | 52626 |
-
----------------------------------------------------
-
1 row in set (0.01 sec)
-
-
flush status;
-
explain analyze
-
select * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1)
-
and t2.last_name like 'ar%'\g
如果in只有一个值,转为=,则访问条件是using idx1_t1 (first_name=t2.first_name, status=1),
没有问题:
-
*************************** 1. row ***************************
-
explain: -> nested loop inner join (cost=7653.14 rows=17026) (actual time=0.078..67.316 rows=17234 loops=1)
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'),
-
with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.049..6.793 rows=730 loops=1)
-
-> index lookup on t1 using idx1_t1 (first_name=t2.first_name, status=1)
-
(cost=7.70 rows=23) (actual time=0.009..0.081 rows=24 loops=730)
-
-
普通执行计划没有icp,key_len=49,两个列都用上:
-
------------------------------------------------------------------------------------------------------------------------------------------------------------
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
-
------------------------------------------------------------------------------------------------------------------------------------------------------------
-
| 1 | simple | t2 | null | range | idx_t2 | idx_t2 | 50 | null | 730 | 100.00 | using index condition |
-
| 1 | simple | t1 | null | ref | idx2_t1,idx1_t1,idx3_t1 | idx1_t1 | 49 | employees.t2.first_name,const | 23 | 100.00 | null |
-
------------------------------------------------------------------------------------------------------------------------------------------------------------
-
-
扫描的行数没有问题,被驱动t1的真实rows=24,这个是平均值。 730*24=17520略大于真实行数,然后驱动表700行,实际扫描17964行。
-
show status like 'handle%';
-
-----------------------------------
-
| variable_name | value |
-
-----------------------------------
-
| handler_commit | 1 |
-
| handler_delete | 0 |
-
| handler_discover | 0 |
-
| handler_external_lock | 4 |
-
| handler_mrr_init | 0 |
-
| handler_prepare | 0 |
-
| handler_read_first | 0 |
-
| handler_read_key | 731 |
-
| handler_read_last | 0 |
-
| handler_read_next | 17964 |
-
| handler_read_prev | 0 |
-
| handler_read_rnd | 0 |
-
| handler_read_rnd_next | 0 |
-
| handler_rollback | 0 |
-
| handler_savepoint | 0 |
-
| handler_savepoint_rollback | 0 |
-
| handler_update | 0 |
-
| handler_write | 0 |
-
-----------------------------------
-
-
select thread_id,rows_affected,rows_sent,rows_examined from performance_schema.events_statements_current where thread_id=50;
-
----------------------------------------------------
-
| thread_id | rows_affected | rows_sent | rows_examined |
-
----------------------------------------------------
-
| 50 | 0 | 1 | 17964 |
-
----------------------------------------------------
###2:测试in list列是前导列,关联条件列非前导列,被驱动走索引
先删除first_name前导列索引:
drop index idx1_t1 on t1;
单个值同样没有问题:using idx2_t1 (status=1, first_name=t2.first_name).
-
flush status;
-
explain analyze
-
select * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1)
-
and t2.last_name like 'ar%'\g
-
-
*************************** 1. row ***************************
-
explain: -> nested loop inner join (cost=7741.93 rows=17233) (actual time=0.097..77.225 rows=17234 loops=1)
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.053..12.567 rows=730 loops=1)
-
-> index lookup on t1 using idx2_t1 (status=1, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.011..0.086 rows=24 loops=730)
-
-
-
前导列是in有多个值,关联列是非前导列,走了hash join,0.22s:
-
explain analyze
-
select * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1,3,8)
-
and t2.last_name like 'ar%'\g
-
-
*************************** 1. row ***************************
-
explain: -> inner hash join (t1.first_name = t2.first_name) (cost=1306095.83 rows=763905) (actual time=6.099..329.593 rows=51896 loops=1)
-
-> filter: (t1.`status` in (1,3,8)) (cost=25.42 rows=17657) (actual time=0.029..281.806 rows=89911 loops=1)
-
-> table scan on t1 (cost=25.42 rows=297939) (actual time=0.019..258.542 rows=300024 loops=1)
-
-> hash
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'),
-
with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.035..5.303 rows=730 loops=1)
-
-
explain analyze树形计划看的cost=1306095,和json,实际估算的costs不一样:12906876,json的和普通计划的一样
-
-
show status like '%last_query_cost%';
-
----------------------------------
-
| variable_name | value |
-
----------------------------------
-
| last_query_cost | 12906876.294155 |
-
----------------------------------
-
explain: {
-
"query_block": {
-
"select_id": 1,
-
"cost_info": {
-
"query_cost": "12906876.29"
-
},
禁用hash join,竟然t1被驱动走全表,没有选择组合索引,说明走索引效率低cost大,
因为删除了idx1_t1(first_name,status)索引,这里的关联列没有索引,而是在status列上有前导列
索引,
mysql基本不选择t1走索引,可以通过possible_keys看出,很显然,这里的cost计算不准才选择
这种不走索引的计划:
"query_cost": "22313643.61" < 走idx2_t1索引的68339620.40
-
explain
-
select/*no_bnl(t1,t2) */ * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1,3,8)
-
and t2.last_name like 'ar%';
-
-
--------------------------------------------------------------------------------------------------------------------------
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
-
--------------------------------------------------------------------------------------------------------------------------
-
| 1 | simple | t2 | null | range | idx_t2 | idx_t2 | 50 | null | 730 | 100.00 | using index condition |
-
| 1 | simple | t1 | null | all | idx2_t1 | null | null | null | 297939 | 5.93 | using where |
-
--------------------------------------------------------------------------------------------------------------------------
通过optimizer_trace查看:
先计算单表走索引按照status查询的cost:
-
"analyzing_range_alternatives": {
-
"range_scan_alternatives": [
-
{
-
"index": "idx2_t1",
-
"ranges": [
-
"status = 1",
-
"status = 3",
-
"status = 8"
-
],
-
"index_dives_for_eq_ranges": true,
-
"rowid_ordered": false,
-
"using_mrr": false,
-
"index_only": false,
-
"in_memory": 0.111617,
-
"rows": 176572,
-
"cost": 75958.3,
-
"chosen": false,
-
"cause": "cost"
-
}
mysql没有考虑t1走idx2_t1索引,直接按照全表扫描来,没有考虑走nested loops情况下,可以
走组合索引选择率,比前面计算的只走status要小很多的情况,
所以直接走全表扫描,也没有考虑hash join:
-
"cost_for_plan": 328.76,
-
"rest_of_plan": [
-
{
-
"plan_prefix": [
-
"`t2`"
-
],
-
"table": "`t1`",
-
"best_access_path": {
-
"considered_access_paths": [
-
{
-
"rows_to_scan": 297939,
-
"filtering_effect": [
-
],
-
"final_filtering_effect": 0.592645,
-
"access_type": "scan",
-
"resulting_rows": 176572,
-
"cost": 2.23133e07,
-
"chosen": true
-
}
-
]
-
},
所以t2作为被驱动表,如果关联列不是前导列,则必须要有前导列且等值条件(in一个值也转为
等值),才可以让t2走nested loops,走索引。
强制走索引,可以看到rows=176572,filtered=10,key_len=5,索引效率很低,只走了status列
索引,first_name是icp:
json显示的执行 "query_cost": "68339620.40",走索引的cost大,这个计算不准,所以默认走了
全表
-
explain
-
select/*no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1,3,8)
-
and t2.last_name like 'ar%';
-
---------------------------------------------------------------------------------------------------------------------------
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
-
---------------------------------------------------------------------------------------------------------------------------
-
| 1 | simple | t2 | null | range | idx_t2 | idx_t2 | 50 | null | 730 | 100.00 | using index condition |
-
| 1 | simple | t1 | null | range | idx2_t1 | idx2_t1 | 5 | null | 176572 | 10.00 | using index condition |
-
---------------------------------------------------------------------------------------------------------------------------
explain analyze结果:执行耗时16s
-
explain analyze select/*no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1,3,8)
-
and t2.last_name like 'ar%'\g
-
*************************** 1. row ***************************
-
explain: -> nested loop inner join (cost=16180534643.87 rows=128897560) (actual time=8.275..16590.253 rows=51896 loops=1)
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.083..8.235 rows=730 loops=1)
-
-> index range scan on t1 using idx2_t1 over (status = 1) or (status = 3) or (status = 8), with index condition: ((t1.first_name = t2.first_name) and (t1.`status` in (1,3,8))) (cost=75958.29 rows=176572) (actual time=3.688..22.707 rows=71 loops=730)
-
-
-
1 row in set (16.62 sec)
可以看到,不走关联列索引,走status in条件索引,效率比前面走first_name列索引差很多,因为
status列索引选择率不好。
可以看到,前导列in list 非前导列关联的,也不能全部走index access,和单表不一样。
这种前导列in list且是非关联列效率很低的,既然in list走不了多列索引扫描,那么改成等值就
可以了,
改为union all提升很明显,这样可以利用组合索引进行扫描:执行时间从16s到0.29s:
-
explain analyze
-
select/*index(t1 idx2_t1)*/ * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (1)
-
and t2.last_name like 'ar%'
-
union all
-
select/*index(t1 idx2_t1)*/ * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (3)
-
and t2.last_name like 'ar%'
-
union all
-
select/*index(t1 idx2_t1)*/ * from t1,t2
-
where t1.first_name=t2.first_name and t1.status in (8)
-
and t2.last_name like 'ar%'\g
-
*************************** 1. row ***************************
-
explain: -> append (actual time=0.064..261.577 rows=51896 loops=1)
-
-> stream results (cost=7741.93 rows=17233) (actual time=0.063..89.491 rows=17234 loops=1)
-
-> nested loop inner join (cost=7741.93 rows=17233) (actual time=0.056..64.456 rows=17234 loops=1)
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.035..3.081 rows=730 loops=1)
-
-> index lookup on t1 using idx2_t1 (status=1, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.082 rows=24 loops=730)
-
-> stream results (cost=7741.93 rows=17233) (actual time=0.048..83.117 rows=17265 loops=1)
-
-> nested loop inner join (cost=7741.93 rows=17233) (actual time=0.044..59.954 rows=17265 loops=1)
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.031..2.674 rows=730 loops=1)
-
-> index lookup on t1 using idx2_t1 (status=3, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.077 rows=24 loops=730)
-
-> stream results (cost=7741.93 rows=17233) (actual time=0.043..85.437 rows=17397 loops=1)
-
-> nested loop inner join (cost=7741.93 rows=17233) (actual time=0.039..61.487 rows=17397 loops=1)
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.028..2.909 rows=730 loops=1)
-
-> index lookup on t1 using idx2_t1 (status=8, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.078 rows=24 loops=730)
-
-
1 row in set (0.29 sec)
###将索引列in list里值用with构造中间表,然后关联,这样先与驱动表做无条件join,然后驱动
in list表(实际上扩大了驱动表行数*inlist数目),
将in list转为等值条件,这样充分利用组合索引的多列做index access提高效率(主要是索引前导
列效率不好的时候使用)。
想一种方法怎么避免union all,很显然in list里三个数字(1,3,8)可以用with构造个临时表,
然后join,效率变成0.19s:
-
explain analyze
-
with tmp as
-
(select 1 status union all select 3 status union all select 8 status)
-
select/*no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2,tmp
-
where t1.first_name=t2.first_name and t1.status = tmp.status
-
and t2.last_name like 'ar%'\g
-
-
explain: -> nested loop inner join (cost=22790.57 rows=51698) (actual time=3.096..171.256 rows=51896 loops=1)
-
-> inner hash join (no condition) (cost=551.06 rows=2190) (actual time=3.049..3.937 rows=2190 loops=1)
-
-> table scan on tmp (cost=1.15..2.84 rows=3) (actual time=0.025..0.031 rows=3 loops=1)
-
-> materialize union cte tmp (cost=0.30..0.30 rows=3) (actual time=0.022..0.022 rows=3 loops=1)
-
-> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-
-> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-
-> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-
-> hash
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.046..2.756 rows=730 loops=1)
-
-> index lookup on t1 using idx2_t1 (status=tmp.`status`, first_name=t2.first_name), with index condition: (t1.`status` = tmp.`status`) (cost=7.80 rows=24) (actual time=0.008..0.075 rows=24 loops=2190)
-
1 row in set (0.19 sec)
同样的原来走idx1_t1的以first_name为前导列的也可以改成两个列都走索引,只不过因为
first_name选择性本来就够好,所以效率差别不大:基本都是0.2s左右:
-
explain analyze
-
with tmp as
-
(select 1 status union all select 3 status union all select 8 status)
-
select/*no_bnl(t1,t2) index(t1 idx1_t1) */ * from t1,t2,tmp
-
where t1.first_name=t2.first_name and t1.status = tmp.status
-
and t2.last_name like 'ar%'\g
-
-
*************************** 1. row ***************************
-
explain: -> nested loop inner join (cost=22524.20 rows=51079) (actual time=2.879..171.640 rows=51896 loops=1)
-
-> inner hash join (no condition) (cost=551.06 rows=2190) (actual time=2.838..3.779 rows=2190 loops=1)
-
-> table scan on tmp (cost=1.15..2.84 rows=3) (actual time=0.049..0.055 rows=3 loops=1)
-
-> materialize union cte tmp (cost=0.30..0.30 rows=3) (actual time=0.046..0.046 rows=3 loops=1)
-
-> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-
-> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-
-> rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-
-> hash
-
-> index range scan on t2 using idx_t2 over ('ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'ar??????????????'), with index condition: (t2.last_name like 'ar%') (cost=328.76 rows=730) (actual time=0.045..2.543 rows=730 loops=1)
-
-> index lookup on t1 using idx1_t1 (first_name=t2.first_name, status=tmp.`status`), with index condition: (t1.`status` = tmp.`status`) (cost=7.70 rows=23) (actual time=0.008..0.075 rows=24 loops=2190)
-
-
1 row in set (0.20 sec)
总结:
mysql里组合索引,如果有in多个值的,对于单表,可以转为or多个索引区间进行索引扫描,索引
扫描可以用上inlist列条件。
但是对于多表join的,如果被驱动表组合索引有in条件:
1)如果关联列是前导列且等值的,index access则是使用关联列,in非前导列多个值的,只能是icp
,这种关联列选择性好,一般问题不大
2) 如果关联列是非前导列,前导列是in条件,如果前导列in list选择性不好,一般走hash join,
如果强制走索引,则走前导列in的,关联列则走icp,效率一般比关联列是前导列的差很多。
针对这种,要么重新选择或建立以关联列(前提关联列选择性好)的索引,如果关联列选择
性不好,组合索引选择性好,可以引入中间表,将in list里面的值用with构造为临时表,然后
与原表关联,这样执行计划可以将构造的中间表先与驱动表关联(无条件join,实际上扩大
了驱动表行数*inlist数目), 然后驱动in list表,这样in条件转为等值的(循环执行),则组
合索引列都能用于index access。
所以对于多表join,子查询等,包含in条件的,in条件不要做前导列,而将关联列作为前导列(
前提选择性得好),这样可以利用关联列做nested loops扫描,in条件做icp,如果关联条件
选择性不好,可以使用with构造将in构造为临时表然后关联。
(效果没有oracle的inlist iterator算子好)。
另外注意一下,对于索引访问有icp的,examined_rows和handler_read_next并不能反应索引扫描
的行数,这两个变量如果是全表扫描或完全索引索引(无icp),而是实际扫描的行数,如果有
icp,则是索引访问返回的行数。
实际上examined_rows mysql设计的是server层扫描的行数,所以有icp的也是包含icp的结果行数
,非索引真正在stroage层扫描行数,其实大部分时候关心的是扫描索引的真实行数,mysql这么
设计,那么examined_rows的作用就小很多,因为大部分情况下,隐藏了真实索引扫描行数。