问题
I have two query for same output
Slow Query:
SELECT
*
FROM
account_range
WHERE
is_active = 1 AND '8033576667466317' BETWEEN range_start AND range_end;
Execution Time: ~800 ms.
Explain:
+----+-------------+---------------+------------+------+-------------------------------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+-------------------------------------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | account_range | NULL | ALL | range_start,range_end,range_se_active_idx | NULL | NULL | NULL | 940712 | 2.24 | Using where |
+----+-------------+---------------+------------+------+-------------------------------------------+------+---------+------+--------+----------+-------------+
Very Fast Query: learnt from here
SELECT
*
FROM
account_range
WHERE
is_active = 1 AND
range_start = (SELECT
MAX(range_start)
FROM
account_range
WHERE
range_start <= '8033576667466317') AND
range_end = (SELECT
MIN(range_end)
FROM
account_range
WHERE
range_end >= '8033576667466317')
Execution Time: ~1ms
Explain:
+----+-------------+---------------+------------+------+-------------------------------------------+---------------------+---------+-------------------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+-------------------------------------------+---------------------+---------+-------------------+------+----------+------------------------------+
| 1 | PRIMARY | account_range | NULL | ref | range_start,range_end,range_se_active_idx | range_se_active_idx | 125 | const,const,const | 1 | 100.00 | NULL |
| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 2 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+---------------+------------+------+-------------------------------------------+---------------------+---------+-------------------+------+----------+------------------------------+
Table Structure:
CREATE TABLE account_range (
id int(11) unsigned NOT NULL AUTO_INCREMENT,
range_start varchar(20) NOT NULL,
range_end varchar(20) NOT NULL,
is_active tinyint(1) NOT NULL,
bank_name varchar(100) DEFAULT NULL,
addedon timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
updatedon timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
description text,
PRIMARY KEY (id),
KEY range_start (range_start),
KEY range_end (range_end),
KEY range_se_active_idx (range_start , range_end , is_active)
) ENGINE=InnoDB AUTO_INCREMENT=946132 DEFAULT CHARSET=utf8;
Please do explain Why doesn't MySql automatically optimizes BETWEEN query?
Update:
Realised my mistake from @kordirko answer. My table contains only non-overlapping
ranges, so both queries are returning same results.
回答1:
Such a comparision doesn't make sense, since you are comparing apples to oranges.
These two queries are not eqivalent, they give different resuts,
thus MySql optimises them in a different way and their plans can differ.
See this simple example: http://sqlfiddle.com/#!9/98678/2
create table account_range(
is_active int,
range_start int,
range_end int
);
insert into account_range values
(1,-20,100), (1,10,30);
First query gives 2 rows:
select * from account_range
where is_active = 1 and 25 between range_start AND range_end;
| is_active | range_start | range_end |
|-----------|-------------|-----------|
| 1 | -20 | 100 |
| 1 | 10 | 30 |
Second query gives only 1 row:
SELECT * FROM account_range
WHERE
is_active = 1 AND
range_start = (SELECT MAX(range_start)
FROM account_range
WHERE range_start <= 25
) AND
range_end = (SELECT MIN(range_end)
FROM account_range
WHERE range_end >= 25
)
| is_active | range_start | range_end |
|-----------|-------------|-----------|
| 1 | 10 | 30 |
To speed this query up (the first one), two bitmap indexes can be used together with "bitmap and" operation - but MySql doesn't have such a feature.
Another option is a spatial index (for example GIN indexes in PostgreSql: http://www.postgresql.org/docs/current/static/textsearch-indexes.html).
And another option is a star transformation (or a star schema) - you need to "divide" this table into two "dimension" or "measures" tables and one "fact" table .... but this is too broad topic, if you want know more you can start from here: https://en.wikipedia.org/wiki/Star_schema
回答2:
Second query is fast because MySQL is able to use available index.
SELECT * FROM account_range
WHERE
is_active = 1 AND
range_start = a_constant_value_1 AND
range_end = a_constant_value_2
Above query is fast because range_se_active_idx
index can satisfy search criteria so it is used.
Both subqueries are also fast (see Select tables optimized away
in EXPLAIN's output)
SELECT MAX(range_start) FROM account_range
WHERE range_start <= '8033576667466317'
SELECT MIN(range_end) FROM account_range
WHERE range_end >= '8033576667466317'
because range_start
and range_end
both are indexed, they are ordered.
With ordered data, for first sub query, MySQL basically just picks one record whose range_start
equals 8033576667466317
or one below it (MAX(range_start)
). For second sub query, MySQL picks one record whose range_end
equals 8033576667466317
or one above it (MIN(range_end)
).
For BETWEEN ... AND ..
query, MySQL cannot find any indices because that is not a range search. It's basically same as
SELECT * FROM account_range
WHERE
is_active = 1 AND
range_start >= '8033576667466317' AND
range_end <= '8033576667466317';
It has to search records with range_start
from 8033576667466317
to largest value and also from smallest range_end
to 8033576667466317
. All indices cannot satisfy this criteria so it has to scan table.
I believe it can be optimized if you can rewrite it into something like this:
SELECT * FROM account_range
WHERE
is_active = 1 AND
(range_start BETWEEN a_min_value AND a_max_value) AND
(range_end BETWEEN a_min_value AND a_max_value);
来源:https://stackoverflow.com/questions/36029086/why-doesnt-mysql-automatically-optimises-between-query