SQL Server why is index not used with OR

淺唱寂寞╮ 提交于 2021-02-08 15:11:54

问题


I've been studying indexes and trying to understand how they work and how I can use them to boost performance, but I'm missing something.

I have the following table:

Person:

| Id | Name | Email | Phone |
| 1  | John |  E1   |  P1   |
| 2  | Max  |  E2   |  P2   |

I'm trying to find the best way to index the columns Email and Phone considering that the queries will (most of the time) be of the form

[1] SELECT * FROM Person WHERE Email = '...' OR Phone = '...'
[2] SELECT * FROM Person WHERE Email = ...
[3] SELECT * FROM Person WHERE Phone = ...

I thought the best approach would be to create a single index using both columns:

CREATE NONCLUSTERED INDEX [IX_EmailPhone]
ON [dbo].[Person]([Email], [PhoneNumber]);

However, with the index above, only the query [2] benefits from an index seek, the others use index scan.

I also tried to create multiple index: one with both columns, one for email, and one for email. In this case, [2] and [3] use seek, but [1] continues to use scan.

Why can't the database use index with an or? What would be the best indexing approach for this table considering the queries?


回答1:


Create a separate index for each column.
By using hints we can force the optimizer to use/not use the indexes, so you can check the execution plan, get a feeling of the performance involved and understand the meaning of each path.

Go through my demo and consider the work involved in each path for the following scenarios -

  1. Only few rows satisfy the condition j=123.
    Only few rows satisfy the condition k=456.

  2. Most of the rows satisfy the condition j=123.
    Most of the rows satisfy the condition k=456.

  3. Only few rows satisfy the condition j=123.
    Most of the rows satisfy the condition k=456.

Try to think what path you would have chosen for each scenario.
Please feel free to ask questions.


Demo

;with t(n) as (select 0 union all select n+1 from t where n < 999)

select      1+t0.n+1000*t1.n                                as i
           ,floor(rand(cast (newid() as varbinary))*1000)   as j
           ,floor(rand(cast (newid() as varbinary))*1000)   as k 

into        t

from        t t0,t t1 

option       (maxrecursion 0)
;

create index t_j on t (j);
create index t_k on t (k);

update statistics t (t_j)
update statistics t (t_k)

Scan

select      *
from        t (forcescan)
where       j = 123
        or  k = 456
  • This is straightforward.

Seek

select      *
from        t (forceseek)
where       j = 123
        or  k = 456
  • "Index Seek": Each index is being seeked for the relevant values (123 and 456)
  • "Merge Join": The results (row IDs) are being concatenated (as in UNION ALL)
  • "Stream Aggregate": Duplicate row IDs are being eliminated
  • "Rid Lookup" + "Nested Loops": The row IDs are being used to retrieve the rows from the table (t)




回答2:


Use two separate indexes, one on (email) and one on (phone, email).

The OR is rather difficult. If your conditions were connected by AND rather than OR, then your index would be used for the first query (but not the third, because phone is not the first key in the index).

You can write the query as:

SELECT *
FROM Person 
WHERE Email = '...' 
UNION ALL
SELECT *
FROM Person 
WHERE Email <> '...' AND Phone = '...';

SQL Server should use the appropriate index for each subquery.



来源:https://stackoverflow.com/questions/40874599/sql-server-why-is-index-not-used-with-or

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!