Question
I have a table with 5-10 million records, which has 2 fields.
Example data:
Row  Field1    Field2
---  --------  ----------
1    0712334   072342344
2    06344534  083453454
3    06344534  0845645565
Given 2 variables:
variable1: 0634453445645
variable2: 08345345456756
I need to be able to query the table for the best matches as fast as possible.
The above example would produce 1 record (row 2).
What would be the fastest way to query the database for matches?
Note: the data and variables are always in this format (i.e. always numeric, may or may not have a leading zero; the fields are not unique, but the combination of both is).
My initial thought was to do something like this:
select blah where @variable1 like Field1 + '%' and @variable2 like Field2 + '%'
Please forgive my pseudo code if it's not correct, as this is more a fact-finding mission; however, I think I'm in the ballpark.
Note: I don't think any indexing can help here, though I'm guessing a memory-based table would speed this up.
Can anyone think of a better way of solving the problem? Any suggestions or comments would be greatly appreciated. Thanks.
Answer 1:
You can get a plan with a seek on an index on Field1 with a query like this:
declare @V1 varchar(20) = '0634453445645'
declare @V2 varchar(20) = '08345345456756'

select Field1,
       Field2
from YourTable
where Field1 like left(@V1, 4) + '%'
  and @V1 like Field1 + '%'
  and @V2 like Field2 + '%'
It does a range seek on the first four characters of Field1 and uses the full comparisons on Field1 and Field2 in a residual predicate.
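To make the predicate logic concrete, here is a small Python sketch (the data is taken from the question; the function name is illustrative, not part of the answer). It reproduces the two-stage filter: the seek predicate narrows candidates to Field1 values sharing the variable's first four characters, and the residual predicates check the full prefix match on both fields.

```python
def best_matches(rows, v1, v2):
    """Find rows where v1 starts with Field1 and v2 starts with Field2."""
    # Seek predicate: Field1 LIKE left(@V1, 4) + '%'
    candidates = [(f1, f2) for (f1, f2) in rows if f1.startswith(v1[:4])]
    # Residual predicates: @V1 LIKE Field1 + '%' AND @V2 LIKE Field2 + '%'
    return [(f1, f2) for (f1, f2) in candidates
            if v1.startswith(f1) and v2.startswith(f2)]

rows = [("0712334", "072342344"),
        ("06344534", "083453454"),
        ("06344534", "0845645565")]

print(best_matches(rows, "0634453445645", "08345345456756"))
# -> [('06344534', '083453454')]
```

The point of the extra `left(@V1, 4)` condition in the SQL is that it is sargable: the optimizer can seek the index to the narrow range of Field1 values beginning with those four characters, instead of scanning the whole table, while the residual predicates finish the job on that small candidate set.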

Answer 2:
There is no performance tip. Simple as that.
%something% means a table scan; indexes are not used because of the leading %. Full-text indexing won't work either, as you are not searching for full text but for part of a word.
Getting a faster machine to handle the table scans, and denormalizing, is about all you can do. 5-10 million rows should be fast enough on a decent computer. A memory-based table is not needed; you just need enough RAM to cache the table.
And that is pretty much it. Either find a way to get rid of the leading % or get hardware (mostly memory) fast enough to handle this.
OR handle it outside SQL Server: load the 5-10 million rows into a search service and use a better data structure. SQL, being generic, has to make compromises. But again, the partial match will kill pretty much any approach.
Answer 3:
Postgres has trigram indexes: http://www.postgresql.org/docs/current/interactive/pgtrgm.html
Maybe SQL Server has something like that?
Answer 4:
What is the shortest length of the values in columns Field1 and Field2? Call this number N.
Then create a select statement which asks for all prefixes of each variable, from length N up to the variable's full length. For example, say N = 10:
select distinct * from myTable
where Field1 in ('0634453445', '06344534456', '063445344564', '0634453445645')
  and Field2 in ('0834534545', '08345345456', '083453454567', '0834534545675', '08345345456756')
Write a small script which creates the query for you. Of course there is much more to optimize, but that requires (imho) changes to the structure of your table, and I can imagine that is something you don't want. At least you can give this approach a quick try.
Also, include the query plan when you try this approach in SSMS. The query plan will give you a good hint on how to organize your indexes.
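The query-generating script mentioned above could look like this (a sketch; the table and column names are taken from the example query, and the values are assumed to be digits only, so no quoting/escaping beyond the literal quotes is handled):

```python
def build_query(v1, v2, n=10):
    """Build the prefix IN-list query for two variables, assuming
    digit-only values and a minimum stored length of n."""
    # Expand each variable into every prefix from length n up to its
    # full length, quoted for the SQL IN list.
    in1 = ", ".join("'%s'" % v1[:i] for i in range(n, len(v1) + 1))
    in2 = ", ".join("'%s'" % v2[:i] for i in range(n, len(v2) + 1))
    return ("select distinct * from myTable\n"
            "where Field1 in (%s)\n"
            "and Field2 in (%s)" % (in1, in2))

print(build_query("0634453445645", "08345345456756"))
```

Run with the two variables from the question, this emits exactly the IN lists shown in the example above.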
Source: https://stackoverflow.com/questions/22165269/sql-server-performance-tips-for-like