SQL Server uses high CPU when searching inside nvarchar strings

孤独总比滥情好 2020-12-08 15:05

Check out the following example. It shows that searching within a Unicode string (nvarchar) is almost eight times as expensive as searching within a varchar string. And on par wit

5 answers
  •  感动是毒
    2020-12-08 15:27

    This is because the sorting rules for Unicode characters are more complicated than those for non-Unicode characters.

    But things are not as simple as varchar vs. nvarchar.

    You also have to consider SQL collations vs. Windows collations, as the following passage explains:

    SQL Server performs string comparisons of non-Unicode data defined with a Windows collation by using Unicode sorting rules. Because these rules are much more complex than non-Unicode sorting rules, they are more resource-intensive. So, although Unicode sorting rules are frequently more expensive, there is generally little difference in performance between Unicode data and non-Unicode data defined with a Windows collation.

    As stated, with a Windows collation SQL Server uses Unicode sorting rules for varchar data as well, so choosing varchar over nvarchar gives you no performance gain.

    Here is an example:

    -- Server default collation is Latin1_General_CI_AS
    create table test
    (
        testid int identity primary key,
        v varchar(36) COLLATE Latin1_General_CI_AS, --windows collation
        v_sql varchar(36) COLLATE SQL_Latin1_General_CP1_CI_AS, --sql collation
        nv nvarchar(36),
        filler char(500)
    )
    go
    
    set nocount on
    set statistics time off
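    -- populate v_sql as well; otherwise the first query below only scans NULLs
    insert test (v, v_sql, nv)
    select CAST (newid() as varchar(36)),
        CAST (newid() as varchar(36)),
        CAST (newid() as nvarchar(36))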
    go 1000000
    
    set statistics time on
    
    -- search varchar column with SQL collation
    select COUNT(1) from test where v_sql like '%abcd%' option (maxdop 1)
    -- CPU time = 625 ms,  elapsed time = 620 ms.
    
    -- search varchar column with Windows collation
    select COUNT(1) from test where v like '%abcd%' option (maxdop 1)
    -- CPU time = 3141 ms,  elapsed time = 3389 ms.
    
    -- search varchar column (Windows collation) with a Unicode pattern (uses convert_implicit)
    select COUNT(1) from test where v like N'%abcd%' option (maxdop 1)
    -- CPU time = 3203 ms,  elapsed time = 3209 ms.
    
    -- search nvarchar column with a Unicode pattern
    select COUNT(1) from test where nv like N'%abcd%' option (maxdop 1)
    -- CPU time = 3156 ms,  elapsed time = 3151 ms.
    

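    As you can see, there is no meaningful difference between varchar and nvarchar with a Windows collation (roughly 3.1 to 3.2 seconds of CPU time in each case), while the varchar column with the SQL collation needs only about 0.6 seconds.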
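
    To check which collation family each column of the example table actually uses, something like the following sketch may help. It assumes the table was created as dbo.test (the schema name is an assumption; the script above does not specify one) and relies on the convention that SQL collation names start with SQL_.

    ```sql
    -- List the character columns of the example table with their collations.
    -- Assumption: the table lives in the dbo schema (not stated in the original script).
    select c.name as column_name,
           t.name as data_type,
           c.collation_name,
           -- SQL collations carry an SQL_ prefix; anything else is treated as a Windows collation.
           -- [_] matches a literal underscore instead of the single-character wildcard.
           case when c.collation_name like 'SQL[_]%' then 'SQL collation'
                else 'Windows collation'
           end as collation_family
    from sys.columns as c
    join sys.types as t on t.user_type_id = c.user_type_id
    where c.object_id = object_id(N'dbo.test')
      and c.collation_name is not null   -- skips non-character columns such as testid
    order by c.column_id
    ```

    With the table definition above, v_sql should be reported as an SQL collation and v as a Windows collation, while nv and filler inherit the database default collation.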

    Note: SQL collations appear to be included only for backward compatibility and should not be used for new projects, even though they seem to perform better.
