COLLATE in UDF does not work as expected

Submitted by a 夏天 on 2019-12-19 17:56:07

Question


I have a table with a text field. I want to select the rows where the text is in all caps. This code works as it should and returns ABC:

SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE 
txt COLLATE SQL_Latin1_General_CP1_CS_AS = UPPER(txt)

Then I create a UDF (as suggested here):

CREATE FUNCTION [dbo].[fnsConvert]
(
      @p NVARCHAR(2000) ,
      @c NVARCHAR(2000)
)
RETURNS NVARCHAR(2000)
AS
    BEGIN
        IF ( @c = 'SQL_Latin1_General_CP1_CS_AS' )
            SET @p = @p COLLATE SQL_Latin1_General_CP1_CS_AS
        RETURN @p    
    END

and run it as follows (which looks like equivalent code to me):

SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE 
dbo.fnsConvert(txt, 'SQL_Latin1_General_CP1_CS_AS') = UPPER(txt)

However, this returns ABC as well as cdf.

Why is that so, and how do I get this to work?

PS: I need a UDF here so that I can call the case-sensitive comparison from the .NET LINQ to SQL provider.


Answer 1:


A variable cannot have its own collation. It always uses the default collation of the current database. Check this:

--I declare three variables, each of which gets its own collation, or at least one might think so:

DECLARE @deflt VARCHAR(100) = 'aBc'; --Latin1_General_CI_AS in my system
DECLARE @Arab VARCHAR(100) = 'aBc' COLLATE Arabic_100_CS_AS_WS_SC;
DECLARE @Rom VARCHAR(100) = 'aBc' COLLATE Romanian_CI_AI

--Now check this. All three variables are reported with the default collation:

SELECT [name], system_type_name, collation_name
FROM sys.dm_exec_describe_first_result_set(N'SELECT @deflt AS Deflt, @Arab AS Arab, @Rom AS Rom'
                                          ,N'@deflt varchar(100), @Arab varchar(100),@Rom varchar(100)'
                                          ,0);

/*
name    system_type_name    collation_name
Deflt   varchar(100)        Latin1_General_CI_AS
Arab    varchar(100)        Latin1_General_CI_AS
Rom     varchar(100)        Latin1_General_CI_AS
*/

--Now we check a simple comparison of "aBc" against "ABC"

SELECT CASE WHEN @deflt = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckDefault
      ,CASE WHEN @Arab = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckArab
      ,CASE WHEN @Rom = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckRom

/*CI    CI  CI*/

--But we can specify the collation for one given action!

SELECT CASE WHEN @deflt = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckDefault
      ,CASE WHEN @Arab = 'ABC' COLLATE Arabic_100_CS_AS_WS_SC THEN 'CI' ELSE 'CS' END AS CheckArab
      ,CASE WHEN @Rom = 'ABC' COLLATE Romanian_CI_AI THEN 'CI' ELSE 'CS' END AS CheckRom

/*CI    CS  CI*/

--But a table's column will behave differently:

CREATE TABLE #tempTable(deflt VARCHAR(100)
                       ,Arab VARCHAR(100) COLLATE Arabic_100_CS_AS_WS_SC
                       ,Rom VARCHAR(100) COLLATE Romanian_CI_AI);

INSERT INTO #tempTable(deflt,Arab,Rom) VALUES('aBc','aBc','aBc');

SELECT [name], system_type_name, collation_name
FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM #tempTable',NULL,0);
DROP TABLE #tempTable;

/*
name    system_type_name    collation_name
deflt   varchar(100)        Latin1_General_CI_AS
Arab    varchar(100)        Arabic_100_CS_AS_WS_SC
Rom     varchar(100)        Romanian_CI_AI
*/

--This applies to declared table variables as well. The comparison "knows" the specified collation:

DECLARE @TableVariable TABLE(deflt VARCHAR(100)
                            ,Arab VARCHAR(100) COLLATE Arabic_100_CS_AS_WS_SC
                            ,Rom VARCHAR(100) COLLATE Romanian_CI_AI);

INSERT INTO @TableVariable(deflt,Arab,Rom) VALUES('aBc','aBc','aBc');

SELECT CASE WHEN tv.deflt = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckDefault
      ,CASE WHEN tv.Arab = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckArab
      ,CASE WHEN tv.Rom = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckRom
FROM @TableVariable AS tv

/*CI    CS  CI*/
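The same metadata check can be pointed at the scalar function from the question. This is only a minimal sketch: it assumes dbo.fnsConvert has been created exactly as shown in the question, and the column alias Converted is purely illustrative.

```sql
-- Describe the result of calling the scalar UDF from the question.
-- Assumes dbo.fnsConvert exists as defined in the question.
SELECT [name], system_type_name, collation_name
FROM sys.dm_exec_describe_first_result_set(
         N'SELECT dbo.fnsConvert(N''aBc'', N''SQL_Latin1_General_CP1_CS_AS'') AS Converted'
        ,NULL
        ,0);
```

If collation_name comes back as the database default rather than SQL_Latin1_General_CP1_CS_AS, that would indicate that the COLLATE applied inside the function body did not survive the RETURN.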

UPDATE: Some documentation

At this link you can read about the details. A collation does not change the value. It applies a rule (similar to NOT NULL, which does not change the values either, but only adds the rule whether NULL may be stored or not).

The documentation states it clearly:

Is a clause that can be applied to a database definition or a column definition to define the collation, or to a character string expression to apply a collation cast.

A bit later it lists the levels at which COLLATE can be specified:

  1. Creating or altering a database
  2. Creating or altering a table column
  3. Casting the collation of an expression

UPDATE 2: A suggestion for a solution

If you want to control whether a comparison is case-sensitive (CS) or case-insensitive (CI), you might try this:

DECLARE @tbl TABLE(SomeValueInDefaultCollation VARCHAR(100));
INSERT INTO  @tbl VALUES ('ABC'),('aBc');

DECLARE @CompareCaseSensitive BIT = 0;
DECLARE @SearchFor VARCHAR(100) = 'aBc';

SELECT *
FROM @tbl 
WHERE (@CompareCaseSensitive=1 AND SomeValueInDefaultCollation=@SearchFor COLLATE Latin1_General_CS_AS)
   OR (ISNULL(@CompareCaseSensitive,0)=0 AND SomeValueInDefaultCollation=@SearchFor COLLATE Latin1_General_CI_AS);

With @CompareCaseSensitive set to 1 it returns just aBc; with NULL or 0 it returns both rows.

This will almost certainly perform much better than a UDF.




Answer 2:


Try the BINARY_CHECKSUM function instead; there is no need for a UDF:

SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE 
BINARY_CHECKSUM(txt)= BINARY_CHECKSUM(UPPER(txt))



Answer 3:


I think you are confused about how collation works. If you want to force a case-sensitive collation, do it in your WHERE predicate, not with a function like that. Scalar functions are also horrible for performance.

Here is how you can use a collation for this type of thing:

SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE txt collate SQL_Latin1_General_CP1_CS_AS = UPPER(txt)



Answer 4:


Here's what I did: I changed the function to perform a comparison, instead of setting the collation, and then return a 1 or 0.

CREATE FUNCTION [dbo].[fnsConvert]
(
      @p NVARCHAR(2000) ,
      @c NVARCHAR(2000)
)
RETURNS BIT
AS

    BEGIN
        DECLARE @result BIT

        IF ( @c = 'SQL_Latin1_General_CP1_CS_AS' )
        BEGIN
            IF @p COLLATE SQL_Latin1_General_CP1_CS_AS = UPPER(@p) 
                SET @result = 1
            ELSE
                SET @result = 0    
        END
        ELSE
            SET @result = 0 

        RETURN @result
    END

Then the query that uses the function changes just a bit.

SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE 
dbo.fnsConvert(txt, 'SQL_Latin1_General_CP1_CS_AS') = 1



Answer 5:


As @Shnugo stated, a collation is not an attribute of a variable, but it can be an attribute of a column definition.

For a collation-aware comparison outside of T-SQL, you can define a (persisted) computed column with an explicit collation:

create table Q47890189 (
    txt nvarchar(100),
    colltxt as txt collate SQL_Latin1_General_CP1_CS_AS persisted
)

insert into Q47890189 (txt) values ('ABC')
insert into Q47890189 (txt) values ('cdf')

select * from Q47890189 where txt = UPPER(txt)
select * from Q47890189 where colltxt = UPPER(colltxt)

Note that a persisted column can also be indexed, and it performs better than calling a scalar function.
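As a rough sketch of the indexing remark, assuming the table Q47890189 from above already exists and the session uses the SET options required for indexes on computed columns (the defaults for most client connections). The index name IX_Q47890189_colltxt is hypothetical.

```sql
-- IX_Q47890189_colltxt is a hypothetical index name.
CREATE INDEX IX_Q47890189_colltxt ON Q47890189 (colltxt);

-- The comparison is evaluated under the column's case-sensitive collation,
-- so no COLLATE clause is needed in the query itself.
SELECT txt
FROM Q47890189
WHERE colltxt = N'ABC';
```

Because the collation lives in the column definition, a query generated by a client library can filter on colltxt without having to emit a COLLATE clause.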




Answer 6:


COLLATE: Is a clause that can be applied to a database definition or a column definition to define the collation, or to a character string expression to apply a collation cast.

COLLATE does not convert any column or variable. It only defines which collation rules apply.

CREATE TABLE [dbo].[OINV] (
    [CardCode] [nvarchar](50) NULL
)

If I have a table with 5175460 rows, converting this column to another data type takes time, because each of its values is converted to the new data type.

alter table OINV
alter column CardCode varchar(50)
--1 min 45 sec

alter table OINV
alter column CardCode nvarchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS

If I don't convert the data type and only change the collation, it takes 1 ms. That means it does not convert 5175460 rows to the new collation. It just defines the collation on that column.

When this column is used in a WHERE condition, it exhibits the characteristics of that collation.
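A small sketch of what that looks like in practice, assuming dbo.OINV has been altered as shown above. The first query only reads catalog metadata; the second relies on the column's collation rather than an explicit COLLATE clause.

```sql
-- Confirm which collation the column now carries.
SELECT c.[name], c.collation_name
FROM sys.columns AS c
WHERE c.[object_id] = OBJECT_ID(N'dbo.OINV')
  AND c.[name] = N'CardCode';

-- No COLLATE clause here: the comparison follows the column's collation,
-- so with a CS collation only all-uppercase values should match.
SELECT CardCode
FROM dbo.OINV
WHERE CardCode = UPPER(CardCode);
```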

A UDF or TVF is not a performant way to do this. The best way is to alter the table.

Another example:

declare @i varchar(60)='ABC'

SELECT txt
FROM (SELECT 'abc' AS txt UNION SELECT 'cdf') t
WHERE 
txt  = @i COLLATE SQL_Latin1_General_CP1_CS_AS

I can't declare it like this:

declare @i varchar(60) COLLATE SQL_Latin1_General_CP1_CS_AS='ABC'

So a variable exhibits a collation's characteristics only where it is used together with COLLATE.

In your case, the function returns only a plain variable.

The UDF way of doing it:

CREATE FUNCTION testfn (
    @test VARCHAR(100)
    ,@i INT
    )
RETURNS TABLE
AS
RETURN (
        -- insert into @t values(@test)
        SELECT @test COLLATE SQL_Latin1_General_CP1_CS_AS AS a
        )


SELECT *
FROM (
    SELECT 'ABC' AS txt

    UNION

    SELECT 'cdf'
    ) t
OUTER APPLY dbo.testfn(txt, 0) fn
WHERE fn.a = UPPER(txt)

To support multiple collations you would have to define multiple functions, one per collation. A TVF can only return a static table schema, so only one collation can be defined per function.

Therefore a TVF is not the right way to perform your task.




Answer 7:


I agree with @Shnugo: when you create a local variable, it takes the default collation.

But you could explicitly apply your chosen collation to the value returned by the function, as follows:

select * from 
(SELECT 'ABC' AS txt UNION SELECT 'cdf') a
where (dbo.fnsConvert(txt, 'SQL_Latin1_General_CP1_CS_AS') 
collate SQL_Latin1_General_CP1_CS_AS)  = UPPER(txt)

In addition, the COLLATE clause can only be applied to a database definition, a column definition, or a character string expression. In other words, as a definition it is used for database objects such as tables, columns, and indexes.

The collation_name itself can't be supplied through a variable or an expression.
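One common workaround for that last restriction is dynamic SQL, where the collation name is spliced into the statement text. The following is only a sketch under assumptions: @collation stands in for a hypothetical caller-supplied value, and the name is validated against sys.fn_helpcollations() before it is concatenated. Note that dynamic SQL cannot be executed inside a user-defined function, so this applies to a batch or a stored procedure only.

```sql
DECLARE @collation sysname = N'SQL_Latin1_General_CP1_CS_AS';  -- hypothetical input
DECLARE @sql nvarchar(max);

-- Only accept names that SQL Server itself reports as valid collations,
-- because the value is concatenated into the statement text.
IF EXISTS (SELECT 1 FROM sys.fn_helpcollations() WHERE [name] = @collation)
BEGIN
    SET @sql = N'SELECT txt
FROM (SELECT ''ABC'' AS txt UNION SELECT ''cdf'') t
WHERE txt COLLATE ' + @collation + N' = UPPER(txt);';

    EXEC sys.sp_executesql @sql;
END
```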




Answer 8:


MSDN clearly defines COLLATE:

Is a clause that can be applied to a database definition or a column definition to define the collation, or to a character string expression to apply a collation cast.

Do you see any mention of variables here?

If you need a UDF, just use a table-valued function:

CREATE FUNCTION dbo.test 
(   
    @text nvarchar(max)
)
RETURNS TABLE 
AS
RETURN 
(
    SELECT c COLLATE SQL_Latin1_General_CP1_CS_AS as txt
    FROM (VALUES (@text)) as t(c)
)
GO

And use it like:

;WITH cte AS (
    SELECT N'ABC' as txt
    UNION 
    SELECT N'cdf'
)

SELECT c.txt
FROM cte c
OUTER APPLY dbo.test (c.txt) t
WHERE t.txt = UPPER(c.txt)

Output:

txt
------
ABC


Source: https://stackoverflow.com/questions/47890189/collate-in-udf-does-not-work-as-expected