Find non-ASCII characters in varchar columns using SQL Server

遥遥无期 2020-12-02 14:38

How can rows containing non-ASCII characters be returned using SQL Server?
If you can show how to do it for one column, that would be great.

I am doing something like this

8 answers
  •  北海茫月
    2020-12-02 14:59

    I ran the various solutions on some real-world data: 12M rows, varchar length ~30, around 9k dodgy rows, no full-text index in play. The patindex solution is the fastest, and it also selects the most rows.

    (I pre-ran km. to set the cache to a known state, ran the 3 processes, and finally ran km. again; the last 2 runs of km. gave times within 2 seconds of each other.)

    patindex solution by Gerhard Weiss -- Runtime 0:38, returns 9144 rows

    select dodgyColumn from myTable fcc
    WHERE  patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,dodgyColumn ) >0
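
    To make the pattern easier to follow, here is a minimal sketch on a throwaway table variable. The table `@t`, its `id` column, and the sample values are hypothetical, and the example assumes a database whose default code page is single-byte (for example 1252), so that `CHAR(233)` yields an accented character. The character class `[^ !-~]` means "anything that is not a space or in the range `!` to `~`", and the binary collation makes that range compare by code point.

    ```sql
    -- Hypothetical sample data, for illustration only.
    DECLARE @t TABLE (id int, dodgyColumn varchar(30));

    INSERT INTO @t (id, dodgyColumn) VALUES
        (1, 'plain ascii text'),             -- printable ASCII only
        (2, 'caf' + CHAR(233)),              -- code 233, above the ASCII range
        (3, 'tab' + CHAR(9) + 'here'),       -- control character below 32
        (4, 'ends with DEL' + CHAR(127));    -- 127 lies outside space..tilde

    -- Select rows holding at least one character outside space (32) to tilde (126).
    SELECT id, dodgyColumn
    FROM @t
    WHERE PATINDEX('%[^ !-~]%' COLLATE Latin1_General_BIN, dodgyColumn) > 0;
    ```

    Rows 2 to 4 are intended to match and row 1 is not. Note that this pattern treats control characters and DEL as dodgy too, not only codes above 127.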
    

    the substring-numbers solution by MT. -- Runtime 1:16, returns 8996 rows

    select dodgyColumn from myTable fcc
    INNER JOIN dbo.Numbers32k dn ON dn.number<(len(fcc.dodgyColumn ))
    WHERE ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))<32 
        OR ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))>127
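
    One possible contributor to the lower row count, not verified against the benchmark data: as written, `dn.number<(len(...))` never reaches the last character of a value, and `>127` does not flag DEL (127), which the patindex pattern does. Below is a sketch of a variant that covers both cases. The table variable `@t` and the inline `Numbers` CTE are hypothetical stand-ins for `myTable` and `dbo.Numbers32k`, and a single-byte default code page is assumed.

    ```sql
    -- Hypothetical sample data, for illustration only.
    DECLARE @t TABLE (id int, dodgyColumn varchar(30));

    INSERT INTO @t (id, dodgyColumn) VALUES
        (1, 'abc' + CHAR(233)),         -- dodgy character in the last position
        (2, 'x' + CHAR(127) + 'y'),     -- DEL (127) in the middle
        (3, 'clean');                   -- printable ASCII only

    -- Stand-in numbers table: 1..30, enough for varchar(30).
    WITH Numbers AS (
        SELECT TOP (30) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
        FROM sys.all_objects
    )
    SELECT DISTINCT t.id, t.dodgyColumn      -- DISTINCT: one row per value, not per bad character
    FROM @t AS t
    INNER JOIN Numbers AS n
        ON n.number <= LEN(t.dodgyColumn)    -- <= so the last character is checked
    WHERE ASCII(SUBSTRING(t.dodgyColumn, n.number, 1)) < 32
       OR ASCII(SUBSTRING(t.dodgyColumn, n.number, 1)) > 126;  -- > 126 so DEL is flagged
    ```

    Rows 1 and 2 are intended to match here. With the original `<` and `>127` conditions, neither would be picked up.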
    

    udf solution by Deon Robertson -- Runtime 3:47, returns 7316 rows

    select dodgyColumn 
    from myTable 
    where dbo.udf_test_ContainsNonASCIIChars(dodgyColumn , 1) = 1
    
