问题
We recently migrated from SQL Server 2012 to SQL Server 2014 and all our FOR XML
code started throwing errors about non-printable ASCII characters.
I wrote this horrible function to remove non-printable ASCII characters as a quick fix. I want to replace it with something cleaner. Is there a way to do this?
ALTER FUNCTION [dbo].[remove_non_printable_chars]
(@input_string nvarchar(max))
RETURNS nvarchar(max)
BEGIN
RETURN
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(@input_string,
CHAR(1), ''),CHAR(2), ''),CHAR(3), ''),CHAR(4), ''),CHAR(5), ''),CHAR(6), ''),CHAR(7), ''),CHAR(8), ''),CHAR(9), ''),CHAR(10), ''),
CHAR(11), ''),CHAR(12), ''),CHAR(13), ''),CHAR(14), ''),CHAR(15), ''),CHAR(16), ''),CHAR(17), ''),CHAR(18), ''),CHAR(19), ''),CHAR(20), ''),
CHAR(21), ''),CHAR(22), ''),CHAR(23), ''),CHAR(24), ''),CHAR(25), ''),CHAR(26), ''),CHAR(27), ''),CHAR(28), ''),CHAR(29), ''),CHAR(30), ''),
CHAR(31), ''), NCHAR(0) COLLATE Latin1_General_100_BIN2, '')
END
Here's the FOR XML
code that broke. (I did not write this. It was already in the code base).
SELECT @HTMLTableData =
(
SELECT HTMLRows
FROM (
SELECT N'<tr>' + HTMLRow + N'</tr>' AS HTMLRows
FROM @HTMLRowData
) mi
FOR XML PATH(''), TYPE
).value('/', 'NVARCHAR(MAX)')
回答1:
Another Option.
This function will replace control characters and correct any residual repeating spaces. For example Jane Smith{13}was here
will not be returned as Jane Smithwas here
, but rather Jane Smith was here
CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
Returns varchar(max)
Begin
;with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
Select @S = Replace(@S,C,' ')
From cte2
Return ltrim(rtrim(replace(replace(replace(@S,' ','†‡'),'‡†',''),'†‡',' ')))
End
--Select [dbo].[udf-Str-Strip-Control]('Michael '+char(13)+char(10)+'LastName') --Returns: Michael LastName
回答2:
In-line version:
create function [dbo].[remove_non_printable_chars] (@input_string nvarchar(max))
returns table with schemabinding as return (
select
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(@input_string collate latin1_general_100_bin2,
char(1), ''),char(2), ''),char(3), ''),char(4), ''),char(5), ''),char(6), ''),char(7), ''),char(8), ''),char(9), ''),char(10), ''),
char(11), ''),char(12), ''),char(13), ''),char(14), ''),char(15), ''),char(16), ''),char(17), ''),char(18), ''),char(19), ''),char(20), ''),
char(21), ''),char(22), ''),char(23), ''),char(24), ''),char(25), ''),char(26), ''),char(27), ''),char(28), ''),char(29), ''),char(30), ''),
char(31), ''), char(0) , '')
as clean_string
);
go
And use it like so:
select c.clean_string
from dbo.remove_non_printable_chars(@dirtystring) c
or
select ...
, c.clean_string
from t
cross apply dbo.remove_non_printable_chars(t.dirty_string) c
Reference:
- When is a SQL function not a function? "If it’s not inline, it’s rubbish." - Rob Farley
- Inline Scalar Functions - Itzik Ben-Gan
- Scalar functions, inlining, and performance: An entertaining title for a boring post - Adam Machanic
- TSQL User-Defined Functions: Ten Questions You Were Too Shy To Ask - Robert Sheldon
回答3:
Just extending the previous answers a little bit
Below the white-list characters, all the others chars will be cleared
[ !`"#$%&'()\*+,\-\./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\[\]^``\\_abcdefghijklmnopqrstuvwxyz{|}~µº°¡¢£¤¥¦§¨©ª«¬®¯±²³´¶·¸¹»¼½¾¿×÷ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]
I know it's ugly but it works.
CREATE FUNCTION [dbo].REPLACE_UNPRINT_CHARS(@VarString nvarchar(256))
RETURNS nvarchar(256)
AS
BEGIN
RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@VarString, CHAR(0), ''), CHAR(1), ''), CHAR(2), ''), CHAR(3), ''), CHAR(4), ''), CHAR(5), ''), CHAR(6), ''), CHAR(7), ''), CHAR(8), ''), CHAR(9), ''), CHAR(10), ''), CHAR(11), ''), CHAR(12), ''), CHAR(13), ''), CHAR(14), ''), CHAR(15), ''), CHAR(16), ''), CHAR(17), ''), CHAR(18), ''), CHAR(19), ''), CHAR(20), ''), CHAR(21), ''), CHAR(22), ''), CHAR(23), ''), CHAR(24), ''), CHAR(25), ''), CHAR(26), ''), CHAR(27), ''), CHAR(28), ''), CHAR(29), ''), CHAR(30), ''), CHAR(31), ''), CHAR(127), ''), CHAR(128), ''), CHAR(129), ''), CHAR(130), ''), CHAR(131), ''), CHAR(132), ''), CHAR(133), ''), CHAR(134), ''), CHAR(135), ''), CHAR(136), ''), CHAR(137), ''), CHAR(138), ''), CHAR(139), ''), CHAR(140), ''), CHAR(141), ''), CHAR(142), ''), CHAR(143), ''), CHAR(144), ''), CHAR(145), ''), CHAR(146), ''), CHAR(147), ''), CHAR(148), ''), CHAR(149), ''), CHAR(150), ''), CHAR(151), ''), CHAR(152), ''), CHAR(153), ''), CHAR(154), ''), CHAR(155), ''), CHAR(156), ''), CHAR(157), ''), CHAR(158), ''), CHAR(159), ''), CHAR(160), '');
END;
Use for data clean up
UPDATE [dnName].[dbo].[tableName]
SET FieldDirtyData= dbo.REPLACE_UNPRINT_CHARS(FieldDirtyData)
WHERE PATINDEX('%['+CHAR(1)+CHAR(2)+CHAR(3)+CHAR(4)+CHAR(5)+CHAR(6)+CHAR(7)+CHAR(8)+CHAR(9)+CHAR(10)+CHAR(11)+CHAR(12)+
CHAR(13)+CHAR(14)+CHAR(15)+CHAR(16)+CHAR(17)+CHAR(18)+CHAR(19)+CHAR(20)+
CHAR(21)+CHAR(22)+CHAR(23)+CHAR(24)+CHAR(25)+CHAR(26)+CHAR(27)+CHAR(28)+CHAR(29)+CHAR(30)+CHAR(31)+CHAR(127)+
CHAR(128)+CHAR(129)+CHAR(130)+CHAR(131)+CHAR(132)+CHAR(133)+CHAR(134)+CHAR(135)+CHAR(136)+CHAR(137)+CHAR(138)+
CHAR(139)+CHAR(140)+CHAR(141)+CHAR(142)+CHAR(143)+CHAR(144)+CHAR(145)+CHAR(146)+CHAR(147)+CHAR(148)+CHAR(149)+CHAR(150)+
CHAR(151)+CHAR(152)+CHAR(153)+CHAR(154)+CHAR(155)+CHAR(156)+CHAR(157)+CHAR(158)+CHAR(159)+CHAR(160)+']%', FieldDirtyData) <> 0
Adjust your datatype (nvarchar or varchar + max) as required
If you want to add more chars to clear use "select ASCII('char to remove here')" MSSQL command in order to get the ASCII code of the char and put it inside the replace instruction
i.g SELECT ASCII('¢') returns 162
so add one more "REPLACE(" after "RETURN" and "CHAR(162), '')" at the end of line but before the ";" sign.
来源:https://stackoverflow.com/questions/43148767/sql-server-remove-all-non-printable-ascii-characters