This is not a question on how to overcome the \"XML parsing: ... illegal xml character\" error, but about why it is happening? I know tha
SQL Sever internally uses UTF-16. Either let the encoding away or cast to unicode
The reason you are looking for: With UTF-8 specified, this character is not known.
--without your directive, SQL Server picks its default
declare @xml XML =
'
';
select @xml;
--or UNICODE, but you must use UTF-16
declare @xml2 XML =
CAST('
' AS NVARCHAR(MAX));
select @xml2
UTF-8 means, that there are chunks of 8 bits used to carry information. The base characters are just one chunk, easy going...
Other characters can be encoded as well. There are "c2" and "c3" codes (look here). c3-codes need three chunks to be encoded. But the internally used UTF16 expects 2 byte encoded characters.
Hope this is clear now...
This code will show you, that the Hyphen has the ASCII code 45 and your en-dash 150:
DECLARE @x VARCHAR(100)=
' ';
WITH RunningNumbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
FROM sys.objects
)
SELECT SUBSTRING(@x,Nmbr,1), ASCII(SUBSTRING(@x,Nmbr,1)) AS ASCII_Code
FROM RunningNumbers
WHERE ASCII(SUBSTRING(@x,Nmbr,1)) IS NOT NULL;
Have a look here All characters with 7 bits are "plain" and should encode without problems. The "extended ASCII" is depending on code tables and could vary. 150 might be en-dash or something else. UTF8 uses some tricky encodings to allow strange characters to be "legal". Obviously (this was new to me too) the internally used UTF16 cannot cope with c3-characters.