Why does en-dash (–) trigger illegal XML character error (C#/SSMS)?

星月不相逢 2021-01-05 04:19

This is not a question on how to overcome the "XML parsing: ... illegal xml character" error, but about why it is happening. I know tha

4 answers
  •  长情又很酷
    2021-01-05 04:58

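    SQL Server internally uses UTF-16. Either leave the encoding declaration out or cast the string to Unicode (NVARCHAR).

    The reason you are looking for: with UTF-8 declared, the parser reads the bytes of the string as UTF-8. In a non-Unicode (VARCHAR) string the en-dash is a single code page byte (typically 0x96 in Windows-1252), and that byte on its own is not a valid UTF-8 sequence, so the character is not known.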

    --without your directive, SQL Server picks its default
    declare @xml XML = 
    '
      
      
    ';
    select @xml;
    
    --or UNICODE, but you must use UTF-16
    declare @xml2 XML = 
    CAST('
    
      
      
    ' AS NVARCHAR(MAX));
    
    select @xml2
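
The same effect can be reproduced outside SQL Server. The following is only a sketch in Python, assuming the en-dash arrives as the Windows-1252 byte 0x96; it uses the standard library parser, not SQL Server's, and the helper name `parse_with_utf8_declaration` is made up for illustration.

```python
import xml.etree.ElementTree as ET

EN_DASH = "\u2013"

# The en-dash as a single Windows-1252 byte (0x96), which is how a
# non-Unicode string would typically carry it (assumption: code page 1252).
cp1252_payload = f"<root>{EN_DASH}</root>".encode("cp1252")
utf8_payload = f"<root>{EN_DASH}</root>".encode("utf-8")


def parse_with_utf8_declaration(payload: bytes):
    """Prefix a UTF-8 declaration and parse; return the text or the error."""
    document = b'<?xml version="1.0" encoding="utf-8"?>' + payload
    try:
        return ET.fromstring(document).text
    except ET.ParseError as err:
        return err


# Declared UTF-8, but the lone byte 0x96 is not valid UTF-8: parsing fails.
result_bad = parse_with_utf8_declaration(cp1252_payload)

# Same declaration, payload really encoded as UTF-8: parsing succeeds.
result_good = parse_with_utf8_declaration(utf8_payload)

# No declaration and an already decoded string (loosely comparable to
# passing NVARCHAR): nothing has to be guessed about the bytes.
result_str = ET.fromstring(f"<root>{EN_DASH}</root>").text

print(type(result_bad).__name__, repr(result_good), repr(result_str))
```

The point of the sketch is that the declaration and the actual bytes have to agree; it does not matter which of the two is changed to make them match.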
    

    UPDATE

    UTF-8 means that the information is carried in chunks of 8 bits. The basic (7-bit) characters need just one chunk, easy going...

    Other characters can be encoded as well, using several chunks. Lead bytes such as "c2" and "c3" start two-byte sequences (look here). The en-dash (U+2013) needs three chunks in UTF-8: e2 80 93. The internally used UTF-16 stores the same character as one 2-byte code unit.

    Hope this is clear now...
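
To make the chunk counts concrete, here is a small Python sketch (not part of the SQL solution) that prints the UTF-8 and UTF-16 bytes of a hyphen, an accented letter and the en-dash. The function name `describe` and the choice of sample characters are illustrative assumptions.

```python
def describe(ch: str) -> dict:
    """Return the code point and the UTF-8 / UTF-16 (little-endian) bytes."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8").hex(" "),
        "utf16le": ch.encode("utf-16-le").hex(" "),
    }


hyphen = describe("-")        # plain 7-bit character
e_acute = describe("\u00e9")  # a character whose UTF-8 form starts with c3
en_dash = describe("\u2013")  # the en-dash from the question

for label, info in (("hyphen", hyphen), ("e-acute", e_acute), ("en-dash", en_dash)):
    print(label, info)
```

The hyphen takes one byte in UTF-8, the accented letter two and the en-dash three, while UTF-16 uses two bytes for each of them.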

    UPDATE 2

    This code will show you that the hyphen has the ASCII code 45 and your en-dash the code 150:

    DECLARE @x VARCHAR(100)=
    '';
    
    WITH RunningNumbers AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
        FROM sys.objects
    )
    SELECT SUBSTRING(@x,Nmbr,1), ASCII(SUBSTRING(@x,Nmbr,1)) AS ASCII_Code
    FROM RunningNumbers
    WHERE ASCII(SUBSTRING(@x,Nmbr,1)) IS NOT NULL;
    

    Have a look here: all characters that fit into 7 bits are "plain" and should encode without problems. The "extended ASCII" range depends on the code page and can vary; 150 might be the en-dash or something else. UTF-8 uses multi-byte sequences to make such characters "legal". Obviously (this was new to me too) a lone byte 150 is not one of those sequences, so the parser cannot cope with it once UTF-8 is declared.
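
How much the meaning of byte 150 depends on the code page can be checked with a few lines of Python. This is a sketch only; the three code pages are picked as examples and are not taken from the question.

```python
import unicodedata

BYTE_150 = bytes([150])  # 0x96, the value reported for the en-dash

# Decode the same single byte under different 8-bit code pages.
decoded = {cp: BYTE_150.decode(cp) for cp in ("cp1252", "latin-1", "cp437")}

# Look up the Unicode name of each result; control characters have none.
names = {cp: unicodedata.name(ch, "<control>") for cp, ch in decoded.items()}

for cp, name in names.items():
    print(f"{cp}: U+{ord(decoded[cp]):04X} {name}")

# The hyphen, by contrast, is plain 7-bit ASCII in all of these code pages.
hyphen_code = ord("-")
```

Only Windows-1252 (and code pages that agree with it at this position) turns 150 into the en-dash; the other two give a control character and an accented letter.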
