SQL Server - defining an XML type column with UTF-8 encoding

忘掉有多难 2020-11-30 15:29

The default encoding for an XML type column in SQL Server is UTF-16. I have no trouble inserting UTF-16 encoded XML streams into such a column.

But if I t

4 Answers
  •  暖寄归人
    2020-11-30 15:43

    Is there a way to define a SQL Server column/field as having UTF-8 encoding?

    No. The XML datatype, like NCHAR, NVARCHAR, and NTEXT, is always handled as UTF-16 Little Endian, and you cannot choose a different encoding for it the way some other RDBMSs allow. (NTEXT has been deprecated since SQL Server 2005, so don't use it in new development; NVARCHAR(MAX) is the better choice anyway.) SQL Server 2019 added UTF-8 collations for CHAR and VARCHAR, but those do not change how XML data is stored.

    You can insert UTF-8 encoded XML into SQL Server, provided you follow these three rules:

    1. The incoming string has to be of datatype VARCHAR, not NVARCHAR (as NVARCHAR is always UTF-16 Little Endian, hence the error about not being able to switch the encoding).
    2. The XML has an XML declaration that explicitly states that the encoding of the XML is indeed UTF-8: <?xml version="1.0" encoding="UTF-8"?>.
    3. The byte sequence needs to be the actual UTF-8 bytes.
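
    The first two rules can be seen in a minimal sketch. The <root> element and its content are placeholders chosen for illustration, and GO is the batch separator used by SSMS and sqlcmd, so each statement runs on its own. The last batch is expected to fail with the "unable to switch the encoding" error, because the declared encoding contradicts the NVARCHAR input:

    ```sql
    -- Rules 1 and 2 satisfied: VARCHAR literal (no N prefix) with a UTF-8 declaration.
    SELECT CONVERT(XML, '<?xml version="1.0" encoding="UTF-8"?><root>abc</root>') AS Utf8Input;
    GO

    -- NVARCHAR input is fine when the declaration agrees with it (UTF-16).
    SELECT CONVERT(XML, N'<?xml version="1.0" encoding="UTF-16"?><root>abc</root>') AS Utf16Input;
    GO

    -- Rule 1 violated: the N prefix makes the string UTF-16, but the declaration
    -- claims UTF-8, so the XML parser refuses to switch encodings.
    SELECT CONVERT(XML, N'<?xml version="1.0" encoding="UTF-8"?><root>abc</root>') AS Mismatch;
    GO
    ```

    The sample content is plain ASCII, so only the declaration and the string datatype differ between the three batches.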

    For example, we can import a UTF-8 encoded XML document containing the screaming face emoji (U+1F631, a Supplementary Character whose UTF-8 byte sequence is F0 9F 98 B1):

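    SET NOCOUNT ON;
    -- The declaration satisfies rule 2; the element names are illustrative.
    DECLARE @XML XML = '<?xml version="1.0" encoding="UTF-8"?><root><test>'
                        + CHAR(0xF0) + CHAR(0x9F) + CHAR(0x98) + CHAR(0xB1)
                        + '</test></root>';

    SELECT @XML;
    PRINT CONVERT(NVARCHAR(MAX), @XML);

    Returns (in both "Results" and "Messages" tabs):

    <root><test>😱</test></root>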
