Encoding SQL_Latin1_General_CP1_CI_AS into UTF-8

小鲜肉 2020-12-09 10:50

I'm generating an XML file with PHP using DomDocument and I need to handle Asian characters. I'm pulling data from the MSSQL2008 server using the pdo_mssql driver and I app…

7 answers
  •  感动是毒
    2020-12-09 11:29

    I found out how to solve it, so hopefully this will be helpful to someone.

    First, under the SQL_Latin1_General_CP1_CI_AS collation, VARCHAR data is stored in code page 1252 (CP-1252), while NVARCHAR data is always stored as UTF-16 (UCS-2), whatever the collation. The basic Latin characters exist in CP-1252, which is why converting them to UTF-8 was all I had to do for them. Asian characters, however, take 2 bytes each in NVARCHAR, and the PHP pdo_mssql driver does not seem to handle that: it appears to convert the value to VARCHAR (instead of NVARCHAR), so every character that CP-1252 cannot represent becomes a question mark ('?').
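    To make the byte-level difference concrete, here is a small illustrative PHP snippet (not part of the original answer) that shows the bytes of "é" and "日" in UTF-8, in UTF-16LE (the NVARCHAR storage format) and in Windows-1252 (what a CP1 VARCHAR can hold). The function name encodingsOf is made up for this example, and it assumes the mbstring extension is installed.

```php
<?php
// Illustration only (hypothetical helper, not from the original answer):
// hex dump of one character in the encodings involved. Requires mbstring.
function encodingsOf(string $char): array
{
    return [
        // String literals in this file are UTF-8.
        'utf8'    => bin2hex($char),
        // NVARCHAR stores UTF-16 code units, little-endian.
        'utf16le' => bin2hex(mb_convert_encoding($char, 'UTF-16LE', 'UTF-8')),
        // A CP1 VARCHAR can only hold Windows-1252; mbstring replaces
        // characters it cannot map with its substitute character ('?' by default).
        'cp1252'  => bin2hex(mb_convert_encoding($char, 'Windows-1252', 'UTF-8')),
    ];
}

foreach (["\u{00E9}", "\u{65E5}"] as $char) { // "é" and "日"
    print_r(encodingsOf($char));
}
```

    For "日" the Windows-1252 column comes out as 3f, which is the same '?' that appears once the text has been forced through CP-1252.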

    I fixed it by casting the column to binary in the query and then rebuilding the text in PHP:

    SELECT CAST(MY_COLUMN AS VARBINARY(MAX)) FROM MY_TABLE;
    

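    In PHP:

    // $bin holds the raw bytes returned for the VARBINARY column
    // Binary to hexadecimal
    $hex = bin2hex($bin);
    
    // And then from hex back to a byte string, two hex digits at a time
    $str = "";
    for ($i = 0; $i < strlen($hex) - 1; $i += 2) {
        $str .= chr(hexdec($hex[$i] . $hex[$i + 1]));
    }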
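    Converting the bytes to hex and back only reproduces the same bytes, so they still have to be decoded before DomDocument can use them as UTF-8 text. A minimal sketch of that decoding step, assuming MY_COLUMN is NVARCHAR (so the VARBINARY bytes are UTF-16LE) and that mbstring is available; the helper name varbinaryToUtf8 is hypothetical:

```php
<?php
// Hypothetical helper (not from the original answer): decode the raw bytes of
// CAST(MY_COLUMN AS VARBINARY(MAX)) into a UTF-8 string.
// Assumption: MY_COLUMN is NVARCHAR, so its bytes are UTF-16LE code units.
function varbinaryToUtf8(?string $bin): ?string
{
    if ($bin === null) {
        return null; // keep SQL NULL as PHP null
    }
    // UTF-16LE -> UTF-8 (requires the mbstring extension)
    return mb_convert_encoding($bin, 'UTF-8', 'UTF-16LE');
}

// "日本" (U+65E5 U+672C) as it would be stored in an NVARCHAR column.
$bin = "\xE5\x65\x2C\x67";
echo varbinaryToUtf8($bin), "\n";
```

    If the column were VARCHAR instead, the source encoding would be 'Windows-1252' rather than 'UTF-16LE'. iconv('UTF-16LE', 'UTF-8', $bin) is an equivalent alternative when mbstring is not installed.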
