Convert String (UTF-16) to UTF-8 in C#

囚心锁ツ 2020-12-05 11:16

I need to convert a string to UTF-8 in C#. I've already tried many ways, but none works the way I want. I converted my string into a byte array and then tried to write it to an

6 answers
  •  醉话见心
    2020-12-05 11:31

    If you want a UTF-8 string in which every UTF-8 byte is stored in its own char ('Ö' -> [195, 0], [150, 0]), you can use the following:

    // Requires: using System.Text;
    public static string Utf16ToUtf8(string utf16String)
    {
       /**************************************************************
        * Every .NET string will store text with the UTF16 encoding, *
        * known as Encoding.Unicode. Other encodings may exist as    *
        * Byte-Array or incorrectly stored with the UTF16 encoding.  *
        *                                                            *
        * UTF8 = 1 to 4 bytes per code point                         *
        *    ["100" for the ASCII 'd']                               *
        *    ["206" and "186" for the Greek 'κ']                     *
        *                                                            *
        * UTF16 = 2 or 4 bytes per code point                        *
        *    ["100, 0" for the ASCII 'd']                            *
        *    ["186, 3" for the Greek 'κ']                            *
        *                                                            *
        * UTF8 inside UTF16                                          *
        *    ["100, 0" for the ASCII 'd']                            *
        *    ["206, 0" and "186, 0" for the Greek 'κ']               *
        *                                                            *
        * We can use the convert encoding function to convert an     *
        * UTF16 Byte-Array to an UTF8 Byte-Array. When we use UTF8   *
        * encoding to string method now, we will get a UTF16 string. *
        *                                                            *
        * So we imitate UTF16 by filling the second byte of a char   *
        * with a 0 byte (binary 0) while creating the string.        *
        **************************************************************/
    
        // Get the UTF-16 bytes and convert them to UTF-8 bytes
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

        // Widen each UTF-8 byte into one char (high byte 0); a StringBuilder
        // avoids reallocating the string on every iteration
        StringBuilder utf8String = new StringBuilder(utf8Bytes.Length);
        for (int i = 0; i < utf8Bytes.Length; i++)
        {
            utf8String.Append((char)utf8Bytes[i]);
        }

        // Return the UTF-8 bytes, stored one per char
        return utf8String.ToString();
    }
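
The loop above widens each UTF-8 byte into one char. A more compact sketch of the same idea follows, assuming that decoding with ISO-8859-1 (code page 28591, which maps the byte values 0..255 to U+0000..U+00FF one to one) is an acceptable widening step. The class and method names are illustrative and are not part of the original answer.

```csharp
using System;
using System.Text;

public static class Utf8InUtf16Sketch
{
    // ISO-8859-1 maps every byte value 0..255 to the char with the same
    // numeric value, so decoding with it widens bytes without changing them.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Hypothetical helper: returns a string whose chars hold the UTF-8 bytes
    // of the input, one byte per char.
    public static string WidenUtf8Bytes(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Latin1.GetString(utf8Bytes);
    }
}
```

Calling `Utf8InUtf16Sketch.WidenUtf8Bytes("Ö")` should give a two-char string with the char values 195 and 150, matching the example at the top of this answer. If only the raw bytes are needed, `Encoding.UTF8.GetBytes` on its own is usually enough.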
    

    In my case the DLL also expects a UTF-8 string, but unfortunately the UTF-8 bytes must be interpreted as ANSI characters and then stored as UTF-16 ('Ö' -> [195, 0], [19, 32]). So the ANSI '–', which is byte 150, has to be converted to the UTF-16 '–', which is 8211. If you have this case too, you can use the following instead:

    public static string Utf16ToUtf8(string utf16String)
    {
        // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
    
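        // Return the UTF-8 bytes decoded with the system ANSI code page.
        // Note: Encoding.Default is the ANSI code page on .NET Framework only;
        // on .NET Core and .NET 5+ it is always UTF-8.
        return Encoding.Default.GetString(utf8Bytes);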
    }
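
Because the result of `Encoding.Default` depends on the runtime and on the machine's settings, a variant that names the code page explicitly may be easier to reason about. The sketch below assumes Windows-1252 is the ANSI code page the DLL expects; the class and method names are hypothetical. `CodePagesEncodingProvider` ships with .NET Core 3.0 and later, while on .NET Framework it comes from the System.Text.Encoding.CodePages package (and `Encoding.GetEncoding(1252)` works there without registration).

```csharp
using System;
using System.Text;

public static class Utf8AsAnsiSketch
{
    static Utf8AsAnsiSketch()
    {
        // On .NET Core / .NET 5+ code page 1252 is only available after the
        // code pages provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: encodes text as UTF-8, then decodes those bytes
    // as Windows-1252 instead of relying on Encoding.Default.
    public static string Utf8BytesAsWindows1252(string text)
    {
        Encoding windows1252 = Encoding.GetEncoding(1252);
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return windows1252.GetString(utf8Bytes);
    }
}
```

For "Ö" the UTF-8 bytes are 195 and 150, which Windows-1252 decodes as 'Ã' (195) and '–' (8211), the same pair described above.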
    

    Or the native method:

    // Requires: using System; using System.Runtime.InteropServices; using System.Text;
    [DllImport("kernel32.dll")]
    private static extern Int32 WideCharToMultiByte(UInt32 CodePage, UInt32 dwFlags, [MarshalAs(UnmanagedType.LPWStr)] String lpWideCharStr, Int32 cchWideChar, [Out, MarshalAs(UnmanagedType.LPStr)] StringBuilder lpMultiByteStr, Int32 cbMultiByte, IntPtr lpDefaultChar, IntPtr lpUsedDefaultChar);
    
    public static string Utf16ToUtf8(string utf16String)
    {
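        // Pass -1 in both calls so the size query and the conversion agree;
        // the returned size then includes the terminating null byte
        Int32 iNewDataLen = WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf16String, -1, null, 0, IntPtr.Zero, IntPtr.Zero);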
        if (iNewDataLen > 1)
        {
            StringBuilder utf8String = new StringBuilder(iNewDataLen);
            WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf16String, -1, utf8String, utf8String.Capacity, IntPtr.Zero, IntPtr.Zero);
    
            return utf8String.ToString();
        }
        else
        {
            return String.Empty;
        }
    }
    

    If you need it the other way around, see Utf8ToUtf16. Hope I could be of help.
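
The linked Utf8ToUtf16 code is not reproduced here. As a rough sketch of what the reverse of the first variant could look like, assuming every char of the input holds exactly one UTF-8 byte (the class and method names are hypothetical, not the linked implementation):

```csharp
using System;
using System.Text;

public static class Utf8ToUtf16Sketch
{
    // ISO-8859-1 maps the chars U+0000..U+00FF back to the bytes 0..255.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Hypothetical reverse helper: narrows each char to one byte, then
    // decodes the resulting byte sequence as UTF-8.
    public static string NarrowAndDecode(string utf8InUtf16)
    {
        foreach (char c in utf8InUtf16)
        {
            // Reject input that does not follow the one-byte-per-char layout
            if (c > 0xFF)
            {
                throw new ArgumentException("Input contains a char that is not a single byte.", nameof(utf8InUtf16));
            }
        }

        byte[] utf8Bytes = Latin1.GetBytes(utf8InUtf16);
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

Passing the two chars 195 and 150 should give back "Ö". This only undoes the one-byte-per-char variant; a string produced through an ANSI code page would need that same code page for the narrowing step.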
