Using Unicode characters bigger than 2 bytes with .NET

说谎 2020-12-15 08:32

I'm using this code to generate U+10FFFC

var s = Encoding.UTF8.GetString(new byte[] {0xF4,0x8F,0xBF,0xBC});

I know it's for

4 Answers
  •  甜味超标
    2020-12-15 09:24

    As Martinho already posted, it is much easier to create a string containing this private-use code point like this:

    var s = char.ConvertFromUtf32(0x10FFFC);
    

    But looping through the two char elements of that string makes little sense:

    foreach(var ch in s)
    {
        Console.WriteLine(ch);
    }
    

    What for? You will just get the high and low surrogates that encode the code point. Remember that a char is a 16-bit type, so its maximum value is 0xFFFF. Your code point doesn't fit into 16 bits; the highest code point (0x10FFFF) needs 21 bits, so the next wider type that can hold it is a 32-bit type. The two char elements are not characters but a surrogate pair: the value 0x10FFFC is encoded across the two surrogates.
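
    To make the surrogate encoding concrete, here is a minimal sketch of the arithmetic and of iterating by code point instead of by char. ToSurrogates and CodePoints are hypothetical helper names introduced only for this illustration; char.ConvertFromUtf32, char.ConvertToUtf32 and char.IsHighSurrogate are the framework calls. It assumes C# 9 or later for the top-level statements.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper: split a supplementary code point (above 0xFFFF)
    // into its UTF-16 surrogate pair by hand.
    static (char High, char Low) ToSurrogates(int codePoint)
    {
        int offset = codePoint - 0x10000;                 // leaves a 20-bit value
        char hi = (char)(0xD800 + (offset >> 10));        // top 10 bits
        char lo = (char)(0xDC00 + (offset & 0x3FF));      // bottom 10 bits
        return (hi, lo);
    }

    // Hypothetical helper: enumerate whole code points rather than char elements.
    // char.ConvertToUtf32 throws on an unpaired surrogate, so this assumes
    // well-formed UTF-16 input.
    static IEnumerable<int> CodePoints(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            yield return char.ConvertToUtf32(text, i);
            if (char.IsHighSurrogate(text[i])) i++;       // skip the low surrogate
        }
    }

    var s = char.ConvertFromUtf32(0x10FFFC);
    var pair = ToSurrogates(0x10FFFC);

    // The string holds two char elements, which match the hand-computed pair.
    Console.WriteLine($"{s.Length} chars: {(int)s[0]:X4} {(int)s[1]:X4}");   // 2 chars: DBFF DFFC
    Console.WriteLine(s[0] == pair.High && s[1] == pair.Low);                // True

    // Iterating by code point yields the single original value.
    foreach (int cp in CodePoints(s))
        Console.WriteLine($"U+{cp:X}");                                      // U+10FFFC
    ```

    On .NET Core 3.0 and later, string.EnumerateRunes() should give the same per-code-point iteration without a hand-written loop.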
