Using Unicode characters bigger than 2 bytes with .NET

说谎 2020-12-15 08:32

I'm using this code to generate U+10FFFC:

var s = Encoding.UTF8.GetString(new byte[] {0xF4,0x8F,0xBF,0xBC});

I know it's for

4 Answers

甜味超标 (OP)

2020-12-15 09:24
As Martinho already posted, it is much easier to create a string containing this private-use code point like this:
```
var s = char.ConvertFromUtf32(0x10FFFC);
```
But looping over the two char elements of that string makes little sense:
```
foreach(var ch in s)
{
    Console.WriteLine(ch);
}
```
What for? You will just get the high and low surrogates that encode the code point. Remember that a char is a 16-bit type, so the largest value it can hold is 0xFFFF. Your code point does not fit into 16 bits. The highest code point (0x10FFFF) needs 21 bits, so the next wider type that can hold it is a 32-bit type. The two char elements are not characters on their own; together they form a surrogate pair, and the value 0x10FFFC is encoded across the two surrogates.
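To make the surrogate encoding concrete, here is a minimal sketch of the standard UTF-16 arithmetic, plus a helper that walks a string by code point rather than by char. The class name `CodePointDemo` and the method names `ToSurrogates` and `CodePoints` are made up for illustration; `char.ConvertToUtf32` and `char.IsHighSurrogate` are existing framework methods.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
static class CodePointDemo
{
    // Splits a supplementary code point (U+10000..U+10FFFF)
    // into its UTF-16 high and low surrogates.
    public static (char High, char Low) ToSurrogates(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int v = codePoint - 0x10000;               // leaves a 20-bit value
        char high = (char)(0xD800 + (v >> 10));    // top 10 bits
        char low  = (char)(0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return (high, low);
    }

    // Yields one int per code point instead of one char per UTF-16 code unit.
    // char.ConvertToUtf32 throws ArgumentException on an unpaired surrogate.
    public static IEnumerable<int> CodePoints(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            int cp = char.ConvertToUtf32(s, i);
            if (char.IsHighSurrogate(s[i]))
                i++;                               // skip the low surrogate
            yield return cp;
        }
    }
}
```

For U+10FFFC the pair works out to 0xDBFF and 0xDFFC, which are the two char values the foreach loop above prints. Iterating `CodePointDemo.CodePoints(s)` over the same string should yield the single value 0x10FFFC instead.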