Return code point of characters in C#

梦谈多话 2020-12-10 01:52

How can I return the Unicode code point of a character? For example, if the input is "A", then the output should be "U+0041". Ideally, a solution should take care of sur

6 answers
  •  轻奢々 (OP)
    2020-12-10 02:30

    Easy, since a char in C# is actually a UTF-16 code unit, which for characters in the Basic Multilingual Plane is the same as the code point:

    char x = 'A';
    Console.WriteLine("U+{0:X4}", (int)x);
    

    To address the comments: a char in C# is a 16-bit value that holds one UTF-16 code unit. Code points above the 16-bit range (above U+FFFF) cannot be represented in a single C# char, and chars are not variable-width. A string, however, can contain two consecutive chars, a high surrogate followed by a low surrogate, that together encode one such code point. If you have a string input that may contain characters above the 16-bit range, you can use Char.IsSurrogatePair and Char.ConvertToUtf32, as suggested in another answer:

    string input = ....
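    // Advance by 2 chars when a surrogate pair starts at i, otherwise by 1.
    for (int i = 0; i < input.Length; i += Char.IsSurrogatePair(input, i) ? 2 : 1)
    {
        // Combines a surrogate pair into one code point; throws on an unpaired surrogate.
        int x = Char.ConvertToUtf32(input, i);
        Console.WriteLine("U+{0:X4}", x);
    }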
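
To make the loop above easy to try, here is a minimal self-contained sketch. The class name CodePointDemo, the helper ToCodePointLabels, and the sample string are hypothetical, chosen only for illustration; the only framework calls used are Char.ConvertToUtf32 and Char.IsSurrogatePair, the same ones as in the answer.

```csharp
using System;
using System.Collections.Generic;

static class CodePointDemo
{
    // Hypothetical helper (not part of the original answer):
    // returns one "U+XXXX" label per Unicode code point in the input.
    public static List<string> ToCodePointLabels(string input)
    {
        var labels = new List<string>();
        for (int i = 0; i < input.Length; i++)
        {
            // Reads one code point starting at i. If a surrogate pair starts
            // here, both chars are combined. Throws ArgumentException when
            // the char at i is an unpaired surrogate.
            int codePoint = char.ConvertToUtf32(input, i);
            labels.Add($"U+{codePoint:X4}");

            // Skip the low surrogate that was just consumed.
            if (char.IsSurrogatePair(input, i))
            {
                i++;
            }
        }
        return labels;
    }

    static void Main()
    {
        // U+1F600 lies outside the 16-bit range, so it occupies two chars.
        string sample = "A\U0001F600";
        Console.WriteLine(sample.Length);                               // 3
        Console.WriteLine(string.Join(" ", ToCodePointLabels(sample))); // U+0041 U+1F600
    }
}
```

The X4 format pads to at least four hex digits, so code points above U+FFFF simply print with five or six digits. Whether an unpaired surrogate should throw, as it does here, or be reported some other way depends on the input you expect.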
    
