How can I return the Unicode Code Point of a character? For example, if the input is "A", then the output should be "U+0041". Ideally, a solution should take care of sur
Easy, since a char in C# is a UTF-16 code unit, which for any character in the Basic Multilingual Plane is the same number as its code point:
char x = 'A';
Console.WriteLine("U+{0:X4}", (int)x);
To address the comments: a char in C# is a 16-bit number and holds a single UTF-16 code unit. Code points above the 16-bit range cannot be represented in a single C# char, because a char is not variable width. A string, however, can contain two chars following each other, each being a code unit, that together form a surrogate pair encoding one code point. If you have a string input with characters above the 16-bit range, you can use Char.IsSurrogatePair and Char.ConvertToUtf32, as suggested in another answer:
string input = ....
for (int i = 0; i < input.Length; i += Char.IsSurrogatePair(input, i) ? 2 : 1)
{
    int x = Char.ConvertToUtf32(input, i);
    Console.WriteLine("U+{0:X4}", x);
}
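For reuse, the same loop can be wrapped in a small helper. The following is only a sketch: the class name CodePointDemo, the method name ToCodePoints, and the sample string are made up for illustration and are not part of the original answer. It assumes the input is well-formed UTF-16, since Char.ConvertToUtf32 throws an ArgumentException on an unpaired surrogate.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDemo
{
    // Hypothetical helper: yields one "U+XXXX" string per code point in the input.
    public static IEnumerable<string> ToCodePoints(string input)
    {
        // Advance by 2 chars over a surrogate pair, otherwise by 1.
        for (int i = 0; i < input.Length; i += Char.IsSurrogatePair(input, i) ? 2 : 1)
        {
            // Assumes well-formed input; a lone surrogate makes this call throw.
            int codePoint = Char.ConvertToUtf32(input, i);
            yield return string.Format("U+{0:X4}", codePoint);
        }
    }

    public static void Main()
    {
        // "A" followed by U+1F600, written as an escaped surrogate pair.
        foreach (string s in ToCodePoints("A\uD83D\uDE00"))
        {
            Console.WriteLine(s);
        }
        // Prints:
        // U+0041
        // U+1F600
    }
}
```

X4 pads to at least four hex digits, so code points outside the Basic Multilingual Plane simply print with five or six digits.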