Converting a Chinese character to Unicode

傲寒 2020-12-17 04:20

Let's say I have a random Chinese character, 玩. I want to convert it to its Unicode code point, which would be U+73A9. How could I do this in C#?

3 Answers
  • 2020-12-17 04:47

    A bit longer example, that follows the pattern in Jon Hanna's answer:

    using System;
    using System.Text;
    
    namespace UnicodeDecodeConsoleApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                char c = '\u73a9'; // 玩
                char[] chars = {c};
                // UTF-16 big-endian puts the high byte first, so the hex of the
                // bytes reads the same as the code point (for characters in the
                // Basic Multilingual Plane).
                Encoding encoding = Encoding.BigEndianUnicode;
                byte[] encodedBytes = encoding.GetBytes(chars);
                StringBuilder stringBuilder = new StringBuilder("U+");
                foreach (byte encodedByte in encodedBytes)
                {
                    stringBuilder.Append(encodedByte.ToString("x2"));
                }
                Console.WriteLine(stringBuilder); // U+73a9
                Console.ReadLine(); // keep the console window open
            }
        }
    }
    

    --jeroen

  • 2020-12-17 04:48

    The character 玩 is in Unicode.

    If you have it in C# as 玩, then it's currently in UTF-16, which is one of the Unicode encoding forms.

    If you are obtaining it from somewhere else you need to:

    1. Find the encoding it is in.
    2. Get the bytes (wrapped by a stream is nice).
    3. Get or write an appropriate Encoding (you need its decoding side, which turns bytes into text).
    4. Use that Encoding to get the string (wrapping the nice stream with a TextReader is nicer).

    Step 3 may be simple (oh, I just use that one!), hard (darn, I have to write it myself!), or somewhere in between (hey, has anyone written one of these already?!).
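
    The four steps above can be sketched roughly as follows. This is only an illustration, assuming the bytes arrive as UTF-8; the `DecodeExample` class and its `Decode` helper are hypothetical names, and for other source encodings you would pass a different `Encoding`.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    class DecodeExample
    {
        // Decodes bytes in a known encoding into a .NET (UTF-16) string
        // by wrapping them in a stream and reading through a StreamReader.
        public static string Decode(byte[] bytes, Encoding encoding)
        {
            using (var stream = new MemoryStream(bytes))
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }

        static void Main()
        {
            // 玩 encoded as UTF-8 (the source encoding is an assumption here).
            byte[] utf8Bytes = { 0xE7, 0x8E, 0xA9 };
            string text = Decode(utf8Bytes, Encoding.UTF8);
            Console.WriteLine("U+{0:X4}", (int)text[0]); // U+73A9
        }
    }
    ```

    Once the text is a .NET string it is already UTF-16, so reading off the code point is the easy part.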

  • 2020-12-17 05:01

    Take myChar as a char holding your special character...

    Console.WriteLine("{0} U+{1:x4} {2}", myChar, (int)myChar, (int)myChar);
    

    Above we're outputting the character itself, followed by its Unicode code point in hexadecimal and then the same value as a decimal integer.

    Reduce the format string and parameters to output only the "U+..." code...

    Console.WriteLine("U+{0:x4}", (int)myChar);
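
    One caveat: a char is a single UTF-16 code unit, so the cast above gives the code point only for characters in the Basic Multilingual Plane (which includes 玩). A minimal sketch for strings that may contain characters outside it, using `char.ConvertToUtf32`; the `CodePointExample` class and `CodePoints` helper are hypothetical names, and the string is assumed to be well-formed (a lone surrogate makes `ConvertToUtf32` throw):

    ```csharp
    using System;
    using System.Collections.Generic;

    class CodePointExample
    {
        // Returns the "U+XXXX" form of every code point in the string,
        // combining each surrogate pair into a single code point.
        public static List<string> CodePoints(string text)
        {
            var result = new List<string>();
            for (int i = 0; i < text.Length; i++)
            {
                int codePoint = char.ConvertToUtf32(text, i);
                if (char.IsSurrogatePair(text, i))
                {
                    i++; // skip the low surrogate that was just consumed
                }
                result.Add("U+" + codePoint.ToString("X4"));
            }
            return result;
        }

        static void Main()
        {
            // \u73A9 is 玩; \U00020BB7 lies outside the Basic Multilingual Plane.
            foreach (string s in CodePoints("\u73A9\U00020BB7"))
            {
                Console.WriteLine(s); // U+73A9, then U+20BB7
            }
        }
    }
    ```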
    