Why does '?' appear as output when printing Unicode characters in Java?

不思量自难忘° 2020-12-21 08:47

When printing certain Unicode characters in Java, we get '?' as output. Why is this, and is there any way to print these characters?

This is my code



        
5 Answers
  • 2020-12-21 08:48

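    When Java decodes bytes that are not valid in the charset being used, it replaces them with the Unicode Replacement Character (U+FFFD), which many fonts render as a question mark, often inside a diamond. Separately, when Java encodes text for output and a character cannot be represented in the output charset, the encoder substitutes a literal '?' (0x3F); this is the usual source of plain question marks when printing.

    In your case, the text you're reading is probably not encoded as UTF-8 (or another Unicode encoding) but as something else; Windows-1252 and ISO-8859-1 are probably the most common alternatives if your text is in English.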
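    A minimal JDK-only sketch of the two substitutions described above: decoding malformed UTF-8 yields U+FFFD, while encoding a character the target charset cannot represent yields a literal '?' (0x3F). The class and method names are illustrative, and the specific bytes and the character U+0197 were chosen arbitrarily as examples.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {

    // Decoding: 0xC3 announces a two-byte UTF-8 sequence, but 0x28 is not a valid
    // continuation byte, so the decoder emits U+FFFD for the malformed input.
    static String decodeInvalidUtf8() {
        byte[] bytes = {(byte) 0xC3, (byte) 0x28};
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encoding: U+0197 has no mapping in ISO-8859-1, so getBytes substitutes the
    // charset's default replacement byte, which is '?' (0x3F).
    static byte[] encodeUnmappable() {
        return "\u0197".getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Print code points and byte values in hex so that the console encoding
        // cannot itself distort what we are looking at.
        for (int cp : decodeInvalidUtf8().codePoints().toArray()) {
            System.out.printf("decoded code point: U+%04X%n", cp);
        }
        for (byte b : encodeUnmappable()) {
            System.out.printf("encoded byte: 0x%02X%n", b & 0xFF);
        }
    }
}
```

    Printing hex values instead of the characters themselves is deliberate: it shows what the program actually holds, independent of whether the terminal can display it.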

  • 2020-12-21 08:54

    I wrote an open-source library that includes a utility to convert any String to a Unicode escape sequence and vice versa. This helps diagnose issues like this one. For instance, to print your String you could use something like this:

    String str = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString("\\u0197" +
        StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence("Test"));
    

    You can read about the library, where to download it, and how to use it in the article "Open Source Java library with stack trace filtering, Silent String parsing Unicode converter and Version comparison". See the paragraph "String Unicode converter".
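    The library's own API is not reproduced here. As a rough JDK-only illustration of the same diagnostic idea, the sketch below converts a string to escape sequences and back. `UnicodeEscapes`, `toEscapes` and `fromEscapes` are hypothetical names, not the library's methods. The sketch works per UTF-16 code unit, so a character outside the Basic Multilingual Plane shows up as two escapes (a surrogate pair), and it assumes the escapes it parses are well-formed.

```java
public class UnicodeEscapes {

    // Hypothetical helper, not the library's API: renders every UTF-16 code unit
    // as a six-character escape (a backslash, 'u', then four hex digits).
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Hypothetical inverse: turns each six-character escape back into one char and
    // copies any other character unchanged. Assumes the four digits are valid hex.
    static String fromEscapes(String escapes) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < escapes.length()) {
            if (escapes.startsWith("\\u", i) && i + 6 <= escapes.length()) {
                sb.append((char) Integer.parseInt(escapes.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(escapes.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String escaped = toEscapes("Test");
        System.out.println(escaped);
        // Round trip: prepend an escape for U+0197, decode, then re-escape to inspect it.
        String decoded = fromEscapes("\\u0197" + escaped);
        System.out.println(toEscapes(decoded));
    }
}
```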

  • 2020-12-21 09:01

    Here's a great article, written by Joel Spolsky, on the topic. It won't directly help you solve your problem, but it will help you understand what's going on. It'll also show you how involved the situation really is.

  • 2020-12-21 09:01

    The character encoding you are using doesn't match the characters you have, or the characters your screen (console) can display.

    I would check which encoding you are using throughout, and try to determine whether you are reading, storing, and printing the value correctly.
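    A small sketch of how you might inspect the encodings in play, assuming a standard JDK. It prints the JVM's default charset and related system properties, then asks each candidate charset whether it can represent a given character; U+0197 is only an example. `canRepresent` is an illustrative helper built on `CharsetEncoder.canEncode`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // True if every character of text can be encoded in the given charset.
    // If this is false for the charset used to print, the encoder writes '?' instead.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Many JDK APIs fall back to the default charset when none is specified.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        // Encoding used by System.out; this property may be null before JDK 19.
        System.out.println("stdout.encoding: " + System.getProperty("stdout.encoding"));

        String text = "\u0197"; // LATIN CAPITAL LETTER I WITH STROKE
        Charset[] candidates = {
            StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII, Charset.defaultCharset()
        };
        for (Charset cs : candidates) {
            System.out.println(cs + " can encode U+0197: " + canRepresent(text, cs));
        }
    }
}
```

    If the charset you print with reports `false`, the '?' comes from the output stage; if it reports `true` and you still see '?', look at how the text was read or stored.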

  • 2020-12-21 09:02

    Are you sure you know which encoding you need? You may need to explicitly encode your output as UTF-8, or as ISO 8859-1 if you are dealing with Western European characters.
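    A sketch of explicitly choosing the output encoding, assuming Java 10 or later (for the `PrintStream(OutputStream, boolean, Charset)` constructor). `printedBytes` is an illustrative helper that shows the bytes a `PrintStream` produces for U+0197 in two charsets; the last lines of `main` replace `System.out` with a UTF-8 stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitOutputEncoding {

    // Prints text through a PrintStream that encodes with the given charset and
    // returns the bytes actually produced, so the effect of the encoding is visible.
    static byte[] printedBytes(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(buffer, true, charset);
        ps.print(text);
        ps.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // U+0197 becomes the two bytes C6 97 in UTF-8, but ISO-8859-1 has no
        // mapping for it, so the encoder falls back to '?' (3F).
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : printedBytes("\u0197", cs)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(cs + ": " + hex.toString().trim());
        }

        // Replace System.out with a stream that encodes as UTF-8. The terminal
        // must also be set to UTF-8 for the characters to display correctly.
        System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true,
                StandardCharsets.UTF_8));
        System.out.println("\u0197 \u00E9");
    }
}
```

    Choosing the charset in code only fixes the Java side; if the console itself uses a different encoding, the bytes may still be displayed incorrectly.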
