Avoid printing unicode replacement character in Java

好久不见. Submitted on 2019-12-24 14:28:00

Question


In Java, why does Character.toString((char) 65533) print out this symbol: � ?

I have a Java program which prints these characters all over the place. It's a big program. Any ideas on what I can do to avoid this?


Answer 1:


One of the most likely scenarios is that you are reading ISO-8859 data (for example ISO-8859-1) using the UTF-8 character set. Whenever the decoder comes across a sequence of bytes that is not valid UTF-8, it replaces that sequence with the � symbol.

Check your input streams, and ensure that you read them using the correct character set.
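As a minimal sketch of that scenario (the class and method names here are made up for illustration), the following encodes a short string as ISO-8859-1 and then decodes the same bytes once with the wrong charset and once with the right one:

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Decodes the bytes as UTF-8, whatever encoding they were really written in.
    static String decodeAsUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Decodes the bytes as ISO-8859-1, which maps every byte to one char.
    static String decodeAsLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "cafe" with an accented final letter; in ISO-8859-1 that letter is the single byte 0xE9.
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        String wrong = decodeAsUtf8(latin1Bytes);   // 0xE9 alone is not valid UTF-8
        String right = decodeAsLatin1(latin1Bytes); // matches how the bytes were written

        System.out.println((int) wrong.charAt(3)); // 65533, the replacement character
        System.out.println((int) right.charAt(3)); // 233, the accented letter
    }
}
```

The same principle applies to streams: pass the charset explicitly, for example to `new InputStreamReader(in, StandardCharsets.ISO_8859_1)`, instead of relying on the platform default.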




Answer 2:


In Java, why does Character.toString((char) 65533) print out this symbol: � ?

Because exactly this character is assigned to that code point. Java is not displaying a random character, as you seem to think.
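A quick way to check this for yourself is to ask the JDK what the code point is called. This is only a small sketch; the class name and the `describe` helper are invented for the example, while `Character.getName` is a standard JDK method:

```java
public class ReplacementCharDemo {
    // Formats a code point as "U+XXXX NAME" using the JDK's Unicode data.
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        char c = (char) 65533;

        // 65533 decimal is FFFD hexadecimal, so both spellings are the same char.
        System.out.println(c == '\uFFFD'); // true
        System.out.println(describe(c));   // U+FFFD REPLACEMENT CHARACTER
    }
}
```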

I have a Java program which prints these characters all over the place. It's a big program. Any ideas on what I can do to avoid this?

Your problem lies somewhere else. At the very least, every step that involves byte-to-char conversion (storing text in a file or database, reading it back, manipulating text, transferring text, displaying text, and so on) should be set to use UTF-8.

What catches my eye is that Java does nothing special with 0xFFFD: when it encodes text, it replaces characters that the target charset cannot represent with a question mark ?. Yet you keep insisting that 0xFFFD comes from Java. I know that Firefox displays exactly the symbol you describe, so are you perhaps confusing "Firefox" with "Java"?
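The question-mark substitution on the encoding side can be reproduced in a few lines. The sketch below (class and method names are illustrative only) encodes a string and decodes it again with the same charset, first with US-ASCII, which cannot represent the accented letter, and then with UTF-8, which can:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {
    // Encodes the text to bytes and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an accented final letter

        // US-ASCII has no byte for the accented letter, so getBytes substitutes '?'.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // caf?

        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Using UTF-8 consistently at every conversion step avoids this loss, which is the point of the advice above.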

If so, and you're actually talking about a Java web application, then you need to set at least the HTTP response encoding to UTF-8. You can do that by putting <%@ page pageEncoding="UTF-8" %> at the top of the JSP page in question. You may find this article useful for more background information and a detailed overview of all the steps and solutions you need to apply to solve this "Unicode problem".




Answer 3:


U+FFFD is not an ordinary text character; it is the Unicode replacement character. Its intended use is to be substituted for bad input that could not be decoded, so code that puts it into a string on purpose, such as (char) 65533, is logically incorrect.

How to fix it: don't put junk in strings. Strings are for text. Bytes are for random binary data.
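One way to keep junk out of strings is to decode strictly, so that bad bytes raise an error instead of silently turning into �. The following is a sketch under that assumption, not a drop-in fix for the program in the question; the class and method names are hypothetical, while `CharsetDecoder` and `CodingErrorAction.REPORT` are standard JDK APIs:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {
    // Decodes UTF-8 and throws instead of substituting the replacement character.
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    // Convenience check: true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            decodeStrict(data);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "plain text".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00}; // 0xFF never occurs in UTF-8

        System.out.println(isValidUtf8(text));   // true
        System.out.println(isValidUtf8(binary)); // false
    }
}
```

Failing early like this points to the place where binary or wrongly encoded data enters the program, rather than leaving replacement characters to show up in the output later.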




Answer 4:


Well, what do you want it to do? If you're getting these characters "all over the place" I suspect you have bad data... it should be pretty rare that you receive data which can't be represented in Unicode.

How are you getting the data to start with?




Answer 5:


Have a look at this primer on character encodings.



Source: https://stackoverflow.com/questions/1832304/avoid-printing-unicode-replacement-character-in-java
