Question
In Java, why does Character.toString((char) 65533) print out this symbol: � ?
I have a Java program which prints these characters all over the place. It's a big program. Any ideas on what I can do to avoid this?
Answer 1:
One of the most likely scenarios is that you are trying to read ISO-8859 data using the UTF-8 character set. Whenever the decoder comes across a sequence of bytes that is not valid UTF-8, it replaces that sequence with the � symbol.
Check your input streams, and ensure that you read them using the correct character set.
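To make that concrete, here is a minimal sketch of the mismatch. The class and method names (ReplacementCharDemo, decodeAsUtf8, decodeAsLatin1) are made up for illustration, and ISO-8859-1 is only assumed as the real encoding of the data; your input may use something else.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a charset mismatch produces U+FFFD.
class ReplacementCharDemo {

    // Decodes as UTF-8; malformed byte sequences are replaced with U+FFFD.
    static String decodeAsUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Decodes as ISO-8859-1, where every byte maps to exactly one character.
    static String decodeAsLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, encoded as ISO-8859-1: the last byte is 0xE9.
        byte[] latin1Bytes = {'c', 'a', 'f', (byte) 0xE9};

        String wrong = decodeAsUtf8(latin1Bytes);   // 0xE9 alone is not valid UTF-8
        String right = decodeAsLatin1(latin1Bytes);

        System.out.println((int) wrong.charAt(3)); // 65533 (U+FFFD)
        System.out.println((int) right.charAt(3)); // 233 (U+00E9)
    }
}
```

The bytes are identical in both calls; only the charset passed to the String constructor differs.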
Answer 2:
In Java, why does Character.toString((char) 65533) print out this symbol: � ?
Because exactly this character is assigned to that code point. Java is not displaying a random character, as you seem to think.
I have a Java program which prints these characters all over the place. It's a big program. Any ideas on what I can do to avoid this?
Your problem lies somewhere else. At the very least, it boils down to this: every step that involves byte-char conversion (storing text in a file/DB, reading text from a file/DB, manipulating text, transferring text, displaying text, etc.) should be set to use UTF-8.
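As a rough sketch of what "set every step to UTF-8" can look like for plain file I/O: pass the charset explicitly at each byte-char boundary instead of relying on the platform default. The class ExplicitCharsetIo and its helpers writeText and readText are hypothetical names, not part of any library, and a database or network layer would need its own equivalent setting.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: names the charset at every byte-char conversion.
class ExplicitCharsetIo {

    // chars -> bytes: encode with UTF-8, not the platform default.
    static void writeText(Path file, String text) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // bytes -> chars: decode with the same charset that was used for writing.
    static String readText(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            writeText(tmp, "caf\u00e9");
            // Same charset on both sides, so the text survives the round trip.
            System.out.println(readText(tmp).equals("caf\u00e9")); // true
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The same idea applies to new String(bytes, charset) and String.getBytes(charset): the overloads without a charset argument are the ones that tend to cause trouble.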
What catches my eye is that Java does absolutely nothing special with 0xFFFD; it just replaces characters not covered by the target charset with a question mark ?, and yet you keep insisting that 0xFFFD comes from Java. I know that Firefox does exactly what you describe, so are you maybe confusing "Firefox" with "Java"?
If this is true and you're actually talking about a Java web application, then you need to set at least the HTTP response encoding to UTF-8. You can do that by putting <%@ page pageEncoding="UTF-8" %> at the top of the JSP page in question. You may find this article useful to get more background information and a detailed overview of all steps and solutions you need to apply to solve this "Unicode problem".
Answer 3:
U+FFFD does not stand for any real text character; it is the Unicode replacement character. Hence, the code is logically incorrect: the replacement character is intended to be substituted for bad input, not written out deliberately as (char)65533.
How to fix it: don't put junk in strings. Strings are for text. Bytes are for random binary data.
Answer 4:
Well, what do you want it to do? If you're getting these characters "all over the place" I suspect you have bad data... it should be pretty rare that you receive data which can't be represented in Unicode.
How are you getting the data to start with?
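If it isn't obvious where the bad data enters a big program, one option is to decode strictly at the input boundaries so that invalid bytes fail loudly instead of silently turning into �. This is only a sketch: StrictUtf8 and decodeStrict are illustrative names, and UTF-8 is assumed to be the encoding you expect.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: rejects invalid UTF-8 instead of substituting U+FFFD.
class StrictUtf8 {

    // Throws CharacterCodingException if the bytes are not valid UTF-8.
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    public static void main(String[] args) {
        // A lone 0xE9 byte is ISO-8859-1 style data, not valid UTF-8.
        byte[] suspect = {'c', 'a', 'f', (byte) 0xE9};
        try {
            System.out.println(decodeStrict(suspect));
        } catch (CharacterCodingException e) {
            // Reaching this branch means the input was not UTF-8 at this point.
            System.out.println("Invalid UTF-8 input: " + e);
        }
    }
}
```

Calling something like this where data enters the program (files, sockets, database drivers) should narrow down which source is handing you bytes in a different encoding.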
Answer 5:
Have a look at this primer on character encodings.
Source: https://stackoverflow.com/questions/1832304/avoid-printing-unicode-replacement-character-in-java