Eclipse wrong Java properties UTF-8 encoding

空扰寡人 submitted on 2019-11-30 05:58:42
hagrawal

Root cause:

By default, Eclipse uses the ISO 8859-1 character encoding for properties files (read here), so any character in the file that falls outside ISO 8859-1 will not be processed as expected.

Solution 1

If you use Eclipse, you will notice that it implicitly converts special characters into their \uXXXX equivalents. Try copying

会意字 / 會意字

into a properties file opened in Eclipse.
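To see why the escaped form is safe, here is a minimal sketch (the class name EscapedPropertiesDemo and the key greeting are made up for illustration) that feeds an escaped line to java.util.Properties. It assumes the file content is pure ASCII with backslash-u escapes, which is what Eclipse produces in this mode.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EscapedPropertiesDemo {
    // Loads properties text from its ISO 8859-1 bytes, the encoding that
    // Properties.load(InputStream) assumes, and returns the value for one key.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        byte[] bytes = propertiesText.getBytes(StandardCharsets.ISO_8859_1);
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes put literal escape sequences into the text,
        // exactly as they would appear inside the .properties file.
        String fileContent = "greeting=\\u4F1A\\u610F\\u5B57";
        String value = loadValue(fileContent, "greeting");

        System.out.println(value.length());                       // prints 3
        System.out.println(value.equals("\u4F1A\u610F\u5B57"));   // prints true
    }
}
```

The escapes are turned back into the three Chinese characters at load time, so the file itself never has to contain anything outside ASCII.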

EDIT: As per comment from OP

Update the encoding in Eclipse as shown below. If you set the encoding to UTF-32, you can even see Chinese characters, which you normally cannot.

How to change the encoding of a properties file in Eclipse: see this Eclipse Bugzilla bug for more details. It discusses several other possibilities and in the end suggests what I have highlighted below.

Chinese characters can be seen in Eclipse after encoding is set properly:

Solution 2

If the above doesn't work consistently for you (it does work for me, and I never see encoding issues), then try an Eclipse plugin that handles the encoding of properties or other files, for example Eclipse ResourceBundle Editor or Extended Resource-Bundle editor.

I would recommend using Eclipse ResourceBundle Editor.

Solution 3

Another way to change the encoding of a file is the Edit --> Set Encoding option. It really matters because it changes the default character set and file encoding. Experiment by changing the encoding with Edit --> Set Encoding and then running the following Java sysout statements: System.out.println("Default Charset=" + Charset.defaultCharset()); and System.out.println(System.getProperty("file.encoding"));
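The two statements above can be wrapped into a small runnable class. This is only a sketch, and the class name ShowEncoding is made up. What it prints depends on your JVM and launch settings. Note also that on JDK 18 and later the default charset is UTF-8 unless it is overridden, so the output there may not change.

```java
import java.nio.charset.Charset;

public class ShowEncoding {
    // Builds one line reporting the JVM default charset and the
    // file.encoding system property, the two values mentioned above.
    static String describe() {
        return "Default Charset=" + Charset.defaultCharset()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it once before and once after changing the encoding and compare the two lines.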


As an aside: 1

Convert the properties file so that its content is valid ISO 8859-1 by using native2ascii - Native-to-ASCII Converter

What native2ascii does: it converts every non-ASCII character into its \uXXXX equivalent. This is a good tool because you do not need to look up the \uXXXX equivalent of each special character.

Usage for UTF-8: native2ascii -encoding utf8 e:\a.txt e:\b.txt
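If native2ascii is not at hand (as far as I know it was removed from the JDK in version 9), the conversion is easy to approximate in Java. The sketch below is not the tool's actual implementation, and the class and method names are invented for illustration. It escapes every char above 7-bit ASCII, one escape per UTF-16 code unit.

```java
public class UnicodeEscaper {
    // Replaces every char outside 7-bit ASCII with a backslash-u escape
    // made of four lowercase hex digits; ASCII chars are copied unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The string literal below holds the three characters of the
        // sample word used earlier, written as escapes so this source
        // file stays ASCII-only.
        System.out.println(escapeNonAscii("name=\u4F1A\u610F\u5B57"));
    }
}
```

The printed line is name= followed by the three escapes 4f1a, 610f and 5b57, which is the form a properties file can carry safely in ISO 8859-1.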


As an aside: 2

Every computer program, whether an IDE, application server, web server, browser, etc., understands only bits, so it needs to know how to interpret those bits, because the same bits can represent different characters depending on the encoding used. That is where "encoding" comes into the picture: it gives each character a unique identifier so that all programs and operating systems know exactly how to interpret it.

So if you have written a file using some encoding scheme, let's say UTF-8, and then read it in any editor that is also set to UTF-8, you can expect it to display correctly.
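A short sketch of that point, with made-up class and method names: the same UTF-8 bytes are decoded once with the matching charset and once with ISO 8859-1. The results are printed as code points so the console encoding does not get in the way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeMismatch {
    // Encodes the text as UTF-8, then decodes those same bytes with readCharset.
    static String writeUtf8ThenReadAs(String text, Charset readCharset) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, readCharset);
    }

    // Renders each char as a hex code point, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String original = "\u00fc"; // a single u with umlaut

        // Same encoding on both sides: the character survives.
        System.out.println(codePoints(
                writeUtf8ThenReadAs(original, StandardCharsets.UTF_8)));      // prints U+00FC

        // Different encoding when reading: one character becomes two wrong ones.
        System.out.println(codePoints(
                writeUtf8ThenReadAs(original, StandardCharsets.ISO_8859_1))); // prints U+00C3 U+00BC
    }
}
```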

Please do read this answer of mine for more details, from a browser-server perspective.

Properties files are expected to be ISO-8859-1 (Latin-1) encoded. Most likely this is what Eclipse was set to by default as well.

You have to make sure that every tool that runs in the build, or anywhere else, disregards the spec and uses UTF-8 instead.
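On the Java side, one way to do that is to read the file through a Reader with an explicit charset, since Properties.load(Reader) does not assume ISO-8859-1 the way Properties.load(InputStream) does. A minimal sketch, assuming the file really is UTF-8; the class name Utf8PropertiesLoader and the key title are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8PropertiesLoader {
    // Reads a properties stream as UTF-8 by wrapping it in a Reader
    // with an explicit charset before handing it to Properties.
    static Properties load(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 encoded file holding raw, unescaped characters.
        byte[] fileBytes = "title=\u4F1A\u610F\u5B57".getBytes(StandardCharsets.UTF_8);

        Properties props = load(new ByteArrayInputStream(fileBytes));
        System.out.println(props.getProperty("title").length()); // prints 3
    }
}
```

For a real file, pass something like Files.newInputStream(path) instead of the in-memory stream.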

user1363516

Add the following arguments to your eclipse.ini file.

-Dclient.encoding.override=UTF-8
-Dfile.encoding=UTF-8

By default, Eclipse uses the encoding picked up by the Java Virtual Machine (JVM). You can also set the file encoding to UTF-8.

This looks like a mixture of Eclipse and git encoding or rather not-encoding.

Git uses raw bytes and doesn't care about encoding. Using git diff you might get characters like those shown here. An example there is R<C3><BC>ckg<C3><A4>ngig # should be "Rückgängig".

As you can see, there are two funny bracketed things per umlaut. And in your editor, there are always two \uFFFD for each umlaut in the lines starting with +.

So I assume that your UTF-8 editor tries to interpret the git notation and fails. This in turn leads to the representation \uFFFD, which basically means that this is a character whose value is unknown or unrepresentable (see here).
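The "two replacement characters per umlaut" pattern is easy to reproduce in Java. This is only an illustration of the mechanism, not a claim about what your toolchain actually does: it decodes the UTF-8 bytes of the word as US-ASCII, where each of the two bytes of an umlaut is undecodable and becomes U+FFFD. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Encodes the text as UTF-8 and decodes the bytes as US-ASCII.
    // Every byte above 0x7F cannot be decoded and is replaced by U+FFFD.
    static String decodeUtf8BytesAsAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    // Counts how many replacement characters a string contains.
    static long countReplacementChars(String s) {
        return s.chars().filter(c -> c == 0xFFFD).count();
    }

    public static void main(String[] args) {
        // The word from the git diff example, with both umlauts written as escapes.
        String word = "R\u00fcckg\u00e4ngig";
        String damaged = decodeUtf8BytesAsAscii(word);

        System.out.println(damaged.length());               // prints 12
        System.out.println(countReplacementChars(damaged)); // prints 4
    }
}
```

Ten characters become twelve, and the four extra-byte positions are exactly the two bytes of each umlaut.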

As suggested in the first link, you can try setting LESSCHARSET=UTF-8 as an environment variable (Windows). Hmm, on Linux it should go in /etc/profile?

See the entry on markers such as FFFD (REPLACEMENT CHARACTER) in http://unicode.org/faq/utf_bom.html

Also see native2ascii --help:

   -encoding encoding_name
          Specifies the name of the character encoding to be used by the conversion procedure. If this option is not present, then the
          default character encoding (as determined by the java.nio.charset.Charset.defaultCharset method) is used. The encoding_name
          string must be the name of a character encoding that is supported by the JRE. See Supported Encodings at
          http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html

An example:

$ file yourfile.properties
yourfile.properties : ISO-8859 text, with very long lines
$ native2ascii -encoding ISO-8859-1 yourfile.properties yourfile.properties 