Charset of Java source file and failing test

好久不见. Submitted on 2019-12-07 03:08:28

It looks like the default encoding used on your Windows 7 machine is UTF-8, while on Windows XP it is Windows-1252. So: always be explicit about the encoding your files use when compiling; don't depend on the platform default.

BTW: As far as I know, Java on my Windows 7 machine still uses Windows-1252 as the default.
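
Rather than guessing, you can ask each machine what its default actually is. A minimal sketch (the class name DefaultCharsetCheck is made up for illustration); run it on both machines and compare the output:

```java
import java.nio.charset.Charset;

// Hypothetical helper class; the name is illustrative only.
class DefaultCharsetCheck {

    // Returns the charset the JVM falls back to when none is given explicitly.
    static Charset platformDefault() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + platformDefault());
        // The system property may be unset or differ, depending on the JVM.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```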

Regarding how to fix it, I would suggest that you store your test data in a file or files. Ensure that the files are saved with the required encoding. Load your test data at runtime using the required encoding. This decouples your tests from compiler encoding.
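
Something along these lines might do for loading the test data. It is only a sketch: TestDataLoader and the temp-file fixture are invented for the example, and UTF-8 is assumed to be the "required encoding". The non-ASCII characters are written as \u escapes so the example itself does not depend on the compiler's source encoding:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical loader class; the name is illustrative only.
class TestDataLoader {

    // Reads every line of the file, decoding its bytes as UTF-8
    // regardless of the platform default charset.
    static List<String> loadLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical fixture, written here only to keep the sketch self-contained.
        Path fixture = Files.createTempFile("test-data", ".txt");
        Files.write(fixture, "caf\u00e9\nna\u00efve\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(loadLines(fixture).size() + " lines loaded");
        Files.delete(fixture);
    }
}
```

In a real project the fixture would be a file checked in next to the tests and saved as UTF-8 by your editor.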

Kounavi

I'm not an expert in this matter, but to see whether they are indeed different, go to:
Control Panel -> Regional and Language Options -> Advanced options tab

In general you cannot expect all your users to use the Windows default Latin charset, and why should you? Also, think about other operating systems which use other default encodings (*nix, Macs etc.).
This leaves you with the option of guessing because, say, if you have the Latin character A you cannot discern whether it's in ASCII, UTF-8 or ISO-8859-1, because these charsets map the character to the same entry in the character table (in our case table entry 41 in hexadecimal notation)!
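
To see the ambiguity for yourself, here is a small sketch (SameByteDemo and hex are made-up names) that encodes the same text in all three charsets and prints the bytes in hex. "A" comes out as 41 every time; only a non-ASCII character such as U+00E9 gives different bytes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is illustrative only.
class SameByteDemo {

    // Formats the encoded bytes of text as space-separated uppercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1
        };
        for (Charset cs : charsets) {
            // U+00E9 is written as an escape so the source file stays pure ASCII.
            System.out.println(cs + ": A=" + hex("A", cs) + "  U+00E9=" + hex("\u00e9", cs));
        }
    }
}
```

Note that US-ASCII cannot represent U+00E9 at all, so getBytes substitutes a question mark (3F) for it.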
If you really want to solve this somehow, there is no perfect solution, but using CharsetEncoder and CharsetDecoder (see the Java SE 7 documentation for both) you may be able to treat the characters in a specific format and encode/decode them as bytes. However, there are still some disadvantages to this approach, such as:
1) You cannot expect all character mappings to be detected successfully.
2) It's a performance killer when doing multiple/heavy I/O operations.
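
As a rough illustration of the CharsetDecoder idea, the sketch below (StrictDecode and decodesAs are invented names) configures a decoder to report errors instead of silently replacing bad input, so you can at least tell when bytes are definitely not in a given charset:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is illustrative only.
class StrictDecode {

    // Returns true if the bytes decode without error in the given charset.
    static boolean decodesAs(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+00E9 in ISO-8859-1 is the single byte 0xE9, which is not valid UTF-8.
        byte[] latin1 = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("as UTF-8:      " + decodesAs(latin1, StandardCharsets.UTF_8));
        System.out.println("as ISO-8859-1: " + decodesAs(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

This also shows disadvantage 1 in practice: ISO-8859-1 accepts every possible byte, so a clean decode there proves nothing about what the file was really written in.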

Your best bet, in my opinion, is one: CONVENTION

Enforce your own encoding (e.g. UTF-8) with Unix-style line endings (\n) and treat all files as such. If you expect to read files produced by others and you expect to read characters that cannot be mapped in your encoding, then try to use a "bigger" charset (UTF-16) or read the "illegal" character as bytes and write it in your own encoding as bytes (it will be written in an unreadable/non-representable format, however!).
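
One way such a convention could be enforced at the point where foreign files enter your system is sketched below. The names Normalizer and toUtf8WithUnixEndings are made up, and the sketch assumes you already know (or have agreed on) the source charset of the incoming file:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is illustrative only.
class Normalizer {

    // Decodes bytes from the known source charset, converts CRLF and lone CR
    // line endings to LF, and re-encodes the result as UTF-8.
    static byte[] toUtf8WithUnixEndings(byte[] input, Charset sourceCharset) {
        String text = new String(input, sourceCharset);
        String unix = text.replace("\r\n", "\n").replace('\r', '\n');
        return unix.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulated Windows-style input: ISO-8859-1 bytes with CRLF line endings.
        byte[] foreign = "caf\u00e9\r\nbar\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] normalized = toUtf8WithUnixEndings(foreign, StandardCharsets.ISO_8859_1);
        System.out.println(foreign.length + " bytes in, " + normalized.length + " bytes out");
    }
}
```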

My $0.02. Have fun :)

EDIT: Check this post also: Charset conversion Java

The prior answers suffice.

Since you mentioned it: for your information, in our projects we set the (Java) source encoding to UTF-8 to stay international and to avoid having to resort to \uXXXX escaping. Readers and Writers explicitly specify the encoding. In fact, we stick to UTF-8 in our national projects as well. I think UTF-8 might be an emerging convention.

// Decode explicitly as UTF-8; `is` must be a File or a path String here.
BufferedReader in = new BufferedReader(
      new InputStreamReader(new FileInputStream(is), StandardCharsets.UTF_8));

MIME string escapes are not needed with the JavaMail API, which can handle UTF-8 in subjects and content.
