How can I open files containing accents in Java?

傲寒 2020-12-01 18:44

(editing for clarification and adding some code)

Hello, we have a requirement to parse data sent from users all over the world. Our Linux systems have a de

6 Answers
  •  渐次进展
    2020-12-01 19:20

    First, the character encoding used is not directly related to the locale. So changing the locale won't help much.

    Second, ï¿½ is typical of the Unicode replacement character U+FFFD (�) being printed as ISO-8859-1 instead of UTF-8. Here's the evidence:

    System.out.println(new String("\uFFFD".getBytes("UTF-8"), "ISO-8859-1")); // ï¿½
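    To see why U+FFFD turns into exactly those three characters, here is a small self-contained sketch (the class and method names are made up for illustration). It encodes U+FFFD as UTF-8, dumps the resulting bytes EF BF BD, and then decodes them as ISO-8859-1, where every byte maps to one Latin-1 character: ï, ¿ and ½.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {

    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    // This mimics a UTF-8 byte stream being shown by an ISO-8859-1 console.
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Render bytes as space-separated uppercase hex, e.g. "EF BF BD".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String replacement = "\uFFFD"; // U+FFFD REPLACEMENT CHARACTER
        System.out.println(hex(replacement.getBytes(StandardCharsets.UTF_8))); // EF BF BD
        // Three chars U+00EF U+00BF U+00BD, which display as ï¿½
        System.out.println(misreadAsLatin1(replacement));
    }
}
```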
    

    So there are two problems:

    1. Your JVM is reading those special characters as the replacement character �.
    2. Your console is using ISO-8859-1 to display characters.
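    Before changing any settings, it can help to print what the JVM actually picked up at startup. The sketch below is hypothetical (EncodingReport is not from the question). It reads the documented file.encoding property and Charset.defaultCharset(), plus sun.jnu.encoding, an internal, unsupported property that OpenJDK/HotSpot builds use for OS-level strings such as file names. Other JVMs may not set it, so treat that line as informational only.

```java
import java.nio.charset.Charset;

public class EncodingReport {

    // Collects the encoding-related settings the running JVM is using.
    // "sun.jnu.encoding" is internal and may be null on non-OpenJDK JVMs.
    static String report() {
        return String.join(System.lineSeparator(),
                "file.encoding          = " + System.getProperty("file.encoding"),
                "sun.jnu.encoding       = " + System.getProperty("sun.jnu.encoding"),
                "Charset.defaultCharset = " + Charset.defaultCharset().name());
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```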

    For a Sun JVM, the VM argument -Dfile.encoding=UTF-8 should fix the first problem. The second problem has to be fixed in the console settings. If you're using Eclipse, for example, you can change it in Window > Preferences > General > Workspace > Text File Encoding; set it to UTF-8 as well.
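    -Dfile.encoding only changes the default used by APIs that take no charset argument. Code that reads file contents can sidestep that default by naming the charset explicitly. This is a minimal sketch, assuming the files are UTF-8 encoded; the class, the helper names and the sample content are hypothetical. The UTF-8 PrintStream only controls which bytes Java writes; the console itself still has to be set to display UTF-8.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class ExplicitCharsetIO {

    // Reads all lines of a text file as UTF-8, ignoring the JVM default encoding.
    static List<String> readUtf8Lines(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes lines as UTF-8 to a fresh temp file (a hypothetical sample for this demo).
    static Path writeSample(List<String> lines) {
        try {
            Path file = Files.createTempFile("accents", ".txt");
            Files.write(file, lines, StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wraps stdout so Java encodes its output as UTF-8 (Java 8 compatible overload).
    static PrintStream utf8Stdout() {
        try {
            return new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Path sample = writeSample(Arrays.asList("caf\u00E9", "na\u00EFve"));
        PrintStream out = utf8Stdout();
        for (String line : readUtf8Lines(sample)) {
            out.println(line);
        }
        sample.toFile().delete();
    }
}
```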


    Update: regarding your update:

    byte[] textArray = f.getName().getBytes();
    

    That should have been the following, to exclude the influence of the platform default encoding:

    byte[] textArray = f.getName().getBytes("UTF-8");
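    The difference between the two calls becomes visible when you hex-dump the bytes. In this sketch (hypothetical class and file name), the UTF-8 and ISO-8859-1 lines are fixed, while the first line depends on the platform default and can differ from machine to machine. If the UTF-8 dump of a real file name contains EF BF BD, the name already held U+FFFD when Java returned it, which is the case where the problem lies deeper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NameBytes {

    // Hex-dumps the bytes of a string in the given charset, e.g. "C3 A9".
    static String hex(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "r\u00E9sum\u00E9.txt"; // hypothetical file name "résumé.txt"
        // Depends on the platform default encoding: may vary between machines.
        System.out.println("default: " + hex(name, Charset.defaultCharset()));
        // Always the same bytes, whatever the platform default is.
        System.out.println("UTF-8:   " + hex(name, StandardCharsets.UTF_8));
        System.out.println("Latin-1: " + hex(name, StandardCharsets.ISO_8859_1));
    }
}
```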
    

    If that still displays the same, then the problem lies deeper. Which JVM exactly are you using? Run java -version. As said before, the -Dfile.encoding argument is Sun JVM specific. Some Linux machines ship with the GNU JVM or OpenJDK's JVM, and this argument may then not work.
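    Both checks can also be done from inside the program. This is a rough sketch under stated assumptions: the class name is hypothetical, the standard system properties java.vm.name, java.vm.vendor and java.version give roughly the information printed by java -version, and a directory entry is flagged when its name contains U+FFFD, the symptom described in the question.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class DiagnoseNames {

    // Roughly the information from `java -version`, via standard system properties.
    static String vmInfo() {
        return System.getProperty("java.vm.name") + " / "
                + System.getProperty("java.vm.vendor") + " / "
                + System.getProperty("java.version");
    }

    // Returns the entries of dir whose names contain U+FFFD, i.e. names that
    // were already mangled when Java read them, before any getBytes() call.
    static List<String> mangledNames(File dir) {
        List<String> bad = new ArrayList<>();
        String[] names = dir.list(); // null if dir is not a readable directory
        if (names == null) return bad;
        for (String name : names) {
            if (name.indexOf('\uFFFD') >= 0) bad.add(name);
        }
        return bad;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("JVM: " + vmInfo());
        for (String name : mangledNames(dir)) {
            System.out.println("mangled: " + name);
        }
    }
}
```

    For example, java DiagnoseNames /path/to/uploads (a placeholder path) lists the affected names in that directory.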
