Java Charset problem on linux

巧了我就是萌 提交于 2021-02-06 15:53:14

问题


problem: I have a string containing special characters which i convert to bytes and vice versa..the conversion works properly on windows but on linux the special character is not converted properly.the default charset on linux is UTF-8 as seen with Charset.defaultCharset.getdisplayName()

however if i run on linux with option -Dfile.encoding=ISO-8859-1 it works properly..

how to make it work using the UTF-8 default charset and not setting the -D option in unix environment.

edit: i use jdk1.6.13

edit:code snippet works with cs = "ISO-8859-1"; or cs="UTF-8"; on win but not in linux

        String x = "½";
        System.out.println(x);
        byte[] ba = x.getBytes(Charset.forName(cs));
        for (byte b : ba) {
            System.out.println(b);
        }
        String y = new String(ba, Charset.forName(cs));
        System.out.println(y);

~regards daed


回答1:


Your characters are probably being corrupted by the compilation process and you're ending up with junk data in your class file.

if i run on linux with option -Dfile.encoding=ISO-8859-1 it works properly..

The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.

In short, don't use -Dfile.encoding=...

    String x = "½";

Since U+00bd (½) will be represented by different values in different encodings:

windows-1252     BD
UTF-8            C2 BD
ISO-8859-1       BD

...you need to tell your compiler what encoding your source file is encoded as:

javac -encoding ISO-8859-1 Foo.java

Now we get to this one:

    System.out.println(x);

As a PrintStream, this will encode data to the system encoding prior to emitting the byte data. Like this:

 System.out.write(x.getBytes(Charset.defaultCharset()));

That may or may not work as you expect on some platforms - the byte encoding must match the encoding the console is expecting for the characters to show up correctly.




回答2:


Your problem is a bit vague. You mentioned that -Dfile.encoding solved your linux problem, but this is in fact only used to inform the Sun(!) JVM which encoding to use to manage filenames/pathnames at the local disk file system. And ... this does't fit in the problem description you literally gave: "converting chars to bytes and back to chars failed". I don't see what -Dfile.encoding has to do with this. There must be more into the story. How did you conclude that it failed? Did you read/write those characters from/into a pathname/filename or so? Or was you maybe printing to the stdout? Did the stdout itself use the proper encoding?

That said, why would you like to convert the chars forth and back to/from bytes? I don't see any useful business purposes for this.

(sorry, this didn't fit in a comment, but I will update this with the answer if you have given more info about the actual functional requirement).

Update: as per the comments: you basically just need to configure the stdout/cmd so that it uses the proper encoding to display those characters. In Windows you can do that with chcp command, but there's one major caveat: the standard fonts used in Windows cmd does not have the proper glyphs (the actual font pictures) for characters outside the ISO-8859 charsets. You can hack the one or other in registry to add proper fonts. No wording about Linux as I don't do it extensively, but it look like that -Dfile.encoding is somehow the way to go. After all ... I think it's better to replace cmd with a crossplatform UI tool to display the characters the way you want, for example Swing.




回答3:


You should make the conversion explicitly:

byte[] byteArray = "abcd".getBytes( "ISO-8859-1" );
new String( byteArray, "ISO-8859-1" );

EDIT:

It seems that the problem is the encoding of your java file. If it works on windows, try compiling the source files on linux with javac -encondig ISO-8859-1. This should solve your problem.



来源:https://stackoverflow.com/questions/2168350/java-charset-problem-on-linux

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!