How does the JVM determine the (default?) character encoding for argv on Linux

Submitted by 你说的曾经没有我的故事 on 2019-12-04 05:14:45

Question


Java has a default character encoding, which it uses in contexts where a character encoding is not explicitly supplied. The documentation for how it chooses that encoding is vague:

The default charset is determined during virtual-machine startup and typically depends upon the locale and charset of the underlying operating system.

That documentation has to be vague because the method the JVM uses is system-specific.

Using the default character encoding is often a bad idea; it is better to specify an encoding explicitly, or to always use the same encoding for a given kind of I/O. But one unavoidable use of the default character encoding would seem to be decoding command-line arguments. On a POSIX system such as Linux, the native (C/C++) code of the JVM receives the command-line arguments as a null-terminated array of C/C++ char pointers. These are better thought of as pointers to bytes, since they must encode code points in some unspecified manner. The JVM has to interpret those sequences of C/C++ chars (bytes) and convert them into sequences of Java chars, which it passes to the main() of the Java program. I assume the JVM uses the default character encoding for this.
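To make the conversion described above observable, a small probe along these lines could print the code points the JVM actually produced for each argument. This is only a sketch: the class name ArgvDump and the helper describe are hypothetical names chosen for illustration, and the output depends on the JVM and the environment it was started in.

```java
import java.nio.charset.Charset;

public class ArgvDump {
    // Hypothetical helper: render a string as a space-separated list of
    // Unicode code points, e.g. "U+0041 U+00E9".
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The charset the JVM settled on at startup.
        System.out.println("default charset: " + Charset.defaultCharset());
        // Show how each argument was decoded from the bytes in argv.
        for (int i = 0; i < args.length; i++) {
            System.out.println("args[" + i + "] = " + describe(args[i]));
        }
    }
}
```

Running it with a non-ASCII argument under different locale settings would show whether the same bytes decode to different Java strings.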

So I need to know precisely how the JVM determines the default encoding for a particular system (a modern GNU/Linux operating system), so I can provide user documentation about how my program behaves, and so users of my program can predict how it will behave.

I guess the JVM examines some environment variables, but which ones?


Answer 1:


You can of course look at the source code of java.nio.charset.Charset.defaultCharset(). When I do that on my system (64-bit Windows 7, with Oracle JDK 8 update 25) I see this:

public static Charset defaultCharset() {
    if (defaultCharset == null) {
        synchronized (Charset.class) {
            String csn = AccessController.doPrivileged(
                new GetPropertyAction("file.encoding"));
            Charset cs = lookup(csn);
            if (cs != null)
                defaultCharset = cs;
            else
                defaultCharset = forName("UTF-8");
        }
    }
    return defaultCharset;
}

In other words, it looks at the system property file.encoding and, if it cannot find a matching Charset instance, it falls back to UTF-8.
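The lookup-with-fallback logic in the snippet above can be reproduced with public APIs to see what a particular JVM reports. The following is a minimal sketch, not the JDK's own code: the class name DefaultCharsetProbe and the helper resolveOrUtf8 are hypothetical, and it mirrors only the JDK 8 source quoted above, so other JDK versions may resolve the default differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetProbe {
    // Hypothetical helper: resolve a charset name, falling back to UTF-8
    // when the name is missing, malformed, or not supported by this JVM.
    static Charset resolveOrUtf8(String name) {
        if (name != null) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name syntax: fall through to UTF-8.
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        String prop = System.getProperty("file.encoding");
        System.out.println("file.encoding            = " + prop);
        System.out.println("resolved from property   = " + resolveOrUtf8(prop));
        System.out.println("Charset.defaultCharset() = " + Charset.defaultCharset());
    }
}
```

Comparing the last two lines on the target system is a quick check of whether the property alone explains the default charset there.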



Source: https://stackoverflow.com/questions/27923366/how-does-the-jvm-determine-the-default-character-encoding-for-argv-on-linux
