What is the character encoding used for Eclipse VM arguments?

☆樱花仙子☆ submitted on 2019-12-12 15:24:45

Question


We read an important parameter, a path to a file, as a VM argument. Some users now pass this VM argument with Korean characters (their folders have Korean names), and the program has started to break because the Korean characters are read as question marks. The experiment below shows the technical situation.

I tried to debug a program in Eclipse. In "Debug Configurations", on the "Arguments" tab, I entered the following under "VM arguments":

-Dfilepath=D:\XXXX\카운터

But when I read it in the program like this:

String filepath = System.getProperty("filepath");

I get output with question marks, like below:

D:\XXXX\???

I understand that the Eclipse debug GUI uses the right encoding (?) to display the characters correctly, but when the value is read in the program, a different encoding is used that cannot represent these characters.
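A quick way to tell whether the JVM really received question marks, or whether only the console fails to display the characters, is to print the code points of the property value. The following is a minimal sketch (the class name PropertyDump is hypothetical). U+003F in the output would mean the characters were already replaced before the program saw them; values such as U+CE74 would mean only the display is at fault.

```java
// Sketch: dump the code points of the "filepath" system property so that a
// real '?' (U+003F) can be told apart from a console display problem.
public class PropertyDump {

    // Returns the code points of s as "U+XXXX" tokens separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "filepath" is the property name used in the question.
        String value = System.getProperty("filepath");
        if (value == null) {
            System.out.println("filepath is not set");
            return;
        }
        System.out.println(codePoints(value));
    }
}
```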

What default encoding does Java use to read the VM arguments supplied to it?

How can I change the encoding in Eclipse so that the program reads the characters properly?
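To see which encodings the running JVM reports, a sketch like the following can be launched from the same Eclipse configuration (the class name EncodingInfo is hypothetical). Note that sun.jnu.encoding is an internal, implementation-specific property of Oracle/OpenJDK builds that usually reflects the platform encoding; it may be null on other JVMs. file.encoding on its own is not necessarily the encoding applied to command-line arguments.

```java
import java.nio.charset.Charset;

// Sketch: report the encoding-related settings of the running JVM.
// "sun.jnu.encoding" is implementation-specific and may print as null.
public class EncodingInfo {

    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
            + ", sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
            + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```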


Answer 1:


My conclusion is that the conversion depends on the default encoding (the Windows setting "Language for non-Unicode programs"). Here is the program I used for testing:

package test;
import java.io.FileOutputStream;
public class Test {
    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        // Reference copy of the text, then the value of the system property "cenv"
        sb.append("[카운터] sysprop=[").append(System.getProperty("cenv"));
        if (args.length > 0) {
            // First program argument as delivered to main()
            sb.append("], cmd args=[").append(args[0]);
        }
        sb.append("], file.encoding=").append(System.getProperty("file.encoding"));
        // Write the result as UTF-8 to a file instead of System.out,
        // so the console's own encoding cannot distort what is recorded
        FileOutputStream fout = new FileOutputStream("/testout");
        fout.write(sb.toString().getBytes("UTF-8"));
        fout.close();
        //Thread.sleep(10000);//For checking arguments using Process Explorer
    }
}

Test1: "Language for non-Unicode programs" is Korean (Korea)

Execute in a command prompt: java -Dcenv=카운터 test.Test 카운터 (the Korean characters are correct when I verify the arguments using Process Explorer)

Result: [카운터] sysprop=[카운터], cmd args=[카운터], file.encoding=MS949

Test2: "Language for non-Unicode programs" is Chinese (Traditional, Taiwan)

Execute in a command prompt (pasted from the clipboard): java -Dcenv=카운터 test.Test 카운터 (I cannot see the Korean characters in the command window; however, they are correct when I verify the arguments using Process Explorer)

Result: [카운터] sysprop=[???], cmd args=[???], file.encoding=MS950

Test3: "Language for non-Unicode programs" is Chinese (Traditional, Taiwan)

Launch from Eclipse with both Program arguments and VM arguments set. The command line in Process Explorer is C:\pg\jdk160\bin\javaw.exe -agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:50672 -Dcenv=카운터 -Dfile.encoding=UTF-8 -classpath S:\ws\wtest\bin test.Test 카운터, which is the same as what the Properties dialog of the Eclipse Debug view shows.

Result: [카운터] sysprop=[???], cmd args=[bin], file.encoding=UTF-8
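Besides Process Explorer, the JVM options can also be listed from inside the program through the standard java.lang.management API, which helps compare what the JVM received with the command line Process Explorer shows. A minimal sketch (the class name ShowJvmArgs is hypothetical); printing to the console can itself mangle non-ASCII text, so in practice the output may need to go to a UTF-8 file, as the test program above does.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: print the JVM options as the JVM received them,
// followed by the program arguments passed to main().
public class ShowJvmArgs {

    // JVM input arguments such as -Dcenv=...; the main class and
    // the program arguments are not included in this list.
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        for (String a : jvmArgs()) {
            System.out.println("jvm: " + a);
        }
        for (String a : args) {
            System.out.println("arg: " + a);
        }
    }
}
```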

Changing the Korean characters to "碁石", which exist in both the MS950 and MS949 character sets:

  • Test1 Result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=MS949
  • Test2 Result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=MS950
  • Test3 Result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=UTF-8

Changing the Korean characters to "鈥焢", which exist in the MS950 character set:

  • Test1 Result: [鈥焢] sysprop=[??], cmd args=[??], file.encoding=MS949
  • Test2 Result: [鈥焢] sysprop=[鈥焢], cmd args=[鈥焢], file.encoding=MS950
  • Test3 Result: [鈥焢] sysprop=[鈥焢], cmd args=[鈥焢], file.encoding=UTF-8

Changing the Korean characters to "宽广", which exist in the GBK character set:

  • Test1 Result: [宽广] sysprop=[??], cmd args=[??], file.encoding=MS949
  • Test2 Result: [宽广] sysprop=[??], cmd args=[??], file.encoding=MS950
  • Test3 Result: [宽广] sysprop=[??], cmd args=[??], file.encoding=UTF-8

Test4: to verify my assumption, I changed "Language for non-Unicode programs" to Chinese (Simplified, PRC) and executed java -Dcenv=宽广 test.Test 宽广 in a command prompt.

Result: [宽广] sysprop=[宽广], cmd args=[宽广], file.encoding=GBK

During testing, I always checked the command line via Process Explorer and made sure all characters were correct. However, the command-line arguments are converted using the default encoding (set by "Language for non-Unicode programs") before main(String[] args) of the Java class is invoked. If a character does not exist in the character set of that encoding, the program receives an unexpected argument. Setting -Dfile.encoding=UTF-8 does not change this conversion, as Test3 shows.
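The lossy step described above can be imitated in plain Java: encoding a string with a charset that cannot represent some characters replaces them, and decoding the bytes gives back '?'. This is a sketch of the effect only, not the launcher's actual code path (the class name CodePageRoundTrip is hypothetical). ISO-8859-1 is used because every JVM supports it; MS949, MS950 and GBK are checked only if the JVM provides them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: imitate a Unicode string passing through a legacy charset.
public class CodePageRoundTrip {

    // Encode with cs, then decode back. String.getBytes(Charset) replaces
    // unmappable characters with the charset's replacement bytes
    // ('?' for ISO-8859-1).
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    // True if every character of s can be represented in cs.
    static boolean survives(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String korean = "\uCE74\uC6B4\uD130"; // the Korean folder name from the question
        System.out.println(roundTrip(korean, StandardCharsets.ISO_8859_1)); // prints ???
        for (String name : new String[] {"MS949", "MS950", "GBK"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + " can encode it: " + survives(korean, Charset.forName(name)));
            }
        }
    }
}
```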

I'm not sure whether the problem is caused by java.exe/javaw.exe or by Windows, but passing non-ASCII parameters via command-line arguments is not a good idea.
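One way to follow that advice while still using a VM argument, offered here as an assumption rather than something from the answer, is to keep the command line pure ASCII by percent-encoding the path as UTF-8 and decoding it in the program, e.g. -Dfilepath=D:\XXXX\%EC%B9%B4%EC%9A%B4%ED%84%B0 for the path in the question. The class name EncodedPathArg is hypothetical. Caveats: URLDecoder turns '+' into a space and treats '%' as an escape, so literal '+' or '%' in a path must be written as %2B or %25, and inside a .bat file every '%' would have to be doubled.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Sketch (an assumed workaround): the VM argument carries the path
// percent-encoded as UTF-8, so only ASCII crosses the command line.
public class EncodedPathArg {

    // Decode a percent-encoded UTF-8 path. Note: '+' becomes a space.
    static String decodePath(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = System.getProperty("filepath");
        if (raw == null) {
            System.err.println("-Dfilepath is not set");
            return;
        }
        String path = decodePath(raw);
        System.out.println("exists: " + new File(path).exists());
    }
}
```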

By the way, I also tried executing the command via a .bat file (saved as UTF-8), in case anyone is interested:

Test5: "Language for non-Unicode programs" is Korean (Korea)

The command line in Process Explorer is java -Dcenv=移댁슫?? test.Test 移댁슫?? (the Korean characters are garbled)

Result: [카운터] sysprop=[移댁슫??], cmd args=[移댁슫??], file.encoding=MS949

Test6: "Language for non-Unicode programs" is Korean (Korea)

Add another VM argument. The command line in Process Explorer is java -Dfile.encoding=UTF-8 -Dcenv=移댁슫?? test.Test 移댁슫?? (the Korean characters are garbled)

Result: [카운터] sysprop=[移댁슫??], cmd args=[移댁슫??], file.encoding=UTF-8

Test7: "Language for non-Unicode programs" is Chinese (Traditional, Taiwan)

The command line in Process Explorer is java -cp s:\ws\wtest\bin -Dcenv=儦渥?? test.Test 儦渥?? (the Korean characters are garbled)

Result: [카운터] sysprop=[儦渥??], cmd args=[儦渥??], file.encoding=MS950



Source: https://stackoverflow.com/questions/32587876/what-is-the-character-encoding-used-in-eclipse-vm-arguement
