What is an example for non unicode character set for -Dfile.encoding=?

試著忘記壹切 提交于 2021-01-29 09:29:54

问题


I have a JVM. where character set as "-Dfile.encoding=UTF-8" . This is how UTF-8 is set. I would want to set it to a non Unicode character set.

Is there an example/value for non unicode character set so that I can set to -Dfile.encoding= ?


回答1:


[ TLDR => Application encoding a confusing issue, but this document from Oracle should help. ]

First a few important general points about specifying the encoding by setting the System Property file.encoding at run time:

  • It's use is not formally supported, and never has been. From a Java Bug Report in 1998:

    The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.

  • There is a draft JEP (JDK Enhancement Proposal), JDK-8187041 Use UTF-8 as default Charset, which proposes:

    Use UTF-8 as the Java virtual machine's default charset so that APIs that depend on the default charset behave consistently across all platforms.

  • It doesn't necessarily make sense to claim that "This application uses encoding {x}" since there may be multiple encodings associated with an application, which can be addressed in different ways, including:

    • The file encoding for console output.
    • The file encoding of the application's source files.
    • The file encoding(s) for file I/O.
    • The file encoding of file paths.

All that said, Oracle specify all encodings supported by Java SE 8. I can't find a corresponding document for more recent JDK versions. Note that:

  • Encodings can be environment specific, based on locale, operating system, Java version, etc.
  • Almost every encoding has at least one alias. For example, the encoding name for simplified Chinese is GBK, but you could also use CP936 or windows-936.
  • Most encoding are non Unicode since Unicode encoding names contain the string "UTF".
  • An encoding name can vary depending on how the application is processing files (java.nio APIs vs. java.io/java.lang APIs.). For example, if performing some I/O on Turkish files on Windows:
    • If the java.nio.* classes are used, specify -Dfile.encoding=windows-1254 at runtime.
    • If the java.lang.* & java.io.* classes are used, specify -Dfile.encoding=Cp1254 at runtime.

This DZone article provides a useful piece of code to show how setting -Dfile.encoding at runtime can impact various settings:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Locale;
import static java.lang.System.out;
/**
 * Demonstrate default Charset-related details.
 */
public class CharsetDemo
{
   /**
    * Supplies the default encoding without using Charset.defaultCharset()
    * and without accessing System.getProperty("file.encoding").
    *
    * @return Default encoding (default charset).
    */
   public static String getEncoding()
   {
      final byte [] bytes = {'D'};
      final InputStream inputStream = new ByteArrayInputStream(bytes);
      final InputStreamReader reader = new InputStreamReader(inputStream);
      final String encoding = reader.getEncoding();
      return encoding;
   }
   public static void main(final String[] arguments)
   {
      out.println("Default Locale:   " + Locale.getDefault());
      out.println("Default Charset:  " + Charset.defaultCharset());
      out.println("file.encoding;    " + System.getProperty("file.encoding"));
      out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));
      out.println("Default Encoding: " + getEncoding());
   }
}

Here's some sample output when specifying -Dfile.encoding=860 (an alias for MS-DOS Portuguese) using Java 12 on Windows 10:

run:
Default Locale:   en_US
Default Charset:  IBM860
file.encoding:    860
sun.jnu.encoding: Cp1252
Default Encoding: Cp860
BUILD SUCCESSFUL (total time: 0 seconds)

Test the encoding you plan to specify at run time on all target platforms. You may get unexpected results. For example, when I run the code above on Windows 10 with -Dfile.encoding=IBM864 (PC Arabic) it works, but fails with -Dfile.encoding=IBM420 (IBM Arabic).



来源:https://stackoverflow.com/questions/56028048/what-is-an-example-for-non-unicode-character-set-for-dfile-encoding

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!