Platform's default charset on different platforms?

无人及你 2020-12-03 04:47

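Some legacy code relies on the platform's default charset for translations. For Windows and Linux installations in the "western world" I know what that means. But thinking about Russian or Asian platforms I am totally unsure what their platform's default charset is (just UTF-16?).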

2 answers
  •  执念已碎
    2020-12-03 05:20

    For Windows and Linux installations in the "western world" I know what that means.

    Probably not as well as you think.

    But thinking about Russian or Asian platforms I am totally unsure what their platform's default charset is

    Usually it's whatever encoding is historically used in their country.

    (just UTF-16?).

    Most definitely not. Computer usage spread widely before the Unicode standard existed, and each language area developed one or more encodings that could support its language. Those who needed fewer than 128 characters outside ASCII typically developed an "extended ASCII", many of which were eventually standardized as parts of ISO-8859, while others developed two-byte encodings, often several competing ones. For example, in Japan, emails typically use JIS (ISO-2022-JP), but webpages use Shift-JIS, and some applications use EUC-JP. Any of these might be encountered as the platform default encoding in Java.
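    To see how far these encodings diverge, here is a minimal sketch that prints the platform default and then encodes the same two kanji under several charsets. The class name `CharsetSurvey` and the helper `toHex` are made up for illustration. The Japanese charsets are optional in a Java runtime, so the sketch checks `Charset.isSupported` before using them.

```java
import java.nio.charset.Charset;

public class CharsetSurvey {
    // Hypothetical helper: encode text with the given charset and
    // render the resulting bytes as space-separated hex pairs.
    public static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What this particular JVM considers the platform default.
        System.out.println("Platform default: " + Charset.defaultCharset());

        // Two kanji written as Unicode escapes, so the source file's own
        // encoding cannot affect the result.
        String text = "\u65e5\u672c";

        // Only UTF-8 and UTF-16BE are guaranteed to exist in every runtime;
        // the Japanese charsets are looked up defensively.
        String[] names = {"UTF-8", "UTF-16BE", "Shift_JIS", "EUC-JP", "ISO-2022-JP"};
        for (String name : names) {
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + toHex(text, Charset.forName(name)));
            } else {
                System.out.println(name + ": not available in this runtime");
            }
        }
    }
}
```

    On a runtime that ships the extended charsets, each line should show a different byte sequence for the same text, which is the whole problem in miniature.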

    It's all a huge mess, which is exactly why Unicode was developed. But the mess has not yet disappeared, so we still have to deal with it, and we should never assume which encoding a given sequence of bytes is in when it is meant to be read as text. There Ain't No Such Thing as Plain Text.
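    For the legacy code in question, the practical consequence is to name the charset at every byte/text boundary instead of relying on the default. A minimal sketch, where the class name `ExplicitCharset` and the helper `decode` are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Hypothetical helper: decode bytes with a charset chosen by the caller,
    // never with the platform default.
    public static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) encoded as UTF-8 is two bytes: C3 A9.
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Right charset: the two bytes decode back to one character.
        System.out.println(decode(data, StandardCharsets.UTF_8).length());

        // Wrong charset: each byte becomes its own character (mojibake).
        System.out.println(decode(data, StandardCharsets.ISO_8859_1).length());

        // No charset given: the result depends on the machine running this.
        System.out.println(new String(data).length());
    }
}
```

    The first two results are the same on every machine. The third is the one the legacy code depends on, and it can differ between a Western, a Russian, and a Japanese installation.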
