Some legacy code relies on the platform's default charset for translations.
For Windows and Linux installations in the "western world" I know what that means.
Probably not as well as you think.
But thinking about Russian or Asian platforms, I am totally unsure what their platform's default charset is
Usually it's whatever encoding is historically used in their country.
(just UTF-16?).
Most definitely not. Computer usage spread widely before the Unicode standard existed, and each language area developed one or more encodings that could support its language. Those who needed fewer than 128 characters beyond ASCII typically developed an "extended ASCII", many of which were eventually standardized as ISO-8859, while others developed two-byte encodings, often several competing ones. For example, in Japan, emails typically use JIS (ISO-2022-JP), but webpages use Shift-JIS, and some applications use EUC-JP. Any of these might be encountered as the platform default encoding in Java.
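To make this concrete, here is a small sketch (assumptions: the charset names shown are available in your JRE, which they are for all mainstream JREs) that prints the platform default and shows how the very same bytes decode to different text under different encodings:

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    public static void main(String[] args) throws Exception {
        // Whatever the OS/locale configured: e.g. windows-1252 on Western
        // Windows, UTF-8 on most modern Linux distributions, MS932 on
        // Japanese Windows.
        System.out.println(Charset.defaultCharset());

        // The same two bytes mean entirely different things depending on
        // which encoding you assume:
        byte[] bytes = {(byte) 0x82, (byte) 0xA0};
        System.out.println(new String(bytes, "Shift_JIS"));    // あ (hiragana "a")
        System.out.println(new String(bytes, "windows-1252")); // a low quote + NBSP
    }
}
```

Note that `Charset.defaultCharset()` only tells you what *this* JVM will assume; it says nothing about the encoding of bytes that arrive from another machine.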
It's all a huge mess, which is exactly why Unicode was developed. But the mess has not yet disappeared; we still have to deal with it, and we should not make any assumptions about which encoding a given bunch of bytes, meant to be interpreted as text, is in. There Ain't No Such Thing as Plain Text.
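The practical consequence for the legacy code mentioned in the question: never call the APIs that silently fall back to the platform default (like `new FileReader(file)` or `String.getBytes()` with no argument); always name the encoding explicitly. A minimal sketch, using a temporary file so it is self-contained:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharset {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("demo", ".txt");
        // Write with a known, named encoding...
        Files.write(p, "héllo".getBytes(StandardCharsets.UTF_8));

        // ...and read it back naming the same encoding, instead of
        // letting the platform default decide. This behaves identically
        // on a Russian, Japanese, or Western machine.
        try (BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
            System.out.println(r.readLine()); // héllo
        }
        Files.delete(p);
    }
}
```

If you must interoperate with legacy data whose encoding is fixed by history (say, windows-1251 for old Russian files), name *that* charset explicitly rather than hoping the default matches.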