Unicode Support in Various Programming Languages

醉话见心 2020-12-13 13:31

I'd like to have a canonical place to pool information about Unicode support in various languages. Is it a part of the core language? Is it provided in libraries? Is it not

20 answers
  •  情歌与酒
    2020-12-13 14:08

    Rust

    Rust's strings (std::string::String and &str) are always valid UTF-8 and are not null-terminated. Because UTF-8 is a variable-width encoding, they cannot be indexed by integer like arrays, as strings can be in C/C++. They can be sliced by byte range, somewhat like in Go: indexing with a range (&s[a..b]) panics if either end falls in the middle of a code point, while .get (available since Rust 1.20) returns None instead.
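
    As a rough illustration of the slicing behaviour described above, here is a minimal sketch. The helper name safe_slice is made up for this example; str::get and str::is_char_boundary are standard library methods.

```rust
/// Hypothetical helper: returns the sub-slice for a byte range, or None if
/// the range is out of bounds or does not fall on UTF-8 character boundaries.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    // U+00E9 (e with acute accent) takes two bytes in UTF-8,
    // so this 5-character string is 6 bytes long.
    let s = "h\u{e9}llo";
    assert_eq!(s.len(), 6);
    assert_eq!(s.chars().count(), 5);

    // Byte offsets 1 and 3 are character boundaries; offset 2 is not.
    assert!(s.is_char_boundary(1));
    assert!(!s.is_char_boundary(2));

    assert_eq!(safe_slice(s, 0, 1), Some("h"));
    assert_eq!(safe_slice(s, 0, 2), None); // would cut U+00E9 in half
    assert_eq!(safe_slice(s, 0, 3), Some("h\u{e9}"));

    // `s[1]` does not compile, and `&s[0..2]` would panic at runtime.
    // To get the n-th character, iterate instead (this is O(n)).
    assert_eq!(s.chars().nth(1), Some('\u{e9}'));
}
```

    Note that the range is in bytes, not characters, so offsets usually come from methods such as char_indices or find rather than from counting characters by hand.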

    Rust also has OsStr/OsString for interacting with the host OS. On Unix it is a byte array (containing any sequence of bytes). On Windows it is WTF-8 (a superset of UTF-8 that can represent the ill-formed UTF-16 strings allowed in Windows and JavaScript). &str and String can be freely converted to OsStr or OsString, but converting the other way requires a check: the conversion either fails on invalid Unicode or replaces the invalid sequences with the Unicode replacement character (U+FFFD). (There are also Path/PathBuf, which are just wrappers around OsStr/OsString.)
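
    The two directions of conversion might look like the sketch below. The helper names strict and lossy are invented for this example; the invalid-byte case is only exercised on Unix, where an OsStr can be built from raw bytes.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: strict conversion, None if the OS string is not valid Unicode.
fn strict(os: &OsStr) -> Option<&str> {
    os.to_str()
}

/// Hypothetical helper: lossy conversion, invalid sequences become U+FFFD.
fn lossy(os: &OsStr) -> String {
    os.to_string_lossy().into_owned()
}

fn main() {
    // &str -> &OsStr and String -> OsString need no checks.
    let os: &OsStr = OsStr::new("caf\u{e9}.txt");
    let owned: OsString = OsString::from(String::from("data"));

    // Going back is checked: Option for borrowed, Result for owned.
    assert_eq!(strict(os), Some("caf\u{e9}.txt"));
    assert_eq!(owned.into_string(), Ok(String::from("data")));

    // On Unix an OsStr may hold arbitrary bytes, including invalid UTF-8.
    #[cfg(unix)]
    {
        use std::os::unix::ffi::OsStrExt;
        let bad = OsStr::from_bytes(&[b'f', b'o', 0xFF]);
        assert_eq!(strict(bad), None);
        assert_eq!(lossy(bad), "fo\u{FFFD}");
    }
}
```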

    There are also the CStr and CString types, which represent null-terminated C strings. Like OsStr on Unix, they can contain arbitrary bytes, except interior null bytes.
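
    A small sketch of how CString treats arbitrary bytes and interior nulls. The helper to_c_string is a name made up here; CString::new and CStr::from_bytes_with_nul are standard library functions.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: builds a C string, or None if the input
/// contains an interior null byte.
fn to_c_string(bytes: &[u8]) -> Option<CString> {
    CString::new(bytes).ok()
}

fn main() {
    // Non-UTF-8 bytes are accepted; a terminating null is appended.
    let c = to_c_string(b"hi\xFF").unwrap();
    assert_eq!(c.as_bytes(), &b"hi\xFF"[..]);
    assert_eq!(c.as_bytes_with_nul().len(), 4);

    // Converting to &str is checked, because the bytes may not be UTF-8.
    assert!(c.to_str().is_err());

    // An interior null byte is rejected.
    assert!(to_c_string(b"a\0b").is_none());

    // A borrowed CStr can be made from bytes that already end in a null.
    let borrowed = CStr::from_bytes_with_nul(b"ok\0").unwrap();
    assert_eq!(borrowed.to_str().unwrap(), "ok");
}
```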

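    Rust has no native UTF-16 string type, but the standard library can convert to and from UTF-16 (str::encode_utf16, String::from_utf16). On Windows, an OsStr can also be converted to a sequence of 16-bit code units (potentially ill-formed UTF-16) with OsStrExt::encode_wide.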
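
    For reference, a minimal sketch of UTF-16 conversion using only the standard library. The helper names to_utf16 and from_utf16_strict are assumptions for this example; the Windows-only part is compiled only on that platform.

```rust
/// Hypothetical helper: encodes a &str as UTF-16 code units.
fn to_utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

/// Hypothetical helper: decodes UTF-16, None on unpaired surrogates.
fn from_utf16_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

fn main() {
    // U+1F600 is outside the BMP, so it becomes a surrogate pair.
    let units = to_utf16("a\u{1F600}");
    assert_eq!(units, [0x0061, 0xD83D, 0xDE00]);
    assert_eq!(from_utf16_strict(&units), Some(String::from("a\u{1F600}")));

    // A lone surrogate is ill-formed: strict decoding fails,
    // lossy decoding substitutes U+FFFD.
    assert_eq!(from_utf16_strict(&[0xD83D]), None);
    assert_eq!(String::from_utf16_lossy(&[0xD83D]), "\u{FFFD}");

    // On Windows, OsStr exposes its contents as 16-bit units.
    #[cfg(windows)]
    {
        use std::ffi::OsStr;
        use std::os::windows::ffi::OsStrExt;
        let wide: Vec<u16> = OsStr::new("a").encode_wide().collect();
        assert_eq!(wide, [0x0061]);
    }
}
```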
