Unicode Support in Various Programming Languages

醉话见心 2020-12-13 13:31

I'd like to have a canonical place to pool information about Unicode support in various languages. Is it a part of the core language? Is it provided in libraries? Is it not

20 answers
  •  半阙折子戏
    2020-12-13 14:05

    Lua

    Lua 5.3 and later ship a built-in utf8 library for working with UTF-8 encoded strings. It converts a sequence of code points to the corresponding byte sequence and back (utf8.char, utf8.codepoint), returns the length of a string in code points (utf8.len), iterates over the code points in a string (utf8.codes), and returns the byte position of the nth code point (utf8.offset). It also provides a pattern (utf8.charpattern), for use with the pattern-matching functions in the string library, that matches one UTF-8 byte sequence.
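
    The utf8 functions described above can be sketched as follows. This is a minimal illustration, assuming a Lua 5.3 or later interpreter; the sample string and the variable names are chosen only for this example.

```lua
-- Requires Lua 5.3 or later (the utf8 library does not exist in 5.1/5.2).

-- Code points -> UTF-8 byte sequence: builds the string "Héllo".
local s = utf8.char(72, 233, 108, 108, 111)

-- utf8.len counts code points; the # operator counts bytes.
local n_codepoints = utf8.len(s)
local n_bytes = #s                -- U+00E9 takes two bytes in UTF-8

-- UTF-8 byte sequence -> code points, over the whole string.
local cps = { utf8.codepoint(s, 1, -1) }

-- Iterate over code points; pos is the byte position of each one.
local positions = {}
for pos, cp in utf8.codes(s) do
  positions[#positions + 1] = pos
end

-- Byte position at which the 3rd code point starts.
local third = utf8.offset(s, 3)

-- utf8.charpattern matches one UTF-8 byte sequence at a time.
local chars = {}
for ch in s:gmatch(utf8.charpattern) do
  chars[#chars + 1] = ch
end
```

    Note that utf8.charpattern assumes the subject string is valid UTF-8; it does not validate the input.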

    Lua 5.3 has Unicode code point escape sequences that can be used in string literals (for instance, "\u{61}" for "a"). They translate to UTF-8 byte sequences.
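
    As a small sketch of how these escapes expand (assuming Lua 5.3 or later; the euro sign is just an illustrative choice), the literal below is stored as the three bytes of its UTF-8 encoding:

```lua
-- Requires Lua 5.3 or later for the \u{XXX} escape.
local a = "\u{61}"          -- same string as "a"
local euro = "\u{20AC}"     -- U+20AC EURO SIGN

-- The escape is replaced by the UTF-8 encoding at compile time,
-- so the string holds bytes, not a code point.
local euro_bytes = { euro:byte(1, -1) }   -- 0xE2, 0x82, 0xAC
```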

    Lua source code can be encoded in UTF-8 or any other encoding in which ASCII characters take up one byte. The vanilla Lua interpreter does not understand UTF-16 or UTF-32 source files. Strings, however, can hold data in any encoding, or arbitrary binary data.
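
    To illustrate that Lua strings are plain byte sequences, here is a hedged sketch (assuming Lua 5.3 or later; the byte values are arbitrary examples). The string type accepts bytes that are not valid UTF-8, and only the utf8 functions care about the encoding:

```lua
-- Requires Lua 5.3 or later for the utf8 library.

-- Arbitrary binary data, including an embedded zero byte.
local blob = "\xFF\x00\xFE"
local blob_len = #blob                  -- length in bytes

-- utf8.len reports failure on invalid UTF-8: it returns nil
-- plus the byte position of the first invalid byte.
local len, bad_pos = utf8.len(blob)

-- A valid UTF-8 string: two code points, three bytes each.
local text = "\u{4E2D}\u{6587}"
local text_bytes = #text
local text_len = utf8.len(text)
```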
