Unicode Support in Various Programming Languages

醉话见心 2020-12-13 13:31

I'd like to have a canonical place to pool information about Unicode support in various languages. Is it a part of the core language? Is it provided in libraries? Is it not

20 answers
  •  孤街浪徒
    2020-12-13 14:19

    Python

    Python 2 has the classes str and unicode. str objects store bytes; unicode objects store UTF-16 code units on narrow builds and UCS-4 code points on wide builds. Most library functions support both (e.g. os.listdir('.') returns a list of str, os.listdir(u'.') returns a list of unicode objects). Both have encode and decode methods.

    Python 3 basically renamed unicode to str. The Python 3 equivalent of the Python 2 str is the type bytes. bytes has a decode method and str has an encode method. Since Python 3.3 (PEP 393), str objects internally use one of several fixed-width representations (1, 2, or 4 bytes per code point), chosen per string to save memory. To a Python programmer a str still looks like an abstract sequence of Unicode code points.
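The str/bytes split described above can be illustrated with a minimal Python 3 sketch. The variable names and sample strings are made up for illustration and are not part of the original answer.

```python
# str holds Unicode code points; bytes holds encoded data.
text = "na\u00efve caf\u00e9"          # 10 code points, two of them non-ASCII

data = text.encode("utf-8")            # str -> bytes
roundtrip = data.decode("utf-8")       # bytes -> str

# len() on str counts code points, len() on bytes counts bytes,
# regardless of how the interpreter stores the string internally.
astral = "\U0001F600"                  # one code point outside the BMP
sizes = {
    "code points": len(astral),
    "utf-8 bytes": len(astral.encode("utf-8")),
    "utf-16 bytes": len(astral.encode("utf-16-le")),
}

print(len(text), len(data), roundtrip == text)
print(sizes)
```

Because the internal representation is hidden, the code-point count of the astral character stays at 1 even though both encodings need 4 bytes for it.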

    Python supports:

    • encoding/decoding
    • normalization
    • simple case conversion and splitting on whitespace
    • looking up characters by their name
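As a rough illustration of the supported features listed above, here is a short Python 3 sketch using only str methods and the standard unicodedata module. The sample strings and variable names are chosen for illustration only.

```python
import unicodedata

# Normalization: 'e' + COMBINING ACUTE ACCENT composes to one code point under NFC.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# Looking up characters by name, in both directions.
char_name = unicodedata.name(composed)
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")
same_alpha = "\N{GREEK SMALL LETTER ALPHA}"   # name escape in a string literal

# Simple case conversion works beyond ASCII.
shouted = "\u03b1\u03b2\u03b3".upper()        # Greek alpha, beta, gamma

# split() with no argument splits on Unicode whitespace,
# here EM SPACE (U+2003) and IDEOGRAPHIC SPACE (U+3000).
words = "one\u2003two\u3000three".split()

print(char_name, alpha == same_alpha, shouted, words)
```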

    Python does not support/has limited support for:

    • collation (limited)
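    • special case conversions where there is no 1:1 mapping between lower and upper case characters (Python 3.3 and later apply the locale-independent full mappings, so 'ß'.upper() gives 'SS'; locale-specific rules such as the Turkish dotted and dotless i are not handled)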
    • regular expressions (the standard re module has no Unicode property classes such as \p{L}; the third-party regex module adds them)
    • text segmentation
    • bidirectional text handling
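Some of the gaps listed above can be observed directly from the standard library. The following is a minimal sketch for Python 3.3 or later; the sample words and variable names are illustrative assumptions, and the behavior noted for re is what CPython 3.6 through 3.12 are known to do.

```python
import re
import unicodedata

# Collation: the default sort compares code points, so U+00E4 lands after 'z'.
words = ["zebra", "\u00e4pfel", "apple"]
by_code_point = sorted(words)

# Case conversion: full mappings are locale-independent in Python 3.3+,
# but there is no locale tailoring (Turkish would expect U+0130 for "i".upper()).
sharp_s_upper = "\u00df".upper()
folded = "Stra\u00dfe".casefold()
turkish_i = "i".upper()

# Regular expressions: probe whether re accepts a Unicode property class.
# CPython 3.6 to 3.12 reject \p with re.error.
try:
    re.compile(r"\p{L}+")
    property_classes_supported = True
except re.error:
    property_classes_supported = False

# Text segmentation: len() counts code points, not user-perceived characters.
cluster = "e\u0301"
code_points_in_cluster = len(cluster)

# Bidirectional text: the bidi class of a character is exposed,
# but the standard library does not implement the reordering algorithm.
bidi_class = unicodedata.bidirectional("\u05d0")   # HEBREW LETTER ALEF

print(by_code_point, sharp_s_upper, folded, turkish_i)
print(property_classes_supported, code_points_in_cluster, bidi_class)
```

For locale-aware sorting, locale.strxfrm can serve as a sort key, but its results depend on the locales installed on the system, which is one reason collation support is described as limited.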

    See also: The Truth about Unicode in Python
