Unicode Support in Various Programming Languages

醉话见心 2020-12-13 13:31

I'd like to have a canonical place to pool information about Unicode support in various languages. Is it a part of the core language? Is it provided in libraries? Is it not

20 Answers
  • 2020-12-13 14:12

    PHP

    There is already an entire thread on this on Stack Overflow!

  • 2020-12-13 14:15

    HQ9+

    The Q command has complete Unicode support in most implementations.

  • 2020-12-13 14:15

    Objective-C

    None built-in, aside from whatever happens to be available as part of the C string library.

    However, once you add frameworks…

    Foundation (Cocoa and Cocoa Touch) and Core Foundation

    NSString and CFString each implement a fully Unicode-based string class (actually several classes, as an implementation detail). The two are “toll-free-bridged” so that the API for one can be used with instances of the other, and vice versa.

    For data that doesn't necessarily represent text, there's NSData and CFData. NSString provides methods, and CFString provides functions, to encode text into data and decode text from data. Core Foundation supports more than a hundred different encodings, including all of the UTF forms. The encodings are divided into two groups: built-in encodings, which are supported everywhere, and external encodings, which are at least supported on Mac OS X.

    NSString provides methods for normalizing to forms D, KD, C, or KC. Each returns a new string.

    Both NSString and CFString provide a wide variety of comparison/collation options. Foundation and Core Foundation each define their own set of comparison-option flags, and the two sets are not fully synonymous. For example, Core Foundation makes literal (strict code-point-based) comparison the default, whereas Foundation makes non-literal comparison (allowing characters with accents to compare equal) the default.

    Note that Core Foundation does not require Objective-C; indeed, it was created pretty much to provide most of the features of Foundation to Carbon programmers, who used straight C or C++. However, I suspect most modern usage of it is in Cocoa or Cocoa Touch programs, which are all written in Objective-C or Objective-C++.

  • 2020-12-13 14:15

    Arc

    Arc doesn't have any Unicode support. Yet.

  • 2020-12-13 14:19

    Java

    Like .NET, Java uses UTF-16 internally. From the java.lang.String documentation:

    A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
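
    To make the "two positions" point concrete, here is a minimal sketch. It is written in Python rather than Java, purely as an illustration, and the helper name utf16_code_units is made up for this example. It counts UTF-16 code units, which is the unit Java's String length and indexes are measured in, and compares that with the number of code points.

```python
# Count code points versus UTF-16 code units (the unit Java's String indexes by).
# utf16_code_units is a hypothetical helper for this illustration only.
def utf16_code_units(text: str) -> int:
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids emitting a BOM.
    return len(text.encode("utf-16-le")) // 2


bmp = "A"                      # U+0041, inside the Basic Multilingual Plane
supplementary = "\U0001D11E"   # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

# One code point, one code unit.
print(len(bmp), utf16_code_units(bmp))
# One code point, but two code units (a surrogate pair).
print(len(supplementary), utf16_code_units(supplementary))

# Split the encoded form into its high and low surrogate.
units = supplementary.encode("utf-16-be")
high = int.from_bytes(units[0:2], "big")
low = int.from_bytes(units[2:4], "big")
print(hex(high), hex(low))
```

    In Java terms, the first count corresponds roughly to what codePointCount would report and the second to length().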

  • 2020-12-13 14:19

    Python

    Python 2 has the classes str and unicode. str objects store bytes; unicode objects store Unicode text (internally as 16-bit or 32-bit code units, depending on whether the interpreter is a "narrow" or "wide" build). Most library functions support both (e.g. os.listdir('.') returns a list of str, os.listdir(u'.') returns a list of unicode objects). Both have encode and decode methods.

    Python 3 essentially renamed unicode to str. The Python 3 counterpart of Python 2's str is the bytes type. bytes has a decode method and str has an encode method. Since Python 3.3, str objects internally use one of several fixed-width representations (1, 2, or 4 bytes per code point) in order to save memory. To a Python programmer, a str still looks like an abstract sequence of Unicode code points.
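
    A minimal Python 3 sketch of the str/bytes split described above. The sample text and the choice of UTF-8 are just assumptions for the illustration.

```python
# str holds text (code points); bytes holds encoded data.
text = "naïve café"

# str -> bytes: encoding must be chosen explicitly (UTF-8 here).
data = text.encode("utf-8")
print(type(text).__name__, len(text))   # length in code points
print(type(data).__name__, len(data))   # length in bytes; ï and é take 2 bytes each

# bytes -> str: decoding with the same codec round-trips the text.
roundtrip = data.decode("utf-8")

# Decoding with the wrong codec either produces mojibake or fails outright.
mojibake = data.decode("latin-1")
try:
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(roundtrip == text, mojibake != text, ascii_failed)
```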

    Python supports:

    • encoding/decoding
    • normalization
    • simple case conversion and splitting on whitespace
    • looking up characters by their name
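
    The items in this list can be sketched with the standard unicodedata module and plain str methods. This assumes Python 3, and the sample strings are only illustrative.

```python
import unicodedata

# Normalization: the same visible character in two different forms.
composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", composed)     # decompose
print(composed == decomposed, nfc == composed, nfd == decomposed)

# Looking up characters by name, and names by character.
char = unicodedata.lookup("LATIN SMALL LETTER E WITH ACUTE")
name = unicodedata.name(char)
print(char == composed, name)

# Simple case conversion on non-ASCII text.
lowered = "ÉCOLE".lower()

# Splitting on whitespace also recognizes non-ASCII spaces
# (EM SPACE U+2003 and IDEOGRAPHIC SPACE U+3000 here).
words = "one\u2003two\u3000three".split()
print(lowered, words)
```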

    Python does not support/has limited support for:

    • collation (limited)
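    • special case conversions where there is no 1:1 mapping between lowercase and uppercase characters (Python 3.3 and later do handle language-independent cases such as 'ß'.upper() == 'SS'; language-specific rules such as the Turkish dotless i are still not applied)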
    • regular expressions (the standard re module has no Unicode property classes such as \p{L}; this is being worked on)
    • text segmentation
    • bidirectional text handling

    See also: The Truth about Unicode in Python
