Unicode Support in Various Programming Languages

I'd like to have a canonical place to pool information about Unicode support in various languages. Is it a part of the core language? Is it provided in libraries? Is it not

暗喜

2020-12-13 14:12

PHP

There is already an entire thread on this on SO!

隐瞒了意图╮

2020-12-13 14:15

HQ9+

The Q command has complete Unicode support in most implementations.

悲哀的现实

2020-12-13 14:15

Objective-C

None built-in, aside from whatever happens to be available as part of the C string library.

However, once you add frameworks…

Foundation (Cocoa and Cocoa Touch) and Core Foundation

NSString and CFString each implement a fully Unicode-based string class (actually several classes, as an implementation detail). The two are “toll-free-bridged” so that the API for one can be used with instances of the other, and vice versa.

For data that doesn't necessarily represent text, there's NSData and CFData. NSString provides methods and CFString provides functions to encode text into data and decode text from data. Core Foundation supports more than a hundred different encodings, including every UTF form. The encodings are divided into two groups: built-in encodings, which are supported everywhere, and external encodings, which are at least supported on Mac OS X.

NSString provides methods for normalizing to forms D, KD, C, or KC. Each returns a new string.

Both NSString and CFString provide a wide variety of comparison/collation options; see Foundation's comparison-option flags (NSStringCompareOptions) and Core Foundation's comparison-option flags (CFStringCompareFlags). They are not all synonymous; for example, Core Foundation makes literal (strict code-point-based) comparison the default, whereas Foundation makes non-literal comparison the default, so that canonically equivalent sequences (such as a precomposed "é" and an "e" followed by a combining acute accent) compare equal.
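To make the literal vs. non-literal distinction concrete, here is a rough sketch in Python rather than Objective-C. It is not the Foundation or Core Foundation API; the helper names `literal_equal` and `nonliteral_equal` are made up for illustration, and the non-literal case is approximated with canonical normalization from the standard `unicodedata` module.

```python
import unicodedata


def literal_equal(a: str, b: str) -> bool:
    """Strict code-point-by-code-point comparison (hypothetical helper)."""
    return a == b


def nonliteral_equal(a: str, b: str) -> bool:
    """Compare after canonical decomposition, so canonically equivalent
    sequences match (hypothetical helper, an approximation only)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "caf\u00e9"   # "é" as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(literal_equal(precomposed, decomposed))     # False
print(nonliteral_equal(precomposed, decomposed))  # True
```

Both strings render identically, which is why the choice of default matters when the same text arrives from different sources.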

Note that Core Foundation does not require Objective-C; indeed, it was created pretty much to provide most of the features of Foundation to Carbon programmers, who used straight C or C++. However, I suspect most modern usage of it is in Cocoa or Cocoa Touch programs, which are all written in Objective-C or Objective-C++.

面向向阳花

2020-12-13 14:15

Arc

Arc doesn't have any Unicode support. Yet.

庸人自扰

2020-12-13 14:19

Java

As in .NET, Java strings use UTF-16 internally. From the java.lang.String documentation:

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
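The "two positions" point can be checked without a JVM. The following is a small sketch in Python (not Java) that counts UTF-16 code units and pulls out the surrogate pair for one supplementary character; `utf16_code_units` is a helper name invented here.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


bmp = "A"                      # U+0041, inside the Basic Multilingual Plane
supplementary = "\U0001F600"   # U+1F600, outside the BMP

print(utf16_code_units(bmp))            # 1
print(utf16_code_units(supplementary))  # 2

# Split the big-endian UTF-16 bytes into the two surrogate code units.
raw = supplementary.encode("utf-16-be")
high = int.from_bytes(raw[:2], "big")
low = int.from_bytes(raw[2:], "big")
print(hex(high), hex(low))              # 0xd83d 0xde00
```

In Java terms, the second count corresponds to what String.length() reports for that character, since length and indexing work on char code units.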

孤街浪徒

2020-12-13 14:19

Python

Python 2 has the classes str and unicode. str objects store bytes; unicode objects store Unicode text (internally as UTF-16 or UCS-4 code units, depending on how the interpreter was built). Most library functions support both (e.g. os.listdir('.') returns a list of str, os.listdir(u'.') returns a list of unicode objects). Both have encode and decode methods.

Python 3 basically renamed unicode to str. The Python 3 equivalent of the Python 2 str is the type bytes. bytes has a decode method and str has an encode method. Since Python 3.3, str objects internally use one of several fixed-width representations (1, 2, or 4 bytes per code point, chosen per string) in order to save memory. To a Python programmer it still looks like an abstract sequence of Unicode code points.
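A minimal Python 3 sketch of that str/bytes split, using only built-in methods; the sample string is arbitrary.

```python
text = "naïve café"              # str: a sequence of code points
data = text.encode("utf-8")      # str -> bytes
roundtrip = data.decode("utf-8") # bytes -> str

print(type(text).__name__, type(data).__name__)  # str bytes
print(len(text), len(data))                      # 10 12
print(roundtrip == text)                         # True

# Decoding with the wrong codec fails loudly instead of guessing.
try:
    data.decode("ascii")
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False
print(decoded_as_ascii)                          # False
```

The length difference comes from "ï" and "é" each taking two bytes in UTF-8.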

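Python supports:
- encoding/decoding
- normalization
- simple case conversion and splitting on whitespace
- looking up characters by their name

Python does not support, or has only limited support for:
- collation (limited, via the locale module)
- special case conversions where there is no 1:1 mapping between lowercase and uppercase characters (this applies to Python 2; since Python 3.3 the str methods use the full case mappings, e.g. 'ß'.upper() gives 'SS')
- regular expressions (limited; the re module has no Unicode property classes such as \p{L}, and it's being worked on)
- text segmentation
- bidirectional text handling (character properties can be looked up, but there is no implementation of the bidirectional algorithm)

See also: The Truth about Unicode in Python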
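A few of the supported items from the list above, sketched with Python 3 and the standard `unicodedata` module. The sample characters are arbitrary choices for illustration.

```python
import unicodedata

# Normalization: the compatibility form folds the "fi" ligature to two letters.
ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
nfkc = unicodedata.normalize("NFKC", ligature)
nfc = unicodedata.normalize("NFC", ligature)
print(nfkc)             # fi
print(nfc == ligature)  # True: canonical forms leave the ligature alone

# Looking up characters by name, and names by character.
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")
e_acute_name = unicodedata.name("\u00e9")
print(alpha)            # α
print(e_acute_name)     # LATIN SMALL LETTER E WITH ACUTE

# Simple case conversion and splitting on Unicode whitespace.
lowered = "ÉCOLE".lower()
parts = "a\u2003b c".split()  # U+2003 EM SPACE counts as whitespace
print(lowered)          # école
print(parts)            # ['a', 'b', 'c']

# Bidirectional support stops at property lookup: this returns the bidi
# class of a character but does not reorder any text.
alef_class = unicodedata.bidirectional("\u05d0")  # HEBREW LETTER ALEF
print(alef_class)       # R
```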