Unicode Support in Various Programming Languages

I'd like to have a canonical place to pool information about Unicode support in various languages. Is it a part of the core language? Is it provided in libraries? Is it not

20 answers

盖世英雄少女心

2020-12-13 13:56

Ruby

The only material I can find for Ruby is pretty old, and not being much of a Rubyist, I'm not sure how accurate it is.

For the record, Ruby can handle UTF-8 data, but its strings are not multibyte-aware. Internally, it usually assumes strings are byte vectors, though there are libraries and tricks you can usually use to make things work.

Found that here.
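
To make the "byte vector" point concrete, here is a minimal sketch in Python rather than Ruby (chosen only for illustration; it does not reproduce any Ruby API). It compares a character-oriented view of a UTF-8 string with a byte-oriented one, and shows that slicing by byte offset can cut a character in half.

```python
text = "na\u00efve"                 # "naïve": 5 characters, "ï" takes 2 bytes in UTF-8
raw = text.encode("utf-8")          # the byte-vector view of the same string

char_count = len(text)              # counts code points
byte_count = len(raw)               # counts bytes

# Slicing by byte offset keeps "na" plus only the first byte of "ï".
truncated = raw[:3]
try:
    truncated.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False                # the cut landed inside a character

print(char_count, byte_count, split_ok)
```

A language that treats strings as byte vectors leaves this bookkeeping to the programmer or to a library.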

Ruby 1.9

Ruby 1.9 attaches an encoding to every string. Binary strings use the encoding "ASCII-8BIT". While the default encoding is usually UTF-8 on any modern system, you cannot assume that third-party library functions always return strings in this encoding; they might return some other encoding (some YAML parsers do this in certain situations). If you concatenate two strings with different encodings, you might get an Encoding::CompatibilityError.

孤城傲影

2020-12-13 13:57
.NET (C#, VB.NET, ...)

.NET stores strings internally as a sequence of System.Char values. Each System.Char represents one UTF-16 code unit, so a character outside the Basic Multilingual Plane occupies two of them.

From the MSDN documentation on System.Char:

The .NET Framework uses the Char structure to represent a Unicode character. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure.
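
The mapping from a code point to one or two 16-bit values can be worked through by hand. The following is a minimal sketch in Python, not .NET code; the function name `to_utf16_code_units` is made up for this illustration. It applies the standard UTF-16 surrogate arithmetic to a single code point.

```python
def to_utf16_code_units(code_point):
    """Encode one Unicode scalar value as a list of 16-bit code units."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]              # fits in a single code unit
    offset = code_point - 0x10000        # 20 bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return [high, low]

print([hex(u) for u in to_utf16_code_units(0x41)])     # U+0041, one unit
print([hex(u) for u in to_utf16_code_units(0x1F600)])  # U+1F600, two units
```

In .NET terms, each element of the returned list corresponds to what one Char would hold.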

Additional resources:
- Strings in .NET and C# (by Jon Skeet).

无人共我

2020-12-13 13:58

Delphi

Delphi 2009 fully supports Unicode. The default string type was changed to a 16-bit Unicode encoding (UTF-16), and most libraries, including third-party ones, support Unicode. See Marco Cantù's Delphi and Unicode.

Prior to Delphi 2009, support for Unicode was limited, but there were WideChar and WideString types for storing 16-bit encoded strings. See Unicode in Delphi for more info.

Note that you can still develop bilingual CJKV applications without using Unicode. For example, a Shift JIS-encoded Japanese string can be stored in a plain AnsiString.
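
As a rough illustration of what storing Shift JIS text in a byte-oriented string means, here is a small Python sketch (not Delphi). It assumes Python's built-in `shift_jis` codec and simply shows that the stored bytes are only meaningful when read back with the matching code page.

```python
text = "日本語"                            # three Japanese characters
sjis_bytes = text.encode("shift_jis")     # legacy double-byte encoding

# Read back with the right code page, the text round-trips.
round_trip = sjis_bytes.decode("shift_jis")

# Read back with the wrong one (Latin-1 here), the same bytes become mojibake.
mojibake = sjis_bytes.decode("latin-1")

print(len(sjis_bytes), round_trip == text, mojibake == text)
```

This is the trade-off of the non-Unicode approach: the bytes carry no record of their encoding, so the program has to track it.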

清酒与你

2020-12-13 13:58

R6RS Scheme

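R6RS requires implementations to support Unicode; the report is written against Unicode 5.0. Characters represent Unicode scalar values, and all strings are sequences of such characters.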

被撕碎了的回忆

2020-12-13 13:59

D

D has built-in character types for UTF-8, UTF-16, and UTF-32 code units (char, wchar, and dchar, respectively). The D documentation has a table listing all the basic types.
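
To show how the three encodings differ in practice, the sketch below (in Python, not D) counts how many code units the same four-character sample needs in each one. The helper `code_units` is a hypothetical name used only here; the counts correspond conceptually to the lengths of arrays of char, wchar, and dchar.

```python
def code_units(text, encoding, unit_bytes):
    """Count code units by encoding the text and dividing by the unit size."""
    return len(text.encode(encoding)) // unit_bytes

sample = "a\u00e9\u20ac\U0001F600"           # a, é, €, and an emoji

utf8 = code_units(sample, "utf-8", 1)        # 1 + 2 + 3 + 4 bytes
utf16 = code_units(sample, "utf-16-le", 2)   # 1 + 1 + 1 + 2 units
utf32 = code_units(sample, "utf-32-le", 4)   # one unit per code point

print(utf8, utf16, utf32)
```

Only the UTF-32 count equals the number of code points, which is why indexing by code unit in the other two encodings does not index by character.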

隐瞒了意图╮

2020-12-13 14:04

JavaScript

It looks like there was no Unicode support before JavaScript 1.3. As of 1.5, UTF-8, UTF-16 and UCS-2 are all supported. You can use Unicode escape sequences in strings, regexes and identifiers. Source
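
The classic JavaScript escape form is `\uXXXX` with exactly four hex digits, so each escape names one UTF-16 code unit and a character outside the Basic Multilingual Plane needs two of them. The Python sketch below generates such escapes for a string; `js_escape` is a hypothetical helper written for this illustration, not part of any JavaScript or Python library.

```python
def js_escape(text):
    """Render text as \\uXXXX escapes, one per UTF-16 code unit."""
    data = text.encode("utf-16-be")
    # Split the encoded bytes into 16-bit big-endian code units.
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join("\\u{:04X}".format(u) for u in units)

print(js_escape("\u00e9"))        # one escape for é
print(js_escape("\U0001F600"))    # two escapes (a surrogate pair) for U+1F600
```

Pasting the generated escapes into a JavaScript string literal should yield the original text, under the assumption that the engine treats strings as sequences of UTF-16 code units.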
