I'd like to have a canonical place to pool information about Unicode support in various languages. Is it a part of the core language? Is it provided in libraries? Is it not
The only stuff I can find for Ruby is pretty old, and not being much of a Rubyist, I'm not sure how accurate it is.
For the record, Ruby does support UTF-8, but not multibyte. Internally, it usually assumes strings are byte vectors, though there are libraries and tricks you can usually use to make things work.
Found that here.
Ruby 1.9 attaches encodings to strings. Binary strings use the encoding "ASCII-8BIT". While the default encoding is usually UTF-8 on any modern system, you cannot assume that all third-party library functions always return strings in this encoding. They might return any other encoding (e.g. some YAML parsers do that in some situations). If you concatenate two strings of different encodings, you might get an Encoding::CompatibilityError.
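The same pitfall can be sketched outside Ruby. The following is a Python analogy, not Ruby's API: the helper name `join_as_text` and the sample strings are made up for illustration. It assumes you know the encoding of each piece you receive, and shows why decoding each piece with its own encoding before joining is safer than concatenating raw data.

```python
def join_as_text(parts):
    """Decode each (data, encoding) pair to str, then concatenate."""
    return "".join(data.decode(encoding) for data, encoding in parts)


# Two byte strings that came from sources using different encodings.
utf8_part = "caf\u00e9".encode("utf-8")
latin1_part = " cr\u00e8me".encode("latin-1")

# Joining the raw bytes succeeds, but the result is not valid UTF-8.
mixed = utf8_part + latin1_part
try:
    mixed.decode("utf-8")
    mixed_is_valid_utf8 = True
except UnicodeDecodeError:
    mixed_is_valid_utf8 = False

# Decoding each part with its own encoding first gives the intended text.
combined = join_as_text([(utf8_part, "utf-8"), (latin1_part, "latin-1")])
```

Ruby 1.9 raises its error at concatenation time, whereas here the problem only surfaces when the mixed bytes are decoded, so treat this as an illustration of the idea rather than of Ruby's behaviour.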
.NET stores strings internally as a sequence of System.Char objects. One System.Char represents a UTF-16 code unit.
From the MSDN documentation on System.Char:
The .NET Framework uses the Char structure to represent a Unicode character. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure.
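To make the "one or more 16-bit values" part concrete, here is a small Python sketch (not .NET code) that lists the UTF-16 code units for a string. The function name `utf16_code_units` is hypothetical; the point is only that a code point above U+FFFF needs two 16-bit units (a surrogate pair), which is why one Char is not always one character.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


# U+0041 fits in a single 16-bit code unit.
units_for_a = utf16_code_units("A")

# U+1F600 lies outside the Basic Multilingual Plane and needs two units.
units_for_emoji = utf16_code_units("\U0001F600")
```

With this sketch, `units_for_a` holds one value and `units_for_emoji` holds two, even though both inputs are a single code point.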
Delphi 2009 fully supports Unicode. They've changed the implementation of string to default to 16-bit Unicode encoding, and most libraries, including the third-party ones, support Unicode. See Marco Cantù's Delphi and Unicode.
Prior to Delphi 2009, the support for Unicode was limited, but there were WideChar and WideString to store 16-bit encoded strings. See Unicode in Delphi for more info.
Note that you can still develop bilingual CJKV applications without using Unicode. For example, a Shift JIS encoded string for Japanese can be stored in a plain AnsiString.
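As a rough illustration of what "stored in a plain byte string" means, here is a Python sketch (not Delphi code) that encodes a short Japanese string as Shift JIS and as UTF-8. The sample text and variable names are assumptions chosen for the example; the bytes are just bytes, and only the agreed encoding tells you how to read them back.

```python
text = "\u65e5\u672c\u8a9e"  # the three kanji of "Nihongo" (Japanese language)

# The same text as two different byte sequences.
sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

# Reading the bytes back requires knowing which encoding was used.
round_tripped = sjis_bytes.decode("shift_jis")
```

Each of these kanji takes two bytes in Shift JIS and three in UTF-8, so the two byte strings differ in length while representing the same text.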
R6RS Scheme
Requires the implementation of Unicode 5.1. All strings are in 'unicode format'.
D supports UTF-8, UTF-16, and UTF-32 (char, wchar, and dchar, respectively). The table with all the types can be found here.
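The difference between the three encodings is the size of the code unit (8, 16 or 32 bits), which corresponds to D's char, wchar and dchar. The Python sketch below (not D code; `code_unit_counts` is a made-up helper) counts how many code units one string needs in each encoding.

```python
def code_unit_counts(text):
    """Count the code units text needs in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 1 byte per unit
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 2 bytes per unit
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 4 bytes per unit
    }


ascii_counts = code_unit_counts("A")
accent_counts = code_unit_counts("\u00e9")      # e with acute accent
emoji_counts = code_unit_counts("\U0001F600")   # outside the BMP
```

Only UTF-32 always uses exactly one code unit per code point; the other two are variable-length.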
Looks like before JS 1.3 there was no support for Unicode. As of 1.5, UTF-8, UTF-16 and UCS-2 are all supported. You can use Unicode escape sequences in strings, regexes and identifiers. Source