Some characters such as the Unicode Character \'LATIN SMALL LETTER C WITH CARON\' can be encoded as 0xC4 0x8D
, but can also be represented with the two code poi
String#encode can be used since Ruby 1.9. UTF-8-MAC is a variant of NFD. The codepoints in the range between U+2000 and U+2FFF, or U+F900 and U+FAFF, or U+2F800 and U+2FAFF are not decomposed. See https://developer.apple.com/library/mac/qa/qa1173/_index.html for the details. UTF-8-HFS can be also used insted of UTF-8-MAC.
# coding: utf-8
s = "\u010D"
s.encode!('UTF-8-MAC', 'UTF-8')
s.force_encoding('UTF-8')
p "\x63\xcc\x8c" == s
p "\u0063" == s[0]
p "\u030C" == s[1]