How to convert UTF-8 combining character sequences into single UTF-8 characters in Ruby?

野趣味 2021-01-01 18:25

Some characters such as the Unicode Character 'LATIN SMALL LETTER C WITH CARON' can be encoded as 0xC4 0x8D, but can also be represented with the two code poi

3 answers

萌比男神i (original poster)

2021-01-01 19:00
String#encode can be used since Ruby 1.9. UTF-8-MAC is a variant of NFD: code points in the ranges U+2000 to U+2FFF, U+F900 to U+FAFF, and U+2F800 to U+2FAFF are not decomposed. See https://developer.apple.com/library/mac/qa/qa1173/_index.html for the details. UTF-8-HFS can also be used instead of UTF-8-MAC.
```
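# coding: utf-8

# U+010D LATIN SMALL LETTER C WITH CARON as a single precomposed code point
s = "\u010D"
# Transcoding UTF-8 -> UTF-8-MAC decomposes it (NFD variant)
s.encode!('UTF-8-MAC', 'UTF-8')
# Relabel the bytes as plain UTF-8 so they compare equal to UTF-8 literals
s.force_encoding('UTF-8')

# Each comparison is expected to print true
p "\x63\xcc\x8c" == s   # bytes of "c" followed by U+030C
p "\u0063" == s[0]      # base letter "c"
p "\u030C" == s[1]      # COMBINING CARON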
```
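The snippet above shows the decomposing direction. For the composing direction the question asks about (two code points into one), a minimal sketch follows. It assumes Ruby 2.2 or later, where `String#unicode_normalize` is available. This is an alternative to the UTF-8-MAC approach, not part of it, and the variable names are only illustrative.

```
# Decomposed form: "c" (U+0063) followed by COMBINING CARON (U+030C)
decomposed = "\u0063\u030C"

# NFC composes the pair into the single code point U+010D
composed = decomposed.unicode_normalize(:nfc)

p composed == "\u010D"                            # => true
p composed.length                                 # => 1
p composed.bytes.map { |b| format("0x%02X", b) }  # => ["0xC4", "0x8D"]

# NFD goes back to the two-code-point form
p composed.unicode_normalize(:nfd) == decomposed  # => true
```

Unlike UTF-8-MAC, `unicode_normalize` applies standard NFC/NFD, so the excluded ranges listed above do not apply to it.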