How to convert UTF-8 combined characters into single UTF-8 characters in Ruby?

野趣味 2021-01-01 18:25

Some characters, such as the Unicode character 'LATIN SMALL LETTER C WITH CARON', can be encoded as 0xC4 0x8D, but can also be represented with the two code poi

3 answers
  •  萌比男神i
    2021-01-01 19:00

    String#encode has been available since Ruby 1.9. UTF-8-MAC is a variant of NFD, so transcoding a string from UTF-8 to UTF-8-MAC decomposes precomposed characters. Code points in the ranges U+2000 to U+2FFF, U+F900 to U+FAFF, and U+2F800 to U+2FAFF are not decomposed. See https://developer.apple.com/library/mac/qa/qa1173/_index.html for the details. UTF-8-HFS can also be used instead of UTF-8-MAC.

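    # coding: utf-8

    # Precomposed LATIN SMALL LETTER C WITH CARON (bytes 0xC4 0x8D).
    s = "\u010D"
    # Transcoding from UTF-8 to UTF-8-MAC decomposes the character.
    s.encode!('UTF-8-MAC', 'UTF-8')
    # Relabel the bytes as UTF-8 so they compare equal to UTF-8 literals.
    s.force_encoding('UTF-8')

    p "\x63\xcc\x8c" == s  # true: "c" followed by U+030C COMBINING CARON
    p "\u0063" == s[0]     # true
    p "\u030C" == s[1]     # true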
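
The snippet above goes from the single code point to the decomposed pair. The question asks for the opposite direction, so here is a minimal sketch of composing instead. It assumes Ruby 2.2 or later for `String#unicode_normalize`; the second option runs the UTF-8-MAC conversion in reverse and should also work on Ruby 1.9+. The variable names are only illustrative.

```ruby
# Decomposed form: "c" (U+0063) followed by COMBINING CARON (U+030C).
decomposed = "c\u030C"

# Option 1 (assumes Ruby 2.2+): normalize to NFC, the composed form.
composed_nfc = decomposed.unicode_normalize(:nfc)

# Option 2 (Ruby 1.9+): interpret the bytes as UTF-8-MAC and transcode to UTF-8.
composed_mac = decomposed.encode('UTF-8', 'UTF-8-MAC')

p composed_nfc == "\u010D"  # true
p composed_mac == "\u010D"  # true
p composed_nfc.length       # 1
```

`unicode_normalize(:nfc)` is probably the safer default where it is available, since UTF-8-MAC leaves the ranges listed in the answer untouched and is therefore not strictly the inverse of NFD.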
    
