What are the Unicode ranges for Hindi accented characters?

面向向阳花 2020-12-06 21:50

I'm trying to gather a Unicode list of all the 'o'-like shapes in the Hindi character set. In fact, a list of any characters (in any language) that make use of separate

3 answers
  •  暖寄归人
    2020-12-06 22:01

    If you want the complete set (for all languages), you can do it programmatically. Start from the Unicode data file at ftp://ftp.unicode.org/Public/6.1.0/ucd/UnicodeData.txt, described by TR-44 (http://unicode.org/reports/tr44/#Property_Definitions).

    You can use the Canonical_Combining_Class field (see http://unicode.org/reports/tr44/#Canonical_Combining_Class_Values) to filter the exact characters you want. I can't be more precise, because "accent" is a bit vague :-) You might also have to look at General_Category to get the filter right (and exclude certain marks, symbols, or punctuation).

    And a script doing this would definitely be better than trying to mess with text editors. One of the characteristics of combining characters is that they combine :-) So you might get all kinds of puzzling results (like this: http://www.siao2.com/2006/02/17/533929.aspx :-)
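
A minimal sketch of such a script, under a few assumptions: instead of parsing UnicodeData.txt directly, it reads the same two properties through Python's standard `unicodedata` module, so it reflects whatever Unicode version ships with your Python build rather than 6.1.0. The name `combining_marks` and the choice to scan only the Devanagari block (U+0900..U+097F) are illustrative, not part of the answer above. It keeps a character if its General_Category is a mark (Mn, Mc, Me) or its Canonical_Combining_Class is non-zero.

```python
import unicodedata

# Assumption for illustration: scan only the Devanagari block, U+0900..U+097F.
DEVANAGARI = range(0x0900, 0x0980)


def combining_marks(codepoints):
    """Yield (code point, name, General_Category, Canonical_Combining_Class)
    for every character that looks like a combining mark."""
    for cp in codepoints:
        ch = chr(cp)
        category = unicodedata.category(ch)  # e.g. 'Mn', 'Mc', 'Lo'
        ccc = unicodedata.combining(ch)      # 0 means "not reordered"
        # Keep anything classified as a mark, or with a non-zero combining class.
        if category.startswith("M") or ccc != 0:
            yield cp, unicodedata.name(ch, "<unnamed>"), category, ccc


if __name__ == "__main__":
    for cp, name, category, ccc in combining_marks(DEVANAGARI):
        print(f"U+{cp:04X}  {category}  ccc={ccc:<3}  {name}")
```

The General_Category check matters for Hindi: most Devanagari vowel signs have a combining class of 0, so filtering on Canonical_Combining_Class alone would drop them and keep only marks such as the nukta and virama. To cover every script, pass `range(0x110000)` instead of `DEVANAGARI`.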
