How can I compare UTF-8 strings, such as Persian words, in C++?

抹茶落季 2020-12-11 10:13

I want to compare strings in Persian (UTF-8). I know I must use something like L"گل" and that it must be stored in a wchar_t * or a wstring. The question is: when I compare by the fu…

3 Answers
  •  误落风尘
    2020-12-11 11:06

    Unicode is notoriously difficult to compare.

    Note that text in any Unicode encoding, whether UTF-8, UTF-16, or UTF-32, cannot be compared byte by byte for anything other than exact byte equality. Two strings may display identically while their bytes differ, because of right-to-left markers, combining marks and other display modifiers, or precomposed versus decomposed forms of the same character, all of which occur in non-English scripts such as Persian.

    In general, if the meaning of the text matters, you need to normalize it before you can compare it meaningfully:

    http://userguide.icu-project.org/transforms/normalization
