How can I compare UTF-8 strings, such as Persian words, in C++?

抹茶落季 2020-12-11 10:13

I want to compare strings in Persian (UTF-8). I know I must use something like L"گل", and that it must be stored in a wchar_t * or wstring. The question is when I compare by the fu

3 answers

误落风尘 (OP)

2020-12-11 11:06

Unicode is notoriously difficult to compare.

Note that text in any Unicode encoding (UTF-8, UTF-16, or UTF-32) cannot be compared byte-wise for anything other than byte equality. Two strings may display identically while their underlying bytes differ, for example because of right-to-left marks, combining marks versus precomposed characters, and similar display modifiers that are common in scripts such as Persian.

In general, if the meaning of the text matters, you need to normalize both strings to the same Unicode normalization form before you can make a realistic comparison:

http://userguide.icu-project.org/transforms/normalization
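
To illustrate the point about byte equality, here is a minimal sketch that uses only the standard library. The names precomposed, decomposed, bytes_equal, and check_examples are hypothetical, chosen for this example. It spells the word "آب" in two canonically equivalent ways, once with the precomposed letter U+0622 and once with U+0627 followed by the combining mark U+0653. Both render the same, yet a byte-wise comparison reports them as different. The sketch does not perform normalization itself; that step is what a library such as ICU would provide.

```cpp
#include <cassert>
#include <string>

// "آب" with the precomposed letter U+0622 (ALEF WITH MADDA ABOVE)
// followed by U+0628 (BEH), written out as raw UTF-8 bytes.
const std::string precomposed = "\xD8\xA2\xD8\xA8";

// The same word as U+0627 (ALEF) + U+0653 (MADDAH ABOVE) + U+0628 (BEH).
// This is canonically equivalent to the string above.
const std::string decomposed = "\xD8\xA7\xD9\x93\xD8\xA8";

// Byte-wise equality: true only when both strings hold identical bytes.
// It knows nothing about Unicode equivalence.
bool bytes_equal(const std::string& a, const std::string& b) {
    return a == b;
}

// Hypothetical helper that spells out what the comparison reports.
void check_examples() {
    // Identical bytes compare equal.
    assert(bytes_equal(precomposed, precomposed));
    // Equivalent text with different bytes does not.
    assert(!bytes_equal(precomposed, decomposed));
}
```

Normalizing both strings to the same form (for example NFC) before calling bytes_equal would make the two spellings compare equal. Note that std::string is enough to hold UTF-8 here; the comparison problem is the same whether the text is stored in std::string or std::wstring.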
