What is normalized UTF-8 all about?

没有蜡笔的小新 2020-11-29 15:26

The ICU project (which also now has a PHP library) contains the classes needed to help normalize UTF-8 strings to make it easier to compare values when searching.

7 answers

时光取名叫无心 (original poster)

2020-11-29 16:07

If two Unicode strings are canonically equivalent, they are really the same string, just encoded with different code point sequences. For example, Ä can be represented either as the single precomposed character Ä (U+00C4) or as the letter A followed by the combining diaeresis ◌̈ (U+0308).
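To make the Ä example concrete, here is a minimal sketch. It uses Python's standard `unicodedata` module rather than the ICU classes mentioned in the question, purely as an illustration; the variable names are made up for this example.

```python
import unicodedata

precomposed = "\u00C4"   # Ä as one precomposed code point
decomposed = "A\u0308"   # A followed by COMBINING DIAERESIS

# A raw comparison sees two different code point sequences.
raw_equal = precomposed == decomposed

# After normalizing both sides to the same canonical form (NFC here),
# the two spellings compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

# NFD goes the other way and splits the precomposed character apart.
nfd_form = unicodedata.normalize("NFD", precomposed)

print(raw_equal, nfc_equal, len(nfd_form))
```

Either NFC or NFD works for the comparison, as long as both strings are normalized to the same form.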

If the strings are only compatibility equivalent, they aren't necessarily the same, but they may be treated as the same in some contexts. For example, the ligature ﬀ could be considered the same as ff.
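The ligature case can be sketched the same way, again assuming Python's `unicodedata` module as a stand-in for ICU. Canonical normalization (NFC) leaves the ligature alone, while compatibility normalization (NFKC) folds it into two plain letters.

```python
import unicodedata

ligature = "\uFB00"  # ﬀ, LATIN SMALL LIGATURE FF

# Canonical normalization keeps the ligature as a distinct character.
nfc = unicodedata.normalize("NFC", ligature)

# Compatibility normalization replaces it with the letters "ff".
nfkc = unicodedata.normalize("NFKC", ligature)

print(nfc == ligature, nfkc == "ff")
```

Note that the compatibility mapping loses information: once the text is "ff", there is no way to tell that it was originally a ligature.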

So if you are comparing strings for equality, you should use canonical equivalence, because compatibility equivalence isn't true equivalence.

But if you want to sort a set of strings, it might make sense to use compatibility equivalence, as they are nearly identical.
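As a rough sketch of the sorting idea, the snippet below sorts by an NFKC-normalized key so that a word spelled with the ﬀ ligature lands next to its plain-letter neighbours. The helper `compat_key` and the word list are hypothetical, and this is plain code point ordering on normalized keys, not real locale-aware collation (which ICU provides separately).

```python
import unicodedata

def compat_key(s):
    """Sort key: the string under compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", s)

# "o\uFB00er" is "offer" spelled with the ﬀ ligature.
words = ["o\uFB00er", "offal", "office"]

# Raw code point order pushes the ligature word to the end,
# because U+FB00 sorts after every ASCII letter.
sorted_raw = sorted(words)

# With the compatibility key it sorts as if it were spelled "offer".
sorted_compat = sorted(words, key=compat_key)

print(sorted_raw)
print(sorted_compat)
```

The original strings are kept intact; only the key used for ordering is normalized.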
