How to compare Unicode characters that “look alike”?

情歌与酒 2020-11-27 10:42

I ran into a surprising issue.

I loaded a text file into my application, and I have some logic that compares values containing µ.

And I realized that even if

再見小時候 (OP)

2020-11-27 11:14

For the specific example of μ (mu) and µ (micro sign), the latter has a compatibility decomposition to the former, so normalizing both strings to FormKC or FormKD (NFKC or NFKD) before comparing them converts the micro signs to mus.
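The answer refers to .NET's `FormKC`/`FormKD`. As a language-neutral sketch of the same idea, here is the check using Python's standard `unicodedata` module, which exposes the same normalization forms as `"NFKC"`/`"NFKD"`. The helper name `equal_under` is hypothetical, chosen just for this illustration.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # µ MICRO SIGN
GREEK_MU = "\u03bc"    # μ GREEK SMALL LETTER MU

def equal_under(form, a, b):
    """Compare two strings after applying the given Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(MICRO_SIGN == GREEK_MU)                    # False: different code points
print(equal_under("NFC", MICRO_SIGN, GREEK_MU))  # False: canonical forms leave µ alone
print(equal_under("NFKC", MICRO_SIGN, GREEK_MU)) # True: compatibility mapping µ -> μ
print(equal_under("NFKD", MICRO_SIGN, GREEK_MU)) # True
```

Note that only the compatibility forms (K) help here. The canonical forms (NFC/NFD) keep the micro sign as-is, because its decomposition is tagged as a compatibility mapping.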

However, there are lots of sets of characters that look alike but aren't equivalent under any Unicode normalization form. For example, A (Latin), Α (Greek), and А (Cyrillic). The Unicode website has a confusables.txt file with a list of these, intended to help developers guard against homograph attacks. If necessary, you could parse this file and build a table for “visual normalization” of strings.
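Below is a rough sketch of such a "visual normalization" table, written in Python to match the earlier example. It assumes the `SOURCE ; TARGET ; TYPE # comment` line layout with space-separated hex code points that UTS #39 uses for confusables.txt. The "skeleton" step (NFD, map each character to its prototype, NFD again) follows the UTS #39 description. The function names and the tiny sample data are hypothetical. The sample is hand-written in that format, not copied from the real file. In practice you would feed the function the downloaded file's lines instead.

```python
import unicodedata

def parse_confusables(lines):
    """Build a {source_char: prototype_string} table from confusables.txt-style lines.

    Assumes each data line looks like 'SOURCE ; TARGET ; TYPE # comment',
    where SOURCE and TARGET are space-separated hex code points.
    """
    table = {}
    for raw in lines:
        # Drop a possible BOM and the trailing comment, then skip blank lines.
        line = raw.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table

def skeleton(text, table):
    """Visual 'normalization': NFD, map each char to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def looks_alike(a, b, table):
    """True if two strings share the same skeleton under the given table."""
    return skeleton(a, table) == skeleton(b, table)

# Hand-written, illustrative excerpt in the confusables.txt line format.
SAMPLE = """\
# illustrative excerpt only
0391 ;\t0041 ;\tMA\t# GREEK CAPITAL LETTER ALPHA -> LATIN CAPITAL LETTER A
0410 ;\t0041 ;\tMA\t# CYRILLIC CAPITAL LETTER A -> LATIN CAPITAL LETTER A
"""

table = parse_confusables(SAMPLE.splitlines())
print("A" == "\u0391")                     # False: Latin A vs Greek Alpha
print(looks_alike("A", "\u0391", table))   # True
print(looks_alike("A", "\u0410", table))   # True
```

A skeleton is meant for detecting look-alikes, not for display. Keep the original string and compare only the skeletons.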
