How to know the preferred display width (in columns) of Unicode characters?

说谎 2020-12-23 18:10

In different encodings of Unicode, for example UTF-16LE or UTF-8, a character may occupy 2 or 3 bytes. Many Unicode applications don't t

5 answers

孤独总比滥情好 (OP)

2020-12-23 18:29

I believe that to do this correctly, you need to consider the part of the published Unicode Standard known as Unicode Standard Annex #14 (UAX #14), the Unicode Line Breaking Algorithm.

If you were programming in Perl, what you want to know would be super easy, because Perl’s Unicode::LineBreak module, which implements UAX #14, includes a class with a simple columns method that tells you the right answer for its string argument. These things work especially well on Asian languages, where absolutely nothing else will do. This module includes over 6,000 unit tests, is actively maintained, and its author is himself Asian, so it’s important to him to get these tricky bits exactly correct.

Most of the guts of the module are a library written in C. I have not looked at how to call its component C library from languages other than Perl, but you might look into whether that is possible.
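If you are not in Perl and cannot call that library, a rough approximation is possible in Python's standard library. To be clear, this is not the columns method of Unicode::LineBreak and it does not implement UAX #14. It is a minimal sketch that swaps in the East Asian Width property (UAX #11) via `unicodedata.east_asian_width`. The function name `display_columns` is my own, and treating Ambiguous-width characters as one column is an assumption that depends on your terminal and font.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Approximate the number of terminal columns `text` occupies.

    Hypothetical helper: a heuristic based on East Asian Width,
    not a port of Unicode::LineBreak's columns method.
    """
    columns = 0
    for ch in text:
        # Combining marks stack on the previous character: zero columns.
        if unicodedata.combining(ch):
            continue
        # Assumption: control, format, and other non-spacing or enclosing
        # marks are also counted as zero columns.
        if unicodedata.category(ch) in ("Cc", "Cf", "Mn", "Me"):
            continue
        # Wide (W) and Fullwidth (F) characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            # Assumption: Ambiguous (A) width is treated as one column.
            columns += 1
    return columns


if __name__ == "__main__":
    for sample in ("abc", "日本語", "e\u0301"):
        print(repr(sample), display_columns(sample))
```

This will get plain CJK text and simple combining sequences about right, but it will likely miscount emoji sequences and anything else that needs real grapheme cluster handling, which is exactly the kind of tricky bit the Perl module is built to get correct.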
