What is a realistic maximum number of Unicode combining characters?

Submitted by 江枫思渺然 on 2021-02-19 07:44:22

Question


I'm looking for the maximum number of Unicode combining characters that appear after a non-combining one in realistic natural-language text.

I know that Unicode text can contain an arbitrary number of combining characters anywhere in the text. However, I am writing a specialized application that has to operate under constrained resources, and for that and other technical reasons, displaying an arbitrary number of combining characters after a non-combining one is not an option. I would still like to display natural languages properly if possible, and supporting a small number of combining characters should not be a problem.

My intuition is that natural languages don't need more than two or three combining characters after a base character, but I'm not sure and can't find a source for that number.
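One way to test that intuition against a sample corpus is to measure the longest run of combining characters in it. The following is only a rough sketch: `is_combining_approx` and `max_combining_run` are hypothetical helper names, and the check assumes that the dedicated "Combining ..." blocks are a good enough approximation. It does not cover every character with a mark category (for example, marks in Indic or Hebrew scripts), so a real measurement would need proper Unicode property data.

```rust
/// Rough check for a combining mark.
/// ASSUMPTION: only the dedicated "Combining ..." blocks are covered,
/// not every character whose General_Category is a mark.
fn is_combining_approx(c: char) -> bool {
    matches!(
        c as u32,
        0x0300..=0x036F       // Combining Diacritical Marks
            | 0x1AB0..=0x1AFF // Combining Diacritical Marks Extended
            | 0x1DC0..=0x1DFF // Combining Diacritical Marks Supplement
            | 0x20D0..=0x20FF // Combining Diacritical Marks for Symbols
            | 0xFE20..=0xFE2F // Combining Half Marks
    )
}

/// Returns the length of the longest run of consecutive combining marks.
fn max_combining_run(text: &str) -> usize {
    let mut longest = 0;
    let mut current = 0;
    for c in text.chars() {
        if is_combining_approx(c) {
            current += 1;
            longest = longest.max(current);
        } else {
            // Any other character starts a new cluster and resets the run.
            current = 0;
        }
    }
    longest
}
```

Running `max_combining_run` over representative text in the target languages would give an empirical number to size the limit against.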


Answer 1:


OK, for lack of a better answer, here's what I did (for future reference, if needed):

I ended up using a SmallVec-like structure with a threshold of 8 bytes before allocation and an upper limit of about 50 bytes (the text is stored in UTF-8). I think that should make everyone happy, and performance doesn't suffer.

Take those numbers with a pinch of salt; they are arbitrary and I might tune them anyway.
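For illustration, here is a minimal sketch of what such a structure could look like. It is not the actual implementation: the `Cluster` type, its methods, and the choice to drop trailing characters that would exceed the limit are all assumptions made for this example. The constants mirror the 8-byte and 50-byte figures above.

```rust
const INLINE_CAP: usize = 8; // bytes stored without allocating
const MAX_BYTES: usize = 50; // hard upper limit per cluster (assumed policy)

/// One base character plus its combining marks, stored as UTF-8.
/// Hypothetical type, written only to illustrate the idea.
enum Cluster {
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    Heap(Box<str>),
}

impl Cluster {
    /// Stores `s`, dropping trailing characters that would exceed MAX_BYTES.
    fn new(s: &str) -> Cluster {
        // Cut at a char boundary so the stored bytes stay valid UTF-8.
        let mut end = 0;
        for (i, c) in s.char_indices() {
            let next = i + c.len_utf8();
            if next > MAX_BYTES {
                break;
            }
            end = next;
        }
        let kept = &s[..end];
        if kept.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..kept.len()].copy_from_slice(kept.as_bytes());
            Cluster::Inline { buf, len: kept.len() as u8 }
        } else {
            Cluster::Heap(kept.into())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            Cluster::Inline { buf, len } => {
                std::str::from_utf8(&buf[..usize::from(*len)])
                    .expect("valid UTF-8 by construction")
            }
            Cluster::Heap(s) => &**s,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, Cluster::Inline { .. })
    }
}
```

With these numbers, a base letter with up to three two-byte combining marks fits inline, longer clusters fall back to a heap allocation, and anything past 50 bytes is cut off.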



Source: https://stackoverflow.com/questions/50272889/what-is-a-realistic-maximum-number-of-unicode-combining-characters
