In different encodings of Unicode, for example UTF-16LE or UTF-8, a character may occupy 2 or 3 bytes. Many Unicode applications don't t
I believe that to do this correctly, you need to consider Unicode Standard Annex #14 (UAX#14), the Unicode Line Breaking Algorithm, which is a component of the published Unicode Standard.
If you were programming in Perl, what you want to know would be super easy, because Perl’s Unicode::LineBreak module, which implements UAX#14, includes a class with a simple columns method that tells you the right answer for its string argument. These things work especially well on Asian languages, where absolutely nothing else will do. This module includes over 6,000 unit tests, is actively maintained, and its author is himself Asian, so it’s important to him to get these tricky bits exactly correct.
Most of the guts of the module are a library written in C. I have not looked at how to call that C library from languages other than Perl, but it may be worth checking whether that is possible.
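If you are not in Perl and just want to see the difference between bytes, code points, and display columns, here is a rough Python sketch. To be clear, this is not UAX#14 and not what Unicode::LineBreak does: it is a much cruder approximation that uses only the East Asian Width property from Python's standard unicodedata module, and the function name display_columns is made up for illustration. It treats Wide and Fullwidth characters as two columns, skips nonspacing marks and format characters, and counts everything else as one column.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Approximate the number of terminal columns that text occupies.

    Hypothetical helper, NOT an implementation of UAX#14: it looks only at
    each code point's general category and East Asian Width property.
    """
    columns = 0
    for ch in text:
        # Nonspacing marks, enclosing marks, and format characters
        # are treated as taking no column of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # Wide (W) and Fullwidth (F) characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns


if __name__ == "__main__":
    for sample in ("abc", "日本語", "e\u0301"):
        print(
            repr(sample),
            "code points:", len(sample),
            "UTF-8 bytes:", len(sample.encode("utf-8")),
            "UTF-16LE bytes:", len(sample.encode("utf-16-le")),
            "columns (approx.):", display_columns(sample),
        )
```

For "日本語" this reports 3 code points, 9 bytes in UTF-8, 6 bytes in UTF-16LE, and 6 columns, which shows why neither the byte count nor the character count tells you the display width. Ambiguous-width characters, emoji sequences, and control characters are not handled properly here, which is exactly the kind of detail a real UAX#14-based library takes care of.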