C++ UTF-8 actual string length

倖福魔咒の 提交于 2019-12-06 09:53:38

问题


Is there any native (cross platform) C++ function in any of standard libraries which returns the actual length of std::string?

Update: as we know std::string.length() returns the number of bytes not the number of characters. I already have a custom function which returns the actual one, but I'm looking for an standard one.


回答1:


codecvt ought to be helpful, the Standard provides implementations for UTF-8, for example codecvt_utf8<char32_t>() would be appropriate in this case.

Probably something like:

wstring_convert< codecvt_utf8<char32_t>, char32_t >().from_bytes(the_std_string).size()



回答2:


Actual length is the number of bytes. There is very little meaning to counting codepoints. You may though want to count other things like grapheme clusters.

See more about different kind of string lengths in http://utf8everywhere.org




回答3:


There is no way to do that in C/C++, without 3rd party libraries. Even if you convert to char32_t, you will get code points, not characters.

A code point does not match the user perception of a character, because of things like decompose formats, ligatures, variation selectors.

The closest available construct to a "user character" is a "grapheme cluster" (see http://www.unicode.org/reports/tr29/)

Your best cross-platform option is ICU4C (http://site.icu-project.org/)



来源:https://stackoverflow.com/questions/16863937/c-utf-8-actual-string-length

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!