Unicode string indexing in C++

刺人心 2020-12-30 15:05

I come from Python, where you can use 'string[10]' to access a character in a sequence. And if the string is encoded in Unicode, it gives me the expected results. However when

5 answers
  •  清酒与你
    2020-12-30 15:50

    In my opinion, the best approach is to do all string processing with iterators. I can't think of a scenario where one really has to index a string: if you need an index like ramp[5] in your example, the 5 is usually computed in another part of the code, and you usually scan all the preceding characters anyway. That's why the Standard Library uses iterators in its API.
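
    To make the "you scan the preceding characters anyway" point concrete, here is a minimal sketch of what indexing a UTF-8 std::string by code point involves. The helpers next_code_point and code_point_at are hypothetical names made up for illustration, not part of the Standard Library or of any particular library, and the sketch assumes the input is already valid UTF-8 (it does no validation).

    ```cpp
    #include <cstddef>
    #include <string>

    // Hypothetical helper: decodes the code point starting at byte offset `pos`
    // and advances `pos` past it. Assumes `s` holds valid UTF-8.
    inline char32_t next_code_point(const std::string& s, std::size_t& pos) {
        const unsigned char lead = static_cast<unsigned char>(s[pos++]);
        if (lead < 0x80) return lead;  // single byte (ASCII)

        int extra;      // number of continuation bytes that follow the lead byte
        char32_t cp;    // payload bits collected so far
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else                            { extra = 3; cp = lead & 0x07; }

        // Each continuation byte contributes its low 6 bits.
        while (extra-- > 0 && pos < s.size()) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
        }
        return cp;
    }

    // Hypothetical helper: returns the n-th code point (0-based), or U+0000 if
    // the string has fewer than n + 1 code points. Note that it is a linear
    // scan: every code point before the n-th one has to be decoded first.
    inline char32_t code_point_at(const std::string& s, std::size_t n) {
        std::size_t pos = 0;
        for (std::size_t i = 0; i < n && pos < s.size(); ++i) {
            next_code_point(s, pos);  // skip the preceding code points
        }
        return pos < s.size() ? next_code_point(s, pos) : U'\0';
    }
    ```

    Since code_point_at has to walk from the start of the string on every call, a loop that calls it for each index does quadratic work, while a single forward pass with next_code_point visits each byte once.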

    A similar problem comes up when you want the size of a string. Should it be the character (or code point) count, or merely the number of bytes? Usually you need the size to allocate a buffer, so the byte count is what you want. You only very rarely need the Unicode character count.
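
    The two notions of size can be sketched as follows, assuming the std::string holds valid UTF-8. The function code_point_count is a hypothetical helper written for this illustration; s.size() is the standard member function and always reports bytes.

    ```cpp
    #include <cstddef>
    #include <string>

    // Hypothetical helper: counts code points by counting every byte that is
    // not a continuation byte (continuation bytes look like 10xxxxxx).
    // Assumes `s` is valid UTF-8; malformed input gives a meaningless count.
    inline std::size_t code_point_count(const std::string& s) {
        std::size_t count = 0;
        for (unsigned char byte : s) {
            if ((byte & 0xC0) != 0x80) ++count;  // lead byte or ASCII byte
        }
        return count;
    }
    ```

    For a string such as "na\xC3\xAFve" (the word "naïve" encoded as UTF-8), s.size() is the figure to use when allocating a buffer, while code_point_count(s) is smaller because the two bytes of "ï" count as one code point. Note also that a code point count is still not the same as the number of characters a user sees, since combining marks are separate code points.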

    If you want to process UTF-8 encoded strings using iterators, then I would definitely recommend UTF8-CPP.
