How does Microsoft handle the fact that UTF-16 is a variable-length encoding in their C++ standard library implementation?

小蘑菇 2021-02-20 08:48

A variable-length encoding is indirectly forbidden by the standard.

So I have several questions:

How is the following part of the standard handled?

5 answers
  •  独厮守ぢ
    2021-02-20 09:42

    Here's how Microsoft's STL implementation handles the variable-length encoding:

    basic_string::operator[]() can return a low or a high surrogate in isolation.

    basic_string::size() returns the number of wchar_t objects, not the number of Unicode characters. A surrogate pair (one Unicode character) occupies two wchar_t objects and therefore adds two to the size.

    basic_string::resize() can truncate a string in the middle of a surrogate pair.

    basic_string::insert() can insert in the middle of a surrogate pair.

    basic_string::erase() can erase either half of a surrogate pair.

    In general, the pattern should be clear: the STL neither assumes that a std::wstring holds UTF-16 nor enforces that it remains well-formed UTF-16. Every member function operates on individual code units.
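The behaviour listed above can be checked with a small sketch. It makes a few assumptions that are not part of the answer: it uses std::u16string (char16_t) instead of std::wstring so that the code units are 16 bits wide on every platform, not only on Windows, and the helpers count_code_points and is_well_formed_utf16 are hypothetical names written for this illustration, not part of any standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// High (leading) surrogates occupy U+D800..U+DBFF.
constexpr bool is_high_surrogate(char16_t c) {
    return c >= 0xD800 && c <= 0xDBFF;
}

// Low (trailing) surrogates occupy U+DC00..U+DFFF.
constexpr bool is_low_surrogate(char16_t c) {
    return c >= 0xDC00 && c <= 0xDFFF;
}

// Hypothetical helper: counts code points rather than code units.
// A well-formed surrogate pair counts as one; a lone surrogate also counts as one.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() &&
            is_low_surrogate(s[i + 1])) {
            ++i;  // skip the trailing half of the pair
        }
        ++n;
    }
    return n;
}

// Hypothetical helper: true if every surrogate in s is part of a proper
// high-then-low pair. basic_string never performs this check itself.
bool is_well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i])) {
            if (i + 1 >= s.size() || !is_low_surrogate(s[i + 1])) {
                return false;  // high surrogate without its partner
            }
            ++i;
        } else if (is_low_surrogate(s[i])) {
            return false;  // low surrogate with no preceding high surrogate
        }
    }
    return true;
}
```

For the string u"a\U0001F600" (the letter a followed by U+1F600), size() is 3 while count_code_points gives 2, and operator[] at index 1 yields the lone high surrogate 0xD83D. Calling resize(2) on it, or calling erase(0, 1) or insert(1, u"x") on u"\U0001F600", succeeds without complaint and leaves a string for which is_well_formed_utf16 returns false.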
