std::string and UTF-8 encoded Unicode

耶瑟儿~ 2021-01-05 11:27

If I understand correctly, it is possible to use both string and wstring to store UTF-8 text.

  • With char, ASCII characters take a single byte, some Chinese charac

3 answers
  •  时光取名叫无心
    2021-01-05 11:33

    You are correct for those:
    ...Which means that str[3] doesn't necessarily point to the 4th character...only use them as dummy feature-less byte arrays...
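
    To illustrate why str[3] is not necessarily the 4th character, here is a minimal sketch that counts code points and finds where the n-th one starts. The helper names utf8_length and utf8_offset are made up for this example, and the code assumes the string already holds valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a std::string that is
// assumed to hold valid UTF-8. Continuation bytes look like 10xxxxxx, so
// every byte that does not match that pattern starts a new code point.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: returns the byte offset at which the n-th code point
// (0-based) starts, or std::string::npos if there are fewer than n + 1.
std::size_t utf8_offset(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            if (seen == n) {
                return i;
            }
            ++seen;
        }
    }
    return std::string::npos;
}
```

    For the string "a中b" stored as UTF-8, size() reports 5 bytes while utf8_length reports 3 code points, and the third character starts at byte offset 4, not 2.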

    C++'s std::string has no notion of Unicode: it is a sequence of char, and each char is one byte. This differs from Java's String, which is defined in terms of Unicode (UTF-16 code units). You can store the UTF-8 encoded bytes of Chinese characters in a std::string, and they are preserved intact, but the string treats every byte as a separate element, so member functions such as size(), operator[] and substr() count and index bytes rather than characters.
    wstring may be closer to what you need, although the width of wchar_t depends on the platform.
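
    If indexing by character is the goal, one option is to decode the UTF-8 bytes into a fixed-width string. The sketch below uses std::u32string rather than std::wstring because wchar_t is 16 bits on Windows and 32 bits on most Unix-like systems. The function name decode_utf8 is made up for this example, and the decoder is deliberately simplified: malformed or truncated sequences become U+FFFD, and overlong encodings and surrogates are not rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 so that each element
// of the result is exactly one code point. Simplified on purpose, see lead-in.
std::u32string decode_utf8(const std::string& in) {
    constexpr char32_t kReplacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F;
            extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F;
            extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07;
            extra = 3;
        } else {
            // Stray continuation byte or invalid lead byte.
            out += kReplacement;
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        out += ok ? cp : kReplacement;
    }
    return out;
}
```

    After decoding, operator[] and size() on the result work per code point. Note that a code point is still not always what a user perceives as one character, since combining marks are separate code points.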

    One thing should be clarified: UTF-8 is just an encoding of Unicode code points, that is, a rule for converting each code point to a sequence of one to four bytes and back.
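
    To make the "characters to bytes" direction concrete, here is a minimal sketch of encoding a single code point as UTF-8 into a std::string. The function name encode_utf8 is made up for this example, and returning an empty string for invalid input is an arbitrary choice of this sketch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// Returns an empty string for surrogates (U+D800..U+DFFF) and for
// values above U+10FFFF, which are not valid code points to encode.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;  // surrogate, not encodable
        }
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

    With this sketch, U+0041 ('A') becomes the single byte 0x41, while U+4E2D ('中') becomes the three bytes E4 B8 AD, which is why an ASCII character and a Chinese character occupy different numbers of char elements in a std::string.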
