C++20 with u8, char8_t and std::string

Submitted by 我与影子孤独终老i on 2019-12-04 15:27:52

Question


C++11 brought us the u8 prefix for UTF-8 literals. I thought that was pretty cool a few years ago and peppered my code with things like this:

std::string myString = u8"●";

This is all fine and good, but it no longer seems to compile in C++20: a u8 literal is now an array of const char8_t (which decays to const char8_t*), and that is incompatible with std::string, which just uses char.
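To make the type change concrete, here is a minimal sketch that inspects the element type of a u8 literal at compile time. The names U8Elem and u8_literal_is_char are made up for this illustration; __cpp_char8_t is the standard feature-test macro for the language feature.

```cpp
#include <type_traits>

// Element type of a u8 literal under whichever language standard is active.
using U8Elem = std::decay_t<decltype(u8"a"[0])>;

#if defined(__cpp_char8_t)
// C++20 and later: the literal is an array of const char8_t.
static_assert(std::is_same<U8Elem, char8_t>::value, "expected char8_t");
#else
// C++11 through C++17: the literal is an array of const char.
static_assert(std::is_same<U8Elem, char>::value, "expected char");
#endif

// Hypothetical helper: true only when u8 literals are still char-based,
// i.e. when std::string s = u8"..."; is still well-formed.
constexpr bool u8_literal_is_char() {
  return std::is_same<U8Elem, char>::value;
}
```

The same source therefore compiles in both modes, and only the branch matching the active standard is checked.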

Should I be creating a new utf8string? What's the consistent and correct way to do this kind of thing in a C++20 world where we have more explicit types that don't really match with the standard std::string?


Answer 1:


In addition to @lubgr's answer, the paper "char8_t backward compatibility remediation" (P1423) discusses several ways to build a std::string from char8_t character arrays.

Basically, the idea is that you can cast the u8 char array to a "normal" char array to get the same behaviour as in C++17 and earlier; you just have to be a bit more explicit. The paper discusses various ways to do this.

The simplest method that fits your use case (though not fully zero-overhead unless you add more overloads) is probably the last one, i.e. introducing explicit conversion functions:
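One way to spell out such a cast is sketched below. The helper names as_char, bullet and check_bullet are assumptions made for this illustration rather than names taken from the paper. The cast itself relies only on the rule that any object may be read through a pointer to char.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: view UTF-8 code units as plain char.
// Before C++20 a u8 literal is already char-based, so this is a pass-through.
inline const char* as_char(const char* s) { return s; }

#if defined(__cpp_char8_t)
// C++20: reading objects through a char pointer is permitted,
// so reinterpreting the char8_t array as char is well-defined.
inline const char* as_char(const char8_t* s) {
  return reinterpret_cast<const char*>(s);
}
#endif

// U+25CF BLACK CIRCLE, written as an escape so the source encoding does not matter.
inline std::string bullet() { return as_char(u8"\u25CF"); }

inline void check_bullet() {
  // U+25CF is encoded in UTF-8 as the three code units E2 97 8F.
  assert(bullet() == "\xE2\x97\x8F");
}
```

Because the cast happens on the pointer, no intermediate std::u8string is created, but every call site has to name the helper explicitly.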

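#include <string>   // std::string, std::u8string; also defines __cpp_lib_char8_t
#include <utility>  // std::move

// Already char-based (pre-C++20 u8 literals end up here): pass through.
std::string from_u8string(const std::string &s) {
  return s;
}
std::string from_u8string(std::string &&s) {
  return std::move(s);
}
#if defined(__cpp_lib_char8_t)
// C++20: copy the char8_t code units into a char-based string.
std::string from_u8string(const std::u8string &s) {
  return std::string(s.begin(), s.end());
}
#endif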



Answer 2:


Should I be creating a new utf8string?

No, it's already there. P0482 proposes not only char8_t but also a new specialization of std::basic_string for the char8_t character type, named std::u8string. So this already compiles with clang and libc++ from trunk:

const std::u8string str = u8"●";
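For code that has to build both before and after C++20, a rough sketch along these lines may help. The alias utf8_string and the functions bullet_utf8, to_narrow and check_round_trip are hypothetical names chosen for this example; only std::u8string and the __cpp_lib_char8_t macro come from the standard library.

```cpp
#include <cassert>
#include <string>  // also defines __cpp_lib_char8_t when std::u8string exists

#if defined(__cpp_lib_char8_t)
using utf8_string = std::u8string;  // C++20: dedicated UTF-8 string type
#else
using utf8_string = std::string;    // assumed fallback for older standards
#endif

// U+25CF BLACK CIRCLE, stored as UTF-8 code units in either mode.
inline utf8_string bullet_utf8() { return u8"\u25CF"; }

// Copy the code units into a char-based string for APIs that expect char.
inline std::string to_narrow(const utf8_string& s) {
  return std::string(s.begin(), s.end());
}

inline void check_round_trip() {
  assert(bullet_utf8().size() == 3);  // size() counts code units, not characters
  assert(to_narrow(bullet_utf8()) == "\xE2\x97\x8F");
}
```

Keeping UTF-8 text in utf8_string and converting only at the boundary to char-based interfaces makes the encoding visible in the type, at the cost of a copy per conversion.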

The fact that std::string construction from a u8-literal breaks is unfortunate. From the proposal:

This proposal does not specify any backward compatibility features other than to retain interfaces that it deprecates. The author believes such features are necessary, but that a single set of such features would unnecessarily compromise the goals of this proposal. Rather, the expectation is that implementations will provide options to enable more fine grained compatibility features.

But I guess most initializations like the one above should be grep-able or amenable to automatic fixes with clang tooling.



Source: https://stackoverflow.com/questions/56833000/c20-with-u8-char8-t-and-stdstring
