I have a problem: I need to use UTF-8 encoded strings in standard char types in C++ source code, like so:
char* twochars = "\xe6\x97\xa5\xd1\x88";
There's no way to write the string literal directly in UTF-8 with the current versions of VC++. A future version should have UTF-8 string literals.
I tried pasting non-ASCII text directly into a string literal in a source file and saved the file as UTF-8. Looking at the source file in a hex editor confirmed that it's saved as UTF-8, but that still doesn't do what you want. At compile time, those bytes are either mapped to a character in the current code page or you get a warning.
So the most portable way to create a string literal right now is to explicitly write the octets as you've been doing.
If you want to do a run-time conversion, there are a couple of options.
You could use one of these techniques to write a little utility that does the conversion and outputs them as the explicit octets you would need for a string literal. You could then copy and paste the output into your source code.
You can use the still undocumented pragma directive execution_character_set("utf-8"). This way your char strings will be saved as UTF-8 in your binary. BTW, this pragma is available in Visual C++ compilers only.
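#include <iostream>
#include <cstring>

// Visual C++ only: store narrow string literals as UTF-8 in the binary.
#pragma execution_character_set("utf-8")

using namespace std;

// String literals are const, so point at them through const char*.
const char *five_chars = "ĄĘĆŻ!";

int main()
{
    cout << "This is a UTF-8 string: " << five_chars << endl;
    cout << "...it's 5 characters long" << endl;
    // strlen counts bytes, not characters: each accented letter here takes 2 bytes in UTF-8.
    cout << "...but it's " << strlen(five_chars) << " bytes long" << endl;
    return 0;
}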