Is there an easy way to write UTF-8 octets in Visual Studio?

北荒 2020-12-19 04:23

I need to use UTF-8 encoded strings with the standard char type in C++ source code, like so:

const char* twochars = "\xe6\x97\xa5\xd1\x88";


        
2 Answers
  • 2020-12-19 04:34

    There's no way to write the string literal directly in UTF-8 with the current versions of VC++. A future version should have UTF-8 string literals.

    I tried pasting non-ASCII text directly into a string literal in a source file and saved the file as UTF-8. Looking at the source file in a hex editor confirmed that it's saved as UTF-8, but that still doesn't do what you want. At compile time, those bytes are either mapped to a character in the current code page or you get a warning.

    So the most portable way to create a string literal right now is to explicitly write the octets as you've been doing.

    If you want to do a run-time conversion, there are a couple of options.

    1. The Windows API has WideCharToMultiByte, which takes UTF-16 text and converts it to a multibyte encoding such as UTF-8.
    2. If you're using a new enough version of the compiler and the C++ runtime, you can use std::codecvt to transform your wide character string into UTF-8.
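
    As a rough illustration of the second option, here is a minimal sketch using std::wstring_convert with std::codecvt_utf8_utf16. The function name utf16_to_utf8 is made up for this example. Note that these facilities are deprecated since C++17, so treat this as a sketch for toolchains that still ship them, not as a recommended long-term approach.

    ```cpp
    #include <codecvt>
    #include <locale>
    #include <string>

    // Hypothetical helper: converts a UTF-16 string to a UTF-8 encoded std::string.
    // std::wstring_convert and <codecvt> are deprecated since C++17.
    std::string utf16_to_utf8(const std::u16string& utf16)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
        return converter.to_bytes(utf16);  // throws std::range_error on invalid input
    }
    ```

    Calling utf16_to_utf8(u"\u65e5\u0448") should yield the same five bytes as the explicit literal in the question.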

    You could use one of these techniques to write a little utility that does the conversion and outputs them as the explicit octets you would need for a string literal. You could then copy and paste the output into your source code.
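
    The escaping half of such a utility could look roughly like this. It assumes the input is already UTF-8 (for example, the result of one of the conversions above); the name to_hex_escapes is hypothetical. Every byte is escaped, including plain ASCII, so that a hex escape is never followed directly by a character the compiler could read as another hex digit.

    ```cpp
    #include <string>

    // Hypothetical helper: turns each byte of a UTF-8 string into a \xNN escape,
    // producing text that can be pasted between the quotes of a string literal.
    std::string to_hex_escapes(const std::string& utf8)
    {
        static const char digits[] = "0123456789abcdef";
        std::string out;
        out.reserve(utf8.size() * 4);
        for (unsigned char byte : utf8) {
            out += "\\x";
            out += digits[byte >> 4];    // high nibble
            out += digits[byte & 0x0f];  // low nibble
        }
        return out;
    }
    ```

    Printing the result with std::cout and wrapping it in double quotes gives a literal in the same form as the one in the question.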

  • 2020-12-19 04:54

    You can use the pragma directive execution_character_set("utf-8"), which was undocumented at the time of writing. With it, your char string literals are stored as UTF-8 in the binary. Note that this pragma is available in Visual C++ compilers only.

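    #include <cstring>
    #include <iostream>
    
    // Visual C++ only: store narrow string literals as UTF-8 in the binary.
    #pragma execution_character_set("utf-8")
    
    using namespace std;
    
    // String literals are const, so point at them through const char*.
    const char *five_chars = "ĄĘĆŻ!";
    
    int main()
    {
        cout << "This is a UTF-8 string: " << five_chars << endl;
        cout << "...it's 5 characters long" << endl;
        // strlen counts bytes, not characters, so this prints a larger number.
        cout << "...but it's " << strlen(five_chars) << " bytes long" << endl;
        return 0;
    }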
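
    To see the character/byte difference without relying on the pragma, the same check can be sketched portably by writing the octets explicitly and counting the bytes that are not UTF-8 continuation bytes. The function name utf8_code_points is made up for this example, and it assumes the input is valid UTF-8; it counts code points, which is not always the same as user-perceived characters.

    ```cpp
    #include <cstddef>
    #include <cstring>

    // Hypothetical helper: counts code points in a NUL-terminated, valid UTF-8 string.
    std::size_t utf8_code_points(const char* s)
    {
        std::size_t count = 0;
        for (; *s != '\0'; ++s) {
            // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
            if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
                ++count;
        }
        return count;
    }
    ```

    For the five-character string above, written as "\xc4\x84\xc4\x98\xc4\x86\xc5\xbb!", this returns 5 while strlen returns 9.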
    