C++: how to convert ASCII or ANSI to UTF-8 and store it in a std::string

Anonymous (unverified), posted on 2019-12-03 02:53:02

Question:

My company uses some code like this:

    std::string(CT2CA(some_CString)).c_str() 

Is there any way that I can convert this header back to UTF-8 and store it in a std::string or const char*?

I know there are a lot of smarter ways to do this, but I need to keep the code close to the original (i.e. send the header as a std::string or const char*).

Thanks in advance.

Answer 1:

This sounds like a plain conversion from one encoding to another. You can use a std::codecvt<char, char, mbstate_t> facet for this, although I don't know whether your implementation ships with a suitable conversion. From the sound of it, you are just trying to convert ISO-Latin-1 into UTF-8. That should be pretty much trivial: the first 128 characters (0 to 127) map identically to UTF-8, and the second half (128 to 255) conveniently maps to the Unicode code points with the same values, so you just need to encode each value as UTF-8. Each character in that second half is replaced by two bytes. That is, I think the conversion is something like this:

    #include <stdexcept>

    // Takes the next position and the end of a buffer as the first two arguments
    // and the character to convert from ISO-Latin-1 as the third argument.
    // Returns a pointer to the end of the produced sequence.
    char* iso_latin_1_to_utf8(char* buffer, char* end, unsigned char c) {
        if (c < 128) {
            if (buffer == end) { throw std::runtime_error("out of space"); }
            *buffer++ = c;
        }
        else {
            if (end - buffer < 2) { throw std::runtime_error("out of space"); }
            // Lead byte 110xxxxx, then continuation byte 10xxxxxx; the marker
            // bits are combined with bitwise OR.
            *buffer++ = 0xC0 | (c >> 6);
            *buffer++ = 0x80 | (c & 0x3f);
        }
        return buffer;
    }
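Since the question asks for the result in a std::string, the per-character idea can be wrapped in a whole-string helper. The following is only a minimal sketch, not part of the original answer: the name `latin1_to_utf8` is hypothetical, and it assumes the input really is ISO-Latin-1. Windows "ANSI" code pages such as Windows-1252 differ from ISO-Latin-1 in the 0x80 to 0x9F range, so such input would need a real code-page conversion instead.

```cpp
#include <string>

// Hypothetical helper for illustration: converts an ISO-Latin-1 encoded
// string to UTF-8. Assumes every input byte is a Latin-1 code point.
std::string latin1_to_utf8(const std::string& input) {
    std::string output;
    output.reserve(input.size() * 2);  // worst case: every byte becomes two

    for (unsigned char c : input) {
        if (c < 0x80) {
            // ASCII range: the byte is identical in UTF-8.
            output.push_back(static_cast<char>(c));
        } else {
            // 0x80..0xFF: lead byte 110000xx, continuation byte 10xxxxxx.
            output.push_back(static_cast<char>(0xC0 | (c >> 6)));
            output.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return output;
}
```

The returned std::string owns its bytes, so calling `.c_str()` on it yields a `const char*` that stays valid for as long as that string object is alive and unmodified.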


Answer 2:

Be careful: the marker bits must be combined with '|' (bitwise OR), not '&' (bitwise AND)!

    *buffer++ = 0xC0 | (c >> 6);
    *buffer++ = 0x80 | (c & 0x3F);
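To see why the operator matters, it may help to work one value through both variants. The sketch below uses two hypothetical helpers, `encode_with_or` and `encode_with_and`, which exist only for this comparison. For the Latin-1 byte 0xE9 (é), `c >> 6` is 0x03 and `c & 0x3F` is 0x29: OR gives 0xC3 0xA9, the correct UTF-8 sequence, while AND gives 0x00 0x00, because the marker bits 0xC0 and 0x80 never overlap with the payload bits.

```cpp
#include <utility>

// Hypothetical helpers for illustration only.

// Builds the two UTF-8 bytes for a Latin-1 value in 0x80..0xFF by OR-ing
// the marker bits (110xxxxx / 10xxxxxx) with the payload bits.
std::pair<unsigned char, unsigned char> encode_with_or(unsigned char c) {
    return { static_cast<unsigned char>(0xC0 | (c >> 6)),
             static_cast<unsigned char>(0x80 | (c & 0x3F)) };
}

// Same expressions with AND instead of OR. The payload of the first byte is
// at most 0x03 and that of the second at most 0x3F, so masking them with
// 0xC0 and 0x80 always yields zero.
std::pair<unsigned char, unsigned char> encode_with_and(unsigned char c) {
    return { static_cast<unsigned char>(0xC0 & (c >> 6)),
             static_cast<unsigned char>(0x80 & (c & 0x3F)) };
}
```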

