Convert wstring to string encoded in UTF-8

房东的猫 submitted on 2019-11-26 10:56:03

Question


I need to convert between wstring and string. I figured out that using the codecvt facet should do the trick, but it doesn't seem to work for a UTF-8 locale.

My idea is that when I read a UTF-8 encoded file into chars, a single non-ASCII character is read into two or more chars (which is how UTF-8 works). I'd like to create this UTF-8 string from the wstring representation for a library I use in my code.

Does anybody know how to do it?

I already tried this:

  locale mylocale("cs_CZ.utf-8");
  mbstate_t mystate;

  wstring mywstring = L"čřžýáí";

  const codecvt<wchar_t,char,mbstate_t>& myfacet =
    use_facet<codecvt<wchar_t,char,mbstate_t> >(mylocale);

  codecvt<wchar_t,char,mbstate_t>::result myresult;  

  size_t length = mywstring.length();
  char* pstr= new char [length+1];

  const wchar_t* pwc;
  char* pc;

  // translate characters:
  myresult = myfacet.out (mystate,
      mywstring.c_str(), mywstring.c_str()+length+1, pwc,
      pstr, pstr+length+1, pc);

  if ( myresult == codecvt<wchar_t,char,mbstate_t>::ok )
   cout << "Translation successful: " << pstr << endl;
  else cout << "failed" << endl;
  return 0;

This prints 'failed' for the cs_CZ.utf-8 locale and works correctly for the cs_CZ.iso8859-2 locale.


Answer 1:


The C++ standard library has very little built-in support for Unicode. Use an external library such as ICU (UnicodeString class) or Qt (QString class); both support Unicode, including UTF-8.




Answer 2:


The code below might help you :)

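// Note: std::wstring_convert and <codecvt> are deprecated since C++17.
#include <codecvt>
#include <locale>   // std::wstring_convert is declared here
#include <string>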

// convert UTF-8 string to wstring
std::wstring utf8_to_wstring (const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.from_bytes(str);
}

// convert wstring to UTF-8 string
std::string wstring_to_utf8 (const std::wstring& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}
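
One detail worth knowing about the approach above: when no error string is passed to the constructor, from_bytes and to_bytes throw std::range_error if the conversion fails. The sketch below wraps the UTF-8 to wstring direction so the caller gets a bool instead of an exception. The names try_utf8_to_wstring and demo are hypothetical helpers for illustration, not part of any library.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-8 bytes to a wstring without letting
// the exception escape. Returns false if the input is not valid UTF-8;
// in that case 'out' is left unchanged.
bool try_utf8_to_wstring(const std::string& bytes, std::wstring& out)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    try {
        out = conv.from_bytes(bytes);
        return true;
    } catch (const std::range_error&) {
        return false;
    }
}

// Small self-check showing both outcomes.
void demo()
{
    std::wstring w;
    // U+010D (c with caron) is the two bytes C4 8D in UTF-8.
    assert(try_utf8_to_wstring("\xC4\x8D", w) && w == L"\u010D");
    // 0xFF can never appear in valid UTF-8, so the conversion is rejected.
    assert(!try_utf8_to_wstring("\xFF", w));
}
```

Because these facilities are deprecated since C++17, some compilers will warn when building this.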



Answer 3:


What's your platform? Note that Windows has traditionally not supported UTF-8 locales, which may explain the failure.

To get this done in a platform-dependent way, you can use MultiByteToWideChar/WideCharToMultiByte on Windows and iconv on Linux. You may be able to use some Boost magic to get this done in a platform-independent way, but I haven't tried it myself, so I can't say more about that option.
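
As a rough illustration of the iconv route on Linux, here is a minimal sketch. It assumes the iconv implementation accepts the encoding name "WCHAR_T" (glibc and GNU libiconv do) and declares iconv() with a char** input parameter; the function name wstring_to_utf8_iconv is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Convert a wide string to UTF-8 using POSIX iconv.
// Assumption: "WCHAR_T" is a valid source encoding name on this system.
std::string wstring_to_utf8_iconv(const std::wstring& in)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // UTF-8 needs at most 4 bytes per wchar_t, so this buffer cannot overflow.
    std::string out(in.size() * 4, '\0');

    // iconv takes non-const pointers but does not modify the input.
    char* src = reinterpret_cast<char*>(const_cast<wchar_t*>(in.data()));
    std::size_t src_left = in.size() * sizeof(wchar_t);
    char* dst = &out[0];
    std::size_t dst_left = out.size();

    std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        throw std::runtime_error("iconv conversion failed");

    assert(src_left == 0);              // success means all input was consumed
    out.resize(out.size() - dst_left);  // trim the unused tail of the buffer
    return out;
}
```

Note that the output buffer is sized in bytes, not in characters; a buffer with only as many bytes as the wide string has characters is too small for non-ASCII text.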




Answer 4:


You can use Boost's utf_to_utf converter (part of Boost.Locale, declared in <boost/locale/encoding_utf.hpp>) to get UTF-8 encoded char data that you can store in a std::string.

std::string myresult = boost::locale::conv::utf_to_utf<char>(my_wstring);



Answer 5:


A locale gives the program information about the external encoding; it assumes that the internal encoding does not change. If you want to output UTF-8, you need to convert from wchar_t, not from char*.

What you could do is output it as raw data (not as a string); it should then be interpreted correctly if the system's locale is UTF-8.

Also, when using (w)cout/(w)cerr/(w)cin, you need to imbue the stream with the locale.
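
To make the "convert from wchar_t, then write raw bytes" idea concrete, here is a minimal hand-written sketch that does not depend on any locale. The names append_utf8 and encode_utf8 are hypothetical, and the code assumes the wstring holds valid UTF-32 (32-bit wchar_t, as on Linux) or UTF-16 (16-bit wchar_t, as on Windows); it does no validation beyond an assert.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to 'out' as 1 to 4 UTF-8 bytes.
void append_utf8(std::string& out, unsigned long cp)
{
    assert(cp <= 0x10FFFF);  // must be inside the Unicode range
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode a whole wide string as UTF-8.
std::string encode_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        // With 16-bit wchar_t, combine a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting bytes can then be written unformatted, for example with std::cout.write(bytes.data(), bytes.size()), and a UTF-8 terminal should display them correctly.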




Answer 6:


The Lexertl library has an iterator that lets you do this:

std::string str;
str.assign(
  lexertl::basic_utf8_out_iterator<std::wstring::const_iterator>(wstr.begin()),
  lexertl::basic_utf8_out_iterator<std::wstring::const_iterator>(wstr.end()));


Source: https://stackoverflow.com/questions/4358870/convert-wstring-to-string-encoded-in-utf-8
