Stumped with Unicode, Boost, C++, codecvts

社会主义新天地 提交于 2019-11-30 02:08:41

Okay, after a long few months I've figured it out, and I'd like to help people in the future.

First of all, the codecvt thing was the wrong way of doing it. Boost.Locale provides a simple way of converting between character sets in its boost::locale::conv namespace. Here's one example (there's others not based on locales).

#include <boost/locale.hpp>
namespace loc = boost::locale;

int main(void)
{
  loc::generator gen;
  std::locale blah = gen.generate("en_US.utf-32");

  std::string UTF8String = "Tésting!";
  // from_utf will also work with wide strings as it uses the character size
  // to detect the encoding.
  std::string converted = loc::conv::from_utf(UTF8String, blah);

  // Outputs a UTF-32 string.
  std::cout << converted << std::endl;

  return 0;
}

As you can see, if you replace the "en_US.utf-32" with "" it'll output in the user's locale.

I still don't know how to make std::cout do this all the time, but the translate() function of Boost.Locale outputs in the user's locale.

As for the filesystem using UTF-8 strings cross platform, it seems that that's possible, here's a link to how to do it.

  std::cout.imbue(convLoc);
  std::cout << data << std::endl;

This does no conversion, since it uses codecvt<char, char, mbstate_t> which is a no-op. The only standard streams that use codecvt are file-streams. std::cout is not required to perform any conversion at all.

To force Boost.Filesystem to interpret narrow-strings as UTF-8 on windows, use boost::filesystem::imbue with a locale with a UTF-8 ↔ UTF-16 codecvt facet. Boost.Locale has an implementation of the latter.

Cheers and hth. - Alf

The Boost filesystem iostream replacement classes work fine with UTF-16 when used with Visual C++.

However, they do not work (in the sense of supporting arbitrary filenames) when used with g++ in Windows - at least as of Boost version 1.47. There is a code comment explaining that; essentially, the Visual C++ standard library provides non-standard wchar_t based constructors that Boost filesystem classes make use of, but g++ does not support these extensions.

A workaround is to use 8.3 short filenames, but this solution is a bit brittle since with old Windows versions the user can turn off automatic generation of short filenames.


Example code for using Boost filesystem in Windows:
#include "CmdLineArgs.h"        // CmdLineArgs
#include "throwx.h"             // throwX, hopefully
#include "string_conversions.h" // ansiOrFillerFrom( wstring )

#include <boost/filesystem/fstream.hpp>     // boost::filesystem::ifstream
#include <iostream>             // std::cout, std::cerr, std::endl
#include <stdexcept>            // std::runtime_error, std::exception
#include <string>               // std::string
#include <stdlib.h>             // EXIT_SUCCESS, EXIT_FAILURE
using namespace std;
namespace bfs = boost::filesystem;

inline string ansi( wstring const& ws ) { return ansiWithFillersFrom( ws ); }

int main()
{
    try
    {
        CmdLineArgs const   args;
        wstring const       programPath     = args.at( 0 );

        hopefully( args.nArgs() == 2 )
            || throwX( "Usage: " + ansi( programPath ) + " FILENAME" );

        wstring const       filePath        = args.at( 1 );
        bfs::ifstream       stream( filePath );     // Nice Boost ifstream subclass.
        hopefully( !stream.fail() )
            || throwX( "Failed to open file '" + ansi( filePath ) + "'" );

        string line;
        while( getline( stream, line ) )
        {
            cout << line << endl;
        }
        hopefully( stream.eof() )
            || throwX( "Failed to list contents of file '" + ansi( filePath ) + "'" );

        return EXIT_SUCCESS;
    }
    catch( exception const& x )
    {
        cerr << "!" << x.what() << endl;
    }
    return EXIT_FAILURE;
}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!