string conversion with boost locale: different behaviour on windows and linux

Submitted by 筅森魡賤 on 2019-12-18 04:46:09

Question


This is my sample code:

#pragma execution_character_set("utf-8")

#include <boost/locale.hpp>
#include <boost/algorithm/string/case_conv.hpp>
#include <iostream>

int main()
{
    std::locale loc = boost::locale::generator().generate("");
    std::locale::global(loc);

#ifdef _MSC_VER  // predefined by Visual C++; a bare MSVC macro exists only if the build defines it
    std::cout << boost::locale::conv::from_utf("grüßen vs ", "ISO8859-15");
    std::cout << boost::locale::conv::from_utf(boost::locale::to_upper("grüßen"), "ISO8859-15") << std::endl;
    std::cout << boost::locale::conv::from_utf(boost::locale::fold_case("grüßen"), "ISO8859-15") << std::endl;
    std::cout << boost::locale::conv::from_utf(boost::locale::normalize("grüßen", boost::locale::norm_nfd), "ISO8859-15") << std::endl;
#else
    std::cout << "grüßen vs ";
    std::cout << boost::locale::to_upper("grüßen") << std::endl;
    std::cout << boost::locale::fold_case("grüßen") << std::endl;
    std::cout << boost::locale::normalize("grüßen", boost::locale::norm_nfd) << std::endl;
#endif

    return 0;
}

Output on Windows 7 is:

grüßen vs GRÜßEN
grüßen
grußen

Output on Linux (openSuSE 12.3) is:

grüßen vs GRÜSSEN
grüssen
grüßen

On Linux the German letter 'ß' is converted to 'SS' as expected, while on Windows the character remains unchanged.

Question: why is this so? How can I correct the conversion?

Some notes: the Windows console code page is set to 1252. In both cases the locale is set to de_DE. I tried replacing the default locale setting in the listing above with "de_DE.UTF-8", without any effect. On Windows the code is compiled with Visual Studio 2013, on Linux with GCC 4.7 with C++11 enabled.

Any suggestions are appreciated - thanks in advance for your support!


Answer 1:


Windows doesn't do this conversion because "it would be too confusing" for developers if the string length changed all of a sudden: uppercasing 'ß' to 'SS' turns one character into two. And Boost presumably just delegates all the Unicode conversions to the underlying Windows APIs.

Source

I guess the robust way to handle it would be to use a third-party Unicode library such as ICU.
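
If switching to ICU is not practical, one stopgap might be to expand the sharp s by hand after uppercasing. The sketch below assumes the input is valid UTF-8, as the from_utf calls in the question imply. expand_sharp_s is a hypothetical helper, not part of Boost.Locale or the Windows API, and it handles only this one character, not the other one-to-many case mappings that ICU covers.

```cpp
#include <string>

// Hypothetical helper (not a Boost.Locale function): replaces every
// UTF-8 encoded U+00DF 'ß' (bytes 0xC3 0x9F) with "SS".
// A plain byte search is safe here because in valid UTF-8 the lead byte
// 0xC3 never occurs in the middle of another character's encoding.
std::string expand_sharp_s(const std::string& utf8)
{
    static const std::string sharp_s = "\xC3\x9F";
    std::string out;
    out.reserve(utf8.size());

    std::string::size_type pos = 0;
    for (;;) {
        const std::string::size_type hit = utf8.find(sharp_s, pos);
        if (hit == std::string::npos) {
            // No further occurrence: copy the remaining tail and stop.
            out.append(utf8, pos, std::string::npos);
            break;
        }
        out.append(utf8, pos, hit - pos);  // text before the 'ß'
        out += "SS";                       // one character becomes two
        pos = hit + sharp_s.size();
    }
    return out;
}
```

In the Windows branch of the question's code this would wrap the uppercasing call, for example expand_sharp_s(boost::locale::to_upper("grüßen")), before the result is passed to from_utf.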



Source: https://stackoverflow.com/questions/22331487/string-conversion-with-boost-locale-different-behaviour-on-windows-and-linux
