C++ tolower on special characters such as ü

耶瑟儿~ 2020-12-19 13:25

I have trouble transforming a string to lowercase with the tolower() function in C++. With normal strings it works as expected; however, special characters such as ü are not converted.

3 answers
  •  误落风尘
    2020-12-19 14:18

    I think the most portable way to do this is to use the user-selected locale, which you get by setting the locale to "" (the empty string).

    std::locale::global(std::locale("")); 
    

    That sets the locale to whatever was in use where the program was run, and it affects the standard character conversion routines (std::mbsrtowcs and std::wcsrtombs) that convert between multi-byte and wide-character strings.

    Then you can use those functions to convert from the system/user selected multi-byte characters (such as UTF-8) to system standard wide character codes that can be used in functions like std::tolower that operate on one character at a time.

    This is important because text in a multi-byte character set like UTF-8 cannot be converted using single-character operations like std::tolower(): one character may span several bytes.
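
    To see why, here is a minimal sketch (the helper name bytewise_lower is made up for illustration) that applies the single-byte std::tolower to every byte of a std::string. It assumes the string holds UTF-8, where Ü is encoded as the two bytes 0xC3 0x9C, and that the program is still in the default "C" locale. ASCII letters are converted, but neither byte of Ü is an uppercase letter on its own, so the character is left untouched.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lowercases a string one byte at a time.
// This only works for single-byte characters such as ASCII.
std::string bytewise_lower(std::string s)
{
    for (char& c : s) {
        // cast to unsigned char first: passing a negative char is undefined
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}
```

    Called on "ABC" this returns "abc", while the two UTF-8 bytes of Ü ("\xC3\x9C") come back unchanged in the "C" locale.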

    Once you have converted the wide string version to upper/lower case it can then be converted back to the system/user multibyte character set for printing to the console.
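
    Once the text is a std::wstring, the case conversion itself can also be done in one call through the locale's std::ctype<wchar_t> facet instead of a character-by-character loop. The sketch below is one possible way to write it; the function name lower_wide is made up for illustration, and which non-ASCII characters get converted depends on the locale passed in.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercases a wide string using the ctype facet
// of the given locale (for example std::locale("") for the user's locale).
std::wstring lower_wide(std::wstring ws, std::locale const& loc)
{
    if (ws.empty())
        return ws;

    auto const& facet = std::use_facet<std::ctype<wchar_t>>(loc);

    // the range overload converts [first, last) in place
    facet.tolower(&ws[0], &ws[0] + ws.size());

    return ws;
}
```

    With the classic locale, lower_wide(L"Hello World", std::locale::classic()) yields L"hello world"; passing the user's locale is what makes characters such as Ü eligible for conversion.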

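    #include <cstdlib>    // EXIT_SUCCESS, EXIT_FAILURE
    #include <cwchar>     // std::mbsrtowcs, std::wcsrtombs, std::mbstate_t
    #include <iostream>
    #include <locale>
    #include <stdexcept>
    #include <string>
    
    // Convert from multi-byte codes to wide string codes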
    std::wstring mb_to_ws(std::string const& mb)
    {
        std::wstring ws;
        std::mbstate_t ps{};
        char const* src = mb.data();
    
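        // first pass: dst is null, so this only counts the wide characters
        std::size_t len = 1 + std::mbsrtowcs(nullptr, &src, 0, &ps);
    
        ws.resize(len);
        src = mb.data();
    
        // second pass: do the actual conversion (src becomes null on success)
        std::mbsrtowcs(&ws[0], &src, ws.size(), &ps);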
    
        if(src)
            throw std::runtime_error("invalid multibyte character after: '"
                + std::string(mb.data(), src) + "'");
    
        ws.pop_back();
    
        return ws;
    }
    
    // Convert from wide string codes to multi-byte codes
    std::string ws_to_mb(std::wstring const& ws)
    {
        std::string mb;
        std::mbstate_t ps{};
        wchar_t const* src = ws.data();
    
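        // first pass: dst is null, so this only counts the bytes needed
        std::size_t len = 1 + std::wcsrtombs(nullptr, &src, 0, &ps);
    
        mb.resize(len);
        src = ws.data();
    
        // second pass: do the actual conversion (src becomes null on success)
        std::wcsrtombs(&mb[0], &src, mb.size(), &ps);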
    
        if(src)
            throw std::runtime_error("invalid wide character");
    
        mb.pop_back();
    
        return mb;
    }
    
    int main()
    {
        // set locale to the one chosen by the user
        // (or the one set by the system default)
        std::locale::global(std::locale(""));
    
        try
        {
            std::string NotLowerCase = "Grüßen";
    
            std::cout << NotLowerCase << '\n';
    
            // convert system/user multibyte character codes
            // to wide string versions
            std::wstring ws1 = mb_to_ws(NotLowerCase);
            std::wstring ws2;
    
            // use the system/user locale (constructed once, not per character)
            std::locale const loc("");
    
            for(std::size_t i = 0; i < ws1.length(); i++) {
                ws2 += std::tolower(ws1[i], loc);
            }
    
            // convert wide string character codes back
            // to system/user multibyte versions
            std::string LowerCase = ws_to_mb(ws2);
    
            std::cout << LowerCase << '\n';
        }
        catch(std::exception const& e)
        {
            std::cerr << e.what() << '\n';
            return EXIT_FAILURE;
        }
        catch(...)
        {
            std::cerr << "Unknown exception." << '\n';
            return EXIT_FAILURE;
        }
    
        return EXIT_SUCCESS;
    }
    

    Note: this code has not been heavily tested.
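
    One caveat about std::locale(""): its constructor throws std::runtime_error when the locale named by the environment is not available on the system. In main() above, the std::locale::global(std::locale("")) call sits outside the try block, so that exception would go uncaught. A possible guard, sketched here with a made-up helper name locale_or_classic, is to fall back to the classic "C" locale:

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns the named locale, or the classic "C"
// locale if the name is not supported on this system.
std::locale locale_or_classic(char const* name)
{
    try {
        return std::locale(name);
    }
    catch (std::runtime_error const&) {
        return std::locale::classic();
    }
}
```

    It would be used as std::locale::global(locale_or_classic(""));. Under the fallback the program keeps running, but non-ASCII characters will most likely not be lowercased, so logging a warning in the catch branch may be worthwhile.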
