Do C++11 regular expressions work with UTF-8 strings?

一个人的身影 2020-12-01 07:57

If I want to use C++11's regular expressions with Unicode strings, will they work with char* as UTF-8, or do I have to convert them to a wchar_t* string?

4 answers
  •  粉色の甜心
    2020-12-01 08:31

    I have a use case where I need to handle potentially Unicode strings when looking for Cartesian coordinates. As advised, this sample uses std::wregex and std::wstring, and I test it against potentially Unicode input in a parsing module.

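    #include <regex>
    #include <string>

    // Returns true if the whole token is an optionally negative integer
    // (one coordinate component), using the wide-character regex types.
    static bool isCoordinate(const std::wstring& token)
    {
        static const std::wregex re(L"^(-?[[:digit:]]+)$");
        std::wsmatch match;
        return std::regex_search(token, match, re);
    }

    // Standard main() for portability; wmain(int, wchar_t*[]) is an
    // MSVC-specific alternative entry point.
    int main()
    {
        // Test with input that is neither a number nor a coordinate:
        // Arabic and Chinese text.
        bool coord = ::isCoordinate(L"أَبْجَدِيَّة عَرَبِيَّة‎中文");

        // Exit code 1 if the token is a coordinate, 0 otherwise.
        return coord ? 1 : 0;
    }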
    
