If I want to use C++11's regular expressions with Unicode strings, will they work with char* as UTF-8, or do I have to convert them to a wchar_t* string?
I have a use case where I need to handle potentially Unicode strings when looking for Cartesian coordinates. This sample shows how I currently handle it in a parsing module, using std::wregex and std::wstring as advised:
    #include <regex>
    #include <string>

    // Returns true if the token is an optionally negative run of digits.
    static bool isCoordinate(const std::wstring& token)
    {
        std::wregex re(L"^(-?[[:digit:]]+)$");
        std::wsmatch match;
        return std::regex_search(token, match, re);
    }

    // wmain is the Microsoft-specific wide-character entry point.
    int wmain(int argc, wchar_t* argv[])
    {
        // Test against Unicode text that is not a number.
        bool coord = ::isCoordinate(L"أَبْجَدِيَّة عَرَبِيَّة中文");
        if (!coord)
            return 0;
        return 1;
    }
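For comparison, this is roughly the char*/UTF-8 variant I am asking about. It is only a minimal sketch and assumes the input is already UTF-8 encoded; the function name isCoordinateUtf8 is my own and not from any library. As far as I understand, std::regex would compare the string byte by byte here, so I restricted the pattern to ASCII characters ([0-9] instead of [[:digit:]]), since every byte of a multi-byte UTF-8 sequence lies outside the ASCII range.

```cpp
#include <regex>
#include <string>

// Hypothetical narrow-string counterpart of isCoordinate.
// Assumption: `token` holds UTF-8 bytes. The regex engine sees individual
// bytes, not code points, so the pattern uses ASCII characters only.
static bool isCoordinateUtf8(const std::string& token)
{
    // Optional minus sign followed by one or more ASCII digits.
    static const std::regex re("^(-?[0-9]+)$");
    // regex_match requires the whole token to match the pattern.
    return std::regex_match(token, re);
}
```

With this version, isCoordinateUtf8("-42") should be accepted, while a token containing any non-ASCII text should be rejected. What I do not know is whether this approach holds up once the pattern itself has to contain non-ASCII characters.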