Unicode regular expressions in C++

回眸只為那壹抹淺笑 submitted on 2020-06-28 09:31:06


I want to match the word "février", or any other month name, using a regular expression.

Regular expression:



The problem is that I cannot match words that contain non-ASCII letters such as à, é and è. I found on the following website (Unicode) that the Unicode value of é is \u00E9. Can I use this value in the regular expression? More generally, how can I use Unicode values in regular expressions?

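#include <boost/regex.hpp>
#include <iostream>
#include <string>
void returnValue(const std::string& pattern)
{
    // Search the input text for the month name.
    const boost::regex e("février");
    if (boost::regex_search(pattern, e)) { std::cout << "found" << std::endl; }
}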


You can match Unicode text with boost::regex. There are two ways to do it:

  1. Rely on wchar_t, if your platform's wchar_t can hold Unicode characters and your platform's C/C++ runtime handles wide character constants correctly. This approach has a few pitfalls and is not recommended; read about them in the link I provided.

  2. Use a Unicode-aware regular expression type (boost::u32regex). Boost has to be built with ICU support to enable this; see "Building With Unicode and ICU Support" in the Boost.Regex documentation.
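As a rough illustration of the first option, and of how the \u00E9 value from the question can be used, here is a minimal sketch. It uses the standard library's std::wregex as a stand-in so that it compiles without Boost; that substitution is an assumption of this example, not something the answer above prescribes, although boost::wregex and boost::regex_search expose the same interface. The function name containsFevrier is hypothetical. The sketch assumes the input has already been converted to a wide string, and it inherits the wchar_t pitfalls mentioned above (for example, wchar_t is only 16 bits wide on Windows).

```cpp
#include <regex>
#include <string>

// Returns true if the wide text contains the month name "février".
// \u00E9 is the C++ universal character name for é, so the pattern
// does not depend on the encoding of the source file.
// (containsFevrier is a hypothetical helper used for illustration.)
bool containsFevrier(const std::wstring& text)
{
    static const std::wregex month(L"f\u00E9vrier");
    return std::regex_search(text, month);
}
```

With this sketch, containsFevrier(L"le 14 f\u00E9vrier 2020") returns true, while the unaccented L"le 14 fevrier 2020" does not match. For UTF-8 input held in a std::string, the second option (boost::u32regex with ICU) avoids the conversion step.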


