unicode regular expressions c++

帅比萌擦擦* submitted on 2020-06-28 09:36:41

Question


I want to match the word "février" or any other month by using regular expression.

Regular expression:

^(JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[Ff]évrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)$


Problem

The problem is that I cannot match words that contain accented Unicode letters such as à, é, è. I found on the following website (Unicode) that the Unicode value of é is \u00E9. Can I integrate this value into the regular expression, and how can I use Unicode values in regular expressions?


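#include <iostream>
#include <string>
#include <boost/regex.hpp>

// Prints "found" if the input text contains "février".
void returnValue(const std::string& pattern)
{
    // The accented literal is stored in the source file's encoding.
    const boost::regex e("février");
    if (boost::regex_search(pattern, e)) { std::cout << "found" << std::endl; }
}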

Answer 1:


You can match Unicode text with Boost.Regex. There are two ways to do it.

  1. Rely on wchar_t, if your platform's wchar_t can hold Unicode characters and your platform's C/C++ runtime correctly handles wide character constants. (This approach has a few pitfalls and is not recommended; see the link below.)

  2. Use a Unicode-aware regular expression type (boost::u32regex). Boost has to be configured to enable this; see "Building With Unicode and ICU Support" in the Boost.Regex documentation.

http://www.boost.org/doc/libs/1_42_0/libs/regex/doc/html/boost_regex/unicode.html
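As a rough illustration of the first option, here is a minimal sketch that uses the standard library's std::wregex in place of Boost so that it builds without Boost or ICU. This is a swapped-in stand-in, not the boost::u32regex approach from the second option. The helper name isFrenchMonth and the shortened month list are hypothetical, chosen only for the example. The accented letters are written as universal character names (\u00E9 for é, \u00FB for û), so the wide literals should not depend on how the source file is encoded. The pitfalls mentioned above still apply: the input has to be decoded into a wide string first, and on platforms with a 16-bit wchar_t, characters outside the Basic Multilingual Plane will not fit in one unit.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `word` is one of a few French month
// names, with or without accents where both spellings are listed.
// \u00E9 is é and \u00FB is û, written as universal character names.
bool isFrenchMonth(const std::wstring& word)
{
    // The whole word must match, so regex_match is used instead of regex_search.
    static const std::wregex months(
        L"^(FEVRIER|[Ff]\u00E9vrier|AOUT|[Aa]o\u00FBt|[Dd](e|\u00E9)cembre)$");
    return std::regex_match(word, months);
}
```

Calling isFrenchMonth(L"f\u00E9vrier") is expected to return true, while isFrenchMonth(L"fevrier") returns false because this shortened pattern only lists the unaccented spelling in upper case.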



Source: https://stackoverflow.com/questions/23932970/unicode-regular-expressions-c
