Common symbols '\p{S}' not being matched using boost wregex

Submitted by 三世轮回 on 2019-12-11 00:01:50

Question


I am using the code below to try to match symbols with a regex. As an example, I am trying to match the circled star symbol ✪ (http://graphemica.com/%E2%9C%AA).

#include <boost/regex.hpp>

//...
std::wstring text = L"a✪c";
auto re = L"(\\p{S}|\\p{L})+?";
boost::wregex r(re);
boost::regex_token_iterator<std::wstring::const_iterator>
  i(boost::make_regex_token_iterator(text, r, 1)), j;
while (i != j)
{
  std::wstring x = *i;
  ++i;
}
//...

The character values of text are {97, 10026, 99} (or {0x61, 0x272A, 0x63}), so it is a valid symbol.

The code matches the two letters, 'a' (0x61) and 'c' (0x63), but not the symbol (0x272A). I have tried a couple of other symbols, and none of them work either (© for example).

What am I missing here?


Answer 1:


The Boost.Regex documentation explicitly states that there's no support for Unicode-specific character classes when using boost::wregex.

If you want this functionality, you'll need to build Boost.Regex with ICU support enabled, then use the boost::u32regex type instead of boost::wregex.



Source: https://stackoverflow.com/questions/38525120/common-symbols-ps-not-been-matched-using-boost-wregex
