Why does regex_match throw “complexity exception”?

本秂侑毒 提交于 2019-12-19 19:45:35

问题


I am trying to test (using boost::regex) whether a line in a file contains only numeric entries seperated by spaces. I encountered an exception which I do not understand (see below). It would be great if someone could explain why it is thrown. Maybe I am doing something stupid here in my way of defining the patterns? Here is the code:

// regex_test.cpp
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;

int main(){
  // My basic pattern to test for a single numeric expression
  const string numeric_value_pattern = "(?:-|\\+)?[[:d:]]+\\.?[[:d:]]*";
  // pattern for the full line
  const string numeric_sequence_pattern = "([[:s:]]*"+numeric_value_pattern+"[[:s:]]*)+";

  regex r(numeric_sequence_pattern);
  string line= "1 2 3 4.444444444444";
  bool match = regex_match(line, r);
  cout<<match<<endl;

  //...
}

I compile that successfully with

g++ -std=c++11 -L/usr/lib64/ -lboost_regex regex_test.cpp  

The resulting program worked fine so far and match == true as I wanted. But then I test an input line like

string line= "1 2 3 4.44444444e-16"; 

Of course, my pattern isn't built to recognise the format 4.44444444e-16 and I would expect that match == false. However, instead I get the following runtime error:

terminate called after throwing an instance of  
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'  
  what():  The complexity of matching the regular expression exceeded predefined bounds.  
Try refactoring the regular expression to make each choice made by the state machine unambiguous.  
This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.  

Why is that?
Note: the example I gave is extremal in the sense that putting one digit less after the dot works ok. That means

string line= "1 2 3 4.4444444e-16";

just results in match == false as expected. So, I'm baffled. What is happening here?

Thanks already!


Update:
Problem seems to be solved. Given the hint of alejrb I refactored the pattern to

const string numeric_value_pattern = "(?:-|\\+)?[[:d:]]+(?:\\.[[:d:]]*)?";  

That seems to work as it should. Somehow, the isolated optional \\. inside the original pattern [[:d:]]+\\.?[[:d:]]* left to many possibilities to match a long sequence of digits in different ways.
I hope the pattern is safe now. However, if someone finds a way to use it for a blow up in the new form, let me know! It's not so obvious for me whether that might still be possible...


回答1:


I'd say that your regex is probably exponentially backtracking. To protect you from a loop that would become entirely unworkable if the input were any longer, the regex engine just aborts the attempt.

One of the patterns that often causes this problem is anything of the form (x+x+)+ - which you build up here when you place the first pattern inside the second.

There's a good discussion at http://www.regular-expressions.info/catastrophic.html



来源:https://stackoverflow.com/questions/21751874/why-does-regex-match-throw-complexity-exception

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!