How to set a time limit on a Java function running a regex

难免孤独 2020-12-17 02:41

I am running a regex in a Java function to parse a document and return true if it has found the string specified by the regex and return false if it hasn't. But the problem

10 answers
  •  被撕碎了的回忆
    2020-12-17 03:03

    There are two ways to answer this question.

    On the one hand, there is no known practical and safe way to kill a thread that is executing Matcher.find(...) or Matcher.matches(...). Calling Thread.stop() would do it on JDKs that still support the method, but it is deprecated because of significant safety issues. The clean way to address this would be to develop your own regex engine that regularly checks the interrupted flag. (This is not totally impractical. For example, if the GPL isn't an issue for you, you could start with the existing regex engine in OpenJDK.)
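A lighter-weight workaround, in the same spirit as an engine that checks a flag regularly, is to wrap the input in a CharSequence whose charAt() checks a deadline. This is only a sketch: it assumes the JDK's matcher reads the input through charAt() while it backtracks, and the names TimedRegex, DeadlineCharSequence and RegexTimeoutException are hypothetical, not part of any library.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimedRegex {

    /** Hypothetical exception type, thrown when matching runs past its deadline. */
    static class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) { super(message); }
    }

    /** CharSequence view of the input that checks a deadline on every charAt() call. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override public char charAt(int index) {
            // Subtraction keeps the comparison correct even if nanoTime() wraps around.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex matching exceeded its time limit");
            }
            return inner.charAt(index);
        }

        @Override public int length() { return inner.length(); }

        @Override public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override public String toString() { return inner.toString(); }
    }

    /** Returns whether the regex is found in the input; throws RegexTimeoutException on timeout. */
    static boolean find(String regex, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Matcher matcher = Pattern.compile(regex).matcher(new DeadlineCharSequence(input, deadline));
        return matcher.find();
    }

    public static void main(String[] args) {
        System.out.println(find("a+b", "xxaaab", 1000));

        // A heavily backtracking case: many "A"s followed by "B".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.append('A');
        sb.append('B');
        try {
            System.out.println(find("^A*A*A*A*A*A*$", sb, 100));
        } catch (RegexTimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

Calling System.nanoTime() on every character read adds overhead; checking only every few thousand reads would be a reasonable refinement. The exception is thrown on the calling thread, so no second thread has to be killed.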

    On the other hand, the real root of your problem is (most likely) that you are using regexes the wrong way. Either you are trying to do something that is too complicated for a single regex, or your regex is suboptimal.

    EDIT: The typical cause of regexes taking too long is multiple quantifiers (?, *, +) causing pathological backtracking. For example, if you try to match a string of N "A" characters followed by a "B" with the regex "^A*A*A*A*A*A*$", the complexity of the computation is (at least) O(N**5). Here's a more "real world" example:

    "(.*)(.*)(.*)(.*)(.*)(.*)(.*)"
    

    Now imagine what happens if you encounter a "web page" like this:

    
    
    
    
    
    

    Notice that there is no closing tag. This will run for a long time before failing. (I'm not exactly sure what the complexity is, but you can estimate it experimentally if you feel like it.)
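One way to run such an experiment is to hand the matcher a CharSequence that counts charAt() calls and watch how the count grows with the input size. The sketch below does this for the simpler case of N "A" characters followed by a "B", matched against a regex with six starred A's, because that input is easy to generate. The class and method names are hypothetical, and the number of character reads is only a rough proxy for running time.

```java
import java.util.regex.Pattern;

class BacktrackingCost {

    /** CharSequence that counts how many times the regex engine reads a character. */
    static final class CountingCharSequence implements CharSequence {
        private final CharSequence inner;
        long reads = 0;

        CountingCharSequence(CharSequence inner) { this.inner = inner; }

        @Override public char charAt(int index) {
            reads++;
            return inner.charAt(index);
        }

        @Override public int length() { return inner.length(); }

        @Override public CharSequence subSequence(int start, int end) {
            return inner.subSequence(start, end);
        }

        @Override public String toString() { return inner.toString(); }
    }

    /** Runs the regex against n "A" characters followed by "B" and returns the read count. */
    static long countReads(String regex, int n) {
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < n; i++) input.append('A');
        input.append('B');
        CountingCharSequence counted = new CountingCharSequence(input);
        Pattern.compile(regex).matcher(counted).find();
        return counted.reads;
    }

    public static void main(String[] args) {
        // Doubling N and comparing the counts gives a feel for the growth rate.
        for (int n : new int[] {10, 20, 40}) {
            System.out.println("N=" + n + " reads=" + countReads("^A*A*A*A*A*A*$", n));
        }
    }
}
```

If doubling N multiplies the count by roughly 2^k, the cost is growing like N^k. Keep N small at first, since the counts climb quickly.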

    In this case, a simple answer is to use simpler regexes to locate the 6 marker elements and then extract the stuff between them using substring().
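A minimal sketch of that approach follows. The marker strings in main() are hypothetical stand-ins for the six marker elements, and MarkerSplitter is an illustrative name. Each marker is located with a trivial quoted regex, searching forward from the end of the previous marker, and the text in between is taken with substring(). Note that this picks the first occurrence of each marker, which is not the same as what a chain of greedy (.*) groups would select.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkerSplitter {

    /**
     * Returns the text before, between and after the markers, in order,
     * or Optional.empty() if any marker is missing after the previous one.
     */
    static Optional<List<String>> split(String doc, String... markers) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        for (String marker : markers) {
            // Pattern.quote() makes the marker a literal, so there is nothing to backtrack over.
            Matcher m = Pattern.compile(Pattern.quote(marker)).matcher(doc);
            if (!m.find(pos)) {
                return Optional.empty();
            }
            parts.add(doc.substring(pos, m.start()));
            pos = m.end();
        }
        parts.add(doc.substring(pos));
        return Optional.of(parts);
    }

    public static void main(String[] args) {
        // Hypothetical markers and document, for illustration only.
        String[] markers = {"<html>", "<head>", "</head>", "<body>", "</body>", "</html>"};
        String doc = "<html><head>title</head><body>text</body></html>";
        System.out.println(split(doc, markers));

        // A document with a missing closing marker fails quickly instead of backtracking.
        System.out.println(split("<html><html><html>", markers));
    }
}
```

Each marker search scans the document forward once from the current position, so the total work is roughly linear in the document length, and a missing marker is reported as soon as its search reaches the end.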
