How to terminate Matcher.find() when it's running too long?

青春壹個敷衍的年華 submitted on 2019-12-05 22:47:12

Question


I'm wondering about techniques for terminating long-running regular expression matches (Java's Matcher.find() method). Maybe subclassing Matcher and adding some logic to terminate after x number of iterations?

Basically, I'm generating regular expressions using a genetic algorithm, so I don't have a lot of control over them. Then I test each one against some text to see whether it matches a certain target area of the text.

Since I'm more or less randomly generating these regular expressions, I get some crazy ones that eat a ton of CPU, and some find() calls take a long time to return. I'd rather just kill them after a while, but I'm not sure of the best way to do that.

So if anyone has ideas, please let me know.
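On the subclassing idea: java.util.regex.Matcher is declared final, so it cannot be subclassed. A similar iteration budget can still be imposed from the outside, because the engine reads every input character through CharSequence.charAt(). A wrapper that counts those calls and throws past a limit therefore caps the work a single find() can do, without any extra threads. This is a minimal sketch assuming Java 11+ (for String.repeat); BudgetedCharSequence, RegexBudgetExceededException, BudgetDemo and findWithBudget are hypothetical names, and the charAt() count is only a rough proxy for backtracking steps, not an exact iteration count.

```java
import java.util.regex.Pattern;

/** Hypothetical exception signalling that a match used up its charAt() budget. */
class RegexBudgetExceededException extends RuntimeException {
    RegexBudgetExceededException() {
        super("regex charAt() budget exhausted");
    }
}

/**
 * Hypothetical CharSequence wrapper: throws once charAt() has been called more
 * than `budget` times. Not thread-safe; intended for one match at a time.
 */
class BudgetedCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long[] remaining; // one-element array shared with sub-sequences

    BudgetedCharSequence(CharSequence inner, long budget) {
        this(inner, new long[] {budget});
    }

    private BudgetedCharSequence(CharSequence inner, long[] remaining) {
        this.inner = inner;
        this.remaining = remaining;
    }

    @Override
    public char charAt(int index) {
        if (--remaining[0] < 0) {
            throw new RegexBudgetExceededException();
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Sub-sequences draw from the same budget.
        return new BudgetedCharSequence(inner.subSequence(start, end), remaining);
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

class BudgetDemo {
    /** Returns the result of find(), or null if the budget ran out first. */
    static Boolean findWithBudget(Pattern pattern, CharSequence text, long budget) {
        try {
            return pattern.matcher(new BudgetedCharSequence(text, budget)).find();
        } catch (RegexBudgetExceededException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(findWithBudget(Pattern.compile("b+"), "aaabbb", 1_000)); // true
        // a?{25}a{25} against a{25} backtracks exponentially in java.util.regex.
        int n = 25;
        Pattern slow = Pattern.compile("a?".repeat(n) + "a".repeat(n));
        System.out.println(findWithBudget(slow, "a".repeat(n), 100_000)); // null (budget exhausted)
    }
}
```

A null result tells the genetic algorithm to discard that candidate; the budget is a tuning knob, and a charAt() count is deterministic, unlike a wall-clock timeout.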


Answer 1:


There is a solution to a related Stack Overflow question that would solve your problem (that question describes the same problem as yours).

Essentially, it's a CharSequence that notices thread interrupts.

The code from that answer:

/**
 * CharSequence that notices thread interrupts -- as might be necessary
 * to recover from a loose regex on unexpected challenging input.
 * 
 * @author gojomo
 */
public class InterruptibleCharSequence implements CharSequence {
    CharSequence inner;
    // public long counter = 0; 

    public InterruptibleCharSequence(CharSequence inner) {
        super();
        this.inner = inner;
    }

    public char charAt(int index) {
        if (Thread.interrupted()) { // clears flag if set
            throw new RuntimeException(new InterruptedException());
        }
        // counter++;
        return inner.charAt(index);
    }

    public int length() {
        return inner.length();
    }

    public CharSequence subSequence(int start, int end) {
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

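Wrap your string in this class before passing it to Pattern.matcher(), and you can then interrupt the matching thread: the next charAt() call throws a RuntimeException (wrapping an InterruptedException), which aborts find().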
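To see the wrapper in action, the sketch below runs a deliberately slow match on a worker thread, waits one second, and then interrupts it. It repeats the class from the answer so the file compiles on its own; InterruptDemo and interruptibleFind are hypothetical names, and Java 11+ is assumed for String.repeat. The pattern a?{n}a{n} matched against a{n} is the pathological case from Russ Cox's article cited in the third answer; for a backtracking engine its cost roughly doubles with each extra a?.

```java
import java.util.regex.Pattern;

/** Copy of InterruptibleCharSequence from the answer, so this file stands alone. */
class InterruptibleCharSequence implements CharSequence {
    private final CharSequence inner;

    InterruptibleCharSequence(CharSequence inner) {
        this.inner = inner;
    }

    @Override
    public char charAt(int index) {
        if (Thread.interrupted()) { // clears the flag if set
            throw new RuntimeException(new InterruptedException());
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

class InterruptDemo {
    /** Runs find() on the calling thread; returns null if the thread was interrupted mid-match. */
    static Boolean interruptibleFind(Pattern pattern, CharSequence text) {
        try {
            return pattern.matcher(new InterruptibleCharSequence(text)).find();
        } catch (RuntimeException e) {
            if (e.getCause() instanceof InterruptedException) {
                return null;
            }
            throw e;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 40; // roughly 2^40 backtracking steps: effectively never finishes
        Pattern slow = Pattern.compile("a?".repeat(n) + "a".repeat(n));
        String input = "a".repeat(n);

        Thread worker = new Thread(() -> {
            Boolean result = interruptibleFind(slow, input);
            System.out.println(result == null ? "match aborted" : "match finished: " + result);
        });
        worker.start();
        worker.join(1_000);     // give the match one second
        if (worker.isAlive()) {
            worker.interrupt(); // the next charAt() call throws and find() unwinds
            worker.join();
        }
    }
}
```

Because Thread.interrupted() clears the flag before the exception is thrown, the worker thread is left in a clean state and could be reused for the next match.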




Answer 2:


A worst-case option, and one that may have people yelling at me:

You can run the regex matching in another thread and, if it's running too long, call Thread.stop() on it.




Answer 3:


Just to show another solution:

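You can use a matcher based on NFA simulation, which is not sensitive to the input the way a backtracking engine is: for a fixed pattern its running time grows linearly with the input length, so on pathological patterns it can be hundreds of times faster (or more) than the backtracking engine in the Java standard library.

I think this sensitivity to input is the root cause of your problem.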

You can check out the introduction here: Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

I also answered a similar question with more detail here: Cancelling a long running regex match?
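The core of the NFA approach fits in a few dozen lines: instead of backtracking, the matcher tracks the set of all NFA states that are active after each character, so every input character is processed exactly once and the cost is O(text length × pattern length). To keep it short, this sketch (the class name TinyNfa is hypothetical) supports only a toy subset: literal characters, '.', and postfix '*', '+', '?' applied to a single character, with no groups, alternation, escapes, or input validation. It illustrates the technique rather than replacing java.util.regex; a full library built on the same idea, such as RE2/J, would be the practical choice. Java 11+ is assumed for String.repeat.

```java
/**
 * Thompson-style NFA simulation for a toy regex subset: literals, '.',
 * and postfix '*', '+', '?' on a single preceding character.
 * State i means "about to match atom i"; state m (the atom count) accepts.
 */
final class TinyNfa {
    private final char[] atom;  // '.' matches any character
    private final char[] quant; // '1' = exactly once, '*' = zero or more, '?' = zero or one
    private final int m;

    TinyNfa(String regex) {
        StringBuilder atoms = new StringBuilder();
        StringBuilder quants = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            char op = i + 1 < regex.length() ? regex.charAt(i + 1) : '\0';
            if (op == '*' || op == '?') {
                atoms.append(c);
                quants.append(op);
                i++; // consume the operator
            } else if (op == '+') { // rewrite x+ as x x*
                atoms.append(c).append(c);
                quants.append('1').append('*');
                i++;
            } else {
                atoms.append(c);
                quants.append('1');
            }
        }
        atom = atoms.toString().toCharArray();
        quant = quants.toString().toCharArray();
        m = atom.length;
    }

    /** Epsilon closure: '*' and '?' atoms may be skipped. Edges only go forward, so one pass suffices. */
    private void closure(boolean[] states) {
        for (int i = 0; i < m; i++) {
            if (states[i] && quant[i] != '1') {
                states[i + 1] = true;
            }
        }
    }

    /** Advances every active state over one input character at once. */
    private boolean[] step(boolean[] current, char ch) {
        boolean[] next = new boolean[m + 1];
        for (int i = 0; i < m; i++) {
            if (current[i] && (atom[i] == '.' || atom[i] == ch)) {
                next[quant[i] == '*' ? i : i + 1] = true; // a '*' atom may repeat, so it stays put
            }
        }
        closure(next);
        return next;
    }

    /** Whole-input match, like Matcher.matches(). */
    boolean matches(CharSequence text) {
        boolean[] states = new boolean[m + 1];
        states[0] = true;
        closure(states);
        for (int k = 0; k < text.length(); k++) {
            states = step(states, text.charAt(k));
        }
        return states[m];
    }

    /** Unanchored search, like Matcher.find(): a fresh attempt joins the state set at every position. */
    boolean find(CharSequence text) {
        boolean[] states = new boolean[m + 1];
        for (int k = 0; ; k++) {
            states[0] = true;
            closure(states);
            if (states[m]) {
                return true;
            }
            if (k == text.length()) {
                return false;
            }
            states = step(states, text.charAt(k));
        }
    }

    public static void main(String[] args) {
        int n = 30;
        TinyNfa slowForBacktracking = new TinyNfa("a?".repeat(n) + "a".repeat(n));
        System.out.println(slowForBacktracking.matches("a".repeat(n))); // true, after only n steps
        System.out.println(new TinyNfa("b+").find("aaabbb"));          // true
        System.out.println(new TinyNfa("ab+c").matches("ac"));         // false
    }
}
```

The pattern a?{30}a{30} that would take java.util.regex on the order of 2^30 steps is answered here in 30 state-set updates, which is the input insensitivity this answer refers to.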




Answer 4:


One possible solution, which has the nice property of not blocking the main thread, is to spawn off the matching in a separate thread. You can create a customized Callable and wrap it so that you get null once the duration/threshold has expired, or the match result if the match succeeds.
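A minimal sketch of this idea, assuming Java 11+ and using the hypothetical names TimedFind and findWithin: the match runs as a Callable on a thread pool, and Future.get(timeout, unit) gives up once the threshold expires. Following the answer, null means either "no match" or "timed out". Note the catch: future.cancel(true) only interrupts the worker, and plain java.util.regex never checks for interrupts, so an abandoned match keeps running (and burning CPU) in the background. The worker threads here are daemons so that at least they cannot keep the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimedFind {
    // Daemon workers, so a runaway match cannot prevent JVM shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "regex-worker");
        t.setDaemon(true);
        return t;
    });

    /** Returns the first match, or null if there is none or the timeout expires first. */
    static MatchResult findWithin(Pattern pattern, CharSequence text, long timeout, TimeUnit unit) {
        Callable<MatchResult> task = () -> {
            Matcher m = pattern.matcher(text);
            return m.find() ? m.toMatchResult() : null;
        };
        Future<MatchResult> future = POOL.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // requests interruption, but java.util.regex ignores it
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        MatchResult hit = findWithin(Pattern.compile("b+"), "aaabbb", 2, TimeUnit.SECONDS);
        System.out.println(hit == null ? "no result" : hit.group() + " at " + hit.start()); // bbb at 3

        int n = 40; // a?{n}a{n} against a{n}: exponential for a backtracking engine
        MatchResult slow = findWithin(Pattern.compile("a?".repeat(n) + "a".repeat(n)),
                "a".repeat(n), 200, TimeUnit.MILLISECONDS);
        System.out.println(slow == null ? "gave up" : "matched"); // gave up
    }
}
```

The caller regains control on time, but reclaiming the worker thread needs an interrupt-aware input such as the InterruptibleCharSequence from the first answer.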




Answer 5:


You need to use another thread and stop it when it runs out of time.

There are two ways of stopping: Thread#stop() and Thread#interrupt().

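Using Thread.stop() is dangerous (it is deprecated because it can leave shared objects in an inconsistent state, and since JDK 20 it throws UnsupportedOperationException), and Matcher does not respond to Thread.interrupt() (responding to interrupts is opt-in behavior).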

BUT there is a really clever solution: the InterruptibleCharSequence shown in the first answer. Wrap your string in it (it behaves almost exactly like the original string, BUT it adds support for Thread#interrupt()), then build your own Callable returning whatever the matcher returns. Each task can now be executed using a FutureTask / thread pool combination, and you can get the result with any timeout you like:

Boolean result = myMatchingTask().get(2, TimeUnit.SECONDS);
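Put together, the recipe might look like the sketch below (Java 11+ assumed for String.repeat). The names InterruptibleMatching, matchingTask and findWithTimeout are hypothetical; matchingTask plays the role of the answer's myMatchingTask(). Unlike a bare Future.get with a timeout, cancel(true) here actually stops the work: the interrupt makes the next InterruptibleCharSequence.charAt() throw, find() unwinds, and the pool thread is free for the next task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

/** Compact copy of the InterruptibleCharSequence from the first answer. */
class InterruptibleCharSequence implements CharSequence {
    private final CharSequence inner;

    InterruptibleCharSequence(CharSequence inner) { this.inner = inner; }

    @Override public char charAt(int index) {
        if (Thread.interrupted()) { // clears the flag if set
            throw new RuntimeException(new InterruptedException());
        }
        return inner.charAt(index);
    }

    @Override public int length() { return inner.length(); }

    @Override public CharSequence subSequence(int start, int end) {
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override public String toString() { return inner.toString(); }
}

class InterruptibleMatching {
    /** Hypothetical stand-in for myMatchingTask(): find() on an interrupt-aware view of the text. */
    static Callable<Boolean> matchingTask(Pattern pattern, CharSequence text) {
        return () -> pattern.matcher(new InterruptibleCharSequence(text)).find();
    }

    /** Returns find()'s result, or null if the timeout expired (the task is then cancelled). */
    static Boolean findWithTimeout(ExecutorService pool, Pattern pattern, CharSequence text,
                                   long timeout, TimeUnit unit) {
        Future<Boolean> future = pool.submit(matchingTask(pattern, text));
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the next charAt() throws
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        int n = 40; // a?{n}a{n} against a{n}: exponential for a backtracking engine
        Pattern slow = Pattern.compile("a?".repeat(n) + "a".repeat(n));
        System.out.println(findWithTimeout(pool, slow, "a".repeat(n), 2, TimeUnit.SECONDS)); // null
        // The single worker thread was freed by the interrupt, so this task runs right away.
        System.out.println(findWithTimeout(pool, Pattern.compile("b+"), "aaabbb", 2, TimeUnit.SECONDS)); // true
        pool.shutdown(); // the JVM can exit because the worker is no longer stuck
    }
}
```

Using a single-thread executor makes the effect easy to observe: if the cancelled match were still running, the second task would queue behind it and time out instead of returning true.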

If you are in a Java EE environment, you can skip the complicated part: just use InterruptibleCharSequence and @Asynchronous calls.

If this sounds cryptic, ask for details.




Answer 6:


If I were you, I would put my own class between the application and the regex library you're using, give it the methods needed to kill the matching thread (such as "interrupt"), and manage all matching through it.



Source: https://stackoverflow.com/questions/7125732/how-to-terminate-matcher-find-when-its-running-too-long
