Stop runaway regular expression

甜味超标 2021-02-20 05:47

Is there a way to stop a runaway regular expression?

I am not interested in suggestions on how to modify it. I know it can be modified so it doesn't break, etc., but I a

1 Answer
  • 2021-02-20 06:40

    Perl's built-in alarm is insufficient for breaking out of a long-running regular expression. Perl defers signal handlers until the current internal opcode finishes ("safe signals"), and a regex match runs inside a single opcode, so the alarm handler cannot run until the match is already over.

    In some cases the most obvious solution is to fork a subprocess and have the parent use alarm to time it out once it has run too long. This PerlMonks post demonstrates how to time out a forked process: Re: Timeout on script
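    As a rough illustration of the fork/alarm idea, here is a minimal sketch. The helper name run_with_timeout and its return convention (1 for a match, 0 for no match, undef for a timeout) are assumptions made for this example and do not come from the PerlMonks post. The alarm works here because the parent is waiting in waitpid, not inside the regex engine.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code in a child process, give up after $timeout
# seconds. Returns 1 if $code returned true, 0 if it returned false (or
# died), and undef if the child had to be killed.
sub run_with_timeout {
    my ($timeout, $code) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the risky work and report the result as the exit status.
        # _exit skips END blocks and buffer flushing inherited from the parent.
        my $ok = eval { $code->() };
        POSIX::_exit($ok ? 0 : 1);
    }

    # Parent: wait for the child, but let SIGALRM interrupt the wait.
    my $status;
    eval {
        local $SIG{ALRM} = sub { die "alarm\n" };
        alarm $timeout;
        $status = $? if waitpid($pid, 0) == $pid;
        alarm 0;
    };
    die $@ if $@ && $@ ne "alarm\n";

    return ($status >> 8) == 0 ? 1 : 0 if defined $status;

    # Timed out: the child is stuck inside the regex engine, so kill it
    # outright and reap it to avoid leaving a zombie.
    kill 'KILL', $pid;
    waitpid($pid, 0);
    return;
}

my $matched = run_with_timeout(2, sub {
    my $string = ('a' x 64) . 'b';
    return $string =~ m/(a*a*a*a*a*a*a*a*a*a*a*a*)*[^Bb]$/;
});

print defined $matched
    ? "Finished: " . ($matched ? "match" : "no match") . "\n"
    : "Timed out; child killed.\n";
```

    The exit status only carries a yes/no answer. If you need captures or other data back from the child, you would have to pass them through a pipe or a temporary file instead.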

    There is a Perl module on CPAN called Sys::SigAction that has a function called timeout_call, which can interrupt a long-running regular expression by using unsafe signals. However, the RE engine wasn't designed to be interrupted, so it can be left in an unstable state, which may lead to segfaults about 10% of the time.

    Here is some example code that demonstrates Sys::SigAction successfully breaking out of the regex engine, as well as demonstrating that Perl's alarm is incapable of doing so:

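    #!/usr/bin/perl
    use strict;
    use warnings;   # the "recursion limit" warning in the output below needs this
    use Sys::SigAction 'timeout_call';
    use Time::HiRes;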
    
    
    sub run_re {
      my $string = ('a' x 64 ) . 'b';
    
      if( $string =~ m/(a*a*a*a*a*a*a*a*a*a*a*a*)*[^Bb]$/ ) {
        print "Whoops!\n";
      }
      else {
        print "Ok!\n";
      }
    }
    
    print "Sys::SigAction::timeout_call:\n";
    my $t = time();
    timeout_call(2,\&run_re);
    print time() - $t, " seconds.\n";
    
    print "alarm:\n";
    $t = time();
    
    eval {
      local $SIG{ALRM} = sub { die "alarm\n" };
      alarm 2;
      run_re();
      alarm 0;
    };
    
    if( $@ ) {
      die unless $@ eq "alarm\n";
    }
    else {
      print time() - $t, " seconds.\n";
    }
    

    The output will be something along the lines of:

    $ ./mytest.pl
    Sys::SigAction::timeout_call:
    Complex regular subexpression recursion limit (32766) exceeded at ./mytest.pl line 11.
    2 seconds.
    alarm:
    Complex regular subexpression recursion limit (32766) exceeded at ./mytest.pl line 11.
    ^C
    

    You will notice that in the second call, the one that is supposed to time out with alarm, I finally had to Ctrl-C out of it because alarm could not break out of the RE engine.

    The big caveat with Sys::SigAction is that even though it is capable of breaking out of a long-running regular expression, because the RE engine wasn't designed for such interruptions, the entire process can become unstable, leading to a segfault. While it doesn't happen every time, it can happen. This probably isn't what you want.

    I don't know what your regular expression looks like, but if it fits within the syntax allowed by the RE2 engine, you can use the Perl module re::engine::RE2 to work with the RE2 C++ library. This engine guarantees linear-time searches, though it provides less powerful semantics than Perl's built-in engine (for example, RE2 does not support backreferences or lookaround assertions). The RE2 approach avoids the whole issue in the first place by providing a linear-time assurance.
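    A minimal sketch of the RE2 route, assuming re::engine::RE2 is installed from CPAN. The pragma is lexically scoped, so only patterns compiled inside its scope use RE2. The -strict option is, as I understand the module's documentation, what makes compilation fail rather than quietly fall back to Perl's engine when RE2 cannot handle a pattern; treat that as an assumption to check against your installed version.

```perl
use strict;
use warnings;

# Assumption: re::engine::RE2 is available. With -strict => 1, a pattern
# that RE2 cannot compile should be an error instead of a silent fallback
# to Perl's backtracking engine.
use re::engine::RE2 -strict => 1;

# Same pattern and input as the test script above, but matched by RE2.
sub run_re {
    my $string = ('a' x 64) . 'b';
    return $string =~ m/(a*a*a*a*a*a*a*a*a*a*a*a*)*[^Bb]$/ ? 1 : 0;
}

print run_re() ? "Whoops!\n" : "Ok!\n";
```

    No timeout machinery is needed here, because the match cannot blow up on this input in the first place.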

    However, if you're unable to use RE2 (possibly because your regex's semantics are too demanding for it), the fork/alarm method is probably the safest way to ensure you remain in control.

    (By the way, this question, and a version of my answer were crossposted to PerlMonks.)
