I have a growing list of regular expressions that I am using to parse through log files searching for "interesting" error and debug statements. I'm currently breaking th
You might want to take a look at Regexp::Assemble. It's intended to handle exactly this sort of problem.
Code borrowed from the module's synopsis:
use Regexp::Assemble;
my $ra = Regexp::Assemble->new;
$ra->add( 'ab+c' );
$ra->add( 'ab+-' );
$ra->add( 'a\w\d+' );
$ra->add( 'a\d+' );
print $ra->re; # prints a(?:\w?\d+|b+[-c])
You can even slurp your regex collection out of a separate file.
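As a rough sketch of the separate-file idea using only core Perl, the snippet below reads one pattern per line and joins them into a single alternation. This is a naive substitute, not Regexp::Assemble: the module factors out common prefixes to build a much tighter pattern, and has its own file-loading method (see add_file in its documentation). The comment/blank-line convention in the pattern file is an assumption for illustration, and the example writes a temporary file so it runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical pattern file: one regex per line; blank lines and lines
# starting with '#' are ignored (an assumed convention, not a standard).
my ($fh, $pattern_file) = tempfile(UNLINK => 1);
print {$fh} "# interesting log lines\n",
            "Failed in routing out\n",
            "Agent .+ failed\n",
            "\n",
            "Record Not Exist in DB\n";
close $fh or die "close: $!";

# Read the patterns back, dropping newlines, comments and blank lines.
open my $in, '<', $pattern_file or die "Cannot open $pattern_file: $!";
chomp(my @lines = <$in>);
close $in;
my @patterns = grep { /\S/ && !/^\s*#/ } @lines;

# Naive combination: wrap each pattern in (?:...) so alternation
# precedence cannot leak between patterns. Regexp::Assemble would
# instead merge shared prefixes into a trie-like pattern.
my $combined = join '|', map { "(?:$_)" } @patterns;
my $re = qr/$combined/;

for my $line ('Agent FOO failed miserably', 'all quiet here') {
    printf "%-30s %s\n", $line, ($line =~ $re ? 'MATCH' : 'no match');
}
```

Swapping the join for Regexp::Assemble's add calls keeps the same file format while letting the module do the pattern optimization.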
Your example regular expressions look like they are based mainly on ordinary words and phrases. If that's the case, you might be able to speed things up considerably by pre-filtering the input lines using Perl's index function, which is typically much cheaper than a regular expression match. Under such a strategy, every regular expression would have a corresponding non-regex word or phrase for use in the pre-filtering stage. Better still would be to skip the regular expression test entirely wherever possible: two of your example tests do not require regular expressions at all and could be done purely with index.
Here is an illustration of the basic idea:
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Each check pairs a literal filter string with the regex it guards.
my @checks = (
    ['Failed',    qr/Failed in routing out/ ],
    ['failed',    qr/Agent .+ failed/       ],
    ['Not Exist', qr/Record Not Exist in DB/ ],
);
my @filter_strings = map { $_->[0] } @checks;
my @regexes        = map { $_->[1] } @checks;

# True if any regex matches the line.
sub regex {
    my $line = shift;
    for my $reg (@regexes) {
        return 1 if $line =~ $reg;
    }
    return;
}

# True if any filter string occurs in the line (plain substring search).
sub pre {
    my $line = shift;
    for my $fs (@filter_strings) {
        return 1 if index($line, $fs) > -1;
    }
    return;
}

# Six uninteresting lines followed by three that should match.
my @data = (
    qw(foo bar baz biz buz fubb),
    'Failed in routing out.....',
    'Agent FOO failed miserably',
    'McFly!!! Record Not Exist in DB',
);

# Run each variant for at least 1 CPU second and compare rates.
cmpthese( -1, {
    regex     => sub { for (@data) { return $_ if (regex($_)) } },
    prefilter => sub { for (@data) { return $_ if (pre($_) and regex($_)) } },
} );
Output (results with your data might be very different):
              Rate     regex prefilter
regex     36815/s        --      -54%
prefilter 79331/s      115%        --
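The suggestion to skip the regex entirely where possible can be pushed a bit further than the benchmark code does. There, any filter hit causes every regex to run. Instead, each regex can be paired with its own literal so that only that regex runs, and checks that are plain strings can drop the regex altogether. A minimal sketch under those assumptions (the optional-second-element layout of @checks and the interesting helper are hypothetical, not part of the code above):

```perl
use strict;
use warnings;

# Each check: [ literal, optional regex ]. A check with no regex is
# decided by index() alone; otherwise its regex runs only after its
# own literal has been found in the line.
my @checks = (
    [ 'Failed in routing out' ],                 # plain string, no regex
    [ 'failed', qr/Agent .+ failed/ ],
    [ 'Record Not Exist in DB' ],                # plain string, no regex
);

# Hypothetical helper: true if the line satisfies any check.
sub interesting {
    my ($line) = @_;
    for my $check (@checks) {
        my ($literal, $re) = @$check;
        next if index($line, $literal) < 0;      # cheap rejection
        return 1 unless defined $re;             # literal alone suffices
        return 1 if $line =~ $re;                # confirm with the regex
    }
    return 0;
}

for my $line ('foo', 'Failed in routing out.....',
              'Agent FOO failed miserably', 'something failed quietly',
              'McFly!!! Record Not Exist in DB') {
    print interesting($line) ? "hit:  $line\n" : "miss: $line\n";
}
```

This only gives the same answers as running the regexes alone if each literal really is a substring that every match of its regex must contain (here, every match of /Agent .+ failed/ contains "failed"); with that in place, most ordinary lines are rejected by index without touching the regex engine.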