Lengthy Perl regex

Question

This may seem like a somewhat odd question, but to the point:

I have a string that I need to search for a great many possible character occurrences in several combinations (so character classes are out of the question). What would be the most efficient way to do this?

I was thinking of either stacking it all into one regex:

if ($txt =~ /^(?:really |really |long | regex here)$/){}

or using several 'smaller' comparisons, but I'd assume this won't be very efficient:

if ($txt =~ /^regex1$/ || $txt =~ /^regex2$/ || $txt =~ /^regex3$/) {}

or perhaps nest several if comparisons.

I would appreciate any extra suggestions and other input on this issue. Thanks.

Answer 1:

Ever since way back in v5.9.2, Perl compiles a set of N alternatives like:

/string1|string2|string3|string4|string5|.../

into a trie data structure, and if that is the first thing in the pattern, even uses Aho–Corasick matching to find the start point very quickly.

That means a match against N alternatives now runs in O(1) time with respect to N, instead of the O(N) time that this:

if (/string1/ || /string2/ || /string3/ || /string4/ || /string5/ || ...)

will run in.

So you can have O(1) or O(N) performance: your choice.
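The difference can be measured rather than taken on faith. The following is a minimal sketch using the core Benchmark module; the word list, the subject string, and the sub names are hypothetical, and the numbers will vary with the Perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 200 literal alternatives and one subject string.
my @words = map { "string$_" } 1 .. 200;
my $txt   = 'some text that ends with string200';

# One combined alternation, compiled once.
my $joined = join '|', map { quotemeta($_) } @words;
my $one_re = qr/(?:$joined)/;

# The same alternatives as separate precompiled patterns, tried in turn.
my @many_re = map { qr/\Q$_\E/ } @words;

sub match_combined { return $txt =~ $one_re ? 1 : 0 }

sub match_separate {
    for my $re (@many_re) {
        return 1 if $txt =~ $re;
    }
    return 0;
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    combined => \&match_combined,
    separate => \&match_separate,
});
```

Both subs answer the same question (does any alternative occur in $txt?), so the table printed by cmpthese compares like with like.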

If you add use re "debug" to your program, or run it with -Mre=debug, Perl will show these trie structures in your patterns.
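As a concrete sketch of the single-alternation form, assuming the candidates are literal strings held in a hypothetical @words list (the list and the matches_any helper are illustrative, not from the question):

```perl
use strict;
use warnings;

# Hypothetical list of literal candidates; not from the original question.
my @words = ('foo', 'bar', 'baz.qux', 'foobar');

# Escape regex metacharacters in each literal, then join with "|".
my $alternation = join '|', map { quotemeta($_) } @words;

# Compile once; the anchors mirror the ^...$ form used in the question.
my $re = qr/^(?:$alternation)$/;

# Return 1 if the whole string is one of the candidates, 0 otherwise.
sub matches_any {
    my ($txt) = @_;
    return $txt =~ $re ? 1 : 0;
}

print matches_any('baz.qux') ? "match\n" : "no match\n";
print matches_any('bazXqux') ? "match\n" : "no match\n";
```

Saving this as a script and running it with perl -Mre=debug should dump the compiled pattern to STDERR, which is where the trie nodes can be inspected on a Perl that has the optimization.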

Answer 2:

This is no substitute for timing tests. That said, if possible I would suggest using the /o flag so that Perl doesn't recompile your (large) regex on every evaluation. The flag only has an effect when the pattern interpolates variables, and of course it only works if those combinations of characters do not change between evaluations.
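On current Perls, a common alternative to /o is to compile the pattern once with qr// and reuse the resulting object. A minimal sketch, assuming a hypothetical fixed list @alternatives and some hypothetical input in @lines:

```perl
use strict;
use warnings;

# Hypothetical alternatives, fixed for the lifetime of the program.
my @alternatives = ('alpha', 'beta', 'gamma');
my $joined = join '|', map { quotemeta($_) } @alternatives;

# qr// compiles the pattern once; the compiled object is reused below.
my $re = qr/^(?:$joined)$/;

# Hypothetical inputs to test against the precompiled pattern.
my @lines = ('alpha', 'delta', 'gamma');

# Keep only the inputs that match one of the alternatives exactly.
my @hits = grep { $_ =~ $re } @lines;
print "matched: @hits\n";
```

Unlike /o, the qr// object can be rebuilt explicitly if the list of alternatives ever does change.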

Answer 3:

I think it depends on how long your regex is. Sometimes it is better to divide very long expressions.
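One way to divide a long expression without giving up a single match is to build it from smaller named pieces. This is only a sketch: the sub-patterns $greeting and $name and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sub-patterns, each small enough to read and test on its own.
my $greeting = qr/(?:hello|hi|hey)/;
my $name     = qr/(?:world|perl|there)/;

# Compose the pieces; the x flag allows whitespace and comments in the pattern.
# \A and \z anchor at the very start and very end of the string.
my $phrase = qr/
    \A $greeting   # one of the greetings
    \s+            # separating whitespace
    $name \z       # one of the names
/x;

for my $txt ('hello world', 'hey there', 'goodbye world') {
    printf "%-14s %s\n", $txt, ($txt =~ $phrase ? 'match' : 'no match');
}
```

Each piece can be tested separately, while the final match is still one regex.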

Source: https://stackoverflow.com/questions/4909597/lengthy-perl-regex

Tags

regex

perl

optimization