Lengthy Perl regex

Submitted by 左心房为你撑大大i on 2020-01-11 11:19:13

Question


This may seem a somewhat odd question, but to get to the point:

I have a string that I need to search for many, many possible character sequences in several combinations (so character classes are out of the question). What would be the most efficient way to do this?

I was thinking of either stacking it all into one regex:

if ($txt =~ /^(?:really |really |long | regex here)$/){}

or using several 'smaller' comparisons, though I'd assume this won't be very efficient:

if ($txt =~ /^regex1$/ || $txt =~ /^regex2$/ || $txt =~ /^regex3$/) {}

or perhaps nesting several if comparisons.

I would appreciate any extra suggestions and other input on this issue. Thanks.


Answer 1:


Ever since way back in v5.9.2, Perl compiles a set of N alternatives like:

/string1|string2|string3|string4|string5|.../

into a trie data structure, and if that is the first thing in the pattern, even uses Aho–Corasick matching to find the start point very quickly.

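That means that your match of N alternatives will now run in O(1) time with respect to N (the cost depends on the length of the text being matched, not on how many alternatives there are) instead of in the O(N) time that this: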

if (/string1/ || /string2/ || /string3/ || /string4/ || /string5/ || ...)

will run in.

So you can have O(1) or O(N) performance: your choice.
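To make the two shapes concrete, here is a minimal sketch that builds a single alternation from a list of literal strings and compares it with the one-match-per-alternative loop. The word list and the sub names `match_combined` and `match_each` are made up for illustration; whether the trie optimization actually kicks in depends on your Perl version and on the alternatives being plain literal strings, so treat this as a starting point for your own timing rather than a guarantee.

```perl
use strict;
use warnings;

# Hypothetical word list; substitute the real alternatives.
my @words = ('foo', 'foo.bar', 'baz', 'qux|quux');

# quotemeta escapes regex metacharacters so each entry matches literally.
my $alternation = join '|', map { quotemeta($_) } @words;

# One compiled pattern holding every alternative.
my $combined = qr/^(?:$alternation)$/;

# Single-regex shape: one match against the combined pattern.
sub match_combined {
    my ($txt) = @_;
    return $txt =~ $combined ? 1 : 0;
}

# Multi-regex shape from the question: one separate match per alternative.
sub match_each {
    my ($txt) = @_;
    for my $w (@words) {
        return 1 if $txt =~ /^\Q$w\E$/;
    }
    return 0;
}

for my $txt ('foo.bar', 'fooxbar', 'qux|quux', 'nope') {
    printf "%-10s combined=%d each=%d\n",
        $txt, match_combined($txt), match_each($txt);
}
```

Both subs should agree on every input; the difference is only in how much work is done per match. The core Benchmark module can be used to time the two on realistic data.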

If you use re "debug" or run with -Mre=debug, Perl will show these trie structures in your patterns.
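As a small sketch of how that looks in a script, the pragma can be scoped to a block so that only the pattern of interest is traced. The variable name `$matched` is just for illustration. The trace is written to STDERR, and the exact node names differ between Perl versions, so look for anything containing TRIE rather than a specific label.

```perl
use strict;
use warnings;

my $matched;
{
    # Lexically scoped: only regexes inside this block are traced.
    # The compile-time and run-time trace goes to STDERR.
    use re 'debug';
    $matched = 'string3' =~ /string1|string2|string3|string4|string5/ ? 1 : 0;
}

print "matched: $matched\n";
```

Running it as `perl script.pl 2>trace.txt` keeps the trace separate from normal output.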




Answer 2:


This is no substitute for timing it yourself. That said, if the pattern interpolates variables, I would suggest using the /o flag so that Perl doesn't recompile your (large) regex on every evaluation. Of course, this only works if those combinations of characters do not change between evaluations.
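The caveat about unchanging combinations can be seen in a short sketch. The alternatives and the sub name `matches_once` are hypothetical. It also shows `qr//`, which is another way to compile a pattern once; I am swapping that in as an alternative, it is not something the answer itself proposes.

```perl
use strict;
use warnings;

my $alt = 'alpha|beta';    # hypothetical alternatives

sub matches_once {
    my ($txt) = @_;
    # /o: $alt is interpolated and compiled the first time this match
    # runs, and never again, even if $alt changes later.
    return $txt =~ /^(?:$alt)$/o ? 1 : 0;
}

my $first = matches_once('alpha');    # compiled from 'alpha|beta'
$alt = 'gamma';                       # ignored by the /o match above
my $stale = matches_once('gamma');    # still tested against 'alpha|beta'

# qr// compiles once at this point; rebuild it yourself when inputs change.
my $re    = qr/^(?:$alt)$/;
my $fresh = 'gamma' =~ $re ? 1 : 0;

print "first=$first stale=$stale fresh=$fresh\n";
```

With /o the stale pattern is silent, which is why it is only safe when the interpolated pieces really are fixed; a `qr//` object makes the moment of compilation explicit.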




Answer 3:


I think it depends on how long your regex is. Sometimes it is better to divide very long expressions.



Source: https://stackoverflow.com/questions/4909597/lengthy-perl-regex
