Using alternation or character class for single character matching?

Anonymous (unverified), submitted 2019-12-03 01:27:01

Question:

(Note: Title doesn't seem too clear -- if someone can rephrase this I'm all for it!)

Given this regex: (.*_e\.txt), which matches some filenames, I need to add some other single character suffixes in addition to the e. Should I choose a character class or should I use an alternation for this? (Or does it really matter??)

That is, which of the following two seems "better", and why:

a) (.*(e|f|x)\.txt), or

b) (.*[efx]\.txt)

Answer 1:

Use [efx] - that's exactly what character classes are designed for: to match one of the included characters. Therefore it's also the most readable and shortest solution.
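As a quick sanity check that the two patterns from the question accept the same names, here is a minimal sketch in Python. The filenames are made up for illustration, and Python's `re` module stands in for whatever engine the question actually targets.

```python
import re

# Hypothetical filenames, invented for this illustration.
names = ["report_e.txt", "report_f.txt", "report_x.txt",
         "report_g.txt", "report_e.csv"]

alternation = re.compile(r"(.*(e|f|x)\.txt)")  # option a)
char_class = re.compile(r"(.*[efx]\.txt)")     # option b)

# fullmatch requires the whole filename to match the pattern.
matched_alt = [n for n in names if alternation.fullmatch(n)]
matched_cls = [n for n in names if char_class.fullmatch(n)]

print(matched_alt == matched_cls)  # True
print(matched_cls)  # ['report_e.txt', 'report_f.txt', 'report_x.txt']
```

Both forms select the same files here, so the choice comes down to readability and speed rather than behavior.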

I don't know if it's faster, but I would be very much surprised if it wasn't. It definitely won't be slower.

My reasoning (without ever having written a regex engine, so this is pure conjecture):

The regex token [abc] will be applied in a single step of the regex engine: "Is the next character one of a, b, or c?"

(a|b|c) however tells the regex engine to

  • remember the current position in the string for backtracking, if necessary
  • check if it's possible to match a. If so, success. If not:
  • check if it's possible to match b. If so, success. If not:
  • check if it's possible to match c. If so, success. If not:
  • give up.
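The steps above can be sketched as a toy model in Python. This only illustrates the conjecture in this answer: the function names and the step counting are hypothetical, and real regex engines may optimize single-character alternations quite differently.

```python
def match_class(text, pos, chars):
    """Toy model of [abc]: one membership test at the current position."""
    steps = 1
    ok = pos < len(text) and text[pos] in chars
    return ok, steps


def match_alternation(text, pos, branches):
    """Toy model of (a|b|c): save the position, then try each branch in order."""
    saved = pos  # remembered so each branch restarts from the same place
    steps = 0
    for branch in branches:
        steps += 1
        if text.startswith(branch, saved):
            return True, steps
    return False, steps  # every branch failed: give up


print(match_class("c", 0, "abc"))                  # (True, 1)
print(match_alternation("c", 0, ["a", "b", "c"]))  # (True, 3)
```

In this model the class answers in one step regardless of which character is present, while the alternation takes as many steps as the position of the matching branch.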


Answer 2:

Here is a benchmark:

Updated according to tchrist's comment; the difference is now more significant.

result:

                  Rate alternation       class
    alternation 2855/s          --        -50%
    class       5677/s         99%          --
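The benchmark code itself did not survive in this copy of the thread. For readers who want to try a similar comparison, here is a rough sketch using Python's `timeit`. It is not the original benchmark: the input names, the repetition count, and the choice of Python instead of Perl are all assumptions, so the timings will not be comparable to the figures above.

```python
import re
import timeit

alternation = re.compile(r".*(e|f|x)\.txt")
char_class = re.compile(r".*[efx]\.txt")

# Hypothetical input: 900 short filenames, a third of which should match.
names = [f"file_{c}.txt" for c in "abcdefxyz"] * 100


def count_matches(pattern):
    """Count how many names the pattern matches in full."""
    return sum(1 for n in names if pattern.fullmatch(n))


# Both patterns must agree before their timings are worth comparing.
assert count_matches(alternation) == count_matches(char_class)

for label, pattern in (("alternation", alternation), ("class", char_class)):
    seconds = timeit.timeit(lambda: count_matches(pattern), number=200)
    print(f"{label:12s} {seconds:.3f} s")
```

Results depend heavily on the engine, its version, and the input, so treat any single run as indicative only.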


Answer 3:

With a single character, the difference is so small that it won't matter (unless you're doing LOTS of operations).

However, for readability (and a slight performance increase) you should be using the character class method.

For a bit more background: in Perl, an opening parenthesis ( makes the engine record the current position so that it can backtrack to it. Since nothing later in your regex needs to retry from that position, you don't need that bookkeeping. A character class does not incur it.
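The backtracking cost is hard to observe directly, but a related side effect of the parentheses is easy to see: the alternation form adds a second capture group. The sketch below uses Python's `re` rather than Perl, and the filename is hypothetical; the non-capturing `(?:...)` form is shown only for comparison and is not part of the original question.

```python
import re

capturing = re.compile(r"(.*(e|f|x)\.txt)")
non_capturing = re.compile(r"(.*(?:e|f|x)\.txt)")
char_class = re.compile(r"(.*[efx]\.txt)")

# .groups is the number of capturing groups in each pattern.
print(capturing.groups, non_capturing.groups, char_class.groups)  # 2 1 1

# The extra group also shows up in the match result.
print(capturing.fullmatch("notes_f.txt").groups())   # ('notes_f.txt', 'f')
print(char_class.fullmatch("notes_f.txt").groups())  # ('notes_f.txt',)
```

If the alternation form is needed for other reasons, the non-capturing group at least keeps the group numbering the same as with the character class.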


