Question:
(Note: The title doesn't seem too clear; if someone can rephrase it, I'm all for it!)
Given this regex: (.*_e\.txt), which matches some filenames, I need to add some other single-character suffixes in addition to the e. Should I use a character class or an alternation for this? (Or does it really matter?)
That is, which of the following two seems "better", and why:
a) (.*_(e|f|x)\.txt), or
b) (.*_[efx]\.txt)
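Before comparing speed, it helps to confirm that the two options accept exactly the same names. A minimal sketch using Python's re module (the asker's engine may differ); the filenames and the helper accepted are made up for illustration, and the "_" from the original (.*_e\.txt) is kept in both patterns:

```python
import re

# The two candidate patterns, keeping the "_" from the original (.*_e\.txt).
ALTERNATION = re.compile(r"(.*_(e|f|x)\.txt)")
CHAR_CLASS = re.compile(r"(.*_[efx]\.txt)")

# Hypothetical filenames, used only for illustration.
NAMES = ["report_e.txt", "report_f.txt", "report_x.txt", "report_g.txt", "report.txt"]


def accepted(pattern, names):
    """Return the names that the pattern matches in full."""
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == "__main__":
    print(accepted(ALTERNATION, NAMES))
    print(accepted(CHAR_CLASS, NAMES))
```

Both calls return the same three names (the _e, _f and _x files), so the choice is about readability and performance, not behavior.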
Answer 1:
Use [efx]. That's exactly what character classes are designed for: matching one of the included characters. It is therefore also the most readable and the shortest solution.
I don't know if it's faster, but I would be very surprised if it weren't. It definitely won't be slower.
My reasoning (without ever having written a regex engine, so this is pure conjecture):
The regex token [abc] will be applied in a single step of the regex engine: "Is the next character one of a, b, or c?"
(a|b|c), however, tells the regex engine to:
- remember the current position in the string for backtracking, if necessary
- check if it's possible to match a. If so, success. If not:
  - check if it's possible to match b. If so, success. If not:
    - check if it's possible to match c. If so, success. If not:
      - give up.
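The steps above can be modelled with a deliberately simplified toy in Python. The functions match_class and match_alternation are hypothetical and do not reflect how Perl or any real engine is implemented; they only count the checks described in the list:

```python
# A toy model of the two strategies described above. This is NOT how any
# real regex engine works; it only counts the checks listed in the answer.


def match_class(text, pos, members):
    """Character class: one membership test on the next character.

    Returns (matched, checks).
    """
    if pos >= len(text):
        return False, 0
    return text[pos] in members, 1


def match_alternation(text, pos, alternatives):
    """Alternation: remember the position, then try each branch in order.

    Returns (matched, checks), where checks counts the branches attempted.
    """
    saved = pos  # the position every branch restarts from (backtracking point)
    checks = 0
    for alt in alternatives:
        checks += 1
        if text.startswith(alt, saved):
            return True, checks  # this branch matched: success
    return False, checks  # every branch failed: give up


if __name__ == "__main__":
    for ch in "efxg":
        print(ch, match_class(ch, 0, frozenset("efx")),
              match_alternation(ch, 0, ("e", "f", "x")))
```

For x, or for a character that matches nothing, the alternation model makes three checks where the class makes one. Real engines can optimize either form, so treat this as intuition rather than a cost model.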
Answer 2:
Here is a benchmark:
Updated according to tchrist's comment; the difference is now more significant.
Result:
              Rate alternation class
alternation 2855/s          --  -50%
class       5677/s         99%    --
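The benchmark code itself is not included above. As a rough analogue, here is a sketch using Python's timeit instead of the original harness; the generated filenames, the repeat counts, and the helper names count_matches and rate are assumptions, and the rates it prints will not reproduce the table above:

```python
import re
import timeit

# Hypothetical input: 4000 generated filenames, three quarters with a matching suffix.
NAMES = [f"file{i}_{s}.txt" for i in range(1000) for s in "efxg"]

ALTERNATION = re.compile(r"(.*_(e|f|x)\.txt)")
CHAR_CLASS = re.compile(r"(.*_[efx]\.txt)")


def count_matches(pattern, names):
    """Count how many names the pattern matches in full."""
    return sum(1 for name in names if pattern.fullmatch(name))


def rate(pattern, number=10):
    """Runs per second of count_matches over NAMES, using the best of 5 repeats."""
    best = min(timeit.repeat(lambda: count_matches(pattern, NAMES),
                             number=number, repeat=5))
    return number / best


if __name__ == "__main__":
    alt = rate(ALTERNATION)
    cls = rate(CHAR_CLASS)
    print(f"alternation {alt:8.1f}/s")
    print(f"class       {cls:8.1f}/s")
    print(f"class vs alternation: {100 * (cls / alt - 1):+.0f}%")
```

Run it a few times on your own data before drawing conclusions; results vary with the engine, its version, and the input.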
Answer 3:
With single characters, the difference is so small that it won't matter (unless you're doing LOTS of operations).
However, for readability (and a slight performance gain), you should use the character class.
For a bit more detail: opening a round bracket ( to start an alternation makes Perl set up backtracking at the current position, which your regex doesn't really need, since each alternative is a single character and there is nothing further to match against. A character class does not do this.
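One visible side effect of the round brackets in option a) is that (e|f|x) is also a capturing group, so the engine tracks an extra submatch. A minimal sketch using Python's re (not Perl) comparing the group counts, with the standard non-capturing form (?:e|f|x) added for contrast; the pattern names and the sample filename are illustrative:

```python
import re

# Same two patterns as in the question (keeping the "_" from the original regex).
ALTERNATION = re.compile(r"(.*_(e|f|x)\.txt)")
CHAR_CLASS = re.compile(r"(.*_[efx]\.txt)")
# A non-capturing group keeps the alternation but drops the extra capture.
NON_CAPTURING = re.compile(r"(.*_(?:e|f|x)\.txt)")

if __name__ == "__main__":
    for label, pattern in [("alternation", ALTERNATION),
                           ("class", CHAR_CLASS),
                           ("non-capturing", NON_CAPTURING)]:
        match = pattern.fullmatch("report_x.txt")
        # pattern.groups is the number of capturing groups in the pattern.
        print(label, pattern.groups, match.groups())
```

Here pattern.groups is 2 for the alternation and 1 for the other two. This shows the extra capture bookkeeping only; it does not measure backtracking.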