Why does the regex engine choose to match pattern `..X` from `.X|..X|X.`?

Posted by 别来无恙 on 2019-12-01 21:37:47

The main point here is:

A regex engine analyzes the input from LEFT TO RIGHT by default.

So, you have an alternation pattern .X|..X|X. and you run it against 1234X5678. See what happens:

Each alternative branch is tested against each location in the string from left to right.

Steps 1-7 show how the engine tries to match the characters at the beginning of the string. However, none of the branches (.X, ..X, X.) matches there: .X and X. fail on 12, and ..X fails on 123.

Steps 8-13 just repeat the same failing scenario as none of the branches match 23 or 234.

Steps 14-19 show a success scenario because 34X can be matched by Branch 2 (..X).

The regex engine never tries a match starting at the location before 4 (where .X could have matched 4X), because the 4 has already been matched and consumed as part of 34X.
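The walkthrough above can be reproduced with a rough sketch in Python. The helper name first_match and the BRANCHES list are made up for illustration. The loop only imitates the "each branch at each location" behaviour described here and is not how the re module is implemented internally.

```python
import re

# The three alternatives, in the order they appear in the pattern.
BRANCHES = [".X", "..X", "X."]

def first_match(text, branches=BRANCHES):
    """Try every branch at each start position, moving left to right.

    Returns (start, branch, matched_text) for the first success, or None.
    """
    for start in range(len(text) + 1):
        for branch in branches:
            # pattern.match(text, pos) only succeeds if the match begins at pos
            m = re.compile(branch).match(text, start)
            if m:
                return start, branch, m.group()
    return None

print(first_match("1234X5678"))                 # (2, '..X', '34X')
print(re.search(r".X|..X|X.", "1234X5678"))     # match '34X' with span (2, 5)
print(re.findall(r".X|..X|X.", "1234X5678"))    # ['34X']
```

The findall result contains only 34X: once 34X is consumed, scanning resumes at 5, so 4X and X5 are never reported.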

And another conclusion:

The order of alternatives matters, and in NFA regex engines the first alternative that matches at a given position wins. BUT the overall match does not have to come from the first or the shortest alternative: a later, longer alternative that can start matching at an earlier position in the string wins, because that position is tried first.
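A small Python sketch of both halves of that conclusion, using the same input string. The helper name found is made up for illustration, and Python's re module is assumed as the backtracking engine.

```python
import re

TEXT = "1234X5678"

def found(pattern, text=TEXT):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Both alternatives can only start at the X, so the one listed first wins.
print(found(r"X|X."))    # X
print(found(r"X.|X"))    # X5

# .X is listed first, but it can only start at the 4. ..X can start one
# position earlier (at the 3), and earlier start positions are tried first.
print(found(r".X|..X"))  # 34X
```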
