How a RegEx engine works [closed]

随声附和 submitted on 2019-11-29 20:17:47

There are two main classes of regex engines.

  1. Those based on finite-state automata (FSA). These are generally the fastest: they build a state machine from the pattern and feed it the input string one character at a time. Some more advanced features, such as backreferences, are difficult, if not impossible, to implement in engines like this.

    Examples of FSA-based engines:

    • POSIX/GNU ERE/BRE: used in most Unix utilities, such as grep, sed and awk.
    • RE2: a relatively new project that tries to give more power to the automata-based method.
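
To make the state-machine idea concrete, here is a minimal sketch in Python. It is not how any of the engines above are actually implemented: the transition table is built by hand for the made-up pattern a(b|c)*d rather than compiled from a pattern, and the names TRANSITIONS, ACCEPTING and dfa_match are hypothetical.

```python
# Hand-built DFA for the illustrative pattern a(b|c)*d (whole-string match).
# A real engine would generate this table from the pattern; here it is
# written out by hand, which is an assumption made for the sketch.
TRANSITIONS = {
    (0, "a"): 1,  # start state: must see 'a'
    (1, "b"): 1,  # loop on 'b'
    (1, "c"): 1,  # loop on 'c'
    (1, "d"): 2,  # 'd' moves to the accepting state
}
ACCEPTING = {2}


def dfa_match(text):
    """Feed the input to the machine one character at a time."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition for this character: reject
            return False
    return state in ACCEPTING


print(dfa_match("abcbd"))  # True
print(dfa_match("abx"))    # False
```

Each input character is looked at exactly once, so the running time grows linearly with the length of the input, whatever the pattern looks like.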
       
  2. Those based on backtracking. These often compile the pattern into bytecode that resembles machine instructions. The engine then executes the code, jumping from instruction to instruction. When an instruction fails, the engine backtracks to the most recent choice point and tries another way to match the input.

    Examples of backtracking-based engines:

    • Perl: the original. Most other engines of this type try to replicate the functionality of regexes in the Perl language.
    • PCRE: the most successful and most widely used library implementation. It has a rich set of features, some of which can no longer be considered "regular".
    • Python, Ruby, Java, .NET: other implementations I don't intend to describe further.
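
The bytecode-plus-backtracking idea can be sketched in Python as a toy virtual machine. This is only an illustration under stated assumptions: the instruction names (char, split, match), the hand-written PROGRAM for the pattern a+b, and the function run are all made up for this example and do not correspond to the real bytecode of Perl, PCRE or any other engine.

```python
# Hand-compiled "bytecode" for the illustrative pattern a+b (whole-string match).
# The instruction set is hypothetical, invented for this sketch.
PROGRAM = [
    ("char", "a"),    # 0: match a literal 'a'
    ("split", 0, 2),  # 1: try looping back to 0 first (greedy), else go to 2
    ("char", "b"),    # 2: match a literal 'b'
    ("match",),       # 3: succeed if the whole input was consumed
]


def run(program, text):
    """Execute the program, backtracking to saved alternatives on failure."""
    stack = [(0, 0)]  # saved (instruction index, input position) pairs
    while stack:
        pc, pos = stack.pop()  # resume from the most recent choice point
        while True:
            op = program[pc]
            if op[0] == "char":
                if pos < len(text) and text[pos] == op[1]:
                    pc, pos = pc + 1, pos + 1
                else:
                    break  # instruction failed: backtrack
            elif op[0] == "split":
                stack.append((op[2], pos))  # remember the second branch
                pc = op[1]                  # follow the first branch now
            elif op[0] == "match":
                if pos == len(text):
                    return True
                break  # input left over: backtrack
    return False  # every alternative has been tried


print(run(PROGRAM, "aab"))  # True
print(run(PROGRAM, "aa"))   # False
```

The stack of saved positions is what allows the engine to revisit input it has already read, and it is also why some patterns can take exponential time on a backtracking engine.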

For more information:

If you want me to expand on something, post a comment.
