Is “regex” in modern programming languages really “context sensitive grammar”?

怎甘沉沦 提交于 2019-11-29 06:45:33

In particular backreferences to capturing parentheses make regular expressions more complex than regular, context-free, or context-sensitive grammars. The name is simply historically grown (as many words). See also this section in Wikipedia and this explanation with an example from Perl.

The way I see it:

  • Regular languages:
    • Matched by state machines. Only one variable can be used to represent the current "location" in the grammar to be matched: Recursion cannot be implemented
  • Context-free languages:
    • Matched by a stack machine. The current "location" in the grammar is represented by a stack in one or another form. Cannot "remember" anything that occurred before
  • Context-sensitive languages:
    • Most programming languages
    • All Most human languages

I do know of regular expression parsers that allow you to match against something the parser has already encountered, achieving something like a context-sensitive grammar.

Still, regular expression parsers, however sophisticated they may be, don't allow for recursive application of rules, which is a definite requirement for context-free grammars.

The term regex, in my opinion, mostly refers to the syntax used to express those regular grammars (the stars and question marks).

There are features in modern regular expression implementations that break the rules of the classic regular expression definition.

For example Microsoft’s .NET Balancing Group (?<name1-name2> … ):

^(?:0(?<L>)|1(?<-L>))*(?(L)(?!))$

This does match the language L₀₁ = {ε, 01, 0011, 000111, … }. But this language is not regular according to the Pumping Lemma.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!