Why is recursive regex not regex?

一生所求 2020-12-11 16:32

I was reading through some of the responses to this question and saw that a few people said that recursive regular expressions are not, strictly speaking, regular expressions.

4 answers
  •  轻奢々 (original poster)
    2020-12-11 17:15

    The strict definition of regular language from theoretical computer science may seem abstract with little practical benefit, but if you’re ever faced with the need to implement a state machine to recognize certain inputs, you can save yourself a lot of useless effort and hairpulling if you can prove up front that it can’t be done.

    An informal way to express it is that recognizing a regular language cannot require an arbitrary amount of memory. The pumping lemma for regular languages is useful for proving that a particular language (i.e., a set of strings) cannot be recognized by a deterministic finite automaton.
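To make the memory point concrete, here is a minimal Python sketch (not from Linz; the function names are made up for illustration). The first recognizer is a finite automaton for a*b* that only ever remembers which of two live states it is in. The second recognizes a^n b^n and has to keep a counter that can grow as large as the input demands.

```python
def accepts_a_star_b_star(s: str) -> bool:
    """Finite automaton for a*b*: two live states plus an implicit dead state."""
    # Transition table; a missing entry means the input is rejected.
    delta = {
        ("A", "a"): "A",  # still reading a's
        ("A", "b"): "B",  # first b seen
        ("B", "b"): "B",  # still reading b's
    }
    state = "A"
    for ch in s:
        state = delta.get((state, ch))
        if state is None:
            return False
    return True  # both A and B are accepting states


def accepts_an_bn(s: str) -> bool:
    """Recognizer for a^n b^n: relies on a counter with no fixed upper bound."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after a 'b' is never allowed
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:
                return False  # more b's than a's so far
        else:
            return False
    return count == 0
```

The first function would work the same on a million-character input as on a ten-character one; the second needs a counter whose size depends on the input, which is exactly what a finite automaton does not have.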

    From An Introduction to Formal Languages and Automata by Peter Linz (pg. 115, 3rd ed.):

    Theorem 4.8

    Let L be an infinite regular language. Then there exists some positive integer m such that any w ∈ L with |w| ≥ m can be decomposed as

    w = xyz,

    with

    |xy| ≤ m,

    and

    |y| ≥ 1,

    such that

    w_i = x y^i z    (Eq. 4.2)

    is also in L for all i = 0, 1, 2, …

    To recognize an infinite language, a finite automaton must “pump” or repeat some portion of its states, and that’s the function of y^i (notation for some string y repeated i times).
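As a small illustration of pumping on a language that really is regular, the sketch below uses the example language a(bc)*d and one hand-picked decomposition x = "a", y = "bc", z = "bcd". Both the language and the split are assumptions chosen for illustration, not taken from the book.

```python
import re

# Example regular language, chosen for illustration: a(bc)*d
REGULAR = re.compile(r"a(bc)*d")


def pump(x: str, y: str, z: str, i: int) -> str:
    """Return x y^i z, i.e. the middle part y repeated i times."""
    return x + y * i + z


# One decomposition of w = "abcbcd" with |y| >= 1.
x, y, z = "a", "bc", "bcd"

pumped = [pump(x, y, z, i) for i in range(4)]
all_in_language = all(REGULAR.fullmatch(w) is not None for w in pumped)
```

Here `pumped` starts with "abcd" (i = 0) and "abcbcd" (i = 1), and every pumped string still matches, because y corresponds to one trip around the loop in the automaton.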

    Nearly all proofs that use the pumping lemma proceed by contradiction. On page 117, the author proves that the language L = { a^n b^n : n ≥ 0 } (i.e., strings of the form aaa…bbb… where the all-a and all-b substrings are equal in length) is not regular:

    Assume that L is regular, so that the pumping lemma must hold. We do not know the value of m, but whatever it is, we can always choose n = m. Therefore, the substring y must consist entirely of a's. Suppose |y| = k. Then the string obtained by using i = 0 in Equation (4.2) is

    w_0 = a^(m-k) b^m

    and is clearly not in L. This contradicts the pumping lemma and thereby indicates that the assumption that L is regular must be false.
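The argument can be spot-checked mechanically for small m. The following sketch (function names are hypothetical) enumerates every split w = xyz of a^m b^m with |xy| ≤ m and |y| ≥ 1 and confirms that pumping with i = 0 always leaves the language. It is a finite sanity check for a handful of values of m, not a replacement for the proof.

```python
def in_L(s: str) -> bool:
    """Membership test for L = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


def every_split_fails(m: int) -> bool:
    """True if no split of a^m b^m with |xy| <= m, |y| >= 1 survives i = 0."""
    w = "a" * m + "b" * m
    for xy_len in range(1, m + 1):          # |xy| <= m
        for y_len in range(1, xy_len + 1):  # |y| >= 1
            x = w[: xy_len - y_len]
            z = w[xy_len:]
            # i = 0 drops y entirely, giving a^(m - |y|) b^m.
            if in_L(x + z):
                return False
    return True


checked = all(every_split_fails(m) for m in range(1, 8))
```

Because |xy| ≤ m forces y to sit inside the leading run of a's, removing it always leaves fewer a's than b's, so `checked` comes out true for every m tried.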

    Other examples of languages that are not regular:

    • L = { w w^R : w ∈ Σ* }, i.e., palindromes
    • L = { w ∈ Σ* : n_a(w) < n_b(w) }, i.e., number of a's fewer than number of b's
    • L = { a^(n!) : n ≥ 0 }
    • L = { a^n b^l : n ≠ l }
    • L = { a^n b^l : n + l is a prime number }

    It turns out that what we loosely call regular expressions are considerably more powerful: matching regular expressions with backreferences is NP-hard!
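A short sketch of that extra power, using backreferences in Python's standard `re` module (which has no recursion construct, so only backreferences are shown). The pattern `(.+)\1` matches exactly the strings of the form ww with w non-empty, and `(a*)b\1` matches a^n b a^n; neither language is regular, since each needs the first half remembered in full. The helper names are made up for illustration.

```python
import re

# (.+)\1 : some non-empty string w followed by an exact copy of w.
SQUARE = re.compile(r"(.+)\1")

# (a*)b\1 : n a's, one b, then the same number of a's again.
BALANCED = re.compile(r"(a*)b\1")


def is_square(s: str) -> bool:
    """True if s = ww for some non-empty w."""
    return SQUARE.fullmatch(s) is not None


def is_balanced(s: str) -> bool:
    """True if s = a^n b a^n for some n >= 0."""
    return BALANCED.fullmatch(s) is not None
```

For instance, `is_square("abab")` holds while `is_square("abba")` does not, and `is_balanced("aabaa")` holds while `is_balanced("aabaaa")` does not.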
