Why is recursive regex not regex?

一生所求 2020-12-11 16:32

I was reading through some of the responses in this question and saw that a few people said that recursive regular expressions are not, strictly speaking, regular expressions.

4 answers

轻奢々 (original poster)

2020-12-11 17:15
The strict definition of a regular language from theoretical computer science may seem abstract and of little practical benefit, but if you're ever faced with the need to implement a state machine to recognize certain inputs, you can save yourself a lot of wasted effort and hair-pulling if you can prove up front that it can't be done.

Informally, recognizing a regular language never requires an unbounded amount of memory: a fixed, finite number of states is enough, no matter how long the input is. The pumping lemma for regular languages is useful for proving that a particular language (i.e., a set of strings) cannot be recognized by a deterministic finite automaton.
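As a rough illustration of the bounded-memory idea, here is a minimal sketch of a deterministic finite automaton in Python. The language (strings over {a, b} with an even number of a's) and the names `TRANSITIONS` and `accepts` are chosen for this example only; they do not come from the book.

```python
# Hypothetical example: a two-state DFA for "even number of a's" over {a, b}.
TRANSITIONS = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}


def accepts(word: str) -> bool:
    """Return True if word contains an even number of a's."""
    state = "even"  # start state; this one variable is all the memory used
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state == "even"  # "even" is the only accepting state


print(accepts("abba"))        # two a's
print(accepts("ab"))          # one a
print(accepts("b" * 10_000))  # input length does not change the memory needed
```

However long the input grows, the recognizer only ever remembers which of its two states it is in.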

From An Introduction to Formal Languages and Automata by Peter Linz (pg. 115, 3rd ed.):

Theorem 4.8

Let L be an infinite regular language. Then there exists some positive integer m such that any w ∈ L with |w| ≥ m can be decomposed as

w = xyz,

with

|xy| ≤ m,

and

|y| ≥ 1,

such that

w_i = xyⁱz — Eq. (4.2)

is also in L for all i = 0, 1, 2, …

To recognize an infinite language, a finite automaton must “pump” or repeat some portion of its states, and that’s the function of yⁱ (notation for some string y repeated i times).
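The pumping idea can be sketched in code. The example below assumes a small hand-written DFA for the regular language (ab)*; the state names and the helpers `run` and `pumpable_split` are hypothetical, introduced only for this illustration. It finds the first state the machine revisits and uses that loop as y.

```python
# Hypothetical DFA for (ab)*: q0 is the start and accepting state, "dead" is a trap.
TRANSITIONS = {
    ("q0", "a"): "q1", ("q0", "b"): "dead",
    ("q1", "a"): "dead", ("q1", "b"): "q0",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
START, ACCEPTING = "q0", {"q0"}


def run(word: str) -> bool:
    """Simulate the DFA and report whether it accepts word."""
    state = START
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING


def pumpable_split(word: str):
    """Split word into (x, y, z) where reading y takes the DFA around a loop."""
    seen = {START: 0}  # state -> input position at which it was first reached
    state = START
    for pos, symbol in enumerate(word, start=1):
        state = TRANSITIONS[(state, symbol)]
        if state in seen:  # pigeonhole: some state must repeat
            i = seen[state]
            return word[:i], word[i:pos], word[pos:]
        seen[state] = pos
    return None  # word is too short to force a repeated state


x, y, z = pumpable_split("ababab")
print((x, y, z))
# Every pumped string x + y*i + z should still be accepted.
print(all(run(x + y * i + z) for i in range(6)))
```

With three states the repeat must occur within the first three symbols, which corresponds to the bound |xy| ≤ m in the theorem.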

Nearly all proofs that use the pumping lemma are proofs by contradiction. On page 117, the author proves that the language L = { aⁿbⁿ : n ≥ 0 } (i.e., strings of the form aaa…bbb… where the all-a and all-b substrings are equal in length) is not regular:

Assume that L is regular, so that the pumping lemma must hold. We do not know the value of m, but whatever it is, we can always choose n = m. Therefore, the substring y must consist entirely of a's. Suppose |y| = k. Then the string obtained by using i = 0 in Equation (4.2) is

w₀ = aᵐ⁻ᵏbᵐ

and is clearly not in L. This contradicts the pumping lemma and thereby indicates that the assumption that L is regular must be false.
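The argument can be checked by brute force for small values of m. This is only a sanity check under assumed helper names (`in_L` and `unpumpable` are hypothetical); the proof itself covers every m, which no finite test can do. For w = aᵐbᵐ, the sketch tries every split with |xy| ≤ m and |y| ≥ 1 and confirms that removing y (the case i = 0) always leaves a string outside L.

```python
def in_L(s: str) -> bool:
    """Membership test for L = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def unpumpable(m: int) -> bool:
    """True if every legal split of a^m b^m leaves L when pumped down (i = 0)."""
    w = "a" * m + "b" * m
    for xy_len in range(1, m + 1):          # enforce |xy| <= m
        for y_len in range(1, xy_len + 1):  # enforce |y| >= 1
            x = w[: xy_len - y_len]
            y = w[xy_len - y_len : xy_len]
            z = w[xy_len:]
            # Because |xy| <= m, y falls entirely inside the leading a's.
            assert set(y) == {"a"}
            if in_L(x + z):                 # x + z is w with y removed
                return False
    return True


print(all(unpumpable(m) for m in range(1, 8)))
```

Each x + z has the form aᵐ⁻ᵏbᵐ with k ≥ 1, so the counts of a's and b's can never match.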

Other examples of languages that are not regular:
- L = { wwᴿ : w ∈ Σ* }, i.e., even-length palindromes
- L = { w ∈ Σ* : n_a(w) < n_b(w) }, i.e., strings with fewer a's than b's
- L = { a^(n!) : n ≥ 0 }
- L = { aⁿbˡ : n ≠ l }
- L = { aⁿbˡ : n + l is a prime number }
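
It turns out that what we loosely call regular expressions are considerably more powerful than the regular expressions of formal language theory. Recursive patterns can match languages such as { aⁿbⁿ : n ≥ 1 }, which is not regular by the same pumping argument, so they are not regular expressions in the strict sense. Backreferences add power of another kind: matching regular expressions with backreferences is NP-hard!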
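
To make the extra power concrete, here is a small sketch using Python's standard `re` module, which supports backreferences (but not recursive patterns). The pattern below matches the language { ww : w ∈ {a, b}* }, a standard example of a non-regular language. The name `SQUARE` is made up for this example.

```python
import re

# ([ab]*) captures some string w; \1 then demands the same w again.
SQUARE = re.compile(r"([ab]*)\1")

for s in ["abab", "aa", "abba", "aba", ""]:
    print(repr(s), bool(SQUARE.fullmatch(s)))
```

No finite automaton can recognize this language, since checking the second half requires remembering an arbitrarily long first half.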