Greedy behaviour of grep

回眸只為那壹抹淺笑 submitted on 2020-06-27 10:27:11

Question


I thought that in regular expressions, the "greediness" applies to quantifiers rather than matches as a whole. However, I observe that

grep -E --color=auto 'a+(ab)?' <(printf "aab")

highlights the whole string aab as the match, rather than only aa (with the final b left unhighlighted).

The same applies to sed. On the other hand, in pcregrep and other tools, it really is the quantifier that is greedy. Is this behaviour specific to grep?

N.B. I tested with both grep (BSD grep) 2.5.1-FreeBSD and grep (GNU grep) 3.1.


Answer 1:


In its definition of the term "matched", POSIX states:

The search for a matching sequence starts at the beginning of a string and stops when the first sequence matching the expression is found, where "first" is defined to mean "begins earliest in the string". If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched.

This statement clearly answers your question. The string aab contains three substrings that begin at the same (earliest) position and match the ERE a+(ab)?: a, aa and aab. The last one is the longest, so it is the one matched.
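The rule can be made concrete with a brute-force sketch: try start positions from left to right and, at the first position where anything matches, keep the longest substring that the whole ERE matches. The function name leftmost_longest is hypothetical. It assumes ASCII input (cut -c may count bytes) and ignores empty matches; it illustrates the rule and is not how grep is implemented.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): brute-force illustration of the
# POSIX "leftmost-longest" rule. Assumes ASCII input and ignores empty matches.
leftmost_longest() {
  s=$1
  re=$2
  n=${#s}
  start=1
  # "first" means "begins earliest": try start positions left to right.
  while [ "$start" -le "$n" ]; do
    best=''
    last=$start
    # Among the matches beginning at $start, keep the longest one.
    while [ "$last" -le "$n" ]; do
      sub=$(printf '%s\n' "$s" | cut -c"$start"-"$last")
      # grep -x succeeds only if the ERE matches the whole candidate.
      if printf '%s\n' "$sub" | grep -qxE -- "$re"; then
        best=$sub
      fi
      last=$((last + 1))
    done
    if [ -n "$best" ]; then
      printf '%s\n' "$best"
      return 0
    fi
    start=$((start + 1))
  done
  return 1   # no non-empty match anywhere in the string
}

leftmost_longest aab 'a+(ab)?'   # candidates at position 1: a, aa, aab; prints aab
```

Delegating the whole-string test to grep -x keeps the sketch short; the leftmost-then-longest selection itself is done by the two loops.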

Incidentally, POSIX doesn't use the terms greedy, greediness, etc. in its specification of regular expressions. So, where POSIX utilities are concerned, you had better consult their own specifications rather than searching for your issue with terminology that belongs to other regex implementations.



Source: https://stackoverflow.com/questions/59137763/greedy-behaviour-of-grep
