How do you use a plus symbol with a character class as part of a regular expression?

試著忘記壹切 posted on 2021-02-04 15:58:26

Question


In Cygwin, this does not return a match:

$ echo "aaab" | grep '^[ab]+$'

But this does return a match:

$ echo "aaab" | grep '^[ab][ab]*$'
aaab

Are the two expressions not identical? Is there any way to express "one or more characters of the character class" without typing the character class twice (as in the second example)?

According to Regular-Expressions.info, the two expressions should be the same, but perhaps that site does not cover grep as run from bash in Cygwin.


Answer 1:


grep has several matching modes. By default it uses basic regular expressions, in which a number of metacharacters, including +, are special only when escaped with a backslash. You can switch grep to extended (-E) or Perl (-P) mode so that an unescaped + is treated as a quantifier.

From man grep:

Matcher Selection
  -E, --extended-regexp
     Interpret PATTERN as an extended regular expression (ERE, see below).  (-E is specified by POSIX.)

  -P, --perl-regexp
     Interpret PATTERN as a Perl regular expression.  This is highly experimental and grep -P may warn of unimplemented features.


Basic vs Extended Regular Expressions
  In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

  Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.

  GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification.  For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression.  POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.

Alternatively, you can use egrep instead of grep -E, although recent GNU grep releases treat egrep as obsolescent and recommend grep -E.
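
To see the modes side by side, here is a small sketch that runs the pattern from the question under basic, extended, and Perl matching. It assumes a GNU-style grep; the variable names are only for illustration, and the -P line is guarded because some builds ship without Perl regex support.

```shell
#!/bin/sh
# Compare how grep's matcher modes treat an unescaped +.
input='aaab'

# Basic mode (the default): + is an ordinary character, so nothing matches.
bre=$(printf '%s\n' "$input" | grep '^[ab]+$' || true)

# Basic mode again: the same pattern does match a line with a literal plus.
literal=$(printf '%s\n' 'a+' | grep '^[ab]+$' || true)

# Extended mode: + means "one or more of the preceding item".
ere=$(printf '%s\n' "$input" | grep -E '^[ab]+$' || true)

# Perl mode: same meaning for +; left empty if this grep lacks -P.
pcre=$(printf '%s\n' "$input" | grep -P '^[ab]+$' 2>/dev/null || true)

echo "basic:    '$bre'"
echo "literal:  '$literal'"
echo "extended: '$ere'"
echo "perl:     '$pcre'"
```

The second case shows why the original command was silent rather than an error: in basic mode the pattern is valid, it just looks for one a or b followed by an actual + character.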




Answer 2:


In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

So use the backslashed version:

$ echo aaab | grep '^[ab]\+$'
aaab

Or activate extended syntax:

$ echo aaab | egrep '^[ab]+$'
aaab
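
One caveat: \+ in basic mode is a GNU extension rather than part of POSIX basic regular expressions. If the pattern may need to run under a non-GNU grep, the interval form \{1,\} also expresses "one or more" without repeating the character class. A minimal sketch, with illustrative variable names:

```shell
#!/bin/sh
# "One or more of [ab]" written three ways.
input='aaab'

# POSIX basic syntax: the interval \{1,\} means "at least one".
interval=$(printf '%s\n' "$input" | grep '^[ab]\{1,\}$' || true)

# GNU extension in basic mode: backslashed plus.
gnu_plus=$(printf '%s\n' "$input" | grep '^[ab]\+$' || true)

# Extended syntax: plain plus.
extended=$(printf '%s\n' "$input" | grep -E '^[ab]+$' || true)

echo "interval: '$interval'"
echo "gnu_plus: '$gnu_plus'"
echo "extended: '$extended'"
```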



Answer 3:


Escape the + with a backslash, or use egrep (extended grep), which is equivalent to grep -E:

$ echo "aaab" | egrep '^[ab]+$'
aaab

$ echo "aaab" | grep '^[ab]\+$'
aaab
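
The case of the option letter matters here: lowercase -e only introduces a pattern (handy for several patterns, or for one that starts with a hyphen) and leaves grep in basic mode, while uppercase -E switches to extended syntax. A short sketch of the difference, with illustrative variable names:

```shell
#!/bin/sh
# -e names the pattern; -E selects extended regular expressions.

# Lowercase -e: still basic mode, so the unescaped + is literal and nothing matches.
lower=$(printf 'aaab\n' | grep -e '^[ab]+$' || true)

# Uppercase -E: extended mode, so + is a quantifier.
upper=$(printf 'aaab\n' | grep -E '^[ab]+$' || true)

# The two options can be combined.
both=$(printf 'aaab\n' | grep -E -e '^[ab]+$' || true)

echo "-e:    '$lower'"
echo "-E:    '$upper'"
echo "-E -e: '$both'"
```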



Source: https://stackoverflow.com/questions/5650761/how-do-you-use-a-plus-symbol-with-a-character-class-as-part-of-a-regular-express
