Whether to escape ( and ) in regex using GNU sed

清歌不尽 2020-12-08 14:53

I've noticed several posts on this site which say that with GNU sed you should use ( and ) in regex rather than \( and \).

4 answers
  • 2020-12-08 15:36

    Escaped parentheses (\() make the regex search for parentheses as part of the expression.

    Unescaped parentheses (() make the regex group the contents of the parentheses together.

    In other words, if you escape them, the engine looks for literal parentheses in the input; if you leave them unescaped, the engine captures whatever the group matches so you can refer to it later (as $1, \1, and so on).

    An example to demonstrate:

    $myString = "junk(150)moar";

    To get just the number:
    #^\w+\((\d+)\)\w+$#

    ($1 is 150)

    It's a mess, I know, but it demonstrates the use of grouping parentheses and parentheses as part of the matching expression.

    Update Years Later:

    As user @bmk correctly points out, this answer applies to extended regular expressions, but not to basic regular expressions. It's difficult to find basic regular expressions as the default parsing engine in most programming languages, etc., but it would be prudent to verify which engine you are using before assuming this answer will apply to your situation.
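
For sed specifically, the same extraction can be sketched in extended mode. This is only an illustration: it assumes a sed that accepts -E (recent GNU and BSD builds do), and it swaps \w and \d for POSIX bracket expressions because those shorthands are Perl-style rather than part of extended regular expressions. The variable names are made up for the example.

```shell
# Extract the number from "junk(150)moar" with sed in extended (ERE) mode.
# \( and \) match literal parentheses; the bare ( ) pair captures the digits.
my_string='junk(150)moar'

number=$(printf '%s\n' "$my_string" |
  sed -E 's/^[[:alnum:]_]+\(([0-9]+)\)[[:alnum:]_]+$/\1/')

echo "$number"   # prints 150
```

Without -E the roles of the two kinds of parentheses are reversed, so the same pattern would not match.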

  • 2020-12-08 15:45

    Originally sed, like grep and everything else, used \( to indicate grouping, whereas ( just matched a literal open-paren.

    Many newer implementations of regular expressions, including egrep and perl, switched this around, so \( meant a literal open-paren, and ( was used to specify grouping.

    So now with gnu sed in extended mode (the -r flag), ( is a special character, just like in egrep; without that flag it still behaves the old way. Other systems (e.g. BSD) also use the old way by default, as far as I can tell. Unfortunately this is a real mess, because now it's hard to know which one to use.
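
The swap between the two syntaxes can be checked directly. A minimal sketch, assuming a sed that runs basic regex by default and accepts -E for extended regex; the input string and variable names are arbitrary:

```shell
input='f(x)'

# Basic mode (no flag): \( \) group, bare ( ) are literal characters.
bre=$(printf '%s\n' "$input" | sed 's/\(f\)(x)/\1[x]/')

# Extended mode (-E): bare ( ) group, \( \) are literal characters.
ere=$(printf '%s\n' "$input" | sed -E 's/(f)\(x\)/\1[x]/')

echo "$bre"   # prints f[x]
echo "$ere"   # prints f[x]
```

Both commands do the same job; only the escaping of the parentheses is inverted.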

  • 2020-12-08 15:51

    This part of the gnu sed manual you linked to explains that whether you should escape parentheses depends on whether you are using basic regular expressions or extended regular expressions. This part says that the -r flag determines what mode you are in.

    Edit: as stated in grok12's comment, the -E flag in bsd sed does what the -r flag does in gnu sed.

  • 2020-12-08 15:51

    Thanks to rocker, murga, and chris. Each of you helped me understand the issue. I'm answering my own question here in order to (hopefully) put the whole story together in one place.

    There are two major versions of sed in use: gnu and bsd. In both, parens used for grouping must be escaped in basic regex and must not be escaped in extended regex. They differ in that the -r option enables extended regex for gnu, whereas -E does so for bsd.

    The standard sed on Mac OS X is bsd. I believe much of the rest of the world uses gnu sed as the standard, but I don't know precisely who uses what. If you are unsure which you are using, try:

    > sed -r
    

    If you get a

    > sed: illegal option -- r
    

    reply then you have bsd.
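
A script can also probe for a working flag instead of reading the error message. This is a different technique from the check above, offered as a rough sketch: it tries -E and then -r on a tiny input and keeps the first flag that produces the expected result. The variable name ere_flag is made up for the example.

```shell
# Try each candidate flag with an extended-regex pattern:
# \( \) are literal parens here, the bare ( ) pair is a capture group.
ere_flag=""
for flag in -E -r; do
  result=$(printf 'a(b)\n' | sed "$flag" 's/\((b)\)/[\1]/' 2>/dev/null)
  if [ "$result" = 'a[b]' ]; then
    ere_flag=$flag
    break
  fi
done

echo "extended regex flag: ${ere_flag:-none found}"
```

The probe reports which flag works rather than which implementation is installed, which is usually what a portable script needs to know.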
