Regex lookahead for 'not followed by' in grep

老子叫甜甜, submitted on 2019-11-26 19:47:55

Negative lookahead, which is what you're after, requires a more powerful tool than the standard grep. You need a PCRE-enabled grep.

If you have GNU grep, the current version supports options -P or --perl-regexp and you can then use the regex you wanted.
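As a rough illustration of the -P route, here is a minimal sketch. It assumes a GNU grep built with PCRE support, and the sample lines are made up for the demonstration rather than taken from the question.

```shell
# Hypothetical sample input, invented for illustration.
sample=$(printf '%s\n' 'Ui.Line' 'Ui.Button' 'Ui.' 'Other.Label')

# -P switches grep to Perl-compatible regexes, so (?!L) is understood
# as "not followed by L". Single quotes keep the shell away from the "!".
matches=$(printf '%s\n' "$sample" | grep -P 'Ui\.(?!L)')

printf '%s\n' "$matches"
```

With this input, only the Ui.Button and Ui. lines should be printed; Ui.Line is rejected by the lookahead and Other.Label never contains Ui. at all.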

If you don't have (a sufficiently recent version of) GNU grep, then consider getting ack.

NHDaly

The answer to part of your problem is here, and ack would behave the same way: Ack & negative lookahead giving errors

You are using double-quotes for grep, which permits bash to "interpret ! as history expand command."

You need to wrap your pattern in SINGLE-QUOTES: grep 'Ui\.(?!L)' *

However, see @JonathanLeffler's answer to address the issues with negative lookaheads in standard grep!

You probably can't perform negative lookaheads using standard grep, but you can usually get equivalent behaviour with the "invert match" switch -v. Using that, you construct a regex for the complement of what you want to match and pipe the input through two greps.

For the regex in question you might do something like

grep 'Ui\.' * | grep -v 'Ui\.L'
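A small sketch of the two-grep pipeline on made-up input (the sample lines are assumptions for illustration) shows what it keeps and where it differs from a true lookahead:

```shell
# Hypothetical sample input, invented for illustration.
sample=$(printf '%s\n' 'Ui.Line' 'Ui.Button' 'Ui.Label and Ui.Button' 'Other.Thing')

# First grep keeps lines containing "Ui.", second drops any line
# that contains "Ui.L" anywhere.
kept=$(printf '%s\n' "$sample" | grep 'Ui\.' | grep -v 'Ui\.L')

printf '%s\n' "$kept"
```

Only Ui.Button survives here. Note that the third line is dropped even though it contains a Ui. that is not followed by L: the -v filter works on whole lines, so this approach is only equivalent to a negative lookahead when a line has at most one occurrence of the pattern.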

If you need to use a regex implementation that doesn't support negative lookaheads and you don't mind matching extra character(s)*, then you can use negated character classes [^L], alternation |, and the end of string anchor $.

In your case grep 'Ui\.\([^L]\|$\)' * does the job.

  • Ui\. matches the string you're interested in

  • \([^L]\|$\) matches any single character other than L or it matches the end of the line: [^L] or $.
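The breakdown above can be tried out with a sketch like the following. The sample lines are invented, and the \| alternation in a basic regex is a GNU extension, so this assumes GNU grep; the -o option (also not required by POSIX) is used only to show the extra character that gets matched.

```shell
# Hypothetical sample input, invented for illustration.
sample=$(printf '%s\n' 'Ui.Line' 'Ui.Button' 'Ui.' 'Ui.Label and Ui.Button')

# Lines where some "Ui." is followed by a non-L character or by end of line.
found=$(printf '%s\n' "$sample" | grep 'Ui\.\([^L]\|$\)')

# -o prints only the matched text, which includes the extra character.
consumed=$(printf '%s\n' "$sample" | grep -o 'Ui\.\([^L]\|$\)')

printf '%s\n' "$found"
printf '%s\n' "$consumed"
```

Unlike the lookahead, the match itself includes the character after the dot (for example Ui.B rather than Ui.), which is the "matching extra characters" caveat. If portability matters, the same pattern can be written in extended syntax as grep -E 'Ui\.([^L]|$)'.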

If you want to exclude more than just one character, then you just need to throw more alternation and negation at it. To find a not followed by bc:

grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)' *

This matches either a followed by something other than b or by the end of the line (a, then [^b] or $), or a followed by b that is in turn followed by something other than c or by the end of the line (a, then b, then [^c] or $).
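To check the "a not followed by bc" expression against a few cases, a sketch along these lines can help. The test strings are made up, and GNU grep is assumed because of the \| alternation.

```shell
# Hypothetical test strings, invented for illustration.
sample=$(printf '%s\n' 'abc' 'abd' 'ab' 'a' 'xyz' 'abcad')

# "a" not followed by "bc", spelled out with negated classes and $.
found=$(printf '%s\n' "$sample" | grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)')

printf '%s\n' "$found"
```

Here abc and xyz are rejected, while abd, ab, a and abcad are kept. The last one matches because of its second a, which is followed by d rather than bc.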

This kind of expression gets to be pretty unwieldy and error prone with even a short string. You could write something to generate the expressions for you, but it'd probably be easier to just use a regex implementation that supports negative lookaheads.

*If your implementation supports non-capturing groups then you can avoid capturing extra characters.

If your grep doesn't support -P or --perl-regexp but you can install a PCRE-enabled grep such as pcregrep, then, unlike GNU grep, it needs no extra command-line option to accept Perl-compatible regular expressions. You just run

pcregrep 'Ui\.(?!Line)'

You don't need another nested group for "Line" as in your example "Ui.(?!(Line))" -- the outer group is sufficient, like I've shown above.

Let me give you another example of negative lookahead assertions: when you have a list of lines returned by "ipset", each showing the number of packets in the middle of the line, and you don't want the lines with zero packets, you just run:

ipset list | pcregrep "packets(?! 0 )"

If you like Perl-compatible regular expressions and have perl, but don't have pcregrep and your grep doesn't support --perl-regexp, you can use one-line perl scripts that work the same way as grep:

perl -e 'while (<>) {if (/Ui\.(?!Lines)/){print;};}'

Perl reads stdin the same way grep does, e.g.

ipset list | perl -e "while (<>) {if (/packets(?! 0 )/){print;};}"