How to do a non-greedy match in grep?

深忆病人 2020-11-30 17:30

I want to grep the shortest match, and the pattern should be something like:

<car ... model=BMW ...>
...
...
...
</car>

Here ... means any characters, and the input spans multiple lines.

7 answers
  • 2020-11-30 17:41

    grep

    For a non-greedy match in grep, you can use a negated character class. In other words, avoid the . wildcard and instead exclude the character that should end the match.

    For example, to fetch all links to jpeg files from the page content, you'd use:

    grep -o '"[^" ]\+\.jpg"'
    

    To handle input that spans multiple lines, pipe it through xargs first. For performance, use ripgrep.
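To see why the negated class yields the shortest match, here is a minimal sketch that compares it with a greedy wildcard on one line. The sample string and the function names are made up for illustration, and `\+` in a basic regular expression is a GNU extension, so this assumes GNU grep.

```shell
# Hypothetical sample line containing two quoted image paths.
html='<img src="a.jpg"> <img src="b.jpg">'

# Greedy wildcard: .* runs on to the last .jpg" on the line.
greedy_links() { printf '%s\n' "$html" | grep -o '".*\.jpg"'; }

# Negated class: [^" ] cannot cross a quote or a space, so each match is minimal.
shortest_links() { printf '%s\n' "$html" | grep -o '"[^" ]\+\.jpg"'; }

greedy_links    # one match: "a.jpg"> <img src="b.jpg"
shortest_links  # two matches: "a.jpg" and "b.jpg"
```

The negated class works in plain grep because it needs no lazy quantifier: the match simply cannot run past the next quote.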

  • 2020-11-30 17:44

    I know that it's a bit of a dead post, but I just noticed that this works. It removed both clean-up and cleanup from my output.

    > grep -v -e 'clean\-\?up'
    > grep --version
    grep (GNU grep) 2.20
    
  • 2020-11-30 17:46

    My grep that works after trying out stuff in this thread:

    echo "hi how are you " | grep -shoP ".*? "
    

    Just make sure each of your lines ends with a space, because the pattern only matches text that is followed by one.

    (Mine was a line by line search to spit out words)

  • 2020-11-30 17:48

    Actually, .*? only works in Perl syntax. I am not sure what the equivalent in grep's extended regexp syntax would be. Fortunately, you can use Perl syntax with grep, so grep -P would work, but grep -E (which is the same as egrep) would not: it would be greedy.

    See also: http://blog.vinceliu.com/2008/02/non-greedy-regular-expression-matching.html
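The difference between the two modes can be checked on a single line. This is a small sketch, assuming GNU grep built with PCRE support; the sample text and function names are only illustrative.

```shell
# Hypothetical one-line input.
line='hi how are you'

# -E (extended regex): there is no lazy quantifier, so .*? still matches greedily.
ere_match() { printf '%s\n' "$line" | grep -oE 'h.*?o'; }

# -P (Perl syntax): .*? stops at the first "o" that completes the match.
pcre_match() { printf '%s\n' "$line" | grep -oP 'h.*?o'; }

ere_match   # prints: hi how are yo
pcre_match  # prints: hi ho
```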

  • 2020-11-30 17:49

    You're looking for a non-greedy (or lazy) match. To get a non-greedy match in regular expressions you need to use the modifier ? after the quantifier. For example you can change .* to .*?.

    By default grep doesn't support non-greedy modifiers, but you can use grep -P to use the Perl syntax.
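The ? modifier applies to any quantifier, not only to *. A minimal sketch with + follows, assuming GNU grep built with PCRE support; the sample line and function names are hypothetical.

```shell
# Hypothetical line with several short tags.
line='<b>bold</b> and <i>italic</i>'

# Greedy: .+ extends to the last ">" on the line, giving a single long match.
greedy_tags() { printf '%s\n' "$line" | grep -oP '<.+>'; }

# Lazy: .+? stops at the first ">", giving one match per tag.
lazy_tags() { printf '%s\n' "$line" | grep -oP '<.+?>'; }

greedy_tags  # prints the whole line as one match
lazy_tags    # prints <b>, </b>, <i> and </i> on separate lines
```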

  • 2020-11-30 17:54

    The short answer is to use the following regular expression:

    (?s)<car .*? model=BMW .*?>.*?</car>
    
    • (?s) - makes . match newline characters as well, so a match can span multiple lines
    • .*? - matches any character, any number of times, lazily (minimal match)

    A (little) more complicated answer is:

    (?s)<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>
    

    This makes it possible to match both car1 and car2 in the following text:

    <car1 ... model=BMW ...>
    ...
    ...
    ...
    </car1>
    <car2 ... model=BMW ...>
    ...
    ...
    ...
    </car2>
    
    • (..) represents a capturing group
    • \1 in this context matches the same text as most recently matched by capturing group number 1
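The answer gives only the regular expression, so here is one hedged way to run it with grep. grep normally tests one line at a time, so (?s) alone cannot make a match span lines. This sketch assumes GNU grep built with PCRE support and uses its -z option to read the whole input as a single record. The sample data and the function names are invented for illustration.

```shell
# Hypothetical sample input: two BMW entries with a non-BMW entry between them.
sample_cars() {
  printf '%s\n' \
    '<car1 id=1 model=BMW year=2020>' 'a' '</car1>' \
    '<car3 id=3 model=Audi year=2019>' 'b' '</car3>' \
    '<car2 id=2 model=BMW year=2021>' 'c' '</car2>'
}

# -z: records end at NUL instead of newline, so the whole input is one record.
# -P: Perl syntax, needed for (?s), .*? and the \1 backreference.
# -o: print only the matched text.
# tr turns the NUL that ends each printed match into a newline.
match_bmw() {
  grep -Pzo '(?s)<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>' | tr '\0' '\n'
}

sample_cars | match_bmw  # prints the car1 and car2 blocks, not car3
```

Without -z, grep checks each line separately and this pattern finds nothing in the sample, because no single line contains both the opening and the closing tag.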