How come my regex isn't working as expected in Bash? Greedy instead of Lazy

社会主义新天地 submitted on 2020-02-06 07:55:48

Question


How come my regex pattern isn't lazy? It should be capturing the first number, not the second.

Here is a working Bash script:

#!/bin/bash

text='here is some example text I want to match word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB'

regex='(word1|word2).*?number[[:blank:]]([0-9.]+) GiB'

if [[ "$text" =~ $regex ]]; then
    echo 'FULL MATCH:  '"${BASH_REMATCH[0]}"
    echo 'NUMBER CAPTURE:  '"${BASH_REMATCH[2]}"
fi

Here is the output:

FULL MATCH:  word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB
NUMBER CAPTURE:  1.89

With this online POSIX regex tester, the match is lazy, as I expected. In Bash, however, it is greedy. NUMBER CAPTURE should be 3.01, not 1.89.


Answer 1:


Regarding .*?, the POSIX standard says:

The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.

And concerning greedy matching, it says:

If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched.
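The sketch below shows the longest-match rule in action. It uses the question's text with a plain greedy .* and drops the ?, whose meaning POSIX leaves undefined. The leftmost match starts at word1, and POSIX requires the longest match from that point, so the match runs to the final GiB and the second capture group gets 1.89. This is the same result the question reports for .*?. The variable names greedy_full and greedy_num are only illustrative.

```shell
#!/bin/bash
# Illustrates POSIX leftmost-longest matching with Bash's =~ operator.
text='here is some example text I want to match word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB'

# Plain greedy .* (no trailing ?), so the behavior is fully defined by POSIX.
greedy='(word1|word2).*number[[:blank:]]([0-9.]+) GiB'

if [[ "$text" =~ $greedy ]]; then
    greedy_full=${BASH_REMATCH[0]}   # whole match: from word1 to the LAST "GiB"
    greedy_num=${BASH_REMATCH[2]}    # number captured next to that last "GiB"
    echo 'FULL MATCH:  '"$greedy_full"
    echo 'NUMBER CAPTURE:  '"$greedy_num"
fi
```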

In this particular case you can use [^&]* instead.

text='here is some example text I want to match word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB'
regex='(word1|word2)[^&]*number[[:blank:]]([0-9.]+) GiB'
if [[ "$text" =~ $regex ]]; then
    echo 'FULL MATCH:  '"${BASH_REMATCH[0]}"
    echo 'NUMBER CAPTURE:  '"${BASH_REMATCH[2]}"
fi

Outputs:

FULL MATCH:  word1 and this number 3.01 GiB
NUMBER CAPTURE:  3.01
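Below are two more ways to get the first number. Both are sketches and are not part of the original answer. The first replaces [^&]* with [^0-9]*. Because a bracket expression that excludes digits cannot run past a digit, the match is forced to end at the first number after word1/word2. The second avoids =~ and uses Bash parameter expansion instead. There, ${var#pattern} removes the shortest prefix that matches a glob pattern (## removes the longest), which gives the "lazy" behavior the question was after. The variable names rest, regex_num and pe_num are hypothetical.

```shell
#!/bin/bash
text='here is some example text I want to match word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB'

# Approach 1 (assumes no digits between word1/word2 and "number"):
# [^0-9]* cannot cross a digit, so the only possible match ends at the first number.
regex='(word1|word2)[^0-9]*number[[:blank:]]([0-9.]+) GiB'
if [[ "$text" =~ $regex ]]; then
    regex_full=${BASH_REMATCH[0]}
    regex_num=${BASH_REMATCH[2]}
    echo 'FULL MATCH:  '"$regex_full"
    echo 'NUMBER CAPTURE:  '"$regex_num"
fi

# Approach 2 (hard-codes word1): ${var#pattern} strips the SHORTEST matching
# prefix, so each step stops at the first occurrence.
rest=${text#*word1}               # drop everything through the first "word1"
rest=${rest#*number[[:blank:]]}   # drop everything through the next "number "
pe_num=${rest%%[[:blank:]]*}      # %% strips the longest blank-led suffix, keeping "3.01"
echo 'EXPANSION CAPTURE:  '"$pe_num"
```

Both approaches capture 3.01. The [^0-9]* version fails if a digit appears between the keyword and "number". The parameter-expansion version only looks for word1 and would need extra steps to also accept word2.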


Source: https://stackoverflow.com/questions/57620201/how-come-my-regex-isnt-working-as-expected-in-bash-greedy-instead-of-lazy
