Non-greedy string regular expression matching

独厮守ぢ 2020-11-28 10:21

I'm pretty sure I'm missing something obvious here, but I cannot get R to use non-greedy regular expressions:

> library(stringr)
> str_match('xxx a

2 Answers
  •  -上瘾入骨i
    2020-11-28 11:05

    This is a difficult concept, so I'll try my best. Feel free to edit and explain it better if this is confusing.

    The regex engine searches for a match from left to right and returns the first one it finds. Yes, the substrings aaaab, aaab, aab, and ab all match your pattern, but aaaab starts furthest to the left, so it is the one that is returned.
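
    To make the leftmost rule concrete, here is a small sketch. It assumes your pattern was something like "a.*?b" applied to the string 'xxx aaaab yyy' (the patterns and variable names here are only for illustration). The greedy and non-greedy versions return the same thing because both start at the first a:

    ```r
    library(stringr)

    x <- 'xxx aaaab yyy'

    # Both versions begin at the leftmost position where a match is possible,
    # and there is only one b to stop at, so they agree.
    greedy <- str_match(x, "a.*b")[1, 1]    # "aaaab"
    lazy   <- str_match(x, "a.*?b")[1, 1]   # "aaaab"

    # str_locate() reports where the match sits in the string:
    # it starts at the first a (character 5) and ends at the b (character 9).
    pos <- str_locate(x, "a.*?b")
    ```

    The non-greedy quantifier only affects where the match ends, never where it begins.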

    So here your non-greedy pattern makes no difference: it only controls where a match ends, not where it starts. Maybe this other example will help you understand better when a non-greedy pattern kicks in:

    str_match('xxx aaaab yyy', "a.*?y") 
    #      [,1]     
    # [1,] "aaaab y"
    

    Here the strings aaaab y, aaaab yy, and aaaab yyy all match the pattern and start at the same position, but the shortest one is returned because the quantifier is non-greedy.
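
    For comparison, a quick sketch of the same call with and without the ? (variable names are just for illustration). With several candidate end points available, the two quantifiers finally differ:

    ```r
    library(stringr)

    x <- 'xxx aaaab yyy'

    # Non-greedy: stop at the first y that completes the match.
    lazy <- str_match(x, "a.*?y")[1, 1]    # "aaaab y"

    # Greedy: run on to the last y that completes the match.
    greedy <- str_match(x, "a.*y")[1, 1]   # "aaaab yyy"
    ```

    Both matches start at the same a; only the end point changes.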


    So what can you do to catch that last ab? Use this:

    str_match('xxx aaaab yyy', ".*(a.*b)")
    #      [,1]        [,2]
    # [1,] "xxx aaaab" "ab"
    

    How does it work? The greedy .* at the front consumes as much of the string as it can, so the captured group is forced to start at the last a from which the rest of the pattern can still match.
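
    Here is a sketch that pulls the pieces of that result apart. The last line is an alternative of my own, not part of the solution above: it assumes you are happy to forbid any further a between the a and the b, which also lands on the last ab without a capture group.

    ```r
    library(stringr)

    x <- 'xxx aaaab yyy'

    m <- str_match(x, ".*(a.*b)")
    whole   <- m[1, 1]   # "xxx aaaab": the leading .* swallowed "xxx aaa"
    last_ab <- m[1, 2]   # "ab": all that was left for the group

    # Alternative (my assumption, not from the answer above):
    # [^a]* cannot cross another a, so the match must begin at the last a.
    alt <- str_extract(x, "a[^a]*b")   # "ab"
    ```

    The capture-group version is more general; the [^a] version only works when the start marker is a single character you can exclude.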
