RegExp: Last occurrence of pattern that occurs before another pattern

Question

I want to match the last occurrence of one text pattern that appears before another text pattern.

For example I have this text:

code 4ab6-7b5
Another lorem ipsum
Random commentary.

code f6ee-304
Lorem ipsum text 
Dummy text

code: ebf6-649
Other random text
id-x: 7662dd41-29b5-9646-a4bc-1f6e16e8095e

code: abcd-ebf
Random text
id-x: 7662dd41-29b5-9646-a4bc-1f6e16e8095e

I want to take the last code that occurs before the first occurrence of id-x (that is, I want to get the code ebf6-649).

How can I do that with regexp?

Answer 1:

If your regex flavor supports lookaheads, you can use a solution like this:

^code:[ ]([0-9a-f-]+)(?:(?!^code:[ ])[\s\S])*id-x

You can then find your result in capture group 1.
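As a rough illustration, here is how the pattern could be applied to the sample text from the question. Python is an assumption here (the answer does not name a host language), as are the names TEXT, PATTERN and last_code; re.MULTILINE is needed so that ^ matches at the start of each line.

```python
import re

# Sample text copied from the question.
TEXT = """code 4ab6-7b5
Another lorem ipsum
Random commentary.

code f6ee-304
Lorem ipsum text
Dummy text

code: ebf6-649
Other random text
id-x: 7662dd41-29b5-9646-a4bc-1f6e16e8095e

code: abcd-ebf
Random text
id-x: 7662dd41-29b5-9646-a4bc-1f6e16e8095e
"""

# The pattern from the answer; MULTILINE makes ^ match at every line start.
PATTERN = re.compile(
    r"^code:[ ]([0-9a-f-]+)(?:(?!^code:[ ])[\s\S])*id-x",
    re.MULTILINE,
)

# search() returns the leftmost match, i.e. the first block that reaches an id-x.
match = PATTERN.search(TEXT)
last_code = match.group(1) if match else None
print(last_code)  # ebf6-649
```

Because search() stops at the leftmost match, the result is the code of the first block that contains id-x, which is the last code before the first id-x.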

How does it work?

^code:[ ]           # match "code: " at the beginning of a line; the square
                    # brackets are just to aid readability. I recommend always
                    # using them for literal spaces.

(                   # capturing group 1, your key
  [0-9a-f-]+        # match one or more hex-digits or hyphens
)                   # end of group 1

(?:                 # start a non-capturing group; each "instance" of this group
                    # will match a single arbitrary character that does not start
                    # a new "code: " (hence this cannot go beyond the current
                    # block)

  (?!               # negative lookahead; this does not consume any characters,
                    # but causes the pattern to fail, if its subpattern could
                    # match here

    ^code:[ ]       # match the beginning of a new block (i.e. "code: " at the
                    # beginning of another line)

  )                 # end of negative lookahead; if we've reached the beginning
                    # of a new block, this causes the non-capturing group to
                    # fail. Otherwise the lookahead has no effect.

  [\s\S]            # match one arbitrary character
)*                  # end of non-capturing group, repeat 0 or more times
id-x                # match "id-x" literally
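
Some flavors accept this commented layout directly. The sketch below assumes Python, whose re.VERBOSE flag ignores unescaped whitespace and # comments outside character classes; this is also where writing the literal space as [ ] pays off. The names COMMENTED, sample and found, and the sample text, are made up for illustration.

```python
import re

# Same pattern as above, written in free-spacing (verbose) form.
COMMENTED = re.compile(
    r"""
    ^code:[ ]              # literal code: plus a space, at the start of a line
    ([0-9a-f-]+)           # group 1: hex digits and hyphens
    (?:                    # consume one character at a time ...
        (?!^code:[ ])      # ... unless a new block starts here
        [\s\S]
    )*
    id-x                   # the literal marker
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical two-block input: only the second block contains id-x.
sample = "code: 1111-aaa\nfoo\n\ncode: 2222-bbb\nbar\nid-x: 42\n"
found = COMMENTED.search(sample)
print(found.group(1))  # 2222-bbb
```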

The (?:(?!stopword)[\s\S])* pattern lets you match as much as possible without going beyond another occurrence of stopword.
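
The same building block can be reused with any stopword. A minimal sketch, assuming Python; the helper name tempered and the toy input are hypothetical, and stopword is inserted as a regex without escaping.

```python
import re

def tempered(stopword: str) -> str:
    """Return a fragment matching any run of characters in which
    `stopword` (itself a regex, inserted unescaped) never starts."""
    return rf"(?:(?!{stopword})[\s\S])*"

# Toy input: "a1" is followed by another "a<digit>" before any "b",
# so it can never be part of a match.
toy = "a1 a2 b a3 b"
toy_pattern = re.compile(r"a(\d)" + tempered(r"a\d") + "b")

first = toy_pattern.search(toy)
print(first.group(1))  # 2

codes = [m.group(1) for m in toy_pattern.finditer(toy)]
print(codes)  # ['2', '3']
```

Each match starts at the last a<digit> before a b, because the repeated group refuses to step over the start of another a<digit>.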

Note that you might have to enable some form of multi-line mode for ^ to match at the beginning of a line. The ^ is important to avoid false negatives if your random text contains code: in the middle of a line.
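
Both points can be checked with a small experiment. This is a sketch assuming Python's re module, where multi-line mode is the re.MULTILINE flag; the input text and the variable names are invented for the demonstration.

```python
import re

# Hypothetical input: the block does not start the string, and the
# random text mentions "code: " in the middle of a line.
text = "intro line\ncode: aaaa-111\nnote: the code: value is hex\nid-x: 1\n"

anchored = r"^code:[ ]([0-9a-f-]+)(?:(?!^code:[ ])[\s\S])*id-x"
unanchored = r"code:[ ]([0-9a-f-]+)(?:(?!code:[ ])[\s\S])*id-x"

# With multi-line mode, ^ matches at the start of line 2.
with_multiline = re.search(anchored, text, re.MULTILINE)

# Without it, ^ only matches at the very start of the string.
without_multiline = re.search(anchored, text)

# Without ^, the mid-line "code: " stops the repeated group before id-x.
without_anchor = re.search(unanchored, text)

print(with_multiline.group(1))  # aaaa-111
print(without_multiline)        # None
print(without_anchor)           # None
```

The last case is the false negative mentioned above: the mid-line code: is treated as the start of a new block, so the real block never reaches id-x.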

Working demo (using Ruby's regex flavor, as I'm not sure which one you are ultimately going to use)

Source: https://stackoverflow.com/questions/17049622/regexp-last-occurence-of-pattern-that-occurs-before-another-pattern

Tags

regex

git-bash