Return the next nth result \w+ after a hyphen globally

Question

Just getting to the next stage of understanding regex, hoping the community can help...

string = These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4

There are multiple test names, each preceded by a '-' hyphen, which I extract with the regex /(?<=-)\w+/g

Result:

AUSVERSION
TEST
TESTAGAIN
YIFY

I can parse the very last result by adding a negative lookahead, with the regex /(?!.*-)(?<=-)\w+/g

Result:

YIFY (4th & last result)

Can you please help me parse either the 1st, 2nd, or 3rd result Globally using the same string?

Answer 1:

In Python, you can get these matches with a simple -\s*(\w+) regex and re.findall, and then access any match by its (zero-based) list index:

import re
s = 'These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4'
r = re.findall(r'-\s*(\w+)', s)
print(r[0]) # => AUSVERSION
print(r[1]) # => TEST
print(r[2]) # => TESTAGAIN
print(r[3]) # => YIFY

The -\s*(\w+) pattern searches for a hyphen, followed by 0+ whitespace characters, and then captures 1+ letters, digits or underscores. When the pattern contains a capturing group, re.findall returns only the captured text, so you get just the Group 1 values captured with (\w+).
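A minimal sketch of wrapping the re.findall approach in a helper that tolerates an out-of-range position. The helper name nth_after_hyphen and the 1-based n parameter are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Same pattern as above, compiled once for reuse.
HYPHEN_WORD = re.compile(r'-\s*(\w+)')

def nth_after_hyphen(text, n):
    """Return the n-th (1-based) word that follows a hyphen, or None if there is no such match."""
    words = HYPHEN_WORD.findall(text)  # list of Group 1 values, in order
    return words[n - 1] if 1 <= n <= len(words) else None

s = 'These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4'
print(nth_after_hyphen(s, 2))  # TEST
print(nth_after_hyphen(s, 5))  # None, since there are only four matches
```

Returning None instead of raising IndexError is a design choice here; raising may be preferable if a missing match should be treated as an error.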

To get these matches one by one with re.search, you can use ^(?:.*?-\s*(\w+)){n}, where n is the 1-based number of the match you want.

A quick Python demo (in real code, assign the result of re.search and only access Group 1 value after checking if there was a match):

import re
# Note the space after the last hyphen: \s* skips it before (\w+) captures YIFY
s = "These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN- YIFY.cp(tt123456).MiLLENiUM.mp4"
print(re.search(r'^(?:.*?-\s*(\w+))', s).group(1))    # => AUSVERSION
print(re.search(r'^(?:.*?-\s*(\w+)){2}', s).group(1)) # => TEST
print(re.search(r'^(?:.*?-\s*(\w+)){3}', s).group(1)) # => TESTAGAIN
print(re.search(r'^(?:.*?-\s*(\w+)){4}', s).group(1)) # => YIFY
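The "check for a match first" advice can be sketched as follows. The function name nth_via_search is hypothetical, and building the quantifier from n with string formatting is one possible approach rather than the only one.

```python
import re

def nth_via_search(text, n):
    """Return the n-th (1-based) word after a hyphen using a single re.search, or None."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Insert n into the {n} quantifier of the pattern from the answer.
    pattern = r'^(?:.*?-\s*(\w+)){%d}' % n
    match = re.search(pattern, text)
    # re.search returns None when fewer than n hyphen-prefixed words exist,
    # so check before touching group(1).
    return match.group(1) if match else None

s = "These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN- YIFY.cp(tt123456).MiLLENiUM.mp4"
print(nth_via_search(s, 3))  # TESTAGAIN
print(nth_via_search(s, 5))  # None
```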

Explanation of the pattern:

^ - start of string
(?:.*?-\s*(\w+)){2} - a non-capturing group that matches (here) 2 sequences of:
- .*? - 0+ characters other than a newline (since no re.DOTALL modifier is used), as few as possible, up to the first...
- '-' - a literal hyphen
- \s* - 0 or more whitespace characters
- (\w+) - Group 1, capturing 1+ word characters (letters, digits or underscores).
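The reason Group 1 holds the n-th word is that, in Python's re module, a capturing group inside a repeated group keeps only the value from its last repetition. A short illustration, reusing the question's string (the variable names are arbitrary):

```python
import re

s = 'These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4'
m = re.search(r'^(?:.*?-\s*(\w+)){3}', s)
if m:
    # The whole match spans everything consumed by the three repetitions...
    print(m.group(0))  # These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN
    # ...but Group 1 only remembers what it captured on the third repetition.
    print(m.group(1))  # TESTAGAIN
```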

Source: https://stackoverflow.com/questions/37924545/return-the-next-nth-result-w-after-a-hyphen-globally

Tags

python

regex

regex-lookarounds