Return the next nth result \w+ after a hyphen globally

Posted by 牧云@^-^@ on 2019-12-08 09:00:01

Question


Just getting to the next stage of understanding regex, hoping the community can help...

string = These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4

There are multiple test names preceded by a '-' hyphen, which I extract with the regex /(?<=-)\w+/g

Result:

  • AUSVERSION
  • TEST
  • TESTAGAIN
  • YIFY

I can parse the very last result using greediness with the regex /(?!.*-)(?<=-)\w+/g

Result:

  • YIFY (4th & last result)

Can you please help me parse either the 1st, 2nd, or 3rd result Globally using the same string?


Answer 1:


In Python, you can get these matches with a simple -\s*(\w+) regex and re.findall and then access any match with the appropriate index:

See IDEONE demo:

import re
s = 'These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4'
r = re.findall(r'-\s*(\w+)', s)
print(r[0]) # => AUSVERSION
print(r[1]) # => TEST
print(r[2]) # => TESTAGAIN
print(r[3]) # => YIFY

The -\s*(\w+) pattern searches for a hyphen, followed by 0+ whitespace characters, and then captures 1+ letters, digits or underscores. Because the pattern contains a capturing group, re.findall returns only the captured text, so you get just the Group 1 values captured by (\w+).
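Indexing the list directly raises IndexError when the string has fewer matches than expected. As a minimal sketch, the helper below wraps the same re.findall call and returns None in that case. The name nth_after_hyphen and the 1-based n are assumptions made for illustration (chosen to line up with "1st, 2nd, 3rd" in the question); they are not part of the original answer.

```python
import re

def nth_after_hyphen(text, n):
    """Return the n-th (1-based) word that follows a hyphen, or None."""
    # With a single capturing group, re.findall returns a list of Group 1 strings.
    words = re.findall(r'-\s*(\w+)', text)
    # Guard the index so a missing match yields None instead of IndexError.
    return words[n - 1] if 1 <= n <= len(words) else None

s = 'These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4'
print(nth_after_hyphen(s, 2))  # TEST
print(nth_after_hyphen(s, 9))  # None
```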

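To get these matches one by one with re.search, you can use ^(?:.*?-\s*(\w+)){n}, where n is the 1-based number of the match you want. A repeated capturing group keeps only its last capture, so after n repetitions Group 1 holds the nth word.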

A quick Python demo (in real code, assign the result of re.search and only access Group 1 value after checking if there was a match):

import re
s = "These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN- YIFY.cp(tt123456).MiLLENiUM.mp4"
print(re.search(r'^(?:.*?-\s*(\w+))', s).group(1))     # => AUSVERSION
print(re.search(r'^(?:.*?-\s*(\w+)){2}', s).group(1))  # => TEST
print(re.search(r'^(?:.*?-\s*(\w+)){3}', s).group(1))  # => TESTAGAIN
print(re.search(r'^(?:.*?-\s*(\w+)){4}', s).group(1))  # => YIFY (\s* skips the space after the hyphen)

Explanation of the pattern:

  • ^ - start of string
  • (?:.*?-\s*(\w+)){2} - a non-capturing group that matches (here) 2 sequences of:
    • .*? - 0+ any characters other than a newline (since no re.DOTALL modifier is used) up to the first...
    • - - hyphen
    • \s* - 0 or more whitespaces
    • (\w+) - Group 1 capturing 1+ word characters (letters, digits or underscores).
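For real code, where the result of re.search should be checked before Group 1 is read, a sketch along these lines may help. It builds the quantified pattern for a given n and returns None when the string has fewer than n hyphen-prefixed words. The function name nth_match_via_search is hypothetical, introduced only for this example.

```python
import re

def nth_match_via_search(text, n):
    """Return the word captured by the n-th repetition (1-based), or None."""
    if n < 1:
        return None
    # {n} repeats the non-capturing group n times; Group 1 keeps only the
    # value captured during the last repetition.
    pattern = r'^(?:.*?-\s*(\w+)){%d}' % n
    m = re.search(pattern, text)
    # re.search returns None when there are fewer than n hyphenated words.
    return m.group(1) if m else None

s = "These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN- YIFY.cp(tt123456).MiLLENiUM.mp4"
print(nth_match_via_search(s, 3))  # TESTAGAIN
print(nth_match_via_search(s, 5))  # None
```

Compared with re.findall, this approach stops scanning once the nth word is found, at the cost of compiling a different pattern for each n.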


Source: https://stackoverflow.com/questions/37924545/return-the-next-nth-result-w-after-a-hyphen-globally
