Return the next nth result \w+ after a hyphen globally

Posted by 柔情痞子 on 2019-12-06 14:36:01

In Python, you can get all of these matches with a simple -\s*(\w+) regex and re.findall, then pick any match from the resulting list by its index:

See IDEONE demo:

import re
s = 'These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4'
r = re.findall(r'-\s*(\w+)', s)
print(r[0]) # => AUSVERSION
print(r[1]) # => TEST
print(r[2]) # => TESTAGAIN
print(r[3]) # => YIFY

The -\s*(\w+) pattern searches for a hyphen, followed by 0+ whitespace characters, and then captures 1+ letters, digits or underscores. When the pattern contains a capturing group, re.findall returns only the captured text, so you only get the Group 1 values captured with (\w+).
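
To make the effect of the capturing group on re.findall concrete, here is a small sketch that runs the same search with and without the parentheses. The variable names with_group and without_group are illustrative only.

```python
import re

s = 'These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN-YIFY.cp(tt123456).MiLLENiUM.mp4'

# With a capturing group, findall returns only what Group 1 captured.
with_group = re.findall(r'-\s*(\w+)', s)

# Without a capturing group, findall returns each whole match, hyphen included.
without_group = re.findall(r'-\s*\w+', s)

print(with_group)
print(without_group)
```

The first list holds the bare words, while the second keeps the leading hyphen (and any whitespace after it) on every item.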

To get these matches one by one with re.search, you can use ^(?:.*?-\s*(\w+)){n}, where n is the 1-based position of the match you want (1 for the first word after a hyphen, 2 for the second, and so on). Here is a regex demo.

A quick Python demo (in real code, assign the result of re.search and only access Group 1 value after checking if there was a match):

import re
s = "These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN- YIFY.cp(tt123456).MiLLENiUM.mp4"
print(re.search(r'^(?:.*?-\s*(\w+))', s).group(1))    # => AUSVERSION
print(re.search(r'^(?:.*?-\s*(\w+)){2}', s).group(1)) # => TEST
print(re.search(r'^(?:.*?-\s*(\w+)){3}', s).group(1)) # => TESTAGAIN
print(re.search(r'^(?:.*?-\s*(\w+)){4}', s).group(1)) # => YIFY
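
For real code, the match check mentioned above could be wrapped in a small helper. This is only a sketch: the function name nth_after_hyphen and its choice to return None when there is no nth match are assumptions made for illustration, not part of the re module.

```python
import re

def nth_after_hyphen(text, n):
    """Return the nth (1-based) word that follows a hyphen, or None."""
    if n < 1:
        raise ValueError("n must be 1 or greater")
    # Build ^(?:.*?-\s*(\w+)){n} with the requested repetition count.
    pattern = r'^(?:.*?-\s*(\w+)){' + str(n) + '}'
    m = re.search(pattern, text)
    # Only read Group 1 once we know the search succeeded.
    return m.group(1) if m else None

s = "These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN- YIFY.cp(tt123456).MiLLENiUM.mp4"
print(nth_after_hyphen(s, 2))  # TEST
print(nth_after_hyphen(s, 5))  # None, the string has only four hyphens
```

Returning None instead of raising keeps the caller's code simple when the file name has fewer hyphens than expected.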

Explanation of the pattern:

  • ^ - start of string
  • (?:.*?-\s*(\w+)){2} - a non-capturing group that matches (here) 2 sequences of:
    • .*? - 0+ any characters other than a newline (since no re.DOTALL modifier is used) up to the first...
    • - - hyphen
    • \s* - 0 or more whitespaces
    • (\w+) - Group 1 capturing 1+ word characters (letters, digits or underscores).
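
The breakdown above can also be written directly into the pattern with the re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. The sketch below uses the {2} variant. Note that when a capturing group sits inside a repeated group, Group 1 keeps only the text from the last repetition, which is why it ends up holding the nth word. The names pattern and second are illustrative.

```python
import re

pattern = re.compile(r"""
    ^            # start of string
    (?:          # non-capturing group, repeated below
        .*?      # as few characters as possible (newlines excluded)
        -        # a literal hyphen
        \s*      # 0 or more whitespace characters
        (\w+)    # Group 1: 1+ letters, digits or underscores
    ){2}         # repeat the whole group exactly twice
""", re.VERBOSE)

s = "These.Final.Hours-AUSVERSION.2013-TEST-TESTAGAIN- YIFY.cp(tt123456).MiLLENiUM.mp4"
m = pattern.search(s)
# Group 1 was overwritten on the second repetition, so it holds the second word.
second = m.group(1) if m else None
print(second)  # TEST
```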