Regex capture all before substring

Submitted by 自闭症网瘾萝莉.ら on 2019-12-11 08:37:28

Question


I have a string:

s = 'Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)'

I'm trying to split this up to capture the number of kills, and the information before each "XY Kill(s)" to get this output:

['Abc - 33 SR', 
 'P G - (Type-1P-G)', 
 'M', 
 'S - M9A CWS']

Getting the number of kills was simple:

re.findall(r"(\d+) Kill", s)
['11', '2', '1', '1', '11']

Getting the text has been harder. After some research, I tried the following regex, but it only returned an empty match at each position where a "digits Kill" sequence begins:

re.findall(r"(?=[0-9]+ Kill)", s)
['', '', '', '', '', '', '']

I then changed this to add in "any number of characters before each group".

re.findall(r"(.+)(?=[0-9]+ Kill)", s)
['Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 1']

This returns nearly the entire string as a single match. How can I adjust this to capture everything before each "any number of digits-space-Kill"?

Let's get the dupes out of the way. I've consulted the following. The second in particular looked useful but I've been unable to make it suit this purpose.

Extract Number before a Character in a String Using Python,

How would I get everything before a : in a string Python,

how to get the last part of a string before a certain character?.


Answer 1:


You may use

re.findall(r'(.*?)\s*(\d+) Kill\(s\)\s*', s)

See the regex demo

Details

  • (.*?) - Capturing group 1: any 0+ chars other than line break chars, as few as possible
  • \s* - 0+ whitespaces
  • (\d+) - Capturing group 2: one or more digits
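  • " Kill\(s\)" - a literal space followed by the Kill(s) substring; the parentheses are escaped so they match literally instead of forming a group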
  • \s* - 0+ whitespaces
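
The breakdown above can also be written directly into the pattern. The following is only a sketch of the same regex using re.VERBOSE, with the sample string taken from the question; the variable name matches is an assumption made for illustration. Because verbose mode ignores unescaped whitespace, the literal space before Kill is written as [ ].

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE
# so that every piece can carry its own comment.
rx = re.compile(r"""
    (.*?)         # group 1: any chars except line breaks, as few as possible
    \s*           # optional whitespace before the number
    (\d+)         # group 2: one or more digits
    [ ]Kill\(s\)  # a literal space, then the literal text Kill(s)
    \s*           # optional trailing whitespace
""", re.VERBOSE)

s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"

# findall returns one (text, number) tuple per match
matches = rx.findall(s)
print(matches)
```

The lazy (.*?) is what keeps each match short: it grows one character at a time and stops at the first place where the rest of the pattern can match.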

Python demo:

import re
rx = r"(.*?)\s*(\d+) Kill\(s\)\s*"
s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"
print(re.findall(rx, s))

Output:

[('Abc - 33 SR', '11'), ('P G - (Type-1P-G)', '2'), ('M', '1'), ('S - M9A CWS', '1'), ('', '11')]
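
The last tuple has an empty first element because the final "11 Kill(s)" has no text in front of it. If only the text parts are wanted, as in the question's expected output, one possible follow-up is to filter those out. This is a minimal sketch; the names pairs and names are assumptions for illustration.

```python
import re

rx = r"(.*?)\s*(\d+) Kill\(s\)\s*"
s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"

# Each match is a (text, count) tuple; the text is '' when nothing precedes the count
pairs = re.findall(rx, s)

# Keep only the non-empty text parts
names = [text for text, count in pairs if text]
print(names)
```

This should print the four strings listed in the question.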



Answer 2:


You can use re.split() to get a list of all content between matches.

>>> re.split(r"\d+ Kill\(s\)", s)
    ['Abc - 33 SR ', ' P G - (Type-1P-G) ', ' M ', ' S - M9A CWS ', ' ', '']

You can clean it up to remove whitespace and empty strings.

>>> [part.strip() for part in re.split(r"\d+ Kill\(s\)", s) if part.strip()]
    ['Abc - 33 SR', 'P G - (Type-1P-G)', 'M', 'S - M9A CWS']
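
If the kill counts are needed as well, re.split() can keep them: when the pattern contains a capturing group, the captured text is included in the result list. The sketch below relies on that documented behaviour; the names parts and pairs, and the choice to convert the counts to int, are assumptions for illustration.

```python
import re

s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"

# The capturing group around \d+ makes re.split keep each number in the output,
# so the list alternates: text, count, text, count, ..., trailing text
parts = re.split(r"\s*(\d+) Kill\(s\)\s*", s)

# Even indexes hold the text, odd indexes hold the counts;
# skip entries whose text is empty (a count with nothing before it)
pairs = [(text, int(count)) for text, count in zip(parts[0::2], parts[1::2]) if text]
print(pairs)
```

Putting \s* on both sides of the pattern strips the surrounding whitespace during the split, so no separate strip() pass is needed.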


Source: https://stackoverflow.com/questions/51153569/regex-capture-all-before-substring
