Regex capture all before substring

Question

I have a string:

s = 'Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)'

I'm trying to split this up to capture the number of kills, and the information before each "XY Kill(s)" to get this output:

['Abc - 33 SR', 
 'P G - (Type-1P-G)', 
 'M', 
 'S - M9A CWS']

Getting the number of kills was simple:

re.findall(r"(\d+) Kill", s)
['11', '2', '1', '1', '11']

Getting the text has been harder. From my research, I tried the following regex, which only returned an empty match at each position where the lookahead succeeds:

re.findall(r"(?=[0-9]+ Kill)", s)
['', '', '', '', '', '', '']

I then changed this to add "one or more characters" before the lookahead:

re.findall(r"(.+)(?=[0-9]+ Kill)", s)
['Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 1']

This returns nearly the entire string as a single match. How can I adjust this to capture everything before "any number of digits-space-Kill"?

Let's get the dupes out of the way. I've consulted the following. The second in particular looked useful but I've been unable to make it suit this purpose.

Extract Number before a Character in a String Using Python,

How would I get everything before a : in a string Python,

how to get the last part of a string before a certain character?.

Answer 1:

You may use

re.findall(r'(.*?)\s*(\d+) Kill\(s\)\s*', s)

Details

(.*?) - Capturing group 1: any 0+ chars other than line break chars, as few as possible
\s* - 0+ whitespaces
(\d+) - Capturing group 2: one or more digits
 Kill\(s\) - a space followed by the literal substring Kill(s) (the parentheses are escaped)
\s* - 0+ whitespaces

Python demo:

import re
rx = r"(.*?)\s*(\d+) Kill\(s\)\s*"
s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"
print(re.findall(rx, s))

Output:

[('Abc - 33 SR', '11'), ('P G - (Type-1P-G)', '2'), ('M', '1'), ('S - M9A CWS', '1'), ('', '11')]
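
The last tuple has an empty first group because the final "11 Kill(s)" directly follows the previous match, so there is no text in front of it. A minimal sketch of one way to separate the text from the counts and drop that empty entry (the names pairs, labels and counts are illustrative, not part of the original answer):

```python
import re

s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"
rx = re.compile(r"(.*?)\s*(\d+) Kill\(s\)\s*")

# Each match is a (text, digits) tuple, as in the output above.
pairs = rx.findall(s)

# Keep only the non-empty text parts; convert every count to int.
labels = [label for label, _ in pairs if label]
counts = [int(n) for _, n in pairs]

print(labels)  # ['Abc - 33 SR', 'P G - (Type-1P-G)', 'M', 'S - M9A CWS']
print(counts)  # [11, 2, 1, 1, 11]
```

Note that after filtering, labels has four items while counts has five, so the two lists no longer line up by index; keep the tuples in pairs if the association matters.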

Answer 2:

You can use re.split() to get a list of all content between matches.

>>> re.split(r"\d+ Kill\(s\)", s)
['Abc - 33 SR ', ' P G - (Type-1P-G) ', ' M ', ' S - M9A CWS ', ' ', '']

You can then strip the surrounding whitespace and drop the empty strings. The loop variable is named part here so that it does not shadow s:

>>> [part.strip() for part in re.split(r"\d+ Kill\(s\)", s) if part.strip()]
['Abc - 33 SR', 'P G - (Type-1P-G)', 'M', 'S - M9A CWS']
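
If the kill counts are needed as well, re.split also returns the text of any capturing group in the pattern, so a single call can yield both. The following is only a sketch of that variation, not part of the original answer; the pattern adds \s* on both sides so no separate strip() is needed, and parts, labels and counts are illustrative names:

```python
import re

s = "Abc - 33 SR 11 Kill(s) P G - (Type-1P-G) 2 Kill(s) M 1 Kill(s) S - M9A CWS 1 Kill(s) 11 Kill(s)"

# With a capturing group, the result alternates:
# text, count, text, count, ..., trailing text
parts = re.split(r"\s*(\d+) Kill\(s\)\s*", s)

# Even indices hold the text between matches; odd indices hold the digits.
labels = [p for p in parts[0::2] if p]
counts = [int(n) for n in parts[1::2]]

print(labels)  # ['Abc - 33 SR', 'P G - (Type-1P-G)', 'M', 'S - M9A CWS']
print(counts)  # [11, 2, 1, 1, 11]
```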

Source: https://stackoverflow.com/questions/51153569/regex-capture-all-before-substring

Tags

python

regex