Question
I currently have a string similar to the following:
str = 'abcHello Wor=A9ld'
What I want to do is find the 'abc' and '=A9' and replace these matched groups with an empty string, such that my final string is 'Hello World'.
I am currently using this regex, which is correctly finding the groups I want to replace:
r'^(abc).*?(=[A-Z0-9]+)'
I have tried to replace these groups using the following code:
clean_str = re.sub(r'^(abc).*?(=[A-Z0-9]+)', '', str)
Using the above code has resulted in:
print(clean_str)
>>> 'ld'
My question is, how can I use re.sub to replace these groups with an empty string and obtain my 'Hello World'?
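The 'ld' result comes from how re.sub works: it replaces the entire match (group 0), not only the captured groups. The sketch below prints what the pattern from the question actually matches, so you can see that everything from abc up to =A9 is consumed:

```python
import re

s = 'abcHello Wor=A9ld'
m = re.search(r'^(abc).*?(=[A-Z0-9]+)', s)

# group(0) is the whole match; this whole span is what re.sub replaces
print(m.group(0))   # abcHello Wor=A9
# The capture groups only record sub-parts of that match
print(m.groups())   # ('abc', '=A9')
```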
Answer 1:
'Is there a way that I can .. ensure that abc is present, otherwise don't replace the second pattern?'
I understand that you need to first check whether the string starts with abc and, if it does, remove the abc and all instances of the =[A-Z0-9]+ pattern in the string.
I recommend:
import re
s="abcHello wo=A9rld"
if s.startswith('abc'):
    print(re.sub(r'=[A-Z0-9]+', '', s[3:]))
Here, if s.startswith('abc'): checks whether the string begins with abc; if it does, s[3:] slices off those first three characters, and re.sub then removes all non-overlapping matches of the =[A-Z0-9]+ pattern.
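The same steps can be wrapped in a small reusable function. This is only a sketch: the name clean is hypothetical, and it assumes that a string without the abc prefix should be returned unchanged, as the quoted comment asks:

```python
import re

def clean(s: str) -> str:
    """Hypothetical helper: drop a leading 'abc' and every '=[A-Z0-9]+' token,
    but only when the string starts with 'abc'."""
    if s.startswith('abc'):
        # Slice off the 3-char prefix, then remove all '=XX' tokens
        return re.sub(r'=[A-Z0-9]+', '', s[3:])
    # No 'abc' prefix: leave the string untouched
    return s

print(clean('abcHello Wor=A9ld=B56'))  # Hello World
print(clean('Hello Wor=A9ld'))         # Hello Wor=A9ld
```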
Note that you may use the PyPI regex module to do the same with a single regex:
import regex
r = regex.compile(r'^abc|(?<=^abc.*?)=[A-Z0-9]+', regex.S)
print(r.sub('', 'abcHello Wor=A9ld=B56')) # => Hello World
print(r.sub('', 'Hello Wor=A9ld')) # => Hello Wor=A9ld
Here,
^abc - abc at the start of the string only
| - or
(?<=^abc.*?) - a lookbehind that checks that, immediately to the left of the current location, there is abc at the start of the input followed by any number of chars (line breaks included, since the regex.S flag is set)
=[A-Z0-9]+ - a = followed by one or more uppercase ASCII letters or digits.
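The PyPI regex module is needed for this one-regex version because the built-in re module only supports fixed-width lookbehind, and (?<=^abc.*?) is variable-width. A small sketch that confirms re rejects the same pattern:

```python
import re

# Same pattern as in the regex-module example
pattern = r'^abc|(?<=^abc.*?)=[A-Z0-9]+'

try:
    re.compile(pattern, re.S)
    supported = True
except re.error as exc:
    # The stdlib re engine refuses variable-width lookbehind such as (?<=^abc.*?)
    supported = False
    print('re.error:', exc)

print(supported)  # False
```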
Answer 2:
Capture everything else and put those groups in the replacement, like so:
re.sub(r'^abc(.*?)=[A-Z0-9]+(.*)', r'\1\2', s)
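Because this pattern is anchored with ^ and can match at most once, it removes only the first =[A-Z0-9]+ token, and a string with abc but no such token is left as it is. A quick sketch to check that behavior:

```python
import re

pattern = r'^abc(.*?)=[A-Z0-9]+(.*)'

print(re.sub(pattern, r'\1\2', 'abcHello Wor=A9ld'))      # Hello World
# Only the first '=XX' token is removed; '=B56' ends up inside group 2
print(re.sub(pattern, r'\1\2', 'abcHello Wor=A9ld=B56'))  # Hello World=B56
# No 'abc' prefix: no match, string unchanged
print(re.sub(pattern, r'\1\2', 'Hello Wor=A9ld'))         # Hello Wor=A9ld
# 'abc' but no '=XX' token: no match either, so 'abc' is kept
print(re.sub(pattern, r'\1\2', 'abcHello World'))         # abcHello World
```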
Answer 3:
This is a naïve approach, but why not call str.replace twice instead of using a regex, like this:
str = str.replace('abc','')
str = str.replace('=A9','')
print(str) #'Hello World'
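Keep in mind that str.replace works on literal substrings: it removes abc wherever it occurs (not only at the start) and only removes the exact text =A9, not other =[A-Z0-9]+ tokens. The sketch below uses s and t instead of str to avoid shadowing the built-in:

```python
s = 'abcHello Wor=A9ld'
first = s.replace('abc', '').replace('=A9', '')
print(first)   # Hello World

# 'abc' is also removed mid-string, and '=B7' is not the literal '=A9'
t = 'abcHello abc Wor=B7ld'
second = t.replace('abc', '').replace('=A9', '')
print(second)  # Hello  Wor=B7ld
```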
Answer 4:
This worked for me.
re.sub(r'^(abc)(.*?)(=[A-Z0-9]+)(.*?)$', r"\2\4", str)
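A short sketch of how this pattern behaves: the lazy (.*?) before $ still has to reach the end of the string, so the tail is captured in group 4. Since no re.S flag is passed, . does not match a newline, so an input with a line break between abc and the =... token is not matched unless re.S is added:

```python
import re

pattern = r'^(abc)(.*?)(=[A-Z0-9]+)(.*?)$'

# Group 2 is the text before '=A9', group 4 is the tail up to the end
one_line = re.sub(pattern, r'\2\4', 'abcHello Wor=A9ld')
print(one_line)  # Hello World

# Without re.S, '.' cannot cross '\n', so there is no match and nothing changes
two_lines = 'abcHello\nWor=A9ld'
print(re.sub(pattern, r'\2\4', two_lines) == two_lines)  # True

# With re.S (DOTALL), group 2 can span the line break
print(repr(re.sub(pattern, r'\2\4', two_lines, flags=re.S)))  # 'Hello\nWorld'
```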
Source: https://stackoverflow.com/questions/44799965/replace-captured-groups-with-empty-string-in-python