问题
I have this initial string.
'bananaappleorangestrawberryapplepear'
And also have a tuple with strings:
('apple', 'plepe', 'leoran', 'lemon')
I want a function so that from the initial string and the tuple with strings I obtain this:
'bananaxxxxxxxxxgestrawberryxxxxxxxar'
I know how to do it imperatively by finding the word in the initial string for every word and then loop character by character in all initial string with replaced words.
But it's not very efficient and ugly. I suspect there should be some way of doing this more elegantly, in a functional way, with itertools or something. If you know a Python library that can do this efficiently please let me know.
UPDATE: Justin Peel pointed out a case I didn't describe in my initial question. If a word is 'aaa' and 'aaaaaa' is in the initial string, the output should look like 'xxxxxx'.
回答1:
import re
words = ('apple', 'plepe', 'leoran', 'lemon')
s = 'bananaappleorangestrawberryapplepear'
x = set()
for w in words:
for m in re.finditer(w, s):
i = m.start()
for j in range(i, i+len(w)):
x.add(j)
result = ''.join(('x' if i in x else s[i]) for i in range(len(s)))
print result
produces:
bananaxxxxxxxxxgestrawberryxxxxxxxar
回答2:
Here's another answer. There might be a faster way to replace the letters with x's, but I don't think that it is necessary because this is already pretty fast.
import re
def do_xs(s,pats):
pat = re.compile('('+'|'.join(pats)+')')
sout = list(s)
i = 0
match = pat.search(s)
while match:
span = match.span()
sout[span[0]:span[1]] = ['x']*(span[1]-span[0])
i = span[0]+1
match = pat.search(s,i)
return ''.join(sout)
txt = 'bananaappleorangestrawberryapplepear'
pats = ('apple', 'plepe', 'leoran', 'lemon')
print do_xs(txt,pats)
Basically, I create a regex pattern that will match any of the input patterns. Then I just keep restarting the search starting 1 after the starting position of the most recent match. There might be a problem though if you have one of the input patterns is a prefix of another input pattern.
回答3:
Assuming we're restricted to working without stdlib and other imports:
s1 = 'bananaappleorangestrawberryapplepear'
t = ('apple', 'plepe', 'leoran', 'lemon')
s2 = s1
solution = 'bananaxxxxxxxxxgestrawberryxxxxxxxar'
for word in t:
if word not in s1: continue
index = -1 # Start at -1 so our index search starts at 0
for iteration in range(s1.count(word)):
index = s1.find(word, index+1)
length = len(word)
before = s2[:index]
after = s2[index+length:]
s2 = before + 'x'*length + after
print s2 == solution
回答4:
>>> string_ = 'bananaappleorangestrawberryapplepear'
>>> words = ('apple', 'plepe', 'leoran', 'lemon')
>>> xes = [(string_.find(w), len(w)) for w in words]
>>> xes
[(6, 5), (29, 5), (9, 6), (-1, 5)]
>>> for index, len_ in xes:
... if index == -1: continue
... string_ = string_.replace(string_[index:index+len_], 'x'*len_)
...
>>> string_
'bananaxxxxxxxxxgestrawberryxxxxxxxar'
>>>
There are surely more effective ways, but the premature optimisation is the root of all evil.
回答5:
a = ('apple', 'plepe', 'leoran', 'lemon')
b = 'bananaappleorangestrawberryapplepear'
for fruit in a:
if a in b:
b = b.replace(fruit, numberofx's)
The only thing you have to do now his determine how many X's to replace with.
回答6:
def mask_words(s, words):
mask = [False] * len(s)
for word in words:
pos = 0
while True:
idx = s.find(word, pos)
if idx == -1:
break
length = len(word)
for i in xrange(idx, idx+length):
mask[i] = True
pos = idx+length
# Sanity check:
assert len(mask) == len(s)
result = []
for masked, c in zip(mask, s):
result.append('x' if masked else c)
return "".join(result)
来源:https://stackoverflow.com/questions/4173904/string-coverage-optimization-in-python