String coverage optimization in Python

Question

I have this initial string:

'bananaappleorangestrawberryapplepear'

I also have a tuple of strings:

('apple', 'plepe', 'leoran', 'lemon')

I want a function that takes the initial string and the tuple of strings and returns this:

'bananaxxxxxxxxxgestrawberryxxxxxxxar'

I know how to do it imperatively: find every occurrence of each word in the initial string, then loop character by character over the whole string and replace the characters that belong to a matched word.

But that is ugly and not very efficient. I suspect there is a more elegant, functional way of doing this, with itertools or something similar. If you know a Python library that can do this efficiently, please let me know.

UPDATE: Justin Peel pointed out a case I didn't describe in my initial question. If a word is 'aaa' and 'aaaaaa' is in the initial string, the output should look like 'xxxxxx'.

Answer 1:

import re

words = ('apple', 'plepe', 'leoran', 'lemon')
s = 'bananaappleorangestrawberryapplepear'

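x = set()  # indices of every character covered by some word

for w in words:
    # re.escape treats the word as literal text, not as a regex
    for m in re.finditer(re.escape(w), s):
        i = m.start()
        for j in range(i, i+len(w)):
            x.add(j)

# Emit 'x' for covered positions and the original character otherwise
result = ''.join(('x' if i in x else s[i]) for i in range(len(s)))
print(result)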

produces:

bananaxxxxxxxxxgestrawberryxxxxxxxar

Answer 2:

Here's another answer. There might be a faster way to replace the letters with x's, but I don't think that it is necessary because this is already pretty fast.

import re

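def do_xs(s,pats):
    # One alternation matching any of the words; re.escape keeps them literal
    pat = re.compile('('+'|'.join(re.escape(p) for p in pats)+')')

    sout = list(s)
    i = 0
    match = pat.search(s)
    while match:
        span = match.span()
        sout[span[0]:span[1]] = ['x']*(span[1]-span[0])
        # Restart one past the match start so overlapping matches are found
        i = span[0]+1
        match = pat.search(s,i)
    return ''.join(sout)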

txt = 'bananaappleorangestrawberryapplepear'
pats = ('apple', 'plepe', 'leoran', 'lemon')
print(do_xs(txt,pats))

Basically, I build a regex pattern that matches any of the input words. Then I keep restarting the search one character after the start of the most recent match. There might be a problem, though, if one of the input patterns is a prefix of another: regex alternation takes the first alternative that matches at a given position, so if the shorter word is listed first, the rest of the longer word stays unmasked.
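A minimal sketch of one way around the prefix problem, assuming it is acceptable to try longer words first. The name mask_overlapping and its fill parameter are hypothetical and not part of the answer above. The pattern is wrapped in a lookahead so that every start position is tested, which also covers the 'aaa' in 'aaaaaa' case from the question's update.

```python
import re

def mask_overlapping(s, pats, fill='x'):
    # Hypothetical helper, shown only as an illustration.
    # Try longer words first: alternation takes the first alternative that
    # matches at a position, so a short prefix would otherwise win.
    ordered = sorted(pats, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '+' literal.
    # The lookahead (?=...) consumes nothing, so finditer tests every
    # start position and overlapping occurrences are all reported.
    pat = re.compile('(?=(%s))' % '|'.join(re.escape(p) for p in ordered))
    out = list(s)
    for m in pat.finditer(s):
        start = m.start()
        end = start + len(m.group(1))
        # fill is assumed to be a single character
        out[start:end] = fill * (end - start)
    return ''.join(out)

print(mask_overlapping('bananaappleorangestrawberryapplepear',
                       ('apple', 'plepe', 'leoran', 'lemon')))
# bananaxxxxxxxxxgestrawberryxxxxxxxar
print(mask_overlapping('aaaaaa', ('aaa',)))             # xxxxxx
print(mask_overlapping('pineapple', ('app', 'apple')))  # pinexxxxx
```

With the patterns ('app', 'apple') in that order, do_xs would return 'pinexxxle' for 'pineapple', because 'app' matches first and the search then moves on.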

Answer 3:

Assuming we're restricted to working without any imports, using only built-in string methods:

s1 = 'bananaappleorangestrawberryapplepear'
t = ('apple', 'plepe', 'leoran', 'lemon')
s2 = s1

solution = 'bananaxxxxxxxxxgestrawberryxxxxxxxar'

for word in t:
    length = len(word)
    index = s1.find(word)  # first occurrence, or -1 if there is none
    while index != -1:
        # Mask this occurrence in s2; keep searching the untouched s1
        s2 = s2[:index] + 'x'*length + s2[index+length:]
        # Step by one character so overlapping occurrences are found too
        index = s1.find(word, index+1)

print(s2 == solution)

Answer 4:

>>> string_ = 'bananaappleorangestrawberryapplepear'
>>> words = ('apple', 'plepe', 'leoran', 'lemon')
>>> xes = [(string_.find(w), len(w)) for w in words]
>>> xes
[(6, 5), (29, 5), (9, 6), (-1, 5)]
>>> for index, len_ in xes:
...   if index == -1: continue
...   string_ = string_.replace(string_[index:index+len_], 'x'*len_)
...
>>> string_
'bananaxxxxxxxxxgestrawberryxxxxxxxar'
>>>

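There are surely more efficient ways, but premature optimisation is the root of all evil. Note that this looks up only the first occurrence of each word and relies on str.replace to catch the rest, so it can miss a later occurrence when the first one overlaps an already-masked word.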

Answer 5:

a = ('apple', 'plepe', 'leoran', 'lemon')
b = 'bananaappleorangestrawberryapplepear'

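for fruit in a:
    if fruit in b:
        # Replace the word with as many x's as it has characters
        b = b.replace(fruit, 'x' * len(fruit))

The number of x's to use is simply the length of the word being replaced. Note that each replacement changes b, so a word that overlaps an already-masked one (such as 'plepe' or 'leoran' here) is no longer found.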

Answer 6:

def mask_words(s, words):
    mask = [False] * len(s)
    for word in words:
        pos = 0
        while True:
            idx = s.find(word, pos)
            if idx == -1:
                break

            length = len(word)
            for i in range(idx, idx+length):
                mask[i] = True
            # Advance by one so overlapping occurrences are also found
            pos = idx+1

    # Sanity check:
    assert len(mask) == len(s)

    result = []
    for masked, c in zip(mask, s):
        result.append('x' if masked else c)

    return "".join(result)
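
The same coverage can also be expressed as a short list of merged spans instead of one boolean per character, which may help when the string is long and the matches are few. This is only a sketch under that assumption; covered_spans and mask_spans are hypothetical names that do not appear in any of the answers.

```python
def covered_spans(s, words):
    # Collect (start, end) for every occurrence, overlapping ones included.
    spans = []
    for word in words:
        if not word:
            continue  # an empty word covers nothing
        idx = s.find(word)
        while idx != -1:
            spans.append((idx, idx + len(word)))
            idx = s.find(word, idx + 1)
    # Merge spans that overlap or touch, scanning them in sorted order.
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def mask_spans(s, words, fill='x'):
    # Stitch together the untouched slices and the masked runs.
    pieces = []
    last = 0
    for start, end in covered_spans(s, words):
        pieces.append(s[last:start])
        pieces.append(fill * (end - start))
        last = end
    pieces.append(s[last:])
    return ''.join(pieces)

s = 'bananaappleorangestrawberryapplepear'
words = ('apple', 'plepe', 'leoran', 'lemon')
print(covered_spans(s, words))  # [(6, 15), (27, 34)]
print(mask_spans(s, words))     # bananaxxxxxxxxxgestrawberryxxxxxxxar
```

The merged spans are also a compact answer to "which parts of the string are covered", independent of how the masked text is rendered.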

Source: https://stackoverflow.com/questions/4173904/string-coverage-optimization-in-python

Tags

python

string

optimization

functional-programming

itertools