Question
for loops are quite expensive when it comes to execution time. I am building a correction algorithm based on Peter Norvig's spelling-corrector code. I modified it a bit and found that it takes too long to run on thousands of words.
The original algorithm checks candidates at edit distance 1 and 2 and corrects the word. I extended it to distance 3, which might be what increases the time (I am not sure). Here is the final part, where the most frequent words are used as the reference:
def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])  # this is where the problem is
    candidate_new = []
    for candidate in candidates:  # this statement isn't the problem
        if soundex(candidate) == soundex(word):
            candidate_new.append(candidate)
    return max(candidate_new, key=(NWORDS.get))
It looks like the statement for candidate in candidates is increasing the execution time. You can have a look at Peter Norvig's original code for reference.
I've figured out the problem. It's in the statement
candidates = (known([word]).union(known(edits1(word)))
              ).union(known_edits2(word).union(known_edits3(word)) or [word])
where
def known_edits3(word):
    return set(e3 for e1 in edits1(word) for e2 in edits1(e1)
               for e3 in edits1(e2) if e3 in NWORDS)
As can be seen, known_edits3 contains three nested for loops, which I believe is what drives up the execution time so sharply. known_edits2 has two nested for loops. So this is the culprit.
How do I minimize this expression? Could itertools.repeat help with this one?
Answer 1:
A couple of ways to increase performance here:
- Use a list comprehension (or a generator expression)
- Don't compute the same thing in each iteration
The code would reduce to:
def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])
    # Compute soundex outside the loop
    soundex_word = soundex(word)
    # List comprehension
    candidate_new = [candidate for candidate in candidates if soundex(candidate) == soundex_word]
    # Or a generator expression, which saves memory (use one or the other)
    candidate_new = (candidate for candidate in candidates if soundex(candidate) == soundex_word)
    return max(candidate_new, key=(NWORDS.get))
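The two points above can be tried in isolation with a minimal, self-contained sketch. It assumes Python 3.4 or later; `toy_soundex`, `pick_best`, and the contents of `NWORDS` are hypothetical stand-ins invented for illustration, not the poster's actual `soundex` or word counts. It also passes `default=` to `max`, since `max` raises `ValueError` on an empty sequence when no candidate shares the word's phonetic key.

```python
# Hypothetical word counts, only so the sketch runs on its own.
NWORDS = {"spelling": 12, "spieling": 1, "selling": 7}


def toy_soundex(word):
    """Crude phonetic key: first letter plus later consonants, with
    adjacent repeats collapsed, cut to 4 characters.
    A placeholder only; this is NOT the real Soundex algorithm."""
    key = word[0]
    for ch in word[1:]:
        if ch not in "aeiouyhw" and ch != key[-1]:
            key += ch
    return key[:4].upper()


def pick_best(word, candidates):
    # Loop-invariant value: computed once, outside the filter.
    target = toy_soundex(word)
    # Generator expression: max() consumes the matches one at a time,
    # so no intermediate list is built.
    matches = (c for c in candidates if toy_soundex(c) == target)
    # default= is returned when nothing matches, instead of ValueError.
    return max(matches, key=lambda c: NWORDS.get(c, 0), default=word)


print(pick_best("speling", ["spelling", "spieling", "selling"]))  # spelling
print(pick_best("xyz", ["spelling"]))                             # xyz
```

Falling back to the input word is only one possible choice for `default=`. Returning `None` and letting the caller decide would work equally well.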
Another enhancement relies on the fact that you only need the maximum candidate:
def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])
    soundex_word = soundex(word)
    max_candidate = None
    max_nword = 0
    for candidate in candidates:
        nword = NWORDS.get(candidate, 0)  # treat unseen words as count 0
        if soundex(candidate) == soundex_word and nword > max_nword:
            max_candidate = candidate
            max_nword = nword  # remember the best count seen so far
    return max_candidate
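The rewrites above speed up the filtering loop, but the question identifies building `candidates` as the slow statement. Each extra level of `edits1` multiplies the number of generated strings rather than adding a constant factor: Norvig's essay counts 54n+25 edits for a word of length n. One way to apply the same "only do the work you need" idea there is to expand one edit distance at a time and stop at the first distance that yields a known word, in the spirit of the `or` chain in Norvig's original `correct`. The sketch below rests on a stated assumption: it is acceptable to prefer closer edits, which is not the same behavior as the union in the question. `candidates_lazy` and the `NWORDS` contents are hypothetical.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
NWORDS = {"spelling": 12, "selling": 7, "cat": 5}  # hypothetical counts


def edits1(word):
    """All strings one edit away from word (following Norvig's essay)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    return {w for w in words if w in NWORDS}


def candidates_lazy(word, max_distance=3):
    """Known words at the smallest edit distance that has any.

    The costly distance-2 and distance-3 expansions run only when the
    closer distances found nothing.
    """
    frontier = {word}
    found = known(frontier)
    for _ in range(max_distance):
        if found:
            break
        # Expand the whole frontier by one more edit.
        frontier = {e for w in frontier for e in edits1(w)}
        found = known(frontier)
    return found or {word}


print(candidates_lazy("speling"))  # {'spelling'}
```

The result can then be filtered by soundex and passed to `max` as before. A word with no known neighbor still pays for the full expansion up to `max_distance`, so this helps most when the majority of inputs are at most one or two edits from a dictionary word.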
Source: https://stackoverflow.com/questions/21780334/speed-up-execution-python