Speed Up Execution, Python

Posted by 感情迁移 on 2019-12-12 01:05:57

Question


for loops are quite expensive in terms of execution time. I am building a correction algorithm based on Peter Norvig's spelling-correction code. I modified it a bit and found that it takes too long to run on thousands of words.

The original algorithm looks for candidates at edit distance 1 and 2 and corrects the word. I've extended it to distance 3, which might increase the time (I am not sure). Here is the final part, where the most frequently occurring words are used as the reference:

def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word]) # this is where the problem is

    candidate_new = []
    for candidate in candidates:  # this statement isn't the problem
        if soundex(candidate) == soundex(word):
            candidate_new.append(candidate)
    return max(candidate_new, key=(NWORDS.get))

At first it looked like the statement for candidate in candidates was increasing the execution time. You can compare this with Peter Norvig's original code.
I've since figured out the problem. It's in the statement

candidates = (known([word]).union(known(edits1(word)))
             ).union(known_edits2(word).union(known_edits3(word)) or [word])

where

def known_edits3(word):
    return set(e3 for e1 in edits1(word) for e2 in edits1(e1) 
                                      for e3 in edits1(e2) if e3 in NWORDS)  

As you can see, known_edits3 contains three nested for loops, so the work is multiplied at each level rather than simply added. known_edits2 has two nested loops. So this is the culprit.
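To get a feel for how much work each extra edit level adds, here is a minimal sketch that counts the strings produced at distance 1 and 2 for a short word. The `edits1` below follows the shape of Norvig's function but is re-typed here for illustration. `edits2` and `generated_upper_bound` are hypothetical helpers that are not part of the code above.

```python
import string

ALPHABET = string.ascii_lowercase


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Return every string within two edits (no dictionary filter)."""
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1))


def generated_upper_bound(length, depth):
    """Upper bound on strings generated (before deduplication).

    A word of length m >= 1 yields m deletes, m - 1 transposes,
    26 * m replaces and 26 * (m + 1) inserts, i.e. 54 * m + 25 strings.
    Each level can lengthen a word by at most one character.
    """
    total = 1
    for level in range(depth):
        total *= 54 * (length + level) + 25
    return total


word = "cat"
level1 = edits1(word)
level2 = edits2(word)
print(len(level1), len(level2))             # unique strings at distance 1 and 2
print(generated_upper_bound(len(word), 3))  # rough bound for three nested loops
```

The point of the sketch is only that the sizes multiply from one level to the next, so a third nested loop costs far more than three times the first.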

How do I minimize this expression? Could itertools.repeat help here?


Answer 1:


A couple of ways to increase performance here:

  1. Use a list comprehension (or a generator expression)
  2. Don't recompute the same value in each iteration

The code would reduce to:

def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])

    # Compute soundex outside the loop
    soundex_word = soundex(word)

    # List comprehension
    candidate_new = [candidate for candidate in candidates if soundex(candidate) == soundex_word]

    # Or a generator expression, which saves memory (use one or the other)
    candidate_new = (candidate for candidate in candidates if soundex(candidate) == soundex_word)

    return max(candidate_new, key=(NWORDS.get))

Another enhancement relies on the fact that you only need the maximum candidate, so you can track it in a single pass:

def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])

    soundex_word = soundex(word)
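    max_candidate = None
    max_nword = 0
    for candidate in candidates:
        # Treat missing words as 0 so the comparison never sees None
        nword = NWORDS.get(candidate, 0)
        if nword > max_nword and soundex(candidate) == soundex_word:
            # Remember the best count so far, not just the best candidate
            max_candidate = candidate
            max_nword = nword
    return max_candidate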
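A different technique, not what the code above does, is to avoid the distance-3 expansion whenever a closer match exists. Norvig's original chains the tiers with `or`, so later tiers are only evaluated when earlier ones come up empty. The sketch below is one way to do that under stated assumptions: `NWORDS` is a toy table, `expand` and the `keep` parameter are hypothetical names, and `keep` stands in for the soundex comparison. Note that this changes the result compared with taking the union of all tiers, because a closer match now wins over a more frequent but more distant one.

```python
import string

ALPHABET = string.ascii_lowercase
# Toy frequency table standing in for NWORDS (hypothetical data).
NWORDS = {"cat": 10, "cart": 4, "chart": 2, "dog": 7}


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that appear in the frequency table."""
    return {w for w in words if w in NWORDS}


def expand(words):
    """Apply one more edit to every word, deduplicating the result."""
    return {e for w in words for e in edits1(w)}


def correct(word, max_distance=3, keep=lambda candidate: True):
    """Return the most frequent known word at the smallest edit distance."""
    frontier = {word}
    for distance in range(max_distance + 1):
        candidates = [c for c in known(frontier) if keep(c)]
        if candidates:
            # Stop here: farther (and far larger) tiers are never built
            return max(candidates, key=NWORDS.get)
        if distance < max_distance:
            frontier = expand(frontier)
    return word  # nothing suitable found within max_distance


print(correct("caat"))
print(correct("chrt"))
```

Because each tier is built from the deduplicated set of the previous one, repeated intermediate strings are expanded only once, which also trims the work when the third tier is needed.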


Source: https://stackoverflow.com/questions/21780334/speed-up-execution-python
