Performance of the itertools library compared to plain Python code

迷失自我 2021-01-13 14:49

In an answer to my question "Find the 1-based position to which two lists are the same", I got the hint to use itertools, which is implemented in C, to speed things up.

To verify I code

2 Answers
  •  耶瑟儿~
    2021-01-13 15:20

    • First, kudos for actually timing something.
    • Second, readability is usually more important than writing fast code. If your code runs 3x faster, but you spend 2 out of every 3 weeks debugging it, it's not worth your time.
    • Third, you can also use timeit to time small bits of code. I find that approach to be a little easier than using profile. (profile is good for finding bottlenecks though).
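As a rough illustration of the timeit suggestion, here is a minimal sketch written for Python 3. The function `prefix_len` and the sample lists are hypothetical stand-ins for whatever small piece of code you want to measure. Passing a callable to `timeit.timeit` avoids the setup string, at the cost of one extra call per run.

```python
import timeit

def prefix_len(first, second):
    # Hypothetical stand-in for any small function you want to time:
    # counts the leading positions at which both sequences agree.
    count = 0
    for i in range(min(len(first), len(second))):
        if first[i] != second[i]:
            break
        count += 1
    return count

a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]

# timeit accepts a zero-argument callable; number is how many times it runs.
elapsed = timeit.timeit(lambda: prefix_len(a, b), number=10000)
print("10000 calls took %.4f s" % elapsed)

# repeat() runs several independent trials; the minimum is usually the
# least noisy figure to compare against other implementations.
trials = timeit.repeat(lambda: prefix_len(a, b), number=10000, repeat=3)
print("best of 3: %.4f s" % min(trials))
```

The absolute numbers depend on the machine and interpreter, so they are only meaningful relative to other variants timed the same way.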

    itertools is, in general, pretty fast. In this case, however, takewhile is going to slow things down because it has to call a Python function (your lambda) for every element along the way. Each function call in Python carries a noticeable amount of overhead, so that is likely what is slowing you down (there's also the cost of creating the lambda in the first place). Notice that sum with the generator expression adds a little overhead too. Ultimately, though, it appears that an explicit loop wins in this situation every time.
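To see the per-call overhead in isolation, one possible sketch (Python 3) compares an inline `==` against the same comparison routed through a lambda. The names `eq`, `compare_inline`, and `compare_via_call` are made up for this illustration and are not part of the code being discussed.

```python
import timeit

# Hypothetical helper, mirroring the kind of lambda handed to takewhile.
eq = lambda x1, x2: x1 == x2

def compare_inline(pairs):
    # The comparison happens directly in the loop body.
    hits = 0
    for x1, x2 in pairs:
        if x1 == x2:
            hits += 1
    return hits

def compare_via_call(pairs):
    # Same logic, but every comparison goes through a Python function call.
    hits = 0
    for x1, x2 in pairs:
        if eq(x1, x2):
            hits += 1
    return hits

pairs = [(i, i) for i in range(1000)]

t_inline = timeit.timeit(lambda: compare_inline(pairs), number=500)
t_call = timeit.timeit(lambda: compare_via_call(pairs), number=500)
print("inline: %.3f s, via call: %.3f s" % (t_inline, t_call))
```

Both functions return the same count, so any gap between the two timings comes from the extra call per element.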

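    from itertools import takewhile
    try:
        from itertools import izip  # Python 2: lazy version of zip
    except ImportError:
        izip = zip  # Python 3: the built-in zip is already lazy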
    
    def match_iter(self, other):
        return sum(1 for x in takewhile(lambda x: x[0] == x[1],
                                            izip(self, other)))
    
    def match_loop(self, other):
        element = 0  # result when either sequence is empty
    
        for element in range(min(len(self), len(other))):
            if self[element] == other[element]:
                element += 1
            else:
                break
    
        return element
    
    def match_loop_lambda(self, other):
        cmp = lambda x1, x2: x1 == x2
        element = 0  # result when either sequence is empty
    
        for element in range(min(len(self), len(other))):
            if cmp(self[element],other[element]):
                element += 1
            else:
                break
    
        return element
    
    def match_iter_nosum(self,other):
        element = 0
        for _ in takewhile(lambda x: x[0] == x[1],
                           izip(self, other)):
            element += 1
        return element
    
    def match_iter_izip(self,other):
        element = 0
        for x1,x2 in izip(self,other):
            if x1 == x2:
                element += 1
            else:
                break
        return element
    
    
    
    a = [0, 1, 2, 3, 4]
    b = [0, 1, 2, 3, 4, 0]
    
    import timeit
    
    # print(...) with a single argument works in both Python 2 and Python 3
    print(timeit.timeit('match_iter(a,b)', 'from __main__ import a,b,match_iter'))
    print(timeit.timeit('match_loop(a,b)', 'from __main__ import a,b,match_loop'))
    print(timeit.timeit('match_loop_lambda(a,b)', 'from __main__ import a,b,match_loop_lambda'))
    print(timeit.timeit('match_iter_nosum(a,b)', 'from __main__ import a,b,match_iter_nosum'))
    print(timeit.timeit('match_iter_izip(a,b)', 'from __main__ import a,b,match_iter_izip'))
    

    Notice, however, that the fastest version is a hybrid of a loop and itertools: an explicit loop over izip. It also happens to be easier to read (in my opinion). So we can conclude that takewhile is the slow-ish part, not necessarily itertools in general.
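For readers on Python 3, where `izip` no longer exists and the built-in `zip` is lazy, the two ends of the comparison might look roughly like the sketch below. The names `match_takewhile` and `match_zip_loop` are chosen for this illustration only. The loop at the end checks that both variants agree on a few inputs before any timing is attempted.

```python
from itertools import takewhile

def match_takewhile(first, second):
    """takewhile variant: one lambda call per compared pair."""
    same = takewhile(lambda pair: pair[0] == pair[1], zip(first, second))
    return sum(1 for _ in same)

def match_zip_loop(first, second):
    """Explicit loop over zip: no per-element Python function call."""
    count = 0
    for x1, x2 in zip(first, second):
        if x1 != x2:
            break
        count += 1
    return count

cases = [
    ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 0]),  # shorter list matches completely
    ([0, 1, 9], [0, 1, 2]),                 # mismatch at index 2
    ([], [1, 2]),                           # empty input
]
for first, second in cases:
    assert match_takewhile(first, second) == match_zip_loop(first, second)

print(match_zip_loop([0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 0]))  # prints 5
```

Because `zip` stops at the shorter input, neither variant needs an explicit `min(len(...))` bound.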
