As an answer to my question "Find the 1-based position to which two lists are the same", I got the hint to use the C library itertools to speed things up.
To verify this, I coded a test.
I use timeit to time small bits of code. I find that approach a little easier than using profile (profile is good for finding bottlenecks, though).

itertools is, in general, pretty fast. In this case, however, your takewhile is going to slow things down because itertools needs to call a function for every element along the way. Each function call in Python carries a noticeable amount of overhead, so that might be slowing you down a bit (there is also the cost of creating the lambda function in the first place). Notice that sum with the generator expression also adds a little overhead. Ultimately, though, it appears that a basic loop wins in this situation every time.
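The per-call overhead described above can be looked at in isolation with a rough sketch like the one below. It is not part of the original test: the names eq, direct and via_call are made up for illustration, it assumes Python 3.5 or later (for the globals argument of timeit.timeit), and the absolute numbers will differ from machine to machine.

```python
import timeit

# Hypothetical helper, used only for this comparison.
eq = lambda x1, x2: x1 == x2

# The same comparison, once inline and once through a Python-level call.
direct = timeit.timeit('a == b', 'a = 1; b = 1', number=200000)
via_call = timeit.timeit('eq(a, b)', 'a = 1; b = 1',
                         globals={'eq': eq}, number=200000)

print('inline comparison: %.4f s' % direct)
print('through a lambda:  %.4f s' % via_call)
```

The difference between the two figures is roughly the cost that takewhile pays once per element when it is given a lambda as its predicate.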
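from itertools import takewhile

try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy

def match_iter(self, other):
    # takewhile calls the lambda once per pair; sum counts the pairs it lets through
    return sum(1 for x in takewhile(lambda x: x[0] == x[1],
                                    izip(self, other)))

def match_loop(self, other):
    element = 0  # keeps the result defined when either list is empty
    for element in range(min(len(self), len(other))):
        if self[element] == other[element]:
            element += 1  # count this match; the loop rebinds element on the next pass
        else:
            break
    return element

def match_loop_lambda(self, other):
    # same loop as match_loop, but the comparison goes through a lambda call
    cmp = lambda x1, x2: x1 == x2
    element = 0
    for element in range(min(len(self), len(other))):
        if cmp(self[element], other[element]):
            element += 1
        else:
            break
    return element

def match_iter_nosum(self, other):
    # takewhile without the sum / generator-expression layer
    element = 0
    for _ in takewhile(lambda x: x[0] == x[1],
                       izip(self, other)):
        element += 1
    return element

def match_iter_izip(self, other):
    # explicit loop over izip: no per-element function call
    element = 0
    for x1, x2 in izip(self, other):
        if x1 == x2:
            element += 1
        else:
            break
    return element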
a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]

import timeit

# print(...) with a single argument behaves the same on Python 2 and Python 3
print(timeit.timeit('match_iter(a,b)', 'from __main__ import a,b,match_iter'))
print(timeit.timeit('match_loop(a,b)', 'from __main__ import a,b,match_loop'))
print(timeit.timeit('match_loop_lambda(a,b)', 'from __main__ import a,b,match_loop_lambda'))
print(timeit.timeit('match_iter_nosum(a,b)', 'from __main__ import a,b,match_iter_nosum'))
print(timeit.timeit('match_iter_izip(a,b)', 'from __main__ import a,b,match_iter_izip'))
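A single timeit run can be noisy, so one option is to first check that the variants agree and then keep the best of several runs. The sketch below does that for two of the variants only. It assumes Python 3.5 or later (built-in zip in place of izip, and the globals argument of timeit.repeat), and the names match_takewhile, match_zip_loop and timings are chosen here for illustration rather than taken from the code above.

```python
import timeit
from itertools import takewhile

def match_takewhile(xs, ys):
    # One lambda call per pair, plus the generator expression feeding sum().
    return sum(1 for _ in takewhile(lambda p: p[0] == p[1], zip(xs, ys)))

def match_zip_loop(xs, ys):
    # Plain loop over zip(); no Python-level function call per pair.
    count = 0
    for x, y in zip(xs, ys):
        if x != y:
            break
        count += 1
    return count

a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]

# Both variants should agree before their timings are compared.
assert match_takewhile(a, b) == match_zip_loop(a, b) == 5

timings = {}
for func in (match_takewhile, match_zip_loop):
    # Take the minimum of a few repeats; it is the figure least affected by
    # other activity on the machine.
    runs = timeit.repeat('func(a, b)',
                         globals={'func': func, 'a': a, 'b': b},
                         repeat=3, number=50000)
    timings[func.__name__] = min(runs)
    print('%-16s %.4f s' % (func.__name__, timings[func.__name__]))
```

Checking agreement first guards against comparing the speed of two functions that do not actually compute the same thing.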
Notice, however, that the fastest version is a hybrid of a loop and itertools. This (explicit) loop over izip also happens to be easier to read (in my opinion). So we can conclude from this that takewhile is the slow-ish part, not necessarily itertools in general.