Question
While answering the question "Clunky calculation of differences between an incrementing set of numbers, is there a more beautiful way?", I came up with two solutions: one using a list comprehension and the other using itertools.starmap.
To me, list comprehension syntax looks more lucid, readable, less verbose and more Pythonic. But since starmap is readily available in itertools, I was wondering whether there has to be a reason for it.
My question is: when could starmap be preferred over a list comprehension?
Note: if it's a matter of style, then it definitely contradicts "There should be one-- and preferably only one --obvious way to do it."
Head to Head Comparison
Readability counts. --- LC
It's again a matter of perception, but to me LC is more readable than starmap.
To use starmap, you need to either import operator or define a lambda or some explicit multi-variable function, and on top of that you need an extra import from itertools.
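For concreteness, here is a minimal Python 3 sketch of the three spellings being compared for the consecutive-differences problem: starmap with operator.sub, starmap with a lambda, and a generator expression. In Python 3, zip is already lazy, so izip is not needed; the sample list is illustrative only.

```python
from itertools import starmap
from operator import sub

nums = [3, 7, 8, 12, 20]

# starmap needs a two-argument callable: either operator.sub ...
deltas_sub = list(starmap(sub, zip(nums[1:], nums)))

# ... or an inline lambda ...
deltas_lambda = list(starmap(lambda x, y: x - y, zip(nums[1:], nums)))

# ... whereas the generator expression needs no extra import or helper.
deltas_genexp = list(x - y for x, y in zip(nums[1:], nums))

print(deltas_sub)     # [4, 1, 4, 8]
print(deltas_lambda)  # [4, 1, 4, 8]
print(deltas_genexp)  # [4, 1, 4, 8]
```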
Performance --- LC
>>> import random
>>> from timeit import Timer
>>> from itertools import starmap, izip
>>> from operator import sub
>>> def using_star_map(nums):
...     delta=starmap(sub,izip(nums[1:],nums))
...     return sum(delta)/float(len(nums)-1)
...
>>> def using_LC(nums):
...     delta=(x-y for x,y in izip(nums[1:],nums))
...     return sum(delta)/float(len(nums)-1)
...
>>> nums=[random.randint(1,10) for _ in range(100000)]
>>> t1=Timer(stmt='using_star_map(nums)',setup='from __main__ import nums,using_star_map;from itertools import starmap,izip')
>>> t2=Timer(stmt='using_LC(nums)',setup='from __main__ import nums,using_LC;from itertools import izip')
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=1000)/100000)
235.03 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=1000)/100000)
181.87 usec/pass
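A possible Python 3 port of this benchmark, as a sketch: izip no longer exists (zip is lazy in Python 3), `/` is already true division, and the per-call time is computed as total time divided by `number`. (The session above divides by 100000 while running `number=1000`, so its figures are only meaningful relative to each other.) The helper named using_LC is really a generator expression, so it is called using_genexp here; NUMBER is an arbitrary choice.

```python
import random
from itertools import starmap
from operator import sub
from timeit import timeit

def using_star_map(nums):
    # zip is lazy in Python 3, so it replaces itertools.izip
    delta = starmap(sub, zip(nums[1:], nums))
    return sum(delta) / (len(nums) - 1)  # true division in Python 3

def using_genexp(nums):
    delta = (x - y for x, y in zip(nums[1:], nums))
    return sum(delta) / (len(nums) - 1)

nums = [random.randint(1, 10) for _ in range(100000)]

NUMBER = 20
for func in (using_star_map, using_genexp):
    total = timeit(lambda: func(nums), number=NUMBER)
    # per-call time is the total divided by the number of calls
    print("%s: %.2f usec/call" % (func.__name__, 1e6 * total / NUMBER))
```

No timings are claimed here; they depend on the interpreter and machine.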
Answer 1:
The difference I normally see is that map()/starmap() are most appropriate where you are literally just calling a function on every item in a list. In this case, they are a little clearer:
(f(x) for x in y)
map(f, y) # itertools.imap(f, y) in 2.x
(f(*x) for x in y)
starmap(f, y)
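To make the correspondence concrete, a small Python 3 check; the functions f and g and the sample data are hypothetical, chosen only for illustration:

```python
from itertools import starmap

def f(x):
    return x * 10

def g(a, b):
    return a + b

y = [1, 2, 3]
pairs = [(1, 2), (3, 4)]

# map(f, y) calls f(item) for each item, like (f(x) for x in y)
assert list(map(f, y)) == [f(x) for x in y] == [10, 20, 30]

# starmap(g, pairs) calls g(*item) for each item, like (g(*x) for x in pairs)
assert list(starmap(g, pairs)) == [g(*x) for x in pairs] == [3, 7]
```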
As soon as you start needing to throw in a lambda or filter as well, you should switch to the list comprehension/generator expression, but in cases where it's a single function, the syntax feels very verbose for a generator expression or list comprehension.
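A sketch of the lambda/filter case, with hypothetical data: once both a transformation and a condition are involved, map/filter need a lambda for each step, while the comprehension states both inline.

```python
values = [-2, 5, 0, 7, -1]

# map/filter need a lambda for each step ...
doubled_positive = list(map(lambda x: x * 2, filter(lambda x: x > 0, values)))

# ... while a comprehension expresses the same thing inline.
doubled_positive_lc = [x * 2 for x in values if x > 0]

print(doubled_positive)     # [10, 14]
print(doubled_positive_lc)  # [10, 14]
```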
They are interchangeable, and when in doubt, stick to the generator expression as it's more readable in general, but in a simple case (map(int, strings), starmap(Vector, points)) using map()/starmap() can sometimes make things easier to read.
Example:
An example where I think starmap() is more readable:
from collections import namedtuple
from itertools import starmap
points = [(10, 20), (20, 10), (0, 0), (20, 20)]
Vector = namedtuple("Vector", ["x", "y"])
for vector in (Vector(*point) for point in points):
    ...
for vector in starmap(Vector, points):
    ...
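A runnable version of the Vector example, as a sketch. It also shows one further spelling not in the answer: namedtuple classes provide a `_make` classmethod that takes a single iterable, so plain map can build the same vectors without any unpacking.

```python
from collections import namedtuple
from itertools import starmap

points = [(10, 20), (20, 10), (0, 0), (20, 20)]
Vector = namedtuple("Vector", ["x", "y"])

# starmap unpacks each (x, y) tuple into Vector(x, y)
vectors = list(starmap(Vector, points))

# namedtuple's _make classmethod takes one iterable, so plain map also works
vectors_make = list(map(Vector._make, points))

print(vectors[0])  # Vector(x=10, y=20)
for vector in vectors:
    print(vector.x + vector.y)
```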
And for map():
values = ["10", "20", "0"]
for number in (int(x) for x in values):
    ...
for number in map(int, values):
    ...
Performance:
python -m timeit -s "from itertools import starmap" -s "from operator import sub" -s "numbers = zip(range(100000), range(100000))" "sum(starmap(sub, numbers))"
1000000 loops, best of 3: 0.258 usec per loop
python -m timeit -s "numbers = zip(range(100000), range(100000))" "sum(x-y for x, y in numbers)"
1000000 loops, best of 3: 0.446 usec per loop
For constructing a namedtuple:
python -m timeit -s "from itertools import starmap" -s "from collections import namedtuple" -s "numbers = zip(range(100000), reversed(range(100000)))" -s "Vector = namedtuple('Vector', ['x', 'y'])" "list(starmap(Vector, numbers))"
1000000 loops, best of 3: 0.98 usec per loop
python -m timeit -s "from collections import namedtuple" -s "numbers = zip(range(100000), reversed(range(100000)))" -s "Vector = namedtuple('Vector', ['x', 'y'])" "[Vector(*pos) for pos in numbers]"
1000000 loops, best of 3: 0.375 usec per loop
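One caveat when reading these timings on Python 3: `zip()` returns a one-shot iterator, and timeit runs the `-s` setup once per repeat, so only the first loop sees all 100000 pairs and every later loop works on an exhausted iterator. That would explain per-loop figures well under a microsecond. A minimal sketch of the effect, with illustrative data:

```python
from itertools import starmap
from operator import sub

numbers = zip(range(1, 6), range(5))  # pairs (1, 0), (2, 1), ... each differing by 1

first = sum(starmap(sub, numbers))   # consumes the zip iterator
second = sum(starmap(sub, numbers))  # iterator is now empty, so this is 0

# Materializing the pairs makes every pass do the same work.
pairs = list(zip(range(1, 6), range(5)))
assert sum(starmap(sub, pairs)) == sum(starmap(sub, pairs)) == 5

print(first, second)  # 5 0
```

In the command-line benchmarks, the corresponding change would be a setup such as `-s "numbers = list(zip(range(100000), range(100000)))"`.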
In my tests, where we are talking about using simple functions (no lambda), starmap() is faster than the equivalent generator expression. Naturally, performance should take a back seat to readability unless it's a proven bottleneck.
Example of how a lambda kills any performance gain: the same example as in the first set, but with a lambda instead of operator.sub():
python -m timeit -s "from itertools import starmap" -s "numbers = zip(range(100000), range(100000))" "sum(starmap(lambda x, y: x-y, numbers))"
1000000 loops, best of 3: 0.546 usec per loop
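A sketch for re-running this comparison in Python 3 over a materialized list of pairs (a zip object stored once in setup would be consumed by the first call). In CPython, operator.sub is implemented in C while a lambda is a Python-level function, which is the usual explanation for the lambda penalty. The dictionary labels and NUMBER are arbitrary; no timings are claimed.

```python
from itertools import starmap
from operator import sub
from timeit import timeit

# Materialized once, so every timed call sees all 100000 pairs.
numbers = list(zip(range(100000), range(100000)))

candidates = {
    "starmap(sub)": lambda: sum(starmap(sub, numbers)),
    "starmap(lambda)": lambda: sum(starmap(lambda x, y: x - y, numbers)),
    "generator expression": lambda: sum(x - y for x, y in numbers),
}

NUMBER = 20
for name, stmt in candidates.items():
    total = timeit(stmt, number=NUMBER)
    print("%-22s %.1f usec per call" % (name, 1e6 * total / NUMBER))
```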
Answer 2:
It's largely a style thing. Choose whichever you find more readable.
In relation to "There's only one way to do it", Sven Marnach kindly provides this Guido quote:
“You may think this violates TOOWTDI, but as I've said before, that was a white lie (as well a cheeky response to Perl's slogan around 2000). Being able to express intent (to human readers) often requires choosing between multiple forms that do essentially the same thing, but look different to the reader.”
In a performance hotspot, you might want to choose the solution which runs fastest (which I guess in this case would be the starmap-based one).
On performance: starmap is slower because of its destructuring; however, starmap is not necessary here:
from timeit import Timer
import random
from itertools import starmap, izip, imap
from operator import sub
def using_imap(nums):
    delta=imap(sub,nums[1:],nums[:-1])
    return sum(delta)/float(len(nums)-1)
def using_LC(nums):
    delta=(x-y for x,y in izip(nums[1:],nums))
    return sum(delta)/float(len(nums)-1)
nums=[random.randint(1,10) for _ in range(100000)]
t1=Timer(stmt='using_imap(nums)',setup='from __main__ import nums,using_imap')
t2=Timer(stmt='using_LC(nums)',setup='from __main__ import nums,using_LC')
On my computer:
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=1000)/100000)
172.86 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=1000)/100000)
178.62 usec/pass
imap comes out a tiny bit faster, probably because it avoids zipping/destructuring.
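In Python 3, itertools.imap is gone, but the built-in map accepts several iterables the same way, calling the function with one item from each. A minimal sketch with illustrative data:

```python
from operator import sub

nums = [3, 7, 8, 12, 20]

# map() with two iterables calls sub(a, b) pairwise, without building tuples
deltas = list(map(sub, nums[1:], nums[:-1]))
print(deltas)  # [4, 1, 4, 8]

mean_delta = sum(deltas) / (len(nums) - 1)
print(mean_delta)  # 4.25
```

Note that map stops at the shortest iterable; here both slices have the same length.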
Answer 3:
About starmap:
Let's say you have L = [(0,1,2),(3,4,5),(6,7,8),..].
A generator expression would look like
(f(a,b,c) for a,b,c in L)
or
(f(*item) for item in L)
and starmap would look like
starmap(f, L)
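The forms are not quite identical: starmap(f, L) and (f(*item) for item in L) pass however many values each tuple holds, whereas `for a,b,c in L` requires exactly three per tuple. A small sketch with hypothetical variable-length data:

```python
from itertools import starmap

L = [(0, 1, 2), (3, 4), (6, 7, 8, 9)]  # tuples of different lengths

# starmap and the *item form both forward every value in each tuple
assert list(starmap(max, L)) == [max(*item) for item in L] == [2, 4, 9]

# Explicit unpacking "for a, b, c in L" needs exactly three values per tuple
try:
    [max(a, b, c) for a, b, c in L]
except ValueError as exc:
    print("unpacking failed:", exc)
```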
The third variant is lighter and shorter. But the first one is very obvious, and it doesn't force me to think about what it does.
OK. Now suppose I want to write more complicated inline code:
some_result = starmap(f_res, [starmap(f1,L1), starmap(f2,L2), starmap(f3,L3)])
This line is not obvious, but still easy to understand. As a generator expression it would look like:
some_result = (f_res(a,b,c) for a,b,c in [(f1(a,b,c) for a,b,c in L1), (f2(a,b,c) for a,b,c in L2), (f3(a,b,c) for a,b,c in L3)])
As you can see, it is long, hard to understand, and cannot be placed on one line, because it is longer than 79 characters (PEP 8). Even the shorter variant is bad:
some_result = (f_res(*item) for item in [(f1(*item) for item in L1), (f2(*item) for item in L2), (f3(*item) for item in L3)])
Too many characters. Too many brackets. Too much noise.
So starmap is a very useful tool in some cases: with it you can write less code that is simpler to understand.
EDIT: added some dummy tests
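A runnable sketch of the nested example, with hypothetical three-argument functions f1, f2, f3 and f_res and hypothetical data. It relies on each of L1, L2 and L3 holding exactly three 3-tuples, because starmap(f_res, ...) unpacks each inner iterator into the three parameters of f_res.

```python
from itertools import starmap

# Hypothetical three-argument functions, for illustration only.
def f1(a, b, c):
    return a + b + c

def f2(a, b, c):
    return a * b * c

def f3(a, b, c):
    return max(a, b, c)

def f_res(a, b, c):
    return (a, b, c)

# Each input list holds exactly three 3-tuples.
L1 = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
L2 = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
L3 = [(0, 5, 1), (9, 2, 4), (3, 3, 8)]

with_starmap = list(starmap(f_res, [starmap(f1, L1), starmap(f2, L2), starmap(f3, L3)]))

with_genexp = list(
    f_res(*item)
    for item in [(f1(*t) for t in L1), (f2(*t) for t in L2), (f3(*t) for t in L3)]
)

assert with_starmap == with_genexp
print(with_starmap)  # [(6, 15, 24), (1, 8, 27), (5, 9, 8)]
```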
from timeit import timeit
print timeit("from itertools import starmap\nL = [(0,1,2),(3,4,5),(6,7,8)]\nt=list((max(a,b,c)for a,b,c in L))")
print timeit("from itertools import starmap\nL = [(0,1,2),(3,4,5),(6,7,8)]\nt=list((max(*item)for item in L))")
print timeit("from itertools import starmap\nL = [(0,1,2),(3,4,5),(6,7,8)]\nt=list(starmap(max,L))")
Outputs (Python 2.7.2):
5.23479851154
5.35265309689
4.48601346328
So, starmap is even ~15% faster here.
Source: https://stackoverflow.com/questions/10448486/whenstarmap-could-be-preferred-over-list-comprehension