When could `starmap` be preferred over a list comprehension?

Submitted by 南楼画角 on 2019-12-19 02:42:10

Question


While answering the question "Clunky calculation of differences between an incrementing set of numbers, is there a more beautiful way?", I came up with two solutions: one with a list comprehension and the other using itertools.starmap.

To me, the list comprehension syntax looks more lucid, more readable, less verbose, and more Pythonic. Still, since starmap is readily available in itertools, I assume there has to be a reason for it.

My question is: when could starmap be preferred over a list comprehension?

Note: if it is a matter of style, then it definitely contradicts "There should be one-- and preferably only one --obvious way to do it."

Head to Head Comparison

Readability counts. --- LC

It's again a matter of perception, but to me LC is more readable than starmap. To use starmap, you need to import operator, define a lambda, or write an explicit multi-argument function, and on top of that add an extra import from itertools.

Performance --- LC

>>> import random
>>> from timeit import Timer
>>> from itertools import starmap, izip
>>> from operator import sub
>>> def using_star_map(nums):
...     delta = starmap(sub, izip(nums[1:], nums))
...     return sum(delta) / float(len(nums) - 1)
...
>>> def using_LC(nums):
...     # despite the name, this is a generator expression
...     delta = (x - y for x, y in izip(nums[1:], nums))
...     return sum(delta) / float(len(nums) - 1)
...
>>> nums = [random.randint(1, 10) for _ in range(100000)]
>>> t1 = Timer(stmt='using_star_map(nums)', setup='from __main__ import nums,using_star_map;from itertools import starmap,izip')
>>> t2 = Timer(stmt='using_LC(nums)', setup='from __main__ import nums,using_LC;from itertools import izip')
>>> # number=1000 but the divisor is 100000: compare the two figures, not their absolute size
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=1000)/100000)
235.03 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=1000)/100000)
181.87 usec/pass
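
The session above targets Python 2 (`izip`, the `print` statement). As a rough sketch for Python 3, where the built-in `zip` is already lazy, the same two functions might look as follows. The names `mean_delta_starmap` and `mean_delta_genexp` are made up for this illustration, and no timings are claimed.

```python
import random
import timeit
from itertools import starmap
from operator import sub


def mean_delta_starmap(nums):
    # Pair each element with its predecessor and subtract via operator.sub.
    delta = starmap(sub, zip(nums[1:], nums))
    return sum(delta) / (len(nums) - 1)


def mean_delta_genexp(nums):
    # Same computation written as a generator expression.
    delta = (x - y for x, y in zip(nums[1:], nums))
    return sum(delta) / (len(nums) - 1)


if __name__ == "__main__":
    nums = [random.randint(1, 10) for _ in range(100000)]
    for func in (mean_delta_starmap, mean_delta_genexp):
        # timeit accepts a callable; number=20 keeps the run short.
        seconds = timeit.timeit(lambda: func(nums), number=20)
        print("%s: %.2f usec/pass" % (func.__name__, 1e6 * seconds / 20))
```

Dividing by the same `number` that was passed to `timeit` gives a true per-pass figure.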

Answer 1:


The difference I normally see is that map()/starmap() are most appropriate where you are literally just calling a function on every item in a list. In this case, they are a little clearer:

(f(x) for x in y)
map(f, y) # itertools.imap(f, y) in 2.x

(f(*x) for x in y)
starmap(f, y)

As soon as you need to throw in a lambda or a filter as well, you should switch to the list comprehension/generator expression, but where it's a single function, the generator expression or list comprehension syntax feels very verbose.
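
To make that trade-off concrete, here is a minimal sketch with made-up data (`pairs` is purely illustrative). It shows the plain-function case, where `starmap` stays short, next to a filtered case, where the map-style spelling needs `filter` plus two lambdas.

```python
from itertools import starmap
from operator import sub

pairs = [(5, 2), (1, 4), (9, 3)]

# Plain function call on every item: starmap reads cleanly.
all_diffs = list(starmap(sub, pairs))

# Add a condition and the map-style spelling needs filter plus lambdas ...
positive_map = list(
    starmap(lambda x, y: x - y, filter(lambda p: p[0] > p[1], pairs))
)

# ... while the comprehension states the same thing directly.
positive_comp = [x - y for x, y in pairs if x > y]

print(all_diffs)      # [3, -3, 6]
print(positive_map)   # [3, 6]
print(positive_comp)  # [3, 6]
```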

They are interchangeable, and when in doubt, stick to the generator expression as it's more readable in general, but in a simple case (map(int, strings), starmap(Vector, points)) using map()/starmap() can sometimes make things easier to read.

Example:

An example where I think starmap() is more readable:

from collections import namedtuple
from itertools import starmap

points = [(10, 20), (20, 10), (0, 0), (20, 20)]

Vector = namedtuple("Vector", ["x", "y"])

for vector in (Vector(*point) for point in points):
    ...

for vector in starmap(Vector, points):
    ...

And for map():

values = ["10", "20", "0"]

for number in (int(x) for x in values):
    ...

for number in map(int, values):
    ...

Performance:

python -m timeit -s "from itertools import starmap" -s "from operator import sub" -s "numbers = zip(range(100000), range(100000))" "sum(starmap(sub, numbers))"                         
1000000 loops, best of 3: 0.258 usec per loop

python -m timeit -s "numbers = zip(range(100000), range(100000))" "sum(x-y for x, y in numbers)"                          
1000000 loops, best of 3: 0.446 usec per loop

For constructing a namedtuple:

python -m timeit -s "from itertools import starmap" -s "from collections import namedtuple" -s "numbers = zip(range(100000), reversed(range(100000)))" -s "Vector = namedtuple('Vector', ['x', 'y'])" "list(starmap(Vector, numbers))"
1000000 loops, best of 3: 0.98 usec per loop

python -m timeit -s "from collections import namedtuple" -s "numbers = zip(range(100000), reversed(range(100000)))" -s "Vector = namedtuple('Vector', ['x', 'y'])" "[Vector(*pos) for pos in numbers]"
1000000 loops, best of 3: 0.375 usec per loop

In my tests, where we are talking about using simple functions (no lambda), starmap() is faster than the equivalent generator expression. Naturally, performance should take a back-seat to readability unless it's a proven bottleneck.

Example of how lambda kills any performance gain, same example as in the first set, but with lambda instead of operator.sub():

python -m timeit -s "from itertools import starmap" -s "numbers = zip(range(100000), range(100000))" "sum(starmap(lambda x, y: x-y, numbers))" 
1000000 loops, best of 3: 0.546 usec per loop
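
One caveat when reproducing the commands above: if they are run on Python 3, `zip` returns a one-shot iterator, so a `numbers` object created in the `-s` setup is exhausted after the first timed loop. A minimal sketch that materializes the pairs once is shown below. The helper name `bench` and the `candidates` dictionary are hypothetical, and no results are claimed.

```python
import timeit
from itertools import starmap
from operator import sub

# list(...) materializes the pairs so that every timed loop sees all of them.
numbers = list(zip(range(100000), range(100000)))


def bench(label, func, number=20):
    # Report the average time per call in microseconds.
    seconds = timeit.timeit(func, number=number)
    print("%-22s %10.1f usec per loop" % (label, 1e6 * seconds / number))


candidates = {
    "starmap + operator.sub": lambda: sum(starmap(sub, numbers)),
    "generator expression": lambda: sum(x - y for x, y in numbers),
    "starmap + lambda": lambda: sum(starmap(lambda x, y: x - y, numbers)),
}

if __name__ == "__main__":
    for label, func in candidates.items():
        bench(label, func)
```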



Answer 2:


It's largely a style thing. Choose whichever you find more readable.

In relation to "There's only one way to do it", Sven Marnach kindly provides this Guido quote:

“You may think this violates TOOWTDI, but as I've said before, that was a white lie (as well a cheeky response to Perl's slogan around 2000). Being able to express intent (to human readers) often requires choosing between multiple forms that do essentially the same thing, but look different to the reader.”

In a performance hotspot, you might want to choose the solution which runs fastest (which I guess in this case would be the starmap based one).

On performance - starmap is slower because of its destructuring; however starmap is not necessary here:

from timeit import Timer
import random
from itertools import starmap, izip,imap
from operator import sub

def using_imap(nums):
    delta=imap(sub,nums[1:],nums[:-1])
    return sum(delta)/float(len(nums)-1)

def using_LC(nums):
    delta=(x-y for x,y in izip(nums[1:],nums))
    return sum(delta)/float(len(nums)-1)

nums=[random.randint(1,10) for _ in range(100000)]
t1=Timer(stmt='using_imap(nums)',setup='from __main__ import nums,using_imap')
t2=Timer(stmt='using_LC(nums)',setup='from __main__ import nums,using_LC')

On my computer:

>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=1000)/100000)
172.86 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=1000)/100000)
178.62 usec/pass

imap comes out a tiny bit faster, probably because it avoids zipping/destructuring.
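
`imap` exists only in Python 2. In Python 3 the built-in `map` is lazy and likewise accepts several iterables, passing one element from each to the function. A small sketch of that form next to the `starmap`/`zip` spelling, using illustrative data:

```python
from itertools import starmap
from operator import sub

nums = [1, 4, 9, 16, 25]

# map pulls one argument from each iterable: sub(nums[1], nums[0]), ...
via_map = list(map(sub, nums[1:], nums[:-1]))

# starmap needs the arguments pre-packed into tuples, hence the zip.
via_starmap = list(starmap(sub, zip(nums[1:], nums)))

print(via_map)                          # [3, 5, 7, 9]
print(via_starmap)                      # [3, 5, 7, 9]
print(sum(via_map) / (len(nums) - 1))   # 6.0
```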




Answer 3:


About starmap: let's say you have L = [(0,1,2),(3,4,5),(6,7,8),..].

A generator expression would look like

(f(a,b,c) for a,b,c in L)

or

(f(*item) for item in L) 

And starmap would look like

starmap(f, L)

The third variant is lighter and shorter. But the first one is very obvious, and it doesn't force me to think about what it does.
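
As a quick check that the three spellings really are interchangeable, here is a small sketch. It assumes `f` is `max`, a stand-in for any function of three arguments, and uses a finite `L`.

```python
from itertools import starmap

L = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
f = max  # stand-in for any function of three arguments

explicit = list(f(a, b, c) for a, b, c in L)   # names each argument
unpacked = list(f(*item) for item in L)        # unpacks each tuple
mapped = list(starmap(f, L))                   # lets starmap unpack

print(explicit, unpacked, mapped)  # [2, 5, 8] [2, 5, 8] [2, 5, 8]
```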

OK. Now I want to write more complicated inline code:

some_result = starmap(f_res, [starmap(f1,L1), starmap(f2,L2), starmap(f3,L3)])

This line is not obvious, but still easy to understand. With generator expressions it would look like:

some_result = (f_res(a,b,c) for a,b,c in [(f1(a,b,c) for a,b,c in L1), (f2(a,b,c) for a,b,c in L2), (f3(a,b,c) for a,b,c in L3)])

As you can see, it is long, hard to understand, and cannot be placed on one line, because it is longer than 79 characters (PEP 8). Even the shorter variant is bad:

some_result = (f_res(*item) for item in [(f1(*item) for item in L1), (f2(*item) for item in L2), (f3(*item) for item in L3)])

Too many characters.. Too many brackets.. Too much noise.

So starmap is, in some cases, a very useful tool. With it you can write less code that is simpler to understand.
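
The nested form can be tried with concrete stand-ins. In this sketch `f1`, `f2`, `f3` and `f_res` are bound to functions from `operator`, and `L1`, `L2`, `L3` are hypothetical inputs. Note that the outer `starmap` unpacks each inner `starmap`, so every inner list must yield exactly as many values as `f_res` takes arguments.

```python
from itertools import starmap
from operator import add, mul, sub

# Hypothetical inputs: each list holds two argument tuples, so every inner
# starmap yields two values, which the outer starmap unpacks into f_res(a, b).
L1 = [(1, 2), (3, 4)]
L2 = [(5, 1), (9, 4)]
L3 = [(2, 3), (4, 5)]
f1, f2, f3, f_res = add, sub, mul, add

nested = list(starmap(f_res, [starmap(f1, L1), starmap(f2, L2), starmap(f3, L3)]))

# The same computation with generator expressions only.
spelled_out = [
    f_res(*inner)
    for inner in (
        (f1(*item) for item in L1),
        (f2(*item) for item in L2),
        (f3(*item) for item in L3),
    )
]

print(nested)       # [10, 9, 26]
print(spelled_out)  # [10, 9, 26]
```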

EDIT: added some dummy tests

from timeit import timeit
print timeit("from itertools import starmap\nL = [(0,1,2),(3,4,5),(6,7,8)]\nt=list((max(a,b,c)for a,b,c in L))")
print timeit("from itertools import starmap\nL = [(0,1,2),(3,4,5),(6,7,8)]\nt=list((max(*item)for item in L))")
print timeit("from itertools import starmap\nL = [(0,1,2),(3,4,5),(6,7,8)]\nt=list(starmap(max,L))")

Output (Python 2.7.2):

5.23479851154
5.35265309689
4.48601346328

So, starmap is even ~15% faster here.



Source: https://stackoverflow.com/questions/10448486/whenstarmap-could-be-preferred-over-list-comprehension
