Speed of “sum” comprehension in Python


Question


I was under the impression that using a sum construction was much faster than running a for loop. However, in the following code, the for loop actually runs faster:

import time

Score = [[3,4,5,6,7,8] for i in range(40)]

a=[0,1,2,3,4,5,4,5,2,1,3,0,5,1,0,3,4,2,2,4,4,5,1,2,5,4,3,2,0,1,1,0,2,0,0,0,1,3,2,1]

def ver1():
    for i in range(100000):
        total = 0
        for j in range(40):
            total+=Score[j][a[j]]
    print (total)

def ver2():
    for i in range(100000):
        total = sum(Score[j][a[j]] for j in range(40))
    print (total)


t0 = time.time()
ver1()
t1 = time.time()
ver2()
t2 = time.time()

print("Version 1 time: ", t1-t0)
print("Version 2 time: ", t2-t1)

The output is:

208
208
Version 1 time:  0.9300529956817627
Version 2 time:  1.066061019897461

Am I doing something wrong? Is there a way to do this faster?

(Note that this is just a demo I set up; in my real application the scores will not be repeated in this manner.)

Some additional info: This is run on Python 3.4.4 64-bit, on Windows 7 64-bit, on an i7.


Answer 1:


This seems to depend on the system and probably on the Python version. On my system, the difference is about 13%:

python sum.py 
208
208
('Version 1 time: ', 0.6371259689331055)
('Version 2 time: ', 0.7342419624328613)

The two versions are not measuring sum versus manual looping, because the loop "bodies" are not identical. ver2 does more work because it creates the generator expression 100000 times, while ver1's loop body is almost trivial, although it still builds a 40-element list from range(40) on every iteration. You can change the example so that both versions do identical per-iteration work, and then you see the effect of sum:

def ver1():
    r = [Score[j][a[j]] for j in range(40)]
    for i in xrange(100000):
        total = 0
        for j in r:
            total+=j
    print (total)

def ver2():
    r = [Score[j][a[j]] for j in xrange(40)]
    for i in xrange(100000):
        total = sum(r)
    print (total)

I've moved everything out of the inner loop body and out of the sum call to make sure that we are measuring only the overhead of the hand-crafted loop. Using xrange instead of range further improves the overall runtime, but this applies to both versions and thus does not change the comparison. The results of the modified code on my system are:

python sum.py
208
208
('Version 1 time: ', 0.2034609317779541)
('Version 2 time: ', 0.04234910011291504)

ver2 is five times faster than ver1. This is the pure performance gain of using sum instead of a hand-crafted loop.
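
As a rough way to reproduce this isolated comparison (a hypothetical sketch, not part of the original answer), timeit can time just the two loop bodies against the precomputed list r; absolute numbers will of course vary with machine and Python version:

import timeit

# Shared setup: build Score, a and the precomputed 40-value list r once.
setup = """
Score = [[3, 4, 5, 6, 7, 8] for i in range(40)]
a = [0,1,2,3,4,5,4,5,2,1,3,0,5,1,0,3,4,2,2,4,4,5,1,2,5,4,3,2,0,1,1,0,2,0,0,0,1,3,2,1]
r = [Score[j][a[j]] for j in range(40)]
"""

# Hand-crafted loop over the precomputed values.
manual = """
total = 0
for j in r:
    total += j
"""

print(timeit.timeit(manual, setup=setup, number=100000))    # loop version
print(timeit.timeit("sum(r)", setup=setup, number=100000))  # sum version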

Inspired by ShadowRanger's comment on the question about lookups, I have modified the example to check how much of the difference in the original code comes from looking up the bound symbols Score and a:

def gen(s,b):
    for j in xrange(40):
        yield s[j][b[j]]

def ver2():
    for i in range(100000):
        total = sum(gen(Score, a))
    print (total)

I create a small custom generator which locally binds Score and a to prevent expensive lookups in parent scopes. Executing this:

python sum.py
208
208
('Version 1 time: ', 0.6167840957641602)
('Version 2 time: ', 0.6198039054870605)

The symbol lookups alone account for ~12% of the runtime.
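
A common alternative (a sketch of mine, not from the original answer) is to bind the globals as default parameters, so the generator expression sees them as local/closure names instead of globals:

def ver2_bound(Score=Score, a=a):
    # Score and a are now locals of ver2_bound, so the generator
    # expression resolves them as closure variables, not globals.
    for i in range(100000):
        total = sum(Score[j][a[j]] for j in range(40))
    print (total)

On CPython this should give a similar effect to the custom generator, without the extra helper function.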




Answer 2:


Since j is iterating over both lists, I thought I'd see if zip worked any better:

def ver3():
    for i in range(100000):
        total = sum(s[i] for s,i in zip(Score,a))
    print (total)

On Py2 this runs about 30% slower than version 2, but on Py3 about 20% faster than version 1. If I change zip to izip (imported from itertools), this cuts the time down to between versions 1 and 2.
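
For reference, here is a sketch of the izip variant referred to above (Python 2 only; on Python 3, zip is already lazy, so no change is needed):

from itertools import izip

def ver3_izip():
    for i in range(100000):
        # izip avoids materialising the zipped list on Python 2.
        total = sum(s[k] for s, k in izip(Score, a))
    print (total)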



Source: https://stackoverflow.com/questions/35191815/speed-of-sum-comprehension-in-python
