Python built-in sum function vs. for loop performance

野性不改 2020-12-03 08:34

I noticed that Python's built-in sum function is roughly 3x faster than a for loop when summing a list of 1 000 000 integers:

import timeit

de         


        
    小蘑菇 (original poster)
    2020-12-03 09:00

    The speed difference is actually greater than 3x, but you slow down both versions by first creating a huge in-memory list of 1 million integers. Move that out of the timed code:

    >>> import timeit
    >>> def sum1(lst):
    ...     s = 0
    ...     for i in lst:
    ...         s += i
    ...     return s
    ... 
    >>> def sum2(lst):
    ...     return sum(lst)
    ... 
    >>> values = list(range(1000000))  # list() needed on Python 3, where range() is lazy
    >>> timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)
    3.457869052886963
    >>> timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)
    0.6696369647979736
    

    The speed difference is now more than 5x.
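    For Python 3, a script version of the same comparison might look like the sketch below. It also times sum() with the list construction included, to show how much of the original measurement that step accounted for. N and the printed labels are illustrative choices, not part of the original answer, and absolute timings will vary with machine and interpreter version.

```python
import timeit

N = 1_000_000


def sum1(lst):
    # Explicit loop: every iteration executes interpreted bytecode.
    s = 0
    for i in lst:
        s += i
    return s


# Build the input once, outside every timed statement.
values = list(range(N))

# Each result is the total time in seconds for `number` runs.
incl_build = timeit.timeit(lambda: sum(list(range(N))), number=10)
sum_only = timeit.timeit(lambda: sum(values), number=10)
loop_only = timeit.timeit(lambda: sum1(values), number=10)

print(f"sum() including list build: {incl_build:.3f} s")
print(f"sum() on pre-built list:    {sum_only:.3f} s")
print(f"for loop on pre-built list: {loop_only:.3f} s")
print(f"loop / sum() ratio:         {loop_only / sum_only:.1f}x")
```

    Passing callables to timeit.timeit keeps only the summation inside the timed code, which is the same separation the session above achieves with its setup string.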

    A for loop is executed as interpreted Python bytecode, whereas sum() runs its loop entirely in C code, and interpreted bytecode is much slower than C.

    In addition, the C code avoids creating new Python objects whenever it can keep the running sum in a C type instead; this works for int results (while the total fits in a C long) and for float results (kept in a C double).
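    One way to check where each loop runs is to list the bytecode instructions of both functions with dis.get_instructions (available since Python 3.4). In this sketch, only sum1 contains a FOR_ITER instruction; in sum2 the iteration happens inside the C function behind sum(). The opnames helper is illustrative, and the exact opcode names vary between CPython versions.

```python
import dis


def sum1(lst):
    # The loop compiles to bytecode that the interpreter executes step by step.
    s = 0
    for i in lst:
        s += i
    return s


def sum2(lst):
    # A single call: the iteration happens inside the C implementation of sum().
    return sum(lst)


def opnames(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


ops1 = opnames(sum1)
ops2 = opnames(sum2)

print("sum1 contains a bytecode loop:", "FOR_ITER" in ops1)  # True
print("sum2 contains a bytecode loop:", "FOR_ITER" in ops2)  # False
```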

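    Disassembled, the Python version does this (output from an older CPython release; current versions emit different opcodes, for example BINARY_OP in place of INPLACE_ADD):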

    >>> import dis
    >>> def sum1():
    ...     s = 0
    ...     for i in range(1000000):
    ...         s += i
    ...     return s
    ... 
    >>> dis.dis(sum1)
      2           0 LOAD_CONST               1 (0)
                  3 STORE_FAST               0 (s)
    
      3           6 SETUP_LOOP              30 (to 39)
                  9 LOAD_GLOBAL              0 (range)
                 12 LOAD_CONST               2 (1000000)
                 15 CALL_FUNCTION            1
                 18 GET_ITER            
            >>   19 FOR_ITER                16 (to 38)
                 22 STORE_FAST               1 (i)
    
      4          25 LOAD_FAST                0 (s)
                 28 LOAD_FAST                1 (i)
                 31 INPLACE_ADD         
                 32 STORE_FAST               0 (s)
                 35 JUMP_ABSOLUTE           19
            >>   38 POP_BLOCK           
    
      5     >>   39 LOAD_FAST                0 (s)
                 42 RETURN_VALUE        
    

    Apart from the interpreter loop being slower than C, each INPLACE_ADD creates a new integer object once the running sum exceeds 256 (CPython caches only the small ints from -5 through 256 as singletons).
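    The small-int cache is a CPython implementation detail, not a language guarantee. A minimal sketch for observing it, assuming CPython: same_object is a hypothetical helper that computes the same value twice at run time and compares the two results with is. Values inside the cache come back as one shared object; larger values are fresh objects each time.

```python
def same_object(n):
    """True if two separately computed ints equal to n are the same object."""
    # Both values are computed at run time, so neither is a shared constant.
    a = (n - 1) + 1
    b = (n + 1) - 1
    return a is b


# Scan a range around the cache boundaries and keep the values that are shared.
cached = [n for n in range(-10, 300) if same_object(n)]

print("smallest shared value:", cached[0])
print("largest shared value:", cached[-1])
print("one million shared?", same_object(1_000_000))  # False
```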

    You can read the C implementation of sum() in Python/bltinmodule.c in the CPython source repository; its comments state this explicitly:

    /* Fast addition by keeping temporary sums in C instead of new Python objects.
       Assumes all inputs are the same type.  If the assumption fails, default
       to the more general routine.
    */
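    The comment describes a fast path with a fallback. Below is a rough pure-Python model of that control flow, under loudly stated assumptions: sum_model is a hypothetical name, an ordinary variable stands in for the C long that the real code uses, the separate float fast path is folded into the general routine, and error checks (such as rejecting str start values) are omitted. It shows the fallback logic only, not the speed.

```python
from fractions import Fraction


def sum_model(iterable, start=0):
    """Hypothetical model of sum()'s fast path plus general fallback."""
    it = iter(iterable)
    total = start
    if type(total) is int:
        # Fast path: assume every item is an exact int.
        for item in it:
            if type(item) is int:
                total += item
            else:
                # Assumption failed: add this item generically, then
                # continue with the general routine for the rest.
                total = total + item
                break
    # General routine: plain object addition for any remaining items.
    # If the fast path consumed everything, this loop does nothing.
    for item in it:
        total = total + item
    return total


for data, start in [(range(10), 0), ([1, 2, 0.5], 0),
                    ([Fraction(1, 3), Fraction(2, 3)], 0), ([[1], [2]], [])]:
    print(sum_model(data, start), sum(data, start))
```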
    
