itertools.product slower than nested for loops

再見小時候 2020-12-19 02:34

I am trying to use the itertools.product function to make a segment of my code (in an isotopic pattern simulator) easier to read and hopefully faster as well.

3 Answers
  • 2020-12-19 03:08

    Your original itertools code spends a lot of extra time in the needless lambda and in building lists of intermediate values by hand; much of this can be replaced with built-in functionality.

    Now, the inner for loop does add quite a lot of extra overhead: just try the following, and the performance is very much on par with your original code:

    for a in itertools.product(carbons,hydrogens,nitrogens,oxygens17,
                               oxygens18,sulfurs33,sulfurs34,sulfurs36):
        i, j, k, l, m, n, o, p = a
        totals.append((i[0]+j[0]+k[0]+l[0]+m[0]+n[0]+o[0]+p[0],
                       i[1]*j[1]*k[1]*l[1]*m[1]*n[1]*o[1]*p[1]))
    

    The following code runs as much as possible on the CPython builtin side, and I tested it to be equivalent to your code. Notably, it uses zip(*iterable) to unzip each of the product results, reduce with operator.mul for the product and sum for the summing, and two generator expressions for going through the lists. The hardcoded for loop still wins slightly, but being hardcoded it is probably not what you can use in the long run.

    import itertools
    from operator import mul
    from functools import partial, reduce
    
    prod = partial(reduce, mul)
    elems = carbons, hydrogens, nitrogens, oxygens17, oxygens18, sulfurs33, sulfurs34, sulfurs36
    p = itertools.product(*elems)
    
    totals = [
        ( sum(massdiffs), prod(chances) )
        for massdiffs, chances in
        ( zip(*i) for i in p )
    ]
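
    As a tiny standalone illustration of the zip(*i) unzipping step used above (the (mass difference, probability) pairs here are made up for the example):

    >>> pairs = ((1.003, 0.011), (0.0, 0.989), (2.004, 0.0002))
    >>> massdiffs, chances = zip(*pairs)
    >>> massdiffs
    (1.003, 0.0, 2.004)
    >>> chances
    (0.011, 0.989, 0.0002)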
    
  • 2020-12-19 03:08

    I timed these two functions, which use the absolute minimum of extra code:

    from itertools import product

    def nested_for(first_iter, second_iter):
        for i in first_iter:
            for j in second_iter:
                pass
    
    def using_product(first_iter, second_iter):
        for i in product(first_iter, second_iter):
            pass
    

    Their bytecode instructions are similar:

    dis(nested_for)
      2           0 SETUP_LOOP              26 (to 28)
                  2 LOAD_FAST                0 (first_iter)
                  4 GET_ITER
            >>    6 FOR_ITER                18 (to 26)
                  8 STORE_FAST               2 (i)
    
      3          10 SETUP_LOOP              12 (to 24)
                 12 LOAD_FAST                1 (second_iter)
                 14 GET_ITER
            >>   16 FOR_ITER                 4 (to 22)
                 18 STORE_FAST               3 (j)
    
      4          20 JUMP_ABSOLUTE           16
            >>   22 POP_BLOCK
            >>   24 JUMP_ABSOLUTE            6
            >>   26 POP_BLOCK
            >>   28 LOAD_CONST               0 (None)
                 30 RETURN_VALUE
    
    dis(using_product)
      2           0 SETUP_LOOP              18 (to 20)
                  2 LOAD_GLOBAL              0 (product)
                  4 LOAD_FAST                0 (first_iter)
                  6 LOAD_FAST                1 (second_iter)
                  8 CALL_FUNCTION            2
                 10 GET_ITER
            >>   12 FOR_ITER                 4 (to 18)
                 14 STORE_FAST               2 (i)
    
      3          16 JUMP_ABSOLUTE           12
            >>   18 POP_BLOCK
            >>   20 LOAD_CONST               0 (None)
                 22 RETURN_VALUE
    

    And here are the results:

    >>> timer = partial(timeit, number=1000, globals=globals())
    >>> timer("nested_for(range(100), range(100))")
    0.1294467518782625
    >>> timer("using_product(range(100), range(100))")
    0.4335527486212385
    

    Additional tests via timeit and manual use of perf_counter were consistent with the results above: using product is clearly substantially slower than nested for loops. However, based on the tests already shown in the other answers, the gap between the two approaches narrows as the number of nested loops (and, of course, the size of each tuple in the Cartesian product) grows.
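
    As a rough sketch of how one might check that claim at a deeper nesting level (the helper names and input sizes below are made up for illustration, not taken from the timings above):

    from functools import partial
    from itertools import product
    from timeit import timeit

    def nested_for_3(a, b, c):
        # three hand-written nested loops
        for i in a:
            for j in b:
                for k in c:
                    pass

    def using_product_3(a, b, c):
        # a single loop over the three-way Cartesian product
        for t in product(a, b, c):
            pass

    timer = partial(timeit, number=100, globals=globals())
    print(timer("nested_for_3(range(30), range(30), range(30))"))
    print(timer("using_product_3(range(30), range(30), range(30))"))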

  • 2020-12-19 03:17

    My strong suspicion is that the slowness comes from the creation of temporary variables, the in-place adds, and the creation of a new function every time via lambda, as well as the function-call overhead. Just to demonstrate why the way you are doing the addition in case 2 is slower, I did this:

    import dis
    s = '''
    a = (1, 2)
    b = (2, 3)
    c = (3, 4)

    z = (a[0] + b[0] + c[0])

    t = 0
    t += a[0]
    t += b[0]
    t += c[0]
    '''
    
    x = compile(s, '', 'exec')
    
    dis.dis(x)
    

    This gives:

    <snip out variable declaration>
    5          18 LOAD_NAME                0 (a)
               21 LOAD_CONST               4 (0)
               24 BINARY_SUBSCR
               25 LOAD_NAME                1 (b)
               28 LOAD_CONST               4 (0)
               31 BINARY_SUBSCR
               32 BINARY_ADD
               33 LOAD_NAME                2 (c)
               36 LOAD_CONST               4 (0)
               39 BINARY_SUBSCR
               40 BINARY_ADD
               41 STORE_NAME               3 (z)
    

    7          50 LOAD_NAME                4 (t)
               53 LOAD_NAME                0 (a)
               56 LOAD_CONST               4 (0)
               59 BINARY_SUBSCR
               60 INPLACE_ADD
               61 STORE_NAME               4 (t)
    
    8          64 LOAD_NAME                4 (t)
               67 LOAD_NAME                1 (b)
               70 LOAD_CONST               4 (0)
               73 BINARY_SUBSCR
               74 INPLACE_ADD
               75 STORE_NAME               4 (t)
    
    9          78 LOAD_NAME                4 (t)
               81 LOAD_NAME                2 (c)
               84 LOAD_CONST               4 (0)
               87 BINARY_SUBSCR
               88 INPLACE_ADD
               89 STORE_NAME               4 (t)
               92 LOAD_CONST               5 (None)
               95 RETURN_VALUE
    

    As you can see, each += carries an overhead of two extra opcodes compared with the inline addition, because the name t has to be loaded and stored again every time. I imagine this is just the beginning, and Antti Haapala's code spends more of its time in CPython builtins calling C code than running pure Python. Function-call overhead is expensive in Python.
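
    As a minimal sketch of that function-call/lambda cost (illustrative only, not part of the original answer), compare the builtin sum, whose loop and additions run in C, with reduce over a lambda, which makes one Python-level call per accumulation step:

    from functools import reduce
    from timeit import timeit

    t = (1, 2, 3, 4, 5, 6, 7, 8)

    # builtin sum: the loop and the additions happen in C
    print(timeit("sum(t)", number=100_000, globals=globals()))

    # reduce with a lambda: one Python function call per element pair
    print(timeit("reduce(lambda a, b: a + b, t)", number=100_000, globals=globals()))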
