Fastest (most Pythonic) way to consume an iterator

孤城傲影 2020-12-15 20:48

I am curious what the fastest way to consume an iterator would be, and the most Pythonic way.

For example, say that I want to create an iterator with the map builtin purely for the side effects of the mapped function, and I just want to run through it as quickly as possible.

2 Answers
  • 2020-12-15 21:12

    While you shouldn't be creating a map object just for side effects, there is in fact a standard recipe for consuming iterators in the itertools docs:

    import collections
    from itertools import islice

    def consume(iterator, n=None):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)
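
    For example (a quick illustration of both modes, assuming the recipe above is defined):

    it = iter(range(10))
    consume(it, 3)           # advance three steps
    print(next(it))          # 3
    consume(it)              # exhaust the rest
    print(next(it, "done"))  # "done"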
    

    For just the "consume entirely" case, this can be simplified to

    def consume(iterator):
        collections.deque(iterator, maxlen=0)
    

    Using collections.deque this way avoids storing all the elements (because maxlen=0) and iterates at C speed, without bytecode interpretation overhead. There's even a dedicated fast path in the deque implementation for using a maxlen=0 deque to consume an iterator.
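
    For instance (a small illustration of my own, not from the original answer), a maxlen=0 deque stores nothing but still runs the iterator to exhaustion:

    import collections

    it = iter(range(5))
    d = collections.deque(it, maxlen=0)
    print(len(d))            # 0 -- nothing was kept
    print(next(it, "done"))  # "done" -- the iterator is fully consumed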

    Timing:

    In [1]: import collections
    
    In [2]: x = range(1000)
    
    In [3]: %%timeit
       ...: i = iter(x)
       ...: for _ in i:
       ...:     pass
       ...: 
    16.5 µs ± 829 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    In [4]: %%timeit
       ...: i = iter(x)
       ...: collections.deque(i, maxlen=0)
       ...: 
    12 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    Of course, this is all based on CPython. The entire nature of interpreter overhead is very different on other Python implementations, and the maxlen=0 fast path is specific to CPython. See abarnert's answer for other Python implementations.

  • 2020-12-15 21:18

    If you only care about CPython, deque is the fastest way, as demonstrated in user2357112's answer.1 The same result has been demonstrated on 2.7 and 3.2, on 32- vs. 64-bit builds, on Windows vs. Linux, and so on.

    But that relies on an optimization in CPython's C implementation of deque. Other implementations may have no such optimization, which means they end up doing the equivalent of a no-op append for each element.

    In PyPy in particular, there is no such optimization in the source,2 and the JIT cannot optimize those no-op appends away. (And it's hard to see how it could avoid at least a guard test on each pass through the loop.) Of course, compared to the cost of looping in Python, the cost of those appends is tiny, right? But looping in Python is blazingly fast in PyPy, almost as fast as a C loop in CPython, so this actually makes a huge difference.

    Comparing the times (using the same tests as in user2357112's answer):3

              for       deque
    CPython   19.7 µs   12.7 µs
    PyPy       1.37 µs  23.3 µs
    

    There are no 3.x versions of the other major interpreters, and I don't have IPython for any of them, but a quick test with Jython shows similar effects.

    So, the fastest portable implementation is something like:

    import sys

    if sys.implementation.name == 'cpython':
        import collections
        def consume(it):
            return collections.deque(it, maxlen=0)
    else:
        def consume(it):
            for _ in it:
                pass
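
    To reproduce the comparison yourself, a minimal timing harness (my own sketch, assuming the consume defined above and the same range(1000) input as the earlier timings) could look like:

    import sys
    import timeit

    x = range(1000)
    # timeit reports total seconds for `number` runs; divide to get per-loop time.
    t = timeit.timeit("consume(iter(x))", globals=globals(), number=100000)
    print("{}: {:.2f} µs per loop".format(sys.implementation.name, t / 100000 * 1e6))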
    

    This of course gives me 12.7 µs in CPython, and 1.41 µs in PyPy.


    1. Of course you could write a custom C extension, but it's only going to be faster by a tiny constant term—you can avoid the constructor call and the test before jumping to the fast path, but once you get into that loop, you have to do exactly what it's doing.

    2. Tracing through PyPy source is always fun… but I think it ends up in the W_Deque class, which is part of the builtin _collections module.

    3. CPython 3.6.4; PyPy 5.10.1/3.5.3; both from the respective standard 64-bit macOS installers.
