Iterate over a ‘window’ of adjacent elements in Python

情话喂你 2020-12-15 08:33

This is more a question of elegance and performance than of “how to do it at all”, so I'll just show the code:

def iterate_adjacencies(gen, fill=0, size=2,         


        
5 Answers
  • 2020-12-15 08:59

    The resulting function (from the edit to the question):

    frankeniter, a somewhat neat-looking piece of code that combines ideas from the answers of @agf, @FogleBird, and @senderle:

    from itertools import chain, repeat, islice
    
    def window(seq, size=2, fill=0, fill_left=True, fill_right=False):
        """ Returns a sliding window (of width `size`) over data from the iterable:
          s -> (s0,s1,...s[size-1]), (s1,s2,...,s[size]), ...
        Pads with `fill` on the left and/or right when fill_left/fill_right are set.
        """
        ssize = size - 1
        it = chain(
          repeat(fill, ssize * fill_left),
          iter(seq),
          repeat(fill, ssize * fill_right))
        result = tuple(islice(it, size))
        if len(result) == size:  # `<=` if okay to return seq if len(seq) < size
            yield result
        for elem in it:
            result = result[1:] + (elem,)
            yield result
    

    and, for some performance information regarding deque/tuple:

    In [32]: kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)
    In [33]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.ia(**kwa)]
    10000 loops, best of 3: 358 us per loop
    In [34]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.window(**kwa)]
    10000 loops, best of 3: 368 us per loop
    In [36]: %timeit -n 10000 [sum(x) for x in tmpf5.ia(**kwa)]
    10000 loops, best of 3: 340 us per loop
    In [37]: %timeit -n 10000 [sum(x) for x in tmpf5.window(**kwa)]
    10000 loops, best of 3: 432 us per loop
    

    That said, if the data is numeric, NumPy is likely preferable.

  • 2020-12-15 09:01

    Ok, after coming to my senses, here's a non-ridiculous version of window_iter_fill. My previous version (visible in edits) was terrible because I forgot to use izip. Not sure what I was thinking. Using izip, this works, and, in fact, is the fastest option for small inputs!

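    # Python 2 imports; on Python 3 use the built-in zip instead of izip.
    from itertools import chain, izip, repeat, tee

    def window_iter_fill(gen, size=2, fill=None):
        # Copy i gets size-i-1 fills in front and i fills behind,
        # so each copy is shifted by one position relative to the previous one.
        gens = (chain(repeat(fill, size - i - 1), gen, repeat(fill, i))
                for i, gen in enumerate(tee(gen, size)))
        return izip(*gens)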
    

    This one is also fine for tuple-yielding, but not quite as fast.

    from collections import deque
    from itertools import chain, islice, repeat

    def window_iter_deque(it, size=2, fill=None, fill_left=False, fill_right=False):
        lfill = repeat(fill, size - 1 if fill_left else 0)
        rfill = repeat(fill, size - 1 if fill_right else 0)
        it = chain(lfill, it, rfill)
        d = deque(islice(it, 0, size - 1), maxlen=size)
        for item in it:
            d.append(item)
            yield tuple(d)
    

    HoverHell's newest solution is still the best tuple-yielding solution for large inputs.

    Some timings:

    Arguments: [xrange(1000), 5, 'x', True, True]
    
    ==============================================================================
      window               HoverHell's frankeniter           :  0.2670ms [1.91x]
      window_itertools     from old itertools docs           :  0.2811ms [2.02x]
      window_iter_fill     extended `pairwise` with izip     :  0.1394ms [1.00x]
      window_iter_deque    deque-based, copying              :  0.4910ms [3.52x]
      ia_with_copy         deque-based, copying v2           :  0.4892ms [3.51x]
      ia                   deque-based, no copy              :  0.2224ms [1.60x]
    ==============================================================================
    

    Scaling behavior:

    Arguments: [xrange(10000), 50, 'x', True, True]
    
    ==============================================================================
      window               HoverHell's frankeniter           :  9.4897ms [4.61x]
      window_itertools     from old itertools docs           :  9.4406ms [4.59x]
      window_iter_fill     extended `pairwise` with izip     :  11.5223ms [5.60x]
      window_iter_deque    deque-based, copying              :  12.7657ms [6.21x]
      ia_with_copy         deque-based, copying v2           :  13.0213ms [6.33x]
      ia                   deque-based, no copy              :  2.0566ms [1.00x]
    ==============================================================================
    

    The deque-yielding solution by agf is very fast for large inputs. It appears to be O(n) instead of O(nm) like the others, where n is the length of the iterable and m is the size of the window, because it doesn't have to iterate over every window to copy it. But I still think it makes more sense to yield a tuple in the general case, because the calling function is probably just going to iterate over the deque anyway; it's just a shift of the computational burden. The asymptotic behavior of the larger program should remain the same.

    Still, in some special cases, the deque-yielding version will probably be faster.
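
    One such special case, sketched here under assumptions rather than taken from the thread, is a consumer that only looks at the two ends of each window. The names `window_deque` and `spans` are hypothetical, and the code is written for Python 3. Because the deque is reused and only `d[0]` and `d[-1]` are read, each step costs O(1) regardless of the window size.

```python
from collections import deque
from itertools import islice


def window_deque(iterable, size=2):
    """Yield the same deque object, mutated in place, for each full window."""
    it = iter(iterable)
    d = deque(islice(it, size - 1), maxlen=size)
    for item in it:
        d.append(item)  # O(1): the oldest element falls off the left end
        yield d


def spans(iterable, size=2):
    """Difference between the last and first element of each window."""
    # Only the two ends are read, so no per-window O(size) copy is made.
    return [d[-1] - d[0] for d in window_deque(iterable, size)]


print(spans([1, 4, 9, 16, 25], size=3))  # [8, 12, 16]
```

    A consumer that sums or otherwise walks every element of the window would pay O(m) per step either way, which is the "shift of the computational burden" described above.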

    Some more timings based on HoverHell's test structure.

    >>> import testmodule
    >>> kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)
    >>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.window(**kwa)]
    1000 loops, best of 3: 462 us per loop
    >>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.ia(**kwa)]
    1000 loops, best of 3: 463 us per loop
    >>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.window_iter_fill(**kwa)]
    1000 loops, best of 3: 251 us per loop
    >>> %timeit -n 1000 [sum(x) for x in testmodule.window(**kwa)]
    1000 loops, best of 3: 525 us per loop
    >>> %timeit -n 1000 [sum(x) for x in testmodule.ia(**kwa)]
    1000 loops, best of 3: 462 us per loop
    >>> %timeit -n 1000 [sum(x) for x in testmodule.window_iter_fill(**kwa)]
    1000 loops, best of 3: 333 us per loop
    

    Overall, once you use izip, window_iter_fill is quite fast, as it turns out -- especially for small windows.
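
    For readers on Python 3, where `izip` no longer exists and the built-in `zip` is already lazy, the same idea might look like the sketch below. It is an adaptation, not the code that was timed above, and the parameter name `iterable` is my own choice. As in the original, it always pads on both sides.

```python
from itertools import chain, repeat, tee


def window_iter_fill(iterable, size=2, fill=None):
    """Sliding windows padded with `fill` on both sides (Python 3 sketch)."""
    copies = tee(iterable, size)
    padded = [
        # Copy i gets size-i-1 fills in front and i fills behind, so all
        # padded copies have equal length and are offset by one element each.
        chain(repeat(fill, size - i - 1), copy, repeat(fill, i))
        for i, copy in enumerate(copies)
    ]
    return zip(*padded)  # lazy in Python 3, like izip in Python 2


print(list(window_iter_fill(range(3), size=2, fill=-1)))
# [(-1, 0), (0, 1), (1, 2), (2, -1)]
```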

  • 2020-12-15 09:01

    I'm surprised nobody took a simple coroutine approach.

    from collections import deque
    
    
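    def window(n, initial_data=None):
        if initial_data:
            win = deque(initial_data, n)
        else:
            # Start with n placeholders; a yield inside a generator
            # expression is a SyntaxError on Python 3.8+.
            win = deque([None] * n, n)
        while True:
            side, val = (yield win)
            if side == 'left':
                win.appendleft(val)
            else:
                win.append(val)
    
    win = window(4)
    next(win)  # prime the coroutine; works on Python 2.6+ and 3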
    
    print(win.send(('left', 1)))
    print(win.send(('left', 2)))
    print(win.send(('left', 3)))
    print(win.send(('left', 4)))
    print(win.send(('right', 5)))
    
    ## -- Results of print statements --
    deque([1, None, None, None], maxlen=4)
    deque([2, 1, None, None], maxlen=4)
    deque([3, 2, 1, None], maxlen=4)
    deque([4, 3, 2, 1], maxlen=4)
    deque([3, 2, 1, 5], maxlen=4)
    
  • 2020-12-15 09:18

    This page shows how to implement a sliding window with itertools. http://docs.python.org/release/2.3.5/lib/itertools-example.html

    from itertools import islice

    def window(seq, n=2):
        """Returns a sliding window (of width n) over data from the iterable:
           s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
        """
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result
        for elem in it:
            result = result[1:] + (elem,)
            yield result
    

    Example output:

    >>> list(window(range(10)))
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
    

    You would need to modify it to pad on the left and right if you need fill values.
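
    One way to add the padding without touching the recipe itself is to pad the input and pass it through unchanged. This is only a sketch: `padded_window` and its `left`/`right` flags are hypothetical names, and the recipe is repeated so the block runs on its own under Python 3.

```python
from itertools import chain, islice, repeat


def window(seq, n=2):
    """Sliding window of width n, as in the itertools recipe."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result


def padded_window(seq, n=2, fill=None, left=True, right=True):
    """Hypothetical wrapper: pad the input, then reuse window() unchanged."""
    pad = n - 1
    padded = chain(
        repeat(fill, pad if left else 0),
        seq,
        repeat(fill, pad if right else 0),
    )
    return window(padded, n)


print(list(padded_window(range(3), n=2)))
# [(None, 0), (0, 1), (1, 2), (2, None)]
```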

  • 2020-12-15 09:21

    This is my version that fills, keeping the signature the same. I have previously seen the itertools recipe, but did not look at it before writing this.

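    from itertools import chain, islice
    from collections import deque
    
    def ia(gen, fill=0, size=2, fill_left=True, fill_right=False):
        gen, ssize = iter(gen), size - 1
        # Pre-fill with padding, or with the first ssize items when not padding on the left.
        # islice replaces the Python 2-only xrange/next() loop and stops cleanly on short input.
        deq = deque(chain([fill] * ssize * fill_left,
                          islice(gen, (not fill_left) * ssize)),
                    maxlen=size)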
        for item in chain(gen, [fill] * ssize * fill_right):
            deq.append(item)
            yield deq
    

    Edit: I also didn't see your comments on your question before posting this.

    Edit 2: Fixed. I had tried to do it with one chain but this design needs two.

    Edit 3: As @senderle noted, only use this as a generator; don't wrap it with list or accumulate the output, as it yields the same mutable object repeatedly.
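
    To make that caveat concrete, here is a small Python 3 illustration. `shared_windows` is a hypothetical stand-in that, like `ia`, yields one deque and mutates it in place; it is not the function above.

```python
from collections import deque


def shared_windows(iterable, size=2):
    """Stand-in for ia(): yields one deque object, mutated in place."""
    d = deque(maxlen=size)
    for item in iterable:
        d.append(item)
        if len(d) == size:
            yield d


# Collecting the yielded objects stores the same deque several times,
# so every entry shows the final window.
aliased = list(shared_windows(range(4)))
print(aliased)
# [deque([2, 3], maxlen=2), deque([2, 3], maxlen=2), deque([2, 3], maxlen=2)]

# Copying each window as it arrives keeps the individual snapshots.
snapshots = [tuple(w) for w in shared_windows(range(4))]
print(snapshots)  # [(0, 1), (1, 2), (2, 3)]
```

    Copying with `tuple(w)` costs O(size) per window, which gives back the speed advantage of the no-copy version, so it is only worth doing when the windows really need to be kept.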
