Python generator that groups another iterable into groups of N [duplicate]

Question

I'm looking for a function that takes an iterable i and a size n and yields tuples of length n that are sequential values from i:

x = [1,2,3,4,5,6,7,8,9,0]
[z for z in TheFunc(x,3)]

gives

[(1,2,3),(4,5,6),(7,8,9),(0,)]

Does such a function exist in the standard library?

If it exists as part of the standard library, I can't seem to find it and I've run out of terms to search for. I could write my own, but I'd rather not.

Answer 1:

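See the grouper recipe in the docs for the itertools module:

from itertools import zip_longest  # spelled izip_longest in Python 2

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    # n references to one iterator, so each output tuple takes n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)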

(However, this is a duplicate of quite a few questions.)
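
To see why the recipe works: the list built by [iter(iterable)] * n holds n references to one and the same iterator, so zipping them pulls n consecutive items into each output tuple. A minimal sketch, assuming Python 3 (where izip_longest is spelled zip_longest); the variable names are illustrative only:

```python
from itertools import zip_longest

it = iter('ABCDEFG')
args = [it] * 3                      # three references to the SAME iterator
same = all(a is it for a in args)

# zip_longest takes one item from each argument in turn; because every
# argument is the same iterator, each tuple holds 3 consecutive items.
groups = list(zip_longest(*args, fillvalue='x'))

print(same)    # True
print(groups)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```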

Answer 2:

When you want to group an iterator in chunks of n without padding the final group with a fill value, use iter(lambda: list(IT.islice(iterable, n)), []):

import itertools as IT

def grouper(n, iterable):
    """
    >>> list(grouper(3, 'ABCDEFG'))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    iterable = iter(iterable)
    return iter(lambda: list(IT.islice(iterable, n)), [])

seq = [1,2,3,4,5,6,7]
print(list(grouper(3, seq)))

yields

[[1, 2, 3], [4, 5, 6], [7]]

There is an explanation of how it works in the second half of this answer.
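
In short, the two-argument form iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns a value equal to the sentinel. Here the callable returns the next n items as a list, and an exhausted iterator produces an empty list, which equals the sentinel []. A minimal sketch of the same logic written as an explicit loop, assuming Python 3 (grouper_explicit is a hypothetical name used only for illustration):

```python
from itertools import islice

def grouper(n, iterable):
    iterable = iter(iterable)
    return iter(lambda: list(islice(iterable, n)), [])

def grouper_explicit(n, iterable):
    # Hypothetical spelled-out equivalent of the one-liner above.
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, n))  # what the lambda returns
        if chunk == []:                    # the sentinel comparison
            return
        yield chunk

print(list(grouper_explicit(3, 'ABCDEFG')))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
```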

When you want to group an iterator in chunks of n and pad the final group with a fill value, use the grouper recipe zip_longest(*[iterator]*n):

For example, in Python 2:

>>> list(IT.izip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

In Python 3, izip_longest was renamed zip_longest:

>>> list(IT.zip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

When you want to group a sequence in chunks of n you can use the chunks recipe:

def chunks(seq, n):
    # https://stackoverflow.com/a/312464/190597 (Ned Batchelder)
    """ Yield successive n-sized chunks from seq."""
    for i in range(0, len(seq), n):  # use xrange in Python 2
        yield seq[i:i + n]

Note that, unlike iterators in general, sequences by definition have a length (i.e. __len__ is defined).
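
As a quick illustration of that distinction, the sketch below (assuming Python 3, so range replaces xrange) shows that slicing keeps the type of the sequence, while a generator, which has no len(), is rejected:

```python
def chunks(seq, n):
    """Yield successive n-sized chunks from seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

list_chunks = list(chunks([1, 2, 3, 4, 5, 6, 7], 3))
str_chunks = list(chunks('ABCDEFG', 3))
print(list_chunks)  # [[1, 2, 3], [4, 5, 6], [7]]
print(str_chunks)   # ['ABC', 'DEF', 'G']

# A generator is an iterator, not a sequence: len() raises TypeError.
try:
    list(chunks((x for x in range(7)), 3))
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```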

Answer 3:

How about this one? It doesn't have a fill value though.

>>> import itertools
>>> def partition(itr, n):
...     i = iter(itr)
...     while True:
...         res = list(itertools.islice(i, n))
...         if not res:
...             break
...         yield res
...
>>> list(partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>

It wraps the original iterable in an iterator, which each successive slice consumes a little further. The only other way my tired brain could come up with was generating slice end-points with range.

Maybe I should change list() to tuple() so it better corresponds to your output.
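
A minimal sketch of that tuple variant, assuming Python 3; partition_tuples is a hypothetical name, and the input is the list from the question:

```python
import itertools

def partition_tuples(itr, n):
    i = iter(itr)
    while True:
        res = tuple(itertools.islice(i, n))
        if not res:       # empty tuple: the iterator is exhausted
            return
        yield res

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print(list(partition_tuples(x, 3)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]
```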

Answer 4:

This is a very common request in Python, common enough that it made it into the boltons utility package. First off, boltons has extensive docs. Furthermore, the module is designed and tested to rely only on the standard library (Python 2 and 3 compatible), meaning you can just download the file directly into your project.

# if you downloaded/embedded, try:
# from iterutils import chunked

# with `pip install boltons` use:

from boltons.iterutils import chunked, chunked_iter

print(chunked(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

There's an iterator/generator form for indefinite/long sequences as well:

print(list(chunked_iter(range(10), 3, fill=None)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]

As you can see, you can also fill the final chunk with a value of your choosing. Finally, as the maintainer, I can assure you that, while the code has been downloaded/tested by thousands of developers, if you encounter any issues, you'll get the fastest support possible through the boltons GitHub Issues page. Hope this (and/or any of the other 150+ boltons recipes) helped!

Answer 5:

I use the chunked function from the more_itertools package.

$ pip install more_itertools
$ python
>>> import more_itertools
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> [tuple(z) for z in more_itertools.chunked(x, 3)]
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]

Answer 6:

This is a very old question, but I think it is useful to mention the following approach for the general case. Its main merit is that it only needs to iterate over the data once, so it will work with database cursors or other iterables that can only be consumed once. I also find it more readable.

def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out
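
To illustrate the single-pass property, here is a small sketch that feeds the function a generator, which can be consumed only once. fake_cursor is a hypothetical stand-in for a database cursor, not a real API:

```python
def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out

def fake_cursor():
    # Hypothetical stand-in for a cursor: rows can be read only once.
    for row_id in range(1, 8):
        yield ('row', row_id)

batches = list(chunks(3, fake_cursor()))
print([len(b) for b in batches])  # [3, 3, 1]
print(batches[-1])                # [('row', 7)]
```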

Answer 7:

I know this has been answered several times, but I'm adding my solution, which should improve on the grouper recipe in general applicability (it works on both sequences and iterators), readability (no invisible loop exit condition via a StopIteration exception), and performance. It is most similar to the last answer by Svein.

import itertools

def chunkify(iterable, n):
    iterable = iter(iterable)
    n_rest = n - 1

    for item in iterable:
        rest = itertools.islice(iterable, n_rest)
        yield itertools.chain((item,), rest)
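
One caveat worth keeping in mind: each chunk yielded here is a lazy iterator that shares the underlying iterator with the generator, so a chunk has to be consumed before the next one is requested. A short sketch of both the intended usage and the pitfall, assuming Python 3 (the variable names are illustrative):

```python
import itertools

def chunkify(iterable, n):
    iterable = iter(iterable)
    n_rest = n - 1

    for item in iterable:
        rest = itertools.islice(iterable, n_rest)
        yield itertools.chain((item,), rest)

# Intended usage: consume each chunk before asking for the next one.
consumed_in_order = [list(chunk) for chunk in chunkify(range(7), 3)]
print(consumed_in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# Pitfall: collecting the lazy chunks first lets the outer loop eat every
# item as a "first" element, so each chunk ends up with a single item.
collected_first = [list(chunk) for chunk in list(chunkify(range(7), 3))]
print(collected_first)    # [[0], [1], [2], [3], [4], [5], [6]]
```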

Answer 8:

Here is a different solution that makes no use of itertools. Even though it has a couple more lines, it apparently outperforms the given answers when chunks are much shorter than the iterable. For big chunks, however, the other answers are much faster.

def batchiter(iterable, batch_size):
    """
    >>> list(batchiter('ABCDEFG', 3))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch


In [19]: %timeit [b for b in batchiter(range(1000), 3)]
1000 loops, best of 3: 644 µs per loop

In [20]: %timeit [b for b in grouper(3, range(1000))]
1000 loops, best of 3: 897 µs per loop

In [21]: %timeit [b for b in partition(range(1000), 3)]
1000 loops, best of 3: 890 µs per loop

In [22]: %timeit [b for b in batchiter(range(1000), 333)]
1000 loops, best of 3: 540 µs per loop

In [23]: %timeit [b for b in grouper(333, range(1000))]
10000 loops, best of 3: 81.7 µs per loop

In [24]: %timeit [b for b in partition(range(1000), 333)]
10000 loops, best of 3: 80.1 µs per loop
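
The timings above come from IPython's %timeit. A rough sketch of how a similar comparison could be run with the standard timeit module, assuming Python 3; islice_chunks is a hypothetical stand-in for the islice-based answers, and absolute numbers will vary by machine and Python version:

```python
import timeit
from itertools import islice

def batchiter(iterable, batch_size):
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch

def islice_chunks(iterable, n):
    # Hypothetical stand-in for the islice-based answers.
    iterator = iter(iterable)
    return iter(lambda: list(islice(iterator, n)), [])

# Both approaches should produce the same chunks before being timed.
same_output = list(batchiter(range(1000), 3)) == list(islice_chunks(range(1000), 3))

for size in (3, 333):
    for func in (batchiter, islice_chunks):
        seconds = timeit.timeit(lambda: list(func(range(1000), size)), number=200)
        print(f"{func.__name__:>13} size={size:<3} {seconds / 200 * 1e6:8.1f} us per loop")
```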

Answer 9:

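    import itertools

    def grouper(iterable, n):
        iterable = iter(iterable)
        # A bare next() here would raise RuntimeError at exhaustion (PEP 479).
        for first in iterable:
            yield itertools.chain((first,), itertools.islice(iterable, n - 1))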

Source: https://stackoverflow.com/questions/3992735/python-generator-that-groups-another-iterable-into-groups-of-n

Tags

python

generator

std