split a generator/iterable every n items in python (splitEvery)

臣服心动 2020-11-27 16:44

I'm trying to write the Haskell function 'splitEvery' in Python. Here is its definition:

splitEvery :: Int -> [e] -> [[e]]
    @'splitEvery' n@ splits a list into length-n pieces. The last
    piece will be shorter if @n@ does not evenly divide the length
    of the list.
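For illustration, an eager, list-only sketch of the behaviour I'm after might look like the following (splitEvery_eager is just a throwaway name; the point of the question is that I need a version that also works lazily on generators and other iterables):

    def splitEvery_eager(n, xs):
        # Naive list slicing: fine for lists, useless for generators.
        return [xs[i:i + n] for i in range(0, len(xs), n)]

    print(splitEvery_eager(2, [1, 2, 3, 4, 5]))  # [[1, 2], [3, 4], [5]]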


        
13 Answers
•  鱼传尺愫  2020-11-27 17:37

    If you want a solution that

    • uses generators only (no intermediate lists or tuples),
    • works for very long (or infinite) iterators,
    • works for very large batch sizes,

    this does the trick:

    def one_batch(first_value, iterator, batch_size):
        yield first_value
        for _ in range(1, batch_size):
            try:
                yield next(iterator)
            except StopIteration:
                return  # Input exhausted mid-batch: end this batch early.

    def batch_iterator(iterator, batch_size):
        iterator = iter(iterator)
        while True:
            try:
                first_value = next(iterator)  # Peek.
            except StopIteration:
                return  # Input exhausted: no more batches.
            yield one_batch(first_value, iterator, batch_size)


    It works by peeking at the next value in the iterator and passing that as the first value to a generator (one_batch()) that will yield it, along with the rest of the batch.

    The peek step raises StopIteration exactly when the input iterator is exhausted and there are no more batches. Catching it and returning is what ends batch_iterator() cleanly: since PEP 479 (Python 3.7), a StopIteration that escapes a generator body is converted to a RuntimeError, so the explicit except clauses above are needed rather than letting the exception propagate.
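    As a quick illustration (just a sketch, and it assumes each batch is consumed in full before the next one is requested), the batches are produced lazily even from an infinite input:

    from itertools import count

    batches = batch_iterator(count(), 3)  # count() never terminates.
    print(list(next(batches)))            # [0, 1, 2]
    print(list(next(batches)))            # [3, 4, 5]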

    This will process lines from stdin in batches:

    import sys

    # process() and finalise() stand in for whatever per-line and
    # per-batch work needs to be done.
    for input_batch in batch_iterator(sys.stdin, 10000):
        for line in input_batch:
            process(line)
        finalise()

    I've found this useful for processing lots of data and uploading the results in batches to an external store.
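    The upload loop was roughly this shape (upload_to_store here is a stand-in for whichever client or API you actually use):

    def upload_all(records, batch_size=10000):
        for batch in batch_iterator(records, batch_size):
            rows = list(batch)     # Materialise only the current batch.
            upload_to_store(rows)  # Hypothetical call to the external store.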
