I'm trying to write the Haskell function 'splitEvery' in Python. Here is its definition:
splitEvery :: Int -> [e] -> [[e]]
@'splitEvery' n@ s
If you want a solution that
this does the trick:
def one_batch(first_value, iterator, batch_size):
    yield first_value
    for i in xrange(1, batch_size):
        yield iterator.next()

def batch_iterator(iterator, batch_size):
    iterator = iter(iterator)
    while True:
        first_value = iterator.next() # Peek.
        yield one_batch(first_value, iterator, batch_size)
It works by peeking at the next value in the iterator and passing that as the first value to a generator (one_batch()) that will yield it, along with the rest of the batch.
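The peek step raises StopIteration exactly when the input iterator is exhausted and there are no more batches. In Python 2, which this code targets, that is also the correct moment for the batch_iterator() generator itself to stop, so there is no need to catch the exception. (From Python 3.7 onward, PEP 479 turns a StopIteration that escapes a generator body into a RuntimeError, so a Python 3 port has to catch it and return instead.)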
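For anyone on Python 3, here is a minimal sketch of the same peek-and-delegate approach. It is an adaptation rather than the code above verbatim: it assumes Python 3.7 or later, swaps xrange and iterator.next() for range and next(iterator), and catches StopIteration explicitly in both generators. The parameter name iterable is my own renaming.

```python
def one_batch(first_value, iterator, batch_size):
    """Yield first_value, then up to batch_size - 1 further items."""
    yield first_value
    for _ in range(1, batch_size):
        try:
            yield next(iterator)
        except StopIteration:
            return  # Input ran out mid-batch: end this (short) batch.


def batch_iterator(iterable, batch_size):
    """Lazily yield generators, each producing up to batch_size items."""
    iterator = iter(iterable)
    while True:
        try:
            first_value = next(iterator)  # Peek.
        except StopIteration:
            return  # No more input, so no more batches.
        yield one_batch(first_value, iterator, batch_size)


# Each batch is consumed fully before the next one is requested.
batches = [list(batch) for batch in batch_iterator(range(7), 3)]
```

As with the original, every batch shares the one underlying iterator, so each batch should be consumed completely before asking for the next; otherwise the leftover items spill into the following batch.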
This will process lines from stdin in batches:
import sys

for input_batch in batch_iterator(sys.stdin, 10000):
    for line in input_batch:
        process(line)
    finalise()
I've found this useful for processing lots of data and uploading the results in batches to an external store.