I have a large dataframe (several million rows).
I want to be able to do a groupby operation on it, but grouping by arbitrary consecutive (preferably equal-sized) chunks of rows, rather than by any particular property of the individual rows.
Use NumPy's array_split(); unlike np.split(), it does not require the number of sections to divide the length evenly:
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(10, 3))

# array_split returns a list of 5 consecutive sub-DataFrames, as equal-sized
# as possible (here 10 rows split evenly into chunks of 2).
for chunk in np.array_split(data, 5):
    assert len(chunk) == len(data) / 5
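If you then want something that looks like a regular groupby aggregation over those chunks, each piece returned by array_split is an ordinary DataFrame, so you can reduce it directly. A minimal sketch; the chunk size, the position-based groupby key, and the mean() aggregation are illustrative assumptions, not part of the question:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(10, 3))

# Reduce each consecutive chunk; each Series of means becomes one row of the result.
chunk_means = pd.DataFrame([chunk.mean() for chunk in np.array_split(data, 5)])

# Equivalent groupby formulation: key every row by its integer position
# divided by the chunk size, so rows 0-1 form group 0, rows 2-3 group 1, etc.
chunk_size = 2
chunk_means_gb = data.groupby(np.arange(len(data)) // chunk_size).mean()

The groupby form keeps everything in one pandas expression, which can be convenient if you want other aggregations (sum, agg, apply) over the same consecutive chunks.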