I have a large dataframe (several million rows).
I want to be able to do a groupby operation on it, but grouping by arbitrary consecutive (preferably equal-sized) chunks of rows, rather than by any particular property of the individual rows.
Use NumPy's array_split(); unlike np.split(), it does not require the number of sections to divide the length evenly:
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(10, 3))

# array_split returns a list of 5 consecutive sub-DataFrames, as equal-sized
# as possible (here 10 rows split evenly into chunks of 2).
for chunk in np.array_split(data, 5):
    assert len(chunk) == len(data) / 5
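If you then want something that looks like a regular groupby aggregation over those chunks, each piece returned by array_split is an ordinary DataFrame, so you can reduce it directly. A minimal sketch; the chunk size, the position-based groupby key, and the mean() aggregation are illustrative assumptions, not part of the question:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(10, 3))

# Reduce each consecutive chunk; each Series of means becomes one row of the result.
chunk_means = pd.DataFrame([chunk.mean() for chunk in np.array_split(data, 5)])

# Equivalent groupby formulation: key every row by its integer position
# divided by the chunk size, so rows 0-1 form group 0, rows 2-3 group 1, etc.
chunk_size = 2
chunk_means_gb = data.groupby(np.arange(len(data)) // chunk_size).mean()

The groupby form keeps everything in one pandas expression, which can be convenient if you want other aggregations (sum, agg, apply) over the same consecutive chunks.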