How to iterate over consecutive chunks of Pandas dataframe efficiently

悲&欢浪女 2020-11-28 03:52

I have a large dataframe (several million rows).

I want to be able to do a groupby operation on it, but grouping by arbitrary consecutive (preferably equal-sized) chunks of rows, rather than by any property of the individual rows.

6 Answers
  •  隐瞒了意图╮
    2020-11-28 03:58

    Use numpy's array_split():

    import numpy as np
    import pandas as pd

    data = pd.DataFrame(np.random.rand(10, 3))

    # Split the dataframe into 5 consecutive chunks. Unlike np.split,
    # np.array_split also works when the length is not evenly divisible;
    # in that case some chunks are one row shorter than the others.
    for chunk in np.array_split(data, 5):
        assert len(chunk) == len(data) / 5
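
    Since the question asks for a groupby operation, another option is to group rows by their integer position divided by a chunk size, which keeps each group consecutive. This is a minimal sketch, assuming a fixed chunk size of 3 rows; the names data and chunk_size are illustrative, not part of the original answer.

    import numpy as np
    import pandas as pd

    data = pd.DataFrame(np.random.rand(10, 3))
    chunk_size = 3  # assumed chunk length; the last group may be smaller

    # np.arange(len(data)) // chunk_size labels consecutive rows 0,0,0, 1,1,1, ...
    # so grouping on that array yields consecutive chunks of rows.
    for label, chunk in data.groupby(np.arange(len(data)) // chunk_size):
        print(label, len(chunk))

    Because the grouping key is just an array of labels, the usual .agg and .apply methods are available on the resulting groupby object as well.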
    
