How to iterate over consecutive chunks of a Pandas dataframe efficiently

悲&欢浪女 2020-11-28 03:52

I have a large dataframe (several million rows).

I want to be able to do a groupby operation on it, but just grouping by arbitrary consecutive (preferably equal-size

6 Answers
  •  予麋鹿
    予麋鹿 (original poster)
    2020-11-28 04:12

    A sign of a healthy ecosystem is having many choices, so I'll add this option from Anaconda's Blaze, which really uses Odo to do the chunking.

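    import blaze as bz
    import pandas as pd

    df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [2, 4, 6, 8, 10]})

    # Convert the dataframe into an iterable of smaller dataframes, 2 rows each
    for chunk in bz.odo(df, target=bz.chunks(pd.DataFrame), chunksize=2):
        # Do stuff with each chunked dataframe; print() is only a placeholder
        print(chunk)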
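
For comparison, the same consecutive chunking can be sketched with plain pandas and NumPy, without the Blaze or Odo dependency. This is only a minimal sketch under a few assumptions: `iter_chunks` is a hypothetical helper name (not a pandas function), the chunk size of 2 is arbitrary, and the last chunk is allowed to be shorter than the rest.

```python
import numpy as np
import pandas as pd


def iter_chunks(df, chunksize):
    """Yield consecutive row slices of df, each with at most chunksize rows.

    Hypothetical helper for illustration; it slices by position, so it does
    not depend on the index labels.
    """
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [2, 4, 6, 8, 10]})

# Option 1: iterate over positional slices of the dataframe
chunks = list(iter_chunks(df, chunksize=2))

# Option 2: build a positional chunk id per row and group by it,
# which fits when a groupby-style aggregation per chunk is the goal
chunk_ids = np.arange(len(df)) // 2
sums = df.groupby(chunk_ids).sum()
```

Option 1 suits arbitrary per-chunk processing, while Option 2 hands the chunk boundaries to `groupby`, so the usual aggregation methods apply to each consecutive block of rows.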
    
