How to iterate over consecutive chunks of Pandas dataframe efficiently

悲&欢浪女 2020-11-28 03:52

I have a large dataframe (several million rows).

I want to be able to do a groupby operation on it, but just grouping by arbitrary consecutive (preferably equal-size

6 answers

予麋鹿 (original poster)

2020-11-28 04:12

A sign of a good environment is having many choices, so I'll add this one from Anaconda Blaze, which really uses Odo:

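import blaze as bz
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [2, 4, 6, 8, 10]})

# Ask odo to convert df into a sequence of DataFrame chunks (chunksize=2)
for chunk in bz.odo(df, target=bz.chunks(pd.DataFrame), chunksize=2):
    # Do stuff with the chunked dataframe; print is only a placeholder
    print(chunk)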
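
For comparison, here is a minimal sketch of the same consecutive chunking using only pandas and NumPy, in case Blaze is not installed. The helper name iter_chunks and the chunk size of 2 are assumptions made for illustration, not part of the answer above. It slices rows by position, then shows a groupby over equal-size consecutive blocks, which is what the question asks for.

```python
import numpy as np
import pandas as pd


def iter_chunks(df, chunksize):
    """Yield consecutive positional slices of df, each with at most chunksize rows."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [2, 4, 6, 8, 10]})

# Iterate chunk by chunk; the last chunk may be shorter than the others
chunk_lengths = [len(chunk) for chunk in iter_chunks(df, 2)]

# Alternatively, label each row with its chunk number (row position // chunk size)
# and let groupby aggregate every consecutive block in one call
chunk_sums = df.groupby(np.arange(len(df)) // 2).sum()
```

The slicing loop suits arbitrary per-chunk work, while the groupby form is usually the better fit when each chunk only needs a built-in aggregation.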
