iterate over GroupBy object in dask

僤鯓⒐⒋嵵緔 提交于 2019-11-29 10:40:25

you could iterate through groups doing this with dask, maybe there is a better way but this works for me.

import dask.dataframe as dd
import pandas as pd
pdf = pd.DataFrame({'A':[1, 2, 3, 4, 5], 'B':['1','1','a','a','a']})
ddf = dd.from_pandas(pdf, npartitions = 3)
groups = ddf.groupby('B')

for group in pdf['B'].unique():
    print groups.get_group(group)

this would return

dd.DataFrame<dataframe-groupby-get_group-e3ebb5d5a6a8001da9bb7653fface4c1, divisions=(0, 2, 4, 4)>
dd.DataFrame<dataframe-groupby-get_group-022502413b236592cf7d54b2dccf10a9, divisions=(0, 2, 4, 4)>

Generally iterating over Dask.dataframe objects is not recommended. It is inefficient. Instead you might want to try constructing a function and mapping that function over the resulting groups using groupby.apply

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!