Dask Dataframe sum of column always returning scalar [duplicate]

一曲冷凌霜 submitted on 2019-12-11 10:16:13

Question


I've created a Dask DataFrame (called "df"), and the column labelled 11 has integer values:

In [62]: df[11]
Out[62]:
Dask Series Structure:
npartitions=42
    int64
      ...
    ...
      ...
      ...
Name: 11, dtype: int64
Dask Name: getitem, 168 tasks

I'm trying to sum these with:

df[11].sum() 

Instead of a number, I get dd.Scalar<series-..., dtype=int64> returned. Despite researching what this means, I'm still at a loss as to why I'm not getting a numerical value. How can I turn this into its numerical value?


Answer 1:


I think you need to call compute() to tell Dask to actually execute everything that came before:

compute(**kwargs)
Compute this dask collection

This turns a lazy Dask collection into its in-memory equivalent. For example a Dask array turns into a NumPy array and a Dask dataframe turns into a Pandas dataframe. The entire dataset must fit into memory before calling this operation.

df[11].sum().compute()


Source: https://stackoverflow.com/questions/52663751/dask-dataframe-sum-of-column-always-returning-scalar
