Question
I've created a Dask Dataframe (called "df") and the column with index "11" has integer values:
In [62]: df[11]
Out[62]:
Dask Series Structure:
npartitions=42
int64
...
...
...
...
Name: 11, dtype: int64
Dask Name: getitem, 168 tasks
I'm trying to sum these with:
df[11].sum()
Instead of a number, I get dd.Scalar<series-..., dtype=int64> returned. Despite researching what this might mean, I'm still at a loss as to why I'm not getting a numerical value back. How can I turn this into its numerical value?
Answer 1:
I think you need compute() to tell Dask to process everything that came before:
compute(**kwargs)
Compute this dask collection.
This turns a lazy Dask collection into its in-memory equivalent. For example, a Dask array turns into a NumPy array and a Dask dataframe turns into a Pandas dataframe. The entire dataset must fit into memory before calling this operation.
df[11].sum().compute()
Source: https://stackoverflow.com/questions/52663751/dask-dataframe-sum-of-column-always-returning-scalar