There are a few issues I am having with Dask DataFrames.
Let's say I have a dataframe with 2 columns ['a','b'].
If I want a new column c =
Edit: setitem syntax now works in dask.dataframe:
df['z'] = df.x + df.y
Original answer: you're correct that the setitem syntax doesn't work in dask.dataframe.
df['c'] = ... # mutation not supported
As you suggest, you should instead use .assign(...):
df = df.assign(c=df.a + df.b)
In your example you have an unnecessary call to .compute(). Generally you want to call compute only at the very end, once you have your final result.
As before, dask.dataframe does not support changing rows in place. In-place operations are difficult to reason about in parallel code. At the moment dask.dataframe has no nice alternative operation in this case. I've raised issue #653 for conversation on this topic.