Dask item assignment. Cannot use loc for item assignment

Posted by 可紊 on 2020-01-13 11:23:20

Question


I have a folder of parquet files that doesn't fit in memory, so I am using dask for the data cleansing. One of my functions performs item assignment, and none of the solutions I found online apply to it. The function below works in pandas. How do I get the same result with a dask dataframe? I thought delayed might help, but none of my attempts have worked.

import numpy as np

def item_assignment(df):
    # Keep only bits 1 and 2 of OtherCol
    new_col = np.bitwise_and(df['OtherCol'], 0b110)

    df['NewCol'] = 0
    df.loc[new_col == 0b010, 'NewCol'] = 1    # only bit 1 set
    df.loc[new_col == 0b100, 'NewCol'] = -1   # only bit 2 set

    return df

Running it on a dask dataframe raises:

TypeError: '_LocIndexer' object does not support item assignment


Answer 1:


You can replace your loc assignments with dask.dataframe.Series.mask, which replaces the values where the condition is True (new_col is computed as in your function):

df['NewCol'] = 0
df['NewCol'] = df['NewCol'].mask(new_col == 0b010, 1)
df['NewCol'] = df['NewCol'].mask(new_col == 0b100, -1)



Answer 2:


You can use map_partitions in this case, which lets you use raw pandas functionality, i.e.

ddf.map_partitions(item_assignment)

This operates on the individual pandas dataframes that make up the dask dataframe:

import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({"OtherCol":[0b010, 0b110, 0b100, 0b110, 0b100, 0b010]})
ddf = dd.from_pandas(df, npartitions=2)
ddf.map_partitions(item_assignment).compute()

And we see the result as expected:

   OtherCol  NewCol
0         2       1
1         6       0
2         4      -1
3         6       0
4         4      -1
5         2       1


Source: https://stackoverflow.com/questions/54360549/dask-item-assignment-cannot-use-loc-for-item-assignment
