Can't drop columns or slice dataframe using dask?

Submitted by ε祈祈猫儿з on 2019-12-01 02:28:52

Question


I am trying to use dask instead of pandas because I have a 2.6 GB CSV file. I load it and want to drop a column, but it seems that neither the drop method, df.drop('column'), nor slicing, df[ : , :-1], is implemented yet. Is this the case, or am I just missing something?


Answer 1:


We implemented the drop method in this PR. This is available as of dask 0.7.0.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})

In [3]: import dask.dataframe as dd

In [4]: ddf = dd.from_pandas(df, npartitions=2)

In [5]: ddf.drop('y', axis=1).compute()
Out[5]: 
   x
0  1
1  2
2  3

Before drop was available, one could also select the columns to keep by name, although this is less attractive when there are many columns.

In [6]: ddf[['x']].compute()
Out[6]: 
   x
0  1
1  2
2  3


Source: https://stackoverflow.com/questions/31867983/cant-drop-columns-or-slice-dataframe-using-dask
