Question
I have a NumPy array that I want to add as a column to an existing Dask DataFrame.
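from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder()
# LabelEncoder expects 1-D input, so pass the column as a Series
nparr = enc.fit_transform(X['url'])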
I also have ddf, which is a Dask DataFrame.
ddf['nurl'] = nparr ???
Is there an elegant way to achieve this?
The question "Python PANDAS: Converting from pandas/numpy to dask dataframe/array" does not solve my issue, because I want to add a NumPy array to an existing Dask DataFrame.
Answer 1:
You can convert the NumPy array to a Dask Series, then merge it into the DataFrame. You need to call the Series' .to_frame()
method first, since Dask only supports merging DataFrames with other DataFrames. The two frames share no column names, so the merge aligns rows on the index.
import dask.dataframe as dd
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': range(30), 'y': range(0,300, 10)})
arr = np.random.randint(0, 100, size=30)
# create dask frame and series
ddf = dd.from_pandas(df, npartitions=5)
darr = dd.from_array(arr)
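# give it a name to use as the column header
darr.name = 'z'
# there are no shared columns, so join on the index explicitly
ddf2 = ddf.merge(darr.to_frame(), left_index=True, right_index=True)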
ddf2
# returns:
Dask DataFrame Structure:
                   x      y      z
npartitions=5
0              int64  int64  int32
6                ...    ...    ...
...              ...    ...    ...
24               ...    ...    ...
29               ...    ...    ...
Dask Name: join-indexed, 33 tasks
Source: https://stackoverflow.com/questions/57607155/converting-numpy-array-into-dask-dataframe-column