Converting a Pandas Dataframe column into one hot labels

Submitted by 风格不统一 on 2019-11-28 08:09:56

Question


I have a pandas dataframe similar to this:

  Col1   ABC
0  XYZ    A
1  XYZ    B
2  XYZ    C

By using the pandas get_dummies() function on column ABC, I can get this:

  Col1   A   B   C
0  XYZ   1   0   0
1  XYZ   0   1   0
2  XYZ   0   0   1

What I need instead is something like this, where the ABC column has a list / array datatype:

  Col1    ABC
0  XYZ    [1,0,0]
1  XYZ    [0,1,0]
2  XYZ    [0,0,1]

I tried using the get_dummies function and then combining all the resulting columns into the single column I wanted. I found a lot of answers explaining how to combine multiple columns as strings, such as "Combine two columns of text in dataframe in pandas/python", but I cannot figure out a way to combine them as a list.

The question "How do I one-hot encode one column of a pandas dataframe?" introduced the idea of using sklearn's OneHotEncoder, but I couldn't get it to work.

One more thing: all the answers I came across had solutions where the column names had to be typed manually when combining them. Is there a way to use DataFrame.iloc or some slicing mechanism to combine columns into a list?


Answer 1:


Here is an example of using sklearn.preprocessing.LabelBinarizer:

In [361]: from sklearn.preprocessing import LabelBinarizer

In [362]: lb = LabelBinarizer()

In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()

In [364]: df
Out[364]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]

Pandas alternative:

In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()

In [371]: df
Out[371]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]



Answer 2:


You can just use tolist():

# dtype=int keeps the 0/1 output shown below; pandas 2.x returns booleans by default
df['ABC'] = pd.get_dummies(df.ABC, dtype=int).values.tolist()

  Col1        ABC
0  XYZ  [1, 0, 0]
1  XYZ  [0, 1, 0]
2  XYZ  [0, 0, 1]
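
For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question, and the name `one_hot` is only illustrative. It assumes a pandas version in which get_dummies accepts the dtype argument; without it, recent pandas versions produce True/False instead of 1/0.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# One indicator column per distinct value; dtype=int requests 0/1 integers
one_hot = pd.get_dummies(df["ABC"], dtype=int)

# Convert the 2-D indicator array to a list of rows and store one row per cell
df["ABC"] = one_hot.to_numpy().tolist()

print(df)
```

The order of the entries in each list follows the order of the columns in `one_hot`, so keep `one_hot.columns` around if you need to map positions back to labels.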



Answer 3:


If you have a pd.DataFrame like this:

>>> df
  Col1  A  B  C
0  XYZ  1  0  0
1  XYZ  0  1  0
2  XYZ  0  0  1

You can always do something like this:

>>> df.apply(lambda s: list(s[1:]), axis=1)
0    [1, 0, 0]
1    [0, 1, 0]
2    [0, 0, 1]
dtype: object

Note that this is essentially a Python-level for-loop over the rows. Also note that pandas has no list dtype: a column holding lists is stored with dtype object, so operations on that column cannot take advantage of NumPy's vectorized speed.
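
As a possible way to avoid the row-wise apply, and to avoid typing the column names as the question asks, the indicator columns can be selected by position with iloc and converted in one step. This is only a sketch: the frame `wide` and the names `combined` and `result` are made up for illustration, and it assumes the indicator columns are everything after the first column.

```python
import pandas as pd

# Frame in the already-dummified shape shown above
wide = pd.DataFrame(
    {"Col1": ["XYZ", "XYZ", "XYZ"], "A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1]}
)

# Select every column after the first by position, with no column names typed
combined = wide.iloc[:, 1:].to_numpy().tolist()

# Keep the leading column and attach the lists as a single column
result = wide[["Col1"]].copy()
result["ABC"] = combined

print(result)
print(result["ABC"].dtype)  # the list column is stored as object
```

The conversion itself happens inside NumPy, but the resulting column is still dtype object, so the caveat above about later operations on it still applies.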




Answer 4:


If you have a DataFrame df with a categorical column ABC, you can create a new column of one-hot vectors like this:

# get_values() was removed in pandas 1.0; to_numpy() replaces it (assumes `import pandas`)
df['new_column'] = list(pandas.get_dummies(df['ABC']).to_numpy())
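
One difference from the tolist() answers is worth noting: wrapping the 2-D array in list() stores a NumPy array in each cell, while tolist() stores plain Python lists. The sketch below shows both side by side. The column names `as_arrays` and `as_lists` are hypothetical, and dtype=int is passed on the assumption that integer 0/1 values are wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})
dummies = pd.get_dummies(df["ABC"], dtype=int)

# list() splits the 2-D array into rows: each cell holds a numpy.ndarray
df["as_arrays"] = list(dummies.to_numpy())

# tolist() converts all the way down: each cell holds a Python list of ints
df["as_lists"] = dummies.to_numpy().tolist()

print(type(df["as_arrays"].iloc[0]))
print(type(df["as_lists"].iloc[0]))
```

Which form is preferable depends on what consumes the column later: arrays are convenient for numeric work, while plain lists tend to serialize more predictably, for example to JSON.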


Source: https://stackoverflow.com/questions/47127388/converting-a-pandas-dataframe-column-into-one-hot-labels
