Label encoding across multiple columns in scikit-learn

礼貌的吻别 2020-11-22 09:02

I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to a…

22 answers

爱一瞬间的悲伤 (original poster)

2020-11-22 09:44
You can easily do this, though:
```
from sklearn.preprocessing import LabelEncoder
df.apply(LabelEncoder().fit_transform)
```
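A minimal runnable sketch of the one-liner above, assuming a small made-up DataFrame whose columns all hold string labels (the column names and values are illustrative only). The single LabelEncoder instance is refit on each column in turn, so every column gets its own 0..n-1 codes, with classes sorted alphabetically. The encoder itself is not kept, so it cannot be used for inverse_transform afterwards, which is what the defaultdict approach in the EDIT section addresses.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical example data: every column holds string labels.
df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "size": ["S", "M", "L", "S"],
})

# apply() calls fit_transform once per column; the shared encoder is refit
# each time, so each column is encoded independently.
encoded = df.apply(LabelEncoder().fit_transform)
print(encoded)
```

Here "color" becomes blue=0, green=1, red=2 and "size" becomes L=0, M=1, S=2.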
EDIT2:

In scikit-learn 0.20 and later, the recommended way is
```
from sklearn.preprocessing import OneHotEncoder
OneHotEncoder().fit_transform(df)
```
since OneHotEncoder now accepts string input directly. To apply OneHotEncoder to only some columns, wrap it in a ColumnTransformer.
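A sketch of the ColumnTransformer approach, assuming a hypothetical DataFrame with one string column ("color") to one-hot encode and one numeric column ("size") to leave untouched. remainder="passthrough" keeps the other columns, and sparse_threshold=0 is set here only so the result is always a dense NumPy array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: encode "color", keep "size" as-is.
df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})

ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["color"])],  # one-hot encode only "color"
    remainder="passthrough",  # append the remaining columns unchanged
    sparse_threshold=0,  # always return a dense array
)
out = ct.fit_transform(df)
print(out)
```

The encoded columns come first (green, red, in sorted category order), followed by the passed-through "size" column.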

EDIT:

Since this answer is over a year old and has received many upvotes (including a bounty), I should probably extend it further.

For inverse_transform and transform, you have to do a bit of a hack:
```
from collections import defaultdict
from sklearn.preprocessing import LabelEncoder
d = defaultdict(LabelEncoder)
```
With this, each column's LabelEncoder is kept in a dictionary keyed by column name.
```
# Fit one encoder per column and encode the data
fit = df.apply(lambda x: d[x.name].fit_transform(x))

# Invert the encoding
fit.apply(lambda x: d[x.name].inverse_transform(x))

# Use the fitted encoders to label future data
df.apply(lambda x: d[x.name].transform(x))
```
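An end-to-end sketch of the defaultdict approach, assuming small made-up DataFrames: df for fitting and a hypothetical new_df standing in for future data with the same column names. Because x.name is the column name inside DataFrame.apply, each column gets its own LabelEncoder, which stays available for inverse_transform and transform.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data and "future" data (new_df) with the same columns.
df = pd.DataFrame({"color": ["red", "green", "red"], "size": ["S", "L", "M"]})
new_df = pd.DataFrame({"color": ["green", "red"], "size": ["M", "S"]})

# One LabelEncoder per column, created on first access by column name.
d = defaultdict(LabelEncoder)

# Fit each column's encoder and encode the training data.
fit = df.apply(lambda x: d[x.name].fit_transform(x))

# Decode back to the original string labels.
restored = fit.apply(lambda x: d[x.name].inverse_transform(x))

# Reuse the already-fitted encoders on new data.
encoded_new = new_df.apply(lambda x: d[x.name].transform(x))
print(fit, restored, encoded_new, sep="\n\n")
```

Note that LabelEncoder.transform raises a ValueError if the new data contains a label that was not seen during fitting, so this only works when future data uses known categories.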