How to reverse sklearn.OneHotEncoder transform to recover original data?

深忆病人 2020-12-13 07:34

I encoded my categorical data using sklearn.OneHotEncoder and fed them to a random forest classifier. Everything seems to work, and I got my predicted output back…

8 answers

旧巷少年郎 (OP)

2020-12-13 08:24
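Just compute the dot product of the encoded values with `ohe.active_features_`. It works for both sparse and dense representations. Note that `active_features_` belongs to the legacy OneHotEncoder API: it was deprecated in scikit-learn 0.20 and removed in 0.22, so this code needs an older version. Example: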
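```
from sklearn.preprocessing import OneHotEncoder
import numpy as np

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

ohe = OneHotEncoder()
# OneHotEncoder expects 2-D input: reshape to one column, one row per sample
encoded = ohe.fit_transform(orig.reshape(-1, 1))

# active_features_ lists the column index behind each output column; for a single
# column of non-negative integers that index equals the original value, so each
# row's single 1 selects its value in the dot product
decoded = encoded.dot(ohe.active_features_).astype(int)
assert np.allclose(orig, decoded)
```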
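On scikit-learn 0.22 and later, `active_features_` no longer exists. A minimal sketch of the same dot-product idea, assuming a single column of numeric values, can use `categories_[0]` instead: it holds the sorted unique values in the same order as the one-hot output columns. The sketch also checks the sparse-vs-dense claim, converting with `.toarray()` rather than relying on the encoder's sparse/dense constructor parameter, whose name differs between versions.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

ohe = OneHotEncoder()
# fit_transform returns a SciPy sparse matrix by default
encoded = ohe.fit_transform(orig.reshape(-1, 1))

# categories_ has one array per input column; with a single column,
# categories_[0] gives the value behind each one-hot column, in order
values = ohe.categories_[0]

# Each row has exactly one 1, so the dot product picks out that row's value
decoded_sparse = encoded.dot(values)           # sparse representation
decoded_dense = encoded.toarray().dot(values)  # dense representation

print(decoded_sparse.astype(int))
print(decoded_dense.astype(int))
```

Both results come back as float arrays because the encoded matrix is float; `.astype(int)` restores the integer dtype.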
The key insight is that, for a single column of non-negative integers, the `active_features_` attribute of the fitted encoder gives the original value behind each binary column (it holds the indices of the columns that actually occur, and those indices equal the values). Each encoded row contains exactly one 1, at the column of its original value, so its dot product with `active_features_` returns exactly that value.
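The dot-product trick depends on there being one numeric column: with several input columns the products would add values from different columns together, and string categories cannot be multiplied at all. For those cases, `OneHotEncoder.inverse_transform` (available since scikit-learn 0.20) decodes directly. A minimal sketch with made-up two-column string data, not taken from the question:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Illustrative data: two categorical columns (colour, size)
X = np.array([["red", "S"], ["blue", "M"], ["red", "L"], ["green", "M"]])

ohe = OneHotEncoder()
encoded = ohe.fit_transform(X)  # sparse, one block of columns per input column

# inverse_transform maps each block of one-hot columns back to its category
recovered = ohe.inverse_transform(encoded)
print(recovered)
```

For the single-column data in the question, `ohe.inverse_transform(encoded).ravel()` should give back `orig`, since `inverse_transform` always returns a 2-D array.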