I encoded my categorical data using sklearn.OneHotEncoder and fed it to a random forest classifier. Everything seems to work and I got my predicted output back.
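Roughly, a minimal sketch of my setup (toy data only; the real dataset is bigger, and string categories need scikit-learn >= 0.20):

>>> import numpy as np
>>> from sklearn.preprocessing import OneHotEncoder
>>> from sklearn.ensemble import RandomForestClassifier
>>> X = np.array([['red'], ['blue'], ['red'], ['green']])  # toy categorical feature
>>> labels = np.array([0, 1, 0, 1])
>>> enc = OneHotEncoder()
>>> X_enc = enc.fit_transform(X)  # sparse one-hot matrix, categories sorted
>>> X_enc.toarray()
array([[0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.]])
>>> clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_enc, labels)
>>> pred = clf.predict(X_enc)  # predictions come back as expected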
If the features are dense integers, like [1, 2, 4, 5, 6] with only a few values missing, we can map each value directly to its corresponding column position.
>>> import numpy as np
>>> from scipy import sparse
>>> def _sparse_binary(y):
...     # One-hot encode y as a scipy.sparse CSR matrix.
...     row = np.arange(len(y))   # one row per sample
...     col = y - y.min()         # shift so the minimum value maps to column 0
...     data = np.ones(len(y))
...     return sparse.csr_matrix((data, (row, col)))
...
>>> y = np.random.randint(-2, 2, 8).reshape([4, 2])
>>> y
array([[ 0, -2],
       [-2,  1],
       [ 1,  0],
       [ 0, -2]])
>>> yc = [_sparse_binary(y[:, i]) for i in range(2)]
>>> for i in yc: print(i.todense())
...
[[ 0.  0.  1.  0.]
 [ 1.  0.  0.  0.]
 [ 0.  0.  0.  1.]
 [ 0.  0.  1.  0.]]
[[ 1.  0.  0.  0.]
 [ 0.  0.  0.  1.]
 [ 0.  0.  1.  0.]
 [ 1.  0.  0.  0.]]
>>> [i.shape for i in yc]
[(4, 4), (4, 4)]
This is a simplistic compromise of a method, but it works and is easy to reverse with argmax(), e.g.:
>>> np.argmax(yc[0].todense(), 1) + y.min(0)[0]
matrix([[ 0],
        [-2],
        [ 1],
        [ 0]])
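As a sanity check (a sketch on the same toy y, applying the same argmax reversal to every column), the round trip recovers the original array:

>>> y_back = np.stack(
...     [np.asarray(np.argmax(c.todense(), 1)).ravel() + y[:, i].min()
...      for i, c in enumerate(yc)],
...     axis=1)  # undo the per-column shift, then reassemble
>>> np.array_equal(y_back, y)
True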