How to reverse sklearn.OneHotEncoder transform to recover original data?

深忆病人 2020-12-13 07:34

I encoded my categorical data using sklearn.OneHotEncoder and fed them to a random forest classifier. Everything seems to work and I got my predicted output back. Is there a way to reverse the encoding and recover my original data from the one-hot output?
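
For reference, here is a minimal sketch of that setup (the arrays X_cat and y_labels below are made-up placeholders, not my real data; it assumes a recent scikit-learn, 0.20+, where OneHotEncoder accepts string categories directly):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import OneHotEncoder

    # Made-up example data: two categorical feature columns and a binary label.
    X_cat = np.array([['red', 'S'],
                      ['blue', 'M'],
                      ['red', 'L'],
                      ['green', 'M']])
    y_labels = np.array([0, 1, 0, 1])

    enc = OneHotEncoder()                      # sparse one-hot encoding
    X_enc = enc.fit_transform(X_cat)           # one column per distinct category

    clf = RandomForestClassifier(n_estimators=10, random_state=0)
    clf.fit(X_enc, y_labels)                   # random forests accept sparse input
    pred = clf.predict(X_enc)                  # predictions come back fine; the question is how to get X_cat back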

8 Answers
  •  再見小時候
    2020-12-13 08:19

    If the features are dense integers, like [1, 2, 4, 5, 6] with a few values missing, then we can map them to their corresponding positions:

    >>> import numpy as np
    >>> from scipy import sparse
    >>> def _sparse_binary(y):
    ...     # one-hot codes of y with scipy.sparse matrix.
    ...     row = np.arange(len(y))
    ...     col = y - y.min()
    ...     data = np.ones(len(y))
    ...     return sparse.csr_matrix((data, (row, col)))
    ... 
    >>> y = np.random.randint(-2,2, 8).reshape([4,2])
    >>> y
    array([[ 0, -2],
           [-2,  1],
           [ 1,  0],
           [ 0, -2]])
    >>> yc = [_sparse_binary(y[:, i]) for i in range(2)]
    >>> for i in yc: print(i.todense())
    ... 
    [[ 0.  0.  1.  0.]
     [ 1.  0.  0.  0.]
     [ 0.  0.  0.  1.]
     [ 0.  0.  1.  0.]]
    [[ 1.  0.  0.  0.]
     [ 0.  0.  0.  1.]
     [ 0.  0.  1.  0.]
     [ 1.  0.  0.  0.]]
    >>> [i.shape for i in yc]
    [(4, 4), (4, 4)]
    

    This is a simple compromise, but it works and is easy to reverse with argmax(), e.g.:

    >>> np.argmax(yc[0].todense(), 1) + y.min(0)[0]
    matrix([[ 0],
            [-2],
            [ 1],
            [ 0]])
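
    To undo the encoding for every column at once, the same argmax() trick can be wrapped in a small helper (a sketch that builds on the code above; _decode_binary is a made-up name, not part of the original snippet):

    >>> def _decode_binary(yc_i, y_min):
    ...     # position of the 1 in each row, shifted back by the column minimum
    ...     return np.asarray(np.argmax(yc_i.todense(), 1)).ravel() + y_min
    ... 
    >>> decoded = np.column_stack([_decode_binary(yc[i], y[:, i].min()) for i in range(2)])
    >>> decoded
    array([[ 0, -2],
           [-2,  1],
           [ 1,  0],
           [ 0, -2]])
    >>> np.array_equal(decoded, y)
    True

    Note that newer scikit-learn releases (0.20+) also provide OneHotEncoder.inverse_transform, which performs this reversal directly, so the manual mapping is mainly useful on older versions or for hand-rolled encodings.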
    
