How to reverse sklearn.OneHotEncoder transform to recover original data?

深忆病人 2020-12-13 07:34

I encoded my categorical data using sklearn.OneHotEncoder and fed them to a random forest classifier. Everything seems to work and I got my predicted output bac

8 answers
  •  旧巷少年郎
    2020-12-13 08:24

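    Compute the dot product of the encoded values with ohe.active_features_. This works for both sparse and dense representations. Note that active_features_ exists only in older scikit-learn releases (it was deprecated in 0.20 and removed in 0.22). Example: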

    from sklearn.preprocessing import OneHotEncoder
    import numpy as np
    
    orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])
    
    ohe = OneHotEncoder()
    encoded = ohe.fit_transform(orig.reshape(-1, 1)) # input needs to be column-wise
    
    # Each row holds a single 1, so the dot product selects that column's original value.
    decoded = encoded.dot(ohe.active_features_).astype(int)
    assert np.allclose(orig, decoded)
    

    The key insight is that the active_features_ attribute of the fitted encoder holds the original value represented by each binary column. For a single encoded feature, each data point has exactly one 1, located in the column for its original value, so the dot product with active_features_ returns that value directly.
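
    On scikit-learn 0.20 and later, the same idea can be sketched with categories_ in place of active_features_, or with the built-in inverse_transform method. This is a minimal sketch for a single integer feature, reusing the sample array from the example above; it is not necessarily how the original poster's pipeline was set up.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

ohe = OneHotEncoder()
# Input must be 2-D: one row per sample, one column per feature.
encoded = ohe.fit_transform(orig.reshape(-1, 1))

# Option 1: the built-in inverse. It returns a 2-D array with one
# column per original feature, so flatten it for comparison.
decoded_builtin = ohe.inverse_transform(encoded).ravel()

# Option 2: the dot-product trick. categories_[0] holds the sorted
# unique values of the first (and only) feature, one per binary column.
decoded_dot = encoded.dot(ohe.categories_[0]).astype(int)

assert np.array_equal(orig, decoded_builtin)
assert np.array_equal(orig, decoded_dot)
```

    inverse_transform also handles several features and non-numeric categories, whereas the dot-product variant only applies to a single numeric feature.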
