Scikit-learn's LabelBinarizer vs. OneHotEncoder

陌清茗 2020-11-30 00:37

What is the difference between the two? It seems that both create new columns, whose number equals the number of unique categories in the feature. Then they assig

4 answers
  •  慢半拍i (original poster)
    2020-11-30 01:06

    A simple example which encodes an array using LabelEncoder, OneHotEncoder, LabelBinarizer is shown below.

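    I see that in older scikit-learn releases (before 0.20) OneHotEncoder needs the data in integer-encoded form first, which is not required in the case of LabelBinarizer. Newer releases of OneHotEncoder also accept string categories directly, so there the LabelEncoder step is optional.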

    from numpy import array
    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.preprocessing import LabelBinarizer
    
    # define example
    data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold',
            'warm', 'hot']
    values = array(data)
    print("Data:", values)
    
    # integer encode: each category becomes an integer (classes are sorted)
    label_encoder = LabelEncoder()
    integer_encoded = label_encoder.fit_transform(values)
    print("Label Encoder:", integer_encoded)
    
    # one-hot encode: OneHotEncoder expects a 2D array (n_samples, n_features)
    onehot_encoder = OneHotEncoder()
    integer_encoded = integer_encoded.reshape(-1, 1)
    # the default output is a sparse matrix; .toarray() makes it dense for printing
    onehot_encoded = onehot_encoder.fit_transform(integer_encoded).toarray()
    print("OneHot Encoder:", onehot_encoded)
    
    # binarize labels: works directly on the 1D array of strings
    lb = LabelBinarizer()
    print("Label Binarizer:", lb.fit_transform(values))
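
    As far as I can tell, scikit-learn 0.20 and later let OneHotEncoder take the string categories directly, so the LabelEncoder step can be skipped there. A minimal sketch under that assumption, reusing the same data (the variable name binarized is only for illustration). Calling .toarray() on the default sparse output avoids depending on the name of the dense-output parameter, which has changed between releases.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold',
        'warm', 'hot']
values = np.array(data)

# OneHotEncoder wants a 2D array (n_samples, n_features),
# so the single column of strings is reshaped to (10, 1).
onehot_encoder = OneHotEncoder()
onehot_encoded = onehot_encoder.fit_transform(values.reshape(-1, 1)).toarray()

# LabelBinarizer takes the 1D array of labels as-is.
lb = LabelBinarizer()
binarized = lb.fit_transform(values)

# Both learn the same sorted categories and, for three or more
# classes, produce the same 0/1 matrix (up to dtype).
print(onehot_encoder.categories_[0])
print(lb.classes_)
print(np.array_equal(onehot_encoded, binarized))
```

    With three categories the two outputs match column for column; the practical difference here is only the expected input shape and the output type (sparse float matrix by default vs. dense integer array).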
    

    Another good link that explains OneHotEncoder: Explain onehotencoder using python

    There may be other valid differences between the two which experts can probably explain.
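
    Two such differences that I am aware of can be sketched as follows: with only two classes LabelBinarizer returns a single 0/1 column while OneHotEncoder still returns one column per category, and OneHotEncoder encodes several feature columns at once while LabelBinarizer is meant for a single 1D array of labels. The arrays labels and features below are made-up illustration data, not from the question.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# Two-class labels: LabelBinarizer collapses them into one 0/1 column,
# OneHotEncoder keeps one column per category.
labels = np.array(['no', 'yes', 'yes', 'no'])
lb_out = LabelBinarizer().fit_transform(labels)
ohe_out = OneHotEncoder().fit_transform(labels.reshape(-1, 1)).toarray()
print(lb_out.shape)   # (4, 1)
print(ohe_out.shape)  # (4, 2)

# Several feature columns: OneHotEncoder encodes each column separately
# and concatenates the results side by side.
features = np.array([['cold', 'red'],
                     ['hot', 'blue'],
                     ['warm', 'red']])
multi = OneHotEncoder().fit_transform(features).toarray()
print(multi.shape)    # (3, 5): 3 temperature columns + 2 colour columns
```

    This matches the intended roles: LabelBinarizer is aimed at target labels (y), OneHotEncoder at input features (X).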
