What is the difference between the two? It seems that both create new columns, which their number is equal to the number of unique categories in the feature. Then they assign 0 and 1 to data points depending on what category they are in.
A simple example which encodes an array using LabelEncoder, OneHotEncoder, LabelBinarizer is shown below.
I see that OneHotEncoder needs data in integer encoded form first to convert into its respective encoding whihc is not required incase of LabelBinarizer.
from numpy import array
from numpy import argmax
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold',
'warm', 'hot']
values = array(data)
print "Data: ", values
# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print "Label Encoder:" ,integer_encoded
# onehot encode
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print "OneHot Encoder:", onehot_encoded
#Binary encode
lb = LabelBinarizer()
print "Label Binarizer:", lb.fit_transform(values)
Another good link which explains the OneHotEncoder is : Explain onehotencoder using python
There may be other valid differences between the two which experts can probably explain.
A difference is that you can use OneHotEncoder
for multi column data, while not for LabelBinarizer
and LabelEncoder
.
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder
X = [["US", "M"], ["UK", "M"], ["FR", "F"]]
OneHotEncoder().fit_transform(X).toarray()
# array([[0., 0., 1., 0., 1.],
# [0., 1., 0., 0., 1.],
# [1., 0., 0., 1., 0.]])
LabelBinarizer().fit_transform(X)
# ValueError: Multioutput target data is not supported with label binarization
LabelEncoder().fit_transform(X)
# ValueError: bad input shape (3, 2)
来源:https://stackoverflow.com/questions/50473381/scikit-learns-labelbinarizer-vs-onehotencoder