Scikit-learn's LabelBinarizer vs. OneHotEncoder

陌清茗 2020-11-30 00:37

What is the difference between the two? It seems that both create new columns, whose number is equal to the number of unique categories in the feature. Then they assig

4 answers
  •  迷失自我
    2020-11-30 01:00

    Scikit-learn suggests using OneHotEncoder for the X matrix, i.e. the features you feed into a model, and LabelBinarizer for the y labels.

    They are quite similar, except that OneHotEncoder can return a sparse matrix (its default output), which saves a lot of memory. You rarely need that for y labels.

    Even if you have a multi-label, multi-class problem, you can use MultiLabelBinarizer for your y labels rather than switching to OneHotEncoder for multi-hot encoding.

    https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
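
    To make the split concrete, here is a minimal sketch of the three encoders side by side. The toy data (colors as the feature, animals as the labels) is made up for illustration and is not from the question. It only relies on the default settings of each class.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import LabelBinarizer, MultiLabelBinarizer, OneHotEncoder

# Toy data, invented for this example: one categorical feature column and its labels.
X = np.array([["red"], ["green"], ["blue"], ["green"]])
y = ["cat", "dog", "bird", "dog"]

# OneHotEncoder takes a 2-D feature matrix and returns a SciPy sparse matrix by default.
X_encoded = OneHotEncoder().fit_transform(X)

# LabelBinarizer takes a 1-D label vector and returns a dense NumPy array by default.
lb = LabelBinarizer()
y_encoded = lb.fit_transform(y)

# MultiLabelBinarizer takes one collection of labels per sample (possibly empty).
y_multi = [("cat", "dog"), ("bird",), ("dog",), ()]
mlb = MultiLabelBinarizer()
y_multi_encoded = mlb.fit_transform(y_multi)

print(sparse.issparse(X_encoded), X_encoded.shape)
print(lb.classes_, y_encoded.shape)
print(mlb.classes_)
print(y_multi_encoded)
```

    Note that with exactly two classes LabelBinarizer produces a single 0/1 column rather than two columns, which is another small difference from OneHotEncoder.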
