One hot encoding of string categorical features

盖世英雄少女心 2020-12-04 17:51

I'm trying to perform a one hot encoding of a trivial dataset.

data = [['a', 'dog', 'red'],
        ['b', 'cat', 'green']]

Wha

3 Answers
  •  忘掉有多难
    2020-12-04 18:58

    I've faced this problem many times, and I found a solution on page 100 of this book:

    We can apply both transformations (from text categories to integer categories, then from integer categories to one-hot vectors) in one shot using the LabelBinarizer class:

    The sample code is:

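    from sklearn.preprocessing import LabelBinarizer

    # Note: LabelBinarizer expects a 1-D sequence of labels, so `data` here
    # should be a single column of categories, not a 2-D list of rows.
    encoder = LabelBinarizer()
    housing_cat_1hot = encoder.fit_transform(data)
    housing_cat_1hot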
    

    Note that this returns a dense NumPy array by default. You can get a sparse matrix instead by passing sparse_output=True to the LabelBinarizer constructor.
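
To make the dense versus sparse distinction concrete, here is a minimal sketch. It assumes a hypothetical single column of string categories named `colors` (not part of the question's data) and simply compares the two output types.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical 1-D column of string categories (three distinct values).
colors = ["red", "green", "blue", "green"]

# Default behaviour: a dense NumPy array, one column per class.
dense_encoder = LabelBinarizer()
dense = dense_encoder.fit_transform(colors)
print(dense_encoder.classes_)  # classes are stored in sorted order
print(dense)

# sparse_output=True: the same values, stored as a SciPy sparse matrix.
sparse_encoder = LabelBinarizer(sparse_output=True)
sparse = sparse_encoder.fit_transform(colors)
print(type(sparse))
print(sparse.toarray())
```

One caveat: when the column has only two distinct values, LabelBinarizer returns a single 0/1 column rather than two one-hot columns.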

    You can find more about LabelBinarizer in the official sklearn documentation.
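
For the multi-column data in the question, an alternative that is not part of this answer is scikit-learn's OneHotEncoder, which accepts a 2-D array of string features directly (in scikit-learn 0.20 or later). The following is only a sketch under that assumption; the variable names are illustrative.

```python
from sklearn.preprocessing import OneHotEncoder

# The dataset from the question: each row is a sample, each column a feature.
data = [["a", "dog", "red"],
        ["b", "cat", "green"]]

# Each column is encoded independently; the result is sparse by default.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(data)

# Convert to a dense array for inspection.
dense = encoded.toarray()
print(encoder.categories_)  # sorted categories found in each column
print(dense)
```

Each input column contributes one output column per distinct value, so this two-row example yields six output columns.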
