One hot encoding of string categorical features

盖世英雄少女心 2020-12-04 17:51

I'm trying to perform a one-hot encoding of a trivial dataset.

data = [['a', 'dog', 'red'],
        ['b', 'cat', 'green']]

Wha

3 answers

忘掉有多难 (original poster)

2020-12-04 18:58
I've faced this problem many times, and I found a solution in this book, on page 100:

We can apply both transformations (from text categories to integer categories, then from integer categories to one-hot vectors) in one shot using the LabelBinarizer class:

The sample code is:
```
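from sklearn.preprocessing import LabelBinarizer

# LabelBinarizer works on a single 1-D column of labels,
# so `data` here should be one categorical feature, not a 2-D table.
encoder = LabelBinarizer()
housing_cat_1hot = encoder.fit_transform(data)
housing_cat_1hot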
```
Note that this returns a dense NumPy array by default. You can get a sparse matrix instead by passing `sparse_output=True` to the `LabelBinarizer` constructor.
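
To make the dense versus sparse distinction concrete, here is a minimal sketch. It assumes a single made-up column of color labels (the `colors` list is illustrative and not from the book), since `LabelBinarizer` handles one 1-D column of labels at a time.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical single categorical column (not from the original post).
colors = ["red", "green", "blue", "green"]

# Default: dense NumPy array, one column per class (classes are sorted).
dense_encoder = LabelBinarizer()
dense_1hot = dense_encoder.fit_transform(colors)

# sparse_output=True: same encoding, returned as a SciPy sparse matrix.
sparse_encoder = LabelBinarizer(sparse_output=True)
sparse_1hot = sparse_encoder.fit_transform(colors)

print(dense_encoder.classes_)  # column order of the encoding
print(dense_1hot)
print(type(sparse_1hot))
```

One caveat worth knowing: when the column has only two distinct classes, `LabelBinarizer` returns a single 0/1 column rather than two columns.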

You can find more about `LabelBinarizer` in the official sklearn documentation.
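
For a dataset with several string columns, like the two-row example in the question, a sketch using `OneHotEncoder` might look like the following. Note that this is a different scikit-learn class from the `LabelBinarizer` in the quoted passage, swapped in here because it accepts multiple feature columns at once; it assumes a scikit-learn version whose `OneHotEncoder` accepts string columns directly.

```python
from sklearn.preprocessing import OneHotEncoder

# The dataset from the question: two rows, three string features.
data = [["a", "dog", "red"],
        ["b", "cat", "green"]]

# OneHotEncoder returns a sparse matrix by default; .toarray() makes it dense.
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(data).toarray()

print(encoder.categories_)  # categories found per column, in sorted order
print(one_hot)
```

Each input column expands into one output column per distinct value, so the three features here become six binary columns.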