Can sklearn random forest directly handle categorical features?

予麋鹿 2020-12-04 11:35

Say I have a categorical feature, color, which takes the values

['red', 'blue', 'green', 'orange'],

and I want to use it to predict something in a ra

3 answers

佛祖请我去吃肉 (original poster)

2020-12-04 12:04

Most implementations of random forest (and many other machine learning algorithms) that accept categorical inputs are either just automating the encoding of categorical features for you or using a method that becomes computationally intractable for large numbers of categories.
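
For scikit-learn specifically, the encoding step can be done explicitly before the forest sees the data. The following is only a minimal sketch, assuming a toy `color` column and a made-up binary target (both hypothetical, not from the question), with `OneHotEncoder` chained to `RandomForestClassifier` in a pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data, made up for illustration: a single categorical column, color.
colors = np.array([["red"], ["blue"], ["green"], ["orange"]] * 5)
# Hypothetical target: 1 for red/orange, 0 for blue/green.
labels = np.array([1, 0, 0, 1] * 5)

# OneHotEncoder turns each color into its own 0/1 column.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(colors, labels)

# Inspect the encoded form of the four distinct colors.
encoded = model.named_steps["onehotencoder"].transform(colors[:4]).toarray()
print(encoded)

# The pipeline applies the same encoding at prediction time.
predictions = model.predict(np.array([["red"], ["blue"]]))
print(predictions)
```

Keeping the encoder inside the pipeline means the same category-to-column mapping is reused at prediction time, so raw string values can be passed to `predict` directly.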

A notable exception is H2O. H2O has a very efficient method for handling categorical data directly, which often gives it an edge over tree-based methods that require one-hot encoding.

This article by Will McGinnis has a very good discussion of one-hot encoding and alternatives.

This article by Nick Dingwall and Chris Potts has a very good discussion of categorical variables and tree-based learners.
