Can sklearn random forest directly handle categorical features?

予麋鹿 2020-12-04 11:35

Say I have a categorical feature, color, which takes the values

['red', 'blue', 'green', 'orange'],

and I want to use it to predict something in a ra

3 answers
  •  佛祖请我去吃肉
    2020-12-04 12:04

    Most random forest implementations (and many other machine learning algorithms) that accept categorical inputs either simply automate the encoding of categorical features for you or use a method that becomes computationally intractable when the number of categories is large.
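To give a feel for why direct handling can become intractable, here is a minimal sketch. It assumes the method in question is an exhaustive search over every way of splitting the categories into two groups; that assumption is for illustration only and is not claimed to be what any particular library does. The helper names `binary_partitions` and `n_binary_partitions` are made up for this example.

```python
from itertools import combinations


def binary_partitions(categories):
    """Yield every split of distinct categories into two non-empty groups."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    # Pin the first category to the left group so that each unordered
    # split is produced exactly once.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = [first, *combo]
            right = [c for c in rest if c not in combo]
            if right:  # skip the case where the right group is empty
                yield left, right


def n_binary_partitions(k):
    """Number of two-group splits of k distinct categories: 2**(k - 1) - 1."""
    return 2 ** (k - 1) - 1


colors = ["red", "blue", "green", "orange"]
splits = list(binary_partitions(colors))
print(len(splits))              # 7 candidate splits for 4 colors
print(n_binary_partitions(30))  # 536870911 candidate splits for 30 categories
```

The count roughly doubles with every extra category, so under this assumption a single feature with a few dozen levels already implies hundreds of millions of candidate splits per tree node.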

    A notable exception is H2O, which has a very efficient method for handling categorical data directly. This often gives it an edge over tree-based methods that require one-hot encoding.
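For reference, the one-hot encoding mentioned above can be sketched in plain Python using the `color` feature from the question. The `one_hot` helper is hypothetical and written only for this example; in scikit-learn the usual tool for this step is `sklearn.preprocessing.OneHotEncoder`, whose output would then be passed to the random forest.

```python
def one_hot(values, categories):
    """Return one row of 0/1 indicators per value, one column per category."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # raises KeyError for an unseen category
        rows.append(row)
    return rows


categories = ["red", "blue", "green", "orange"]
encoded = one_hot(["green", "red", "orange"], categories)
for row in encoded:
    print(row)
```

Each category becomes its own binary column, so a feature with many levels turns into many sparse columns. That is the cost that direct categorical handling tries to avoid.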

    This article by Will McGinnis has a very good discussion of one-hot encoding and its alternatives.

    This article by Nick Dingwall and Chris Potts has a very good discussion of categorical variables and tree-based learners.
