How to pass mixed (categorical and numeric) features to Decision Tree Regressor in sklearn?

Question

How can I pass categorical and numeric features to DecisionTreeRegressor in sklearn? The code below shows the general usage with numeric features only:

from sklearn import tree
make_tree = tree.DecisionTreeRegressor()
fit_tree = make_tree.fit(X_train, y_train)

Answer 1:

First, all categorical features must be encoded (represented by numbers) before scikit-learn's regression models can use them. To do so, you can use OneHotEncoder; scikit-learn versions before 0.20 needed LabelEncoder first, because OneHotEncoder accepted only integer input at the time. For high-cardinality features, you can use FeatureHasher.
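The one-hot route can be sketched as follows. This is only a minimal illustration, assuming a recent scikit-learn with ColumnTransformer available; the toy data and the column names "color" and "size" are made up for the example and are not from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: one categorical column and one numeric column.
X_train = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green"],
    "size": [1.0, 2.5, 3.0, 2.0, 1.5, 3.5],
})
y_train = [10.0, 20.0, 30.0, 21.0, 11.0, 31.0]

# One-hot encode the categorical column; pass the numeric column through unchanged.
preprocess = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)

# Chain the encoding and the regressor so both are fitted together.
model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeRegressor(random_state=0)),
])
model.fit(X_train, y_train)

predictions = model.predict(X_train)
```

Keeping the encoder inside the pipeline means the same encoding is applied automatically at prediction time, and handle_unknown="ignore" avoids an error if a category appears that was not seen during fitting.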

As an example, using FeatureHasher:

from sklearn.feature_extraction import FeatureHasher

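# n_features: number of output columns (size of the hash space), not the number of unique values
# input_type='string' expects each sample to be an iterable of strings, so wrap each value in a list
feature_hasher = FeatureHasher(n_features=5000, input_type='string')
hashed = feature_hasher.transform([[value] for value in df['COLUMN_NAME'].astype(str)])
# hashed is a scipy sparse matrix of shape (n_samples, 5000), so it cannot be assigned to a single DataFrame column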

Then, you can pass the encoded features, together with the numeric ones, to the regressor.
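One possible way to combine hashed and numeric features before fitting is sketched below. It assumes scipy is installed and stacks the columns with scipy.sparse.hstack; the toy data, the column names "city" and "area", and n_features=16 are arbitrary choices for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction import FeatureHasher
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; "city" stands in for a high-cardinality column.
df = pd.DataFrame({
    "city": ["paris", "tokyo", "lima", "tokyo", "paris", "oslo"],
    "area": [50.0, 30.0, 80.0, 35.0, 55.0, 60.0],
})
y_train = np.array([500.0, 450.0, 300.0, 470.0, 520.0, 400.0])

# Hash each category into a fixed number of columns (16 is arbitrary here).
hasher = FeatureHasher(n_features=16, input_type="string")
hashed = hasher.transform([[city] for city in df["city"]])

# Place the hashed columns next to the numeric column in one sparse matrix.
numeric = csr_matrix(df[["area"]].to_numpy())
X_train = hstack([hashed, numeric]).tocsr()

# DecisionTreeRegressor accepts sparse input directly.
make_tree = DecisionTreeRegressor(random_state=0)
fit_tree = make_tree.fit(X_train, y_train)
```

A small n_features increases the chance that two categories collide in the same column, so in practice the hash space is usually chosen well above the number of distinct values.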

Source: https://stackoverflow.com/questions/50191729/how-to-pass-mixed-categorical-and-numeric-features-to-decision-tree-regressor

Tags

python

machine-learning

scikit-learn

decision-tree
