ValueError: feature_names mismatch: in xgboost in the predict() function

I have trained an XGBoostRegressor model. When I use this trained model to predict on new input, the predict() function throws a feature_names mismatch error.

8 Answers

渐次进展

2020-12-25 13:21
Check the exception. You should see two arrays: one is the column names of the DataFrame you're passing in, and the other is the XGBoost feature names. They should be the same length. If you put them side by side (in a spreadsheet, for example), you will see that they are not in the same order. My guess is that the XGBoost names were written to a dictionary, so it would be a coincidence if the names in the two arrays were in the same order.

The fix is easy. Just reorder your dataframe columns to match the XGBoost names:
```
# Works for a Booster; with the scikit-learn wrapper use model.get_booster().feature_names
f_names = model.feature_names
df = df[f_names]
```
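Before reordering, it can help to confirm that order really is the only difference. The following is a minimal sketch using plain Python lists; the helper `diff_feature_names` and the column names are hypothetical and stand in for the two arrays printed in the exception.

```python
def diff_feature_names(expected, actual):
    """Compare the model's feature names with the input's column names."""
    expected, actual = list(expected), list(actual)
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    # Nothing missing and nothing unexpected, yet the lists differ:
    # only the order is wrong.
    order_only = not missing and not unexpected and expected != actual
    return {"missing": missing, "unexpected": unexpected, "order_only": order_only}


# Hypothetical names, for illustration only.
model_names = ["age", "income", "city"]
input_names = ["city", "age", "income"]
report = diff_feature_names(model_names, input_names)
print(report)
```

If `order_only` is true, reordering the columns is enough. If `missing` or `unexpected` is non-empty, the input has a different set of columns and reordering alone will not help.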

灰色年华

2020-12-25 13:25
This happens when the order of the columns at model-building time differs from the order of the columns at scoring time.

I used the following steps to overcome this error.

First, load the pickle file:
```
import pickle
model = pickle.load(open("saved_model_file", "rb"))
```
Extract the column names in the order in which they were used during training:
```
cols_when_model_builds = model.get_booster().feature_names
```
Reorder the pandas DataFrame to match:
```
pd_dataframe = pd_dataframe[cols_when_model_builds]
```

轻奢々

2020-12-25 13:30

From what I could find, the predict function does not accept a DataFrame (or a sparse matrix) as input. This is a known bug, reported here: https://github.com/dmlc/xgboost/issues/1238

To get around this issue, convert the input first: use as_matrix() for a DataFrame (as_matrix() was removed in pandas 1.0; to_numpy() is its replacement) or toarray() for a sparse matrix.

This is the only workaround I know of until the bug is fixed or the feature is implemented in a different manner.
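One caveat with this workaround: an array carries no column names, so the columns must already be in the training order before the conversion. A small sketch with made-up data, assuming a pandas version that provides `to_numpy()` (0.24 or later):

```python
import pandas as pd

# Hypothetical training column order and a scoring frame in another order.
train_columns = ["a", "b", "c"]
X_new = pd.DataFrame({"c": [3.0, 6.0], "a": [1.0, 4.0], "b": [2.0, 5.0]})

# to_numpy() keeps whatever column order the frame has, so fix the order first.
X_new_array = X_new[train_columns].to_numpy()
print(X_new_array)
```

Each row of `X_new_array` now holds the values in the order `a`, `b`, `c`, matching the training layout by position.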


慢半拍i

2020-12-25 13:31
I came across the same problem and solved it by selecting the test DataFrame's columns in the same order as the training DataFrame's columns:
```
test_df = test_df[train_df.columns]
```
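That line assumes every training column exists in the test DataFrame. A common case where it does not is one-hot encoding, when a category appears in the training data but not in the test data. The sketch below uses made-up data and swaps in `DataFrame.reindex` in place of plain column selection; whether filling the missing column with 0 is appropriate depends on your features.

```python
import pandas as pd

# Hypothetical raw data: "green" appears only in the training set.
train_raw = pd.DataFrame({"size": [1.0, 2.0, 3.0], "color": ["red", "blue", "green"]})
test_raw = pd.DataFrame({"color": ["blue", "red"], "size": [4.0, 5.0]})

train_df = pd.get_dummies(train_raw, columns=["color"])
test_df = pd.get_dummies(test_raw, columns=["color"])

# test_df has no color_green column, so test_df[train_df.columns] would raise KeyError.
# reindex adds the missing column filled with 0 and puts all columns in training order.
test_df = test_df.reindex(columns=train_df.columns, fill_value=0)
```

Note that `reindex` also silently drops test columns that the training data never had, so it is worth checking for those separately if they would indicate a data problem.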

鱼传尺愫

2020-12-25 13:33
Do this while creating the DMatrix for XGB:
```
dtrain = xgb.DMatrix(np.asmatrix(X_train), label=y_train)
dtest = xgb.DMatrix(np.asmatrix(X_test), label=y_test)
```
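Do not pass X_train and X_test directly. Converting them to a matrix first drops the DataFrame column names, so both DMatrix objects are built without named features.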

刺人心

2020-12-25 13:35
I also had this problem when I used a pandas DataFrame (non-sparse representation).

I converted the training and testing data into NumPy ndarrays:
```
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
X_train = X_train.to_numpy()
X_test = X_test.to_numpy()
```
This is how I got rid of that error!