ValueError: feature_names mismatch: in xgboost in the predict() function

I have trained an XGBRegressor model. When I use this trained model to predict on a new input, the predict() function throws a feature_names mismatch error.

8 answers

误落风尘

2020-12-25 13:39
Try converting the data to ndarrays before passing it to fit/predict. For example, if your training data is train_df and your test data is test_df, use the code below:
```
train_x = train_df.values
test_x = test_df.values
```
Now fit the model:
```
xgb.fit(train_x,train_y)
```
Finally, predict:
```
pred = xgb.predict(test_x)
```
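One caveat worth spelling out: once the data is an ndarray the column names are gone, so XGBoost can no longer check that column j means the same feature at fit and predict time, and test_df must list its columns in the same order as train_df. A minimal sketch, assuming both DataFrames contain the same feature columns; `to_aligned_arrays` is a hypothetical helper name, and `to_numpy()` is used as the pandas-recommended equivalent of `.values`.

```python
import numpy as np
import pandas as pd


def to_aligned_arrays(train_df, test_df):
    """Hypothetical helper: convert both DataFrames to ndarrays with the
    same column order.

    After conversion the column names are lost, so nothing checks that
    column j means the same feature in both arrays. Selecting test_df's
    columns in train_df's order makes that explicit; a column missing
    from test_df raises a KeyError here instead of producing silently
    wrong predictions later.
    """
    train_x = train_df.to_numpy()
    test_x = test_df[list(train_df.columns)].to_numpy()
    return train_x, test_x


# Illustrative data: same features, but test_df lists them in another order.
train_df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
test_df = pd.DataFrame({"b": [30.0], "a": [10.0]})

train_x, test_x = to_aligned_arrays(train_df, test_df)
print(test_x)  # [[10. 30.]]
# Then, as above: xgb.fit(train_x, train_y); pred = xgb.predict(test_x)
```

Without the reordering step, `test_df.values` would have been `[[30. 10.]]` here, and the model would silently read `b` as `a`.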
Hope this helps!

再見小時候

2020-12-25 13:43
I'm contributing an answer because I ran into this problem when putting a fitted XGBRegressor model into production. So this is a solution for cases where you cannot select column names from a training or test DataFrame, though there may be some overlap that is helpful.

The model had been fit on a pandas DataFrame, and I was trying to pass a single row of values as an np.array to the predict() function. The values had already been preprocessed (reverse label encoded, etc.), and the array was entirely numeric.

I got the familiar error:

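ValueError: feature_names mismatch, followed by the list of the model's feature names and then a list of the same length: ['f0', 'f1', ...]. The second list holds the generic names given to the columns of the unnamed np.array, which is why the two do not match.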

While there are no doubt more direct solutions, I had little time and this fixed the problem:
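1. Make the input vector a pandas DataFrame:
```
import pandas as pd

# One key per feature the model was trained on. Each value is a
# one-element list, so the resulting DataFrame has exactly one row.
# `value` is a placeholder for that feature's (preprocessed) value.
series = {'feature1': [value],
          'feature2': [value],
          'feature3': [value],
          'feature4': [value],
          'feature5': [value],
          'feature6': [value],
          'feature7': [value],
          'feature8': [value],
          'feature9': [value],
          'feature10': [value]
          }

vector = pd.DataFrame(series)
```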
2. Get the feature names that the trained model knows:
```
names = model.get_booster().feature_names
```
3. Select those features from the input DataFrame defined above (which also puts them in the model's order), then use iloc with a list index so the row is passed as a one-row DataFrame:
```
result = model.predict(vector[names].iloc[[-1]])
```
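To see what the indexing in the last step does, here is a small pandas-only sketch; the column names and values are made up, and `names` is a stand-in for what `model.get_booster().feature_names` would return. `vector[names]` drops any column the model does not know and reorders the rest, and the double brackets in `.iloc[[-1]]` keep a 2-D DataFrame, whereas `.iloc[-1]` would give a 1-D Series.

```python
import pandas as pd

# Hypothetical input: columns in a different order, plus one the model never saw.
vector = pd.DataFrame({"feature2": [0.5], "feature1": [1.5], "extra": [9.0]})
# Stand-in for model.get_booster().feature_names.
names = ["feature1", "feature2"]

row = vector[names].iloc[[-1]]      # list index -> DataFrame, one row
as_series = vector[names].iloc[-1]  # scalar index -> Series

print(type(row).__name__, row.shape)              # DataFrame (1, 2)
print(list(row.columns))                          # ['feature1', 'feature2']
print(type(as_series).__name__, as_series.shape)  # Series (2,)
# result = model.predict(row)
```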

I found the iloc indexing trick here.

I found the approach of selecting the feature names with get_booster().feature_names in @Athar's post above; it is needed because models in the scikit-learn implementation do not have a feature_names attribute.

Check out the documentation to learn more.

Hope this helps.
