ValueError: feature_names mismatch in xgboost's predict() function

悲哀的现实 2020-12-25 13:15

I have trained an XGBRegressor model. When I use the trained model to predict on a new input, the predict() function throws a feature_names mismatch error.

8 answers
  • 2020-12-25 13:39

    Try converting the data to an ndarray before passing it to fit/predict. For example, if your training data is train_df and your test data is test_df, use the code below:

    train_x = train_df.values
    test_x = test_df.values
    

    Now fit the model:

    xgb.fit(train_x,train_y)
    

    Finally, predict:

    pred = xgb.predict(test_x)
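
    One caveat with this approach: an ndarray carries no column names, so columns are matched purely by position. A minimal sketch of reordering the test columns to the training order before calling .values, assuming hypothetical columns age and income that are not from the original post:

```python
import pandas as pd

# Hypothetical data: the test frame has the same columns in a different order.
train_df = pd.DataFrame({"age": [25, 40], "income": [50_000, 82_000]})
test_df = pd.DataFrame({"income": [61_000], "age": [33]})

# Reorder the test columns to the training order BEFORE dropping the names,
# because after .values only the position of each column is left.
train_x = train_df.values
test_x = test_df[train_df.columns].values
```

    Without the reordering step, the model would silently read income where it expects age.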
    

    Hope this helps!

  • 2020-12-25 13:43

    I'm contributing an answer because I ran into this problem when putting a fitted XGBRegressor model into production. It therefore covers cases where you cannot select column names from a training or testing DataFrame, though there may be some overlap that is helpful.

    The model had been fit on a pandas DataFrame, and I was attempting to pass a single row of values as an np.array to the predict function. The values had already been preprocessed (reverse label encoded, etc.), and the array contained only numeric values.

    I got the familiar error:

    ValueError: feature_names mismatch, followed by a list of the features and then a list of the same length: ['f0', 'f1', ...]
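
    The ['f0', 'f1', ...] list appears to be the set of default names used when the input has no column names, as with a bare array. To see exactly which names differ, a small helper along these lines can help. Note that diff_feature_names and the sample names are hypothetical and not part of xgboost:

```python
def diff_feature_names(expected, given):
    """Return (missing, unexpected) feature names, preserving input order."""
    expected_set = set(expected)
    given_set = set(given)
    # Names the model was trained on but the input lacks.
    missing = [name for name in expected if name not in given_set]
    # Names present in the input that the model has never seen.
    unexpected = [name for name in given if name not in expected_set]
    return missing, unexpected


# Hypothetical names: what the model knows vs. what a bare array provides.
expected = ["feature1", "feature2", "feature3"]
given = ["f0", "f1", "f2"]
missing, unexpected = diff_feature_names(expected, given)
```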

    While there are no doubt more direct solutions, I had little time and this fixed the problem:

    1. Make the input vector a pandas DataFrame:
    import pandas as pd

    # Each `value` is a placeholder for that feature's actual value.
    series = {'feature1': [value],
              'feature2': [value],
              'feature3': [value],
              'feature4': [value],
              'feature5': [value],
              'feature6': [value],
              'feature7': [value],
              'feature8': [value],
              'feature9': [value],
              'feature10': [value]
               }
    
    vector = pd.DataFrame(series)
    
    2. Get the feature names that the trained model knows:

    names = model.get_booster().feature_names

    3. Select those features from the input vector DataFrame (defined above), and perform iloc indexing:

    result = model.predict(vector[names].iloc[[-1]])
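
    The three steps can also be folded into one helper that turns a bare array into a one-row DataFrame with the expected column names. This is only a sketch: to_named_row is a hypothetical name, and the names list below stands in for model.get_booster().feature_names:

```python
import numpy as np
import pandas as pd


def to_named_row(values, feature_names):
    """Wrap a 1-D vector in a one-row DataFrame with the given column names."""
    row = np.asarray(values).reshape(1, -1)
    if row.shape[1] != len(feature_names):
        raise ValueError(
            f"expected {len(feature_names)} values, got {row.shape[1]}"
        )
    return pd.DataFrame(row, columns=list(feature_names))


# Stand-in for model.get_booster().feature_names (hypothetical names).
names = ["feature1", "feature2", "feature3"]
row = to_named_row(np.array([1.5, 0.0, 3.0]), names)
```

    With a fitted model, the call would then look like model.predict(to_named_row(x, model.get_booster().feature_names)). This assumes the model was fit on a DataFrame, so that the booster actually stored the column names.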


    The iloc transformation I found here.

    Models in the scikit-learn implementation do not have a feature_names attribute, so I select the feature names with get_booster().feature_names, which I found in @Athar's post above.

    Check out the documentation to learn more.

    Hope this helps.
