XGBoost difference in train and test features after converting to DMatrix

我寻月下人不归 2021-01-03 05:19

Just wondering how the following case is possible:

 def fit(self, train, target):
     xgtrain = xgb.DMatrix(train, label=target, missing=np.nan)
     self.model = xg         


        
3 Answers
  •  星月不相逢
    2021-01-03 05:57

    I ran into this issue when the RandomUnderSampler (RUS) method returned a NumPy array rather than a pandas DataFrame with column names.

    from imblearn.under_sampling import RandomUnderSampler
    # Older imbalanced-learn API; newer releases use fit_resample() and the sample_indices_ attribute.
    rus = RandomUnderSampler(return_indices=True)
    X_rus, y_rus, id_rus = rus.fit_sample(X_train, y_train)
    

    I resolved the issue with this:

    # Re-attach the original column names (assumes pandas is imported as pd).
    X_rus = pd.DataFrame(X_rus, columns=X_train.columns)
    

    This takes the output of the RUS method and wraps it in a pandas DataFrame whose column names come from the original X_train, which was the input to the RUS method.

    This generalizes to any similar problem where XGBoost expects to read column names but cannot find them: create a pandas DataFrame and assign the column names accordingly.
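The general fix described above can be sketched as a small helper. This is only an illustration under stated assumptions: `restore_feature_names` is a hypothetical name, not part of XGBoost or imbalanced-learn, and the toy `X_train` frame and the row slice standing in for a sampler's output are made up for the example.

```python
import numpy as np
import pandas as pd


def restore_feature_names(resampled, reference):
    """Hypothetical helper: wrap `resampled` in a DataFrame using `reference`'s columns."""
    if isinstance(resampled, pd.DataFrame):
        # Already labelled: only enforce the reference column order.
        return resampled[list(reference.columns)]
    resampled = np.asarray(resampled)
    if resampled.shape[1] != reference.shape[1]:
        raise ValueError("column count differs from the reference frame")
    return pd.DataFrame(resampled, columns=reference.columns)


# Toy data, invented for illustration only.
X_train = pd.DataFrame({"age": [21.0, 35.0, 48.0], "income": [1.5, 2.5, 3.5]})

# Stand-in for a sampler that hands back a bare array without column names.
X_resampled = X_train.to_numpy()[[0, 2]]

X_fixed = restore_feature_names(X_resampled, X_train)
print(list(X_fixed.columns))
```

The resulting frame can then be passed to `xgb.DMatrix` in place of the bare array, so that training and prediction data carry the same column names.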
