XGBoost: The least populated class in y has only 1 members, which is too few

若如初见. Submitted on 2019-12-01 17:54:21

Question


I'm using the XGBoost scikit-learn wrapper for a Kaggle competition. However, I'm getting this warning message:

$ python Script1.py
/home/sky/private/virtualenv15.0.1dev/myVE/local/lib/python2.7/site-packages/sklearn/cross_validation.py:516: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=3.
  % (min_labels, self.n_folds)), Warning)

According to another question on Stack Overflow: "Check that you have at least 3 samples per class to be able to do StratifiedKFold cross validation with k == 3 (I think this is the default CV used by GridSearchCV for classification)."

And indeed, I don't have at least 3 samples for every class.
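To see which classes fall below the fold count, the labels can be tallied before fitting. The following is only a minimal sketch in plain Python: `rare_classes` and `toy_labels` are hypothetical names made up for illustration, and the toy labels stand in for the real `place_id` column.

```python
from collections import Counter

def rare_classes(labels, n_folds=3):
    """Return {label: count} for classes with fewer than n_folds samples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < n_folds}

# Hypothetical toy labels standing in for place_id
toy_labels = ["a", "a", "a", "b", "b", "b", "c"]

# Classes listed here are the ones that would trigger the warning
print(rare_classes(toy_labels, n_folds=3))
```

Any class reported by this check has too few members to appear in each of the `n_folds` stratified folds.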

So my questions are:

a) What are the alternatives?

b) Why can't I use cross-validation?

c) What can I use instead?

...
param_test1 = {
    'max_depth': range(3, 10, 2),
    'min_child_weight': range(1, 6, 2)
}

grid_search = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1,
        n_estimators=3000,
        max_depth=15,
        min_child_weight=1,
        gamma=0,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='multi:softmax',
        nthread=42,
        scale_pos_weight=1,
        seed=27),
    param_grid=param_test1,
    scoring='roc_auc',
    n_jobs=42,
    iid=False,
    cv=None,
    verbose=1)
...

grid_search.fit(train_x, place_id)

References:

One-shot learning with scikit-learn

Using a support vector classifier with polynomial kernel in scikit-learn


Answer 1:


If you have a target class with only one sample, that is too few for any model to learn from, and too few to split across cross-validation folds. What you can do is get another dataset, preferably as balanced as possible, since most models behave better on balanced sets.

If you cannot get another dataset, you will have to work with what you have. I would suggest removing the sample that belongs to the lone class. You will then have a model that does not cover that class. If that does not fit your requirements, you need a new dataset.
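Removing the under-populated classes could look roughly like the sketch below. It is plain Python and makes several assumptions: `drop_rare_classes`, `toy_rows` and `toy_labels` are hypothetical names, the toy data stands in for `train_x` and `place_id`, and `min_count` is set to 3 to match the `n_folds=3` in the warning.

```python
from collections import Counter

def drop_rare_classes(rows, labels, min_count=3):
    """Keep only samples whose class has at least min_count members."""
    counts = Counter(labels)
    kept = [(row, label) for row, label in zip(rows, labels)
            if counts[label] >= min_count]
    kept_rows = [row for row, _ in kept]
    kept_labels = [label for _, label in kept]
    return kept_rows, kept_labels

# Hypothetical toy data: class "c" has a single sample
toy_rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7]]
toy_labels = ["a", "a", "a", "b", "b", "b", "c"]

rows_kept, labels_kept = drop_rare_classes(toy_rows, toy_labels, min_count=3)
print(len(rows_kept), sorted(set(labels_kept)))
```

The filtered rows and labels would then be passed to `grid_search.fit` in place of the originals. As the answer notes, the resulting model can never predict the dropped classes.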



Source: https://stackoverflow.com/questions/37240195/xgboost-the-least-populated-class-in-y-has-only-1-members-which-is-too-few
