XGBoost prediction always returning the same value - why?

Question

I'm using SageMaker's built in XGBoost algorithm with the following training and validation sets:

https://files.fm/u/pm7n8zcm

The model that comes out of training on the above datasets always produces the exact same prediction.

Is there something obvious in the training or validation datasets that could explain this behavior?

Here is an example code snippet where I'm setting the Hyperparameters:

    {
        {"max_depth", "1000"},
        {"eta", "0.001"},
        {"min_child_weight", "10"},
        {"subsample", "0.7"},
        {"silent", "0"},
        {"objective", "reg:linear"},
        {"num_round", "50"}
    }

And here is the source code: https://github.com/paulfryer/continuous-training/blob/master/ContinuousTraining/StateMachine/Retrain.cs#L326

It's not clear to me which hyperparameters might need to be adjusted.

This screenshot shows that I'm getting a result with 8 indexes:

But when I add the 11th one, it fails. This leads me to believe that I have to train the model with the zero-valued indexes included instead of removing them, so I'll try that next.

Update: retraining with zero values included doesn't seem to help; I'm still getting the same value every time. I also noticed that I can't send more than 10 values to the prediction endpoint, or it returns an error: "Unable to evaluate payload provided". So at this point, using the libsvm format has only added more problems.
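One thing worth ruling out here is that the training rows and the prediction payload are encoded differently. The sketch below shows the two ways a row can be written in libsvm style, with zero-valued features omitted or kept. The helper name `to_libsvm_line` and its parameters are hypothetical, and the starting index (`index_base=0`) is an assumption; use whatever convention the training file actually uses.

```python
def to_libsvm_line(values, label=None, keep_zeros=False, index_base=0):
    """Format a dense feature list as one libsvm-style line: [label] idx:val ..."""
    # Hypothetical helper for illustration; index_base=0 is an assumption.
    parts = [] if label is None else [str(label)]
    for offset, value in enumerate(values):
        if value == 0 and not keep_zeros:
            continue  # sparse form: zero-valued features are simply omitted
        parts.append(f"{offset + index_base}:{value}")
    return " ".join(parts)


row = [1.5, 0.0, 2.0]
sparse_line = to_libsvm_line(row)                  # zeros dropped
dense_line = to_libsvm_line(row, keep_zeros=True)  # zeros kept
train_line = to_libsvm_line(row, label=3)          # label first, as in a training file

print(sparse_line)
print(dense_line)
print(train_line)
```

Building training rows and prediction payloads with the same function makes it easier to tell whether an endpoint error comes from the encoding or from the model itself.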

Answer 1:

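A few of these settings are working against you.

Using {"num_round", "50"} with such a small learning rate {"eta", "0.001"} gives the model almost no room to move away from its initial prediction: each round's contribution is scaled by eta, so 50 rounds at 0.001 barely change the output. That would explain why every prediction looks the same.
{"max_depth", "1000"} is far too deep (the default value is 6).

I'd suggest: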

    {"max_depth", "6"},
    {"eta", "0.05"},
    {"min_child_weight", "3"},
    {"subsample", "0.8"},
    {"silent", "0"},
    {"objective", "reg:linear"},
    {"num_round", "200"}

Try this and report back with your output.
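To see why the eta and num_round pair matters so much, here is a rough back-of-the-envelope sketch. It does not call XGBoost. It assumes squared-error loss and that every tree fits the current residual perfectly, so each round shrinks the residual by a factor of (1 - eta). Real trees fit less than that, so treat the result as an optimistic upper bound; the function name `fraction_fitted` is made up for this illustration.

```python
def fraction_fitted(eta: float, num_round: int) -> float:
    """Idealised upper bound on how much of the initial residual is removed.

    Assumes squared-error loss and that each tree fits the current residual
    perfectly, so every round shrinks the residual by a factor of (1 - eta).
    """
    residual = 1.0  # residual as a fraction of its starting value
    for _ in range(num_round):
        residual -= eta * residual
    return 1.0 - residual


original = fraction_fitted(eta=0.001, num_round=50)   # settings from the question
suggested = fraction_fitted(eta=0.05, num_round=200)  # settings suggested above

print(f"eta=0.001, num_round=50  -> at most {original:.1%} of the residual fitted")
print(f"eta=0.05,  num_round=200 -> at most {suggested:.1%} of the residual fitted")
```

Under these assumptions the original settings can remove only about 5% of the gap between the starting prediction and the targets, so all outputs stay close to the same value, while the suggested settings leave essentially none of that gap.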

Source: https://stackoverflow.com/questions/49824429/xgboost-prediction-always-returning-the-same-value-why

Tags

machine-learning

xgboost

amazon-sagemaker