Can someone explain how these scores are derived in these XGBoost trees?


Question


I am looking at the image below.

Can someone explain how they are calculated? I thought it was -1 for a No and +1 for a Yes, but then I can't figure out how the little girl gets 0.1. And that doesn't work for tree 2 either.


Answer 1:


The values of the leaf elements (aka "scores") - +2, +0.1, -1, +0.9 and -0.9 - were learned by the XGBoost algorithm during training. In this case, the XGBoost model was trained on a dataset where little boys (+2) appear somehow "greater" than little girls (+0.1). If you knew what the response variable was, you could probably interpret/rationalize those contributions further. Otherwise, just accept those values as they are.

As for scoring a sample, the first addend is produced by tree1 and the second addend by tree2. For the little boy (age < 15, is male == Y, and uses computer daily == Y), tree1 yields 2 and tree2 yields 0.9.
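To make that concrete, here is a minimal sketch (not XGBoost internals) of how the two trees' leaf values add up for one sample. The split logic, thresholds, and the grandpa example are my reading of the figure; treat them as illustrative assumptions.

    # A minimal sketch, assuming tree1 splits on age and gender and tree2 on
    # daily computer use. Leaf values are the ones quoted in this answer
    # (+2, +0.1, -1 from tree1; +0.9, -0.9 from tree2).

    def tree1(age, is_male):
        if age < 15:
            return 2.0 if is_male else 0.1
        return -1.0

    def tree2(uses_computer_daily):
        return 0.9 if uses_computer_daily else -0.9

    def raw_score(age, is_male, uses_computer_daily):
        # The model's raw prediction is the sum of one leaf value per tree.
        return tree1(age, is_male) + tree2(uses_computer_daily)

    print(raw_score(age=10, is_male=True, uses_computer_daily=True))   # 2.9 for the little boy
    print(raw_score(age=70, is_male=True, uses_computer_daily=False))  # -1.9, assuming grandpa is not a daily user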




Answer 2:


I agree with @user1808924. I think it's still worth explaining how XGBoost works under the hood, though.

  • What is the meaning of the leaf scores?

First, the scores you see in the leaves are not probabilities. They are regression values.

In gradient boosted trees, every tree is a regression tree. To predict whether a person likes computer games or not, the model (XGBoost) treats it as a regression problem: the labels become 1.0 for Yes and 0.0 for No, and XGBoost fits regression trees to them. The trees then return raw values such as +2, +0.1 and -1, which we see at the leaves.

We sum up all the "raw scores" and then convert them to probabilities by applying the sigmoid function.
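As a quick illustration (not library code), this is the sum-then-sigmoid step for the little boy from the trees above:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Raw score for the little boy: 2 (tree1) + 0.9 (tree2) = 2.9
    raw = 2.0 + 0.9
    print(sigmoid(raw))  # ~0.95 -> predicted probability of liking computer games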

  • How is the leaf score calculated?

The leaf score (w) is calculated by this formula:

w = - (sum(gi) / (sum(hi) + lambda))

where g and h are the first derivative (gradient) and the second derivative (Hessian) of the loss with respect to the current prediction, summed over the samples in that leaf.

For the sake of demonstration, let's pick the leaf of the first tree that has the value -1. Suppose our objective function is mean squared error (MSE) and we choose lambda = 0.

With MSE, we have g = (y_pred - y_true) and h = 1. (I dropped the constant factor of 2; you can keep it and the result stays the same.) Another note: at the t-th iteration, y_pred is the prediction we have after the (t-1)-th iteration (the best we have up to that point).

Some assumptions:

  • The girl, grandpa, and grandma do NOT like computer games (y_true = 0 for each person).
  • The initial prediction is 1 for all 3 people (i.e., we guess that everyone loves games). Note that I choose 1 on purpose so as to reproduce the first tree's result; in practice, the initial prediction can be the mean, median, ... of all the observations' labels.

We calculate g and h for each individual:

  • g_girl = y_pred - y_true = 1 - 0 = 1. Similarly, we have g_grandpa = g_grandma = 1.
  • h_girl = h_grandpa = h_grandma = 1

Putting the g and h values into the formula above (with lambda = 0), we have:

w = -( (g_girl + g_grandpa + g_grandma) / (h_girl + h_grandpa + h_grandma + lambda) ) = -(1 + 1 + 1) / (1 + 1 + 1 + 0) = -1
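Here is a small numeric check of that calculation under the same assumptions (squared-error loss, lambda = 0, initial prediction 1.0, true label 0 for each of the three people):

    y_true = [0.0, 0.0, 0.0]   # girl, grandpa, grandma: none of them like computer games
    y_pred = [1.0, 1.0, 1.0]   # initial prediction before the first tree is built
    lam = 0.0                  # regularization term lambda

    g = [p - t for p, t in zip(y_pred, y_true)]  # gradient of 1/2 * (p - t)^2 w.r.t. p
    h = [1.0] * len(y_true)                      # second derivative (Hessian) of that loss

    w = -sum(g) / (sum(h) + lam)
    print(w)  # -1.0, matching the -1 leaf of the first tree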

One last note: in practice, the leaf score we see when plotting the tree is a bit different; it has been multiplied by the learning rate, i.e., w * learning_rate.
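For example (illustrative numbers only; 0.3 is XGBoost's default eta):

    learning_rate = 0.3   # eta, the shrinkage factor
    w = -1.0              # the optimal leaf weight computed above
    print(w * learning_rate)  # -0.3 is what you would see in the plotted tree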




Answer 3:


Read this

https://towardsdatascience.com/xgboost-mathematics-explained-58262530904a

and then this

https://medium.com/@gabrieltseng/gradient-boosting-and-xgboost-c306c1bcfaf5

and the appendix

https://gabrieltseng.github.io/appendix/2018-02-25-XGB.html



Source: https://stackoverflow.com/questions/41433209/can-someone-explain-how-these-scores-are-derived-in-this-xgboost-trees
