Question
Consider the cost function with regularization in machine learning:
Why do the parameters θ tend towards zero when we set the parameter λ to be very large?
Answer 1:
The regularized cost function is penalized by the size of the parameters θ.
The regularization term dominates the cost as λ → +∞
When λ is very large, most of the cost comes from the regularization term λ * sum(θ²) rather than from the data-fit term sum((h_θ - y)²). Minimizing the cost is then mostly about minimizing λ * sum(θ²), which is done by driving θ towards 0 (θ → 0).
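To make "dominates" a bit more concrete, here is a sketch in LaTeX. The formula from the question is not reproduced in this thread, so the form of J(θ) below is an assumption pieced together from the two terms named in this answer (a constant factor such as 1/(2m) is often placed in front, which doesn't change the argument). The symbols θ* (the minimizer for a given λ) and C (the cost at θ = 0, which does not depend on λ) are introduced here for illustration only.

```latex
\begin{align}
J(\theta) &= \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
           + \lambda \sum_{j=1}^{n} \theta_j^2 \\
\lambda \sum_{j=1}^{n} (\theta_j^{*})^2 &\le J(\theta^{*}) \le J(0) = C
\quad\Longrightarrow\quad
\sum_{j=1}^{n} (\theta_j^{*})^2 \le \frac{C}{\lambda}
\xrightarrow{\;\lambda \to +\infty\;} 0
\end{align}
```

The first inequality holds because the data-fit term is never negative, the second because θ* minimizes J, so it cannot cost more than θ = 0 does.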
Why minimizing λ * sum (θ²) results in θ → 0
Consider the regularization term λ * sum(θ²). Since λ is a positive constant and the sum is never negative, the only way to minimize this term is to push sum(θ²) → 0.
And since each θ is squared (θ² is never negative), no term can offset another, so the only way to shrink the sum is to push every θ parameter towards 0. Hence sum(θ²) → 0 means θ → 0.
So to sum up, in this case of very large λ:
Minimizing the cost function is mostly about minimizing λ * sum (θ²), which requires minimizing sum (θ²), which requires θ → 0
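A small numeric check may help here. This is only a sketch under simplifying assumptions: a single parameter θ with no intercept, hypothesis h_θ(x) = θ * x, and made-up data. In that case setting the derivative of sum((θx - y)²) + λθ² to zero gives θ = sum(x*y) / (sum(x²) + λ). The function name `ridge_theta` is hypothetical, not from any library.

```python
def ridge_theta(xs, ys, lam):
    """Minimizer of sum((theta*x - y)^2) + lam * theta^2 for one parameter.

    Assumes a single parameter and no intercept (illustrative simplification).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Made-up data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# theta starts near 2 for lam = 0 and shrinks towards 0 as lam grows.
for lam in [0.0, 1.0, 100.0, 1e4, 1e6]:
    print(f"lambda={lam:>10.1f}  theta={ridge_theta(xs, ys, lam):.6f}")
```

With λ in the denominator, the larger λ gets, the closer the minimizing θ is to 0, whatever the data says.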
Some intuition to answer the question in the comment:
Think of λ as a knob that tells the cost function how much regularization you want. E.g. if at the extreme you set λ to 0, then your cost function is not regularized at all. If you set λ to a lower number, you get less regularization.
And vice versa: the more you increase λ, the more you're asking your cost function to be regularized, so the smaller the parameters θ will have to be in order to minimize the regularized cost function.
Why do we use θ² in the regularization sum rather than θ?
Because the goal is to keep the θ values small in magnitude (less prone to overfitting).
If the regularization term used θ instead of θ² in the sum, you could end up with large θ values that cancel each other out, e.g. θ_1 = 1000000 and θ_2 = -1000001. Here sum(θ) is -1, which is small, whereas sum(|θ|) (absolute value) or sum(θ²) (squared) would give a very big value.
In that case you may end up overfitting, because the large θ values escaped the regularization by cancelling each other out.
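The arithmetic of that example can be checked in a few lines. This is just a sketch; the variable names are made up for illustration.

```python
# The two values from the example above.
theta = [1000000, -1000001]

plain_sum = sum(theta)                    # signs cancel
abs_sum = sum(abs(t) for t in theta)      # no cancellation
squared_sum = sum(t * t for t in theta)   # no cancellation

print(plain_sum)    # -1
print(abs_sum)      # 2000001
print(squared_sum)  # 2000002000001
```

Only the plain sum looks small. Both the absolute-value and the squared penalty see how large the parameters really are.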
Answer 2:
Please also note that the summation after λ doesn't include θ_0. Hope this helps!
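In code, leaving θ_0 out of the penalty could look roughly like this. It is a minimal sketch, assuming a linear hypothesis, an unscaled squared-error cost, and that every row of X starts with a constant 1 so that theta[0] acts as the intercept. The function name `regularized_cost` is hypothetical.

```python
def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus lam * sum(theta_j^2) over j >= 1.

    theta[0] is treated as the intercept and is not penalized.
    Assumes each row of X begins with a constant 1.
    """
    predictions = [sum(t * x for t, x in zip(theta, row)) for row in X]
    data_term = sum((p - yi) ** 2 for p, yi in zip(predictions, y))
    penalty = lam * sum(t * t for t in theta[1:])  # skips theta[0]
    return data_term + penalty


X = [[1, 1], [1, 2]]
y = [1, 2]

# Only theta[0] is non-zero here, so changing lam leaves the cost unchanged.
print(regularized_cost([3.0, 0.0], X, y, lam=0.0))
print(regularized_cost([3.0, 0.0], X, y, lam=100.0))
```

So a very large λ pushes θ_1, θ_2, ... towards 0 but leaves θ_0 free, and the fitted hypothesis flattens towards a constant.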
Source: https://stackoverflow.com/questions/39052558/regularized-cost-function-with-very-large-%ce%bb