Regularized cost function with very large λ

Submitted by 拜拜、爱过 on 2019-12-11 05:05:04

Question


Consider the cost function with regularization in machine learning:

Why does the parameter θ tend toward zero when we set the parameter λ to be very large?
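
(The original question included an image of the cost function, which is not reproduced here. Judging from the terms used in the answer below, it is presumably the regularized squared-error cost J(θ) = sum((h_θ - y)²) + λ * sum(θ²), possibly with a 1/(2m) factor in front.)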


Answer 1:


The regularized cost function is penalized by the size of the parameters θ.

The regularization term dominates the cost as λ → +∞

It is worth noting that when λ is very large, most of the cost comes from the regularization term λ * sum(θ²) rather than from the actual error term sum((h_θ - y)²). In that case, minimizing the cost is mostly a matter of minimizing λ * sum(θ²), which is done by pushing θ toward 0 (θ → 0).

Why minimizing λ * sum(θ²) results in θ → 0

Consider the regularization term λ * sum(θ²). To minimize this term, the only option is to push sum(θ²) → 0, since λ is a positive constant and the sum term is non-negative.

And since every θ term is squared (θ² is always non-negative), the only way to do that is to push each θ parameter toward 0. Hence sum(θ²) → 0 means θ → 0.

So to sum up, in the case of a very large λ:

Minimizing the cost function is mostly about minimizing λ * sum(θ²), which requires minimizing sum(θ²), which requires θ → 0.
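
A minimal numerical sketch of this (not from the original answer), assuming a linear hypothesis h_θ(x) = Xθ and using the closed-form minimizer of sum((Xθ - y)²) + λ * sum(θ²), i.e. θ = (XᵀX + λI)⁻¹ Xᵀy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # 50 examples, 3 features
y = X @ np.array([2.0, -3.0, 1.5])     # targets generated from a known θ

def ridge_theta(X, y, lam):
    """Closed-form minimizer of sum((Xθ - y)²) + λ * sum(θ²)."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

print(ridge_theta(X, y, lam=0.0))   # ≈ [ 2.  -3.   1.5]  -- no regularization
print(ridge_theta(X, y, lam=1e9))   # ≈ [ 0.   0.   0. ]  -- huge λ pushes θ → 0
```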

Some intuition to answer the question in the comment:

Think of λ as a parameter that tells the cost function how much regularization you want. E.g., at one extreme, if you set λ to 0, then your cost function is not regularized at all; if you set λ to a small number, you get only a little regularization.

And vice versa: the more you increase λ, the more you are asking your cost function to be regularized, so the smaller the parameters θ will have to be in order to minimize the regularized cost function.
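
A tiny one-parameter example (hypothetical, not from the original answer) makes this dial behaviour concrete: for the toy cost J(θ) = (θ - 3)² + λ * θ², setting the derivative to zero gives θ* = 3 / (1 + λ), which shrinks toward 0 as λ grows:

```python
# Minimizer of the toy cost J(θ) = (θ - 3)² + λ*θ² is θ* = 3 / (1 + λ).
for lam in [0, 1, 10, 100, 1e6]:
    print(lam, 3 / (1 + lam))
# λ = 0   → θ* = 3.0      (no regularization at all)
# λ = 1   → θ* = 1.5
# λ = 10  → θ* ≈ 0.27
# λ = 100 → θ* ≈ 0.03
# λ = 1e6 → θ* ≈ 3e-06    (θ → 0)
```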

Why do we use θ² in the regularization sum rather than θ?

Because the goal is to have small θ values (less prone to overfitting). If the regularization term summed θ instead of θ², you could end up with large θ values that cancel each other out, e.g. θ_1 = 1000000 and θ_2 = -1000001: sum(θ) here is -1, which looks small, whereas sum(|θ|) (absolute values) or sum(θ²) (squares) would be very large.

In that case you may end up overfitting because of large θ values that escaped the regularization by cancelling each other out.
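
A quick numeric check of this cancellation effect (not part of the original answer):

```python
theta = [1000000, -1000001]

# Summing the raw parameters lets the large values cancel out...
print(sum(theta))                   # -1 (looks small)

# ...while the absolute or squared sums expose how large they really are.
print(sum(abs(t) for t in theta))   # 2000001
print(sum(t ** 2 for t in theta))   # 2000002000001
```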




Answer 2:


Please also note that the summation (after lambda) doesn't include theta(0). Hope this helps!
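
As a small illustration (not from the original answer), if θ is stored as a vector whose first entry is θ_0, the penalty would be computed over θ_1, ..., θ_n only:

```python
import numpy as np

theta = np.array([5.0, 1.2, -0.7, 3.3])   # theta[0] is the bias term θ_0
lam = 10.0

# θ_0 is conventionally left out of the penalty, so the sum starts at θ_1.
reg_term = lam * np.sum(theta[1:] ** 2)
print(reg_term)   # 10 * (1.2² + 0.7² + 3.3²) = 128.2
```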



Source: https://stackoverflow.com/questions/39052558/regularized-cost-function-with-very-large-%ce%bb
