Avoiding numerical overflow when calculating the value AND gradient of the Logistic loss function

Posted by ╄→гoц情女王★ on 2019-12-21 16:53:31

Question


I am currently trying to implement a machine learning algorithm that involves the logistic loss function in MATLAB. Unfortunately, I am having some trouble due to numerical overflow.

In general, for a given input s, the value of the logistic loss function is:

 log(1 + exp(s))

and the slope of the logistic loss function is:

 exp(s)./(1 + exp(s)) = 1./(1 + exp(-s))

In my algorithm, s = X*beta. Here X is a matrix with N data points and P features per data point (i.e. size(X)=[N,P]) and beta is a vector of P coefficients, one per feature, such that size(beta)=[P 1].

I am specifically interested in calculating the average value and gradient of the logistic loss function for a given value of beta.

The average value of the logistic loss function for a given value of beta is:

 L = 1/N * sum(log(1+exp(X*beta)),1)

The average slope (gradient) of the logistic loss function with respect to beta is:

 dL = 1/N * sum((exp(X*beta)./(1+exp(X*beta))' X, 1)'

Note that size(dL) = [P 1].

My issue is that these expressions keep producing numerical overflows. In double precision, exp(s)=Inf once s exceeds roughly 710 and exp(s)=0 once s drops below roughly -745, so the problem shows up whenever s is large in magnitude (for example s>1000 or s<-1000).

I am looking for a solution such that s can take on any value in floating point arithmetic. Ideally, I would also really appreciate a solution that allows me to evaluate the value and gradient in a vectorized / efficient way.
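
The failure described above can be reproduced in a few lines. This is only a minimal sketch assuming IEEE double precision; the names naiveLoss and naiveSlope are illustrative and do not appear in the original post.

```matlab
% Naive formulas, exactly as written in the question.
naiveLoss  = @(s) log(1 + exp(s));
naiveSlope = @(s) exp(s) ./ (1 + exp(s));

s = [-1000; 0; 1000];

lossVals  = naiveLoss(s);    % last entry overflows to Inf
slopeVals = naiveSlope(s);   % last entry is Inf/Inf, i.e. NaN

% Columns: s, naive loss, naive slope
disp([s, lossVals, slopeVals]);
```

The NaN in the slope is the more damaging of the two, because a single NaN entry propagates through the matrix product with X and ruins the whole gradient.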


Answer 1:


How about the following approximations:

– For computing L, if s is large, then exp(s) will be much larger than 1:

1 + exp(s) ≅ exp(s)

and consequently

log(1 + exp(s)) ≅ log(exp(s)) = s.

If s is small in magnitude (close to zero), then using the Taylor series of exp()

exp(s) ≅ 1 + s

and using the Taylor series of log()

log(1 + exp(s)) ≅ log(2 + s) ≅ log(2) + s / 2.

– For computing dL, for large s

exp(s) ./ (1 + exp(s)) ≅ 1

and for s close to zero

exp(s) ./ (1 + exp(s)) ≅ 1/2 + s / 4.

– The code to compute L could look for example like this:

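s = X*beta;
N = size(X,1);
l = log(1+exp(s));
% exp(s) overflowed: use log(1+exp(s)) ≅ s
ind = isinf(l);
l(ind) = s(ind);
% exp(s) vanished next to 1, so s is very negative and log(1+exp(s)) ≅ exp(s).
% (The Taylor form log(2) + s/2 holds only near s = 0, where log() is already accurate.)
ind = (l == 0);
l(ind) = exp(s(ind));
L = 1/N * sum(l,1);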
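
The same patch-up idea can be sketched for dL. This is an assumption about how the large-s approximation above might be applied, not code from the answer: the naive slope only breaks when exp(s) overflows and produces Inf/Inf, so those entries are replaced by the limit 1. The data in X and beta are made up for illustration.

```matlab
% Illustrative data (assumed, not from the original post).
X    = [1 2; 3 4; -5 6];
beta = [300; 200];
N    = size(X, 1);

s = X*beta;                    % s = [700; 1700; -300]
g = exp(s) ./ (1 + exp(s));    % NaN wherever exp(s) overflowed (Inf/Inf)
ind = isnan(g);
g(ind) = 1;                    % large-s limit of the slope
dL = X' * g / N;               % P-by-1 average gradient
```

Very negative s needs no patch here, because exp(s) underflows to 0 and 0 ./ (1 + 0) is already the correct limit.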



Answer 2:


I found a good article about this problem.

Cutting through a lot of words, we can simplify the argument to stating that the original expression

log(1 + exp(s)) 

can be rewritten as

log(exp(s)*(exp(-s) + 1))
= log(exp(s)) + log(exp(-s) + 1)
= s + log(exp(-s) + 1)

This stops overflow from occurring for large positive s. The term exp(-s) can still underflow to 0, but by the time that happens you already have your answer (namely, s). You can't simply use this form in place of the original for every s, though: for large negative s, exp(-s) overflows instead. Choosing between the two forms based on the sign of s gives a function that is accurate and won't produce over/underflow:

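function LL = logistic(s)
% Scalar s only. The branch keeps the argument of exp() non-positive.
% The test is s<=0 (not s<0) so that s == 0 does not recurse forever.
if s<=0
  LL = log(1 + exp(s));
else
  LL = s + logistic(-s);
end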

I think this maintains reasonably good accuracy.
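
To see why neither form can be used on its own, the two expressions can be evaluated at both extremes. This is a small sketch assuming double precision; the handles direct and rewritten are names chosen here for illustration.

```matlab
direct    = @(s) log(1 + exp(s));        % original form
rewritten = @(s) s + log(1 + exp(-s));   % algebraically identical form

% Large positive s: only the rewritten form stays finite.
a1 = direct(1000);      % Inf, because exp(1000) overflows
a2 = rewritten(1000);   % 1000, because exp(-1000) underflows to 0

% Large negative s: only the original form stays finite.
b1 = direct(-1000);     % 0, because exp(-1000) underflows to 0
b2 = rewritten(-1000);  % Inf, because exp(1000) overflows
```

Switching on the sign of s, as the function above does, always picks the form whose exponent is non-positive.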

EDIT: Now to the meat of your question: making this vectorized, and allowing the calculation of the slope as well. Let's take these one at a time:

function LL = logisticVec(s)
  LL = zeros(size(s));
  LL(s<0) = log(1 + exp(s(s<0)));
  LL(s>=0) = s(s>=0) + log(1 + exp(-s(s>=0)));

To obtain the average you wanted:

L = sum(logisticVec(X*beta)) / N;
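
The two masked assignments can also be folded into a single branch-free expression. The sketch below is an alternative formulation, not part of the original answer: logisticStable is a hypothetical name, log1p is used in place of log(1 + ...) to keep precision when the exponential is tiny, and the data are invented for illustration.

```matlab
% Branch-free form: max(s,0) picks the dominant term, and the remaining
% exponent -abs(s) is never positive, so exp() cannot overflow.
logisticStable = @(s) max(s, 0) + log1p(exp(-abs(s)));

X    = [1 2; 3 4; -5 6];     % illustrative data (assumed)
beta = [300; 200];
N    = size(X, 1);

% s = X*beta = [700; 1700; -300], far outside the range the naive form handles
L = sum(logisticStable(X*beta)) / N;   % average loss, finite for any finite s
```

For s<0 this reduces to log1p(exp(s)) and for s>=0 to s + log1p(exp(-s)), which are the same two cases that logisticVec handles with logical indexing.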

The slope is a little bit trickier; note I believe you may have a typo in your expression (missing a multiplication sign).

dL = X' * (exp(X*beta) ./ (1 + exp(X*beta))) / N;   % P-by-1

If we divide top and bottom by exp(X*beta) we get

dL = X' * (1 ./ (exp(-X*beta) + 1)) / N;   % P-by-1

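Once again, the overflow for large positive s has gone away and we are left with underflow of exp(-X*beta), but since the underflowed value has 1 added to it, the error this creates is insignificant. For large negative s, exp(-X*beta) can still overflow to Inf, but dividing by (Inf + 1) gives 0, which is the correct limit, so no NaN appears.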
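
Putting value and gradient together, one possible end-to-end sketch is shown below. It is written under assumptions and is not the answerer's code: the handles lossFn, slope, avgLoss and avgGrad, and all of the data, are hypothetical. The slope switches form on the sign of s so that exp() never receives a positive argument, and a central finite difference is used as a sanity check on the gradient.

```matlab
% Stable pieces: each exp() call only ever sees a non-positive argument.
lossFn = @(s) max(s, 0) + log1p(exp(-abs(s)));
e      = @(s) exp(-abs(s));                          % always in [0, 1]
slope  = @(s) (s >= 0) ./ (1 + e(s)) ...             % s >= 0: 1/(1+exp(-s))
            + (s <  0) .* e(s) ./ (1 + e(s));        % s <  0: exp(s)/(1+exp(s))

avgLoss = @(X, beta) sum(lossFn(X*beta)) / size(X, 1);
avgGrad = @(X, beta) X' * slope(X*beta) / size(X, 1);   % P-by-1

% Extreme inputs (illustrative): s = [700; 1700; -300]
X = [1 2; 3 4; -5 6];
beta = [300; 200];
L  = avgLoss(X, beta);
dL = avgGrad(X, beta);

% Central finite-difference check on moderate, made-up values
Xs = [0.5 -1; 2 0.3; -1.5 1];
b  = [0.2; -0.4];
h  = 1e-6;
fd = zeros(2, 1);
for k = 1:2
    d = zeros(2, 1);
    d(k) = h;
    fd(k) = (avgLoss(Xs, b + d) - avgLoss(Xs, b - d)) / (2*h);
end
err = max(abs(fd - avgGrad(Xs, b)));   % should be tiny if the gradient is right
```

Keeping the loss and the slope as separate handles means X*beta is computed twice; if that matters, s can be computed once and passed to both.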



Source: https://stackoverflow.com/questions/20085768/avoiding-numerical-overflow-when-calculating-the-value-and-gradient-of-the-logis
