scipy.optimize.fmin_l_bfgs_b returns 'ABNORMAL_TERMINATION_IN_LNSRCH'

问题

I am using scipy.optimize.fmin_l_bfgs_b to solve a gaussian mixture problem. The means of mixture distributions are modeled by regressions whose weights have to be optimized using EM algorithm.

sigma_sp_new, func_val, info_dict = fmin_l_bfgs_b(func_to_minimize, self.sigma_vector[si][pj], 
                       args=(self.w_vectors[si][pj], Y, X, E_step_results[si][pj]),
                       approx_grad=True, bounds=[(1e-8, 0.5)], factr=1e02, pgtol=1e-05, epsilon=1e-08)

But sometimes I got a warning 'ABNORMAL_TERMINATION_IN_LNSRCH' in the information dictionary:

func_to_minimize value = 1.14462324063e-07
information dictionary: {'task': b'ABNORMAL_TERMINATION_IN_LNSRCH', 'funcalls': 147, 'grad': array([  1.77635684e-05,   2.87769808e-05,   3.51718654e-05,
         6.75015599e-06,  -4.97379915e-06,  -1.06581410e-06]), 'nit': 0, 'warnflag': 2}

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            6     M =           10
 This problem is unconstrained.

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.14462D-07    |proj g|=  3.51719D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    6      1     21      1     0     0   3.517D-05   1.145D-07
  F =  1.144619474757747E-007

ABNORMAL_TERMINATION_IN_LNSRCH                              

 Line search cannot locate an adequate point after 20 function
  and gradient evaluations.  Previous x, f and g restored.
 Possible causes: 1 error in function or gradient evaluation;
                  2 rounding error dominate computation.

 Cauchy                time 0.000E+00 seconds.
 Subspace minimization time 0.000E+00 seconds.
 Line search           time 0.000E+00 seconds.

 Total User time 0.000E+00 seconds.

I do not get this warning every time, but sometimes. (Most get 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' or 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH').

I know that it means the minimum can be be reached in this iteration. I googled this problem. Someone said it occurs often because the objective and gradient functions do not match. But here I do not provide gradient function because I am using 'approx_grad'.

What are the possible reasons that I should investigate? What does it mean by "rounding error dominate computation"?

======

I also find that the log-likelihood does not monotonically increase:

########## Convergence !!! ##########
log_likelihood_history: [-28659.725891322563, 220.49993177669558, 291.3513633060345, 267.47745327823907, 265.31567762171181, 265.07311121000367, 265.04217683341682]

It usually start decrease at the second or the third iteration, even through 'ABNORMAL_TERMINATION_IN_LNSRCH' does not occurs. I do not know whether it this problem is related to the previous one.

回答1:

Scipy calls the original L-BFGS-B implementation. Which is some fortran77 (old but beautiful and superfast code) and our problem is that the descent direction is actually going up. The problem starts on line 2533 (link to the code at the bottom)

gd = ddot(n,g,1,d,1)
  if (ifun .eq. 0) then
     gdold=gd
     if (gd .ge. zero) then
c                               the directional derivative >=0.
c                               Line search is impossible.
        if (iprint .ge. 0) then
            write(0,*)' ascent direction in projection gd = ', gd
        endif
        info = -4
        return
     endif
  endif

In other words, you are telling it to go down the hill by going up the hill. The code tries something called line search a total of 20 times in the descent direction that you provide and realizes that you are NOT telling it to go downhill, but uphill. All 20 times.

The guy who wrote it (Jorge Nocedal, who by the way is a very smart guy) put 20 because pretty much that's enough. Machine epsilon is 10E-16, I think 20 is actually a little too much. So, my money for most people having this problem is that your gradient does not match your function.

Now, it could also be that "2. rounding errors dominate computation". By this, he means that your function is a very flat surface in which increases are of the order of machine epsilon (in which case you could perhaps rescale the function), Now, I was thiking that maybe there should be a third option, when your function is too weird. Oscillations? I could see something like $\sin({\frac{1}{x}})$ causing this kind of problem. But I'm not a smart guy, so don't assume that there's a third case.

So I think the OP's solution should be that your function is too flat. Or look at the fortran code.

https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb/lbfgsb.f

Here's line search for those who want to see it. https://en.wikipedia.org/wiki/Line_search

Note. This is 7 months too late. I put it here for future's sake.

回答2:

As pointed out in the answer by Wilmer E. Henao, the problem is probably in the gradient. Since you are using approx_grad=True, the gradient is calculated numerically. In this case, reducing the value of epsilon, which is the step size used for numerically calculating the gradient, can help.

回答3:

I also got the error "ABNORMAL_TERMINATION_IN_LNSRCH" using the L-BFGS-B optimizer.

While my gradient function pointed in the right direction, I rescaled the actual gradient of the function by its L2-norm. Removing that or adding another appropriate type of rescaling worked. Before, I guess that the gradient was so large that it went out of bounds immediately.

The problem from OP was unbounded if I read correctly, so this will certainly not help in this problem setting. However, googling the error "ABNORMAL_TERMINATION_IN_LNSRCH" yields this page as one of the first results, so it might help others...

来源：https://stackoverflow.com/questions/34663539/scipy-optimize-fmin-l-bfgs-b-returns-abnormal-termination-in-lnsrch

标签

optimization

machine-learning

statistics

normal-distribution

gradient-descent