Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

This was a very difficult problem to debug, and illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed around as a vector:

Minimization of scalar function of one or more variables.

In general, the optimization problems are of the form:

minimize f(x) subject to

g_i(x) >= 0,  i = 1,...,m
h_j(x)  = 0,  j = 1,...,p 

where x is a vector of one or more variables.

What's important is that they really mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
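
To make the point concrete, here is a minimal illustration (a toy example, not the asker's code) of the kind of silent broadcasting that goes wrong when a 1-d prediction vector meets a 2-d column of labels:

import numpy as np

y = np.zeros((5, 1))       # labels as a 2-d column, shape (5, 1)
h_col = np.zeros((5, 1))   # predictions as a column, shape (5, 1)
h_1d = np.zeros(5)         # predictions as a 1-d array, shape (5,)

print((h_col - y).shape)   # (5, 1) -- the elementwise difference you intended
print((h_1d - y).shape)    # (5, 5) -- broadcasting quietly produces a matrix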

I don't know exactly why it's causing a problem in your case, but it's easily fixed regardless. You just have to add the following at the top of both your cost function and your gradient function:

theta = theta.reshape(-1, 1)                                           

This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.
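
For reference, here is a minimal sketch (assuming numpy is imported as np) of what a cost function with this fix at the top can look like; the asker's exact code isn't reproduced here, so treat it as an illustration rather than their implementation:

def cost(theta, X, y):
    theta = theta.reshape(-1, 1)          # force a (n, 1) column vector
    m = len(y)
    h = 1 / (1 + np.exp(-(X @ theta)))    # h is (m, 1), matching a column-vector y
    J = (1 / m) * (-y.T @ np.log(h) - (1 - y).T @ np.log(1 - h))
    return J.item()                       # hand scipy back a plain Python scalar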

I have had similar issues with SciPy while dealing with the same problem as you. As senderle points out, the interface is not the easiest to work with, especially in combination with the numpy array interface... Here is my implementation, which works as expected.

Defining the cost and gradient functions

Note that initial_theta is passed as a simple array of shape (3,) and converted to a column vector of shape (3, 1) inside the gradient function. The gradient function then returns grad.ravel(), which has shape (3,) again. This matters: doing otherwise caused error messages with various optimization methods in scipy.optimize.

Note that different methods have different behaviours, but returning .ravel() seems to fix most issues...

import pandas as pd
import numpy as np
import scipy.optimize as opt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def CostFunc(theta, X, y):

    # theta arrives from scipy as a 1-d array of shape (n,)
    m = len(y)

    # Vectorized computation of the logistic-regression cost
    z = X @ theta
    h = sigmoid(z)
    J = (1 / m) * ((-y.T @ np.log(h)) - (1 - y).T @ np.log(1 - h))

    return J

def Gradient(theta, X, y):

    # theta arrives as a 1-d array of shape (n,); make it a column vector
    m = len(y)
    theta = theta[:, np.newaxis]

    # Vectorized computation of the gradient; grad has shape (n, 1) here
    z = X @ theta
    h = sigmoid(z)
    grad = (1 / m) * (X.T @ (h - y))

    return grad.ravel()  # <-- This is the trick: back to shape (n,)

Initializing variables and parameters

Note that initial_theta.shape returns (3,)
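
The snippet below also assumes that data1 is a pandas DataFrame with two feature columns followed by a 0/1 label column, loaded along these lines (the file name is only a placeholder):

data1 = pd.read_csv('data1.txt', header=None)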

X = data1.iloc[:, 0:2].values                           # feature matrix, shape (m, 2)
m, n = X.shape
X = np.concatenate((np.ones(m)[:, np.newaxis], X), 1)   # prepend an intercept column -> (m, 3)
y = data1.iloc[:, -1].values[:, np.newaxis]             # labels as a column vector, shape (m, 1)
initial_theta = np.zeros(n + 1)                         # shape (3,)

Calling Scipy.optimize

model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
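
The returned object is a standard scipy OptimizeResult, so the fitted parameters and the final cost can be read off directly:

theta_opt = model.x     # optimized parameters, shape (3,)
print(model.success)    # whether the optimizer reports convergence
print(model.fun)        # value of CostFunc at theta_opt
print(theta_opt)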

Any comments from more knowledgeable people are welcome; this SciPy interface is still a bit of a mystery to me. Thanks!
