Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

This was a very difficult problem to debug, and illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed around as a vector:

Minimization of scalar function of one or more variables.

In general, the optimization problems are of the form:

minimize f(x) subject to

g_i(x) >= 0,  i = 1,...,m
h_j(x)  = 0,  j = 1,...,p 

where x is a vector of one or more variables.

What's important is that they really mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
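
To make the point concrete, here is a minimal illustration (a toy example, not the asker's code) of the kind of silent broadcasting that goes wrong when a 1-d prediction vector meets a 2-d column of labels:

import numpy as np

y = np.zeros((5, 1))       # labels as a 2-d column, shape (5, 1)
h_col = np.zeros((5, 1))   # predictions as a column, shape (5, 1)
h_1d = np.zeros(5)         # predictions as a 1-d array, shape (5,)

print((h_col - y).shape)   # (5, 1) -- the elementwise difference you intended
print((h_1d - y).shape)    # (5, 5) -- broadcasting quietly produces a matrix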

I don't know exactly why it's causing a problem in your case, but it's easily fixed regardless. You just have to add the following at the top of both your cost function and your gradient function:

theta = theta.reshape(-1, 1)                                           

This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.
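
For reference, here is a minimal sketch (assuming numpy is imported as np) of what a cost function with this fix at the top can look like; the asker's exact code isn't reproduced here, so treat it as an illustration rather than their implementation:

def cost(theta, X, y):
    theta = theta.reshape(-1, 1)          # force a (n, 1) column vector
    m = len(y)
    h = 1 / (1 + np.exp(-(X @ theta)))    # h is (m, 1), matching a column-vector y
    J = (1 / m) * (-y.T @ np.log(h) - (1 - y).T @ np.log(1 - h))
    return J.item()                       # hand scipy back a plain Python scalar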

I have had similar issues with SciPy while dealing with the same problem as you. As senderle points out, the interface is not the easiest to work with, especially in combination with the numpy array interface... Here is my implementation, which works as expected.

Defining the cost and gradient functions

Note that initial_theta is passed as a simple array of shape (3,) and converted to a column vector of shape (3, 1) inside the gradient function. The gradient function then returns grad.ravel(), which has shape (3,) again. This matters: doing otherwise caused error messages with various optimization methods in scipy.optimize.

Note that different methods have different behaviours, but returning .ravel() seems to fix most issues...

import pandas as pd
import numpy as np
import scipy.optimize as opt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def CostFunc(theta, X, y):

    # theta arrives from scipy as a 1-d array of shape (n,)
    m = len(y)

    # Vectorized computation of the logistic-regression cost
    z = X @ theta
    h = sigmoid(z)
    J = (1 / m) * ((-y.T @ np.log(h)) - (1 - y).T @ np.log(1 - h))

    return J

def Gradient(theta, X, y):

    # theta arrives as a 1-d array of shape (n,); make it a column vector
    m = len(y)
    theta = theta[:, np.newaxis]

    # Vectorized computation of the gradient; grad has shape (n, 1) here
    z = X @ theta
    h = sigmoid(z)
    grad = (1 / m) * (X.T @ (h - y))

    return grad.ravel()  # <-- This is the trick: back to shape (n,)

Initializing variables and parameters

Note that initial_theta.shape returns (3,)
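
The snippet below also assumes that data1 is a pandas DataFrame with two feature columns followed by a 0/1 label column, loaded along these lines (the file name is only a placeholder):

data1 = pd.read_csv('data1.txt', header=None)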

X = data1.iloc[:, 0:2].values                           # feature matrix, shape (m, 2)
m, n = X.shape
X = np.concatenate((np.ones(m)[:, np.newaxis], X), 1)   # prepend an intercept column -> (m, 3)
y = data1.iloc[:, -1].values[:, np.newaxis]             # labels as a column vector, shape (m, 1)
initial_theta = np.zeros(n + 1)                         # shape (3,)

Calling Scipy.optimize

model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
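
The returned object is a standard scipy OptimizeResult, so the fitted parameters and the final cost can be read off directly:

theta_opt = model.x     # optimized parameters, shape (3,)
print(model.success)    # whether the optimizer reports convergence
print(model.fun)        # value of CostFunc at theta_opt
print(theta_opt)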

Any comments from more knowledgeable people are welcome; this SciPy interface is still a bit of a mystery to me. Thanks!
