Gradient in continuous regression using a neural network

Question

I'm trying to implement a regression NN that has 3 layers (1 input, 1 hidden and 1 output layer with a continuous result). As a basis I took a classification NN from a coursera.org class, but changed the cost function and gradient calculation so that they fit a regression problem rather than a classification one.

My nnCostFunction now is:

function [J grad] = nnCostFunctionLinear(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)

Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

m = size(X, 1);

a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;

J = 1/(2*m)*sum(sum((a3 - Y).^2))

th1 = Theta1;
th1(:,1) = 0; %set bias = 0 in reg. formula
th2 = Theta2;
th2(:,1) = 0;

t1 = th1.^2;
t2 = th2.^2;
th = sum(sum(t1)) + sum(sum(t2));
th = lambda * th / (2*m);
J = J + th; %regularization


del_3 = a3 - Y;
t1 = del_3'*a2;
Theta2_grad = 2*(t1)/m + lambda*th2/m;

t1 = del_3 * Theta2;
del_2 = t1 .*  a2;
del_2 = del_2(:,2:end);
t1 = del_2'*a1;
Theta1_grad = 2*(t1)/m + lambda*th1/m;

grad = [Theta1_grad(:) ; Theta2_grad(:)];
end

I then pass this function to the fmincg algorithm, but fmincg stops within the first few iterations. I think my gradient is wrong, but I can't find the error.

Can anybody help?

Answer 1:

If I understand correctly, your first block of code (shown below) -

m = size(X, 1);

a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;

is to get the output a⁽³⁾ at the output layer.

Ng's slides on neural networks compute a⁽³⁾ differently from your code: in the hidden and output layers, you are not applying the activation function g (e.g., a sigmoid function).
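For reference, here is the forward pass as it is usually written for a 3-layer classification network with a sigmoid activation. The slide itself is not reproduced in this thread, so treat the notation below as an assumption about what it showed:

```latex
\begin{aligned}
z^{(2)} &= \Theta^{(1)} a^{(1)}, & a^{(2)} &= g\!\left(z^{(2)}\right) \quad (\text{then add the bias unit } a^{(2)}_0 = 1),\\
z^{(3)} &= \Theta^{(2)} a^{(2)}, & a^{(3)} &= h_\Theta(x) = g\!\left(z^{(3)}\right),\\
g(z) &= \frac{1}{1 + e^{-z}}.
\end{aligned}
```

The code in the question corresponds to replacing g with the identity in both layers.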

As for the cost function J without regularization terms, Ng's slides use a formula built on the log function, so I don't understand why you can compute it using:

J = 1/(2*m)*sum(sum((a3 - Y).^2))

because you are not including the log function at all.
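The formula image is not reproduced in this thread. Presumably it is the unregularized cross-entropy cost used for classification, which in the usual notation (an assumption here, with K output units and m examples) reads:

```latex
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
\left[ y^{(i)}_k \log\!\left(h_\Theta(x^{(i)})_k\right)
     + \left(1 - y^{(i)}_k\right) \log\!\left(1 - h_\Theta(x^{(i)})_k\right) \right]
```

The expression in the question is instead a squared-error cost, $\frac{1}{2m}\sum_i \left(a^{(3)}_i - y_i\right)^2$, which is the usual choice when the target is continuous.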

Answer 2:

Mikhaill, I've been playing with a NN for continuous regression as well, and had similar issues at some point. The best thing to do here is to test the gradient computation against a numerical calculation before running the model. If the gradient is not correct, fmincg won't be able to train the model. (By the way, I'd discourage using the numerical gradient for the training itself, since it is much slower.)

Since you took this idea from Ng's Coursera class, I'll implement a possible solution for you to try, using the same notation, in Octave.

    % Cost function without regularization (1/(2*m), to match the gradients below).
    J = 1/(2*m)*sum(sum((a3-Y).^2));

    % Add the regularization term if needed (e.g., for training).
    % reg is assumed to be a logical flag set by the caller.
    if (reg == true)
      J = J + lambda/(2*m)*(sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
    endif

    % Error terms for layers 3 and 2 (linear activations, so no g' factor).
    d3 = a3 - Y;
    d2 = d3*Theta2(:,2:end);

    % Theta grad is computed without regularization.
    Theta1_grad=(d2'*a1)./m;
    Theta2_grad=(d3'*a2)./m;

    % Regularization is added to grad computation.
    Theta1_grad(:,2:end)=Theta1_grad(:,2:end)+(lambda/m).*Theta1(:,2:end);
    Theta2_grad(:,2:end)=Theta2_grad(:,2:end)+(lambda/m).*Theta2(:,2:end);

    % Unroll gradients.
    grad = [Theta1_grad(:) ; Theta2_grad(:)];

Note that, since you have taken out all the sigmoid activations, the derivative calculation is quite simple, and the original code simplifies accordingly.
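To make that simplification explicit, here is a short derivation sketch for the purely linear network. It assumes row-wise examples as in the code above, $a^{(1)} = [1,\ X]$, $a^{(2)} = [1,\ a^{(1)}\Theta^{(1)\top}]$, $a^{(3)} = a^{(2)}\Theta^{(2)\top}$, and it leaves out the regularization terms:

```latex
\begin{aligned}
J &= \frac{1}{2m} \sum_{i=1}^{m} \left(a^{(3)}_i - y_i\right)^2, \\
\delta^{(3)} &= a^{(3)} - Y, \qquad
\delta^{(2)} = \delta^{(3)}\, \Theta^{(2)}_{(:,\,2:\mathrm{end})}, \\
\frac{\partial J}{\partial \Theta^{(2)}} &= \frac{1}{m}\, \delta^{(3)\top} a^{(2)}, \qquad
\frac{\partial J}{\partial \Theta^{(1)}} = \frac{1}{m}\, \delta^{(2)\top} a^{(1)}.
\end{aligned}
```

The factor 1/2 in J cancels the 2 that comes from differentiating the square, so the gradients carry 1/m with no extra factor of 2. Because the activation is the identity, its derivative is 1 and $\delta^{(2)}$ is not multiplied element-wise by anything.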

Next steps: 1. Check this code to see whether it makes sense for your problem. 2. Use gradient checking to test the gradient calculation. 3. Finally, run fmincg and check whether the results change.
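Step 2 could be sketched roughly as follows. This is a minimal, self-contained check on a tiny made-up problem, not the course's own checking routine. The data, the sizes and the helper names (unpack1, unpack2, forward, cost) are all assumptions for illustration, and regularization is left out (lambda = 0).

```octave
% Tiny made-up problem: m examples, n inputs, h hidden units, k = 1 output.
m = 5; n = 3; h = 4; k = 1;
X = reshape(sin(1:m*n), m, n);
y = cos(1:m)';
params = sin(1:(h*(n+1) + k*(h+1)))' / 10;   % unrolled [Theta1(:); Theta2(:)]

% Unregularized cost of the purely linear network as a function of params.
unpack1 = @(p) reshape(p(1:h*(n+1)), h, n+1);
unpack2 = @(p) reshape(p(h*(n+1)+1:end), k, h+1);
forward = @(p) [ones(m,1), [ones(m,1), X] * unpack1(p)'] * unpack2(p)';
cost    = @(p) sum(sum((forward(p) - y).^2)) / (2*m);

% Analytic gradient, using the same formulas as the code above.
Theta1 = unpack1(params);
Theta2 = unpack2(params);
a1 = [ones(m,1), X];
a2 = [ones(m,1), a1*Theta1'];
a3 = a2*Theta2';
d3 = a3 - y;
d2 = d3*Theta2(:,2:end);
Theta1_grad = (d2'*a1)/m;
Theta2_grad = (d3'*a2)/m;
grad = [Theta1_grad(:); Theta2_grad(:)];

% Numerical gradient by central differences, one parameter at a time.
epsilon = 1e-4;
numgrad = zeros(size(params));
for p = 1:numel(params)
  step = zeros(size(params));
  step(p) = epsilon;
  numgrad(p) = (cost(params + step) - cost(params - step)) / (2*epsilon);
end

% A very small relative difference suggests the analytic gradient is right.
relDiff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('relative difference: %g\n', relDiff);
```

If the relative difference is not tiny, the analytic gradient and the cost function disagree, and fmincg is likely to stop early as described in the question.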

Answer 3:

Try applying the sigmoid function when computing the hidden (second) layer values, but not when computing the output (target) value.

function [J grad] = nnCostFunction1(nnParams, ...
                                   inputLayerSize, ...
                                   hiddenLayerSize, ...
                                   numLabels, ...
                                   X, y, lambda)

Theta1 = reshape(nnParams(1:hiddenLayerSize * (inputLayerSize + 1)), ...
                 hiddenLayerSize, (inputLayerSize + 1));

Theta2 = reshape(nnParams((1 + (hiddenLayerSize * (inputLayerSize + 1))):end), ...
                 numLabels, (hiddenLayerSize + 1));

Theta1Grad = zeros(size(Theta1));
Theta2Grad = zeros(size(Theta2));

m = size(X,1);

a1 = [ones(m, 1) X]';
z2 = Theta1 * a1;
a2 = sigmoid(z2);
a2 = [ones(1, m); a2];
z3 = Theta2 * a2;
a3 = z3;

Y = y';

r1 = lambda / (2 * m) * sum(sum(Theta1(:, 2:end) .* Theta1(:, 2:end)));
r2 = lambda / (2 * m) * sum(sum(Theta2(:, 2:end) .* Theta2(:, 2:end)));

J = 1 / ( 2 * m ) * (a3 - Y) * (a3 - Y)' + r1 + r2;

delta3 = a3 - Y;
delta2 = (Theta2' * delta3) .* sigmoidGradient([ones(1, m); z2]);
delta2 = delta2(2:end, :);

Theta2Grad = 1 / m * (delta3 * a2');
Theta2Grad(:, 2:end) = Theta2Grad(:, 2:end) + lambda / m * Theta2(:, 2:end);
Theta1Grad = 1 / m * (delta2 * a1');
Theta1Grad(:, 2:end) = Theta1Grad(:, 2:end) + lambda / m * Theta1(:, 2:end);

grad = [Theta1Grad(:) ; Theta2Grad(:)];

end
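The function above calls sigmoid and sigmoidGradient, which are not shown in this answer. A minimal sketch of what they are assumed to compute (the logistic function and its derivative) could look like this:

```octave
1;  % lets Octave run this block as a script; normally each function lives in its own .m file

% Assumed helper: element-wise logistic function.
function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));
end

% Assumed helper: element-wise derivative of the logistic function.
function g = sigmoidGradient(z)
  s = sigmoid(z);
  g = s .* (1 - s);
end
```

Both work element-wise, so they accept the matrices z2 and [ones(1, m); z2] used above.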

Normalize the inputs before passing them to nnCostFunction.
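One common way to do that is to standardize each input column to zero mean and unit standard deviation. The sketch below uses made-up data, and the variable names (mu, sigma, Xnorm) are only illustrative:

```octave
% Made-up inputs: 4 examples, 2 features on very different scales.
X = [1 200; 2 300; 3 400; 4 500];

mu = mean(X);             % per-column mean, 1 x n
sigma = std(X);           % per-column standard deviation, 1 x n
sigma(sigma == 0) = 1;    % avoid dividing by zero for constant columns

% Subtract the mean and divide by the standard deviation, column by column.
Xnorm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
```

Keep mu and sigma around: inputs used for prediction later need to be scaled with the same values as the training inputs.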

Source: https://stackoverflow.com/questions/13255724/gradient-in-continuous-regression-using-a-neural-network

Tags

machine-learning

neural-network

gradient

regression