Gradient in continuous regression using a neural network

Submitted by 独自空忆成欢 on 2019-12-04 22:11:22

Question


I'm trying to implement a regression NN with 3 layers (1 input, 1 hidden, and 1 output layer producing a continuous result). As a basis I took a classification NN from the coursera.org class, but changed the cost function and gradient calculation to fit a regression problem (rather than a classification one):

My nnCostFunction now is:

function [J grad] = nnCostFunctionLinear(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)

Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

m = size(X, 1);

a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;

J = 1/(2*m)*sum(sum((a3 - Y).^2))

th1 = Theta1;
th1(:,1) = 0; %set bias = 0 in reg. formula
th2 = Theta2;
th2(:,1) = 0;

t1 = th1.^2;
t2 = th2.^2;
th = sum(sum(t1)) + sum(sum(t2));
th = lambda * th / (2*m);
J = J + th; %regularization


del_3 = a3 - Y;
t1 = del_3'*a2;
Theta2_grad = 2*(t1)/m + lambda*th2/m;

t1 = del_3 * Theta2;
del_2 = t1 .*  a2;
del_2 = del_2(:,2:end);
t1 = del_2'*a1;
Theta1_grad = 2*(t1)/m + lambda*th1/m;

grad = [Theta1_grad(:) ; Theta2_grad(:)];
end

Then I use this function with the fmincg algorithm, but fmincg stops after the first few iterations. I think my gradient is wrong, but I can't find the error.
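
Roughly, the call looks like this (a sketch only; fmincg.m and randInitializeWeights.m come from the class exercise files, and the layer sizes here are placeholders):

    % Sketch: typical fmincg setup from the Coursera exercises.
    input_layer_size  = size(X, 2);
    hidden_layer_size = 10;           % placeholder
    num_labels        = 1;            % one continuous output

    initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
    initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
    initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];

    options = optimset('MaxIter', 200);
    costFunction = @(p) nnCostFunctionLinear(p, input_layer_size, ...
                                             hidden_layer_size, num_labels, ...
                                             X, y, lambda);
    [nn_params, cost] = fmincg(costFunction, initial_nn_params, options);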

Can anybody help?


Answer 1:


If I understand correctly, your first block of code (shown below) -

m = size(X, 1);

a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;

is to get the output a(3) at the output layer.

Ng's slides on NNs compute a(3) by applying an activation function g at every layer, e.g.

    a(2) = g(Theta(1) * a(1))
    a(3) = g(Theta(2) * a(2))

This is different from what your code does:

  • in the middle (hidden) and output layers, you are not applying the activation function g, e.g., a sigmoid function.

For the cost function J without regularization terms, Ng's slides give the classification formula

    J = -(1/m) * sum_i sum_k [ y_k(i)*log(h(x(i))_k) + (1 - y_k(i))*log(1 - h(x(i))_k) ]

so I don't understand why you can compute it using:

J = 1/(2*m)*sum(sum((a3 - Y).^2))

because you are not including the log function at all.
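
To make the difference concrete, here is a small sketch (not from the slides themselves) that puts the classification-style computation next to the one in the question, assuming the same X, Y, m, Theta1, Theta2 as above:

    % Sketch: classification-style forward pass and log cost (as in the class)
    % versus the linear forward pass and squared-error cost from the question.
    g = @(z) 1 ./ (1 + exp(-z));                 % sigmoid activation

    % Classification: g() at every layer, cross-entropy (log) cost.
    a2c = g([ones(m,1) X] * Theta1');
    a3c = g([ones(m,1) a2c] * Theta2');
    Jc  = -1/m * sum(sum(Y .* log(a3c) + (1 - Y) .* log(1 - a3c)));

    % Regression (the question): no g(), squared-error cost.
    a2r = [ones(m,1) X] * Theta1';
    a3r = [ones(m,1) a2r] * Theta2';
    Jr  = 1/(2*m) * sum(sum((a3r - Y).^2));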




Answer 2:


Mikhaill, I've been playing with a NN for continuous regression as well, and had similar issues at some point. The best thing to do here is to test the gradient computation against a numerical calculation before running the model. If it's not correct, fmincg won't be able to train the model. (By the way, I discourage you from using the numerical gradient for the actual training, since it takes much longer.)

Since you took this idea from Ng's Coursera class, here is a possible solution for you to try, using the same notation, in Octave.

    % Cost function without regularization.
    J = 1/(2*m) * sum(sum((a3 - Y).^2));

    % In case it's needed, the regularization term is added (i.e. for training).
    if (reg == true)
      J = J + lambda/(2*m) * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
    endif

    % Derivatives are computed for layer 2 and 3.
    d3=(a3.-Y);
    d2=d3*Theta2(:,2:end);

    % Theta grad is computed without regularization.
    Theta1_grad=(d2'*a1)./m;
    Theta2_grad=(d3'*a2)./m;

    % Regularization is added to grad computation.
    Theta1_grad(:,2:end)=Theta1_grad(:,2:end)+(lambda/m).*Theta1(:,2:end);
    Theta2_grad(:,2:end)=Theta2_grad(:,2:end)+(lambda/m).*Theta2(:,2:end);

    % Unroll gradients.
    grad = [Theta1_grad(:) ; Theta2_grad(:)];

Note that, since you have taken out the sigmoid activation entirely, the derivative calculation is quite simple and results in a simplification of the original code.

Next steps:

1. Check this code to see whether it makes sense for your problem.
2. Use gradient checking to test the gradient calculation (a sketch is given below).
3. Finally, use fmincg and check that you get different results.
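
For step 2, a minimal gradient check could look like the following. This is only a sketch using central differences (the same idea as the computeNumericalGradient helper from the class exercises), and it assumes X, y, lambda, the layer sizes and the unrolled nn_params are already defined as in the question:

    % Sketch: numerical gradient check with central differences.
    epsilon  = 1e-4;
    costFunc = @(p) nnCostFunctionLinear(p, input_layer_size, ...
                                         hidden_layer_size, num_labels, ...
                                         X, y, lambda);
    [J, analyticGrad] = costFunc(nn_params);

    numGrad = zeros(size(nn_params));
    for i = 1:numel(nn_params)
      perturb    = zeros(size(nn_params));
      perturb(i) = epsilon;
      numGrad(i) = (costFunc(nn_params + perturb) - ...
                    costFunc(nn_params - perturb)) / (2 * epsilon);
    end

    % The relative difference should be tiny (e.g. < 1e-9) if the analytic
    % gradient is correct.
    diff = norm(numGrad - analyticGrad) / norm(numGrad + analyticGrad)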




Answer 3:


Try including the sigmoid function when computing the second-layer (hidden-layer) values, and avoid the sigmoid when computing the target (output) value.

function [J grad] = nnCostFunction1(nnParams, ...
                                   inputLayerSize, ...
                                   hiddenLayerSize, ...
                                   numLabels, ...
                                   X, y, lambda)

Theta1 = reshape(nnParams(1:hiddenLayerSize * (inputLayerSize + 1)), ...
                 hiddenLayerSize, (inputLayerSize + 1));

Theta2 = reshape(nnParams((1 + (hiddenLayerSize * (inputLayerSize + 1))):end), ...
                 numLabels, (hiddenLayerSize + 1));

Theta1Grad = zeros(size(Theta1));
Theta2Grad = zeros(size(Theta2));

m = size(X,1);

a1 = [ones(m, 1) X]';
z2 = Theta1 * a1;
a2 = sigmoid(z2);
a2 = [ones(1, m); a2];
z3 = Theta2 * a2;
a3 = z3;

Y = y';

r1 = lambda / (2 * m) * sum(sum(Theta1(:, 2:end) .* Theta1(:, 2:end)));
r2 = lambda / (2 * m) * sum(sum(Theta2(:, 2:end) .* Theta2(:, 2:end)));

J = 1 / ( 2 * m ) * (a3 - Y) * (a3 - Y)' + r1 + r2;

delta3 = a3 - Y;
delta2 = (Theta2' * delta3) .* sigmoidGradient([ones(1, m); z2]);
delta2 = delta2(2:end, :);

Theta2Grad = 1 / m * (delta3 * a2');
Theta2Grad(:, 2:end) = Theta2Grad(:, 2:end) + lambda / m * Theta2(:, 2:end);
Theta1Grad = 1 / m * (delta2 * a1');
Theta1Grad(:, 2:end) = Theta1Grad(:, 2:end) + lambda / m * Theta1(:, 2:end);

grad = [Theta1Grad(:) ; Theta2Grad(:)];

end

Normalize the inputs before passing them to nnCostFunction.
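
A common way to do that, shown here as a sketch (z-score normalization; the variable names are just for illustration):

    % Sketch: z-score normalization of the inputs.
    mu    = mean(X);
    sigma = std(X);
    sigma(sigma == 0) = 1;                         % guard against constant columns
    X_norm = (X - repmat(mu, size(X, 1), 1)) ./ repmat(sigma, size(X, 1), 1);

    % Train on X_norm; keep mu and sigma to normalize new data the same way.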



Source: https://stackoverflow.com/questions/13255724/gradient-in-continuous-regression-using-a-neural-network
