gradient descent seems to fail

忘掉有多难 2020-12-12 15:54

I implemented a gradient descent algorithm to minimize a cost function, in order to obtain a hypothesis for determining whether an image is of good quality. I did that in Octave.

9 Answers
  • 2020-12-12 16:06

    Using only vectors, here is a compact implementation of linear regression with gradient descent in Mathematica:

    Theta = {0, 0};
    alpha = 0.0001;
    iteration = 1500;
    Jhist = Table[0, {i, iteration}];
    Table[
      Theta = Theta - alpha*Dot[Transpose[X], (Dot[X, Theta] - Y)]/m;
      Jhist[[k]] = Total[(Dot[X, Theta] - Y)^2]/(2*m);
      Theta,
      {k, iteration}]
    

    Note: this assumes that X is an m x 2 matrix whose first column, X[[All, 1]], contains only 1s, and that m (the number of training examples) is already defined.
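    For comparison (my addition, not part of the answer), here is the same loop translated into NumPy, on a small made-up dataset, recording the cost history the same way:

    ```python
    import numpy as np

    # Made-up data: m = 5 examples, X has a leading column of 1s (intercept).
    rng = np.random.default_rng(1)
    m = 5
    X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 1))])
    Y = rng.normal(size=m)

    theta = np.zeros(2)
    alpha, iterations = 0.0001, 1500
    Jhist = np.zeros(iterations)

    for k in range(iterations):
        theta = theta - alpha * X.T @ (X @ theta - Y) / m    # gradient step
        Jhist[k] = np.sum((X @ theta - Y) ** 2) / (2 * m)    # cost after the step

    # With a small enough alpha, the cost decreases over the run
    assert Jhist[-1] < Jhist[0]
    ```

    Recording `Jhist` and checking that it decreases is a cheap way to verify the learning rate is not too large.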

  • 2020-12-12 16:08

    If you are wondering how the seemingly complex-looking for loop can be vectorized and condensed into a single one-line expression, then please read on. The vectorized form is:

    theta = theta - (alpha/m) * (X' * (X * theta - y))

    Given below is a detailed explanation of how we arrive at this vectorized expression using the gradient descent algorithm.

    This is the gradient descent algorithm to fine-tune the value of θ (the update is applied simultaneously for every j):

    θj := θj - (alpha/m) * Σ(i=1..m) (h(x(i)) - y(i)) * x(i)j

    Assume that the following values of X, y and θ are given:

    • m = number of training examples
    • n = number of features + 1

    Here

    • m = 5 (training examples)
    • n = 4 (features+1)
    • X = m x n matrix
    • y = m x 1 column vector
    • θ = n x 1 column vector
    • x(i) is the ith training example
    • x(i)j is the jth feature in the ith training example

    Further,

    • h(x) = ([X] * [θ]) (m x 1 matrix of predicted values for our training set)
    • h(x)-y = ([X] * [θ] - [y]) (m x 1 matrix of Errors in our predictions)

    The whole objective of machine learning is to minimize the errors in our predictions. Based on the above, our error matrix [E] is the m x 1 column vector:

    [E] = [X] * [θ] - [y],   with entries e(i) = h(x(i)) - y(i)

    To calculate the new value of θj, we take a summation of all the errors (m rows) multiplied by the jth feature value of the training set X. That is, take all the values in [E], individually multiply them by the jth feature of the corresponding training example, and add them all together. This helps us get the new (and hopefully better) value of θj. Repeat this process for all j, i.e. for each of the n features. In matrix form, this can be written as:

    θj := θj - (alpha/m) * ([E]' * [X](:, j))

    This can be simplified to a single update for all j at once:

    [θ] := [θ] - (alpha/m) * ([E]' * [X])'

    • [E]' * [X] gives us a row vector, since [E]' is a 1 x m matrix and [X] is an m x n matrix. But we are interested in a column vector, hence we transpose the result.

    More succinctly, substituting [E] = [X] * [θ] - [y], it can be written as:

    theta = theta - (alpha/m) * ((X * theta - y)' * X)'

    Since (A * B)' = (B' * A') and A'' = A, we can rewrite this as the original expression we started out with:

    theta = theta - (alpha/m) * (X' * (X * theta - y))
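    As a sanity check on the derivation above (my addition, not part of the answer), the vectorized update can be compared against the element-wise per-θj loop in NumPy, on small made-up arrays:

    ```python
    import numpy as np

    # Made-up dataset: m = 5 examples, n = 4 (features + intercept column of 1s).
    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 3))])
    y = rng.normal(size=(5, 1))
    theta = np.zeros((4, 1))
    alpha, m = 0.1, X.shape[0]

    # Vectorized update: theta = theta - (alpha/m) * X' * (X*theta - y)
    theta_vec = theta - (alpha / m) * X.T @ (X @ theta - y)

    # Element-wise update: one theta_j at a time, summing error * jth feature.
    # Errors are computed once from the OLD theta (simultaneous update).
    theta_loop = theta.copy()
    errors = X @ theta - y
    for j in range(theta.shape[0]):
        theta_loop[j] = theta[j] - (alpha / m) * np.sum(errors[:, 0] * X[:, j])

    assert np.allclose(theta_vec, theta_loop)
    ```

    Both forms produce the same new θ, which is exactly what the transpose manipulation above shows.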
    
  • 2020-12-12 16:11

    It's cleaner this way, and vectorized as well:

    predictions = X * theta;
    errorsVector = predictions - y;
    theta = theta - (alpha/m) * (X' * errorsVector);
    
  • 2020-12-12 16:12

    This should work, provided both parameters are updated simultaneously: the error term must be computed once from the old theta before either component is changed, otherwise the second update would see the already-modified theta(1,1):

    error = X * theta - y;
    theta(1,1) = theta(1,1) - (alpha*(1/m)) * (error' * X(:,1));
    theta(2,1) = theta(2,1) - (alpha*(1/m)) * (error' * X(:,2));
    
  • 2020-12-12 16:15

    If you are OK with using a least-squares cost function, you could try the normal equation instead of gradient descent. It's much simpler -- only one line -- and, for a modest number of features, computationally cheap (it involves inverting an n x n matrix, so it does not scale to very large n).

    Here is the normal equation: http://mathworld.wolfram.com/NormalEquation.html

    And in Octave form:

    theta = pinv(X' * X) * X' * y
    

    Here is a tutorial that explains how to use the normal equation: http://www.lauradhamilton.com/tutorial-linear-regression-with-octave
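    As an illustration (my NumPy translation on made-up data, not part of the tutorial), the pseudoinverse form of the normal equation agrees with a standard least-squares solver:

    ```python
    import numpy as np

    # Made-up data: m = 6 examples, n = 3 columns (intercept + 2 features).
    rng = np.random.default_rng(42)
    X = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 2))])
    y = rng.normal(size=(6, 1))

    # Normal equation: theta = pinv(X' * X) * X' * y
    theta_ne = np.linalg.pinv(X.T @ X) @ X.T @ y

    # Reference solution from NumPy's least-squares solver
    theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

    assert np.allclose(theta_ne, theta_ls)
    ```

    Using `pinv` rather than a plain inverse keeps the formula well-defined even when X' * X is singular or ill-conditioned.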

  • 2020-12-12 16:17

    I vectorized the theta update... it may help somebody:

    theta = theta - (alpha/m *  (X * theta-y)' * X)';
    