gradient descent seems to fail

忘掉有多难 2020-12-12 15:54

I implemented a gradient descent algorithm to minimize a cost function, in order to obtain a hypothesis for determining whether an image is of good quality. I did that in Octave.

9 Answers
  • 2020-12-12 16:06

    Using only vectors, here is a compact implementation of linear regression with gradient descent in Mathematica:

    Theta = {0, 0};
    alpha = 0.0001;
    iteration = 1500;
    Jhist = Table[0, {i, iteration}];
    Table[
      Theta = Theta - alpha*Dot[Transpose[X], (Dot[X, Theta] - Y)]/m;
      Jhist[[k]] = Total[(Dot[X, Theta] - Y)^2]/(2*m);
      Theta,
      {k, iteration}]
    

    Note: this assumes that X is an m x 2 matrix whose first column, X[[All, 1]], contains only 1s, and that m (the number of training examples) is already defined.
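    For comparison (my addition, not part of the answer), here is the same loop translated into NumPy, on a small made-up dataset, recording the cost history the same way:

    ```python
    import numpy as np

    # Made-up data: m = 5 examples, X has a leading column of 1s (intercept).
    rng = np.random.default_rng(1)
    m = 5
    X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 1))])
    Y = rng.normal(size=m)

    theta = np.zeros(2)
    alpha, iterations = 0.0001, 1500
    Jhist = np.zeros(iterations)

    for k in range(iterations):
        theta = theta - alpha * X.T @ (X @ theta - Y) / m    # gradient step
        Jhist[k] = np.sum((X @ theta - Y) ** 2) / (2 * m)    # cost after the step

    # With a small enough alpha, the cost decreases over the run
    assert Jhist[-1] < Jhist[0]
    ```

    Recording `Jhist` and checking that it decreases is a cheap way to verify the learning rate is not too large.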

  • 2020-12-12 16:08

    If you are wondering how the seemingly complex-looking for loop can be vectorized and condensed into a single one-line expression, then please read on. The vectorized form is:

    theta = theta - (alpha/m) * (X' * (X * theta - y))

    Given below is a detailed explanation of how we arrive at this vectorized expression using the gradient descent algorithm.

    This is the gradient descent algorithm to fine-tune the value of θ (the update is applied simultaneously for every j):

    θj := θj - (alpha/m) * Σ(i=1..m) (h(x(i)) - y(i)) * x(i)j

    Assume that the following values of X, y and θ are given:

    • m = number of training examples
    • n = number of features + 1

    Here

    • m = 5 (training examples)
    • n = 4 (features+1)
    • X = m x n matrix
    • y = m x 1 column vector
    • θ = n x 1 column vector
    • x(i) is the ith training example
    • x(i)j is the jth feature in the ith training example

    Further,

    • h(x) = ([X] * [θ]) (m x 1 matrix of predicted values for our training set)
    • h(x)-y = ([X] * [θ] - [y]) (m x 1 matrix of Errors in our predictions)

    The whole objective of machine learning is to minimize the errors in our predictions. Based on the above, our error matrix [E] is the m x 1 column vector:

    [E] = [X] * [θ] - [y],   with entries e(i) = h(x(i)) - y(i)

    To calculate the new value of θj, we take a summation of all the errors (m rows) multiplied by the jth feature value of the training set X. That is, take all the values in [E], individually multiply them by the jth feature of the corresponding training example, and add them all together. This helps us get the new (and hopefully better) value of θj. Repeat this process for all j, i.e. for each of the n features. In matrix form, this can be written as:

    θj := θj - (alpha/m) * ([E]' * [X](:, j))

    This can be simplified to a single update for all j at once:

    [θ] := [θ] - (alpha/m) * ([E]' * [X])'

    • [E]' * [X] gives us a row vector, since [E]' is a 1 x m matrix and [X] is an m x n matrix. But we are interested in a column vector, hence we transpose the result.

    More succinctly, substituting [E] = [X] * [θ] - [y], it can be written as:

    theta = theta - (alpha/m) * ((X * theta - y)' * X)'

    Since (A * B)' = (B' * A') and A'' = A, we can rewrite this as the original expression we started out with:

    theta = theta - (alpha/m) * (X' * (X * theta - y))
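    As a sanity check on the derivation above (my addition, not part of the answer), the vectorized update can be compared against the element-wise per-θj loop in NumPy, on small made-up arrays:

    ```python
    import numpy as np

    # Made-up dataset: m = 5 examples, n = 4 (features + intercept column of 1s).
    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 3))])
    y = rng.normal(size=(5, 1))
    theta = np.zeros((4, 1))
    alpha, m = 0.1, X.shape[0]

    # Vectorized update: theta = theta - (alpha/m) * X' * (X*theta - y)
    theta_vec = theta - (alpha / m) * X.T @ (X @ theta - y)

    # Element-wise update: one theta_j at a time, summing error * jth feature.
    # Errors are computed once from the OLD theta (simultaneous update).
    theta_loop = theta.copy()
    errors = X @ theta - y
    for j in range(theta.shape[0]):
        theta_loop[j] = theta[j] - (alpha / m) * np.sum(errors[:, 0] * X[:, j])

    assert np.allclose(theta_vec, theta_loop)
    ```

    Both forms produce the same new θ, which is exactly what the transpose manipulation above shows.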
    
  • 2020-12-12 16:11

    It's cleaner this way, and vectorized as well:

    predictions = X * theta;
    errorsVector = predictions - y;
    theta = theta - (alpha/m) * (X' * errorsVector);
    
  • 2020-12-12 16:12

    This should work, provided both parameters are updated simultaneously: the error term must be computed once from the old theta before either component is changed, otherwise the second update would see the already-modified theta(1,1):

    error = X * theta - y;
    theta(1,1) = theta(1,1) - (alpha*(1/m)) * (error' * X(:,1));
    theta(2,1) = theta(2,1) - (alpha*(1/m)) * (error' * X(:,2));
    
  • 2020-12-12 16:15

    If you are OK with using a least-squares cost function, you could try the normal equation instead of gradient descent. It's much simpler -- only one line -- and, for a modest number of features, computationally cheap (it involves inverting an n x n matrix, so it does not scale to very large n).

    Here is the normal equation: http://mathworld.wolfram.com/NormalEquation.html

    And in Octave form:

    theta = pinv(X' * X) * X' * y
    

    Here is a tutorial that explains how to use the normal equation: http://www.lauradhamilton.com/tutorial-linear-regression-with-octave
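    As an illustration (my NumPy translation on made-up data, not part of the tutorial), the pseudoinverse form of the normal equation agrees with a standard least-squares solver:

    ```python
    import numpy as np

    # Made-up data: m = 6 examples, n = 3 columns (intercept + 2 features).
    rng = np.random.default_rng(42)
    X = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 2))])
    y = rng.normal(size=(6, 1))

    # Normal equation: theta = pinv(X' * X) * X' * y
    theta_ne = np.linalg.pinv(X.T @ X) @ X.T @ y

    # Reference solution from NumPy's least-squares solver
    theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

    assert np.allclose(theta_ne, theta_ls)
    ```

    Using `pinv` rather than a plain inverse keeps the formula well-defined even when X' * X is singular or ill-conditioned.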

  • 2020-12-12 16:17

    I vectorized the theta update... it may help somebody:

    theta = theta - (alpha/m *  (X * theta-y)' * X)';
    