I\'m trying to write out a bit of code for the gradient descent algorithm explained in the Stanford Machine Learning lecture (lecture 2 at around 25:00). Below is the implementa
I have experienced the same problem (albeit in Java) because my learning rate was too big.
For short, I was using α = 0.001 and I had to push it to 0.000001 to see actual convergence.
Of course these values are linked to your dataset.