gradient-descent | 易学教程

scipy.optimize.fmin_l_bfgs_b returns 'ABNORMAL_TERMINATION_IN_LNSRCH'

Question: I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture distributions are modeled by regressions whose weights have to be optimized using the EM algorithm.

    sigma_sp_new, func_val, info_dict = fmin_l_bfgs_b(
        func_to_minimize, self.sigma_vector[si][pj],
        args=(self.w_vectors[si][pj], Y, X, E_step_results[si][pj]),
        approx_grad=True, bounds=[(1e-8, 0.5)],
        factr=1e02, pgtol=1e-05, epsilon=1e-08)

But sometimes I get the warning 'ABNORMAL_TERMINATION_IN_LNSRCH' in

What's the triplet loss back propagation gradient formula?

Question: I am trying to use Caffe to implement the triplet loss described in Schroff, Kalenichenko and Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", 2015. I am new to this, so how do I calculate the gradient in back propagation?

Answer 1: I assume you define the loss layer as

    layer {
      name: "tripletLoss"
      type: "TripletLoss"
      bottom: "anchor"
      bottom: "positive"
      bottom: "negative"
      ...
    }

Now you need to compute a gradient w.r.t. each of the "bottom"s. The loss is given by: The gradient w.r
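As a rough illustration of the per-triplet gradients, the sketch below assumes the FaceNet-style loss L = max(0, ||a - p||^2 - ||a - n||^2 + margin) with squared Euclidean distances. The function name `triplet_loss_and_grads` and the margin values are made up for this example, and it is plain Python rather than Caffe layer code. When the hinge is active, differentiating gives 2(n - p) for the anchor, 2(p - a) for the positive and 2(a - n) for the negative; when it is inactive all three gradients are zero.

```python
def triplet_loss_and_grads(a, p, n, margin=0.2):
    """Loss and gradients for one (anchor, positive, negative) triplet.

    Assumes L = max(0, ||a - p||^2 - ||a - n||^2 + margin).
    """
    d_ap = sum((ai - pi) ** 2 for ai, pi in zip(a, p))
    d_an = sum((ai - ni) ** 2 for ai, ni in zip(a, n))
    loss = max(0.0, d_ap - d_an + margin)
    if loss == 0.0:
        # Hinge inactive: the triplet contributes no gradient.
        zeros = [0.0] * len(a)
        return loss, zeros, list(zeros), list(zeros)
    grad_a = [2.0 * (ni - pi) for pi, ni in zip(p, n)]
    grad_p = [2.0 * (pi - ai) for ai, pi in zip(a, p)]
    grad_n = [2.0 * (ai - ni) for ai, ni in zip(a, n)]
    return loss, grad_a, grad_p, grad_n


# Made-up 2-D embeddings, only to exercise the function.
loss, grad_a, grad_p, grad_n = triplet_loss_and_grads(
    [0.0, 1.0], [1.0, 1.0], [0.0, 2.0], margin=0.5)
```

In a Caffe layer these per-triplet gradients would be written into the diffs of the three bottom blobs, scaled by the incoming top diff.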

Machine learning - Linear regression using batch gradient descent

I am trying to implement batch gradient descent on a data set with a single feature and multiple training examples (m). When I use the normal equation I get the right answer, but I get the wrong one with the code below, which performs batch gradient descent in MATLAB.

    function [theta] = gradientDescent(X, y, theta, alpha, iterations)
        m = length(y);
        delta = zeros(2,1);
        for iter = 1:1:iterations
            for i = 1:1:m
                delta(1,1) = delta(1,1) + (X(i,:)*theta - y(i,1));
                delta(2,1) = delta(2,1) + ((X(i,:)*theta - y(i,1))*X(i,2));
            end
            theta = theta - (delta*(alpha/m));
            computeCost(X,y,theta)
        end
    end

y is the
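In the MATLAB snippet above, delta is initialised once, before the outer loop, so each pass adds to the sum left over from earlier passes. The Python sketch below shows one way to write the same batch update with the accumulator reset at the start of every iteration. The function name, the toy data and the hyperparameters are invented for illustration, and X is assumed to carry a leading column of ones as in the original.

```python
def gradient_descent(X, y, theta, alpha, iterations):
    """Batch gradient descent for linear regression (pure-Python sketch)."""
    m = len(y)
    theta = list(theta)
    for _ in range(iterations):
        # Fresh accumulator for every pass over the data.
        delta = [0.0] * len(theta)
        for x_i, y_i in zip(X, y):
            error = sum(t * x for t, x in zip(theta, x_i)) - y_i
            for j, x_ij in enumerate(x_i):
                delta[j] += error * x_ij
        # Simultaneous update of all parameters.
        theta = [t - alpha * d / m for t, d in zip(theta, delta)]
    return theta


# Toy data generated from y = 1 + 2 * x (made up for this example).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent(X, y, [0.0, 0.0], alpha=0.1, iterations=5000)
```

On this toy data the result can be compared against the line the data was generated from, in the same way the original post compares against the normal equation.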

What is `lr_policy` in Caffe?

I am trying to find out how to use Caffe. To do so, I took a look at the different .prototxt files in the examples folder. There is one option I don't understand:

    # The learning rate policy
    lr_policy: "inv"

Possible values seem to be: "fixed", "inv", "step", "multistep", "stepearly", "poly". Could somebody please explain those options?

w1res: If you look inside the /caffe-master/src/caffe/proto/caffe.proto file (it is also available online) you will see the following descriptions:

    // The learning rate decay policy. The currently implemented learning rate
    // policies are as follows:
    // -
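As a rough guide to what some of these policies compute, here is a small Python sketch of the decay formulas as they are commonly documented in the comments of caffe.proto. The function `learning_rate` is a hypothetical helper, not part of Caffe; its parameter names (base_lr, gamma, power, stepsize, max_iter) mirror the solver fields. Only a subset of policies is covered ("exp" is included although the question does not list it, and "multistep" and "stepearly" are left out), so check the formulas against the caffe.proto of your Caffe version before relying on them.

```python
import math


def learning_rate(policy, it, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, max_iter=1):
    """Hypothetical helper mirroring a few Caffe lr_policy formulas."""
    if policy == "fixed":
        # Constant learning rate.
        return base_lr
    if policy == "step":
        # Multiply by gamma once every `stepsize` iterations.
        return base_lr * gamma ** math.floor(it / stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == "poly":
        # Decays to zero at max_iter.
        return base_lr * (1.0 - it / max_iter) ** power
    raise ValueError("policy not covered by this sketch: %r" % policy)


# Example values chosen for illustration only.
for it in (0, 100, 1000):
    print(it, learning_rate("inv", it, base_lr=0.01, gamma=0.0001, power=0.75))
```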

Why do we need to call zero_grad() in PyTorch?

Question: The method zero_grad() needs to be called during training, but the documentation is not very helpful:

    | zero_grad(self)
    |     Sets gradients of all model parameters to zero.

Why do we need to call this method?

Answer 1: In PyTorch, we need to set the gradients to zero before starting backpropagation because PyTorch accumulates the gradients on subsequent backward passes. This is convenient while training RNNs. So, the default action is to accumulate (i.e. sum) the gradients on every loss
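The accumulation behaviour described above can be mimicked without PyTorch. The toy `Param` class below is a made-up stand-in, not PyTorch's implementation: its `backward` adds each new gradient to `.grad` instead of overwriting it, so skipping `zero_grad` mixes the gradients of successive steps.

```python
class Param:
    """Toy stand-in for a trainable parameter (not a PyTorch class)."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def backward(self, x, target):
        # Gradient of the squared error (value * x - target)^2 w.r.t. value,
        # added to whatever is already stored in .grad.
        self.grad += 2.0 * (self.value * x - target) * x

    def zero_grad(self):
        self.grad = 0.0


w = Param(1.0)
w.backward(x=2.0, target=0.0)   # gradient of this sample alone: 8.0
first = w.grad
w.backward(x=2.0, target=0.0)   # no reset: a second 8.0 is added on top
accumulated = w.grad
w.zero_grad()
w.backward(x=2.0, target=0.0)   # after a reset the fresh gradient is 8.0 again
fresh = w.grad
```

In a real training loop the reset is typically done once per step, before the backward pass for that step.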

gradient descent seems to fail

Question: I implemented a gradient descent algorithm to minimize a cost function in order to obtain a hypothesis for determining whether an image has good quality. I did that in Octave. The idea is loosely based on the algorithm from the machine learning class by Andrew Ng. I have 880 values in "y" ranging from 0.5 to ~12, and 880 values from 50 to 300 in "X" that should predict the image's quality. Sadly the algorithm seems to fail: after some iterations the value for theta

Spark mllib predicting weird number or NaN

I am new to Apache Spark and trying to use the machine learning library to predict some data. My dataset right now is only about 350 points. Here are 7 of those points:

    "365","4",41401.387,5330569
    "364","3",51517.886,5946290
    "363","2",55059.838,6097388
    "362","1",43780.977,5304694
    "361","7",46447.196,5471836
    "360","6",50656.121,5849862
    "359","5",44494.476,5460289

Here's my code:

    def parsePoint(line):
        split = map(sanitize, line.split(','))
        rev = split.pop(-2)
        return LabeledPoint(rev, split)

    def sanitize(value):
        return float(value.strip('"'))

    parsedData = textFile.map(parsePoint)
    model =

pytorch - connection between loss.backward() and optimizer.step()

Question: Where is the explicit connection between the optimizer and the loss? How does the optimizer know where to get the gradients of the loss without a call like optimizer.step(loss)?

More context: when I minimize the loss, I don't have to pass the gradients to the optimizer.

    loss.backward()   # Back Propagation
    optimizer.step()  # Gradient Descent

Answer 1: Without delving too deep into the internals of PyTorch, I can offer a simplistic answer: Recall that when initializing optimizer you
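One way to picture the link, sketched with toy classes rather than PyTorch itself: the optimizer is constructed with references to the parameter objects, the backward pass writes gradients onto those same objects, and `step` reads them from there. `Param`, `ToySGD` and `backward` below are hypothetical names used for this illustration only.

```python
class Param:
    """Toy parameter holding a value and its gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Toy optimizer that keeps references to the parameters it updates."""

    def __init__(self, params, lr):
        self.params = list(params)  # same objects, not copies
        self.lr = lr

    def step(self):
        # Gradients are read from the parameters themselves,
        # so no loss object has to be passed in.
        for p in self.params:
            p.value -= self.lr * p.grad

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0


def backward(w, x, target):
    """Store d/dw of (w * x - target)^2 on the parameter."""
    w.grad += 2.0 * (w.value * x - target) * x


w = Param(0.0)
optimizer = ToySGD([w], lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    backward(w, x=1.0, target=3.0)   # plays the role of loss.backward()
    optimizer.step()                 # uses w.grad written just above
```

The shared parameter object is the only channel between the two calls in this sketch, which is why the order (backward first, then step) matters.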

Cost function training target versus accuracy desired goal

Question: When we train neural networks, we typically use gradient descent, which relies on a continuous, differentiable real-valued cost function. The final cost function might, for example, be the mean squared error. Put another way, gradient descent implicitly assumes the end goal is regression: minimizing a real-valued error measure. Sometimes what we want a neural network to do is perform classification: given an input, classify it into two or more discrete categories. In this case, the
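To make the contrast concrete, the sketch below compares a smooth loss with accuracy for a one-parameter logistic classifier on made-up data. A small change in the weight changes the cross-entropy but leaves the accuracy untouched, so accuracy on its own gives gradient descent no direction to move in. All names and numbers are invented for this illustration.

```python
import math

# Made-up 1-D data: the label is 1 when x is positive.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]


def predict(w, x):
    """Probability of class 1 under a one-weight logistic model."""
    return 1.0 / (1.0 + math.exp(-w * x))


def cross_entropy(w):
    """Mean cross-entropy: smooth and differentiable in w."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(w, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def accuracy(w):
    """Fraction of correct hard decisions: piecewise constant in w."""
    hits = sum((predict(w, x) > 0.5) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)


w, eps = 0.5, 1e-3
loss_change = cross_entropy(w + eps) - cross_entropy(w)   # non-zero
acc_change = accuracy(w + eps) - accuracy(w)              # exactly zero here
```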

Tensorflow: How to write op with gradient in python?

I would like to write a TensorFlow op in Python, but I would like it to be differentiable (to be able to compute a gradient). The question "Tensorflow: Writing an Op in Python" asks how to write an op in Python, and the answer suggests using py_func (which has no gradient). The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html In my case, I am prototyping, so I don't care about whether it runs on GPU, and I don't care about it being usable from anything other than the TF Python API.

patapouf_ai: Yes,