gradient-descent

scipy.optimize.fmin_l_bfgs_b returns 'ABNORMAL_TERMINATION_IN_LNSRCH'

≯℡__Kan透↙ submitted on 2019-11-27 13:51:43
Question: I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture distributions are modeled by regressions whose weights have to be optimized using the EM algorithm. sigma_sp_new, func_val, info_dict = fmin_l_bfgs_b(func_to_minimize, self.sigma_vector[si][pj], args=(self.w_vectors[si][pj], Y, X, E_step_results[si][pj]), approx_grad=True, bounds=[(1e-8, 0.5)], factr=1e02, pgtol=1e-05, epsilon=1e-08) But sometimes I get the warning 'ABNORMAL_TERMINATION_IN_LNSRCH' in

What's the triplet loss back propagation gradient formula?

自闭症网瘾萝莉.ら submitted on 2019-11-27 11:45:47
Question: I am trying to use Caffe to implement the triplet loss described in Schroff, Kalenichenko and Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", 2015. I am new to this, so how do I calculate the gradient in back propagation? Answer 1: I assume you define the loss layer as layer { name: "tripletLoss" type: "TripletLoss" bottom: "anchor" bottom: "positive" bottom: "negative" ... } Now you need to compute a gradient w.r.t. each of the "bottom"s. The loss is given by: The gradient w.r
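The loss formula did not survive in the excerpt above, so the following is only a sketch under an assumption: it uses the squared-Euclidean triplet loss from the FaceNet paper, L = max(0, margin + ||a - p||² - ||a - n||²), and differentiates it with respect to each of the three inputs. The function name `triplet_loss_and_grads` and the default margin are made up for illustration and are not part of Caffe.

```python
def triplet_loss_and_grads(anchor, positive, negative, margin=0.2):
    """Return (loss, dL/d_anchor, dL/d_positive, dL/d_negative) for one triplet.

    Assumes L = max(0, margin + ||a - p||^2 - ||a - n||^2).
    """
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    loss = max(0.0, margin + d_ap - d_an)

    if loss == 0.0:
        # The hinge is inactive, so every gradient is zero.
        zeros = [0.0] * len(anchor)
        return loss, list(zeros), list(zeros), list(zeros)

    # d/da [(a-p)^2 - (a-n)^2] = 2(a-p) - 2(a-n) = 2(n-p)
    grad_anchor = [2.0 * (n - p) for p, n in zip(positive, negative)]
    # d/dp (a-p)^2 = 2(p-a)
    grad_positive = [2.0 * (p - a) for a, p in zip(anchor, positive)]
    # d/dn [-(a-n)^2] = 2(a-n)
    grad_negative = [2.0 * (a - n) for a, n in zip(anchor, negative)]
    return loss, grad_anchor, grad_positive, grad_negative


# Example triplet where the negative is closer to the anchor than the positive.
loss, g_a, g_p, g_n = triplet_loss_and_grads([0.0, 0.0], [1.0, 0.0], [0.5, 0.0])
```

In a Caffe loss layer, these three vectors would be what the backward pass writes into the diffs of the "anchor", "positive" and "negative" bottoms, scaled by the incoming top diff.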

Machine learning - Linear regression using batch gradient descent

冷暖自知 submitted on 2019-11-27 11:37:11
I am trying to implement batch gradient descent on a data set with a single feature and multiple training examples ( m ). When I use the normal equation I get the right answer, but the code below, which performs batch gradient descent in MATLAB, gives the wrong one. function [theta] = gradientDescent(X, y, theta, alpha, iterations) m = length(y); delta=zeros(2,1); for iter =1:1:iterations for i=1:1:m delta(1,1)= delta(1,1)+( X(i,:)*theta - y(i,1)) ; delta(2,1)=delta(2,1)+ (( X(i,:)*theta - y(i,1))*X(i,2)) ; end theta= theta-( delta*(alpha/m) ); computeCost(X,y,theta) end end y is the
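One thing visible in the MATLAB snippet is that delta is initialised once, before the iteration loop, so the sums from earlier iterations keep leaking into later updates. A minimal Python sketch of batch gradient descent that resets the accumulators on every pass might look like the following; the function name, the toy data and the step size are illustrative assumptions, not taken from the original question.

```python
def batch_gradient_descent(xs, ys, theta, alpha, iterations):
    """Fit y ~ theta0 + theta1 * x with batch gradient descent."""
    m = len(ys)
    theta0, theta1 = theta
    for _ in range(iterations):
        # Reset the gradient accumulators at the start of every iteration.
        delta0 = 0.0
        delta1 = 0.0
        for x, y in zip(xs, ys):
            error = theta0 + theta1 * x - y
            delta0 += error
            delta1 += error * x
        # Update both parameters simultaneously from the same gradient.
        theta0 -= alpha * delta0 / m
        theta1 -= alpha * delta1 / m
    return theta0, theta1


# Toy data generated from y = 1 + 2x, so the fit should approach (1, 2).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
theta0, theta1 = batch_gradient_descent(xs, ys, (0.0, 0.0), alpha=0.1, iterations=2000)
```

Whether the missing reset is the only problem in the original code cannot be confirmed from the truncated excerpt.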

What is `lr_policy` in Caffe?

安稳与你 submitted on 2019-11-27 11:07:34
I am trying to find out how to use Caffe. To do so, I took a look at the different .prototxt files in the examples folder. There is one option I don't understand: # The learning rate policy lr_policy: "inv" Possible values seem to be: "fixed" "inv" "step" "multistep" "stepearly" "poly" Could somebody please explain those options? w1res: If you look inside the /caffe-master/src/caffe/proto/caffe.proto file (you can find it online here) you will see the following descriptions: // The learning rate decay policy. The currently implemented learning rate // policies are as follows: // -
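As a rough illustration, several of these policies can be written as plain functions of the iteration number. The formulas below follow the comments in caffe.proto as best I recall them, so treat them as a sketch and check them against the file itself; `caffe_learning_rate` is a hypothetical helper, not a Caffe API, and "multistep" and "stepearly" are left out.

```python
def caffe_learning_rate(policy, iteration, base_lr, gamma=0.0, power=0.0,
                        stepsize=1, max_iter=1):
    """Learning rate at a given iteration for a few Caffe lr_policy values."""
    if policy == "fixed":
        # Always the base learning rate.
        return base_lr
    if policy == "step":
        # Multiply by gamma once every `stepsize` iterations.
        return base_lr * gamma ** (iteration // stepsize)
    if policy == "exp":
        # Multiply by gamma on every iteration.
        return base_lr * gamma ** iteration
    if policy == "inv":
        # Smooth inverse decay.
        return base_lr * (1.0 + gamma * iteration) ** (-power)
    if policy == "poly":
        # Polynomial decay that reaches zero at max_iter.
        return base_lr * (1.0 - iteration / max_iter) ** power
    raise ValueError("policy not covered by this sketch: " + policy)


# Example: "step" with gamma=0.1 and stepsize=100 has dropped twice by iteration 250.
lr_at_250 = caffe_learning_rate("step", 250, base_lr=0.01, gamma=0.1, stepsize=100)
```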

Why do we need to call zero_grad() in PyTorch?

ⅰ亾dé卋堺 submitted on 2019-11-27 10:16:55
Question: The method zero_grad() needs to be called during training. But the documentation is not very helpful: | zero_grad(self) | Sets gradients of all model parameters to zero. Why do we need to call this method? Answer 1: In PyTorch, we need to set the gradients to zero before starting to do backpropagation because PyTorch accumulates the gradients on subsequent backward passes. This is convenient while training RNNs. So, the default action is to accumulate (i.e. sum) the gradients on every loss
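The accumulation described in the answer can be observed directly. The following is a small sketch, assuming PyTorch is installed; the tensor values and the quadratic loss are arbitrary choices made only for the demonstration.

```python
import torch

# One parameter vector and an optimizer that holds a reference to it.
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: d/dw sum(w^2) = 2w.
loss = (w ** 2).sum()
loss.backward()
first_grad = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one.
loss = (w ** 2).sum()
loss.backward()
accumulated_grad = w.grad.clone()

# Clearing the gradients first gives a fresh gradient again.
optimizer.zero_grad()
loss = (w ** 2).sum()
loss.backward()
fresh_grad = w.grad.clone()
```

Here `accumulated_grad` is twice `first_grad`, while `fresh_grad` matches `first_grad`, which is why a typical training loop calls zero_grad() once per step.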

gradient descent seems to fail

不打扰是莪最后的温柔 submitted on 2019-11-27 09:32:36
Question: I implemented a gradient descent algorithm to minimize a cost function in order to obtain a hypothesis for determining whether an image has good quality. I did that in Octave. The idea is loosely based on the algorithm from the machine learning class by Andrew Ng. I have 880 values in "y" ranging from 0.5 to ~12, and 880 values from 50 to 300 in "X" that should predict the image's quality. Sadly the algorithm seems to fail: after some iterations the value for theta

Spark mllib predicting weird number or NaN

↘锁芯ラ submitted on 2019-11-27 09:09:48
I am new to Apache Spark and trying to use the machine learning library to predict some data. My dataset right now is only about 350 points. Here are 7 of those points: "365","4",41401.387,5330569 "364","3",51517.886,5946290 "363","2",55059.838,6097388 "362","1",43780.977,5304694 "361","7",46447.196,5471836 "360","6",50656.121,5849862 "359","5",44494.476,5460289 Here's my code: def parsePoint(line): split = map(sanitize, line.split(',')) rev = split.pop(-2) return LabeledPoint(rev, split) def sanitize(value): return float(value.strip('"')) parsedData = textFile.map(parsePoint) model =

pytorch - connection between loss.backward() and optimizer.step()

江枫思渺然 submitted on 2019-11-27 02:42:36
Question: Where is the explicit connection between the optimizer and the loss? How does the optimizer know where to get the gradients of the loss without a call like optimizer.step(loss)? -More context- When I minimize the loss, I didn't have to pass the gradients to the optimizer. loss.backward() # Back Propagation optimizer.step() # Gradient Descent Answer 1: Without delving too deep into the internals of PyTorch, I can offer a simplistic answer: Recall that when initializing optimizer you
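The link between the two calls is shared state: in PyTorch the optimizer is constructed with the parameters, backward() stores gradients on those same parameter objects, and step() reads them from there. The toy classes below mimic that arrangement in plain Python. They are a stand-in written for illustration (`ToyParam`, `ToySGD` and `backward_squared_error` are invented names), not PyTorch's actual implementation.

```python
class ToyParam:
    """A parameter that carries its own gradient slot, like tensor.grad."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Keeps references to the parameters it was constructed with."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def step(self):
        # No loss argument needed: the gradients already sit on the parameters.
        for p in self.params:
            p.value -= self.lr * p.grad


def backward_squared_error(w, x, y):
    """Plays the role of loss.backward() for loss = (w * x - y)^2."""
    error = w.value * x - y
    w.grad += 2.0 * error * x  # accumulate into the parameter's grad slot
    return error ** 2


w = ToyParam(0.0)
optimizer = ToySGD([w], lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    loss = backward_squared_error(w, x=1.0, y=3.0)
    optimizer.step()
```

After the loop, `w.value` is close to 3.0 even though the loss was never handed to the optimizer.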

Cost function training target versus accuracy desired goal

半城伤御伤魂 submitted on 2019-11-26 23:43:11
Question: When we train neural networks, we typically use gradient descent, which relies on a continuous, differentiable real-valued cost function. The final cost function might, for example, take the mean squared error. Or put another way, gradient descent implicitly assumes the end goal is regression - to minimize a real-valued error measure. Sometimes what we want a neural network to do is perform classification - given an input, classify it into two or more discrete categories. In this case, the

Tensorflow: How to write op with gradient in python?

谁说我不能喝 submitted on 2019-11-26 22:25:34
I would like to write a TensorFlow op in Python, but I would like it to be differentiable (to be able to compute a gradient). This question asks how to write an op in Python, and the answer suggests using py_func (which has no gradient): Tensorflow: Writing an Op in Python The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html In my case, I am prototyping so I don't care about whether it runs on GPU, and I don't care about it being usable from anything other than the TF Python API. patapouf_ai: Yes,