gradient-descent

PyTorch, what are the gradient arguments

倖福魔咒の submitted on 2019-11-29 18:40:56
I am reading through the PyTorch documentation and found an example where they write

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
print(x.grad)

where x was an initial variable from which y (a 3-vector) was constructed. The question is: what are the 0.1, 1.0 and 0.0001 arguments of the gradients tensor? The documentation is not very clear on that.

Answer 1: I can no longer find the original code on the PyTorch website.

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
print(x.grad)

The problem with the code above is that there is no function based on which to ...
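To illustrate the answer: the tensor passed to backward() is the vector v in the vector-Jacobian product Jᵀv, i.e. a per-component weight on y. A minimal sketch (x, y and the weights here are illustrative, reconstructing the snippet's shapes, not the documentation's exact code):

```python
import torch

# Illustrative reconstruction: x is a 3-vector, y is built from x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2  # y is non-scalar, so backward() needs a gradient argument

# The argument is the vector v in the vector-Jacobian product J^T v:
# each entry weights the corresponding component of y.
gradients = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(x.grad)  # dy_i/dx_i = 2, scaled elementwise: 2 * [0.1, 1.0, 0.0001]
```

Passing torch.ones_like(y) instead recovers the plain sum-of-outputs gradient, which is why a scalar loss needs no argument at all.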

Is my implementation of stochastic gradient descent correct?

和自甴很熟 submitted on 2019-11-29 01:00:02
Question: I am trying to implement stochastic gradient descent, but I don't know if it is 100% correct. The cost produced by my stochastic gradient descent algorithm is sometimes very far from the one produced by fminunc or batch gradient descent. While batch gradient descent converges when I set a learning rate alpha of 0.2, I am forced to set alpha to 0.0001 for my stochastic implementation to keep it from diverging. Is this normal? Here are some results I obtained with a training set ...
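For reference, a minimal stochastic gradient descent sketch for linear regression (names and data are illustrative, not the asker's code). Because each step follows a single noisy per-example gradient, SGD generally needs a smaller learning rate than batch gradient descent to stay stable, which matches the behavior described:

```python
import numpy as np

# Minimal SGD sketch for linear regression (illustrative names/data).
def sgd(X, y, alpha=0.01, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):              # shuffle every epoch
            grad = (X[i] @ theta - y[i]) * X[i]   # gradient of ONE example
            theta -= alpha * grad                 # noisy single-sample step
    return theta

# Noise-free data generated from theta = [2, 3]:
X = np.c_[np.ones(100), np.linspace(0, 1, 100)]
y = X @ np.array([2.0, 3.0])
print(sgd(X, y, alpha=0.1, epochs=200))  # close to [2, 3]
```

On noisy data the iterates hover around the optimum, so a decaying alpha (or averaging) is the usual fix rather than a tiny constant one.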

What's the triplet loss back propagation gradient formula?

随声附和 submitted on 2019-11-28 18:57:16
I am trying to use Caffe to implement the triplet loss described in Schroff, Kalenichenko and Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", 2015. I am new to this, so how do I calculate the gradient in back propagation?

Shai: I assume you define the loss layer as

layer { name: "tripletLoss" type: "TripletLoss" bottom: "anchor" bottom: "positive" bottom: "negative" ... }

Now you need to compute a gradient w.r.t. each of the "bottom"s. The loss is given by L = max(0, ||fa - fp||^2 - ||fa - fn||^2 + alpha). When L > 0, the gradient w.r.t. the "anchor" input (fa) is 2(fn - fp); the gradient w.r.t. the "positive" input (fp) is 2(fp - fa); the gradient w.r.t. the "negative" input (fn) is 2(fa - fn).
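The standard FaceNet gradient formulas can be sketched and checked numerically (a hedged NumPy illustration rather than an actual Caffe layer; all names are my own):

```python
import numpy as np

# Sketch of the triplet-loss gradients (alpha is the margin; names mine).
def triplet_loss_and_grads(fa, fp, fn, alpha=0.2):
    loss = np.sum((fa - fp) ** 2) - np.sum((fa - fn) ** 2) + alpha
    if loss <= 0:                       # margin satisfied: no gradient flows
        z = np.zeros_like(fa)
        return 0.0, z, z, z
    dfa = 2 * (fn - fp)                 # d loss / d anchor
    dfp = 2 * (fp - fa)                 # d loss / d positive
    dfn = 2 * (fa - fn)                 # d loss / d negative
    return loss, dfa, dfp, dfn

# A violating triplet: the negative sits closer to the anchor than the positive.
fa, fp, fn = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.1])
loss, dfa, dfp, dfn = triplet_loss_and_grads(fa, fp, fn)
print(loss, dfa)  # loss > 0, so all three gradients are nonzero
```

The max(0, ·) hinge is why satisfied triplets contribute nothing: backward must propagate zeros whenever the forward loss clamped to zero.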

Why do we need to explicitly call zero_grad()?

戏子无情 submitted on 2019-11-28 18:11:34
Why do we need to explicitly zero the gradients in PyTorch? Why can't gradients be zeroed when loss.backward() is called? What scenario is served by keeping the gradients on the graph and asking the user to explicitly zero them?

danche: We need to call zero_grad() explicitly because, after loss.backward() (when gradients are computed), we use optimizer.step() to perform the gradient descent update. More specifically, the gradients are not zeroed automatically because these two operations, loss.backward() and optimizer.step(), are separate, and optimizer.step() requires the just-computed ...
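The accumulate-until-zeroed behavior is easy to demonstrate directly on a single tensor; a minimal sketch:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)

# Gradients accumulate across backward() calls until explicitly zeroed:
(x * 2).backward()
print(x.grad)        # tensor(2.)
(x * 2).backward()
print(x.grad)        # tensor(4.) -- accumulated, not overwritten

x.grad.zero_()       # what optimizer.zero_grad() does for every parameter
(x * 2).backward()
print(x.grad)        # tensor(2.) again
```

The upside of this design is that accumulation is free: gradient accumulation over several mini-batches before one optimizer.step() needs no extra machinery.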

gradient descent seems to fail

喜夏-厌秋 submitted on 2019-11-28 16:06:48
I implemented a gradient descent algorithm in Octave to minimize a cost function, in order to obtain a hypothesis for determining whether an image has good quality. The idea is loosely based on the algorithm from Andrew Ng's machine learning class. I have 880 values in "y", ranging from 0.5 to ~12, and 880 values from 50 to 300 in "X" that should predict the image's quality. Sadly, the algorithm seems to fail: after some iterations theta0 and theta1 become "NaN", and my linear regression curve has strange values...
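A likely culprit is the raw feature range (50 to 300) without scaling, which makes gradient descent overshoot and blow up to NaN at ordinary learning rates. A sketch in Python (the question used Octave; the data here is synthetic, only mirroring the stated shapes) of mean-normalizing X before descending:

```python
import numpy as np

# Sketch: with raw features in [50, 300], plain gradient descent easily
# blows up to NaN; mean-normalizing X first lets a normal alpha work.
def gradient_descent(X, y, alpha=0.1, iters=500):
    m = len(y)
    Xn = (X - X.mean()) / X.std()       # feature scaling
    A = np.c_[np.ones(m), Xn]           # intercept column + scaled feature
    theta = np.zeros(2)
    for _ in range(iters):
        theta -= alpha / m * A.T @ (A @ theta - y)   # batch update
    return theta, Xn

X = np.linspace(50, 300, 880)           # synthetic, same shape as the question
y = 0.04 * X - 1.5                      # y roughly in the stated 0.5..12 range
theta, Xn = gradient_descent(X, y)
print(theta)                            # finite -- no NaN
```

Without the normalization line, the same alpha=0.1 overflows within a few iterations, reproducing the reported NaN symptom.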

Sklearn SGDClassifier partial fit

六月ゝ 毕业季﹏ submitted on 2019-11-28 14:53:27
Question: I'm trying to use SGD to classify a large dataset. Because the data is too large to fit into memory, I'd like to use the partial_fit method to train the classifier. I selected a sample of the dataset (100,000 rows) that fits into memory to test fit vs. partial_fit:

from sklearn.linear_model import SGDClassifier

def batches(l, n):
    for i in xrange(0, len(l), n):
        yield l[i:i+n]

clf1 = SGDClassifier(shuffle=True, loss='log')
clf1.fit(X, Y)
clf2 = SGDClassifier(shuffle=True, loss='log')
n_iter = ...

Cost function training target versus accuracy desired goal

烈酒焚心 submitted on 2019-11-28 14:41:50
When we train neural networks, we typically use gradient descent, which relies on a continuous, differentiable real-valued cost function. The final cost function might, for example, take the mean squared error. Or put another way, gradient descent implicitly assumes the end goal is regression - to minimize a real-valued error measure. Sometimes what we want a neural network to do is perform classification - given an input, classify it into two or more discrete categories. In this case, the end goal the user cares about is classification accuracy - the percentage of cases classified correctly.
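The distinction can be made concrete: accuracy is a step function of the model's score, so its gradient is zero almost everywhere, whereas a smooth surrogate such as cross-entropy gives a usable gradient at every score. A small illustrative sketch (names and numbers are mine):

```python
import numpy as np

# Sketch: accuracy is a step function of the score (gradient 0 almost
# everywhere), while cross-entropy is a smooth surrogate with a usable
# gradient everywhere.
def accuracy(score, label):            # 0/1 correctness: non-differentiable
    return float((score > 0) == bool(label))

def cross_entropy(score, label):       # smooth surrogate loss
    p = 1 / (1 + np.exp(-score))       # sigmoid turns score into probability
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Its gradient w.r.t. the score is simply (p - label):
score, label = -0.5, 1
p = 1 / (1 + np.exp(-score))
print(p - label)   # negative -> push the score up, toward the correct class
```

This is why training minimizes the surrogate while evaluation reports accuracy: the two usually, but not always, move together.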

How to calculate optimal batch size

大城市里の小女人 submitted on 2019-11-28 06:34:28
Sometimes I run into a problem: OOM when allocating tensor with shape, e.g. OOM when allocating tensor with shape (1024, 100, 160), where 1024 is my batch size and I don't know what the rest is. If I reduce the batch size or the number of neurons in the model, it runs fine. Is there a generic way to calculate the optimal batch size based on the model and GPU memory, so the program doesn't crash? EDIT: Since my question might seem unclear, let me put it this way: I want the largest batch size possible in terms of my model that will fit into my GPU memory and won't crash the program. EDIT 2: To whoever ...
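There is no exact formula, but a rough upper bound follows from dividing free GPU memory by the per-sample footprint. All numbers below are illustrative assumptions, not values from the question; weights, gradients, optimizer state and workspace also consume memory, hence the safety factor:

```python
# Rough sketch: an upper bound on batch size from free GPU memory divided
# by the per-sample activation footprint (illustrative numbers only).
def max_batch_size(free_mem_bytes, per_sample_bytes, safety=0.8):
    """Largest batch that fits, leaving headroom for workspace/fragmentation."""
    return int(free_mem_bytes * safety // per_sample_bytes)

# The failing tensor (1024, 100, 160) in float32: one sample is 100*160 floats.
per_sample = 100 * 160 * 4          # bytes
free = 2 * 1024**3                  # assume ~2 GB free on the GPU
print(max_batch_size(free, per_sample))
```

In practice the reliable approach is empirical: double the batch size until OOM, then back off, since the per-sample footprint spans every intermediate activation, not just one tensor.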

Fast gradient-descent implementation in a C++ library?

♀尐吖头ヾ submitted on 2019-11-27 14:05:19
Question: I'm looking to run a gradient descent optimization to minimize the cost of an instantiation of variables. My program is very computationally expensive, so I'm looking for a popular library with a fast implementation of GD. What is the recommended library/reference?

Answer 1: GSL is a great (and free) library that already implements common functions of mathematical and scientific interest. You can peruse the entire reference manual online. Poking around, this starts to look interesting, but ...