gradient-descent

TensorFlow: How to write an op with a gradient in Python?

余生颓废 submitted on 2019-11-26 07:38:03
Question: I would like to write a TensorFlow op in Python, but I would like it to be differentiable (so that a gradient can be computed through it). This question asks how to write an op in Python, and the answer suggests using py_func (which has no gradient): Tensorflow: Writing an Op in Python. The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html. In my case I am prototyping, so I don't care whether it runs on …
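One common workaround in the TF 1.x API the excerpt refers to is to wrap the numpy function with tf.py_func, register a Python gradient under a fresh name with tf.RegisterGradient, and point the PyFunc op at it via gradient_override_map. The sketch below illustrates this; the helper name py_func_with_grad and the square example are illustrative, not taken from the question:

import numpy as np
import tensorflow as tf

def py_func_with_grad(func, inp, Tout, grad, name=None):
    # tf.py_func itself has no gradient, so register the supplied
    # Python gradient under a unique name and override PyFunc's
    # gradient with it inside a gradient_override_map.
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1 << 30))
    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({'PyFunc': rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=True, name=name)

def my_square(x):
    # forward pass runs as plain numpy
    return np.square(x).astype(np.float32)

def my_square_grad(op, grad):
    # d(x^2)/dx = 2x, expressed with ordinary TF ops
    return 2.0 * op.inputs[0] * grad

x = tf.constant([1.0, 2.0, 3.0])
y = py_func_with_grad(my_square, [x], [tf.float32], my_square_grad)[0]
dy_dx = tf.gradients(y, x)[0]

with tf.Session() as sess:
    print(sess.run([y, dy_dx]))  # -> [array([1., 4., 9.]), array([2., 4., 6.])]

Note that the gradient function receives the op (so it can reach op.inputs) plus the incoming gradient, and must return one tensor per input.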

Common causes of NaNs during training

时光毁灭记忆、已成空白 submitted on 2019-11-26 04:32:02
Question: I've noticed that NaNs are frequently introduced during training. Often they seem to be introduced by weights in inner-product/fully-connected or convolution layers blowing up. Is this occurring because the gradient computation is blowing up? Or is it because of weight initialization (and if so, why does weight initialization have this effect)? Or is it likely caused by the nature of the input data? The overarching question here is simply: what is the most common reason for …
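The excerpt is cut off, but one of the most frequently cited answers to this question is a learning rate that is too high. A toy numpy sketch (illustrative, not from the question) shows the mechanism: each update overshoots the minimum, the weight magnitude grows geometrically, overflows float32 to inf, and the next update computes inf - inf, which is NaN:

import numpy as np

# Minimize f(w) = w**2 with plain gradient descent. The step is
# w <- w - lr * 2w = (1 - 2*lr) * w, which diverges whenever lr > 1.
w = np.float32(1.0)
lr = np.float32(2.0)  # deliberately too large: each step multiplies w by -3
for step in range(100):
    grad = np.float32(2.0) * w      # df/dw
    w = w - lr * grad
    if step % 20 == 0 or not np.isfinite(w):
        print(step, w)
        if np.isnan(w):
            break
# |w| grows as 3**step, overflows float32 to inf around step 80,
# and the following update yields NaN.

The same arithmetic plays out inside a fully-connected or convolution layer when the effective step size is too large for the loss surface, which is why the blow-up is typically visible in the weights first.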

How to interpret a Caffe log with debug_info?

筅森魡賤 submitted on 2019-11-26 04:27:08
Question: When facing difficulties during training (NaNs, loss that does not converge, etc.), it is sometimes useful to look at a more verbose training log by setting debug_info: true in the 'solver.prototxt' file. The training log then looks something like:

I1109 ...] [Forward] Layer data, top blob data data: 0.343971
I1109 ...] [Forward] Layer conv1, top blob conv1 data: 0.0645037
I1109 ...] [Forward] Layer conv1, param blob 0 data: 0.00899114
I1109 ...] [Forward] Layer conv1, param blob 1 data: 0
I1109 …
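Each of these lines reports the mean absolute magnitude of a blob, so scanning them for values that explode or collapse to zero between layers is a quick way to locate trouble. Below is a hypothetical parsing helper (not from the question) that assumes the line format shown in the excerpt; 'train.log' is a placeholder path:

import re

pattern = re.compile(
    r'\[(Forward|Backward)\] Layer (\S+), (?:top|bottom|param) blob (\S+) data: ([-+\d.eE]+)'
)

def scan_log(path):
    # yields (phase, layer, blob, magnitude) for each debug_info data line
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                phase, layer, blob, mag = m.groups()
                yield phase, layer, blob, float(mag)

# Example: flag suspiciously large or dead activations.
# for phase, layer, blob, mag in scan_log('train.log'):
#     if mag > 1e3 or mag == 0:
#         print('%s: blob %s in layer %s has magnitude %g' % (phase, blob, layer, mag))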

Gradient descent using Python and NumPy

守給你的承諾、 submitted on 2019-11-26 04:05:47
Question:

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        # temp[j] = theta[j] - (alpha/m) * np.sum((h - y) * X_norm[:, j][np.newaxis, :])
        temp[0] = theta[0] - (alpha / m) * np.sum(h - y)
        temp[1] = theta[1] - (alpha / m) * np.sum((h - y) * X_norm[:, 1])
        theta = temp
    return theta

X_norm, mean, std = featureScale(X)
# length of X (number of rows)
m = len(X)
X_norm = np.array([np.ones(m), X_norm])
n, m = np.shape(X_norm)
num_it = 1500
alpha = 0.01
theta = np…
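For reference, the per-parameter updates in the excerpt can be vectorized so that every component of theta is updated in a single expression. A sketch follows; the names X and y and the shapes (m, n) and (m,) are assumptions, since the question is truncated:

import numpy as np

def gradient_descent(X, y, theta, alpha, num_it):
    # Batch gradient descent for linear regression, updating all
    # parameters at once: theta <- theta - (alpha/m) * X^T (X theta - y)
    m = len(y)
    for _ in range(num_it):
        h = X.dot(theta)                          # predictions, shape (m,)
        theta = theta - (alpha / m) * X.T.dot(h - y)
    return theta

# X is expected as an (m, n) matrix with a leading column of ones,
# y as (m,), theta as (n,); for example:
# X = np.column_stack([np.ones(m), X_norm])
# theta = gradient_descent(X, y, np.zeros(2), alpha=0.01, num_it=1500)

One thing to watch in the original excerpt: np.array([np.ones(m), X_norm]) stacks the features as rows, giving shape (2, m), so np.dot(X_norm, theta) needs the transpose of that array to have the (m, n) orientation the update assumes.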