gradient-descent

TensorFlow: How to write an op with a gradient in Python?

余生颓废 submitted on 2019-11-26 07:38:03
Question: I would like to write a TensorFlow op in Python, but I would like it to be differentiable (so that a gradient can be computed through it). This question asks how to write an op in Python, and the answer suggests using py_func (which has no gradient): Tensorflow: Writing an Op in Python. The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html. In my case I am prototyping, so I don't care whether it runs on …
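One common workaround in the TF 1.x API the excerpt refers to is to wrap the numpy function with tf.py_func, register a Python gradient under a fresh name with tf.RegisterGradient, and point the PyFunc op at it via gradient_override_map. The sketch below illustrates this; the helper name py_func_with_grad and the square example are illustrative, not taken from the question:

import numpy as np
import tensorflow as tf

def py_func_with_grad(func, inp, Tout, grad, name=None):
    # tf.py_func itself has no gradient, so register the supplied
    # Python gradient under a unique name and override PyFunc's
    # gradient with it inside a gradient_override_map.
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1 << 30))
    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({'PyFunc': rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=True, name=name)

def my_square(x):
    # forward pass runs as plain numpy
    return np.square(x).astype(np.float32)

def my_square_grad(op, grad):
    # d(x^2)/dx = 2x, expressed with ordinary TF ops
    return 2.0 * op.inputs[0] * grad

x = tf.constant([1.0, 2.0, 3.0])
y = py_func_with_grad(my_square, [x], [tf.float32], my_square_grad)[0]
dy_dx = tf.gradients(y, x)[0]

with tf.Session() as sess:
    print(sess.run([y, dy_dx]))  # -> [array([1., 4., 9.]), array([2., 4., 6.])]

Note that the gradient function receives the op (so it can reach op.inputs) plus the incoming gradient, and must return one tensor per input.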

Common causes of NaNs during training

时光毁灭记忆、已成空白 submitted on 2019-11-26 04:32:02
Question: I've noticed that NaNs are frequently introduced during training. Often they seem to be introduced by weights in inner-product/fully-connected or convolution layers blowing up. Is this occurring because the gradient computation is blowing up? Or is it because of weight initialization (and if so, why does weight initialization have this effect)? Or is it likely caused by the nature of the input data? The overarching question here is simply: what is the most common reason for …
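The excerpt is cut off, but one of the most frequently cited answers to this question is a learning rate that is too high. A toy numpy sketch (illustrative, not from the question) shows the mechanism: each update overshoots the minimum, the weight magnitude grows geometrically, overflows float32 to inf, and the next update computes inf - inf, which is NaN:

import numpy as np

# Minimize f(w) = w**2 with plain gradient descent. The step is
# w <- w - lr * 2w = (1 - 2*lr) * w, which diverges whenever lr > 1.
w = np.float32(1.0)
lr = np.float32(2.0)  # deliberately too large: each step multiplies w by -3
for step in range(100):
    grad = np.float32(2.0) * w      # df/dw
    w = w - lr * grad
    if step % 20 == 0 or not np.isfinite(w):
        print(step, w)
        if np.isnan(w):
            break
# |w| grows as 3**step, overflows float32 to inf around step 80,
# and the following update yields NaN.

The same arithmetic plays out inside a fully-connected or convolution layer when the effective step size is too large for the loss surface, which is why the blow-up is typically visible in the weights first.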

How to interpret a Caffe log with debug_info?

筅森魡賤 submitted on 2019-11-26 04:27:08
Question: When facing difficulties during training (NaNs, loss that does not converge, etc.), it is sometimes useful to look at a more verbose training log by setting debug_info: true in the 'solver.prototxt' file. The training log then looks something like:

I1109 ...] [Forward] Layer data, top blob data data: 0.343971
I1109 ...] [Forward] Layer conv1, top blob conv1 data: 0.0645037
I1109 ...] [Forward] Layer conv1, param blob 0 data: 0.00899114
I1109 ...] [Forward] Layer conv1, param blob 1 data: 0
I1109 …
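Each of these lines reports the mean absolute magnitude of a blob, so scanning them for values that explode or collapse to zero between layers is a quick way to locate trouble. Below is a hypothetical parsing helper (not from the question) that assumes the line format shown in the excerpt; 'train.log' is a placeholder path:

import re

pattern = re.compile(
    r'\[(Forward|Backward)\] Layer (\S+), (?:top|bottom|param) blob (\S+) data: ([-+\d.eE]+)'
)

def scan_log(path):
    # yields (phase, layer, blob, magnitude) for each debug_info data line
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                phase, layer, blob, mag = m.groups()
                yield phase, layer, blob, float(mag)

# Example: flag suspiciously large or dead activations.
# for phase, layer, blob, mag in scan_log('train.log'):
#     if mag > 1e3 or mag == 0:
#         print('%s: blob %s in layer %s has magnitude %g' % (phase, blob, layer, mag))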

Gradient descent using Python and NumPy

守給你的承諾、 submitted on 2019-11-26 04:05:47
Question:

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        # temp[j] = theta[j] - (alpha/m) * np.sum((h - y) * X_norm[:, j][np.newaxis, :])
        temp[0] = theta[0] - (alpha / m) * np.sum(h - y)
        temp[1] = theta[1] - (alpha / m) * np.sum((h - y) * X_norm[:, 1])
        theta = temp
    return theta

X_norm, mean, std = featureScale(X)
# length of X (number of rows)
m = len(X)
X_norm = np.array([np.ones(m), X_norm])
n, m = np.shape(X_norm)
num_it = 1500
alpha = 0.01
theta = np…
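For reference, the per-parameter updates in the excerpt can be vectorized so that every component of theta is updated in a single expression. A sketch follows; the names X and y and the shapes (m, n) and (m,) are assumptions, since the question is truncated:

import numpy as np

def gradient_descent(X, y, theta, alpha, num_it):
    # Batch gradient descent for linear regression, updating all
    # parameters at once: theta <- theta - (alpha/m) * X^T (X theta - y)
    m = len(y)
    for _ in range(num_it):
        h = X.dot(theta)                          # predictions, shape (m,)
        theta = theta - (alpha / m) * X.T.dot(h - y)
    return theta

# X is expected as an (m, n) matrix with a leading column of ones,
# y as (m,), theta as (n,); for example:
# X = np.column_stack([np.ones(m), X_norm])
# theta = gradient_descent(X, y, np.zeros(2), alpha=0.01, num_it=1500)

One thing to watch in the original excerpt: np.array([np.ones(m), X_norm]) stacks the features as rows, giving shape (2, m), so np.dot(X_norm, theta) needs the transpose of that array to have the (m, n) orientation the update assumes.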