I have run into a problem with the evaluation of TensorFlow r1.2's gradients and Hessian functions. In particular, I take for granted that a gradient is evaluated numerically at the current values of the defined variables, probing the model's response through the fed placeholders.
However, when I evaluate the Hessian (and thus the gradients) before and after training the model, I always get the same results (presumably depending only on the fed placeholders).
I use the following function:
    def eval_Consts(sess):
        # assign the current values of the fitted variables and the loss
        a_v_fin, a_s_fin, a_C_fin, a_a_fin, a_p_fin, loss_fin = sess.run(
            [a_v, a_s, a_C, a_a, a_p, loss],
            {A: A_train, Z: Z_train, y: BE_train})
        print a_v_fin, loss_fin

        hess = tf.hessians(loss, [a_v, a_s, a_C, a_a, a_p])
        grad = tf.gradients(loss, a_v)
        dGra0 = tf.gradients(grad[0], a_v)  # second derivative w.r.t. a_v

        print '\n', sess.run(a_v, feed_dict={A: A_train, Z: Z_train, y: BE_train})
        print '\n', sess.run(hess, feed_dict={A: A_train, Z: Z_train, y: BE_train})
        print '\n', sess.run(dGra0, feed_dict={A: A_train, Z: Z_train, y: BE_train})
which I call once before and once after training to evaluate the output and compute the gradients. Note that the calculation and printing of a_v and loss are done within the function.
Before training the output is a_v = 20.20000076 and loss = 1.92866e+09, while after training a_v = 16.8217 and loss = 148206.0.
However, the second derivative with respect to a_v, evaluated as above, gives the same value in both cases: 1.52536784e+08.
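(As a side note, an unchanging second derivative is not by itself proof that nothing was evaluated: if the loss happens to be quadratic in that parameter, its second derivative is a constant, independent of the parameter's current value. A minimal NumPy sketch, using a toy quadratic loss that is not the original model, illustrates this:)

```python
import numpy as np

def second_derivative(f, a, h=1e-3):
    """Central finite-difference estimate of f''(a)."""
    return (f(a + h) - 2.0 * f(a) + f(a - h)) / h**2

# Toy loss, quadratic in the parameter a: L(a) = c*a^2 + b*a
c, b = 3.0, -1.5
loss = lambda a: c * a**2 + b * a

before = second_derivative(loss, 20.2)  # at the "pre-training" value
after = second_derivative(loss, 16.8)   # at the "post-training" value
# Both estimates equal 2*c: for a quadratic loss the second derivative
# does not depend on the point of evaluation.
```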
Moreover, printing the Hessian gives the following:
[array([[ 1.52536784e+08]], dtype=float32), array([[ 4804347.]], dtype=float32), array([[ 4.80967168e+09]], dtype=float32), array([[ 226923.421875]], dtype=float32), array([[ 41.58702087]], dtype=float32)]
In other words, this is not a Hessian at all (which would be the matrix of all possible second derivatives, including cross derivatives), but only the diagonal part of the Hessian matrix.
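(To make the distinction concrete, here is a NumPy finite-difference sketch, not using TensorFlow, of a toy two-parameter loss: the full Hessian contains the cross term d²L/dxdy, whereas a per-variable view, like the list of 1x1 blocks printed above, reports only the diagonal entries.)

```python
import numpy as np

def hessian_fd(f, p, h=1e-4):
    """Full finite-difference Hessian of scalar f at point p (1-D array)."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp = p.copy(); pp[i] += h; pp[j] += h
            pm = p.copy(); pm[i] += h; pm[j] -= h
            mp = p.copy(); mp[i] -= h; mp[j] += h
            mm = p.copy(); mm[i] -= h; mm[j] -= h
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4.0 * h**2)
    return H

# Toy loss with a genuine cross term: L = x^2 + 3*x*y + 2*y^2
loss = lambda p: p[0]**2 + 3.0 * p[0] * p[1] + 2.0 * p[1]**2

H = hessian_fd(loss, np.array([1.0, -2.0]))
# Full Hessian is [[2, 3], [3, 4]]; a per-variable (diagonal) view would
# only report the entries 2 and 4 and drop the off-diagonal 3's entirely.
```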
Why is that?
Source: https://stackoverflow.com/questions/44725228/tensorflow-gradient-and-hessian-evaluation