Tensorflow gradient and hessian evaluation

Submitted by 99封情书 on 2019-12-10 10:55:27

Question


I have found a problem with the evaluation of the TensorFlow r1.2 gradients and Hessian functions. In particular, I take for granted that a gradient is evaluated numerically at the current values of the defined variables, probing the response of the placeholder-fed function.

However, when I now try to evaluate the Hessian (and therefore gradients) before and after training the model, I always get the same results (probably depending on the fed placeholders).

I use the following function,

def eval_Consts(sess):
  # assigns the trained values to "parking" variables
  a_v_fin, a_s_fin, a_C_fin, a_a_fin, a_p_fin, loss_fin = sess.run(
      [a_v, a_s, a_C, a_a, a_p, loss],
      feed_dict={A: A_train, Z: Z_train, y: BE_train})
  print a_v_fin, loss_fin

  hess = tf.hessians(loss, [a_v, a_s, a_C, a_a, a_p])

  grad = tf.gradients(loss, a_v)
  dGra0 = tf.gradients(grad[0], a_v)  # second derivative w.r.t. a_v

  feed = {A: A_train, Z: Z_train, y: BE_train}
  print '\n', sess.run(a_v, feed_dict=feed)
  print '\n', sess.run(hess, feed_dict=feed)
  print '\n', sess.run(dGra0, feed_dict=feed)

to evaluate the output and calculate the gradients once before and once after training. Note that the computation and printing of a_v and loss happen inside the function.

In the output, before training a_v = 20.20000076 and the loss function gives loss = 1.92866e+09, while after training a_v = 16.8217 and loss = 148206.0.

However, the second derivative with respect to a_v, evaluated as above, is the same in both cases: 1.52536784e+08.
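As an aside that may or may not apply here (a NumPy finite-difference sketch using a made-up linear least-squares model, not the model from the question): when a loss is quadratic in a parameter, its second derivative with respect to that parameter is constant, so identical values before and after training would be expected for such a parameter.

    import numpy as np

    def loss(a, x, y):
        # toy least-squares loss for a linear model y ~ a * x
        # (hypothetical stand-in for the question's actual loss)
        return np.sum((a * x - y) ** 2)

    def second_derivative(f, a, eps=1e-3):
        # central finite difference for d^2 f / d a^2
        return (f(a + eps) - 2.0 * f(a) + f(a - eps)) / eps ** 2

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.5, 2.9, 4.1])
    f = lambda a: loss(a, x, y)

    # analytically d^2 loss / d a^2 = 2 * sum(x**2) = 28, independent of a
    print(second_derivative(f, 20.2))  # at the "before training" value
    print(second_derivative(f, 16.8))  # at the "after training" value

Whether this explains the observation depends on how the question's model actually depends on a_v, which is not shown.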

Moreover, the result of printing the Hessian is the following,

[array([[  1.52536784e+08]], dtype=float32), array([[ 4804347.]], dtype=float32), array([[  4.80967168e+09]], dtype=float32), array([[ 226923.421875]], dtype=float32), array([[ 41.58702087]], dtype=float32)]

In other words, this is not a Hessian at all (a Hessian is the matrix of all possible second derivatives, including the cross derivatives), but only the diagonal part of the Hessian matrix: since each variable here is a scalar, each returned 1x1 block is a single diagonal entry.
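To make the distinction concrete, here is a NumPy sketch (using a toy two-variable function, not the question's loss) that computes the full Hessian, including the cross derivatives, by central finite differences:

    import numpy as np

    def numerical_hessian(f, x, eps=1e-4):
        # full Hessian of f at x via central finite differences,
        # including the mixed (cross) second derivatives
        n = len(x)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                e_i = np.zeros(n); e_i[i] = eps
                e_j = np.zeros(n); e_j[j] = eps
                H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                           - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps ** 2)
        return H

    # toy function: f(x0, x1) = x0^2 * x1 + x1^2
    f = lambda x: x[0] ** 2 * x[1] + x[1] ** 2

    H = numerical_hessian(f, np.array([1.0, 2.0]))
    # analytically: [[2*x1, 2*x0], [2*x0, 2]] = [[4, 2], [2, 2]]
    print(H)

By contrast, tf.hessians(loss, [v1, v2, ...]) returns a list with one Hessian per variable in the list, i.e. only the diagonal blocks of this full matrix.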

Why is that?

Source: https://stackoverflow.com/questions/44725228/tensorflow-gradient-and-hessian-evaluation
