How is the concept of gradient in TensorFlow related to the mathematical definition of gradient?

Submitted by 余生长醉 on 2020-07-09 19:24:07

Question


The TensorFlow documentation explains the function

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)

saying:

  • [it] constructs symbolic derivatives of sum of ys w.r.t. x in xs.
  • ys and xs are each a Tensor or a list of tensors.
  • gradients() adds ops to the graph to output the derivatives of ys with respect to xs.
  • ys: A Tensor or list of tensors to be differentiated

I find it difficult to relate this to the mathematical definition of gradient. For example, according to Wikipedia, the gradient of a scalar function f(x1, x2, x3, ..., xn) is a vector field (i.e. a function grad f : R^n -> R^n) with certain properties involving the dot product of vectors. You can also speak about the gradient of f at a certain point: (grad f)(x1, x2, x3, ..., xn).
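Written out, the definition I have in mind is the usual textbook one (restated here only for concreteness):

\nabla f(x_1, \dots, x_n)
  = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right),
\qquad
\nabla f : \mathbb{R}^n \to \mathbb{R}^n,
\qquad
D_v f(x) = (\nabla f)(x) \cdot v \quad \text{for every direction } v \in \mathbb{R}^n .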

The TensorFlow documentation speaks about tensors instead of vectors: can the definition of gradient be generalized from functions that map vectors to scalars to functions that map tensors to scalars? Is there a dot product between tensors?

Even if the definition of gradient can be applied to functions f that map tensors to scalars (with the dot product in the definition working on tensors), the documentation speaks about differentiating tensors themselves: the parameter ys is a "Tensor or list of tensors to be differentiated". According to the documentation, "Tensor is a multi-dimensional array used for computation", so a tensor is not a function; how can it be differentiated?

So, how exactly is the concept of gradient in TensorFlow related to the definition from Wikipedia?


Answer 1:


One would expect the TensorFlow gradient to simply be the Jacobian, i.e. the derivative of a rank-m tensor Y with respect to a rank-n tensor X would be the rank-(m + n) tensor consisting of every individual derivative ∂Y_{j1...jm} / ∂X_{i1...in}.

However, you may notice that the gradient isn't actually a rank-(m + n) tensor; it always has the rank (and shape) of the tensor X. Indeed, it appears that TensorFlow gives you the gradient of the scalar sum(Y) with respect to X.

Of course, the real Jacobians are still used internally in the calculation when the chain rule is applied.
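Here is a minimal sketch illustrating that behaviour (assuming the TensorFlow 1.x-style graph API via tf.compat.v1; the particular tensors x and y are made up for illustration):

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])           # rank-2 tensor X, shape (2, 2)
y = tf.stack([x ** 2, 3.0 * x])                     # rank-3 tensor Y, shape (2, 2, 2)

grad = tf.gradients(y, x)[0]                        # what tf.gradients returns for Y w.r.t. X
grad_of_sum = tf.gradients(tf.reduce_sum(y), x)[0]  # gradient of the scalar sum(Y) w.r.t. X

with tf.Session() as sess:
    g1, g2 = sess.run([grad, grad_of_sum])

print(g1.shape)               # (2, 2): same shape as X, not rank m + n
print(np.allclose(g1, g2))    # True: tf.gradients(Y, X) == d(sum(Y))/dX

The result always has the shape of X, and it agrees with the gradient of the scalar tf.reduce_sum(y), which is exactly the "sum of ys" the documentation mentions.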



Source: https://stackoverflow.com/questions/54088548/how-it-the-concept-of-gradient-in-tensorflow-related-to-the-mathematical-definit
