How is the concept of gradient in TensorFlow related to the mathematical definition of gradient?

Submitted by 余生长醉 on 2020-07-09 19:24:07

Question


The TensorFlow documentation explains the function

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)

saying:

  • [it] constructs symbolic derivatives of sum of ys w.r.t. x in xs.
  • ys and xs are each a Tensor or a list of tensors.
  • gradients() adds ops to the graph to output the derivatives of ys with respect to xs.
  • ys: A Tensor or list of tensors to be differentiated

I find it difficult to relate this to the mathematical definition of gradient. For example, according to Wikipedia, the gradient of a scalar function f(x1, x2, x3, ..., xn) is a vector field (i.e. a function grad f : R^n -> R^n) with certain properties involving the dot product of vectors. You can also speak about the gradient of f at a certain point: (grad f)(x1, x2, x3, ..., xn).
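Written out, the definition I have in mind is the usual textbook one (restated here only for concreteness):

\nabla f(x_1, \dots, x_n)
  = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right),
\qquad
\nabla f : \mathbb{R}^n \to \mathbb{R}^n,
\qquad
D_v f(x) = (\nabla f)(x) \cdot v \quad \text{for every direction } v \in \mathbb{R}^n .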

The TensorFlow documentation speaks about tensors instead of vectors: can the definition of gradient be generalized from functions that map vectors to scalars to functions that map tensors to scalars? Is there a dot product between tensors?

Even if the definition of gradient can be applied to functions f that map tensors to scalars (with the dot product in the definition working on tensors), the documentation speaks about differentiating tensors themselves: the parameter ys is a "Tensor or list of tensors to be differentiated". According to the documentation, "Tensor is a multi-dimensional array used for computation", so a tensor is not a function; how can it be differentiated?

So, how exactly is the concept of gradient in TensorFlow related to the definition from Wikipedia?


Answer 1:


One would expect the TensorFlow gradient to simply be the Jacobian, i.e. the derivative of a rank-m tensor Y with respect to a rank-n tensor X would be the rank-(m + n) tensor consisting of every individual derivative ∂Y_{j1...jm} / ∂X_{i1...in}.

However, you may notice that the gradient isn't actually a rank-(m + n) tensor; it always has the rank (and shape) of the tensor X. Indeed, it appears that TensorFlow gives you the gradient of the scalar sum(Y) with respect to X.

Of course, the real Jacobians are still used internally in the calculation when the chain rule is applied.
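Here is a minimal sketch illustrating that behaviour (assuming the TensorFlow 1.x-style graph API via tf.compat.v1; the particular tensors x and y are made up for illustration):

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])           # rank-2 tensor X, shape (2, 2)
y = tf.stack([x ** 2, 3.0 * x])                     # rank-3 tensor Y, shape (2, 2, 2)

grad = tf.gradients(y, x)[0]                        # what tf.gradients returns for Y w.r.t. X
grad_of_sum = tf.gradients(tf.reduce_sum(y), x)[0]  # gradient of the scalar sum(Y) w.r.t. X

with tf.Session() as sess:
    g1, g2 = sess.run([grad, grad_of_sum])

print(g1.shape)               # (2, 2): same shape as X, not rank m + n
print(np.allclose(g1, g2))    # True: tf.gradients(Y, X) == d(sum(Y))/dX

The result always has the shape of X, and it agrees with the gradient of the scalar tf.reduce_sum(y), which is exactly the "sum of ys" the documentation mentions.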



Source: https://stackoverflow.com/questions/54088548/how-it-the-concept-of-gradient-in-tensorflow-related-to-the-mathematical-definit
