Gradient calculation for bias term using GradientTape()

Question

I want to compute the gradients with respect to the weight variables and the bias term separately. The gradient for the weights is computed correctly, but the gradient for the bias is not. Please let me know what the problem is, or how to correct my code.

import numpy as np
import tensorflow as tf

X =tf.constant([[1.0,0.1,-1.0],[2.0,0.2,-2.0],[3.0,0.3,-3.0],[4.0,0.4,-4.0],[5.0,0.5,-5.0]])
b1 = tf.Variable(-0.5)
Bb = tf.constant([ [1.0], [1.0], [1.0], [1.0], [1.0] ]) 
Bb = b1* Bb

Y0 = tf.constant([ [-10.0], [-5.0], [0.0], [5.0], [10.0] ])

W = tf.Variable([ [1.0], [1.0], [1.0] ])

with tf.GradientTape() as tape: 
    Y = tf.matmul(X, W) + Bb
    print("Y : ", Y.numpy())

    loss_val = tf.reduce_sum(tf.square(Y - Y0))  
    print("loss : ", loss_val.numpy())

gw = tape.gradient(loss_val, W)   # gradient calculation works well 
gb = tape.gradient(loss_val, b1)  # does NOT work

print("gradient W : ", gw.numpy())
print("gradient b : ", gb.numpy())

Answer 1:

Two things. First, if you look at the docs here:

https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape#args

you'll see that you can only make a single call to gradient() on a tape unless the tape was created with persistent=True.

Second, you're setting Bb = b1 * Bb outside the tape's context manager, so this op is not recorded and the tape has no path from loss_val back to b1.

import numpy as np
import tensorflow as tf

X =tf.constant([[1.0,0.1,-1.0],[2.0,0.2,-2.0],[3.0,0.3,-3.0],[4.0,0.4,-4.0],[5.0,0.5,-5.0]])
b1 = tf.Variable(-0.5)
Bb = tf.constant([ [1.0], [1.0], [1.0], [1.0], [1.0] ]) 


Y0 = tf.constant([ [-10.0], [-5.0], [0.0], [5.0], [10.0] ])

W = tf.Variable([ [1.0], [1.0], [1.0] ])

with tf.GradientTape(persistent=True) as tape: 
    Bb = b1* Bb
    Y = tf.matmul(X, W) + Bb
    print("Y : ", Y.numpy())

    loss_val = tf.reduce_sum(tf.square(Y - Y0))  
    print("loss : ", loss_val.numpy())

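gw = tape.gradient(loss_val, W)   # gradient w.r.t. the weights
gb = tape.gradient(loss_val, b1)  # now works: b1 * Bb was recorded, and the tape is persistent
del tape                          # release the resources held by the persistent tape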

print("gradient W : ", gw.numpy())
print("gradient b : ", gb.numpy())
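
As a side-by-side check of the second point, here is a minimal sketch that reuses the data from the question. The helper `loss_fn` and the names `ones`, `bias_outside`, `gw_outside`, `gb_outside`, `gw_inside` and `gb_inside` are made up for illustration and are not part of the original code. It also assumes you are happy to request both gradients in one gradient() call by passing a list of sources, in which case persistent=True should not be needed.

```python
import tensorflow as tf

X = tf.constant([[1.0, 0.1, -1.0],
                 [2.0, 0.2, -2.0],
                 [3.0, 0.3, -3.0],
                 [4.0, 0.4, -4.0],
                 [5.0, 0.5, -5.0]])
Y0 = tf.constant([[-10.0], [-5.0], [0.0], [5.0], [10.0]])
ones = tf.ones([5, 1])  # column of ones, same role as Bb in the question

W = tf.Variable([[1.0], [1.0], [1.0]])
b1 = tf.Variable(-0.5)


def loss_fn(bias_column):
    """Hypothetical helper: sum of squared errors for a given bias column."""
    Y = tf.matmul(X, W) + bias_column
    return tf.reduce_sum(tf.square(Y - Y0))


# Case 1: b1 is multiplied in BEFORE the tape is opened, as in the question.
# The tape never sees b1, so its gradient is expected to be None.
bias_outside = b1 * ones
with tf.GradientTape() as tape:
    loss_outside = loss_fn(bias_outside)
gw_outside, gb_outside = tape.gradient(loss_outside, [W, b1])

# Case 2: the same op is executed INSIDE the tape, so it is recorded.
# Both gradients come from a single gradient() call on a non-persistent tape.
with tf.GradientTape() as tape:
    loss_inside = loss_fn(b1 * ones)
gw_inside, gb_inside = tape.gradient(loss_inside, [W, b1])

print("gradient b (op outside tape): ", gb_outside)
print("gradient b (op inside tape):  ", gb_inside.numpy())
print("gradient W (op inside tape):  ", gw_inside.numpy())
```

Whether to use one list-valued call or a persistent tape is mostly a matter of taste here. The persistent tape is the more flexible option if you need several separate gradient() calls.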

Source: https://stackoverflow.com/questions/57814376/gradient-calculation-for-bias-term-using-gradienttape

Tags

tensorflow2.0