问题
I want to calculate gradient tensors with respect to weight variables and bias term, separately. The gradient for weight variables is calculated correctly, But the gradient for bias is NOT computed well. Please, let me know what the problem is, or modify my code correctly.
import numpy as np
import tensorflow as tf
X =tf.constant([[1.0,0.1,-1.0],[2.0,0.2,-2.0],[3.0,0.3,-3.0],[4.0,0.4,-4.0],[5.0,0.5,-5.0]])
b1 = tf.Variable(-0.5)
Bb = tf.constant([ [1.0], [1.0], [1.0], [1.0], [1.0] ])
Bb = b1* Bb
Y0 = tf.constant([ [-10.0], [-5.0], [0.0], [5.0], [10.0] ])
W = tf.Variable([ [1.0], [1.0], [1.0] ])
with tf.GradientTape() as tape:
Y = tf.matmul(X, W) + Bb
print("Y : ", Y.numpy())
loss_val = tf.reduce_sum(tf.square(Y - Y0))
print("loss : ", loss_val.numpy())
gw = tape.gradient(loss_val, W) # gradient calculation works well
gb = tape.gradient(loss_val, b1) # does NOT work
print("gradient W : ", gw.numpy())
print("gradient b : ", gb.numpy())
回答1:
Two things. Firstly if you look at the docs here -
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape#args
you'll see that you can only make a single call to gradient
unless persistent=True
Secondly, you're setting Bb = b1* Bb
outside of the context manager for the tape so this op is not being recorded.
import numpy as np
import tensorflow as tf
X =tf.constant([[1.0,0.1,-1.0],[2.0,0.2,-2.0],[3.0,0.3,-3.0],[4.0,0.4,-4.0],[5.0,0.5,-5.0]])
b1 = tf.Variable(-0.5)
Bb = tf.constant([ [1.0], [1.0], [1.0], [1.0], [1.0] ])
Y0 = tf.constant([ [-10.0], [-5.0], [0.0], [5.0], [10.0] ])
W = tf.Variable([ [1.0], [1.0], [1.0] ])
with tf.GradientTape(persistent=True) as tape:
Bb = b1* Bb
Y = tf.matmul(X, W) + Bb
print("Y : ", Y.numpy())
loss_val = tf.reduce_sum(tf.square(Y - Y0))
print("loss : ", loss_val.numpy())
gw = tape.gradient(loss_val, W) # gradient calculation works well
gb = tape.gradient(loss_val, b1) # does NOT work
print("gradient W : ", gw.numpy())
print("gradient b : ", gb.numpy())
来源:https://stackoverflow.com/questions/57814376/gradient-calculation-for-bias-term-using-gradienttape