Gradient calculation for bias term using GradientTape()

Submitted by 情到浓时终转凉″ on 2020-01-25 07:50:06

Question


I want to calculate the gradient tensors with respect to the weight variables and the bias term separately. The gradient for the weight variables is calculated correctly, but the gradient for the bias is not. Please let me know what the problem is, or correct my code.

import numpy as np
import tensorflow as tf

X =tf.constant([[1.0,0.1,-1.0],[2.0,0.2,-2.0],[3.0,0.3,-3.0],[4.0,0.4,-4.0],[5.0,0.5,-5.0]])
b1 = tf.Variable(-0.5)
Bb = tf.constant([ [1.0], [1.0], [1.0], [1.0], [1.0] ]) 
Bb = b1* Bb

Y0 = tf.constant([ [-10.0], [-5.0], [0.0], [5.0], [10.0] ])

W = tf.Variable([ [1.0], [1.0], [1.0] ])

with tf.GradientTape() as tape: 
    Y = tf.matmul(X, W) + Bb
    print("Y : ", Y.numpy())

    loss_val = tf.reduce_sum(tf.square(Y - Y0))  
    print("loss : ", loss_val.numpy())

gw = tape.gradient(loss_val, W)   # gradient calculation works well 
gb = tape.gradient(loss_val, b1)  # does NOT work

print("gradient W : ", gw.numpy())
print("gradient b : ", gb.numpy())

Answer 1:


Two things. First, the docs for GradientTape:

https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape#args

say that you can call gradient() only once on a tape unless it was created with persistent=True.

Second, you compute Bb = b1 * Bb outside the tape's context manager, so that op is not recorded and the tape has no path from loss_val back to b1. Move the multiplication inside the with block.
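To see the two failure modes in isolation, here is a minimal sketch on a smaller, made-up problem. The names x, w, b and b_col and their values are illustrative assumptions and are not taken from the question. It assumes TensorFlow 2.x with eager execution.

```python
import tensorflow as tf

# Hypothetical toy data, chosen only to keep the example small.
x = tf.constant([[1.0], [2.0]])
w = tf.Variable([[3.0]])
b = tf.Variable(-0.5)

# Failure mode 1: the op that reads b runs before the tape starts recording,
# so the tape sees b_col as a plain tensor with no link back to b.
b_col = b * tf.ones([2, 1])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(tf.matmul(x, w) + b_col))
grad_b_outside = tape.gradient(loss, b)
print("gradient w.r.t. b (op outside tape):", grad_b_outside)

# Failure mode 2: a second gradient() call on a non-persistent tape.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(tf.matmul(x, w) + b * tf.ones([2, 1])))
grad_w = tape.gradient(loss, w)  # first call is fine
try:
    tape.gradient(loss, b)       # second call on the same tape
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("second call failed:", err)
```

In the first case the gradient comes back as None instead of raising, which is why the later .numpy() call fails. The corrected listing from the answer follows.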

import numpy as np
import tensorflow as tf

X =tf.constant([[1.0,0.1,-1.0],[2.0,0.2,-2.0],[3.0,0.3,-3.0],[4.0,0.4,-4.0],[5.0,0.5,-5.0]])
b1 = tf.Variable(-0.5)
Bb = tf.constant([ [1.0], [1.0], [1.0], [1.0], [1.0] ]) 


Y0 = tf.constant([ [-10.0], [-5.0], [0.0], [5.0], [10.0] ])

W = tf.Variable([ [1.0], [1.0], [1.0] ])

with tf.GradientTape(persistent=True) as tape: 
    Bb = b1* Bb
    Y = tf.matmul(X, W) + Bb
    print("Y : ", Y.numpy())

    loss_val = tf.reduce_sum(tf.square(Y - Y0))  
    print("loss : ", loss_val.numpy())

gw = tape.gradient(loss_val, W)   # gradient w.r.t. the weights
gb = tape.gradient(loss_val, b1)  # now works: b1 * Bb was recorded and the tape is persistent

print("gradient W : ", gw.numpy())
print("gradient b : ", gb.numpy())
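As an alternative that is not part of the original answer, both gradients can be requested in one gradient() call by passing a list of sources, which should make persistent=True unnecessary. The sketch below also adds the scalar b1 directly and relies on broadcasting instead of the column of ones, then compares against a hand-derived gradient. The names residual, gw_manual and gb_manual are introduced here for illustration only.

```python
import tensorflow as tf

X = tf.constant([[1.0, 0.1, -1.0],
                 [2.0, 0.2, -2.0],
                 [3.0, 0.3, -3.0],
                 [4.0, 0.4, -4.0],
                 [5.0, 0.5, -5.0]])
Y0 = tf.constant([[-10.0], [-5.0], [0.0], [5.0], [10.0]])

W = tf.Variable([[1.0], [1.0], [1.0]])
b1 = tf.Variable(-0.5)

with tf.GradientTape() as tape:
    # The scalar b1 is broadcast across all rows; the op is recorded
    # because it runs inside the tape's context.
    Y = tf.matmul(X, W) + b1
    loss_val = tf.reduce_sum(tf.square(Y - Y0))

# One call, two sources: a non-persistent tape is enough.
gw, gb = tape.gradient(loss_val, [W, b1])

# Hand check: for loss = sum((XW + b - Y0)^2),
# dloss/dW = 2 * X^T (Y - Y0) and dloss/db = 2 * sum(Y - Y0).
residual = Y - Y0
gw_manual = 2.0 * tf.matmul(X, residual, transpose_a=True)
gb_manual = 2.0 * tf.reduce_sum(residual)

print("gradient W : ", gw.numpy())
print("gradient b : ", gb.numpy())
print("manual W   : ", gw_manual.numpy())
print("manual b   : ", gb_manual.numpy())
```

If several independent gradient() calls are really needed, persistent=True as in the answer is the way to go; in that case it is common to release the tape with del tape once the gradients have been computed.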


Source: https://stackoverflow.com/questions/57814376/gradient-calculation-for-bias-term-using-gradienttape
