How to use gradient_override_map in Tensorflow 2.0?

既然无缘 2021-01-01 02:11

I'm trying to use gradient_override_map with TensorFlow 2.0. There is an example in the documentation, which I will use as the example here as well.

In
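
For context, the gradient_override_map example from the TF 1.x documentation, which the question presumably refers to, looks roughly like the sketch below; the op name "Square" and the registered gradient name "CustomSquare" follow the docs, and the mechanism only applies during graph construction:

    import tensorflow as tf

    # TF 1.x-style pattern: register a replacement gradient function under a
    # name, then remap the "Square" op to it inside gradient_override_map.
    @tf.RegisterGradient("CustomSquare")
    def _custom_square_grad(op, grad):
      return tf.constant(0.0)  # e.g. zero out the gradient of tf.square

    with tf.Graph().as_default() as g:
      c = tf.constant(5.0)
      s_1 = tf.square(c)  # uses the default gradient for Square
      with g.gradient_override_map({"Square": "CustomSquare"}):
        s_2 = tf.square(c)  # uses _custom_square_grad instead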

2 Answers
  • 2021-01-01 02:24

    In addition to mrry's answer, there are two points I would like to add:

    (1) In TF 2, we can use tf.GradientTape without building a graph, like this:

    import tensorflow as tf

    # Override the gradient of tf.square: the forward value is unchanged,
    # but the backward pass always returns 0.0.
    @tf.custom_gradient
    def custom_square(x):
      def grad(dy):
        return tf.constant(0.0)
      return tf.square(x), grad

    with tf.GradientTape() as tape:
      x = tf.Variable(5.0)
      s_2 = custom_square(x)

    print(tape.gradient(s_2, x).numpy())
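
    This prints 0.0: the tape uses the custom grad function rather than the default gradient of tf.square, which would be 2 * x = 10.0 at x = 5.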
    

    (2) Multiply your custom grad by the upstream grad (dy)

    Be careful: gradient calculation is a chained computation, so we should multiply our custom grad by dy (the gradient already computed by the ops later in the chain). Without this, our customized function breaks the chain rule. Here is an example:

    # Broken version: the custom grad ignores dy, so the chain rule is not applied.
    @tf.custom_gradient
    def custom_square(x):
      def grad(dy):
        return tf.constant(4.0)
      return tf.square(x), grad

    with tf.GradientTape(persistent=True) as tape:
      x = tf.Variable(5.0)
      s_2 = custom_square(x)
      s_4 = custom_square(s_2)

    print("Grad from s_4 to x: ", tape.gradient(s_4, x).numpy())
    print("Grad from s_4 to s_2: ", tape.gradient(s_4, s_2).numpy())
    print("Grad from s_2 to x: ", tape.gradient(s_2, x).numpy())
    

    The result:

    Grad from s_4 to x:  4.0
    Grad from s_4 to s_2:  4.0
    Grad from s_2 to x:  4.0
    

    The grad from s_4 to x should be 16.0: by the chain rule it is the grad from s_4 to s_2 (4.0) multiplied by the grad from s_2 to x (4.0).

    But the result was 4.0, which means the gradient from the previous step was not accumulated.

    Multiplying the custom grad by dy solves the problem:

    # Fixed version: scale the upstream gradient dy by the custom factor,
    # so the chain rule accumulates correctly across both calls.
    @tf.custom_gradient
    def custom_square(x):
      def grad(dy):
        return tf.constant(4.0) * dy
      return tf.square(x), grad

    with tf.GradientTape(persistent=True) as tape:
      x = tf.Variable(5.0)
      s_2 = custom_square(x)
      s_4 = custom_square(s_2)

    print("Grad from s_4 to x: ", tape.gradient(s_4, x).numpy())
    print("Grad from s_4 to s_2: ", tape.gradient(s_4, s_2).numpy())
    print("Grad from s_2 to x: ", tape.gradient(s_2, x).numpy())
    

    Here is the result:

    Grad from s_4 to x:  16.0
    Grad from s_4 to s_2:  4.0
    Grad from s_2 to x:  4.0
    

    You can try the implementation through Colab here: https://colab.research.google.com/drive/1gbLopOLJiyznDA-Cr473bZEeWkWh_KGG?usp=sharing

  • 2021-01-01 02:35

    There is no built-in mechanism in TensorFlow 2.0 to override all gradients for a built-in operator within a scope. However, if you are able to modify the call-site for each call to the built-in operator, you can use the tf.custom_gradient decorator as follows:

    import tensorflow as tf

    # The forward pass still computes tf.square(x), but the backward pass
    # always returns 0.0 for this call site.
    @tf.custom_gradient
    def custom_square(x):
      def grad(dy):
        return tf.constant(0.0)
      return tf.square(x), grad

    with tf.Graph().as_default() as g:
      x = tf.Variable(5.0)
      with tf.GradientTape() as tape:
        s_2 = custom_square(x)

      with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        print(sess.run(tape.gradient(s_2, x)))
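
    Note that the tf.Graph/Session wrapper is only needed if you want graph-mode execution; in TF 2's default eager mode the same decorated function can be used directly under tf.GradientTape, as the other answer shows.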
    