Question
I have a custom model that takes an arbitrary "hidden model" as an input and wraps it in another model that treats the hidden model's output as a return, computing the implied output by adding 1 and multiplying by the original data:
class Model(tf.keras.Model):
    def __init__(self, hidden_model):
        super(Model, self).__init__(name='')
        self.hidden_model = hidden_model

    def build(
            self,
            reference_price_shape,
            hidden_inputs_shape):
        super(Model, self).build([reference_price_shape, hidden_inputs_shape])

    def call(self, inputs):
        # inputs[0]: reference prices, inputs[1]: inputs for the hidden model
        reference_prices = inputs[0]
        hidden_layers_input = inputs[1]
        # Treat the hidden model's output as a return: output = (return + 1) * price
        hidden_output = self.hidden_model(hidden_layers_input)
        return (hidden_output + 1) * reference_prices

    def compute_output_shape(self, input_shape):
        return (input_shape[0][0], 1)
However, I'd now like to know how sensitive the model is to changes in each of the inputs. To do this I thought I'd be able to use keras.backend.gradients:
rows = 10
cols = 2
hidden_model = tf.keras.Sequential()
hidden_model.add(
    tf.keras.layers.Dense(
        1,
        name='output',
        use_bias=True,
        kernel_initializer=tf.constant_initializer(0.1),
        bias_initializer=tf.constant_initializer(0)))

model = Model(hidden_model)
model.build(
    reference_price_shape=(rows,),
    hidden_inputs_shape=(rows, cols))
from tensorflow.keras import backend as K
grads = K.gradients(model.output, model.input)
However, this returns an error:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      1 from tensorflow import keras
      2 from tensorflow.keras import backend as K
----> 3 K.gradients(hidden_model.output, hidden_model.input)

/usr/lib64/python3.6/site-packages/tensorflow_core/python/keras/backend.py in gradients(loss, variables)
   3795     """
   3796     return gradients_module.gradients(
-> 3797         loss, variables, colocate_gradients_with_ops=True)
   3798
   3799

/usr/lib64/python3.6/site-packages/tensorflow_core/python/ops/gradients_impl.py in gradients(ys, xs, grad_ys, name, colocate_gradients_with_ops, gate_gradients, aggregation_method, stop_gradients, unconnected_gradients)
    156         ys, xs, grad_ys, name, colocate_gradients_with_ops,
    157         gate_gradients, aggregation_method, stop_gradients,
--> 158         unconnected_gradients)
    159   # pylint: enable=protected-access
    160

/usr/lib64/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py in _GradientsHelper(ys, xs, grad_ys, name, colocate_gradients_with_ops, gate_gradients, aggregation_method, stop_gradients, unconnected_gradients, src_graph)
    503   """Implementation of gradients()."""
    504   if context.executing_eagerly():
--> 505     raise RuntimeError("tf.gradients is not supported when eager execution "
    506                        "is enabled. Use tf.GradientTape instead.")
    507   if src_graph is None:

RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
I had a look at the guide for tf.GradientTape, based on which I tried to add the following to my code:
with tf.GradientTape() as g:
    g.watch(x)
But where do I put this? x is a tensor, and I don't have an input tensor. I just have inputs, which is an array of numpy arrays.
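For reference, the guide's own minimal example is roughly along these lines (reproduced from memory, so treat it as a sketch rather than a quote):

x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)                 # constants aren't watched automatically
    y = x * x
dy_dx = g.gradient(y, x)       # dy/dx = 2*x = 6.0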
Just to add to the confusion, there's a github post here that seems to suggest this is a tensorflow 2.0 bug, and that adding tf.compat.v1.disable_eager_execution() will resolve the issue for me. It didn't (although it did change the above error to "Layer model_1 has no inbound nodes." - not sure if that's a step forwards or backwards).
Sorry I realise this question is bordering on untenable, but at this point I'm really confused and this is probably the best I can do at framing it as something answerable.
As a test I tried running K.gradients with hidden_model instead, which kind of worked:
But I don't know what to do with this, as I usually run my model using model.predict(input_data) - how am I supposed to get the local derivatives using that tensor?
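From what I can gather, a symbolic tensor like that would have to be evaluated through something like K.function rather than model.predict - roughly the sketch below, where input_data is just a placeholder name for a numpy array shaped like hidden_model's input - but I'm not sure that's the right approach:

grads = K.gradients(hidden_model.output, hidden_model.input)
grad_fn = K.function([hidden_model.input], grads)    # callable that evaluates the gradient tensors
local_derivatives = grad_fn([input_data])[0]         # numpy array of d(output)/d(input)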
So I think I have two problems:
- How do I calculate the derivative of my output with respect to my input for the whole model? It's tensors all the way through, so Keras/tensorflow really should be able to apply the chain rule even with my custom call() function/model.
- Once I have the derivative tensor, what do I do with it?
I initially thought I should try to separate these questions, but either of them asked alone might be an XY problem so I thought I'd ask them together to give the answerers some context.
Answer 1:
It is possible but requires some work (apparently). Would love to see a more elegant solution. But this is as good as it got for me.
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np
rows = 10
cols = 2
with tf.Graph().as_default():
    hidden_model = tf.keras.Sequential()
    hidden_model.add(
        tf.keras.layers.Dense(
            1,
            name='output',
            use_bias=True,
            kernel_initializer=tf.constant_initializer(0.1),
            bias_initializer=tf.constant_initializer(0)))

    model = Model(hidden_model)
    model.build(
        reference_price_shape=(rows,),
        hidden_inputs_shape=(rows, cols))
Note that model building needs to happen in the same graph you try to get the gradients in. It probably doesn't need to be the default graph, but it does need to be the same graph.
Then, within the same graph context, create a gradient tape context. Also note that x needs to be a tf.Variable() in order to register as an input to the gradient.
with tf.GradientTape() as tape:
    x = tf.Variable(np.random.normal(size=(10, rows, cols)), dtype=tf.float32)
    out = model(x)
With that you can get the gradients.
grads = tape.gradient(out, x)

# In graph mode the tape returns symbolic tensors, so evaluate them in the
# Keras session (initialising the variables, including x, first).
sess = tf.compat.v1.keras.backend.get_session()
sess.run(tf.compat.v1.global_variables_initializer())
g = sess.run(grads)
print(g)
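For completeness: if eager execution is left on (i.e. no tf.Graph() and no session at all), the same result can be read straight off the tape. A minimal sketch, assuming TF 2.x defaults and the same model construction as above:

# hidden_model and model built exactly as above, but outside any
# `with tf.Graph().as_default():` block, so eager execution stays enabled.
x = tf.Variable(np.random.normal(size=(10, rows, cols)), dtype=tf.float32)

with tf.GradientTape() as tape:
    out = model(x)             # variables are watched automatically

g = tape.gradient(out, x)      # a concrete eager tensor
print(g.numpy())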
Source: https://stackoverflow.com/questions/59588928/how-do-i-find-the-derivative-of-a-custom-model-in-keras