Question
I'm using TensorFlow 1.15.2 to build a WSD (word sense disambiguation) system that uses BERT in the embedding layer.
This is the code I use for the model:
import tensorflow as tf
import tensorflow_hub as hub

input_word_ids = tf.keras.layers.Input(shape=(64,), dtype=tf.int32, name="input_word_ids")
input_mask = tf.keras.layers.Input(shape=(64,), dtype=tf.int32, name="input_mask")
segment_ids = tf.keras.layers.Input(shape=(64,), dtype=tf.int32, name="segment_ids")

# BERt = BERtLayer()([input_word_ids, input_mask, segment_ids])
bert = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/1", trainable=True)
pooled_output, sequence_output = bert([input_word_ids, input_mask, segment_ids])
# self.vocab_file = bert.resolved_object.vocab_file.asset_path.numpy()
# self.do_lower_case = bert.resolved_object.do_lower_case.numpy()

LSTM = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(
        units=hidden_size,
        dropout=dropout,
        recurrent_dropout=recurrent_dropout,
        return_sequences=True,
        return_state=True
    )
)(sequence_output)
LSTM = self.attention_layer(LSTM)
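Note the indexing that attention_layer (defined below) relies on: with return_sequences=True and return_state=True, the Bidirectional wrapper returns a five-element list rather than a single tensor. A standalone check of this behavior (the hidden size of 128 is chosen only for illustration):

# What Bidirectional(LSTM(..., return_sequences=True, return_state=True)) returns:
x = tf.keras.layers.Input(shape=(64, 768))
outs = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(128, return_sequences=True, return_state=True)
)(x)
seq, fw_h, fw_c, bw_h, bw_c = outs
print(seq.shape)   # (batch, 64, 256): per-token outputs, fw/bw concatenated -> lstm[0]
print(fw_h.shape)  # (batch, 128): final forward hidden state  -> lstm[1]
print(bw_h.shape)  # (batch, 128): final backward hidden state -> lstm[3]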
The attention layer itself is structured like this, following chapter 3.2 of Raganato et al.'s 2017 paper:
def attention_layer(self, lstm):
    """
    Produces an attention layer like the one described in chapter 3.2 of
    Raganato et al. (2017), Neural Sequence Learning Models for Word Sense Disambiguation.
    :param lstm: The list of tensors returned by the bidirectional LSTM
    :return: The LSTM output enhanced with the attention layer
    """
    hidden_state = tf.keras.layers.Concatenate()([lstm[1], lstm[3]])  # concatenate forward and backward final hidden states
    hidden_state = tf.keras.layers.RepeatVector(tf.keras.backend.shape(lstm[0])[1])(hidden_state)
    u = tf.keras.layers.Dense(1, activation="tanh")(hidden_state)
    a = tf.keras.layers.Activation("softmax")(u)
    context_vector = tf.keras.layers.Lambda(lambda x: tf.keras.backend.sum(x[0] * x[1], axis=1))([lstm[0], a])
    print(context_vector.shape)
    return tf.keras.layers.Multiply()([lstm[0], context_vector])
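For completeness, training is then set up roughly along these lines. This is a minimal sketch: the per-token softmax head, num_senses, and the dummy arrays are placeholders for my real pipeline, but the optimizer is SGD, as the trace below shows.

import numpy as np

num_senses = 10  # placeholder for the real sense-inventory size

# Per-token softmax head over the attention-enhanced LSTM output
logits = tf.keras.layers.Dense(num_senses, activation="softmax")(LSTM)
model = tf.keras.Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=logits)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Dummy batch just to make the sketch self-contained
train_inputs = [np.zeros((8, 64), dtype=np.int32)] * 3
train_labels = np.zeros((8, 64), dtype=np.int32)
model.fit(train_inputs, train_labels, batch_size=8, epochs=1)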
During training, however, Keras raises the following exception. How can I fix it?
NotFoundError: [_Derived_]No gradient defined for op: Einsum
[[{{node Func/_36}}]]
[[training/SGD/gradients/gradients/keras_layer/cond/StatefulPartitionedCall_grad/PartitionedCall/gradients/StatefulPartitionedCall_grad/PartitionedCall/gradients/StatefulPartitionedCall_grad/SymbolicGradient]]
Source: https://stackoverflow.com/questions/60393980/notfounderror-derived-no-gradient-defined-for-op-einsum-on-tensorflow-1-15