NotFoundError: [_Derived_]No gradient defined for op: Einsum on Tensorflow 1.15.2

Submitted by 末鹿安然 on 2021-02-08 06:55:20

Question


I'm using TensorFlow 1.15.2 to build a WSD (word sense disambiguation) system that uses BERT in the embedding layer.
This is the code I use for the model:

import tensorflow as tf
import tensorflow_hub as hub

# hidden_size, dropout and recurrent_dropout are hyperparameters defined elsewhere in my code.
input_word_ids = tf.keras.layers.Input(shape=(64,), dtype=tf.int32, name="input_word_ids")
input_mask = tf.keras.layers.Input(shape=(64,), dtype=tf.int32, name="input_mask")
segment_ids = tf.keras.layers.Input(shape=(64,), dtype=tf.int32, name="segment_ids")
# BERt = BERtLayer()([input_word_ids, input_mask, segment_ids])
bert = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/1", trainable=True)
pooled_output, sequence_output = bert([input_word_ids, input_mask, segment_ids])
# self.vocab_file = bert.resolved_object.vocab_file.asset_path.numpy()
# self.do_lower_case = bert.resolved_object.do_lower_case.numpy()
LSTM = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(
        units=hidden_size,
        dropout=dropout,
        recurrent_dropout=recurrent_dropout,
        return_sequences=True,
        return_state=True
    )
)(sequence_output)
# attention_layer is the method shown below; this snippet lives inside a model-building class, hence self.
LSTM = self.attention_layer(LSTM)

After that, I add an attention layer, structured as shown below, following Raganato et al.'s 2017 paper:

def attention_layer(self, lstm):
    """
    Produces an attention layer like the one described in Raganato et al., "Neural Sequence Learning Models
    for Word Sense Disambiguation", section 3.2.
    :param lstm: the list returned by the Bidirectional LSTM above:
                 [sequence_output, forward_h, forward_c, backward_h, backward_c]
    :return: the LSTM sequence output enhanced with the attention layer
    """
    # Concatenate the final forward and backward hidden states.
    hidden_state = tf.keras.layers.Concatenate()([lstm[1], lstm[3]])
    # Repeat the concatenated state once per timestep of the sequence output.
    hidden_state = tf.keras.layers.RepeatVector(tf.keras.backend.shape(lstm[0])[1])(hidden_state)
    # Score each timestep and normalise the scores with a softmax.
    u = tf.keras.layers.Dense(1, activation="tanh")(hidden_state)
    a = tf.keras.layers.Activation("softmax")(u)
    # Weighted sum of the sequence output using the attention weights.
    context_vector = tf.keras.layers.Lambda(lambda x: tf.keras.backend.sum(x[0] * x[1], axis=1))([lstm[0], a])
    print(context_vector.shape)
    return tf.keras.layers.Multiply()([lstm[0], context_vector])
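
The attention output then feeds a per-token classification head over the sense inventory, in line with Raganato et al.'s sequence-labelling setup. That head is not in the snippets above; roughly, with output_vocab_size standing in for the size of my label vocabulary, it looks like this:

logits = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(output_vocab_size, activation="softmax")  # output_vocab_size: number of sense labels
)(LSTM)
model = tf.keras.Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=logits)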

During training, however, Keras raises the following exception. How can I fix this?

NotFoundError: [_Derived_]No gradient defined for op: Einsum
     [[{{node Func/_36}}]]
     [[training/SGD/gradients/gradients/keras_layer/cond/StatefulPartitionedCall_grad/PartitionedCall/gradients/StatefulPartitionedCall_grad/PartitionedCall/gradients/StatefulPartitionedCall_grad/SymbolicGradient]]
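
For completeness, the training step that triggers the error looks roughly like this (a sketch: the loss, batch size and data variables are placeholders for my actual setup, but the optimizer is SGD, as the trace above shows):

# Sketch of the compile/fit call that produces the error; train_* are my preprocessed arrays.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
model.fit(
    [train_word_ids, train_mask, train_segment_ids],
    train_labels,
    batch_size=32,
    epochs=1
)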

Source: https://stackoverflow.com/questions/60393980/notfounderror-derived-no-gradient-defined-for-op-einsum-on-tensorflow-1-15
