You apply a softmax activation in your last layer
x = tf.keras.layers.Dense(num_classes, activation='softmax', kernel_initializer='he_normal')(x)
and softmax gets applied a second time when you use
tf.nn.softmax_cross_entropy_with_logits_v2
because the op applies softmax internally and therefore expects unscaled logits. From the documentation:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
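To see the effect concretely, here is a minimal sketch comparing the two losses (assuming TensorFlow 1.x with eager execution enabled; the logits and labels are made-up numbers for illustration):

import tensorflow as tf

tf.enable_eager_execution()

logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[1.0, 0.0, 0.0]])

# Correct: pass the unscaled logits directly.
loss_ok = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=labels, logits=logits)

# Wrong: pass softmax output; the op then applies softmax a second time.
probs = tf.nn.softmax(logits)
loss_wrong = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=labels, logits=probs)

print(loss_ok.numpy(), loss_wrong.numpy())  # the two values differ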
Thus, remove the softmax activation from your last layer and it should work:
x = tf.keras.layers.Dense(num_classes, activation=None, kernel_initializer='he_normal')(x)
[...]
# stop_gradient keeps gradients from flowing into the label tensor
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=tf.stop_gradient(labels), logits=model_output))
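For completeness, here is a minimal end-to-end sketch of the corrected setup (assuming TensorFlow 1.x graph mode; the input size, hidden layer, optimizer, and random batch are illustrative and not from your original code). If you need class probabilities at inference time, apply tf.nn.softmax explicitly on top of the logits:

import numpy as np
import tensorflow as tf

num_classes = 10   # illustrative
feature_dim = 32   # illustrative

inputs = tf.keras.layers.Input(shape=(feature_dim,))
hidden = tf.keras.layers.Dense(64, activation='relu')(inputs)
# The last layer emits raw (unscaled) logits; no softmax activation here.
logits = tf.keras.layers.Dense(num_classes, activation=None,
                               kernel_initializer='he_normal')(hidden)
model = tf.keras.models.Model(inputs=inputs, outputs=logits)

labels = tf.placeholder(tf.float32, shape=(None, num_classes))
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=tf.stop_gradient(labels), logits=model.output))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Probabilities for inference: softmax applied once, explicitly.
probabilities = tf.nn.softmax(model.output)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_batch = np.random.rand(8, feature_dim).astype(np.float32)
    y_batch = np.eye(num_classes)[
        np.random.randint(num_classes, size=8)].astype(np.float32)
    _, loss_val = sess.run([train_op, loss],
                           feed_dict={model.input: x_batch, labels: y_batch})
    print(loss_val)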