I am replicating a PyTorch model in Keras and am having trouble seeing where the extra dimension comes from. This is how my code looks so far:
class Attention(tf.ker