I am trying to use the MultiHeadAttention layer to process variable-length sets of elements, that is, sequences in which the order does not matter (otherwise I would try RNNs).
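For context, here is a minimal NumPy sketch of why attention suits sets. It is a simplified single-head self-attention written for illustration (not Keras's implementation), and it assumes no learned projections, no positional encoding, and that every set has at least one real element. Without positional information, permuting the input elements simply permutes the outputs. Variable lengths are handled by padding each set to a common length and masking the padded keys. In Keras the analogous step is passing a boolean `attention_mask` of shape `(batch, T, S)` to the `MultiHeadAttention` call. The function names `masked_self_attention` and `masked_mean` are hypothetical.

```python
import numpy as np

def masked_self_attention(x, mask):
    """Single-head scaled dot-product self-attention over padded sets.

    x:    (batch, max_len, dim) array; padded positions may hold any values.
    mask: (batch, max_len) boolean array, True for real set elements.
    Assumes each set has at least one real element (otherwise softmax is undefined).
    """
    d = x.shape[-1]
    # Queries, keys and values are x itself (no learned projections in this sketch).
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)          # (batch, max_len, max_len)
    # Padded keys get -inf, so they receive zero attention weight.
    scores = np.where(mask[:, None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ x
    # Zero the outputs at padded query positions.
    return out * mask[..., None]

def masked_mean(x, mask):
    """Permutation-invariant pooling: mean over real elements only."""
    m = mask[..., None].astype(x.dtype)
    return (x * m).sum(axis=1) / m.sum(axis=1)
```

Because nothing in the computation depends on element position, the per-element outputs are permutation-equivariant, and a masked pooling such as `masked_mean` yields one permutation-invariant vector per set, which is usually what a set-level prediction needs.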