Effect of padding sequences in MultiHeadAttention (TensorFlow/Keras)

眼角桃花 2020-11-27 21:19

I am trying to use the MultiHeadAttention layer to process variable-length sets of elements, that is, sequences in which the order of the elements does not matter (otherwise I would use RNNs).
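To make the padding question concrete, here is a minimal NumPy sketch of single-head self-attention over padded sets. It is not the Keras layer's implementation: there are no learned projections and only one head, and the names `masked_self_attention`, `masked_mean`, and `valid` are hypothetical, chosen for illustration. It assumes padded positions are marked with a boolean mask and shows that, once padded keys are masked out, the outputs for the real elements do not depend on what the padding contains.

```python
import numpy as np

def masked_self_attention(x, valid):
    """Single-head scaled dot-product self-attention with a key padding mask.

    x:     (batch, max_len, dim) padded input
    valid: (batch, max_len) boolean, True for real elements, False for padding
    """
    dim = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(dim)  # (batch, max_len, max_len)
    # Padded keys get a very negative score, so softmax gives them ~0 weight.
    scores = np.where(valid[:, None, :], scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def masked_mean(h, valid):
    """Average over real elements only, giving one vector per set."""
    m = valid[..., None].astype(h.dtype)
    return (h * m).sum(axis=1) / m.sum(axis=1)

rng = np.random.default_rng(0)
lengths = np.array([2, 4])          # two sets with 2 and 4 real elements
max_len, dim = 4, 8
valid = np.arange(max_len)[None, :] < lengths[:, None]

# Same real elements, two different padding values (0 and 100).
x_zero = rng.normal(size=(2, max_len, dim)) * valid[..., None]
x_noise = np.where(valid[..., None], x_zero, 100.0)

out_zero = masked_self_attention(x_zero, valid)
out_noise = masked_self_attention(x_noise, valid)
pooled_zero = masked_mean(out_zero, valid)
pooled_noise = masked_mean(out_noise, valid)

# Rows belonging to real elements agree regardless of the padding value.
print(np.allclose(out_zero[valid], out_noise[valid]))
print(np.allclose(pooled_zero, pooled_noise))
```

The rows at padded query positions still contain values, so any later pooling over the set has to ignore them as well, which is what `masked_mean` does. In Keras, `tf.keras.layers.MultiHeadAttention` accepts an `attention_mask` argument when called, which plays the role of the key mask in this sketch.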
