Multi-Head Attention: Correct implementation of the linear transformations of Q, K, V

Asked by 小鲜肉 on 2020-12-17 19:10

I am implementing Multi-Head Self-Attention in PyTorch. I have looked at a couple of implementations, and they seem a bit wrong, or at least I am not sure that the linear transformations of Q, K, and V are done correctly.
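
For reference, here is a minimal sketch of the standard way the Q, K, V projections are usually implemented (this is not the asker's code; the class and parameter names such as `MultiHeadSelfAttention`, `embed_dim`, and `num_heads` are illustrative assumptions): each of Q, K, and V gets a single `nn.Linear(embed_dim, embed_dim)`, and the output is reshaped into `num_heads` slices of size `head_dim` rather than using a separate linear layer per head.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must divide evenly across heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One linear layer per projection. This is mathematically equivalent
        # to num_heads separate (embed_dim x head_dim) weight matrices
        # stacked side by side into one (embed_dim x embed_dim) matrix.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        batch, seq_len, embed_dim = x.shape

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq_len, embed_dim) -> (batch, num_heads, seq_len, head_dim)
            return t.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.q_proj(x))
        k = split_heads(self.k_proj(x))
        v = split_heads(self.v_proj(x))

        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        attn = F.softmax(scores, dim=-1)
        out = attn @ v  # (batch, num_heads, seq_len, head_dim)

        # Concatenate the heads back together and apply the final projection.
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, embed_dim)
        return self.out_proj(out)

# Quick shape check:
mha = MultiHeadSelfAttention(embed_dim=64, num_heads=8)
print(mha(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Implementations that instead build `num_heads` small `nn.Linear(embed_dim, head_dim)` layers and loop over them are not wrong, just slower; the single fused projection above computes the same thing in one matrix multiply.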
