Why doesn't the transformer use positional encoding in every layer?

Asked by 自闭症患者 on 2021-01-03 03:26

Positional encoding is added to the input before it is passed into the transformer model, because otherwise the attention mechanism would be order invariant. However, the self-attention and feed-forward sublayers in every subsequent layer are also order invariant, so why isn't the positional encoding added again at the input of every layer?
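For reference, a minimal sketch of the setup the question describes, assuming the sinusoidal scheme from "Attention Is All You Need": the encoding is computed once and added to the token embeddings before the first layer only. The function and variable names here are illustrative, not from any particular library.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as in 'Attention Is All You Need'."""
    positions = np.arange(seq_len)[:, None]            # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # shape (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions
    return pe

# The encoding is added a single time, before the first transformer layer;
# deeper layers receive it only indirectly through the residual stream.
seq_len, d_model = 10, 512
token_embeddings = np.random.randn(seq_len, d_model)   # placeholder embeddings
first_layer_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```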
