Positional encoding is added to the input embeddings before they are passed into the transformer, because otherwise the attention mechanism would be invariant to the order of the input tokens; a sketch of this addition appears below. However, both t
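
As a minimal sketch of the idea, the following assumes the sinusoidal positional encoding from the original Transformer paper and adds it element-wise to the token embeddings; the function name, sequence length, and embedding size are illustrative, not taken from the text above.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1) token positions
    dims = np.arange(d_model)[None, :]        # (1, d_model) embedding dimensions
    # Each pair of dimensions shares a frequency: 1 / 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates          # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])   # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])   # odd dimensions use cosine
    return encoding

# Hypothetical usage: token_embeddings stands in for the model's input embeddings.
seq_len, d_model = 16, 64
token_embeddings = np.random.randn(seq_len, d_model)
# The encoding is simply added to the embeddings before the first attention layer,
# giving the otherwise order-invariant attention mechanism access to token positions.
model_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```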