I am aware that the attention mechanism proves its worth specifically when dealing with long sequences, where recurrent models run into vanishing gradients and, more generally, into difficulty representing long-range dependencies.
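To make the point concrete, here is a minimal NumPy sketch of scaled dot-product attention (the function name and shapes are illustrative, not from the original text). Because every query position attends to every key position in a single step, information flows between any two positions along a path of constant length, instead of being propagated through many recurrent steps where gradients can vanish.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V: arrays of shape (seq_len, d_k) / (seq_len, d_v).
    Returns the attended output and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Each of the 6 query positions attends directly to all 6 key positions,
# so the interaction path between any two positions has length 1,
# regardless of how long the sequence is.
rng = np.random.default_rng(0)
n, d = 6, 4
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` is a probability distribution over all positions, which is what lets the model pick up dependencies between distant tokens directly rather than through a chain of hidden states.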