I am trying to run the Stand-Alone Self-Attention model. Even with a batch size of 1, it fails with a CUDA out-of-memory error at the line out=key*query: https://github.com/leaderj1001/Stand-A
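To get a rough sense of why a single elementwise product can exhaust GPU memory even at batch size 1, it helps to count the elements of the result. The sketch below is only an estimate under stated assumptions: the shape is hypothetical and not taken from the repository, `tensor_bytes` is an illustrative helper, and float32 (4 bytes per element) is assumed.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor of this shape (float32 by default)."""
    return prod(shape) * bytes_per_element


# Hypothetical shape, NOT read from the model code:
# (batch, groups, channels_per_group, height, width, kernel_size * kernel_size)
shape = (1, 8, 64, 224, 224, 7 * 7)

size_gib = tensor_bytes(shape) / 2**30
print(f"out alone would need about {size_gib:.2f} GiB")
```

Plugging in the real shapes of key and query (for example, printed just before the failing line) would show whether the product itself is too large for the card. During training the actual usage is likely higher than this figure, since intermediate tensors are also kept for the backward pass.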