I understand this question is very general, but I was wondering if anyone has good resources on how to apply PyTorch multi-head attention (https://pytorch.org/docs/stable/