I have a PyTorch tensor of shape (100, 64, 22, 3, 3), and I would like to sort it along axis 0 by the trace of the trailing (3, 3) matrices. The code I have below works, but it is very slow.
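For reference, here is a minimal vectorized sketch of that operation with no Python loops. It assumes "sort along axis 0" means that each of the 64 × 22 positions is reordered independently, so the 100 matrices at each position end up in ascending order of trace. The helper name `sort_by_trace` is hypothetical. The approach computes all traces at once with `diagonal(...).sum(-1)`, takes `argsort` along dim 0, and then uses `torch.gather` to move whole (3, 3) matrices.

```python
import torch

def sort_by_trace(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: reorder x along dim 0 by the trace of its (3, 3) blocks.

    Assumes x has shape (N, A, B, 3, 3) and that each (A, B) position is
    sorted independently along dim 0 (ascending trace).
    """
    # Trace of every trailing (3, 3) matrix -> shape (N, A, B).
    traces = x.diagonal(dim1=-2, dim2=-1).sum(dim=-1)
    # Indices that sort the traces along dim 0, separately for each (A, B) position.
    order = traces.argsort(dim=0)
    # Broadcast the indices over the two matrix dims so whole matrices are moved.
    index = order[..., None, None].expand_as(x)
    return torch.gather(x, 0, index)

x = torch.randn(100, 64, 22, 3, 3)
y = sort_by_trace(x)
```

Because the traces, the sort, and the gather are each a single batched tensor operation, this should scale much better than looping over the 64 × 22 positions in Python. If the intended semantics are instead to reorder entire (64, 22, 3, 3) slices by one scalar key per slice, the key would need to be reduced to shape (100,) first, for example by summing the traces over the other axes.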