Never mind, I figured it out. Swapping axes 1 and 2 with np.swapaxes was the missing piece I needed.
The answer is just to do mat.swapaxes(1, 2).reshape(N*Q, N*Q).
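
For anyone landing here later, here is a minimal sketch of why that works. It assumes mat has shape (N, N, Q, Q), with mat[i, j] holding the (Q, Q) block at block-row i and block-column j; that layout, the sizes, and the np.block cross-check are assumptions for illustration only.

```python
import numpy as np

N, Q = 3, 2  # hypothetical sizes, chosen only for the demo

# Assumed layout: mat[i, j] is the (Q, Q) block at block-row i, block-column j.
mat = np.arange(N * N * Q * Q).reshape(N, N, Q, Q)

# Axes are (block_row, block_col, row_in_block, col_in_block).
# Swapping axes 1 and 2 gives (block_row, row_in_block, block_col, col_in_block),
# so the reshape merges the two row axes and the two column axes.
flat = mat.swapaxes(1, 2).reshape(N * Q, N * Q)

# Cross-check against an explicit block assembly.
expected = np.block([[mat[i, j] for j in range(N)] for i in range(N)])
assert np.array_equal(flat, expected)
```

Without the swap, reshape would merge block_col with row_in_block instead, and the blocks would come out scrambled.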
I feel foolish for posting before spending more time trying to figure it out myself, but I'll leave this up so others can benefit from it.