Recently, while benchmarking the "matmul" op in PyTorch, I realized that fixed-point multiplication (when the dtype is int) is much slower than the floating-point version. This looks counterintuitive.
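For anyone who wants to reproduce the comparison, here is a minimal sketch of the kind of timing I mean. It is only an illustration under some assumptions: the helper name `time_matmul`, the 256 x 256 matrix size, the repeat count, and the choice of `float32`, `int32`, and `int64` are all picked arbitrarily for this example, and it runs on CPU only. Actual numbers will depend on the machine and the PyTorch build.

```python
import time

import torch


def time_matmul(dtype, n=256, repeats=5):
    """Return average seconds per torch.matmul call on n x n CPU tensors.

    Hypothetical helper for illustration; n and repeats are arbitrary.
    """
    if dtype.is_floating_point:
        a = torch.rand(n, n, dtype=dtype)
        b = torch.rand(n, n, dtype=dtype)
    else:
        # Small integer values so the products do not overflow.
        a = torch.randint(0, 10, (n, n), dtype=dtype)
        b = torch.randint(0, 10, (n, n), dtype=dtype)

    torch.matmul(a, b)  # warm-up call, excluded from the timing

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    return (time.perf_counter() - start) / repeats


# Time the same operation for one float dtype and two integer dtypes.
timings = {
    dtype: time_matmul(dtype)
    for dtype in (torch.float32, torch.int32, torch.int64)
}

for dtype, seconds in timings.items():
    print(f"{dtype}: {seconds * 1e3:.3f} ms per matmul")
```

For more careful measurements, `torch.utils.benchmark.Timer` is probably a better fit than a hand-rolled `time.perf_counter` loop, since it takes care of warm-up and threading details.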