Many CPUs have a single assembly instruction that returns the high-order bits of a 32-bit integer multiplication. Mathematically, multiplying two 32-bit integers produces a 64-bit product.
On 32-bit Intel (x86), a multiply writes its output to two registers (EDX:EAX for MUL). That is, all 64 bits of the product are available whether you want them or not. It's just a question of whether the compiler is smart enough to take advantage of it.
Modern compilers do amazing things, so my suggestion is to experiment with the optimization flags some more, at least on Intel. You would expect the optimizer to know that the processor produces a 64-bit value from a 32-by-32-bit multiply.
That said, I once tried to get the compiler to use both the quotient and the remainder from a single division, but the old Microsoft compiler from 1998 was not smart enough to realize that the same instruction produced both results.