In C++, say that:
uint64_t i;
uint64_t j;
Then i * j yields a uint64_t whose value is the lower 64 bits of the full 128-bit product (that is, the product modulo 2^64). How can the upper 64 bits be obtained?
Long multiplication should give OK performance.
Split a and b into 32-bit halves, so that a*b = (hia*2^32 + loa)*(hib*2^32 + lob). This gives four 32-bit multiplies plus some shifts. Do them in 64 bits, handle the carries manually, and you'll get the high portion.
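A minimal sketch of that long multiplication in portable C++14, in case it helps. The function name mul_hi_u64 is my own (hypothetical); hia/loa/hib/lob follow the naming above. Each 32x32 partial product fits in 64 bits, and the middle column is summed so that it cannot overflow either.

```cpp
#include <cassert>
#include <cstdint>

// Upper 64 bits of the full 128-bit product a * b, computed with four
// 32x32 -> 64-bit multiplies and manual carry propagation.
// (Multi-statement constexpr functions need C++14 or later.)
constexpr uint64_t mul_hi_u64(uint64_t a, uint64_t b) {
    const uint64_t mask = 0xFFFFFFFFu;
    const uint64_t hia = a >> 32, loa = a & mask;
    const uint64_t hib = b >> 32, lob = b & mask;

    // Each factor is below 2^32, so every partial product fits in 64 bits.
    const uint64_t lolo = loa * lob;  // weight 2^0
    const uint64_t hilo = hia * lob;  // weight 2^32
    const uint64_t lohi = loa * hib;  // weight 2^32
    const uint64_t hihi = hia * hib;  // weight 2^64

    // Middle column: carry out of lolo, low half of hilo, and all of lohi.
    // Its maximum is 2 * (2^32 - 1) + (2^32 - 1)^2 = 2^64 - 1, so no overflow.
    const uint64_t mid = (lolo >> 32) + (hilo & mask) + lohi;

    // Upper column: hihi plus the high half of hilo plus the carry out of mid.
    return hihi + (hilo >> 32) + (mid >> 32);
}

// Compile-time spot checks.
static_assert(mul_hi_u64(UINT64_MAX, UINT64_MAX) == UINT64_MAX - 1, "(2^64-1)^2 >> 64");
static_assert(mul_hi_u64(1ULL << 32, 1ULL << 32) == 1, "2^64 >> 64");
static_assert(mul_hi_u64(12345, 67890) == 0, "small operands have no high part");
```

On compilers that provide unsigned __int128 (GCC and Clang on 64-bit targets), the result can be cross-checked against (uint64_t)(((unsigned __int128)a * b) >> 64), but that type is a non-portable extension.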
Note that an approximation of the high portion can be done with fewer multiplies: with 1 multiply (hia*hib alone) it is accurate to within about 2^33, and with 3 multiplies (dropping only loa*lob) it is accurate to within 1.
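A sketch of those two approximations, assuming the 1-multiply version keeps only hia*hib and the 3-multiply version drops only the loa*lob term. The names mul_hi_approx1 and mul_hi_approx3 are hypothetical. Both only ever undershoot the exact high half.

```cpp
#include <cassert>
#include <cstdint>

// One multiply: keep only hia * hib. Never overestimates the exact high
// half, and falls short by less than 2^33.
constexpr uint64_t mul_hi_approx1(uint64_t a, uint64_t b) {
    return (a >> 32) * (b >> 32);
}

// Three multiplies: drop only loa * lob. Never overestimates the exact
// high half, and falls short by at most 1 (the ignored carry out of loa * lob).
constexpr uint64_t mul_hi_approx3(uint64_t a, uint64_t b) {
    const uint64_t mask = 0xFFFFFFFFu;
    const uint64_t hia = a >> 32, loa = a & mask;
    const uint64_t hib = b >> 32, lob = b & mask;
    const uint64_t hilo = hia * lob;
    const uint64_t lohi = loa * hib;
    // Add the two cross products half by half so the sum cannot overflow.
    const uint64_t mid = (hilo & mask) + (lohi & mask);
    return hia * hib + (hilo >> 32) + (lohi >> 32) + (mid >> 32);
}

// Spot checks for a = b = 2^64 - 1, whose exact high half is 2^64 - 2.
static_assert(mul_hi_approx1(UINT64_MAX, UINT64_MAX) == 0xFFFFFFFE00000001ULL, "off by 2^33 - 3");
static_assert(mul_hi_approx3(UINT64_MAX, UINT64_MAX) == UINT64_MAX - 2, "off by 1");
```

Splitting the cross products into halves before adding them is what keeps the 3-multiply version overflow-free: hilo + lohi on its own can exceed 2^64 - 1.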
I do not think there is a portable alternative.