Modern CPU\'s can perform extended multiplication between two native-size words and store the low and high result in separate registers. Similarly, when performing division,
For those wondering about the other half of the question (division), gcc does not provide an intrinsic for that because the processor division instructions don't conform to the standard.
This is true both with 128-bit dividends on 64-bit x86 targets and 64-bit dividends on 32-bit x86 targets. The problem is that DIV will cause divide overflow exceptions in cases where the standard says the result should be truncated. For example (unsigned long long) (((unsigned _int128) 1 << 64) / 1) should evaluate to 0, but would cause divide overflow exception if evaluated with DIV.
(Thanks to @ross-ridge for this info)