I often see code that converts ints to doubles, back to ints, and then to doubles once again (sometimes for good reasons, sometimes not), and it just occurred to me that this seems like a "hidden" cost in my program. Let's assume the conversion method is truncation.
So, just how expensive is it? I'm sure it varies depending on hardware, so let's assume a newish Intel processor (Haswell, if you like, though I'll take anything). Some metrics I'd be interested in (though a good answer needn't have all of them):
- # of generated instructions
- # of cycles used
- Relative cost compared to basic arithmetic operations
I would also assume that the impact of a slow conversion would be felt most acutely in power usage rather than execution speed, given the difference between how many computations we can perform each second and how much data can actually arrive at the CPU each second.
Here's what I could dig up myself:
- When I take a look at the generated assembly from clang and gcc, it looks like the cast from double to int boils down to one instruction: cvttsd2si. From int to double it's cvtsi2sdl on clang and cvtsi2sd on gcc (the same instruction; clang just spells out the operand size). So I suppose the question becomes: what is the cost of those? (A minimal source sketch for reproducing this follows after this list.)
- Intel's optimization manual says that the cvttsd2si instruction has a latency of 5 cycles (see Appendix C-16). I can't find a reference for cvtsi2sdl, but cvtsi2sd, depending on your architecture, has a latency varying from 1 on Silvermont to more like 7-16 on several other architectures. The manual defines latency as: "The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction."
- The same manual says that an add instruction has a latency of 1 cycle and a mul 3-4 cycles (Appendix C-27).
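For reference, here is a minimal sketch of the kind of source I compiled to inspect the output (the file and function names are just placeholders; the instructions in the comments are what I saw for x86-64, and other targets may differ):

    /* conv.c -- compile with "gcc -O2 -S conv.c" or "clang -O2 -S conv.c"
       and look for the conversion instructions in conv.s */

    double int_to_double(int i)
    {
        return (double)i;   /* cvtsi2sd (printed as cvtsi2sdl by clang) */
    }

    int double_to_int(double d)
    {
        return (int)d;      /* cvttsd2si: truncating conversion */
    }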
So, the answer boils down to: 1) it's hardware optimized, and the compiler leverages the hardware machinery; 2) in terms of cycles, it costs only a bit more than a multiply in one direction, and a highly variable amount in the other (depending on your architecture). Its cost is neither free nor absurd, but it probably warrants more attention given how easy it is to write code that incurs the cost in a non-obvious way.
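To make that comparison concrete, here is a rough micro-benchmark sketch of my own (not rigorous): it times a double-to-int truncation against a double multiply. The volatile qualifiers only exist to stop the compiler from folding the loops away, and they add loads and stores of their own, so treat the numbers as indicative rather than precise:

    #include <stdio.h>
    #include <time.h>

    #define N 200000000L

    int main(void)
    {
        volatile double src   = 3.7;   /* volatile: keep the loops alive */
        volatile int    isink = 0;
        volatile double dsink = 0.0;

        clock_t t0 = clock();
        for (long i = 0; i < N; i++)
            isink = (int)src;              /* cvttsd2si on x86-64 */
        clock_t t1 = clock();
        for (long i = 0; i < N; i++)
            dsink = src * 1.0000001;       /* mulsd for comparison */
        clock_t t2 = clock();

        printf("convert:  %.2f s\nmultiply: %.2f s\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }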
Of course, this kind of question depends on the exact hardware and even on the mode the processor runs in.
On my x86 i7, when used in 32-bit mode with default options (gcc -m32 -O3), the conversion from int to double is quite fast; the opposite direction is much slower, because the C standard mandates an absurd rule (truncation of the decimal part, i.e. rounding toward zero).
This way of rounding is bad both for math and for hardware: it requires the FPU to switch to this special rounding mode, perform the truncation, and then switch back to a sane way of rounding.
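For illustration, here is a hand-written sketch of that sequence, assuming an x87-only target (no SSE). It mirrors the save/switch/convert/restore pattern gcc -m32 emits for (int)x, though the exact registers and stack slots chosen by the compiler will differ:

    static inline int truncate_x87(double x)
    {
        int r;
        unsigned short saved_cw, trunc_cw;
        __asm__ volatile (
            "fldl    %3\n\t"            /* push x onto the x87 stack          */
            "fnstcw  %1\n\t"            /* save the current FPU control word  */
            "movzwl  %1, %%eax\n\t"
            "orw     $0x0c00, %%ax\n\t" /* rounding control = 11b (truncate)  */
            "movw    %%ax, %2\n\t"
            "fldcw   %2\n\t"            /* switch to the truncating mode      */
            "fistpl  %0\n\t"            /* convert (now truncates) and pop    */
            "fldcw   %1\n\t"            /* restore the original rounding mode */
            : "=m"(r), "=m"(saved_cw), "=m"(trunc_cw)
            : "m"(x)
            : "eax");
        return r;
    }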
If you need speed, doing the float->int conversion with the simple fistp instruction is faster and also much better for computation results, but it requires some inline assembly:
    inline int my_int(double x)
    {
        int r;
        asm ("fldl %1\n"    /* push x onto the x87 stack                     */
             "fistpl %0\n"  /* store as a 32-bit int (round to nearest), pop */
             : "=m"(r) : "m"(x));
        return r;
    }
This is more than 6 times faster than the naive x = (int)y; conversion (and doesn't have a bias toward 0, because it rounds to nearest instead of truncating).
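A quick way to see the rounding difference (assuming the FPU is in its default round-to-nearest mode, and with my_int from the snippet above in scope):

    #include <stdio.h>

    int main(void)
    {
        double y = 2.7;
        printf("(int)y    = %d\n", (int)y);     /* 2: C truncates toward zero */
        printf("my_int(y) = %d\n", my_int(y));  /* 3: fistp rounds to nearest */
        return 0;
    }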
When used in 64-bit mode, however, the very same processor has no speed problem, and using the fistp code actually makes the program run somewhat slower.
Apparently the hardware guys gave up and implemented the bad rounding algorithm directly in hardware (in 64-bit mode, SSE2's cvttsd2si truncates without any rounding-mode switch), so bad code can now run fast.