How can I compare the performance of log() and fp division in C++?

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-11-28 12:00:44

Do you divide by the same integer multiple times? If so, you can compute 1./yourInteger once and multiply by it instead, so the divide happens only once. Where that applies, it would be faster than either a divide or a log() per element.
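A minimal sketch of the reciprocal trick, assuming the values live in a std::vector<double> and the divisor is a nonzero int. The function name scale_by_reciprocal is made up for illustration. Keep in mind that x * (1.0 / d) is not guaranteed to be bit-identical to x / d, because the reciprocal is itself rounded.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: divides every element by the same integer divisor
// using one real division (to form the reciprocal) plus one multiply per
// element. Results may differ from v / divisor in the last bit.
std::vector<double> scale_by_reciprocal(const std::vector<double>& values,
                                        int divisor) {
    assert(divisor != 0);
    const double reciprocal = 1.0 / divisor;  // the only division performed
    std::vector<double> out;
    out.reserve(values.size());
    for (double v : values) {
        out.push_back(v * reciprocal);  // multiply instead of divide
    }
    return out;
}
```

Calling scale_by_reciprocal({2.0, 4.0, 8.0}, 2) yields 1.0, 2.0 and 4.0 exactly, since 0.5 is representable. For divisors such as 3 the results can be off by one unit in the last place compared with a true division.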

As to your actual question, it's not only compiler and architecture dependent, but also micro-architecture and data dependent.

On your particular platform (darwin/x86), for current hardware i5/i7: ~24 cycles for divide(1), ~35 cycles for log( )(2). However, because divide only uses a single instruction dispatch slot, the hardware's reorder engine can do other useful computation while the divide is in flight; log( ) is implemented in software, by contrast, and so there is less opportunity for the processor to hoist other computations into the latency of the logarithm. This means that in practice, divide will often be a good bit faster.

1) From the Intel Optimization Manual

2) Measured by calling log( ) in a tight loop and using mach_absolute_time( ) to get wall time.

On the x86 architecture, logarithms take significantly longer than divisions: 85 cycles (throughput) for FYL2X compared to 40 cycles for FDIV. I would be surprised if other architectures are much different. Go with the floating-point division.

The main problem with division is that, although it is a single instruction on most modern CPUs, it typically has a high latency (31 cycles on PowerPC; I'm not sure what it is on x86). Some of this latency can be hidden, though, if you have other non-dependent instructions that can be issued at the same time as the division. So the answer will depend somewhat on the instruction mix and the dependencies in the loop that contains your divide (not to mention which CPU you are using).
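To make the dependency point concrete, here is a rough sketch of the two situations. Both function names are invented for illustration. In the first loop every divide needs the result of the previous one, so the full latency is paid each iteration. In the second, each divide has its own input, so an out-of-order core is free to overlap them. Whether it actually does depends on the CPU and on what the compiler generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Latency-bound: each division consumes the previous result, so the
// divisions cannot overlap.
double dependent_chain(double start, double divisor, std::size_t n) {
    assert(divisor != 0.0);
    double x = start;
    for (std::size_t i = 0; i < n; ++i) {
        x = x / divisor;  // depends on the x from the previous iteration
    }
    return x;
}

// Independent divisions: each quotient depends only on its own input.
// Only the (cheap) additions form a dependency chain.
double independent_sum(const std::vector<double>& xs, double divisor) {
    assert(divisor != 0.0);
    double sum = 0.0;
    for (double x : xs) {
        sum += x / divisor;
    }
    return sum;
}
```

Timing both over the same number of divides should give an idea of how much of the divide latency your hardware can hide. Treat that as something to measure, not a guaranteed outcome.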

Having said that, my gut feeling is that divide will be faster than a log function on most architectures.

I'm pretty sure that doing a log computation via whatever algorithm is going to be rather more expensive than even FP division would be.

Of course the only way to be sure is to code it up and measure the performance of the code. From your description it sounds like it shouldn't be too difficult to implement both versions and try it side-by-side.
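A minimal harness for such a side-by-side comparison might look like the following. It is only a sketch: BenchResult, bench, bench_divide and bench_log are names made up here, and std::chrono::steady_clock stands in for the mach_absolute_time( ) used in the measurement mentioned earlier. The checksum is returned so the compiler cannot discard the work as dead code.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct BenchResult {
    double checksum;     // sum of all results; keeps the work observable
    double ns_per_call;  // average wall-clock nanoseconds per operation
};

// Applies op to every input once and times the whole loop.
template <typename Op>
BenchResult bench(const std::vector<double>& inputs, Op op) {
    assert(!inputs.empty());
    const auto start = std::chrono::steady_clock::now();
    double checksum = 0.0;
    for (double x : inputs) {
        checksum += op(x);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return {checksum, elapsed.count() / static_cast<double>(inputs.size())};
}

BenchResult bench_divide(const std::vector<double>& inputs, double divisor) {
    return bench(inputs, [divisor](double x) { return x / divisor; });
}

BenchResult bench_log(const std::vector<double>& inputs) {
    return bench(inputs, [](double x) { return std::log(x); });
}
```

Fill the input vector with data representative of your real workload (the cost is data dependent), use a few million elements, build with optimizations enabled, and repeat each measurement several times before comparing ns_per_call. With optimizations on, the compiler may vectorize the loop or turn the divide into a multiply under relaxed floating-point flags, so it is worth checking the generated assembly before trusting the numbers.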
