I have just noticed that the execution time of a script of mine nearly halves just by changing a multiplication to a division.
To investigate this, I have written a small benchmark.
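A minimal sketch of the kind of timing comparison being described, assuming NumPy arrays (the array size and the constants 0.5 and 2.0 are illustrative, not taken from the original script):

```python
# Minimal sketch of a multiply-vs-divide timing comparison, assuming NumPy.
# The array size and the constants 0.5 and 2.0 are illustrative only.
import timeit

import numpy as np

a = np.random.rand(1_000_000)

mul_time = timeit.timeit(lambda: a * 0.5, number=1_000)
div_time = timeit.timeit(lambda: a / 2.0, number=1_000)

print(f"multiply by 0.5: {mul_time:.3f} s")
print(f"divide by 2.0:   {div_time:.3f} s")
```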
This answer only looks at the vectorised operations; why the other operations are slow has already been answered by ead.
A lot of "optimisations" are based on old hardware. The assumptions behind them held true on older hardware, but they no longer hold true on newer hardware.
Division is slow: a division instruction is carried out by several hardware units, each performing one step of the calculation, one after another. That long chain of dependent steps is what gives a single division its high latency.
However, in the floating-point unit (FPU) found on most modern CPUs, those units are arranged in a pipeline for the division instruction. Once a unit has finished its step it isn't needed again for that operation, so if you have several divisions to perform, the units that would otherwise sit idle can start on the next division straight away. Each individual division is still slow, but the FPU can achieve a high throughput of division operations. Pipelining isn't the same as vectorisation, but the result is much the same: higher throughput when you have lots of the same operation to do.
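To make the latency/throughput distinction concrete, here is a hedged sketch (assuming NumPy; the array size and repeat count are arbitrary) comparing independent divisions, which can be overlapped in the pipeline (and by SIMD), with a dependent chain of divisions, where each step has to wait for the previous result:

```python
# Sketch contrasting independent divisions (the divide pipeline can be
# kept full) with a dependent chain (each division waits on the previous
# result). Array size and repeat count are illustrative assumptions; SIMD
# and NumPy loop overhead also contribute to the gap, not pipelining alone.
import timeit

import numpy as np

x = np.random.rand(1_000_000) + 0.5  # keep values well away from zero

# Independent: every element divided by a constant; many divisions can be
# in flight at once.
independent = timeit.timeit(lambda: x / 3.0, number=100)

# Dependent: a running quotient ((x[0] / x[1]) / x[2]) / ... where each
# division cannot start until the previous one has finished. (The running
# quotient may overflow to inf; on typical hardware that does not change
# the timing.)
dependent = timeit.timeit(lambda: np.divide.reduce(x), number=100)

print(f"independent divisions: {independent:.3f} s")
print(f"dependent chain:       {dependent:.3f} s")
```

If pipelining (plus SIMD) is doing its job, the dependent chain should come out noticeably slower, even though both runs perform the same number of divisions.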
Think of pipelining like traffic: compare three lanes moving at 30 mph with one lane moving at 90 mph. Each individual car in the slower traffic takes longer to arrive, but the three-lane road still has the same throughput.