binary search efficiency vs. linear search efficiency in fortran

荒凉一梦 提交于 2019-11-28 10:11:12

For small arrays, the problem is not cache. You are right: A small array is likely to be cached quickly.

The problem is that branch prediction is likely to fail for binary search because branches are taken or skipped at random in a data-dependent way. Branch prediction misses stall the CPU pipeline.

This effect can be severe. You can easily search 3 to 8 elements linearly in the same time it takes to do a single binary search branch (and you need to do multiple binary search branches). The exact break even point needs to be measured.

Stalling the CPU pipeline is extremely expensive. A Core i7 can retire up to 4 instructions per clock cycle (12 giga-instructions per second at 3 GHz!). But only, if you are not stalling.

There are branch-free algorithms doing binary search by using conditional-move CPU instructions. These algorithms basically unroll 32 search steps and use a CMOV in each step (32 steps are the theoretical maximum). They are branch-free but not stall free: Each next step depends 100% on the previous one so the CPU cannot charge ahead in the instruction stream. It has to wait all the time. So they don't solve this problem, only improve it slightly.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!