This question is about the efficiency of a linear search vs. a binary search on a pre-sorted array in contiguous storage...
For small arrays, the problem is not caching. You are right: a small array is likely to be cached quickly.
The problem is that branch prediction is likely to fail for binary search, because its branches are taken or not taken essentially at random, in a data-dependent way. Branch mispredictions stall the CPU pipeline.
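To make the failure mode concrete, here is a minimal sketch of a textbook lower-bound binary search (my code, not from the question): the marked comparison goes left or right depending on the data, so for random keys the predictor is wrong roughly half the time.

```cpp
#include <cstddef>

// Textbook lower-bound binary search over a sorted array.
std::size_t binary_search(const int* a, std::size_t n, int key) {
    std::size_t lo = 0, hi = n;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)   // data-dependent branch: for random keys it is
            lo = mid + 1;   // taken ~50% of the time, so the predictor has
        else                // nothing to learn and mispredicts often
            hi = mid;
    }
    return lo;              // index of the first element >= key
}
```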
This effect can be severe. You can easily search 3 to 8 elements linearly in the same time it takes to execute a single binary-search branch (and you need several of them). The exact break-even point has to be measured.
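For comparison, a minimal linear-search sketch (again my code, with hypothetical names): because the array is sorted, the loop condition evaluates to true, true, ..., true, then false, so the predictor misses at most once per search instead of once per step.

```cpp
#include <cstddef>

// Linear lower-bound search; attractive for small n.
std::size_t linear_search(const int* a, std::size_t n, int key) {
    std::size_t i = 0;
    while (i < n && a[i] < key)  // almost always "keep going": the branch
        ++i;                     // pattern is trivially predictable
    return i;                    // index of the first element >= key
}
```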
Stalling the CPU pipeline is extremely expensive. A Core i7 can retire up to 4 instructions per clock cycle (12 giga-instructions per second at 3 GHz!), but only if the pipeline is not stalled.
There are branch-free algorithms that do binary search using conditional-move (CMOV) CPU instructions. These algorithms basically unroll all 32 search steps (32 being the theoretical maximum, for 2^32 elements) and use a CMOV in each step. They are branch-free but not stall-free: each step depends entirely on the result of the previous one, so the CPU cannot charge ahead in the instruction stream; it has to wait at every step. They don't solve the problem, they only improve it slightly.
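Here is a sketch of the idea, assuming the compiler lowers the conditional expression to a CMOV (common with GCC and Clang, but not guaranteed): there is no data-dependent branch to mispredict, yet each iteration still depends on the previous iteration's result, so the serial dependency chain remains.

```cpp
#include <cstddef>

// Branch-free binary search in the style described above. Caveat: for a
// key greater than every element this returns the last index, so the
// caller must check a[result] if that case matters. Requires n >= 1.
std::size_t branchless_search(const int* a, std::size_t n, int key) {
    const int* base = a;
    std::size_t len = n;
    while (len > 1) {
        std::size_t half = len / 2;
        // Conditional expression instead of an if: compilers typically
        // emit a CMOV here, removing the unpredictable branch...
        base = (base[half - 1] < key) ? base + half : base;
        // ...but the next iteration still has to wait for this result,
        // so the pipeline stalls on the dependency chain instead.
        len -= half;
    }
    return static_cast<std::size_t>(base - a);
}
```

Note that the remaining loop branch depends only on n, not on the data, so it stays perfectly predictable; what limits this version is the load-to-use dependency between steps, exactly as described above.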