This question is about the speed of accessing elements of arrays and slices, not about the efficiency of passing them to functions as arguments.
I would exp
Comparing the amd64 assembly of both BenchmarkArrayLocal and BenchmarkSliceLocal (too long to fit in this post):
The array version loads the address of the array a from memory multiple times, practically on every array-access operation:
LEAQ "".a+1000(SP),BX
The slice version, by contrast, computes exclusively on registers after loading once from memory:
LEAQ (DX)(SI*1),BX
This is not conclusive, but it is probably the cause, because the two methods are otherwise virtually identical. One other notable detail is that the array version calls into runtime.duffcopy, which is quite a long assembly routine, whereas the slice version does not.
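For anyone who wants to look at this kind of comparison themselves, here is a minimal sketch. It is not the original BenchmarkArrayLocal / BenchmarkSliceLocal code, which is not reproduced here. The names sumArrayLocal, sumSliceLocal, size and sink are hypothetical, and the length of 1000 is an arbitrary assumption. The generated assembly (and the timings) will likely differ between Go versions.

```go
package main

import (
	"fmt"
	"testing"
)

// size is an assumed element count, chosen only for illustration.
const size = 1000

// sink keeps the results alive so the compiler cannot discard the loops.
var sink int

// sumArrayLocal fills and then sums a function-local array.
func sumArrayLocal() int {
	var a [size]int
	for i := 0; i < size; i++ {
		a[i] = i
	}
	total := 0
	for i := 0; i < size; i++ {
		total += a[i]
	}
	return total
}

// sumSliceLocal does the same work on a function-local slice.
func sumSliceLocal() int {
	s := make([]int, size)
	for i := 0; i < size; i++ {
		s[i] = i
	}
	total := 0
	for i := 0; i < size; i++ {
		total += s[i]
	}
	return total
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	arr := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumArrayLocal()
		}
	})
	slc := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumSliceLocal()
		}
	})
	fmt.Println("array:", arr)
	fmt.Println("slice:", slc)
}
```

Building it with go build -gcflags=-S prints the assembly the compiler generates, so the address computations in the two functions can be compared side by side.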