Question
This question is about the speed of accessing elements of arrays and slices, not about the efficiency of passing them to functions as arguments.
I would expect arrays to be faster than slices in most cases, because a slice is a data structure describing a contiguous section of an array, so there may be an extra step involved when accessing elements of a slice (indirectly, the elements of its underlying array).
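For context, here is a rough sketch of that indirection; the field names are illustrative, since the real slice header lives in the Go runtime and its layout is an implementation detail:

package sketch

import "unsafe"

// sliceHeader is an illustrative stand-in for how a slice value refers to
// its backing array: a pointer plus a length and capacity, so element
// access goes through the pointer.
type sliceHeader struct {
	data unsafe.Pointer // pointer to the first element visible to the slice
	len  int            // number of elements accessible via indexing
	cap  int            // extent of the backing array from data onward
}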
So I wrote a little test to benchmark a batch of simple operations. There are 4 benchmark functions: the first 2 test a global slice and a global array, the other 2 test a local slice and a local array:
// Benchmarks live in a _test.go file.
package main

import "testing"

var gs = make([]byte, 1000) // Global slice
var ga [1000]byte           // Global array

func BenchmarkSliceGlobal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		for j, v := range gs {
			gs[j]++
			gs[j] = gs[j] + v + 10
			gs[j] += v
		}
	}
}

func BenchmarkArrayGlobal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		for j, v := range ga {
			ga[j]++
			ga[j] = ga[j] + v + 10
			ga[j] += v
		}
	}
}

func BenchmarkSliceLocal(b *testing.B) {
	var s = make([]byte, 1000)
	for i := 0; i < b.N; i++ {
		for j, v := range s {
			s[j]++
			s[j] = s[j] + v + 10
			s[j] += v
		}
	}
}

func BenchmarkArrayLocal(b *testing.B) {
	var a [1000]byte
	for i := 0; i < b.N; i++ {
		for j, v := range a {
			a[j]++
			a[j] = a[j] + v + 10
			a[j] += v
		}
	}
}
I ran the test multiple times; here is the typical output (go test -bench .*):
BenchmarkSliceGlobal 300000 4210 ns/op
BenchmarkArrayGlobal 300000 4123 ns/op
BenchmarkSliceLocal 500000 3090 ns/op
BenchmarkArrayLocal 500000 3768 ns/op
Analyzing the results:
- Accessing the global slice is slightly slower than accessing the global array, which is as I expected: 4210 vs 4123 ns/op.
- But accessing the local slice is significantly faster than accessing the local array: 3090 vs 3768 ns/op.
My question is: What is the reason for this?
Notes
I tried varying the following things but none changed the outcome:
- the size of the array/slice (tried 100, 1000, 10000)
- the order of the benchmark functions
- the element type of the array/slice (tried byte and int)
Answer 1:
Comparing the amd64 assembly of both BenchmarkArrayLocal and BenchmarkSliceLocal (too long to fit in this post):
The array version loads the address of a from memory multiple times, practically on every array-access operation:
LEAQ "".a+1000(SP),BX
The slice version, by contrast, computes exclusively on registers after loading the slice's base address from memory once:
LEAQ (DX)(SI*1),BX
This is not conclusive, but it is probably the cause, since the two methods are otherwise virtually identical. One other notable detail is that the array version calls into runtime.duffcopy, which is quite a long assembly routine, whereas the slice version doesn't.
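One plausible explanation for the duffcopy call, sketched below with benchmark names of my own (not from the answer): when a range loop uses two iteration variables, the range expression is evaluated once, so ranging over an array value iterates over a copy of the whole array, whereas ranging over a slice of the same array only copies the small slice header.

package main

import "testing"

var arr [1000]byte

// With two iteration variables, the range expression is evaluated once,
// so this loop iterates over a copy of arr (a bulk copy that can show up
// as runtime.duffcopy in the generated code).
func BenchmarkRangeArrayValue(b *testing.B) {
	for i := 0; i < b.N; i++ {
		for j, v := range arr {
			arr[j] += v
		}
	}
}

// Ranging over a slice of the same array copies only the slice header;
// the elements themselves are not duplicated.
func BenchmarkRangeArraySlice(b *testing.B) {
	for i := 0; i < b.N; i++ {
		for j, v := range arr[:] {
			arr[j] += v
		}
	}
}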
Answer 2:
Go version 1.8 can eliminate some range checks, so the difference got bigger:
BenchmarkSliceGlobal-4 500000 3220 ns/op
BenchmarkArrayGlobal-4 1000000 1287 ns/op
BenchmarkSliceLocal-4 1000000 1267 ns/op
BenchmarkArrayLocal-4 1000000 1301 ns/op
For arrays I'd recommend using power-of-two sizes and including a logical AND operation on the index; that way you can be sure the compiler eliminates the bounds check. Thus var ga [1024]byte with ga[j & 1023].
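A minimal sketch of that suggestion, using the [1024]byte array from the answer; the helper function touch and the sample index are mine:

package main

import "fmt"

// ga has a power-of-two length, as the answer suggests.
var ga [1024]byte

// touch masks an arbitrary index with 1023, which keeps it provably within
// [0, 1023], so the compiler can elide the bounds check even when it knows
// nothing else about i.
func touch(i int) {
	ga[i&1023]++
}

func main() {
	touch(5000) // wraps around: 5000 & 1023 == 904
	fmt.Println(ga[904])
}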
Source: https://stackoverflow.com/questions/30525184/array-vs-slice-accessing-speed