I am learning Go by going through A Tour of Go. One of the exercises there asks me to create a 2D slice of dy rows and dx columns containing
There are two ways to use slices to create a matrix. Let's take a look at the differences between them.
First method:
matrix := make([][]int, n)
for i := 0; i < n; i++ {
	matrix[i] = make([]int, m)
}
Second method:
matrix := make([][]int, n)
rows := make([]int, n*m)
for i := 0; i < n; i++ {
	matrix[i] = rows[i*m : (i+1)*m]
}
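As a minimal sketch, the two methods can be wrapped in functions and their layout checked at run time. The names newMatrixPerRow, newMatrixContiguous and isContiguous are hypothetical helpers introduced here for illustration, and the three-index slice expression is an optional addition (not part of the snippets above) that caps each row so an append on one row cannot overwrite the next.

```go
package main

import (
	"fmt"
	"unsafe"
)

// newMatrixPerRow allocates every row with its own make call (first method).
func newMatrixPerRow(n, m int) [][]int {
	matrix := make([][]int, n)
	for i := range matrix {
		matrix[i] = make([]int, m)
	}
	return matrix
}

// newMatrixContiguous allocates one backing slice and carves the rows out of
// it (second method). The third index limits each row's capacity to m.
func newMatrixContiguous(n, m int) [][]int {
	matrix := make([][]int, n)
	cells := make([]int, n*m)
	for i := range matrix {
		matrix[i] = cells[i*m : (i+1)*m : (i+1)*m]
	}
	return matrix
}

// isContiguous reports whether every row starts exactly where the previous
// row ends. It assumes every row has at least one element.
func isContiguous(matrix [][]int) bool {
	for i := 1; i < len(matrix); i++ {
		prev := matrix[i-1]
		end := uintptr(unsafe.Pointer(&prev[0])) + uintptr(len(prev))*unsafe.Sizeof(prev[0])
		if uintptr(unsafe.Pointer(&matrix[i][0])) != end {
			return false
		}
	}
	return true
}

func main() {
	// The per-row result depends on the allocator and may be true or false.
	fmt.Println("per-row contiguous:", isContiguous(newMatrixPerRow(4, 8)))
	// The single-allocation result is contiguous by construction.
	fmt.Println("single allocation contiguous:", isContiguous(newMatrixContiguous(4, 8)))
}
```

The first line of output is not guaranteed either way: in a quiet, single-goroutine program the allocator may well place the rows side by side.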
With the first method, successive make calls do not guarantee a contiguous matrix, so its rows may end up scattered in memory. Consider an example with two goroutines that could cause this:
1. Goroutine #0 runs make([][]int, n) to allocate memory for matrix, getting a piece of memory from 0x000 to 0x07F.
2. It then starts the loop and allocates the first row with make([]int, m), getting from 0x080 to 0x0FF.
3. Before the second iteration, the scheduler switches to goroutine #1, which also calls make (for its own purposes) and gets from 0x100 to 0x17F (right next to the first row of goroutine #0).
4. Goroutine #0 resumes, runs the make([]int, m) corresponding to the second loop iteration and gets from 0x180 to 0x1FF for the second row. At this point, the two rows are already separated in memory.
With the second method, the goroutine calls make([]int, n*m) to allocate the whole matrix in a single slice, ensuring contiguity. After that, a loop is needed to point each entry of matrix at the subslice corresponding to its row.
You can play with the code shown above in the Go Playground to see the difference in the memory assigned by each method. Note that I used runtime.Gosched() only to yield the processor and force the scheduler to switch to another goroutine.
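The following is a rough sketch of that kind of experiment, not the original Playground program. The function rowAddresses and the constants are assumptions made for illustration. Two goroutines each allocate rows one at a time and yield after every allocation, so that their allocations can interleave.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// rowAddresses builds an n x m matrix with one make call per row, yielding
// the processor after each allocation, and returns the address of the first
// element of every row.
func rowAddresses(n, m int) []*int {
	matrix := make([][]int, n)
	addrs := make([]*int, n)
	for i := range matrix {
		matrix[i] = make([]int, m)
		addrs[i] = &matrix[i][0]
		runtime.Gosched() // let another goroutine run (and allocate) in between
	}
	return addrs
}

func main() {
	const workers, n, m = 2, 4, 16
	results := make([][]*int, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			results[w] = rowAddresses(n, m) // each goroutine writes its own slot
		}(w)
	}
	wg.Wait()

	for w, addrs := range results {
		for i, p := range addrs {
			fmt.Printf("goroutine #%d row %d starts at %p\n", w, i, p)
		}
	}
}
```

The printed addresses vary between runs, Go versions, and platforms, so gaps between consecutive rows of the same goroutine may or may not show up on a given run.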
Which one to use? Imagine the worst case with the first method, i.e. no row is adjacent in memory to any other row. Then, if your program iterates through the matrix elements (to read or write them), there will probably be more cache misses (hence higher latency) than with the second method, because of worse data locality. On the other hand, with the second method it may not be possible to allocate a single piece of memory for the whole matrix because of memory fragmentation (free chunks spread all over the memory), even though in theory there may be enough free memory in total.
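One way to check the locality argument on your own machine is to time a full row-major traversal of both layouts. This is only a sketch: the helper names, the 1024 x 1024 size, and the use of testing.Benchmark from a main program are choices made here for illustration. The difference may be small, because a per-row matrix built by a single goroutine often ends up nearly contiguous anyway.

```go
package main

import (
	"fmt"
	"testing"
)

// newMatrixPerRow allocates every row separately (first method).
func newMatrixPerRow(n, m int) [][]int {
	matrix := make([][]int, n)
	for i := range matrix {
		matrix[i] = make([]int, m)
	}
	return matrix
}

// newMatrixContiguous slices the rows out of one allocation (second method).
func newMatrixContiguous(n, m int) [][]int {
	matrix := make([][]int, n)
	cells := make([]int, n*m)
	for i := range matrix {
		matrix[i] = cells[i*m : (i+1)*m]
	}
	return matrix
}

// sum reads every element in row-major order.
func sum(matrix [][]int) int {
	total := 0
	for _, row := range matrix {
		for _, v := range row {
			total += v
		}
	}
	return total
}

// sink keeps the compiler from discarding the result of sum.
var sink int

func main() {
	const n, m = 1024, 1024
	cases := []struct {
		name   string
		matrix [][]int
	}{
		{"per-row", newMatrixPerRow(n, m)},
		{"contiguous", newMatrixContiguous(n, m)},
	}
	for _, c := range cases {
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink = sum(c.matrix)
			}
		})
		fmt.Printf("%-10s %d ns/op\n", c.name, res.NsPerOp())
	}
}
```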
Therefore, unless memory is heavily fragmented and the matrix to be allocated is very large, you would generally want to use the second method to take advantage of data locality.