Cache Memory Optimization Array Transpose: C

柔情痞子 提交于 2019-12-03 08:10:21

So, in the first instance you have two cache lines of 8 bytes each for a total of 16 bytes. I'll assume an int data size of 4 bytes. Given the placement of arrays in C and the offsets you've provided these are the memory lines which can be cached:

Cacheable lines:
#A: &src[0][0] = 0x00, &src[0][1] = 0x04
#B: &src[1][0] = 0x08, &src[1][1] = 0x0C
#C: &dst[0][0] = 0x10, &dst[0][1] = 0x14
#D: &dst[1][0] = 0x18, &dst[1][1] = 0x1C

Then we need to know the access order that each memory address is visited by the program. I'm assuming no optimizations which might cause reorderings by the compiler.

Access order and cache behavior (assuming initially empty):
#1: load src[0][0] --> Miss line A = cache slot 1
#2: save dst[0][0] --> Miss line C = cache slot 2
#3: load src[0][1] --> Hit  line A = cache slot 1
#4: save dst[0][1] --> Hit  line C = cache slot 2
#5: load src[1][0] --> Miss line B = cache slot 1 (LRU, replaces line A)
#6: save dst[1][0] --> Miss line D = cache slot 2 (LRU, replaces line C)
#7: load src[1][1] --> Hit  line B = cache slot 1
#8: save dst[1][1] --> Hit  line D = cache slot 2

Which, I think, matches your second answer. Then with a cache size of 32 bytes (4 lines), assuming all other factors are constant:

Access order and cache behavior (assuming initially empty):
#1: load src[0][0] --> Miss line A = cache slot 1
#2: save dst[0][0] --> Miss line C = cache slot 2
#3: load src[0][1] --> Hit  line A = cache slot 1
#4: save dst[0][1] --> Hit  line C = cache slot 2
#5: load src[1][0] --> Miss line B = cache slot 3
#6: save dst[1][0] --> Miss line D = cache slot 4
#7: load src[1][1] --> Hit  line B = cache slot 3
#8: save dst[1][1] --> Hit  line D = cache slot 4

They are identical. The only difference would be if you reran transpose again. In case 1 you would get the exact same behavior (well, you'ld start with the cache full of all the wrong things, so it might as well be empty). In the larger cache case, though, everything you need for the second call is already cached, so there will be no cache misses.

The difference between my answers and yours is most likely due to our assumptions about the behavior of the compiler for your loop count registers (i and j). I would assume they are both stored in registers (and so would not affect the data cache). You may need to assume they are in memory somewhere (and therefore interact with the cache) to get the expected results.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!