Some questions related to cache performance (computer architecture)

Submitted by 孤街醉人 on 2021-02-04 08:28:45

Question


Details about the X5650 processor are available at https://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%20X5650%20-%20AT80614004320AD%20(BX80614X5650).html

Important notes: L3 cache size: 12288 KB; cache line size: 64 bytes.

Consider the following two functions, which each increment the values in an array by 100.

void incrementVector1(INT4* v, int n) {
    for (int k = 0; k < 100; ++k) {
        for (int i = 0; i < n; ++i) {
            v[i] = v[i] + 1;
        }
    }
}

void incrementVector2(INT4* v, int n) {
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < 100; ++k) {
            v[i] = v[i] + 1;
        }
    }
}
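
For reference, the array sizes used in the measurements can be compared with the cache figures quoted above. The following is only a sketch of that arithmetic: it assumes INT4 is a 4-byte integer (the name suggests this, but the question does not define it), and the helper names are made up for illustration. It also ignores everything else that competes for L3 space.

```c
#include <stddef.h>
#include <stdint.h>

#define L3_BYTES   (12288u * 1024u)  /* L3 size quoted above: 12288 KB */
#define LINE_BYTES 64u               /* cache line size quoted above */

/* Bytes occupied by n elements (assumption: INT4 is 4 bytes wide). */
static size_t array_bytes(size_t n) { return n * sizeof(int32_t); }

/* Number of 4-byte elements that share one cache line. */
static size_t elems_per_line(void) { return LINE_BYTES / sizeof(int32_t); }

/* Largest element count whose array equals the L3 capacity. */
static size_t max_elems_in_l3(void) { return L3_BYTES / sizeof(int32_t); }

/* 1 if the array alone is no larger than L3, 0 otherwise. */
static int fits_in_l3(size_t n) { return array_bytes(n) <= L3_BYTES; }
```

Under these assumptions, L3 holds at most 3,145,728 elements, so the arrays for n = 1000000 (4,000,000 bytes) and n = 3000000 (12,000,000 bytes) are within the L3 capacity, while n = 5000000 (20,000,000 bytes) and larger are not. Each 64-byte line holds 16 elements.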

The following data, collected using the perf utility, captures runtime information for executing each of these functions on the Intel Xeon X5650 processor for various data sizes. In this data:
• the program vector1.bin executes the function incrementVector1;
• the program vector2.bin executes the function incrementVector2;
• the programs take a command line argument which sets the value of n;
• both programs begin by allocating an array of size n and initializing all elements to 0;
• LLC-loads means “last level cache loads”, the number of accesses to L3;
• LLC-load-misses means “last level cache misses”, the number of L3 cache misses.
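
The source of vector1.bin is not shown in the question, so the following is only a guess at what its core might look like, based on the description above. The typedef for INT4 and the helper run_vector1 are assumptions introduced for illustration; the kernel is copied from the question.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int32_t INT4;  /* assumption: INT4 is a 4-byte signed integer */

/* Same loop order as incrementVector1 in the question. */
static void incrementVector1(INT4 *v, int n) {
    for (int k = 0; k < 100; ++k)
        for (int i = 0; i < n; ++i)
            v[i] = v[i] + 1;
}

/* Hypothetical driver: allocate n zeroed elements, run the kernel,
   and return the element sum (or -1 if allocation fails).
   Returning the sum keeps the work observable to the compiler. */
static long long run_vector1(int n) {
    INT4 *v = calloc((size_t)n, sizeof *v);
    if (v == NULL) return -1;
    incrementVector1(v, n);
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += v[i];
    free(v);
    return sum;
}
```

A main function would parse n from the command line (for example with strtol) and call run_vector1(n). A measurement like the ones below could then plausibly be taken with something like perf stat -e LLC-loads,LLC-load-misses ./vector1.bin 1000000, although the exact command used in the question is not given.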
Runtime performance of vector1.bin.
Performance counter stats for ’./vector1.bin 1000000’:
230,070      LLC-loads
3,280        LLC-load-misses           #    1.43% of all LL-cache references
0.383542737 seconds time elapsed
Performance counter stats for ’./vector1.bin 3000000’:
669,884      LLC-loads
242,876      LLC-load-misses           #   36.26% of all LL-cache references
1.156663301 seconds time elapsed
Performance counter stats for ’./vector1.bin 5000000’:
1,234,031    LLC-loads
898,577      LLC-load-misses           #   72.82% of all LL-cache references
1.941832434 seconds time elapsed
Performance counter stats for ’./vector1.bin 7000000’:
1,620,026      LLC-loads
1,142,275      LLC-load-misses           #   70.51% of all LL-cache references
2.621428714 seconds time elapsed
Performance counter stats for ’./vector1.bin 9000000’:
2,068,028      LLC-loads
1,422,269      LLC-load-misses           #   68.77% of all LL-cache references
3.308037628 seconds time elapsed
Runtime performance of vector2.bin.
Performance counter stats for ’./vector2.bin 1000000’:
16,464     LLC-loads
1,168      LLC-load-misses            #   7.049% of all LL-cache references
0.319311959 seconds time elapsed
Performance counter stats for ’./vector2.bin 3000000’:
42,052      LLC-loads
17,027      LLC-load-misses           #   40.49% of all LL-cache references
0.954854798 seconds time elapsed
Performance counter stats for ’./vector2.bin 5000000’:
63,991      LLC-loads
38,459      LLC-load-misses           #   60.10% of all LL-cache references
1.593526338 seconds time elapsed
Performance counter stats for ’./vector2.bin 7000000’:
99,773      LLC-loads
56,481      LLC-load-misses           #   56.61% of all LL-cache references
2.198810471 seconds time elapsed
Performance counter stats for ’./vector2.bin 9000000’:
120,456     LLC-loads
76,951      LLC-load-misses           #   63.88% of all LL-cache references
2.772653964 seconds time elapsed
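
The percentages that perf prints are simply LLC-load-misses divided by LLC-loads. A small helper, whose name is hypothetical and which is given here only as a sketch, makes it easy to recompute them from the raw counts:

```c
/* Miss rate in percent: misses / loads * 100.
   Returns 0.0 when loads is 0 to avoid dividing by zero. */
static double miss_rate_percent(unsigned long misses, unsigned long loads) {
    return loads ? 100.0 * (double)misses / (double)loads : 0.0;
}
```

For example, miss_rate_percent(3280, 230070) is about 1.43 and miss_rate_percent(898577, 1234031) is about 72.82, matching the first and third vector1.bin runs above.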

I have two questions:

  1. Consider the cache miss rates for vector1.bin. Between the vector sizes 1000000 and 5000000, the cache miss rate drastically increases. What is the cause of this increase in cache miss rate?
  2. Consider the cache miss rates for both programs. Notice that the miss rate between the two programs is roughly equal for any particular array size. Why is that?

Source: https://stackoverflow.com/questions/65954222/some-questions-related-to-cache-performance-computer-architecture
