So I am trying to measure the latencies of the L1, L2, and L3 caches using C. I know their sizes and I feel I understand conceptually how to do it, but I am running into problems.
Not really an answer, but read it anyway; some of this has already been mentioned in other answers and comments here.
Just the other day I answered this question:

It is about measuring L1/L2/.../L?/MEMORY transfer rates; take a look at it for a better starting point on your problem.
[Notes]
I strongly recommend using the RDTSC instruction for time measurement, especially for L1, as anything else is too slow. Do not forget to set process affinity to a single CPU, because each core has its own counter and their counts can differ a lot even at the same input clock!!!

Lock the CPU clock at maximum on machines with a variable clock, and do not forget to account for RDTSC overflow if you use just the 32-bit part (a modern CPU overflows a 32-bit counter in about a second). To convert ticks to time, use the CPU clock frequency (measure it, or read it from the registry):
t0 <- RDTSC
Sleep(250);
t1 <- RDTSC
CPU f = (t1-t0)<<2 [Hz]   // ticks per 250 ms, times 4
set process affinity to a single CPU
CPU cores usually have their own L1 and L2 caches, so on a multitasking OS you can measure confusing things if you do not do this
do graphical output (a diagram)
then you will see what actually happens; in the link above I posted quite a few plots
use the highest process priority the OS offers