Cycles/cost for L1 Cache hit vs. Register on x86?

后端 未结 4 1209
春和景丽
春和景丽 2020-12-04 13:42

I remember assuming that an L1 cache hit is 1 cycle (i.e. identical to register access time) in my architecture class, but is that actually true on modern x86 processors?

4条回答
  •  猫巷女王i
    2020-12-04 13:56

    Actually the cost of the L1 cache hit is almost the same as a cost of register access. It was surprising for me, but this is true, at least for my processor (Athlon 64). Some time ago I written a simple test application to benchmark efficiency of access to the shared data in a multiprocessor system. The application body is a simple memory variable incrementing during the predefined period of time. To make a comapison, I benchmarked non-shared variable at first. And during that activity I captured the result, but then during application disassembling I found that compiler was deceived my expectations and apply unwanted optimisation to my code. It just put variable in the CPU register and increment it iterativetly in the register without memory access. But real surprise was achived after I force compliler to use in-memory variable instead of register variable. On updated application I achived almost the same benchmarking results. Performance degradation was really negligeble (~1-2%) and looks like related to some side effect.

    As result:

    1) I think you can consider L1 cache as an unmanaged processor registers pool.

    2) There is no any sence to apply brutal assambly optimization by forcing compiler store frequently accesing data in processor registers. If they are really frequently accessed, they will live in the L1 cache, and due to this will have same access cost as the processor register.

提交回复
热议问题