I am looking for some reference on average latencies for lock cmpxchg instruction for various intel processors. I am not able to locate any good reference on the topic
The best x86 instruction latency reference is probably that contained in Agner's optimization manuals, based on actual empirical measurements on various Intel/AMD/VIA chips and frequently updated for the latest CPUs on the market.
Unfortunately, I don't see the CMPXCHG
instruction listed in the instruction latency tables, but page 4 does state:
Instructions with a LOCK prefix have a long latency that depends on cache organization and possibly RAM speed. If there are multiple processors or cores or direct memory access (DMA) devices then all locked instructions will lock a cache line for exclusive access, which may involve RAM access. A LOCK prefix typically costs more than a hundred clock cycles, even on single-processor systems. This also applies to the XCHG instruction with a memory operand.