x86-64 usage of LFENCE
I'm trying to understand the right way to use fences when measuring time with RDTSC/RDTSCP. Several related questions on SO have already been answered in detail, and I have gone through a few of them. I have also read this really helpful article on the same topic: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf

However, another blog shows an example that uses LFENCE instead of CPUID on x86. How does LFENCE prevent earlier stores from contaminating the RDTSC measurement? E.g. <Instr A>