I am trying to measure, with very high accuracy, the time taken by some code inside the Linux kernel, using a Linux kernel module.
For this purpose, I have tried rdtscl.
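Some of the things mentioned here are accurate, such as the TSC not being a reliable measure of time because of CPU power states (on older CPUs the TSC rate changes with frequency scaling, and the counter can stop in deep idle states). But I think the TSC can be used for relative sequencing even in a multi-core environment. There is a flag called TscInvariant (invariant TSC) which is set to true on Intel CPUs from the Nehalem architecture onward. On those CPUs the TSC advances at a constant rate on all cores. Therefore, as long as the per-core counters are synchronized (which is typically the case within a single socket), you should not see the TSC count go backwards if you get context switched to a different core.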
On Ubuntu you can verify this on your desktop with:
sudo apt-get install cpuid
cpuid | grep TscInvariant
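The same check can also be done from a program. The following is only a user-space sketch, assuming GCC or Clang on x86-64; the helper names has_invariant_tsc and read_tsc are made up for illustration. It tests bit 8 of EDX in CPUID leaf 0x80000007, which is where the invariant TSC flag is reported, and reads the counter with the __rdtsc intrinsic. It is not kernel module code, and __rdtsc on its own is not a serializing read.

```c
#include <cpuid.h>      /* __get_cpuid (GCC/Clang helper) */
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc intrinsic */

/* Hypothetical helper: returns 1 if CPUID.80000007H:EDX[8]
 * (invariant TSC) is set, 0 if it is clear or the leaf is unsupported. */
static int has_invariant_tsc(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid returns 0 when the requested leaf is not available. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((edx >> 8) & 1u);
}

/* Hypothetical helper: raw, unserialized read of the time stamp counter. */
static uint64_t read_tsc(void)
{
    return (uint64_t)__rdtsc();
}
```

A caller would check has_invariant_tsc() once at startup and, if it returns 1, take two read_tsc() samples around the code of interest and subtract them to get elapsed ticks. Converting ticks to time still requires knowing the TSC frequency, which this sketch does not determine.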