I am writing a C code for measuring the number of clock cycles needed to acquire a semaphore. I am using rdtsc, and before doing the measurement on the semaphore, I call rdt
In the face of thermal and idle throttling, mouse-motion and network traffic interrupts, whatever it's doing with the GPU, and all the other overhead that a modern multicore system can absorb without anyone much caring, I think your only reasonable course for this is to accumulate a few thousand individual samples and just toss the outliers before taking the median or mean (not a statistician but I'll venture it won't make much difference here).
I'd think anything you do to eliminate the noise of a running system will skew the results much worse than just accepting that there's no way you'll ever be able to reliably predict how long it'll take anything to complete these days.