Suppose I want to measure the time that a certain piece of code takes. For that I would normally do something like this:
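clock_t startTime = clock();
// do stuff
clock_t endTime = clock();
double elapsedSeconds = double(endTime - startTime) / CLOCKS_PER_SEC;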
I have prepared two very simple classes. The first one, ProfileHelper, records the start time in its constructor and the end time in its destructor. The second, ProfileHelperStatistic, is a container with extra statistical capabilities (a std::multimap plus a few methods that return the average, the standard deviation and other fun stuff).
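To give an idea of the shape of these two classes, here is a minimal sketch. It is not the original code: the method names (add, count, average, standardDeviation), the string label used as the multimap key, and the choice of std::chrono::steady_clock instead of clock() are all my assumptions for illustration.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical container: keeps every sample under its label and
// derives simple statistics from the stored values.
class ProfileHelperStatistic {
public:
    void add(const std::string& label, double seconds) {
        samples_.emplace(label, seconds);
    }

    std::size_t count(const std::string& label) const {
        return samples_.count(label);
    }

    // Arithmetic mean of the samples stored for `label`; 0 if there are none.
    double average(const std::string& label) const {
        const auto range = samples_.equal_range(label);
        double sum = 0.0;
        std::size_t n = 0;
        for (auto it = range.first; it != range.second; ++it) {
            sum += it->second;
            ++n;
        }
        return n == 0 ? 0.0 : sum / static_cast<double>(n);
    }

    // Population standard deviation (divides by n, not n - 1); 0 if no samples.
    double standardDeviation(const std::string& label) const {
        const auto range = samples_.equal_range(label);
        const double mean = average(label);
        double sumSquares = 0.0;
        std::size_t n = 0;
        for (auto it = range.first; it != range.second; ++it) {
            const double diff = it->second - mean;
            sumSquares += diff * diff;
            ++n;
        }
        return n == 0 ? 0.0 : std::sqrt(sumSquares / static_cast<double>(n));
    }

private:
    std::multimap<std::string, double> samples_;
};

// Hypothetical RAII timer: the constructor stores the start time and the
// destructor pushes the elapsed seconds into the statistics container.
class ProfileHelper {
public:
    ProfileHelper(ProfileHelperStatistic& stats, std::string label)
        : stats_(stats),
          label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ProfileHelper() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        stats_.add(label_, elapsed.count());
    }

    ProfileHelper(const ProfileHelper&) = delete;
    ProfileHelper& operator=(const ProfileHelper&) = delete;

private:
    ProfileHelperStatistic& stats_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Usage: the block is timed from the declaration to the closing brace.
void timedWork(ProfileHelperStatistic& stats) {
    ProfileHelper timer(stats, "timedWork");
    // do stuff
}
```

The advantage over the manual version is that the end time is taken on every exit path (early return, exception), because the destructor always runs when the scope ends.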
I have often used this idea for profiling. I guess you could make it work even in a multithreaded environment. It would require a bit of work, but I don't think it would be very difficult.
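One possible way to handle threads, as a rough sketch only: guard the shared container with a std::mutex so that several scoped timers can report into it at the same time. The names SharedTimings, ScopedTimer and timeInParallel are hypothetical and exist only for this example; this is just one option, and per-thread containers merged at the end would be another.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe container: every access takes the mutex.
class SharedTimings {
public:
    void add(const std::string& label, double seconds) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.emplace(label, seconds);
    }

    std::size_t count(const std::string& label) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return samples_.count(label);
    }

private:
    mutable std::mutex mutex_;
    std::multimap<std::string, double> samples_;
};

// Hypothetical RAII timer that reports into the shared container.
class ScopedTimer {
public:
    ScopedTimer(SharedTimings& timings, std::string label)
        : timings_(timings),
          label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        timings_.add(label_, elapsed.count());
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    SharedTimings& timings_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Starts `threads` workers; each one times `iterations` scopes under the
// same label, then all workers are joined before returning.
void timeInParallel(SharedTimings& timings, int threads, int iterations) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&timings, iterations] {
            for (int i = 0; i < iterations; ++i) {
                ScopedTimer timer(timings, "worker");
                // do stuff
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```

Keep in mind that the lock is taken in the timer's destructor, so with very short timed blocks the contention on the mutex can become visible in the measurements of the surrounding code.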
Have a look at this question for more information: C++ Benchmark tool.