I tried out linux\' perf utility today and am having trouble in interpreting its results. I\'m used to valgrind\'s callgrind which is of course a totally different approach
Unless your program has very few functions and hardly ever calls a system function or I/O, profilers that sample the program counter won't tell you much, as you're discovering. In fact, the well-known profiler gprof was created specifically to try to address the uselessness of self-time-only profiling (not that it succeeded).
What actually works is something that samples the call stack (thereby finding out where the calls are coming from), on wall-clock time (thereby including I/O time), and report by line or by instruction (thereby pinpointing the function calls that you should investigate, not just the functions they live in).
Furthermore, the statistic you should look for is percent of time on stack, not number of calls, not average inclusive function time. Especially not "self time". If a call instruction (or a non-call instruction) is on the stack 38% of the time, then if you could get rid of it, how much would you save? 38%! Pretty simple, no?
An example of such a profiler is Zoom.
There are more issues to be understood on this subject.
Added: @caf got me hunting for the perf info, and since you included the command-line argument -g it does collect stack samples. Then you can get a call-tree report.
Then if you make sure you're sampling on wall-clock time (so you get wait time as well as cpu time) then you've got almost what you need.