Everyone says to profile your program before optimizing it, but no one ever describes how to do so.
What are your practices for profiling C code?
For the sake of completeness, I would add oprofile. It is especially interesting if you want to profile the kernel, since it can sample kernel code as well as your own program.