Here are a couple of catch-all approaches to optimization.
There is no single recipe for optimization problems; they are always hand-tuned to the hardware, software, and system considerations at hand.
Assuming you have the best algorithm:
- compile with "show assembly output" and "highest optimization"
- look at the assembly output
- identify inefficiencies that defeat compiler optimizations or cause poor cache behavior
- re-code the snippet
- if it is still bad, loop back to step 2 (look at the assembly output)
- done
Example seen here: What is the fastest way to swap values in C?
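As a minimal sketch of that loop, here are two swap variants you could feed to the compiler and compare. The function names and the file name are made up for illustration. With GCC or Clang, something like `gcc -O3 -S swap.c` writes the assembly to `swap.s`; other compilers have their own switches.

```c
#include <assert.h>

/* Plain swap through a temporary; an optimizing compiler will
   usually keep tmp in a register. */
void swap_tmp(int *a, int *b)
{
    assert(a && b);
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* XOR swap: no named temporary, but each step depends on the
   previous one, and it needs a guard for the aliasing case. */
void swap_xor(int *a, int *b)
{
    assert(a && b);
    if (a == b)
        return;   /* without this guard, x ^= x would zero the value */
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Looking at the generated assembly for both is the point of the exercise: the "clever" version is not necessarily the shorter or faster one, so measure before keeping it.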
General tips:
http://www.ceet.niu.edu/faculty/kuo/exp/exp6/figuree6-1.jpg:
- Try it in floating point first
- Try it in fixed point second
- If you are really desperate and have lots of time and money, try it in assembly
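To make the floating point versus fixed point step concrete, here is a rough sketch of the same scaling operation written both ways. The Q16.16 format, the `q16_16` type, and all function names are assumptions chosen for illustration, not something from the linked figure.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q16_16;   /* hypothetical Q16.16 type: 16 integer bits, 16 fraction bits */
#define Q16_ONE 65536     /* the value 1.0 in Q16.16 */

/* Convert to fixed point (truncates toward zero); the input must fit in 16 integer bits. */
q16_16 q16_from_double(double v)
{
    assert(v > -32768.0 && v < 32768.0);
    return (q16_16)(v * Q16_ONE);
}

double q16_to_double(q16_16 v)
{
    return (double)v / Q16_ONE;
}

/* Multiply in a 64-bit intermediate, then rescale back to Q16.16.
   Division is used instead of >> so negative values are well defined. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q16_ONE);
}

/* First attempt: floating point. */
double scale_float(double x, double gain)
{
    return x * gain;
}

/* Second attempt: the same operation in fixed point. */
q16_16 scale_fixed(q16_16 x, q16_16 gain)
{
    return q16_mul(x, gain);
}
```

Whether the fixed point version is actually faster depends on the target; on hardware with a fast FPU it may not be, which is why floating point comes first in the list.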
http://www.asc.edu/seminars/optimization/fig1.gif:
- Check whether it is memory-bound, I/O-bound, or CPU-bound
- Attack whichever is the limiting factor
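One rough way to probe the memory versus CPU question is to do the same arithmetic with different memory access patterns and time both. The sketch below is an assumption-laden illustration (the function names are hypothetical, and it does not cover the I/O case): it sums every element exactly once, either sequentially or in strided jumps.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Adds up all n elements exactly once, visiting them in
   stride-sized jumps. stride == 1 is a plain sequential pass. */
long sum_strided(const int *data, size_t n, size_t stride)
{
    assert(stride > 0);
    long total = 0;
    for (size_t start = 0; start < stride; ++start)
        for (size_t i = start; i < n; i += stride)
            total += data[i];
    return total;
}

/* Returns the CPU seconds spent in one call and stores the sum in *out. */
double time_sum(const int *data, size_t n, size_t stride, long *out)
{
    clock_t t0 = clock();
    *out = sum_strided(data, n, stride);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

If, on an array much larger than the cache, a large stride is noticeably slower than a stride of 1 even though the number of additions is identical, that suggests the loop is limited by memory rather than by the CPU. A profiler gives a more reliable answer; this is only a quick check.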