I have a multiply-add kernel inside my application and I want to increase its performance.
I use an Intel Core i7-960 (3.2 GHz clock) and have already manually impl
Making this an answer from my comment.
On a non-server Linux distro I believe the timer interrupt is usually set to 250 Hz by default. That varies by distro, but it's almost always over 150 Hz. That rate is needed to provide a 30+ fps interactive GUI. The timer interrupt is what preempts running code, which means that 150+ times per second your code is interrupted so the scheduler can run and decide what gets the next time slice. It sounds like you're doing great to get 80% of max speed, so there's no problem there. If you need better, install, say, Ubuntu Server (100 Hz default) and tweak the kernel a bit (preemption off).
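To put rough numbers on that, here is a small Python sketch of the arithmetic. The 3.2 GHz clock comes from the question; the helper names are made up for illustration. `parse_config_hz` assumes you feed it the text of your kernel's build config, where the timer rate appears as `CONFIG_HZ`. On Debian-style systems that file is usually `/boot/config-<kernel release>`, but treat that location as an assumption and check your own distro.

```python
import re


def tick_period_ms(timer_hz: int) -> float:
    """Milliseconds between two timer interrupts."""
    return 1000.0 / timer_hz


def cycles_between_ticks(clock_hz: int, timer_hz: int) -> int:
    """CPU cycles that elapse between two timer interrupts."""
    return clock_hz // timer_hz


def parse_config_hz(config_text: str):
    """Return the CONFIG_HZ value found in kernel config text, or None."""
    # Anchored so that lines such as CONFIG_HZ_250=y are not matched.
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


CLOCK_HZ = 3_200_000_000  # 3.2 GHz, the clock quoted in the question

for hz in (100, 250):
    print(hz, "Hz:", tick_period_ms(hz), "ms per tick,",
          cycles_between_ticks(CLOCK_HZ, hz), "cycles between ticks")
```

At 250 Hz that is a tick every 4 ms, or about 12.8 million cycles of uninterrupted work at 3.2 GHz; at 100 Hz it is 10 ms and 32 million cycles. This only shows how often the kernel gets a chance to step in, not how much time each interruption actually costs.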
EDIT: On a 2+ core system this has much less impact, as your process will almost certainly be placed on one core and more or less left to do its own thing.
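If you would rather not rely on the scheduler doing that for you, you can pin the process to a core yourself. A minimal sketch, assuming Linux and Python's `os.sched_setaffinity`; the function name `pin_to_one_core` and the choice of the highest-numbered allowed core are arbitrary picks of mine, not a recommendation:

```python
import os


def pin_to_one_core():
    """Pin the current process to a single allowed core (Linux only).

    Returns the chosen core number, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not Linux, or a platform without affinity support
    allowed = sorted(os.sched_getaffinity(0))  # cores this process may use
    core = allowed[-1]  # arbitrary choice: the highest-numbered allowed core
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return core


core = pin_to_one_core()
print("pinned to core", core)
```

This keeps the process from migrating between cores, but other tasks can still be scheduled on the same core, so measure before assuming it helps.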