Visual Studio includes support for __forceinline. The Microsoft Visual Studio 2005 documentation states:
The __forceinline keyword overrides the co
For SIMD code.
SIMD code often uses constants/magic numbers. In a regular function, every const __m128 c = _mm_setr_ps(1,2,3,4); becomes a memory reference.
With __forceinline, compiler can load it once and reuse the value, unless your code exhausts registers (usually 16).
CPU caches are great but registers are still faster.
P.S. Just got 12% performance improvement by __forceinline alone.