I am doing image processing in C that requires copying large chunks of data around memory - the source and destination never overlap.
What is the absolute fastest wa
At any optimisation level of -O1 or above, GCC will use builtin definitions for functions like memcpy - with the right -march parameter (-march=pentium4 for the set of features you mention) it should generate pretty optimal architecture-specific inline code.
I'd benchmark it and see what comes out.
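
Something like the following could serve as a starting point for that benchmark. It is only a rough sketch: copy_block and time_copy are made-up names, the timing uses clock() (CPU time, fairly coarse), and the empty asm statement is a GCC extension used as a compiler barrier so the repeated copies are not optimised away. Build it at -O1 or above with the -march value you want to test and compare the numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy n bytes between buffers that never overlap.
 * restrict states the no-overlap promise to the compiler. */
static void copy_block(unsigned char *restrict dst,
                       const unsigned char *restrict src, size_t n)
{
    memcpy(dst, src, n); /* GCC may expand this builtin inline */
}

/* Hypothetical helper: average CPU seconds per copy over reps runs. */
static double time_copy(unsigned char *restrict dst,
                        const unsigned char *restrict src,
                        size_t n, int reps)
{
    assert(reps > 0);

    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        copy_block(dst, src, n);
        /* Compiler barrier (GCC extension): forces each copy to happen. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Call time_copy with buffers about the size of your real images and a repetition count large enough that the total run takes at least a second or so, otherwise the resolution of clock() will dominate the result. Looking at the generated assembly (gcc -S) alongside the timings shows whether the memcpy call was actually inlined.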