I am doing image processing in C that requires copying large chunks of data around in memory. The source and destination never overlap.
What is the absolute fastest wa
If your code only needs to run on Intel processors, you might benefit from IPP (Intel Integrated Performance Primitives). If you know it will run on a machine with an Nvidia GPU, perhaps you could use CUDA. In both cases it may be better to look beyond optimising memcpy(), since these libraries give you opportunities to improve your algorithm at a higher level. Both, however, rely on specific hardware.
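For comparison, here is a minimal portable baseline to measure against before trying IPP or CUDA. It is only a sketch: the function name `copy_image_rows` and its stride parameters are hypothetical and not taken from any library. It assumes the image is stored as rows of bytes, possibly with padding between rows, and it uses `restrict` to express the guarantee from the question that source and destination never overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): copy `rows` rows of
 * `row_bytes` bytes each from src to dst. Strides are the distance in
 * bytes between the starts of consecutive rows. `restrict` states the
 * assumption that the two buffers never overlap. */
void copy_image_rows(unsigned char *restrict dst, size_t dst_stride,
                     const unsigned char *restrict src, size_t src_stride,
                     size_t row_bytes, size_t rows)
{
    /* No padding on either side: the image is one contiguous block,
     * so a single memcpy covers it. */
    if (dst_stride == row_bytes && src_stride == row_bytes) {
        memcpy(dst, src, row_bytes * rows);
        return;
    }

    /* Otherwise copy row by row, skipping the padding bytes. */
    for (size_t y = 0; y < rows; ++y) {
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
    }
}
```

A call such as `copy_image_rows(dst, width, src, pitch, width, height)` would copy the visible pixels of an 8-bit image out of a padded buffer. Whether this is fast enough is something only profiling on your own data and hardware can tell you.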