Summary:
memcpy seems unable to transfer over 2GB/sec on my system in a real or test application. What can I do to get faster memory-to-memory copies?
Full d
First of all, check that both the source and destination buffers are aligned on a 16-byte boundary; otherwise you incur misalignment penalties. This is the most important thing.
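As a rough illustration of the alignment point, here is a minimal sketch of how you might test a pointer for 16-byte alignment and obtain an aligned buffer. The helper names `is_aligned` and `alloc16` are made up for this example, and it assumes a C11 toolchain that provides `aligned_alloc` (MSVC, for instance, does not, and offers `_aligned_malloc` instead).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if p is aligned to `alignment` bytes.
   `alignment` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocates at least `size` bytes on a 16-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the request is rounded up to the next multiple of 16.
   Release the buffer with free(). */
static void *alloc16(size_t size)
{
    size_t rounded = (size + 15u) & ~(size_t)15u;
    return aligned_alloc(16, rounded);
}
```

You would allocate both buffers with `alloc16`, confirm them with `is_aligned(ptr, 16)`, and then time `memcpy` on them against your current buffers to see whether alignment is actually what is holding you back.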
If you don't need a standard-compliant solution, check whether a compiler-specific extension such as memcpy64 improves things (consult your compiler's documentation to see whether something like it is available). The point is that memcpy must be able to handle single-byte copies, whereas moving 4 or 8 bytes at a time is much faster when you don't have that restriction.
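To make the "8 bytes at a time" idea concrete, here is a minimal sketch of a copy routine that only handles whole 64-bit words. `copy_u64` is a hypothetical name for this example, not the compiler extension mentioned above. It assumes the buffers really are arrays of `uint64_t`, do not overlap, and that the length is given in words rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies n_words 64-bit words from src to dst.
   Assumes non-overlapping buffers (hence restrict) and no byte-sized tail,
   so the loop never has to fall back to single-byte moves. */
static void copy_u64(uint64_t *restrict dst, const uint64_t *restrict src,
                     size_t n_words)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n_words; i++)
        dst[i] = src[i];   /* one 8-byte move per iteration */
}
```

Whether this beats your library's memcpy is something you would have to measure on your own system; treat it as a starting point for experiments rather than a guaranteed improvement.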
Also, is writing inline assembly code an option for you?