Summary:
memcpy seems unable to transfer over 2GB/sec on my system in a real or test application. What can I do to get faster memory-to-memory copies?
Full d
One source I would recommend you read is MPlayer's fast_memcpy
function. Also consider the expected usage patterns, and note that modern cpus have special store instructions which let you inform the cpu whether or not you will need to read back the data you're writing. Using the instructions that indicate you won't be reading back the data (and thus it doesn't need to be cached) can be a huge win for large memcpy
operations.