Summary:
memcpy seems unable to transfer over 2GB/sec on my system in a real or test application. What can I do to get faster memory-to-memory copies?
Full d
You can write a faster implementation of memcpy using SSE2 registers. The version in VC2010 does this already, so the real question is whether you are handing it aligned memory.
Maybe you can do better than the version in VC2010, but that takes some understanding of how to do it.
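To give an idea of what an SSE2 copy looks like, here is a minimal sketch. It is not the VC2010 implementation, and the name copy_aligned_sse2 is made up for illustration. It assumes both pointers are 16-byte aligned and the buffers do not overlap; whether it actually beats the library memcpy on your hardware is something you would have to measure.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_load_si128, _mm_store_si128 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst using 128-bit
 * SSE2 loads and stores.
 * Assumptions: dst and src are both 16-byte aligned and do not overlap. */
static void copy_aligned_sse2(void *dst, const void *src, size_t n)
{
    /* The aligned load/store intrinsics fault on misaligned addresses. */
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t blocks = n / 64;  /* 64 bytes = four 128-bit registers per pass */

    for (size_t i = 0; i < blocks; ++i) {
        /* Issue all four loads before the stores. */
        __m128i a = _mm_load_si128(s + 0);
        __m128i b = _mm_load_si128(s + 1);
        __m128i c = _mm_load_si128(s + 2);
        __m128i e = _mm_load_si128(s + 3);
        _mm_store_si128(d + 0, a);
        _mm_store_si128(d + 1, b);
        _mm_store_si128(d + 2, c);
        _mm_store_si128(d + 3, e);
        s += 4;
        d += 4;
    }

    /* Copy the remaining 0..63 bytes with the ordinary memcpy. */
    memcpy(d, s, n % 64);
}
```

Call it with buffers that are known to be 16-byte aligned (for example, ones declared with an alignment specifier or obtained from an aligned allocator). If you cannot guarantee alignment, you need unaligned loads or a fix-up loop at the start, which is where most of the complexity in a real memcpy comes from.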
PS: You can pass the buffer to the user-mode program in an inverted call to avoid the copy altogether.