I have a function foo(int[] nums) which I understand is essentially equivalent to foo(int* nums). Inside foo I need to copy the conte
A simple loop is slightly faster for about 10-20 bytes and less (It's a single compare+branch, see OP_T_THRES), but for larger sizes, memcpy is faster and portable.
Additionally, if the amount of memory you want to copy is constant, you can use memcpy to let the compiler decide what method to use.
Side note: the optimizations that memcpy uses may significantly slow your program down in a multithreaded environment when you're copying a lot of data above the OP_T_THRES size mark since the instructions this invokes are not atomic and the speculative execution and caching behavior for such instructions doesn't behave nicely when multiple threads are accessing the same memory. Easiest solution is to not share memory between threads and only merge the memory at the end. This is good multi-threading practice anyway.