I am investigating performance hotspots in an application that spends 50% of its time in memmove(3). The application inserts millions of 4-byte integers into sorted arrays.
Your memmove calls are shuffling memory along by 2 to 128 bytes, while your memcpy source and destination are completely different buffers. Somehow that accounts for the performance difference: if you copy to the same place, you'll see that memcpy ends up possibly a smidge faster, e.g. on ideone.com:
    memmove (002)  0.0610362
    memmove (004)  0.0554264
    memmove (008)  0.0575859
    memmove (016)  0.057326
    memmove (032)  0.0583542
    memmove (064)  0.0561934
    memmove (128)  0.0549391
    memcpy         0.0537919
There's hardly anything in it, though: no evidence that writing back to an already faulted-in memory page has much impact, and we're certainly not seeing a halving of time. But it does show that there's nothing wrong that makes memcpy unnecessarily slower when compared apples-for-apples.