Poor memcpy Performance on Linux

后端未结

关注

 7  2010

执笔经年 2021-01-29 21:04

We have recently purchased some new servers and are experiencing poor memcpy performance. The memcpy performance is 3x slower on the servers compared to our laptops.

7条回答

刺人心 (楼主)

2021-01-29 21:46

I modified the benchmark to use the nsec timer in Linux and found similar variation on different processors, all with similar memory. All running RHEL 6. Numbers are consistent across multiple runs.

Sandy Bridge E5-2648L v2 @ 1.90GHz, HT enabled, L2/L3 256K/20M, 16 GB ECC
malloc for 1073741824 took 47us 
memset for 1073741824 took 643841us
memcpy for 1073741824 took 486591us 

Westmere E5645 @2.40 GHz, HT not enabled, dual 6-core, L2/L3 256K/12M, 12 GB ECC
malloc for 1073741824 took 54us
memset for 1073741824 took 789656us 
memcpy for 1073741824 took 339707us

Jasper Forest C5549 @ 2.53GHz, HT enabled, dual quad-core, L2 256K/8M, 12 GB ECC
malloc for 1073741824 took 126us
memset for 1073741824 took 280107us 
memcpy for 1073741824 took 272370us

Here are results with inline C code -O3

Sandy Bridge E5-2648L v2 @ 1.90GHz, HT enabled, 256K/20M, 16 GB
malloc for 1 GB took 46 us
memset for 1 GB took 478722 us
memcpy for 1 GB took 262547 us

Westmere E5645 @2.40 GHz, HT not enabled, dual 6-core, 256K/12M, 12 GB
malloc for 1 GB took 53 us
memset for 1 GB took 681733 us
memcpy for 1 GB took 258147 us

Jasper Forest C5549 @ 2.53GHz, HT enabled, dual quad-core, 256K/8M, 12 GB
malloc for 1 GB took 67 us
memset for 1 GB took 254544 us
memcpy for 1 GB took 255658 us

For the heck of it, I also tried making the inline memcpy do 8 bytes at a time. On these Intel processors it made no noticeable difference. Cache merges all of the byte operations into the minimum number of memory operations. I suspect the gcc library code is trying to be too clever.

0 讨论(0)

查看其它7个回答