SSE-copy, AVX-copy and std::copy performance

庸人自扰 2020-12-08 03:28

I tried to improve the performance of a copy operation using SSE and AVX:

    #include 

    const int sz = 1024;
    float *mas = (float *)_mm_         


        
5 answers
  •  小蘑菇 (original poster)
    2020-12-08 04:17

    The problem is that your test does a poor job of mitigating some hardware factors that make benchmarking hard. To test this, I made my own test case, something like this:

    for blah blah:
        sleep(500ms)
        std::copy
        sse
        avx
    

    output:

    SSE: 1.11753x faster than std::copy
    AVX: 1.81342x faster than std::copy
    

    So in this case, AVX is a good deal faster than std::copy. What happens when I change the test case to this?

    for blah blah:
        sleep(500ms)
        sse
        avx
        std::copy
    

    Notice that absolutely nothing changed, except the order of the tests.

    SSE: 0.797673x faster than std::copy
    AVX: 0.809399x faster than std::copy
    

    Whoa! How is that possible? The CPU takes a while to ramp up to full speed, so tests that run later have an advantage. This question has 3 answers now, including an 'accepted' answer, but only the one with the fewest upvotes was on the right track.
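
One way to reduce the ordering effect described above is to run an untimed warm-up pass and then rotate which routine goes first in each round. The following is a minimal sketch of that idea, not the harness that produced the numbers above. `Candidate`, `rotated_order`, and `run_rotated` are hypothetical names, and both the warm-up pass and the choice to keep the minimum time per routine are assumptions about what a fairer harness might do. The SSE and AVX routines themselves are not shown; any callable with the same signature can be plugged in.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical wrapper: a label plus a routine that copies n floats.
struct Candidate {
    std::string name;
    std::function<void(const float*, float*, std::size_t)> copy;
};

// Order in which candidates run in a given round: round 0 is 0,1,2,...,
// round 1 is 1,2,...,0, and so on, so no routine always runs last.
std::vector<std::size_t> rotated_order(std::size_t count, std::size_t round) {
    std::vector<std::size_t> order(count);
    for (std::size_t k = 0; k < count; ++k) order[k] = (k + round) % count;
    return order;
}

// Time `reps` back-to-back calls of one candidate, in seconds.
double time_once(const Candidate& c, const float* src, float* dst,
                 std::size_t n, int reps) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) c.copy(src, dst, n);
    auto t1 = std::chrono::steady_clock::now();
    volatile float sink = dst[0];  // read the result so the copies stay observable
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count();
}

// Returns the best (minimum) time per candidate, indexed like `cands`.
std::vector<double> run_rotated(const std::vector<Candidate>& cands,
                                std::size_t n, int reps, int rounds) {
    assert(!cands.empty() && n > 0 && reps > 0 && rounds > 0);
    std::vector<float> src(n, 1.0f), dst(n, 0.0f);
    std::vector<double> best(cands.size(),
                             std::numeric_limits<double>::infinity());

    // Untimed warm-up (assumption): give the CPU a chance to ramp up
    // before any measurement is recorded.
    for (const auto& c : cands) time_once(c, src.data(), dst.data(), n, reps);

    for (int r = 0; r < rounds; ++r) {
        for (std::size_t i : rotated_order(cands.size(), r)) {
            double t = time_once(cands[i], src.data(), dst.data(), n, reps);
            best[i] = std::min(best[i], t);
        }
    }
    return best;
}
```

Dividing the `std::copy` entry of the returned vector by another entry gives a ratio in the same "x faster than std::copy" form used above. Rotation and warm-up only address the ordering effect; they do not remove other sources of noise such as power-saving settings.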

    This is one of the reasons why benchmarking is hard, and why you should never trust anyone's micro-benchmarks unless they include detailed information about their setup. It isn't just the code that can go wrong. Power-saving features and odd drivers can completely distort your benchmark. I once measured a factor-of-7 difference in performance from toggling a BIOS switch that fewer than 1% of notebooks offer.
