Why vectorizing the loop does not improve performance

难免孤独 2020-11-27 02:17

I am investigating the effect of vectorization on the performance of the program. To this end, I have written the following code:

#include 
#in         


        
  •  天命终不由人
    2020-11-27 03:00

    EDIT: I have modified the answer substantially. Please also disregard most of what I wrote earlier about Mystical's answer not being entirely correct. However, I still do not agree that the code is bottlenecked by memory: despite running a very wide variety of tests, I could not see any sign of the original code being bound by memory speed, while it consistently showed clear signs of being CPU-bound.


    There can be many reasons, and since they can be very hardware-dependent, I decided not to speculate. Instead, I will outline what I found during later testing, in which I used a much more accurate and reliable method of measuring CPU time and wrapped the loop in an outer loop that repeated it 1000 times. I believe this information could be helpful, but please take it with a grain of salt, as it is hardware-dependent.

    • When using instructions from the SSE family, the vectorized code was over 10% faster than the non-vectorized code.
    • Vectorized code using the SSE family and vectorized code using AVX performed about the same.
    • When using AVX instructions, the non-vectorized code ran the fastest: 25% or more faster than anything else I tried.
    • In all cases, the results scaled linearly with CPU clock.
    • The results were hardly affected by memory clock.
    • The results were considerably affected by memory latency: much more than by memory clock, but not nearly as much as by CPU clock.
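
    The answer does not say which timing method was used. As a minimal sketch, assuming a POSIX system, a harness in this spirit could measure process CPU time with clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...) and repeat the loop many times. The names kernel, cpu_seconds and ns_per_element are hypothetical, and because the question's code is cut off, the kernel body (z[i] = x[i] * y[i]) is only a stand-in, not the asker's actual loop.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime / CLOCK_PROCESS_CPUTIME_ID */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* HYPOTHETICAL kernel: the question's loop is truncated, so an
 * element-wise multiply of two float arrays stands in for it. */
static void kernel(float *restrict z, const float *restrict x,
                   const float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}

/* CPU time consumed so far by this process, in seconds. */
static double cpu_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* "Loops the loop": runs the kernel `reps` times over the same arrays
 * and returns the average CPU time per element, in nanoseconds. */
static double ns_per_element(float *z, const float *x, const float *y,
                             size_t n, int reps)
{
    assert(n > 0 && reps > 0);
    double start = cpu_seconds();
    for (int r = 0; r < reps; r++) {
        kernel(z, x, y, n);
#if defined(__GNUC__)
        /* GCC/Clang-specific compiler barrier: keeps the optimizer from
         * merging or dropping the repeated, identical passes. */
        __asm__ __volatile__("" : : "r"(z) : "memory");
#endif
    }
    double elapsed = cpu_seconds() - start;
    return elapsed * 1e9 / ((double)n * (double)reps);
}
```

    To compare the variants listed above with GCC, the same file could be built with, for example, -O3 -msse4.2 (SSE, vectorized), -O3 -mavx (AVX, vectorized) and -O3 -mavx -fno-tree-vectorize (AVX, non-vectorized), calling ns_per_element(z, x, y, n, 1000) from main. For the stand-in kernel each element moves 12 bytes (two 4-byte loads and one 4-byte store, ignoring write-allocate traffic), so 12.0 divided by the ns-per-element figure gives an effective bandwidth in GB/s; comparing that with the machine's memory bandwidth is one way to check whether the loop is memory-bound.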

    Regarding Mystical's example of running nearly 1 iteration per clock: I did not expect the CPU scheduler to be that efficient and assumed 1 iteration every 1.5-2 clock ticks. To my surprise, that is not the case; I was wrong, and I apologize. My own CPU ran it even more efficiently, at 1.048 cycles/iteration, so I can confirm that this part of Mystical's answer is definitely right.
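
    A cycles/iteration figure like this follows from simple arithmetic: measured time multiplied by the core clock frequency, divided by the number of iterations executed. A minimal sketch, assuming the core ran at a fixed, known frequency (turbo and power-saving states can make the real frequency differ from the nominal one); the function name cycles_per_iteration is hypothetical.

```c
#include <assert.h>

/* HYPOTHETICAL helper: converts measured time into core clock cycles
 * per loop iteration.
 *   seconds    - measured CPU time for the whole run
 *   clock_hz   - core frequency during the run (ASSUMED fixed and known)
 *   iterations - total loop iterations executed (inner * outer loop) */
static double cycles_per_iteration(double seconds, double clock_hz,
                                   double iterations)
{
    assert(seconds >= 0.0 && clock_hz > 0.0 && iterations > 0.0);
    return seconds * clock_hz / iterations;
}
```

    For example, with illustrative numbers rather than the author's measurements, 1.048 s of CPU time at 4 GHz over 4 × 10^9 iterations gives 1.048 cycles/iteration.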
