> Why, at the lowest level of the hardware performing operations and the general underlying operations involved (i.e.: things general to all programming languages' actual imp
Vectorization has two main benefits.
The primary benefit is that hardware designed to support vector instructions generally can perform multiple ALU operations in parallel when vector instructions are used. For example, if you ask it to perform 16 additions with a single 16-element vector instruction, it may have 16 adders that do all 16 additions at once, in parallel. The only way to engage all those adders¹ is through vectorization; with scalar instructions you get just the one lonely adder.
The secondary benefit is that vector instructions usually save some overhead. You load and store data in big chunks (up to 512 bits at a time on some recent Intel CPUs), each loop iteration does more work so the loop overhead is lower in a relative sense², and you need fewer instructions for the same work, so CPU front-end overhead is lower, etc.
Finally, your dichotomy between loops and vectorization is odd. When you vectorize non-vector code, you will generally end up with a loop if there was a loop there before, and without one if there wasn't. The real comparison is between scalar (non-vector) instructions and vector instructions.
¹ Or at least 15 of the 16; perhaps one is also used for scalar operations.
² You could probably get a similar loop-overhead benefit in the scalar case, at the cost of a lot of loop unrolling.