Fusing a triangle loop for parallelization, calculating sub-indices

余生分开走 2020-12-10 06:42

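A common technique in parallelization is to fuse nested for loops like this:

    for(int i=0; i<n; i++) {
        for(int j=0; j<n; j++) {
            // loop body using i and j
        }
    }

to

    for(int x=0; x<n*n; x++) {
        int i = x/n;
        int j = x%n;
        // loop body using i and j
    }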

3 Answers
  •  醉酒成梦
    2020-12-10 07:14

    "I'm wondering if there is a simpler or more efficient way of doing this?"

    Yes, the code you had to begin with. Please keep the following in mind:

    • There is no case where floating-point arithmetic is faster than plain integer arithmetic.
    • There are, however, plenty of cases where floating point is far slower than plain integers, FPU or no FPU.
    • Float variables are generally larger than plain integers on most systems and therefore slower for that reason alone.
    • The first version of the code is likely the most cache-friendly. As with any manual optimization, this depends entirely on which CPU you are using.
    • Division is slow on most systems, whether it is done on plain integers or on floats.
    • Any form of complicated arithmetic is going to be slower than simple counting.

    So your second example is almost guaranteed to be far slower than the first, on any CPU. It is also completely unreadable.
