A common technique in parallelization is to fuse nested for loops like this
for(int i=0; i
to
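For the plain rectangular case, a minimal sketch of what this kind of fusion usually looks like is given here. The names `nested_sum`, `fused_sum`, `rows` and `cols` are hypothetical, and the loop body is only a stand-in; the point is that `i` and `j` are recovered from the single index `x` by division and remainder.

```c
#include <assert.h>

/* Nested form: visit every (i, j) of a rows-by-cols rectangle. */
static long nested_sum(int rows, int cols)
{
    long sum = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sum += (long)i * j;      /* stand-in for the real loop body */
    return sum;
}

/* Fused form: a single loop over x, with (i, j) recovered by div/mod. */
static long fused_sum(int rows, int cols)
{
    assert(rows >= 0 && cols > 0);
    long sum = 0;
    for (int x = 0; x < rows * cols; x++) {
        int i = x / cols;            /* row index */
        int j = x % cols;            /* column index */
        sum += (long)i * j;          /* same stand-in body */
    }
    return sum;
}
```

Both functions visit the same (i, j) pairs in the same order, so they return the same value for any valid `rows` and `cols`.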
Considering that you're trying to fuse a triangle with the intent of parallelizing, the non-obvious solution is to choose a non-trivial mapping of x to (i,j):
    j  |\        i ->
       | \              ____
    |  |  \     =>     |\\   |
    V  |___\           |_\\__|
After all, you're not processing them in any special order, so the exact mapping is a don't care.
So calculate x->i,j as you'd do for a rectangle, but if i > j then { i=N-i, j = N-j } (mirror Y axis, then mirror X axis).
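One way to pin down the off-by-one details of that mapping is sketched below. This is my own reading, not necessarily the exact formula intended above: it assumes the triangle is 0 <= j <= i < n (so the "outside the triangle" test becomes j > i), uses a rectangle that is n+1 cells wide, and mirrors with i = n-1-i, j = n-j. The helper names `tri_unfuse` and `tri_unfuse_is_bijection` and the limit `MAX_N` are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: map a fused index x in [0, n*(n+1)/2) to a cell
   (i, j) of the triangle 0 <= j <= i < n.
   Assumption: rectangle row r pairs triangle row r with row n-1-r,
   which together hold exactly n+1 cells. */
static void tri_unfuse(int x, int n, int *i, int *j)
{
    assert(n > 0 && x >= 0 && x < n * (n + 1) / 2);
    int r = x / (n + 1);             /* rectangle row */
    int c = x % (n + 1);             /* rectangle column, 0..n */
    if (c > r) {                     /* outside the triangle: mirror both axes */
        r = n - 1 - r;
        c = n - c;
    }
    *i = r;
    *j = c;
}

/* Returns 1 if every fused index lands on a distinct, valid triangle cell.
   Since there are exactly n*(n+1)/2 cells, that means each is hit once. */
static int tri_unfuse_is_bijection(int n)
{
    enum { MAX_N = 64 };
    assert(n > 0 && n <= MAX_N);
    unsigned char seen[MAX_N][MAX_N] = {{0}};
    for (int x = 0; x < n * (n + 1) / 2; x++) {
        int i, j;
        tri_unfuse(x, n, &i, &j);
        if (i < 0 || i >= n || j < 0 || j > i || seen[i][j])
            return 0;
        seen[i][j] = 1;
    }
    return 1;
}
```

Each x is mapped independently of every other x, which is what you want if the single loop is going to be split across threads. The cells come out in a different order than the nested loop would produce, which is fine as long as the order really is a don't care.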
     ____
    |\\   |        |\              |\
    |_\\__|  ==>   |_\  __    =>   | \
                       / |         |  \
                      /__|         |___\
The most sane form is of course the first form.
That said, the fused form is better done with conditionals:
    int i = 0; int j = 0;
    for(int x=0; x<(n*(n+1)/2); x++) {
        // ...
        ++j;
        if (j>i)
        {
            j = 0;
            ++i;
        }
    }
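To check that this counter-based loop really walks the triangle, here is a small sketch that compares it against a nested loop. The nested bounds (i < n, j <= i) are inferred from the n*(n+1)/2 trip count and the j > i reset, so treat them as an assumption; `tri_nested` and `tri_fused` are hypothetical names, and the accumulator is just an order-sensitive stand-in for a real loop body.

```c
#include <assert.h>

/* Nested triangular loop (bounds inferred from the fused form). */
static unsigned long tri_nested(int n)
{
    unsigned long acc = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j <= i; j++)
            acc = acc * 31u + (unsigned long)(i * n + j);  /* order-sensitive stand-in */
    return acc;
}

/* Fused loop that tracks (i, j) with counters and a conditional. */
static unsigned long tri_fused(int n)
{
    assert(n >= 0);
    unsigned long acc = 0;
    int i = 0, j = 0;
    for (int x = 0; x < n * (n + 1) / 2; x++) {
        acc = acc * 31u + (unsigned long)(i * n + j);      /* same stand-in body */
        ++j;
        if (j > i) {                 /* end of row i: start the next row */
            j = 0;
            ++i;
        }
    }
    return acc;
}
```

The two functions return the same value for any n >= 0, because they visit the same pairs in the same order. Note that in the fused version i and j are carried from one iteration to the next, so iteration x cannot compute its own (i, j) without the iterations before it.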
I'm wondering if there is a simpler or more efficient way of doing this?
Yes, the code you had to begin with. Please keep the following in mind:
So your second example is pretty much guaranteed to be far slower than the first example, for any given CPU in the world. In addition, it is also completely unreadable.