A common technique in parallelization is to fuse nested for loops like this:
for(int i=0; i
to
Considering that you're trying to fuse a triangle with the intent of parallelizing, the non-obvious solution is to choose a non-trivial mapping of x to (i,j):
    j |\        i ->
      | \              ____
    | |  \      =>    |\\  |
    V |___\           |_\\__|
After all, you're not processing them in any special order, so the exact mapping is a don't care.
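So calculate x -> (i,j) as you'd do for a rectangle, but if i > j then { i = N-i, j = N-j } (mirror along the Y axis, then along the X axis). The exact constants (N versus N-1) depend on whether your bounds are inclusive, so check the boundary rows against your own indexing.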
     ____
    |\\  |      |\           |\
    |_\\__| ==> |_\  __   => | \
                    / |      |  \
                   /__|      |___\
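To make the folding concrete, here is a minimal sketch in C. It assumes the triangle is 0 <= i <= j < n, and the names `tri_size`, `tri_unfuse` and `tri_covers_once` are made up for illustration. The handling of odd n (a rectangle n wide instead of n+1) is my own choice of convention, not something stated above, so treat the offsets as one workable option rather than the only one.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of cells in the triangle 0 <= i <= j < n. */
long tri_size(long n) { return n * (n + 1) / 2; }

/* Map a fused index x in [0, tri_size(n)) to a cell (i, j) with i <= j < n.
 * Assumed layout: the rectangle is w = n|1 columns wide (n+1 for even n,
 * n for odd n) and (n+1)/2 rows tall, so it has exactly tri_size(n) cells. */
void tri_unfuse(long x, long n, long *i, long *j)
{
    assert(x >= 0 && x < tri_size(n));
    long w = n | 1;
    long r = x / w;               /* rectangle row, as for a plain rectangle */
    long c = x % w;               /* rectangle column */
    long keep = r + 1 - (n & 1);  /* cells of this row that stay in place */
    if (c < keep) {               /* already inside the upper triangle rows */
        *i = c;
        *j = keep - 1;
    } else {                      /* mirror both axes into the lower rows */
        *i = w - 1 - c;
        *j = n - 1 - r;
    }
}

/* Returns 1 if the fused loop visits every triangle cell exactly once. */
int tri_covers_once(long n)
{
    long total = tri_size(n);
    unsigned char *seen = calloc((size_t)(n * n) + 1, 1);
    int ok = (seen != NULL);
    /* Iterations are independent of each other; with real per-cell work
     * this is the single flat loop you would hand to the parallel runtime. */
    for (long x = 0; ok && x < total; x++) {
        long i, j;
        tri_unfuse(x, n, &i, &j);
        if (i < 0 || i > j || j >= n || seen[j * n + i]++)
            ok = 0;
    }
    free(seen);
    return ok;
}
```

The cells come out in a scrambled order compared to the nested loops, which is fine here because the order is a don't care. The mapping costs one division and one modulo per iteration, with no square root.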