I have this double for-loop that uses both row-order and column-order array indexing, which should be bad for performance.
for (int row = 0; row < h
This is never going to be very fast, because you will probably get a number of cache misses: you have to step through one matrix or the other with a large pitch, and there is no escaping that. A computer likes successive memory accesses to be close together, which is not the case in your algorithm: the indexing of array_a skips height elements at a time because of the col*height term. You could swap the for loops to fix that, but then you would have the same problem with the width*(height-1-row) term in array_b, which would then jump by width elements on every step of the inner loop.
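A minimal sketch of the two loop orders compared above. The original loop is truncated, so the exact assignment is an assumption: array_a is taken to be column-major (element (row, col) at col*height + row), array_b row-major and flipped vertically (at width*(height-1-row) + col), and data is copied from array_a into array_b. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed layout: array_a is column-major, array_b is row-major and
   vertically flipped. Both functions produce the same result. */

/* Row-outer order: array_b is written contiguously (col changes by 1),
   but array_a is read with a stride of `height` elements. */
void copy_row_outer(const float *array_a, float *array_b, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int row = 0; row < height; row++)
        for (int col = 0; col < width; col++)
            array_b[width * (height - 1 - row) + col] = array_a[col * height + row];
}

/* Swapped loops: array_a is now read contiguously (row changes by 1),
   but array_b is written with a stride of `width` elements. */
void copy_col_outer(const float *array_a, float *array_b, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int col = 0; col < width; col++)
        for (int row = 0; row < height; row++)
            array_b[width * (height - 1 - row) + col] = array_a[col * height + row];
}
```

Either way one of the two arrays is walked with a large stride, which is the point of the answer: swapping the loops only moves the cache misses from one array to the other.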
You could rewrite one of the arrays to match the ordering of the other, but the code that does the rewrite would have exactly the same problem. So it depends on whether you need to do this kind of thing more than once on the same data. If you do, it makes sense to first rewrite one of the matrices as Poita_ described; otherwise, you are best off leaving the algorithm as it is.
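A sketch of the "rewrite one array first" option, under the same assumed layouts (array_a column-major, array_b row-major and flipped vertically). The helpers to_row_major and flip_rows are hypothetical names. The conversion pass still strides through array_a, so it pays the cache cost once; after that, every later pass over the row-major copy reads and writes contiguous runs of width elements.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy column-major array_a (height x width) into a new row-major buffer.
   This pass still reads array_a with a stride of `height` elements.
   Returns NULL if allocation fails; the caller must free() the result. */
float *to_row_major(const float *array_a, int width, int height)
{
    assert(width > 0 && height > 0);
    float *out = malloc((size_t)width * (size_t)height * sizeof *out);
    if (out == NULL)
        return NULL;
    for (int row = 0; row < height; row++)
        for (int col = 0; col < width; col++)
            out[row * width + col] = array_a[col * height + row];
    return out;
}

/* With both arrays row-major, each source row maps to one destination row,
   so every row is a single contiguous copy of `width` elements. */
void flip_rows(const float *a_rows, float *array_b, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int row = 0; row < height; row++)
        memcpy(&array_b[width * (height - 1 - row)],
               &a_rows[width * row],
               (size_t)width * sizeof *array_b);
}
```

This only pays off if the row-major copy is reused across several passes; for a single pass, the conversion costs about as much as the original loop.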