Time performance when permuting and casting double to float

花落未央 2021-01-05 07:39

I have some big arrays passed from MATLAB to C++ (so I have to take them as they are) that need casting and permuting (row-major vs. column-major issues).

The arr

3 Answers
  •  滥情空心
    2021-01-05 07:40

    The problem in this example is cache locality. Looking at the assignment, the fastest-changing index is j, yet it has the largest effect on the address of the write in the target array:

    img[i + k*size_proj[1] + j*size_proj[0] * size_proj[1]] = 
    

    Notice that j is multiplied by two constants, so each increment of j moves the write by size_proj[0] * size_proj[1] elements. Every change in j is therefore likely to cause the result to be written to a new cache line.
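To make the stride argument concrete, here is a minimal sketch of the two index formulas from the assignment. The helper names `dst_offset`, `src_offset` and `dst_step_j` are hypothetical (they are not in the original post), and K and I stand for size_proj[0] and size_proj[1]. For an assumed 512 x 512 slice, one step in j moves the write position by 262144 floats (1 MiB), while one step in k moves the read position by a single element.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: flat write position in img for indices (i, j, k),
// with K = size_proj[0] and I = size_proj[1].
inline std::size_t dst_offset(std::size_t i, std::size_t j, std::size_t k,
                              std::size_t K, std::size_t I) {
    assert(i < I && k < K);
    return i + k * I + j * K * I;
}

// Hypothetical helper: flat read position in imgaux for the same indices.
inline std::size_t src_offset(std::size_t i, std::size_t j, std::size_t k,
                              std::size_t K, std::size_t I) {
    assert(i < I && k < K);
    return k + i * K + j * K * I;
}

// Distance, in elements, between two consecutive writes when only j advances.
inline std::size_t dst_step_j(std::size_t K, std::size_t I) {
    return dst_offset(0, 1, 0, K, I) - dst_offset(0, 0, 0, K, I);
}
```

With j as the innermost loop, both the read and the write jump by K * I elements per iteration, so neither side of the assignment touches memory contiguously.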

    The solution in this case is to invert the order of the loops:

        const auto K = size_proj[0];
        const auto I = size_proj[1];
        const auto J = size_proj[2];
        for (int j = 0; j < J; j++)
            for (int i = 0; i < I; i++)
                for (int k = 0; k < K; k++)
                    img[i + k * I  + j * K * I] = (float)imgaux[k + i * K + j * K * I];
    

    Which (on my laptop) brings us down from:

    Time permuting and casting the input 4.416232
    

    to:

    Time permuting and casting the input 0.844341
    

    Which I think you'll agree is something of an improvement.
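If you want to reproduce the comparison on your own machine, the following is a self-contained sketch under a few assumptions: the arrays are held in `std::vector` rather than the raw MATLAB pointers, the function names and the `seconds` timing helper are made up for illustration, and the "slow" variant assumes the original loops ran k, then i, then j (the post only states that j was the fastest-changing index). Both functions compute the same permutation and differ only in loop order.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Assumed original order: j is the innermost loop, so every iteration
// jumps K * I elements in both the source and the destination.
std::vector<float> permute_cast_j_inner(const std::vector<double>& src,
                                        std::size_t K, std::size_t I, std::size_t J) {
    assert(src.size() == K * I * J);
    std::vector<float> dst(src.size());
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t i = 0; i < I; ++i)
            for (std::size_t j = 0; j < J; ++j)
                dst[i + k * I + j * K * I] = static_cast<float>(src[k + i * K + j * K * I]);
    return dst;
}

// Reordered loops from the answer: j outermost, k innermost,
// so the source is read contiguously.
std::vector<float> permute_cast_k_inner(const std::vector<double>& src,
                                        std::size_t K, std::size_t I, std::size_t J) {
    assert(src.size() == K * I * J);
    std::vector<float> dst(src.size());
    for (std::size_t j = 0; j < J; ++j)
        for (std::size_t i = 0; i < I; ++i)
            for (std::size_t k = 0; k < K; ++k)
                dst[i + k * I + j * K * I] = static_cast<float>(src[k + i * K + j * K * I]);
    return dst;
}

// Wall-clock seconds taken by a callable, measured with a monotonic clock.
template <class F>
double seconds(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

A call such as `seconds([&] { out = permute_cast_k_inner(in, K, I, J); })` times one variant. The actual numbers depend on the array sizes, the compiler flags and the cache hierarchy, so treat the figures quoted above as one laptop's measurement rather than a general ratio.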
