Speed up a short to float cast?

孤独总比滥情好 2020-12-17 01:39

I have a short to float cast in C++ that is bottlenecking my code.

The code translates from a hardware device buffer which is natively shorts, this represents the in

7 Answers

鱼传尺愫 (original poster)

2020-12-17 01:53

You can use OpenMP to put every core of your CPU to work, and it is simple. Just do the following:

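#include <omp.h>
float factor = 1.0f / value; // computed once, outside the loop
#pragma omp parallel for // split the iterations across all cores
for (int i = 0; i < W*H; i++) // 25% of time is spent doing this
{
    int sample = source[i]; // ushort -> int (renamed so it no longer shadows value)
    destination[i] = sample * factor; // int*float -> float
}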
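For readers who want to try the loop above in isolation, here is a minimal self-contained sketch. The function name `convert_scaled`, its `divisor` parameter, and the use of `std::vector` are assumptions made for illustration; the original post only shows the loop over `source` and `destination`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): converts each unsigned
// 16-bit sample to float and scales it by 1/divisor.
std::vector<float> convert_scaled(const std::vector<std::uint16_t>& source,
                                  float divisor)
{
    assert(divisor != 0.0f);
    const float factor = 1.0f / divisor; // one division, hoisted out of the loop

    std::vector<float> destination(source.size());
    const std::uint16_t* src = source.data();
    float* dst = destination.data();
    const int n = static_cast<int>(source.size()); // assumes the size fits in an int

    // Every iteration writes a different element, so the iterations are
    // independent and can be divided among threads. If the compiler is run
    // without OpenMP enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        const int sample = src[i];                     // ushort -> int
        dst[i] = static_cast<float>(sample) * factor;  // int * float -> float
    }
    return destination;
}
```

Calling `convert_scaled(buffer, 2.0f)` on a buffer holding `{0, 1, 2, 65535}` should give `{0, 0.5, 1, 32767.5}`. A signed `int` loop index is used because older OpenMP implementations require a signed index in a parallel `for`.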

To test this with the previous program, just add the pragma to its loop like this:

#pragma omp parallel for 
for (int it = 0; it < iterations; it++){
 ...
}

Here is the result:

beta@beta-PC ~
$ g++ -o opt.exe opt.c -msse4.1 -fopenmp

beta@beta-PC ~
$ opt
0.748
2.90873e+007
0.484
2.90873e+007
0.796
2.90873e+007


beta@beta-PC ~
$ g++ -o opt.exe opt.c -msse4.1 -O3


beta@beta-PC ~
$ opt
1.404
2.90873e+007
1.404
2.90873e+007
1.404
2.90873e+007

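The results show roughly a 100% improvement with OpenMP: the timings drop from 1.404 to between 0.484 and 0.796. Note that the two builds also differ in optimization flags, since -O3 was used only for the build without OpenMP. Visual C++ supports OpenMP too.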
