Speed up a short to float cast?

孤独总比滥情好 2020-12-17 01:39

I have a short to float cast in C++ that is bottlenecking my code.

The code translates from a hardware device buffer which is natively shorts, this represents the in

7 Answers
  •  鱼传尺愫
    2020-12-17 01:53

    You can use OpenMP to put every core of your CPU to work, and it is simple to do, as follows:

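    #include <omp.h>
    float factor = 1.0f / value; // reciprocal computed once, outside the loop
    #pragma omp parallel for // split the loop iterations across the cores
    for (int i = 0; i < W*H; i++) // 25% of time is spent doing this
    {
        int sample = source[i]; // ushort -> int
        destination[i] = sample * factor; // int * float -> float
    }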
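For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same loop. The function name `convert_samples`, the use of `std::vector`, and the `std::uint16_t` source type are my assumptions for illustration, not part of the original program. It uses only the pragma, so it still compiles and runs serially when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): scales each 16-bit
// sample by 1/divisor and stores the result as float.
std::vector<float> convert_samples(const std::vector<std::uint16_t>& source,
                                   float divisor)
{
    assert(divisor != 0.0f);
    const float factor = 1.0f / divisor; // one division instead of one per sample
    std::vector<float> destination(source.size());
    // Signed int loop index, as in the original loop; assumes the buffer
    // holds fewer than 2^31 samples.
    const int n = static_cast<int>(source.size());

    // Each iteration writes a different element, so iterations are independent.
    // The pragma is ignored (serial loop) unless OpenMP is enabled,
    // e.g. with g++ -fopenmp.
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        destination[i] = static_cast<float>(source[i]) * factor;
    }
    return destination;
}
```

Calling `convert_samples({0, 1, 2}, 2.0f)` should give back 0, 0.5 and 1.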
    

    Here is the result based on the previous program; just add the pragma like this:

    #pragma omp parallel for 
    for (int it = 0; it < iterations; it++){
     ...
    }
    

    And here is the result:

    beta@beta-PC ~
    $ g++ -o opt.exe opt.c -msse4.1 -fopenmp
    
    beta@beta-PC ~
    $ opt
    0.748
    2.90873e+007
    0.484
    2.90873e+007
    0.796
    2.90873e+007
    
    
    beta@beta-PC ~
    $ g++ -o opt.exe opt.c -msse4.1 -O3
    
    
    beta@beta-PC ~
    $ opt
    1.404
    2.90873e+007
    1.404
    2.90873e+007
    1.404
    2.90873e+007
    


    The results show roughly a 100% improvement with OpenMP: the timings drop from 1.404 to between 0.484 and 0.796. Visual C++ supports OpenMP too.
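One thing to watch out for: if the compiler flag is missing (`-fopenmp` for g++, `/openmp` for Visual C++), the `#pragma omp` lines are ignored and the loop runs on one core. A small sketch for checking this is below. It relies on the standard `_OPENMP` macro, which is defined only when OpenMP is enabled; the function name `openmp_status` is hypothetical and not from the original program.

```cpp
#include <string>

// Hypothetical helper (not from the original post): reports whether this
// file was compiled with OpenMP enabled.
std::string openmp_status()
{
#ifdef _OPENMP
    // _OPENMP expands to the yyyymm date of the supported OpenMP specification.
    return "OpenMP enabled, version macro " + std::to_string(_OPENMP);
#else
    return "OpenMP disabled, #pragma omp lines are ignored";
#endif
}
```

Printing the returned string once at startup is enough to tell whether a build is actually parallel.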
