Speed up a short to float cast?
I have a short to float cast in C++ that is bottlenecking my code.

The code translates from a hardware device buffer, which is natively shorts; this represents the input from a fancy photon counter.

    float factor = 1.0f / value;
    for (int i = 0; i < W*H; i++) // 25% of time is spent doing this
    {
        int value = source[i];          // ushort -> int
        destination[i] = value*factor;  // int*float -> float
    }

A few details:

- Value should go from 0 to 2^16-1; it represents the pixel values of a highly sensitive camera.
- I'm on a multicore x86 machine with an i7 processor (i7 960, which supports SSE 4.1 and 4.2).
- Source is aligned to an 8