I have a short to float cast in C++ that is bottlenecking my code.
The code translates from a hardware device buffer which is natively shorts, this represents the in
This isn't really an answer, so don't take it as one, but I'm wondering how the code would behave with a 256 KB look-up table (basically a 'short to float' table with 65536 entries of 4 bytes each).
A Core i7 has about 8 megabytes of cache, I believe, so the look-up table would fit in cache (in the large last-level cache at least; it is too big for the small L1 data cache).
I really wonder how that would impact the performance :)
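For anyone who wants to try it, here is a minimal sketch of the table idea. The names `make_short_to_float_table` and `convert_with_table` are made up for illustration, and it assumes the device buffer can be viewed as `std::int16_t`. I haven't measured it, so whether it beats a plain `static_cast<float>` (or a vectorized conversion) is something you'd have to benchmark on your own data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build a 65536-entry table that maps every possible
// 16-bit pattern to the float value of the corresponding signed short.
// At 4 bytes per entry this is the 256 KB table discussed above.
std::vector<float> make_short_to_float_table() {
    std::vector<float> table(65536);
    for (int v = -32768; v <= 32767; ++v) {
        // Index by the unsigned bit pattern, so negative shorts
        // land in the upper half of the table (e.g. -1 -> 65535).
        table[static_cast<std::uint16_t>(v)] = static_cast<float>(v);
    }
    return table;
}

// Hypothetical helper: convert n samples with one table load per sample
// instead of an int-to-float conversion instruction.
void convert_with_table(const std::vector<float>& table,
                        const std::int16_t* in, float* out, std::size_t n) {
    assert(table.size() == 65536);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = table[static_cast<std::uint16_t>(in[i])];
    }
}
```

Build the table once at startup and reuse it for every buffer. Keep in mind that the access pattern depends on the sample values, so cache behavior will vary with the signal.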