I\'m trying to figure out an efficient way to load compile time constant floats into SSE(2/3) registers. I\'ve tried doing simple code like this,
const __m128 x
First off, what optimization level are you compiling at? It's not uncommon to see that sort of codegen at -O0 or -O1, but I would be quite surprised to see it with -O2 or higher in most compilers.
Second, there are no immediate loads in SSE. You can do a load immediate to a GPR, then move that value to SSE, but you cannot conjure other values without an actual load (ignoring certain special values like 0 or (int)-1, which can be produced via logical operations.
Finally, if the bad code is being generated with optimizations turned on and in a performance-critical location, please file a bug against your compiler.