Choosing SSE instruction execution domains in mixed contexts
问题 I am playing with a bit of SSE assembly code in which I do not have enough xmm registers to keep all the temporary results and useful constants in registers at the same time. As a workaround, for some constant vectors that have identical components, I “compress” several vectors into a single xmm register, xmm14 below. I use the pshufd instruction to decompress the constant vector I need. This instruction has a bit of latency, but since it takes a source and a destination register, it is