Standard C++11 code equivalent to the PEXT Haswell instruction (and likely to be optimized by compilers)
Question: The Haswell architecture comes with several new instructions. One of them is PEXT (parallel bits extract), whose functionality is explained by this image (source here): it takes a value r2 and a mask r3 and puts the extracted bits of r2 into r1. In other words, the bits of r2 at the positions where r3 has a 1 are packed contiguously into the low-order bits of r1, and the remaining high bits of r1 are cleared.

My question is the following: what would be the equivalent code of an optimized templated function in pure standard C++11 that compilers would be likely to optimize to this instruction in the future?

Answer 1: Here is some code from Matthew
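The following is a separate minimal sketch, not the code from the answer above, of PEXT's semantics in standard C++11, assuming unsigned operands. The names `pext` and `lowest_set_bit` are hypothetical. Because a C++11 `constexpr` function may contain only a single `return` statement, it recurses over the set bits of the mask instead of looping, which lets the examples be checked with `static_assert`. It only reproduces the instruction's result: as far as I know, mainstream compilers do not recognize such bit-extraction code and emit PEXT for it, so when the instruction itself is needed, the BMI2 intrinsics `_pext_u32` / `_pext_u64` from `<immintrin.h>` (with BMI2 enabled, e.g. `-mbmi2` on GCC/Clang) are the usual route.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: isolates the lowest set bit of x (the classic x & -x),
// written with unsigned wraparound so it is well defined for every unsigned
// type, including uint8_t/uint16_t, which promote to int.
template <typename T>
constexpr T lowest_set_bit(T x) {
    return static_cast<T>(x & static_cast<T>(T(0) - x));
}

// Portable emulation of PEXT (parallel bits extract), valid C++11 constexpr.
// Visits the set bits of `mask` from least to most significant: the bit of
// `value` under the k-th set mask bit becomes bit k of the result, and all
// higher result bits are zero. One recursive call per set bit of `mask`.
template <typename T>
constexpr T pext(T value, T mask) {
    static_assert(std::is_unsigned<T>::value,
                  "pext expects an unsigned integer type");
    return mask == 0
               ? T(0)
               : static_cast<T>(
                     // bit of `value` selected by the lowest set mask bit
                     ((value & lowest_set_bit(mask)) != 0 ? T(1) : T(0)) |
                     // remaining mask bits (lowest one cleared) fill bits 1, 2, ...
                     (pext(value, static_cast<T>(mask & (mask - 1))) << 1));
}

// Compile-time usage examples.
// Mask 0xCC selects bits 2, 3, 6, 7 of 0xB6, which are 1, 0, 0, 1 -> 0b1001.
static_assert(pext<std::uint8_t>(0xB6, 0xCC) == 9, "byte example");
// Mask 0xFF00FF00 selects byte 1 (0x56) and byte 3 (0x12) -> 0x1256.
static_assert(pext<std::uint32_t>(0x12345678u, 0xFF00FF00u) == 0x1256u,
              "32-bit example");
```

The same template can be called at run time with non-constant arguments; the `constexpr` form is only there to keep it within C++11's rules while still allowing compile-time checks.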