问题
consider we have this:
....
pxor xmm1, xmm1
movdqu xmm0, [reax]
pcmpeqb xmm0, xmm1
pmovmskb eax, xmm0
test ax , ax
jz .zero
...
is there any way to not use 'pmovmskb' and test the bitmask directly from xmm0 (to check if it's zero) ? is there any SSE instruction for this action ?
in fact, im searching for something like 'ptest xmm0, xmm0' action but in SSE2 ... not SSE4
回答1:
It's generally not worth using SSE4.1 ptest xmm0,xmm0 on a pcmpeqb result, especially not if you're branching.
pmovmskb is 1 uop, and cmp or test can macro-fuse with jnz into another single uop on both Intel and AMD CPUs. Total of 2 uops to branch on a pcmpeqb result with pmovmsk + test/jcc
But ptest is 2 uops, and its 2nd uop can't macro-fuse with a following branch. Total of 3 uops to branch on a vector with ptest + jcc.
It's break-even when you can use ptest directly, without needing a pcmp, e.g. testing any / all bits in the whole vector (or with a mask, some bits). And actually a win if you use it for cmov or setcc instead of a branch. It's also a win for code-size, even though same number of uops.
You can amortize the checking over multiple vectors. e.g. por some vectors together and then check that all of the bytes zero. Or pminub some vectors together and then check for any zeros. (glibc string functions like strlen and strchr use this trick to check a whole cache-line of vectors in parallel, before sorting out where it came from after leaving the loop.)
You can combine pcmpeq results instead of raw inputs, e.g. for memchr. In that case you can use pand instead of pminub to get a zero in an element where any input has a zero. Some CPUs run pand on more ports than pminub, so less competition for vector ALU.
Also note that pmovmskb zero-extends into EAX; you can test eax,eax instead of wasting a prefix byte to only test AX.
回答2:
Use ptest:
ptest xmm0, xmm0
jz .zero
ptest a, b sets ZF if a ∧ b is zero and CF if a ∧ ¬ b is zero.
Note however that SSE 4.1 is required for ptest to be present.
Otherwise, I suppose your approach is as good as it gets.
来源:https://stackoverflow.com/questions/60446759/sse2-test-xmm-bitmask-directly-without-using-pmovmskb