SSE2 test xmm bitmask directly without using 'pmovmskb'

Posted by 蓝咒 on 2021-02-11 16:40:42

Question


Consider this code:

....
pxor            xmm1, xmm1
movdqu          xmm0, [reax]
pcmpeqb         xmm0, xmm1
pmovmskb        eax,  xmm0
test            ax , ax
jz              .zero
...

Is there any way to avoid 'pmovmskb' and test the bitmask directly in xmm0 (to check whether it's zero)? Is there an SSE instruction for this?

In fact, I'm looking for something like 'ptest xmm0, xmm0', but in SSE2, not SSE4.


Answer 1:


It's generally not worth using SSE4.1 ptest xmm0,xmm0 on a pcmpeqb result, especially not if you're branching.

pmovmskb is 1 uop, and cmp or test can macro-fuse with jnz into another single uop on both Intel and AMD CPUs. That's a total of 2 uops to branch on a pcmpeqb result with pmovmskb + test/jcc.

But ptest is 2 uops, and its 2nd uop can't macro-fuse with a following branch. Total of 3 uops to branch on a vector with ptest + jcc.
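In C, the two-uop pattern above corresponds roughly to the SSE2 intrinsics below. This is a minimal sketch, not a tuned implementation: the function name `has_zero_byte` is hypothetical, and the compiler decides which instructions are actually emitted.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Returns nonzero if any of the 16 bytes at p is zero.
   has_zero_byte is a hypothetical name used for illustration. */
static int has_zero_byte(const void *p)
{
    __m128i v    = _mm_loadu_si128((const __m128i *)p);     /* movdqu   */
    __m128i eq   = _mm_cmpeq_epi8(v, _mm_setzero_si128());  /* pcmpeqb  */
    int     mask = _mm_movemask_epi8(eq);                   /* pmovmskb */
    return mask != 0;                                       /* test + jcc when used in a branch */
}
```

Calling it as `if (has_zero_byte(ptr)) { ... }` gives the compiler the chance to emit the pmovmskb followed by a fused test/jcc pair.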


It's break-even when you can use ptest directly, without needing a pcmp, e.g. testing any / all bits in the whole vector (or with a mask, some bits). And it's actually a win if you use it for cmov or setcc instead of a branch. It's also a win for code size, even though it's the same number of uops.
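For comparison, with only SSE2 available, a whole-vector "is everything zero?" check still needs a compare before the mask extraction. The following is a sketch under that assumption; `is_all_zero_sse2` is a hypothetical helper name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Returns nonzero if all 128 bits of v are zero, using SSE2 only.
   is_all_zero_sse2 is a hypothetical name used for illustration. */
static int is_all_zero_sse2(__m128i v)
{
    /* Bytes equal to zero become 0xFF, all other bytes become 0x00. */
    __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());    /* pcmpeqb  */
    /* All 16 mask bits set means every byte compared equal to zero. */
    return _mm_movemask_epi8(eq) == 0xFFFF;                 /* pmovmskb + cmp */
}
```

This is the case where SSE4.1 ptest saves the compare, which is why it breaks even or wins there.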


You can amortize the checking over multiple vectors. e.g. por some vectors together and then check that all of the bytes are zero. Or pminub some vectors together and then check for any zeros. (glibc string functions like strlen and strchr use this trick to check a whole cache-line of vectors in parallel, before sorting out where it came from after leaving the loop.)
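The pminub variant can be sketched in C as follows. This is an illustration of the idea only, not glibc's actual code; `block64_has_zero` is a hypothetical name, and unaligned loads are used so the sketch makes no alignment assumption.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Returns nonzero if any of the 64 bytes at p is zero.
   One compare + movemask covers four vectors.
   block64_has_zero is a hypothetical name used for illustration. */
static int block64_has_zero(const unsigned char *p)
{
    __m128i v0 = _mm_loadu_si128((const __m128i *)(p +  0));
    __m128i v1 = _mm_loadu_si128((const __m128i *)(p + 16));
    __m128i v2 = _mm_loadu_si128((const __m128i *)(p + 32));
    __m128i v3 = _mm_loadu_si128((const __m128i *)(p + 48));

    /* Unsigned byte minimum: a result byte is zero if and only if
       at least one of the four inputs has a zero at that position. */
    __m128i m = _mm_min_epu8(_mm_min_epu8(v0, v1),
                             _mm_min_epu8(v2, v3));          /* pminub x3 */

    __m128i eq = _mm_cmpeq_epi8(m, _mm_setzero_si128());     /* pcmpeqb  */
    return _mm_movemask_epi8(eq) != 0;                       /* pmovmskb */
}
```

A loop would call this once per 64-byte block and only work out which vector and byte held the zero after the check first succeeds.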

You can combine pcmpeq results instead of raw inputs, e.g. for memchr. In that case you can use pand instead of pminub to get a zero in an element where any input has a zero. Some CPUs run pand on more ports than pminub, so less competition for vector ALU.
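As an illustration of combining compare results with pand, the sketch below checks whether two 32-byte blocks are identical. It is not memchr itself, just an example of the "zero wherever any input has a zero" property; `equal32` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Returns nonzero if the 32 bytes at a equal the 32 bytes at b.
   equal32 is a hypothetical name used for illustration. */
static int equal32(const unsigned char *a, const unsigned char *b)
{
    __m128i a0 = _mm_loadu_si128((const __m128i *)a);
    __m128i a1 = _mm_loadu_si128((const __m128i *)(a + 16));
    __m128i b0 = _mm_loadu_si128((const __m128i *)b);
    __m128i b1 = _mm_loadu_si128((const __m128i *)(b + 16));

    __m128i eq0 = _mm_cmpeq_epi8(a0, b0);    /* pcmpeqb: 0xFF where equal */
    __m128i eq1 = _mm_cmpeq_epi8(a1, b1);    /* pcmpeqb */

    /* Compare results are 0x00 or 0xFF per byte, so a bitwise AND leaves
       0x00 wherever either compare found a mismatch. */
    __m128i both = _mm_and_si128(eq0, eq1);  /* pand */

    return _mm_movemask_epi8(both) == 0xFFFF;  /* one pmovmskb for both vectors */
}
```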


Also note that pmovmskb zero-extends into EAX; you can test eax,eax instead of wasting a prefix byte to only test AX.




Answer 2:


Use ptest:

ptest xmm0, xmm0
jz .zero

ptest a, b sets ZF if a ∧ b is zero and CF if ¬a ∧ b is zero.

Note however that SSE 4.1 is required for ptest to be present.

Otherwise, I suppose your approach is as good as it gets.



Source: https://stackoverflow.com/questions/60446759/sse2-test-xmm-bitmask-directly-without-using-pmovmskb
