sse

What does AT&T syntax do about ambiguity between other mnemonics and operand-size suffixes?

Submitted by 邮差的信 on 2021-02-05 07:11:26

Question: In AT&T syntax, instructions often have to be suffixed with the appropriate operand size, with q for operations on 64-bit operands. However, MMX and SSE also have a movq instruction, where the q is part of the original Intel mnemonic rather than an added suffix. So how is this represented in AT&T syntax? Is another q suffix needed, as in movqq %mm1, %mm0 and movqq %xmm1, %xmm0, or not? And do other instructions that happen to end in an AT&T-style suffix letter (like paddd , slld ) work the same way?

How to simulate pcmpgtq on sse2?

Submitted by 寵の児 on 2021-02-05 05:13:17

Question: PCMPGTQ was introduced in SSE4.2; it provides a signed greater-than comparison for 64-bit numbers, yielding a mask. How can this functionality be supported on instruction sets predating SSE4.2? Update: the same question applies to ARMv7 with NEON, which also lacks a 64-bit comparator. The sister question to this is found here: What is the most efficient way to support CMGT with 64bit signed comparisons on ARMv7a with Neon? Answer 1: __m128i pcmpgtq_sse2 (__m128i a, __m128i b) { __m128i r =

Conditional SSE/AVX add or zero elements based on compare

Submitted by 此生再无相见时 on 2021-02-04 21:40:18

Question: I have the following __m128 vectors: v_weight and v_entropy. I need to add v_entropy to v_weight only where the elements of v_weight are not 0f; obviously _mm_add_ps() adds all elements regardless. I can compile up to AVX, but not AVX2. EDIT: I do know beforehand how many elements of v_weight will be 0 (it will always be either none, or the last 1, 2, or 3 elements). If it's easier, how do I zero out the corresponding elements of v_entropy ? Answer 1: The cmpeq/cmpgt instructions create a mask, all ones or

What are the 128-bit to 512-bit registers used for?

Submitted by 我们两清 on 2021-02-04 07:14:02

Question: After looking at a table of registers in the x86/x64 architecture, I noticed that there's a whole section of 128-, 256-, and 512-bit registers that I've never seen used in assembly or in decompiled C/C++ code: XMM(0-15) for 128-bit, YMM(0-15) for 256-bit, and ZMM(0-31) for 512-bit. After doing a bit of digging, what I've gathered is that you have to use two 64-bit operations to perform math on a 128-bit number, instead of using the generic add , sub , mul , div operations. If this is the case, then what
