simd

What is the avx2 instruction to store 8 integers?

狂风中的少年 submitted on 2021-02-05 11:32:06
Question: I want to store the 8 integers from a __m256i variable to an array of 8 x 32-bit ints. I thought the instruction for that would be _mm256_store_epi32, but I get an error that this instruction doesn't even exist!
Answer 1: Have a look at the Intel Intrinsics Guide. Depending on whether your destination is aligned, you need _mm256_store_si256 (aligned destination) or _mm256_storeu_si256 (unaligned destination).
Source: https://stackoverflow.com/questions/43304021/what-is-the-avx2-instruction-to-store-8-integers
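
The two stores named in the answer can be sketched as follows. This is a minimal illustration, not code from the original answer: the function names double8_storeu and double8_store and the TARGET_AVX2 macro are hypothetical, and the macro assumes GCC or Clang (it lets the file build without -mavx2).

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. This attribute enables AVX2 for the marked
   functions only. When building with -mavx2, define the macro as empty. */
#define TARGET_AVX2 __attribute__((target("avx2")))

/* Doubles 8 ints and writes them to dst, which may have any alignment. */
TARGET_AVX2 void double8_storeu(const int32_t *src, int32_t *dst) {
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    v = _mm256_add_epi32(v, v);              /* AVX2 integer add, 8 lanes */
    _mm256_storeu_si256((__m256i *)dst, v);  /* unaligned 256-bit store */
}

/* Same computation, but dst must be 32-byte aligned or the store faults. */
TARGET_AVX2 void double8_store(const int32_t *src, int32_t *dst) {
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    v = _mm256_add_epi32(v, v);
    _mm256_store_si256((__m256i *)dst, v);   /* aligned 256-bit store */
}
```

Both intrinsics store all 256 bits regardless of element type, which is why there is no per-element-width variant in AVX2. The caller should check that the CPU supports AVX2 before calling either function.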

Why floating point registers are different than general purpose ones

筅森魡賤 submitted on 2021-02-05 07:13:25
Question: Most architectures have a different set of registers for storing regular integers and floating-point numbers. From a binary storage point of view, it shouldn't matter where things are stored, right? It's just 1s and 0s, so couldn't they pipe the same general-purpose registers into floating-point ALUs? SIMD (xmm in x64) registers are capable of storing both floating-point values and regular integers, so why doesn't the same concept apply to regular registers?
Answer 1: For practical processor design, there are a
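
The question's remark that an xmm register can hold either kind of data can be illustrated with a small sketch. It assumes SSE2 (baseline on x86-64); the function name float_bits_via_xmm is hypothetical and the example is not taken from the truncated answer.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */
#include <stdint.h>

/* Returns the raw IEEE-754 bit pattern of f by viewing the same XMM
   register first as floats, then as integers. */
uint32_t float_bits_via_xmm(float f) {
    __m128 vf = _mm_set_ss(f);            /* f in lane 0, other lanes zero */
    __m128i vi = _mm_castps_si128(vf);    /* reinterpret only, no conversion */
    return (uint32_t)_mm_cvtsi128_si32(vi);
}
```

The cast intrinsic only changes the type the compiler sees; the bits in the register are untouched, which is the sense in which the register is "just 1s and 0s".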

How to simulate pcmpgtq on sse2?

寵の児 submitted on 2021-02-05 05:13:17
Question: PCMPGTQ was introduced in SSE4.2, and it provides a signed greater-than comparison for 64-bit numbers that yields a mask. How does one support this functionality on instruction sets predating SSE4.2? Update: This same question applies to ARMv7 with Neon, which also lacks a 64-bit comparator. The sister question to this is found here: What is the most efficient way to support CMGT with 64bit signed comparisons on ARMv7a with Neon?
Answer 1: __m128i pcmpgtq_sse2 (__m128i a, __m128i b) { __m128i r =
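
The answer's code is cut off above, so the following is a separate sketch of one known SSE2-only approach, not a reconstruction of that answer. The names cmpgt_epi64_sse2 and cmpgt_i64x2 are hypothetical. The idea: compare the high 32-bit halves as signed values, and when they are equal, use the sign of the 64-bit difference b - a to decide.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Signed 64-bit "a > b" per lane; each lane is all ones if true, else 0. */
static inline __m128i cmpgt_epi64_sse2(__m128i a, __m128i b) {
    __m128i diff  = _mm_sub_epi64(b, a);    /* 64-bit b - a per lane */
    __m128i hi_eq = _mm_cmpeq_epi32(a, b);  /* 32-bit halves equal? */
    __m128i hi_gt = _mm_cmpgt_epi32(a, b);  /* signed 32-bit a > b */
    /* Only the high dword of each 64-bit lane is meaningful here:
       a > b if high(a) > high(b), or if the high halves are equal and
       b - a is negative (its high dword is then all ones). */
    __m128i r = _mm_or_si128(hi_gt, _mm_and_si128(hi_eq, diff));
    /* Copy each lane's high dword over its low dword. */
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 3, 1, 1));
}

/* Array wrapper so the result can be inspected from scalar code. */
void cmpgt_i64x2(const int64_t a[2], const int64_t b[2], int64_t out[2]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, cmpgt_epi64_sse2(va, vb));
}
```

When the high halves differ, the equality mask zeroes out the difference term, so overflow in b - a cannot affect the result.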

What are the 128-bit to 512-bit registers used for?

我们两清 submitted on 2021-02-04 07:14:02
Question: After looking at a table of registers in the x86/x64 architecture, I noticed that there's a whole section of 128-, 256-, and 512-bit registers that I've never seen used in assembly or decompiled C/C++ code: XMM(0-15) for 128, YMM(0-15) for 256, ZMM(0-31) for 512. After doing a bit of digging, what I've gathered is that you have to use two 64-bit operations in order to perform math on a 128-bit number, instead of using generic add, sub, mul, div operations. If this is the case, then what
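
As background to the question: these are SIMD registers, and their instructions treat a register as several independent lanes rather than as one wide number. A minimal sketch, assuming SSE2 (baseline on x86-64); the function name add4_i32 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Adds four pairs of 32-bit ints with a single 128-bit instruction. */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* PADDD: four independent 32-bit adds; no carry crosses between lanes. */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Because no carry propagates from one lane to the next, this is not a 128-bit addition, which is why wide-integer math still goes through pairs of 64-bit general-purpose operations.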
