avx

Using xmm parameter in AVX intrinsics

妖精的绣舞 submitted on 2020-01-04 05:14:20
Question: Is it possible to use xmm register parameters with AVX intrinsic functions (_mm256_*)? My code requires vector integer operations (for loading and storing data) along with vector floating-point operations. The integer code is written with SSE2 intrinsics to stay compatible with older CPUs, while the floating-point code is written with AVX to improve speed (there is also an SSE code branch, so do not suggest this). Currently, except for using a compiler flag to automatically convert all SSE
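A minimal sketch of the mix the question describes, assuming the integer data is 32-bit ints and the goal is 256-bit float math; the function name and the scale parameter are illustrative, not from the original post:

#include <immintrin.h>

/* SSE2 integer loads feeding 256-bit AVX floating-point math. */
__m256 load8_ints_and_scale(const int *src, __m256 scale)
{
    __m128i lo_i = _mm_loadu_si128((const __m128i *)src);        /* SSE2 load */
    __m128i hi_i = _mm_loadu_si128((const __m128i *)(src + 4));  /* SSE2 load */
    __m128  lo   = _mm_cvtepi32_ps(lo_i);                        /* SSE2 int -> float */
    __m128  hi   = _mm_cvtepi32_ps(hi_i);
    /* Combine the two xmm halves into one ymm value: AVX intrinsics accept
       __m128/__m128i arguments wherever the signature says so. */
    __m256  v = _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
    return _mm256_mul_ps(v, scale);                              /* AVX float math */
}

When the whole file is compiled with -mavx, the compiler VEX-encodes the "SSE2" intrinsics as well, which avoids SSE/AVX transition penalties.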

How do I disable avx instructions on a linux computer? [closed]

一曲冷凌霜 submitted on 2020-01-03 08:57:27
Question: Or, more specifically, how do I ensure that /proc/cpuinfo and the CPUID opcode do not show that AVX is enabled? (For context, there is a bug on some Amazon EC2 instances where AVX is falsely reported as active, which causes programs that dynamically use AVX instructions to crash with SIGILL.) I've seen this
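Related to that context, a small runtime guard (a sketch, not a fix for the EC2 bug itself) that only dispatches to an AVX code path when the feature is actually usable; on current GCC and Clang the builtin used here consults CPUID and the OS-enabled vector state at run time:

#include <stdio.h>

/* Returns non-zero only when AVX can actually be used; on recent GCC/Clang
   this also accounts for whether the OS has enabled the YMM state. */
static int avx_really_usable(void)
{
    return __builtin_cpu_supports("avx");
}

int main(void)
{
    printf("AVX usable: %d\n", avx_really_usable());
    return 0;
}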

How to clear the upper 128 bits of __m256 value?

旧街凉风 submitted on 2020-01-02 01:07:12
Question: How can I clear the upper 128 bits of m2: __m256i m2 = _mm256_set1_epi32(2); __m128i m1 = _mm_set1_epi32(1); m2 = _mm256_castsi128_si256(_mm256_castsi256_si128(m2)); m2 = _mm256_castsi128_si256(m1); don't work: Intel's documentation for the _mm256_castsi128_si256 intrinsic says that "the upper bits of the resulting vector are undefined". At the same time I can easily do it in assembly: VMOVDQA xmm2, xmm2 // zeros upper ymm2 VMOVDQA xmm2, xmm1 Of course I'd not like to use "and" or _mm256
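A sketch of the usual intrinsic-level answer: newer compilers provide _mm256_zextsi128_si256, which is documented to zero the upper half, and where it is missing the same effect can be obtained by inserting the xmm value into an all-zero ymm vector. The function name and the compiler-version guard below are illustrative approximations:

#include <immintrin.h>

/* Return m1 in the low 128 bits with the upper 128 bits cleared to zero. */
__m256i zext_low128(__m128i m1)
{
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 10)
    /* Unlike _mm256_castsi128_si256, this cast documents the zeroed upper half. */
    return _mm256_zextsi128_si256(m1);
#else
    /* Fallback: insert into an all-zero vector (vinsertf128, plain AVX). */
    return _mm256_insertf128_si256(_mm256_setzero_si256(), m1, 0);
#endif
}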

Why do processors with only AVX out-perform AVX2 processors for many SIMD algorithms?

大憨熊 submitted on 2020-01-01 11:33:51
Question: I've been investigating the benefits of SIMD algorithms in C# and C++, and found that in many cases using 128-bit registers on an AVX processor offers a better improvement than using 256-bit registers on a processor with AVX2, but I don't understand why. By improvement I mean the speed-up of a SIMD algorithm relative to a non-SIMD algorithm on the same machine. Answer 1: On an AVX processor, the upper half of the 256-bit registers and floating-point units are powered down by the CPU when not
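One practical consequence for benchmarks, sketched below under the assumption that the measured kernels are short: run some throwaway 256-bit work before the timed region so the upper vector lanes are already powered up when measurement starts (the iteration count is a placeholder, not a tuned value):

#include <immintrin.h>

/* Warm up the 256-bit execution units before a timed AVX/AVX2 benchmark. */
static void warm_up_ymm(void)
{
    __m256 v = _mm256_set1_ps(1.0f);
    for (int i = 0; i < 100000; ++i)
        v = _mm256_mul_ps(v, v);   /* stays 1.0f; just keeps the 256-bit units busy */
    /* Read a lane back so the loop is not optimized away. */
    volatile float sink = _mm_cvtss_f32(_mm256_castps256_ps128(v));
    (void)sink;
}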

Is there a version of TensorFlow not compiled for AVX instructions?

独自空忆成欢 submitted on 2020-01-01 07:30:29
Question: I'm trying to get TensorFlow up on my Chromebook, not the best place, I know, but I just want to get a feel for it. I haven't done much work in the Python dev environment, or in any dev environment for that matter, so bear with me. After figuring out pip, I installed TensorFlow and tried to import it, receiving this error: Python 3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import tensorflow as

SSE loading ints into __m128

元气小坏坏 submitted on 2019-12-30 10:42:24
Question: What are gcc's intrinsics for loading 4 ints into __m128 and 8 ints into __m256 (aligned/unaligned)? What about unsigned ints? Answer 1: Using Intel's SSE intrinsics, the ones you're looking for are: _mm_load_si128() _mm_loadu_si128() _mm256_load_si256() _mm256_loadu_si256() Documentation: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_load_si128 https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_load_si256 There's no distinction between signed or
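A short usage sketch of those four loads, assuming plain int source arrays; the function name is illustrative, the 16- and 32-byte alignment is required only by the aligned variants, and the signed/unsigned question does not arise because __m128i and __m256i are just 128/256 bits with no signedness:

#include <immintrin.h>

void load_examples(const int *p4, const int *p8)
{
    _Alignas(16) int buf4[4] = {0, 1, 2, 3};
    _Alignas(32) int buf8[8] = {0, 1, 2, 3, 4, 5, 6, 7};

    __m128i a = _mm_load_si128((const __m128i *)buf4);        /* 4 ints, 16-byte aligned (SSE2) */
    __m128i b = _mm_loadu_si128((const __m128i *)p4);         /* 4 ints, any alignment (SSE2)   */
    __m256i c = _mm256_load_si256((const __m256i *)buf8);     /* 8 ints, 32-byte aligned (AVX)  */
    __m256i d = _mm256_loadu_si256((const __m256i *)p8);      /* 8 ints, any alignment (AVX)    */

    (void)a; (void)b; (void)c; (void)d;
}

The same loads are used for unsigned int data.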

Does Xcode 4 have support for AVX?

末鹿安然 submitted on 2019-12-30 07:33:42
Question: Before I spend time and money downloading Xcode 4, can anyone tell me whether it comes with a version of gcc (or any other compiler, e.g. LLVM) which supports the AVX instruction set on Sandy Bridge CPUs (i.e. gcc -mavx on mainstream gcc builds)? I haven't seen any public release notes anywhere, so it's not easy to check, and I don't really need Xcode 4 yet unless it has AVX support. Answer 1: I eventually cracked and downloaded Xcode 4 - it looks like clang is the only compiler that may support AVX

Are there SIMD (SSE/AVX) instructions in the x86-compatible Intel Xeon Phi accelerators?

好久不见. submitted on 2019-12-30 06:24:56
Question: Are there SIMD (SSE/AVX) instructions in the x86-compatible MIC accelerators, Intel Xeon Phi? http://en.wikipedia.org/wiki/Xeon_Phi Answer 1: Yes, the current generation of Intel Xeon Phi co-processors (codename "Knights Corner", abbreviated KNC) supports a 512-bit SIMD instruction set called "Intel® Initial Many Core Instructions" (abbreviated Intel® IMCI). Intel IMCI is not compatible with, and is not equivalent to, the SSE, AVX, AVX2 or AVX-512 ISAs. However, it has been officially announced that the next planned

Is NOT missing from SSE, AVX?

本秂侑毒 submitted on 2019-12-30 06:18:32
Question: Is it my imagination, or is a PNOT instruction missing from SSE and AVX? That is, an instruction which flips every bit in the vector. If yes, is there a better way of emulating it than PXOR with a vector of all 1s? It's quite annoying, since I need to set up a vector of all 1s to use that approach. Answer 1: For cases such as this it can be instructive to see what a compiler would generate. E.g. for the following function: #include <immintrin.h> __m256i test(const __m256i v) { return ~v; } both gcc and
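Spelled out as intrinsics, the XOR-with-all-ones emulation looks like the sketch below (function names are illustrative); compilers typically materialize the all-ones constant with a compare-equal of a register against itself, so it does not cost a memory load. Note that the 256-bit integer form needs AVX2:

#include <immintrin.h>

/* SSE2: bitwise NOT of a 128-bit integer vector. */
__m128i not_si128(__m128i v)
{
    return _mm_xor_si128(v, _mm_set1_epi32(-1));
}

/* AVX2: bitwise NOT of a 256-bit integer vector. */
__m256i not_si256(__m256i v)
{
    return _mm256_xor_si256(v, _mm256_set1_epi32(-1));
}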

How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?

旧城冷巷雨未停 submitted on 2019-12-29 10:07:46
Question: I'm trying to optimize some matrix computations and I was wondering if it was possible to detect at compile-time whether SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI [1] is enabled by the compiler? Ideally for GCC and Clang, but I can manage with only one of them. I'm not sure it is possible and perhaps I will use my own macro, but I'd prefer detecting it rather than asking the user to select it. [1] "KCVI" stands for Knights Corner Vector Instruction optimizations. Libraries like FFTW detect
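For GCC and Clang, the usual answer is the predefined feature-test macros that mirror the -m flags. A minimal sketch follows; the macro name MY_SIMD_LEVEL is illustrative, and KCVI detection is compiler-specific and left out:

/* These macros are defined by GCC and Clang when the corresponding
   instruction set is enabled via -m flags or -march=. */
#if defined(__AVX512F__)
#  define MY_SIMD_LEVEL "AVX-512F"
#elif defined(__AVX2__)
#  define MY_SIMD_LEVEL "AVX2"
#elif defined(__AVX__)
#  define MY_SIMD_LEVEL "AVX"
#elif defined(__SSE2__)
#  define MY_SIMD_LEVEL "SSE2"
#elif defined(__SSE__)
#  define MY_SIMD_LEVEL "SSE"
#else
#  define MY_SIMD_LEVEL "scalar"
#endif

/* FMA (relevant to the AVX-128-FMA case) is reported separately via __FMA__. */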