avx2 | 易学教程

Disabling AVX2 in CPU for testing purposes

Question: I've got an application that requires AVX2 to work correctly. A check was implemented to verify during application start that the CPU supports AVX2 instructions. I would like to test that this check works correctly, but I only have a CPU that has AVX2. Is there a way to temporarily turn it off for testing purposes, or to somehow emulate another CPU?

Answer 1: Yes, use an "emulation" (or dynamic recompilation) layer like Intel's Software Development Emulator (SDE), or maybe QEMU. SDE is closed-source freeware, and very handy
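As a complement to running under an emulator, the startup check itself can be given a test hook. The following is only a minimal sketch, assuming GCC or Clang on x86: `__builtin_cpu_supports` is a real compiler builtin, but the function name `app_has_avx2` and the environment variable `APP_DISABLE_AVX2` are hypothetical names invented for this illustration. It does not stop the CPU from executing AVX2 instructions; it only makes the check report "no AVX2" so the failure path can be exercised.

```c
#include <stdlib.h>

/* Returns 1 if AVX2 may be used, 0 otherwise.
 * APP_DISABLE_AVX2 is a hypothetical test-only override, not a standard variable. */
int app_has_avx2(void) {
    const char *off = getenv("APP_DISABLE_AVX2");
    if (off != NULL && off[0] == '1') {
        return 0;                 /* pretend the CPU lacks AVX2 */
    }
    __builtin_cpu_init();         /* GCC/Clang builtin; safe to call more than once */
    return __builtin_cpu_supports("avx2") != 0;
}
```

With this hook, starting the application with `APP_DISABLE_AVX2=1` in the environment should take the same code path as a CPU without AVX2.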

AVX2 integer multiply of signed 8-bit elements, producing signed 16-bit results?

Question: I have two __m256i vectors, each filled with 32 8-bit integers. Something like this:

    __int8 *a0 = new __int8[32] {2};
    __int8 *a1 = new __int8[32] {3};
    __m256i v0 = _mm256_loadu_si256((__m256i*)a0);
    __m256i v1 = _mm256_loadu_si256((__m256i*)a1);

How can I multiply these vectors, using something like _mm256_mul_epi8(v0, v1) (which does not exist) or any other way? I want 2 vectors of results, because the output element width is twice the input element width. Or something that works similarly to
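The excerpt is cut off before any answer, so the following is only one possible approach, not necessarily the one given on the original page: sign-extend each 128-bit half to 16-bit elements with `_mm256_cvtepi8_epi16`, then multiply with `_mm256_mullo_epi16`. Any product of two signed 8-bit values fits in 16 bits, so the low half of the product is the full result. The function name `mul_i8_to_i16` is made up for this sketch, and the `target` attribute (GCC/Clang) is used so the file builds without `-mavx2`. Note also that `new __int8[32] {2}` in the question sets only the first element to 2; the remaining elements are zero.

```c
#include <immintrin.h>
#include <stdint.h>

/* Multiplies 32 signed 8-bit pairs and writes 32 signed 16-bit products
 * in the same element order. Hypothetical helper for illustration. */
__attribute__((target("avx2")))
void mul_i8_to_i16(const int8_t *a, const int8_t *b, int16_t *out) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Sign-extend the low and high 16 bytes of each input to 16 x int16. */
    __m256i a_lo = _mm256_cvtepi8_epi16(_mm256_castsi256_si128(va));
    __m256i a_hi = _mm256_cvtepi8_epi16(_mm256_extracti128_si256(va, 1));
    __m256i b_lo = _mm256_cvtepi8_epi16(_mm256_castsi256_si128(vb));
    __m256i b_hi = _mm256_cvtepi8_epi16(_mm256_extracti128_si256(vb, 1));

    /* Two result vectors: elements 0..15 and elements 16..31. */
    _mm256_storeu_si256((__m256i *)out, _mm256_mullo_epi16(a_lo, b_lo));
    _mm256_storeu_si256((__m256i *)(out + 16), _mm256_mullo_epi16(a_hi, b_hi));
}
```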

I've some problems understanding how AVX shuffle intrinsics are working for 8 bits

Question: I'm trying to pack 16-bit data to 8 bits by using _mm256_shuffle_epi8, but the result I get is not what I'm expecting.

    auto srcData = _mm256_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32);
    __m256i vperm = _mm256_setr_epi8( 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1);
    auto result = _mm256_shuffle_epi8(srcData, vperm);

I'm expecting
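The excerpt ends before the expected result is stated. Assuming the goal is to gather the 16 even-indexed bytes into the low 128 bits, the relevant fact is that `_mm256_shuffle_epi8` shuffles each 128-bit lane independently: only the low 4 bits of each index are used, and an index with the high bit set produces zero. A sketch of one way around this is to shuffle within each lane first and then merge the lanes with `_mm256_permute4x64_epi64`. The function name `pack_even_bytes` is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Copies bytes 0, 2, 4, ..., 30 of src into dst[0..15] and zeroes dst[16..31].
 * Hypothetical helper for illustration. */
__attribute__((target("avx2")))
void pack_even_bytes(const uint8_t *src, uint8_t *dst) {
    __m256i v = _mm256_loadu_si256((const __m256i *)src);

    /* Same in-lane pattern for both lanes: even bytes to the low 8 bytes,
     * zeros (index -1) in the high 8 bytes. */
    const __m256i mask = _mm256_setr_epi8(
        0, 2, 4, 6, 8, 10, 12, 14, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 2, 4, 6, 8, 10, 12, 14, -1, -1, -1, -1, -1, -1, -1, -1);
    __m256i s = _mm256_shuffle_epi8(v, mask);

    /* Reorder 64-bit chunks 0,1,2,3 -> 0,2,1,3 so both packed halves
     * end up next to each other in the low 128 bits. */
    __m256i packed = _mm256_permute4x64_epi64(s, _MM_SHUFFLE(3, 1, 2, 0));
    _mm256_storeu_si256((__m256i *)dst, packed);
}
```

With the question's input (bytes 1 to 32), this sketch yields 1, 3, 5, ..., 31 followed by sixteen zeros.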

What is the avx2 instruction to store 8 integers?

Question: I want to store the 8 integers from a __m256i variable to an array of 8 x 32-bit ints. I thought the instruction for that would be _mm256_store_epi32, but I get an error that this instruction doesn't even exist!

Answer 1: Have a look at the Intel Intrinsics Guide. Depending on whether your destination is aligned, you need _mm256_store_si256 or _mm256_storeu_si256.

Source: https://stackoverflow.com/questions/43304021/what-is-the-avx2-instruction-to-store-8-integers
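The two store intrinsics named in the answer can be sketched as follows. The function names `fill_iota8` and `fill_iota8_aligned` are hypothetical; the vector contents (0 to 7) are chosen only to make the result easy to verify. The aligned variant assumes the caller passes a 32-byte-aligned pointer and is likely to fault otherwise.

```c
#include <immintrin.h>
#include <stdint.h>

/* Stores the eight 32-bit integers 0..7 to out; no alignment requirement. */
__attribute__((target("avx2")))
void fill_iota8(int32_t *out) {
    __m256i v = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    _mm256_storeu_si256((__m256i *)out, v);
}

/* Same store, but out must be aligned to 32 bytes. */
__attribute__((target("avx2")))
void fill_iota8_aligned(int32_t *out) {
    __m256i v = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    _mm256_store_si256((__m256i *)out, v);
}
```

A destination declared as `_Alignas(32) int32_t buf[8];` satisfies the requirement of the aligned variant.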

How to deal with SIGSEGV, Segmentation fault. while using Avx2(Solved)

Question: How to deal with SIGSEGV (segmentation fault) while using AVX2 (_mm256_load_pd, _mm256_store_pd) (solved)

_mm256_load_pd: I received a segmentation fault when calling _mm256_load_pd. The usage is as below:

    double * Val = malloc(sizeof(double)*4);
    __m256d vecv = _mm256_load_pd(&Val[0]);

gdb shows:

    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff7fc5017 in _mm256_load_pd (__P=0x555555559370) at /usr/lib/gcc/x86_64-linux-gnu/9/include/avxintrin.h:862
    862 return *(__m256d *)__P;
    (gdb)
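The excerpt does not include the solution, but the gdb output is consistent with an alignment problem: `_mm256_load_pd` requires a 32-byte-aligned address, and the pointer shown (0x555555559370) is a multiple of 16 but not of 32. A minimal sketch of one fix, assuming C11 `aligned_alloc` is available, is shown below; the function name `doubled_sum4` is hypothetical. The other common option is to keep `malloc` and switch to the unaligned `_mm256_loadu_pd` and `_mm256_storeu_pd`.

```c
#include <immintrin.h>
#include <stdlib.h>

/* Doubles four values with AVX and returns their sum, using a buffer that
 * is guaranteed to be 32-byte aligned. Returns -1.0 if allocation fails. */
__attribute__((target("avx")))
double doubled_sum4(const double *in) {
    /* aligned_alloc: the size must be a multiple of the alignment (32 here). */
    double *val = aligned_alloc(32, 4 * sizeof(double));
    if (val == NULL) {
        return -1.0;
    }
    for (int i = 0; i < 4; i++) {
        val[i] = in[i];
    }

    __m256d v = _mm256_load_pd(val);      /* aligned load is now valid */
    v = _mm256_add_pd(v, v);
    _mm256_store_pd(val, v);              /* aligned store */

    double sum = val[0] + val[1] + val[2] + val[3];
    free(val);
    return sum;
}
```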

gcc target for AVX2 disabling SSE instruction set

Question: We have a translation unit that we want to compile with AVX2 (only that one). It tells GCC up front, on the first line of the file:

    #pragma GCC target "arch=core-avx2,tune=core-avx2"

This used to work with GCC 4.8 and 4.9, but from 6 onward (we tried 7 and 8 too) we get this warning (which we treat as an error), on the first function returning a float:

    error: SSE instruction set disabled, using 387 arithmetics

I have tried to re-enable SSE 4.2 (and AVX and AVX2) like so:

    #pragma GCC target "sse4.2,arch
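The excerpt is truncated before any answer, so the cause of the error is not confirmed here. As a different mechanism from the file-level pragma, GCC and Clang also accept a per-function `target` attribute, which leaves the rest of the translation unit at the default target. The sketch below only illustrates that attribute with a float-returning function; the function names are hypothetical, and whether this avoids the reported error in the asker's setup is an assumption.

```c
#include <immintrin.h>

/* Compiled with the translation unit's default target. */
float scale(float x, float factor) {
    return x * factor;
}

/* Only this function is compiled with AVX2 enabled.
 * Returns the sum of the eight floats at p. */
__attribute__((target("avx2")))
float hsum8(const float *p) {
    __m256 v = _mm256_loadu_ps(p);
    __m128 lo = _mm256_castps256_ps128(v);
    __m128 hi = _mm256_extractf128_ps(v, 1);
    __m128 s = _mm_add_ps(lo, hi);              /* 4 partial sums */
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));     /* 2 partial sums */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1)); /* final sum in element 0 */
    return _mm_cvtss_f32(s);
}
```

Callers should still guard `hsum8` with a runtime check such as `__builtin_cpu_supports("avx2")`, since the attribute only affects code generation.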

Comparing 2 vectors in AVX/AVX2 (c)

Question: I have two __m256i vectors (each containing chars), and I want to find out whether they are completely identical or not. All I need is true if all bits are equal, and 0 otherwise. What's the most efficient way of doing that? Here's the code loading the arrays:

    char * a1 = "abcdefhgabcdefhgabcdefhgabcdefhg";
    __m256i r1 = _mm256_load_si256((__m256i *) a1);
    char * a2 = "abcdefhgabcdefhgabcdefhgabcdefhg";
    __m256i r2 = _mm256_load_si256((__m256i *) a2);

Answer 1: The most efficient way on current Intel and
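The answer is cut off, so the approach it recommends is not known from this page. One widely used pattern, shown here only as a sketch, is to compare bytewise with `_mm256_cmpeq_epi8` and test whether all 32 bits of `_mm256_movemask_epi8` are set. The function name `equal32` is hypothetical. The sketch also uses unaligned loads, because string literals as in the question are not guaranteed to be 32-byte aligned, which `_mm256_load_si256` requires.

```c
#include <immintrin.h>

/* Returns 1 if the 32 bytes at a and b are identical, 0 otherwise. */
__attribute__((target("avx2")))
int equal32(const void *a, const void *b) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Each equal byte becomes 0xFF, each differing byte 0x00. */
    __m256i eq = _mm256_cmpeq_epi8(va, vb);

    /* movemask collects the top bit of every byte; all 32 set means -1. */
    return _mm256_movemask_epi8(eq) == -1;
}
```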
