avx2

Disabling AVX2 in CPU for testing purposes

*爱你&永不变心* submitted on 2021-02-07 05:40:42
Question: I've got an application that requires AVX2 to work correctly. A check runs at application start to verify that the CPU supports the AVX2 instructions. I would like to test that this check works correctly, but I only have a CPU that has AVX2. Is there a way to temporarily turn it off for testing purposes? Or to somehow emulate another CPU? Answer 1: Yes, use an "emulation" (or dynamic recompilation) layer like Intel's Software Development Emulator (SDE), or maybe QEMU. SDE is closed-source freeware, and very handy
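
The startup check described in the question could look roughly like the sketch below. It assumes GCC or Clang, whose `__builtin_cpu_supports` builtin queries the CPU at run time; the function names `has_avx2` and `require_avx2` are hypothetical and not taken from the original post. Run under an emulator configured as a pre-AVX2 CPU, the check should take the failure path, which is the case the asker wants to exercise.

```c
#include <stdio.h>

/* Returns 1 if the running CPU reports AVX2 support, 0 otherwise.
   Relies on GCC/Clang builtins; other compilers need another mechanism. */
int has_avx2(void)
{
    __builtin_cpu_init();  /* harmless if the runtime already initialized it */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
}

/* Hypothetical startup guard: returns 0 if the program may continue,
   1 (after printing a message) if AVX2 is missing. */
int require_avx2(void)
{
    if (!has_avx2()) {
        fprintf(stderr, "This program requires a CPU with AVX2.\n");
        return 1;
    }
    return 0;
}
```

A caller would typically invoke `require_avx2()` first thing in `main` and exit with its return value when it is non-zero.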

AVX2 integer multiply of signed 8-bit elements, producing signed 16-bit results?

隐身守侯 submitted on 2021-02-07 03:44:08
Question: I have two __m256i vectors, each filled with 32 8-bit integers. Something like this: __int8 *a0 = new __int8[32] {2}; __int8 *a1 = new __int8[32] {3}; __m256i v0 = _mm256_loadu_si256((__m256i*)a0); __m256i v1 = _mm256_loadu_si256((__m256i*)a1); How can I multiply these vectors, using something like _mm256_mul_epi8(v0, v1) (which does not exist) or any other way? I want 2 vectors of results, because the output element width is twice the input element width. Or something that works similarly to
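
One way to get two vectors of 16-bit products, sketched here as an assumption rather than as the answer from the original thread, is to sign-extend each 128-bit half to 16-bit elements with `_mm256_cvtepi8_epi16` and multiply with `_mm256_mullo_epi16`. The product of two signed 8-bit values always fits in 16 bits, so the low half of each product is the full result. The function name `mul_i8_to_i16` is hypothetical, and the `target("avx2")` attribute is a GCC/Clang feature used so the sketch compiles without `-mavx2`.

```c
#include <immintrin.h>
#include <stdint.h>

/* Multiplies 32 signed 8-bit elements pairwise and writes 32 signed
   16-bit products to out, in the same element order as the inputs. */
__attribute__((target("avx2")))
void mul_i8_to_i16(const int8_t *a, const int8_t *b, int16_t *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Sign-extend the low and high 16 bytes of each input to 16 x int16. */
    __m256i a_lo = _mm256_cvtepi8_epi16(_mm256_castsi256_si128(va));
    __m256i a_hi = _mm256_cvtepi8_epi16(_mm256_extracti128_si256(va, 1));
    __m256i b_lo = _mm256_cvtepi8_epi16(_mm256_castsi256_si128(vb));
    __m256i b_hi = _mm256_cvtepi8_epi16(_mm256_extracti128_si256(vb, 1));

    /* int8 * int8 fits in int16, so the low 16 bits are the exact product. */
    _mm256_storeu_si256((__m256i *)out,        _mm256_mullo_epi16(a_lo, b_lo));
    _mm256_storeu_si256((__m256i *)(out + 16), _mm256_mullo_epi16(a_hi, b_hi));
}
```

Note that `new __int8[32] {2}` in the question sets only the first element to 2 and zero-initializes the rest, so most products in that particular example would be 0.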

I've some problems understanding how AVX shuffle intrinsics are working for 8 bits

倾然丶 夕夏残阳落幕 submitted on 2021-02-05 11:51:07
Question: I'm trying to pack 16-bit data to 8 bits by using _mm256_shuffle_epi8, but the result I get is not what I'm expecting. auto srcData = _mm256_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32); __m256i vperm = _mm256_setr_epi8( 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1); auto result = _mm256_shuffle_epi8(srcData, vperm); I'm expecting
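
The excerpt is cut off before the expected result, so the following is a sketch under the assumption that the goal is to gather every even-indexed byte into the low 16 bytes. `_mm256_shuffle_epi8` shuffles within each 128-bit lane independently and uses only the low 4 bits of each index, so indices 16 to 30 in the low lane do not reach into the high lane. A common workaround is to pack inside each lane first and then move the 64-bit pieces across lanes with `_mm256_permute4x64_epi64`. The function name `pack_even_bytes` is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Copies bytes 0, 2, 4, ..., 30 of src into dst[0..15] and zeroes dst[16..31]. */
__attribute__((target("avx2")))
void pack_even_bytes(const uint8_t *src, uint8_t *dst)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);

    /* Per-lane control: even bytes go to the low 8 bytes of each lane,
       -1 (high bit set) zeroes the remaining bytes. */
    const __m256i ctrl = _mm256_setr_epi8(
        0, 2, 4, 6, 8, 10, 12, 14, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 2, 4, 6, 8, 10, 12, 14, -1, -1, -1, -1, -1, -1, -1, -1);
    __m256i packed = _mm256_shuffle_epi8(v, ctrl);

    /* Qwords are now [lo_even, 0, hi_even, 0]; reorder to [lo_even, hi_even, 0, 0]. */
    packed = _mm256_permute4x64_epi64(packed, _MM_SHUFFLE(3, 1, 2, 0));

    _mm256_storeu_si256((__m256i *)dst, packed);
}
```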

What is the avx2 instruction to store 8 integers?

狂风中的少年 submitted on 2021-02-05 11:32:06
Question: I want to store the 8 integers from a __m256i variable to an array of 8 x 32-bit ints. I thought the instruction for that would be _mm256_store_epi32, but I get an error that this instruction doesn't even exist! Answer 1: Have a look at the Intel Intrinsics Guide. Depending on whether your destination is aligned, you need _mm256_store_si256 or _mm256_storeu_si256. Source: https://stackoverflow.com/questions/43304021/what-is-the-avx2-instruction-to-store-8-integers
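
As a minimal sketch of the answer above, the function below loads 8 32-bit integers, adds 1 to each, and writes them back with `_mm256_storeu_si256`, which has no alignment requirement. The function name `add_one_and_store` is hypothetical; `_mm256_store_si256` could be used instead only when the destination is known to be 32-byte aligned.

```c
#include <immintrin.h>
#include <stdint.h>

/* Reads 8 int32 values from in, adds 1 to each, stores them to out. */
__attribute__((target("avx2")))
void add_one_and_store(const int32_t *in, int32_t *out)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    v = _mm256_add_epi32(v, _mm256_set1_epi32(1));
    /* The store works on the whole 256-bit register; element width does not matter. */
    _mm256_storeu_si256((__m256i *)out, v);
}
```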

How to deal with SIGSEGV, Segmentation fault. while using Avx2(Solved)

怎甘沉沦 submitted on 2021-02-05 09:28:04
Question: How to deal with SIGSEGV, Segmentation fault, while using AVX2 (_mm256_load_pd) (_mm256_store_pd) (solved). _mm256_load_pd: I received a segmentation fault while calling _mm256_load_pd. Usage is as below: double * Val = malloc(sizeof(double)*4); __m256d vecv = _mm256_load_pd(&Val[0]); gdb shows: Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7fc5017 in _mm256_load_pd (__P=0x555555559370) at /usr/lib/gcc/x86_64-linux-gnu/9/include/avxintrin.h:862 862 return *(__m256d *)__P; (gdb)
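
The excerpt does not show the resolution, but a likely cause is alignment: `_mm256_load_pd` requires a 32-byte aligned address, while `malloc` on x86-64 Linux typically guarantees only 16 bytes, and the pointer in the gdb output (0x555555559370) is 16 modulo 32. The sketch below shows two possible fixes, assuming C11 `aligned_alloc` is available: allocate with 32-byte alignment, or use the unaligned load `_mm256_loadu_pd`. The function names are hypothetical.

```c
#include <immintrin.h>
#include <stdlib.h>

/* Fix 1: allocate 32-byte aligned memory so the aligned load is valid.
   Returns the sum 1+2+3+4, or -1.0 if allocation fails. */
__attribute__((target("avx")))
double sum4_aligned(void)
{
    /* aligned_alloc needs a size that is a multiple of the alignment: 4*8 = 32. */
    double *val = (double *)aligned_alloc(32, 4 * sizeof(double));
    if (!val) return -1.0;
    for (int i = 0; i < 4; i++) val[i] = (double)(i + 1);

    __m256d v = _mm256_load_pd(val);   /* safe: val is 32-byte aligned */
    double tmp[4];
    _mm256_storeu_pd(tmp, v);
    free(val);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

/* Fix 2: keep any pointer and use the unaligned load instead. */
__attribute__((target("avx")))
double sum4_unaligned(const double *p)
{
    __m256d v = _mm256_loadu_pd(p);    /* no alignment requirement */
    double tmp[4];
    _mm256_storeu_pd(tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

The same reasoning applies to `_mm256_store_pd` versus `_mm256_storeu_pd`.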

gcc target for AVX2 disabling SSE instruction set

China☆狼群 submitted on 2021-01-27 10:33:21
Question: We have a translation unit that we want to compile with AVX2 (only that one). We tell GCC upfront, on the first line of the file: #pragma GCC target "arch=core-avx2,tune=core-avx2" This used to work with GCC 4.8 and 4.9, but from 6 onward (tried 7 and 8 too) we get this warning (that we treat as an error): error: SSE instruction set disabled, using 387 arithmetics on the first function returning a float. I have tried to re-enable SSE 4.2 (and avx and avx2) like so #pragma GCC target "sse4.2,arch
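
The excerpt ends before any answer, so the following is only an alternative to a file-wide `arch=` pragma, not a confirmed fix for the warning quoted above: GCC and Clang accept a per-function `target` attribute that enables an ISA extension by name. The function `sum8` is a hypothetical example of a float-returning function compiled this way.

```c
#include <immintrin.h>

/* Only this function is compiled with AVX2 (and therefore AVX) enabled;
   the rest of the translation unit keeps the command-line target. */
__attribute__((target("avx2")))
float sum8(const float *p)
{
    __m256 v = _mm256_loadu_ps(p);
    float tmp[8];
    _mm256_storeu_ps(tmp, v);

    float s = 0.0f;
    for (int i = 0; i < 8; i++) s += tmp[i];
    return s;
}
```

Whether this avoids the diagnostic on the GCC versions named in the question would need to be checked against those compilers.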

Comparing 2 vectors in AVX/AVX2 (c)

和自甴很熟 submitted on 2021-01-20 07:12:20
Question: I have two __m256i vectors (each containing chars), and I want to find out if they are completely identical or not. All I need is true if all bits are equal, and 0 otherwise. What's the most efficient way of doing that? Here's the code loading the arrays: char * a1 = "abcdefhgabcdefhgabcdefhgabcdefhg"; __m256i r1 = _mm256_load_si256((__m256i *) a1); char * a2 = "abcdefhgabcdefhgabcdefhgabcdefhg"; __m256i r2 = _mm256_load_si256((__m256i *) a2); Answer 1: The most efficient way on current Intel and
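
The answer is truncated in this excerpt, so the sketch below shows one common approach rather than the answer's exact wording: compare bytes with `_mm256_cmpeq_epi8`, collect one bit per byte with `_mm256_movemask_epi8`, and check that all 32 bits are set. It uses `_mm256_loadu_si256` because string literals are not guaranteed to be 32-byte aligned, which the aligned `_mm256_load_si256` in the question requires. The function name `vec256_equal` is hypothetical.

```c
#include <immintrin.h>

/* Returns 1 if the 32 bytes at a and b are identical, 0 otherwise. */
__attribute__((target("avx2")))
int vec256_equal(const void *a, const void *b)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Each byte of eq is 0xFF where the inputs match, 0x00 where they differ. */
    __m256i eq = _mm256_cmpeq_epi8(va, vb);

    /* movemask packs the top bit of each byte into an int; all 32 set is -1. */
    return _mm256_movemask_epi8(eq) == -1;
}
```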
