icc

Why vectorizing the loop does not have performance improvement

二次信任 submitted on 2019-11-26 21:44:10
I am investigating the effect of vectorization on program performance. To that end, I have written the following code:

    #include <stdio.h>
    #include <sys/time.h>
    #include <stdlib.h>

    #define LEN 10000000

    int main(){
        struct timeval stTime, endTime;
        double* a = (double*)malloc(LEN*sizeof(*a));
        double* b = (double*)malloc(LEN*sizeof(*b));
        double* c = (double*)malloc(LEN*sizeof(*c));
        int k;
        for(k = 0; k < LEN; k++){
            a[k] = rand();
            b[k] = rand();
        }
        gettimeofday(&stTime, NULL);
        for(k = 0; k < LEN; k++)
            c[k] = a[k] * b[k];
        gettimeofday(&endTime, NULL);
        FILE* fh = fopen("dump", "w");
        for(k = 0;
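The excerpt cuts off before the measured interval is reported. As a minimal sketch, the time between two `gettimeofday` samples such as `stTime` and `endTime` could be computed as below. The helper name `elapsed_us` is hypothetical and does not appear in the original post; it assumes a POSIX system that provides `<sys/time.h>`.

```cpp
#include <sys/time.h>

// Hypothetical helper (not from the original post): returns the number of
// microseconds between two gettimeofday() samples.
long long elapsed_us(const struct timeval& start, const struct timeval& end) {
    // tv_sec holds whole seconds and tv_usec the microsecond remainder,
    // so both fields have to be combined before subtracting.
    const long long start_us = start.tv_sec * 1000000LL + start.tv_usec;
    const long long end_us = end.tv_sec * 1000000LL + end.tv_usec;
    return end_us - start_us;
}
```

Called as `elapsed_us(stTime, endTime)` right after the second `gettimeofday`, this would give the wall-clock time of the multiply loop alone, excluding allocation and initialization.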

Different compiler behavior for expression: auto p {make_pointer()};

你离开我真会死。 submitted on 2019-11-26 17:49:55
Question: Which is the correct behaviour for the following program?

    // example.cpp
    #include <iostream>
    #include <memory>

    struct Foo {
        void Bar() const { std::cout << "Foo::Bar()" << std::endl; }
    };

    std::shared_ptr<Foo> MakeFoo() { return std::make_shared<Foo>(); }

    int main() {
        auto p { MakeFoo() };
        p->Bar();
    }

When I compile it on my Linux RHEL 6.6 workstation, I obtain the following results:

    $ g++ -v
    gcc version 5.1.0 (GCC)
    $ g++ example.cpp -std=c++14 -Wall -Wextra -pedantic
    $ ./a.out
    Foo::Bar()

but
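One way to see what a given compiler deduces for `auto p { MakeFoo() };` is to inspect the type of `p` directly. The following is a minimal sketch; the `is_initializer_list` trait and the `braced_auto_is_initializer_list` function are illustrative names that are not part of the original question. Compilers that apply the revised deduction rule for single-element braced initializers (proposal N3922, part of C++17) deduce `std::shared_ptr<Foo>`, in which case the function returns `false`; compilers following the earlier C++11 rule deduce `std::initializer_list<std::shared_ptr<Foo>>`, in which case it returns `true` and `p->Bar()` would not compile.

```cpp
#include <initializer_list>
#include <memory>
#include <type_traits>

struct Foo {};

std::shared_ptr<Foo> MakeFoo() { return std::make_shared<Foo>(); }

// Illustrative trait (an assumption, not from the question): true only for
// specializations of std::initializer_list.
template <typename T>
struct is_initializer_list : std::false_type {};

template <typename T>
struct is_initializer_list<std::initializer_list<T>> : std::true_type {};

// Reports which deduction rule the compiler applied to the braced form.
bool braced_auto_is_initializer_list() {
    auto p { MakeFoo() };
    return is_initializer_list<decltype(p)>::value;
}
```

Writing `auto p = MakeFoo();` instead sidesteps the difference, since copy-initialization without braces deduces `std::shared_ptr<Foo>` under both rules.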

is there an inverse instruction to the movemask instruction in intel avx2?

感情迁移 submitted on 2019-11-26 16:43:59
The movemask instruction(s) take an __m256i and return an int32 where each bit (either the first 4, 8 or all 32 bits, depending on the input vector element type) is the most significant bit of the corresponding vector element. I would like to do the inverse: take an int32 (where only the 4, 8 or 32 least significant bits are meaningful), and get a __m256i where the most significant bit of each int8, int32 or int64 sized block is set to the original bit. Basically, I want to go from a compressed bitmask to one that is usable as a mask by other AVX2 instructions (such as maskstore, maskload, mask
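For the 32-bit-element case, the operation described above can be sketched as follows. This is one possible approach under stated assumptions, not necessarily what the poster ended up using: broadcast the bitmask to every lane, isolate a different bit in each lane, and compare so that selected lanes become all-ones (which also sets their most significant bit). The function name `expand_mask_epi32` and its array-output signature are hypothetical. The intrinsic path is compiled only when the compiler defines `__AVX2__` (for example with `-mavx2`); otherwise a scalar loop with the same result serves as a reference.

```cpp
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper (name and signature are illustrative): expands the low
// 8 bits of `mask` into eight 32-bit lanes. Lane i is all-ones (-1) if bit i
// of `mask` is set, and 0 otherwise.
void expand_mask_epi32(std::uint32_t mask, std::int32_t out[8]) {
#if defined(__AVX2__)
    // Lane i holds the single bit (1 << i).
    const __m256i bits = _mm256_setr_epi32(1, 2, 4, 8, 16, 32, 64, 128);
    // Copy the whole bitmask into every lane.
    const __m256i bcast = _mm256_set1_epi32(static_cast<int>(mask));
    // Keep only bit i in lane i.
    const __m256i isolated = _mm256_and_si256(bcast, bits);
    // Lanes whose bit survived compare equal and become all-ones.
    const __m256i lanes = _mm256_cmpeq_epi32(isolated, bits);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), lanes);
#else
    // Scalar reference with the same semantics.
    for (int i = 0; i < 8; ++i) {
        out[i] = ((mask >> i) & 1u) ? -1 : 0;
    }
#endif
}
```

In real AVX2 code the `lanes` vector would typically be kept in a register and passed straight to an intrinsic such as `_mm256_maskstore_epi32`; it is stored to an array here only so the result is easy to inspect.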

is there an inverse instruction to the movemask instruction in intel avx2?

纵然是瞬间 submitted on 2019-11-26 04:00:11
Question: The movemask instruction(s) take an __m256i and return an int32 where each bit (either the first 4, 8 or all 32 bits, depending on the input vector element type) is the most significant bit of the corresponding vector element. I would like to do the inverse: take an int32 (where only the 4, 8 or 32 least significant bits are meaningful), and get a __m256i where the most significant bit of each int8, int32 or int64 sized block is set to the original bit. Basically, I want to go from a compressed