icc

How to set ICC attribute “fp-model precise” for a single function, to prevent associative optimizations?

别等时光非礼了梦想 Submitted on 2019-12-24 01:01:17
Question: I am implementing Kahan summation in a project that supports compilation with gcc47, gcc48, clang33, icc13, and icc14. As part of this algorithm, I would like to disable optimizations that exploit the associativity of real-number addition. (Floating-point operations are not associative.) I would like to disable those optimizations only in the relevant function. I have figured out how to do this under gcc, using the "no-associative-math" attribute. How can I do this in icc or …

OpenMP Parallelizing for loop with map

拈花ヽ惹草 Submitted on 2019-12-24 00:41:41
Question: I am trying to parallelize a for-loop that scans a std::map. Below is my toy program: #include <iostream> #include <cstdio> #include <map> #include <string> #include <cassert> #include <omp.h> #define NUM 100000 using namespace std; int main() { omp_set_num_threads(16); int realThreads = 0; string arr[] = {"0", "1", "2"}; std::map<int, string> myMap; for(int i=0; i<NUM; ++i) myMap[i] = arr[i % 3]; string is[NUM]; #pragma omp parallel for for(map<int, string>::iterator it = myMap.begin(); it != …

Return statement does not get executed in c

筅森魡賤 Submitted on 2019-12-23 18:39:53
Question: So, I have a curious case and can't quite figure out what I've done wrong. Here's the scenario: I have written a creator function that should return a pointer to a structure. To fill the structure with data, I read in a text file. Depending on what text file I use as input, the error either occurs or it doesn't. (The error occurs for a text file with ~4000 lines and not for a file with ~200, if that makes a difference.) The strange thing is that the code executes until right before the …

OMP threadprivate objects not being destructed

点点圈 Submitted on 2019-12-23 12:37:47
Question: Bottom line: How can I make sure that the threadprivate instances are properly destructed? Background: When answering this question, I came across an oddity when using the Intel C++ 15.0 compiler in VS2013. When a global variable is declared threadprivate, the slave threads' copies are not destructed. I started looking for ways to force their destruction. At this site, they say that adding an OMP barrier should help. It doesn't (see MCVE). I tried setting the OMP blocktime to 0 so that the threads …

Is there any benefit to passing all source files at once to a compiler?

我是研究僧i Submitted on 2019-12-23 12:25:43
Question: I have read about "Whole Program Optimization" (WPO) and "Link Time Code Generation" (LTCG). I wonder: is there more inter-module analysis going on if I pass all sources to the compiler at once on the command line (like "g++ a.cpp b.cpp")? Or is that just going to enable one of those flags? Is there a difference between compilers here? For instance, can the Intel compiler benefit from this practice while other compilers don't? Answer 1: I wonder is there more inter-module-analysis going on if I pass …
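A build-configuration sketch of the distinction (flag names per the GCC and Intel classic-compiler manuals; treat as illustrative, not a recipe):

```shell
# Passing all sources in one invocation still compiles one translation
# unit per .cpp by default -- it does NOT merge them by itself:
g++ -O2 a.cpp b.cpp -o app

# Cross-TU optimization must be requested explicitly:
g++ -O2 -flto a.cpp b.cpp -o app     # GCC/Clang link-time optimization
# icpc -O2 -ipo a.cpp b.cpp -o app   # Intel inter-procedural optimization;
                                     # -ipo does benefit from seeing all
                                     # sources in one driver invocation
```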

Segmentation fault with array of __m256i when using clang/g++

孤者浪人 Submitted on 2019-12-23 10:23:08
Question: I'm attempting to generate arrays of __m256i to reuse in another computation. When I attempt to do that (even with a minimal test case), I get a segmentation fault, but only if the code is compiled with g++ or clang. If I compile the code with the Intel compiler (version 16.0), no segmentation fault occurs. Here is a test case I created: int main() { __m256i *table = new __m256i[10000]; __m256i zeroes = _mm256_set_epi64x(0, 0, 0, 0); table[99] = zeroes; } When compiling the above with …

Segmentation fault while working with SSE intrinsics due to incorrect memory alignment

非 Y 不嫁゛ Submitted on 2019-12-23 09:23:41
Question: I am working with SSE intrinsics for the first time and I am encountering a segmentation fault even after ensuring 16-byte memory alignment. This post is an extension of my earlier question: How to allocate 16byte memory aligned data. This is how I have declared my array: float *V = (float*) memalign(16, dx*sizeof(float)); When I try this: __m128 v_i = _mm_load_ps(&V[i]); // it works. But when I do this: __m128 u1 = _mm_load_ps(&V[(i-1)]); // there is a segmentation fault. But if I do: __m128 …

Is a using-declaration supposed to hide an inherited virtual function?

旧巷老猫 Submitted on 2019-12-23 08:04:33
Question: struct level0 { virtual void foo() = 0; }; struct level1 : level0 { virtual void foo() { cout <<" level1 " << endl; } }; struct level2 : level1 { virtual void foo() { cout <<" level2 " << endl; } }; struct level3 : level2 { using level1::foo; }; int main() { level1* l1 = new level3; l1->foo(); level3 l3; l3.foo(); return 0; } The above code compiled with gcc gives level2 level1, but with icc gives level2 level2. Which one is correct, or is it unspecified by the standard? Edit: This proves there is a bug for …

Exception 'cudaError_enum' thrown in cudaGetExportTable (CUDA runtime library)?

ぐ巨炮叔叔 Submitted on 2019-12-23 05:33:08
Question: I am debugging an MPI-based CUDA program with DDT. My code aborts when the CUDA runtime library (libcudart) throws an exception in the (undocumented) function cudaGetExportTable when called from cudaMalloc and cudaThreadSynchronize (UPDATE: using cudaDeviceSynchronize gives the same error) in my code. Why is libcudart throwing an exception (I am using the C API, not the C++ API) before I can detect it in my code via its cudaError_t return value or with CHECKCUDAERROR? (I'm using CUDA 4.2 …

icpc slower than gcc?

。_饼干妹妹 Submitted on 2019-12-23 03:44:10
Question: I'm trying to make an optimized parallel version of OpenCV SURF, and in particular surf.cpp, using the Intel C++ compiler. I'm using Intel Advisor to locate inefficient and unvectorized loops. In particular, it suggests rebuilding the code with the icpc compiler (instead of gcc) and then using the xCORE-AVX2 flag, since it's available for my hardware. So my original cmake for building OpenCV with g++ was: cmake -D CMAKE_BUILD_TYPE=RelWithDebInfo -D CMAKE_INSTALL_PREFIX=... -D OPENCV_EXTRA …
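A hypothetical sketch of pointing that CMake build at icpc with the Advisor-suggested flag (variable and flag names follow CMake conventions and Intel's classic-compiler documentation; the original command's elided options would carry over unchanged):

```shell
# Select the Intel compilers and add the AVX2 code-generation flag.
CC=icc CXX=icpc cmake \
    -D CMAKE_BUILD_TYPE=RelWithDebInfo \
    -D CMAKE_C_FLAGS="-xCORE-AVX2" \
    -D CMAKE_CXX_FLAGS="-xCORE-AVX2" \
    ..
make -j
```

Note that `-xCORE-AVX2` produces binaries that refuse to run on non-Intel or pre-AVX2 CPUs; `-axCORE-AVX2` generates a baseline path plus an AVX2 path if portability matters.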