icc

How to set ICC attribute “fp-model precise” for a single function, to prevent associative optimizations?

别等时光非礼了梦想 Submitted on 2019-12-24 01:01:17
Question: I am implementing Kahan summation in a project that supports compilation with gcc47, gcc48, clang33, icc13, and icc14. As part of this algorithm, I would like to disable optimizations that exploit the associativity of real-number addition. (Floating-point operations are not associative.) I would like to disable those optimizations only in the relevant function. I have figured out how to do this under gcc, using the "no-associative-math" attribute. How can I do this in icc or …

OpenMP Parallelizing for loop with map

拈花ヽ惹草 Submitted on 2019-12-24 00:41:41
Question: I am trying to parallelize a for-loop that scans a std::map. Below is my toy program: #include <iostream> #include <cstdio> #include <map> #include <string> #include <cassert> #include <omp.h> #define NUM 100000 using namespace std; int main() { omp_set_num_threads(16); int realThreads = 0; string arr[] = {"0", "1", "2"}; std::map<int, string> myMap; for(int i=0; i<NUM; ++i) myMap[i] = arr[i % 3]; string is[NUM]; #pragma omp parallel for for(map<int, string>::iterator it = myMap.begin(); it != …

Return statement does not get executed in c

筅森魡賤 Submitted on 2019-12-23 18:39:53
Question: So, I have a curious case and can't quite figure out what I've done wrong. Here's the scenario: I have written a creator function that should return a pointer to a structure. To fill the structure with data, I read in a text file. Depending on what text file I use as input, the error either occurs or it doesn't. (The error occurs for a text file with ~4000 lines and not for a file with ~200, if that makes a difference.) The strange thing is that the code executes until right before the …

OMP threadprivate objects not being destructed

点点圈 Submitted on 2019-12-23 12:37:47
Question: Bottom line: How can I make sure that the threadprivate instances are properly destructed? Background: When answering this question, I came across an oddity when using the Intel C++ 15.0 compiler in VS2013. When a global variable is declared threadprivate, the slave threads' copies are not destructed. I started looking for ways to force their destruction. At this site, they say that adding an OMP barrier should help. It doesn't (see MCVE). I tried setting the OMP blocktime to 0 so that the threads …

Is there any benefit to passing all source files at once to a compiler?

我是研究僧i Submitted on 2019-12-23 12:25:43
Question: I have read about "Whole Program Optimization" (WPO) and "Link Time Code Generation" (LTCG). I wonder: is there more inter-module analysis going on if I pass all sources to the compiler at once on the command line (like "g++ a.cpp b.cpp")? Or is that just going to enable one of those flags? Is there a difference between compilers here? For instance, can the Intel compiler benefit from this practice while other compilers don't? Answer 1: I wonder is there more inter-module-analysis going on if I pass …
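A build-configuration sketch of the distinction (flag names per the GCC and Intel classic-compiler manuals; treat as illustrative, not a recipe):

```shell
# Passing all sources in one invocation still compiles one translation
# unit per .cpp by default -- it does NOT merge them by itself:
g++ -O2 a.cpp b.cpp -o app

# Cross-TU optimization must be requested explicitly:
g++ -O2 -flto a.cpp b.cpp -o app     # GCC/Clang link-time optimization
# icpc -O2 -ipo a.cpp b.cpp -o app   # Intel inter-procedural optimization;
                                     # -ipo does benefit from seeing all
                                     # sources in one driver invocation
```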

Segmentation fault with array of __m256i when using clang/g++

孤者浪人 Submitted on 2019-12-23 10:23:08
Question: I'm attempting to generate arrays of __m256i to reuse in another computation. When I attempt to do that (even with a minimal test case), I get a segmentation fault, but only if the code is compiled with g++ or clang. If I compile the code with the Intel compiler (version 16.0), no segmentation fault occurs. Here is a test case I created: int main() { __m256i *table = new __m256i[10000]; __m256i zeroes = _mm256_set_epi64x(0, 0, 0, 0); table[99] = zeroes; } When compiling the above with …

Segmentation fault while working with SSE intrinsics due to incorrect memory alignment

非 Y 不嫁゛ Submitted on 2019-12-23 09:23:41
Question: I am working with SSE intrinsics for the first time and I am encountering a segmentation fault even after ensuring 16-byte memory alignment. This post is an extension of my earlier question: How to allocate 16byte memory aligned data. This is how I have declared my array: float *V = (float*) memalign(16, dx*sizeof(float)); When I try this: __m128 v_i = _mm_load_ps(&V[i]); // it works. But when I do this: __m128 u1 = _mm_load_ps(&V[(i-1)]); // there is a segmentation fault. But if I do: __m128 …

Is a using-declaration supposed to hide an inherited virtual function?

旧巷老猫 Submitted on 2019-12-23 08:04:33
Question: struct level0 { virtual void foo() = 0; }; struct level1 : level0 { virtual void foo() { cout <<" level1 " << endl; } }; struct level2 : level1 { virtual void foo() { cout <<" level2 " << endl; } }; struct level3 : level2 { using level1::foo; }; int main() { level1* l1 = new level3; l1->foo(); level3 l3; l3.foo(); return 0; } The above code compiled with gcc gives level2 level1, but with icc gives level2 level2. Which one is correct, or is it unspecified by the standard? Edit: This proves there is a bug for …

Exception 'cudaError_enum' thrown in cudaGetExportTable (CUDA runtime library)?

ぐ巨炮叔叔 Submitted on 2019-12-23 05:33:08
Question: I am debugging an MPI-based CUDA program with DDT. My code aborts when the CUDA runtime library (libcudart) throws an exception in the (undocumented) function cudaGetExportTable when called from cudaMalloc and cudaThreadSynchronize (UPDATE: using cudaDeviceSynchronize gives the same error) in my code. Why is libcudart throwing an exception (I am using the C API, not the C++ API) before I can detect it in my code via its cudaError_t return value or with CHECKCUDAERROR? (I'm using CUDA 4.2 …

icpc slower than gcc?

。_饼干妹妹 Submitted on 2019-12-23 03:44:10
Question: I'm trying to make an optimized parallel version of OpenCV SURF, and in particular surf.cpp, using the Intel C++ compiler. I'm using Intel Advisor to locate inefficient and unvectorized loops. In particular, it suggests rebuilding the code with the icpc compiler (instead of gcc) and then using the xCORE-AVX2 flag, since it's available for my hardware. So my original cmake for building OpenCV with g++ was: cmake -D CMAKE_BUILD_TYPE=RelWithDebInfo -D CMAKE_INSTALL_PREFIX=... -D OPENCV_EXTRA …
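A hypothetical sketch of pointing that CMake build at icpc with the Advisor-suggested flag (variable and flag names follow CMake conventions and Intel's classic-compiler documentation; the original command's elided options would carry over unchanged):

```shell
# Select the Intel compilers and add the AVX2 code-generation flag.
CC=icc CXX=icpc cmake \
    -D CMAKE_BUILD_TYPE=RelWithDebInfo \
    -D CMAKE_C_FLAGS="-xCORE-AVX2" \
    -D CMAKE_CXX_FLAGS="-xCORE-AVX2" \
    ..
make -j
```

Note that `-xCORE-AVX2` produces binaries that refuse to run on non-Intel or pre-AVX2 CPUs; `-axCORE-AVX2` generates a baseline path plus an AVX2 path if portability matters.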