icc

Automatically unrolling and outputting for C/C++ code

余生长醉 submitted on 2020-01-03 08:34:18
Question: I'm running an experiment whose first step is to unroll a C/C++ loop a given number of times (e.g., 10 or 50) and output the unrolled C/C++ code. Is there a tool that automates this unrolling? In other words, what I need is: C/C++ source/loop --> TOOL (unroll by X) --> unrolled C/C++ source/loop

Answer 1: Our source-to-source transformation engine, the DMS Software Reengineering Toolkit, with its C++17 front end, can be used to do this. DMS can accept explicit source-to-source
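To make the requested transformation concrete, here is a minimal hand-written sketch of what "unroll by 4" could produce for a simple summation loop. It is not output from DMS or any other tool; the function names `sum_rolled` and `sum_unrolled4` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Original loop: one element per iteration. */
double sum_rolled(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* The same loop unrolled by a factor of 4 (illustrative only).
 * The main loop handles four elements per iteration; the second
 * loop handles the 0..3 leftover elements when n is not a
 * multiple of 4. Additions happen in the same order as above. */
double sum_unrolled4(const double *a, size_t n)
{
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; ++i)  /* remainder loop */
        s += a[i];
    return s;
}
```

A source-to-source tool would have to generate both the replicated body and the remainder loop for an arbitrary unroll factor X.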

Effects of Loop unrolling on memory bound data

白昼怎懂夜的黑 submitted on 2020-01-03 02:24:10
Question: I have been working on a piece of code that is heavily memory bound. I am trying to optimize it on a single core by manually implementing cache blocking, software prefetching, loop unrolling, etc. Cache blocking gives a significant performance improvement. However, when I introduce loop unrolling, performance degrades dramatically. I compile with Intel icc using the flags -O2 and -ipo in all my test cases. My code is similar to this (3D 25-point stencil): void

What are good heuristics for inlining functions?

丶灬走出姿态 submitted on 2019-12-30 09:15:49
Question: Assuming you are optimizing solely for speed, what are good heuristics for deciding whether to inline a function? Obviously code size matters, but are there other factors typically used when (say) gcc or icc decides whether to inline a function call? Has there been any significant academic work in this area?

Answer 1: Wikipedia has a few paragraphs about this, with some links at the bottom: In addition to memory size and cache issues, another

Setting the Search Path for Plug In (Bundle / DyLib)

自古美人都是妖i submitted on 2019-12-29 09:32:20
Question: I'm creating a Photoshop plug-in on OS X (basically a bundle / dylib). I'm using the Intel compiler with OpenMP, linking against the Intel OpenMP runtime (libiomp5). With static linking it crashes Photoshop (only on OS X; on Windows it works), so I tried dynamic linking. The host, Photoshop, itself uses libiomp5.dylib, which is available in its Frameworks folder. So in Xcode, under Linking, I set Runpath Search Paths to @executable_path/../Frameworks/, yet when I try to load it in Photoshop

RDRAND and RDSEED intrinsics GCC and Intel C++

我的未来我决定 submitted on 2019-12-29 07:14:27
Question: Do the Intel C++ compiler and/or GCC support the following intrinsics, as MSVC has since 2012/2013? int _rdrand16_step(uint16_t*); int _rdrand32_step(uint32_t*); int _rdrand64_step(uint64_t*); int _rdseed16_step(uint16_t*); int _rdseed32_step(uint32_t*); int _rdseed64_step(uint64_t*); If these intrinsics are supported, since which version (with a compile-time constant, please)?

Answer 1: Both GCC and the Intel compiler support them. GCC support was introduced at the end of 2010.
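For context, here is a hedged sketch of how `_rdrand32_step` is typically called with a bounded retry loop. It assumes GCC or Clang on x86, where `<cpuid.h>` and the `target("rdrnd")` function attribute are available; with other compilers or flags (for example building with `-mrdrnd` instead of the attribute) the details will differ. The helper names `cpu_has_rdrand`, `rdrand32_retry`, and `try_rdrand32` are hypothetical.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#include <immintrin.h>

/* CPUID leaf 1, ECX bit 30 reports RDRAND support. */
static int cpu_has_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 30) & 1u);
}

/* Enable RDRAND for this function only (GCC/Clang attribute),
 * so the rest of the file needs no special compiler flag. */
static int __attribute__((target("rdrnd")))
rdrand32_retry(uint32_t *out, int max_tries)
{
    unsigned int v;
    for (int i = 0; i < max_tries; ++i) {
        /* The intrinsic returns 1 when a random value was delivered. */
        if (_rdrand32_step(&v)) {
            *out = (uint32_t)v;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 and stores a value on success, 0 otherwise. */
int try_rdrand32(uint32_t *out, int max_tries)
{
    return cpu_has_rdrand() ? rdrand32_retry(out, max_tries) : 0;
}
#else
/* Fallback for targets where this sketch does not apply. */
int try_rdrand32(uint32_t *out, int max_tries)
{
    (void)out;
    (void)max_tries;
    return 0;
}
#endif
```

The runtime CPUID check matters because compiler support for the intrinsic does not guarantee that the CPU running the binary implements the instruction.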

The Effect of Architecture When Using SSE / AVX Intrinsics

只愿长相守 submitted on 2019-12-24 13:34:06
Question: I wonder how a compiler treats intrinsics. If one uses SSE2 intrinsics (via #include <emmintrin.h>) and compiles with the -mavx flag, what will the compiler generate: AVX or SSE code? If one uses AVX2 intrinsics (via #include <immintrin.h>) and compiles with the -msse2 flag, what will the compiler generate: SSE-only or AVX code? How do compilers treat intrinsics? If one uses intrinsics, does it help the compiler understand the dependency in the loop for

-O2 in ICC messes up assembler, fine with -O1 in ICC and all optimizations in GCC / Clang

别等时光非礼了梦想. submitted on 2019-12-24 10:35:34
Question: I recently started using ICC (18.0.1.126) to compile code that worked fine with GCC and Clang at arbitrary optimization settings. The code contains an assembler routine that multiplies 4x4 matrices of doubles using AVX2 and FMA instructions. After much fiddling, it turned out that the assembler routine works properly when compiled with -O1 -xcore-avx2 but gives a wrong numerical result when compiled with -O2 -xcore-avx2. The code, however, compiles without any error messages on

Is there an equivalent to WinAPI GetColorDirectory in .NET?

泪湿孤枕 submitted on 2019-12-24 01:37:20
Question: Is there a .NET analogue of the function GetColorDirectory, or should I just call it through a DLL? The goal is to get the path to the system directory that holds color profiles.

Answer 1: As per MSDN, you call it through the API: [DllImport(DllImport.Mscms, CharSet = CharSet.Auto, BestFitMapping = false)] internal static extern bool GetColorDirectory(IntPtr pMachineName, StringBuilder pBuffer, ref uint pdwSize);

Source: https://stackoverflow.com/questions/14792764/is-there-an-equivalent-to-winapi-getcolordirectory