icc | 易学教程

How to allocate 16-byte memory-aligned data

Question: I am trying to implement SSE vectorization on a piece of code for which I need my 1D array to be 16-byte aligned. However, I have tried several ways to allocate 16-byte aligned data, but it ends up being only 4-byte aligned. I have to work with the Intel icc compiler. This is the sample code I am testing with: #include <stdio.h> #include <stdlib.h> void error(char *str) { printf("Error:%s\n",str); exit(-1); } int main() { int i; //float *A=NULL; float *A = (float*) memalign(16,20
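As a minimal sketch of one way to get 16-byte aligned heap memory, the snippet below uses `posix_memalign`, which assumes a POSIX system such as Linux. The helper names `alloc_aligned_floats` and `is_aligned_16` are hypothetical and do not come from the question; this is an illustration, not the asker's code or a confirmed fix for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign and free (POSIX; assumed available)

// Hypothetical helper: allocate `count` floats on a 16-byte boundary.
// Returns nullptr on failure. Release the memory with free().
inline float* alloc_aligned_floats(std::size_t count) {
    void* p = nullptr;
    // posix_memalign returns 0 on success; the alignment must be a power
    // of two and a multiple of sizeof(void*), which 16 satisfies.
    if (posix_memalign(&p, 16, count * sizeof(float)) != 0) {
        return nullptr;
    }
    return static_cast<float*>(p);
}

// Hypothetical helper: true if the address is a multiple of 16.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Checking the returned pointer with `is_aligned_16` before handing it to aligned SSE loads is a cheap way to confirm the allocation behaves as expected on a given toolchain.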

A handy tool for debugging skew in ICC: Interactive CTS Window

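This article is reposted from my own WeChat public account, 《集成电路设计及EDA教程》 (IC Design and EDA Tutorials). From now on I plan to alternate between tutorials for several EDA tools rather than covering only one, to meet the needs of different readers. Here I am sharing a post written many years ago; it is old, but it is very practical and has been very popular: "A handy tool for debugging skew in ICC: Interactive CTS Window". CTS is without doubt the most complex step in the digital back-end flow apart from floorplanning, because it can require a great deal of manual analysis and intervention. This post uses an example design (one main clock and one clock divided from it, plus DFT with a scan chain and boundary scan inserted) to explain how to debug a large skew found in your design after CTS. A later, longer article will cover the problems encountered while building the clock tree and the steps tried to solve them; this tool is the main one used for that debugging. >report_clock_tree -summary (reports the global skew) >report_clock_timing -type skew (reports the local skew) The output shows that the skew is large, so how should it be debugged? Use Clock > New Interactive CTS Window in the menu bar. ICC lists the names of the clock trees in the design; right-click any clock tree and select Clock Arrival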

Risks of different GCC versions at link / run time?

I'm using Intel's C++ compiler, which on Linux relies on the GNU-supplied libc.so and libstdc++.so. Here's my problem. To have access to some of the newest C++11 features, I need to use the libstdc++ which ships with GCC 4.7 or higher. But I'm stuck using CentOS 6.4. On CentOS 6.4, the native version of GCC is 4.4. But using a Red Hat thing called "SCL" and a package named "devtoolset-1.1", I'm able to get GCC 4.7 installed under "/opt". When I set things up to use GCC 4.7 in the manner mentioned above, I can use the newer C++11 features. So here's my question: If a user runs my program with

intrinsic for the mulx instruction

Question: The mulx instruction was introduced with the BMI2 instruction set, starting with the Haswell processor. According to Intel's documentation there should be an intrinsic for mulx: unsigned __int64 umul128(unsigned __int64 a, unsigned __int64 b, unsigned __int64 * hi); However, I find no such intrinsic in Intel's online intrinsics guide, either under BMI2 or in general. I do, however, find the addcarry intrinsics from the ADX instruction set. According to this link the intrinsic is mulx_u64 but I don't
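To make concrete what mulx computes, here is a portable sketch of a full 64x64 to 128-bit unsigned multiply built from 32-bit partial products. The function `mul_wide_u64` is a hypothetical name chosen for illustration; its shape (return the low half, store the high half through a pointer) follows the `_mulx_u64` intrinsic declared in `<immintrin.h>`, but the code deliberately does not call that intrinsic, so it needs no BMI2 hardware or compiler flag.

```cpp
#include <cstdint>

// Hypothetical portable stand-in, NOT the hardware intrinsic:
// returns the low 64 bits of a * b and stores the high 64 bits in *hi.
inline std::uint64_t mul_wide_u64(std::uint64_t a, std::uint64_t b,
                                  std::uint64_t* hi) {
    const std::uint64_t mask = 0xFFFFFFFFull;
    const std::uint64_t a_lo = a & mask, a_hi = a >> 32;
    const std::uint64_t b_lo = b & mask, b_hi = b >> 32;

    // Four 32x32 -> 64-bit partial products.
    const std::uint64_t p0 = a_lo * b_lo;
    const std::uint64_t p1 = a_lo * b_hi;
    const std::uint64_t p2 = a_hi * b_lo;
    const std::uint64_t p3 = a_hi * b_hi;

    // Sum the terms that land in bits 32..63; at most 3 * (2^32 - 1),
    // so the sum cannot overflow 64 bits.
    const std::uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);

    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return (mid << 32) | (p0 & mask);
}
```

For example, multiplying 2^32 by 2^32 yields a low half of 0 and a high half of 1, since the product is exactly 2^64.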

Intel C++ compiler, ICC, seems to ignore SSE/AVX settings

Question: I have recently downloaded and installed the Intel C++ compiler, Composer XE 2013, for Linux, which is free to use for non-commercial development. http://software.intel.com/en-us/non-commercial-software-development I'm running on an Ivy Bridge system (which has AVX). I have two versions of a function which do the same thing. One does not use SSE/AVX. The other version uses AVX. In GCC the AVX code is about four times faster than the scalar code. However, with the Intel C++ compiler the

Missing AVX-512 intrinsics for masks?

Question: Intel's intrinsics guide lists a number of intrinsics for the AVX-512 K* mask instructions, but there seem to be a few missing: KSHIFT{L/R}, KADD, KTEST. The Intel developer manual claims that intrinsics are not necessary as they are auto-generated by the compiler. How does one do this though? If it means that __mmask* types can be treated as regular integers, it would make a lot of sense, but testing something like mask << 4 seems to cause the compiler to move the mask to a regular register,

Simplest TBB example

Can someone give me a TBB example of how to: set the maximum number of active threads; execute tasks that are independent of each other and presented in the form of classes, not static functions. Answer (timday): Here are a couple of complete examples, one using parallel_for, the other using parallel_for_each. Update 2014-04-12: These show what I'd now consider a pretty old-fashioned way of using TBB; I've added a separate answer using parallel_for with a C++11 lambda. #include "tbb/blocked_range.h" #include "tbb/parallel_for.h" #include "tbb/task_scheduler_init.h" #include <iostream> #include

Different compiler behavior for expression: auto p {make_pointer()};

Which is the correct behaviour for the following program? // example.cpp #include <iostream> #include <memory> struct Foo { void Bar() const { std::cout << "Foo::Bar()" << std::endl; } }; std::shared_ptr<Foo> MakeFoo() { return std::make_shared<Foo>(); } int main() { auto p { MakeFoo() }; p->Bar(); } When I compile it on my Linux RHEL 6.6 workstation, I obtain the following results: $ g++ -v gcc version 5.1.0 (GCC) $ g++ example.cpp -std=c++14 -Wall -Wextra -pedantic $ ./a.out Foo::Bar() but $ clang++ -v clang version 3.6.0 (trunk 217965) $ clang++ example.cpp -std=c++14 -Wall -Wextra
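As background rather than an answer taken from the source: the deduction rule for `auto x{expr};` was changed by proposal N3922, so that a single-element direct-list-initialization deduces the element's type, while `auto x = {expr};` still deduces `std::initializer_list`. Compilers that predate that change may deduce `std::initializer_list` in both cases. The sketch below, with hypothetical function names, shows how to check which rule a given compiler applies; the second function returns true only on a compiler that implements N3922.

```cpp
#include <initializer_list>
#include <type_traits>

// Copy-list-initialization: `auto x = {1};` deduces
// std::initializer_list<int>.
inline bool copy_list_init_is_initializer_list() {
    auto x = {1};
    (void)x;  // silence unused-variable warnings
    return std::is_same<decltype(x), std::initializer_list<int>>::value;
}

// Direct-list-initialization with a single element: deduces int
// under the N3922 rule (assumed to be implemented by the compiler).
inline bool direct_list_init_is_int() {
    auto y{1};
    (void)y;
    return std::is_same<decltype(y), int>::value;
}
```

Writing `auto p = MakeFoo();` sidesteps the question entirely, since plain copy-initialization deduces the same type under either rule.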

_addcarry_u64 and _addcarryx_u64 with MSVC and ICC

MSVC and ICC both support the intrinsics _addcarry_u64 and _addcarryx_u64. According to Intel's Intrinsic Guide and white paper these should map to adcx and adox respectively. However, by looking at the generated assembly it's clear they map to adc and adcx respectively and there is no intrinsic which maps to adox. Additionally, telling the compiler to enable AVX2 with /arch:AVX2 in MSVC or -march=core-avx2 with ICC on Linux makes no difference. I'm not sure how to enable ADX with MSVC and ICC. The documentation for MSVC lists _addcarryx_u64 with the technology of ADX whereas _addcarry_u64
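For readers unfamiliar with what these intrinsics compute, here is a portable sketch of add-with-carry and a two-limb (128-bit) addition built on it. The names `add_carry_u64`, `U128`, and `add_u128` are hypothetical; the parameter order (carry in, two operands, output pointer, carry out as the return value) is modeled on `_addcarry_u64`, but the code does not call any intrinsic and says nothing about which instruction a compiler would emit.

```cpp
#include <cstdint>

// Hypothetical portable stand-in, NOT the intrinsic:
// *out = a + b + carry_in (mod 2^64); returns the carry out (0 or 1).
inline unsigned char add_carry_u64(unsigned char carry_in, std::uint64_t a,
                                   std::uint64_t b, std::uint64_t* out) {
    const std::uint64_t sum = a + b;
    const bool c1 = sum < a;  // carry from a + b
    const std::uint64_t total = sum + (carry_in ? 1u : 0u);
    const bool c2 = total < sum;  // carry from adding carry_in
    *out = total;
    // At most one of c1 and c2 can be set, so OR gives the carry out.
    return static_cast<unsigned char>(c1 || c2);
}

// Two 64-bit limbs, least significant first.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// 128-bit addition as a carry chain across the two limbs.
inline U128 add_u128(U128 x, U128 y) {
    U128 r;
    const unsigned char c = add_carry_u64(0, x.lo, y.lo, &r.lo);
    add_carry_u64(c, x.hi, y.hi, &r.hi);  // final carry is discarded
    return r;
}
```

The carry chain is the pattern the ADX instructions are designed to speed up: adcx and adox use different flags, which lets two independent chains be interleaved.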