intel-mic | 易学教程

Using GCC on Xeon Phi

Question: I was told one can run a program on MIC that was built with gcc. Is that true? If yes, how do I proceed? I'm using gcc version 4.4.7.
Answer 1: Intel Xeon Phi can indeed run programs that are compiled with the gcc cross compiler. However, gcc is not suitable for compiling applications for the coprocessor, since gcc does "not include support for Knights Corner vector instructions and related optimization improvements. GCC for Knights Corner is really only for building the kernel and related tools;

High performance implement of atomic minimal operation

Question: There is no atomic minimum operation in OpenMP, and no intrinsic for it in Intel MIC's instruction set. #pragma omp critical performs very poorly. I want to know if there is a high-performance implementation of atomic minimum for Intel MIC.
Answer 1: According to the OpenMP 4.0 Specifications (Section 2.12.6), there are a lot of fast atomic minimal operations you can do by using the #pragma omp atomic construct in place of #pragma omp critical (and thereby avoid the huge overhead of its

Are there SIMD(SSE / AVX) instructions in the x86-compatible accelerators Intel Xeon Phi?

Question: Are there SIMD (SSE / AVX) instructions in the x86-compatible accelerators MIC Intel Xeon Phi? http://en.wikipedia.org/wiki/Xeon_Phi
Answer 1: Yes, the current generation of Intel Xeon Phi coprocessors (codename "Knights Corner", abbreviated KNC) supports a 512-bit SIMD instruction set called "Intel® Initial Many Core Instructions" (abbreviated Intel® IMCI). Intel IMCI is not "compatible with" and is not equivalent to the SSE, AVX, AVX2 or AVX-512 ISA. However, it's officially announced that next planned

How to offload particular thread of a single app to particular Xeon Phi cores?

Question: Suppose I have a single C/C++ app running on the host. There are a few threads running on the host CPU and 50 threads running on the Xeon Phi cores. How can I make sure that each of these 50 runs on its own Xeon Phi core and is never evicted from that core's cache (given the code is small enough)? Could someone please outline a very general idea of how to do this and which tool/API would be more suitable (for C/C++ code)? What is the fastest way to exchange data between the host thread-aggregator

MKL Performance on Intel Phi

Question: I have a routine that performs a few MKL calls on small matrices (50-100 x 1000 elements) to fit a model, which I then call for different models. In pseudo-code:

    double doModelFit(int model, ...) {
        ...
        while( !done ) {
            cblas_dgemm(...);
            cblas_dgemm(...);
            ...
            dgesv(...);
            ...
        }
        return result;
    }

    int main(int argc, char **argv) {
        ...
        c_start = 1; c_stop = nmodel;
        for(int c=c_start; c<c_stop; c++) {
            ...
            result = doModelFit(c, ...);
            ...
        }
    }

Call the above version 1. Since the models are independent

How do the Conflict Detection instructions make it easier to vectorize loops?

Question: The AVX512CD instruction families are: VPCONFLICT, VPLZCNT and VPBROADCASTM. The Wikipedia section about these instructions says: The instructions in AVX-512 conflict detection (AVX-512CD) are designed to help efficiently calculate conflict-free subsets of elements in loops that normally could not be safely vectorized. What are some examples that show these instructions being useful in vectorizing loops? It would be helpful if answers included scalar loops and their vectorized counterparts. Thanks!
Answer 1: One example where the CD instructions might be useful is histogramming. For scalar code

Is the Intel Xeon Phi usable without a costly Intel Compiler?

Question: Does the Intel Xeon Phi coprocessor, to be usable as a parallel platform, require a license of the Intel Composer XE compiler, or are there alternative compilers?
Answer 1: There are a few options I can list here to use/get the Intel compiler...gcc, as you know, is not equipped to vectorize code for this platform. There is a non-commercial license of the Intel compiler for Linux* that provides the same Intel Xeon Phi coprocessor enabled Intel Development tools as a commercial/eval/academic license assuming the requesting individual fulfills the licensing requirements. http://software.intel.com/en-us/non
