xeon-phi

Using GCC on Xeon Phi

Submitted by 痞子三分冷 on 2021-01-27 18:55:33
Question: I was told one can run a program on MIC that was built with gcc. Is that true? If yes, how do I proceed? I'm using gcc version 4.4.7.

Answer 1: Intel Xeon Phi can indeed run programs that are compiled with the gcc cross compiler. However, gcc is not suitable for compiling any applications for the coprocessor, since gcc does "not include support for Knights Corner vector instructions and related optimization improvements. GCC for Knights Corner is really only for building the kernel and related tools;

Padding array manually

Submitted by 余生长醉 on 2020-03-23 06:19:10
Question: I am trying to understand the 9-point stencil algorithm from this book. The logic is clear to me, but the calculation of the WIDTHP macro is what I am unable to understand. Here is the brief code (the original code is more than 300 lines long!):

#define PAD64 0
#define WIDTH 5900
#if PAD64
#define WIDTHP ((((WIDTH*sizeof(REAL))+63)/64)*(64/sizeof(REAL)))
#else
#define WIDTHP WIDTH
#endif
#define HEIGHT 10000
REAL *fa = (REAL *)malloc(sizeof(REAL)*WIDTHP*HEIGHT);
REAL *fb = (REAL *)malloc(sizeof

What is the most efficient way to clear a single or a few ZMM registers on Knights Landing?

Submitted by 孤街醉人 on 2020-01-23 06:06:50
Question: Say I want to clear 4 zmm registers. Will the following code provide the fastest speed?

vpxorq zmm0, zmm0, zmm0
vpxorq zmm1, zmm1, zmm1
vpxorq zmm2, zmm2, zmm2
vpxorq zmm3, zmm3, zmm3

On AVX2, if I wanted to clear ymm registers, vpxor was fastest, faster than vxorps, since vpxor could run on multiple units. On AVX512, we don't have vpxor for zmm registers, only vpxorq and vpxord. Is that an efficient way to clear a register? Is the CPU smart enough to not make false dependencies on previous

Unexplained Xeon-Phi Overhead

Submitted by 隐身守侯 on 2020-01-03 01:22:45
Question: I am trying to run this code with these different n sizes on a Xeon Phi KNC. I am getting the timings shown in the table, but I have no idea why I am experiencing those fluctuations. Can you please guide me through it? Thanks in advance. CODE:

program prog
  integer, allocatable :: arr1(:), arr2(:)
  integer :: i, n, time_start, time_end
  n=481
  do while (n .le. 481000000)
    allocate(arr1(n),arr2(n))
    call system_clock(time_start)
    !dir$ offload begin target(mic)
    !$omp SIMD
    do i=1,n
      arr1(i) = arr1

Automatic Offloading with Intel Python 2019 and Xeon Phi (KNC)

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-24 07:17:53
Question: I am currently trying to get automatic offloading working using Intel Python 2019 and a Xeon Phi X100 KNC (SC3120A) card. To test the offloading I am trying this benchmark: https://github.com/accre/Intel-Xeon-Phi/blob/master/Python/automatic-offloading/bmark.py However, I cannot get it to work. The code is simply executed on the host CPU. I am using MPSS 3.8.6 and Intel Parallel Studio 2017 (the last version with X100 support) on CentOS. miccheck passes and I can also use SSH to run

R Parallel Processing with Xeon Phi, minimal code changes?

Submitted by 只愿长相守 on 2019-12-22 06:09:10
Question: I am looking at buying a couple of Xeon Phi 5110P cards, but I am trying to estimate how much code I would have to change and what other software would be needed. Currently I make good use of R on a multi-core Windows machine (24 cores) by using the foreach package, passing it other packages (forecast, glmnet, etc.) to do my parallel processing. With a Xeon Phi, I understand I would want to compile R: https://software.intel.com/en-us/articles/running-r-with-support-for-intel-xeon-phi-coprocessors And I understand this could be done

Installing R `forecast` package on a Linux Cluster: compiler issues?

Submitted by 南楼画角 on 2019-12-22 05:53:11
Question: I am looking to test the performance of R, more specifically of some routines in the forecast package, on an HPC cluster with Intel Xeon Phi co-processors. The sysadmin has, I understand, built R/3.2.5 from source following the instructions on Intel's website: https://software.intel.com/en-us/articles/build-r-301-with-intel-c-compiler-and-intel-mkl-on-linux So R works, and installation of packages including devtools, data.table, dplyr, ggplot2, Rcpp, and RcppArmadillo can be carried out from within an

Atomic test-and-set in x86: inline asm or compiler-generated lock bts?

Submitted by 依然范特西╮ on 2019-12-20 02:48:08
Question: The code below, when compiled for a Xeon Phi, throws "Error: cmovc is not supported on k1om". But it does compile properly for a regular Xeon processor.

#include<stdio.h>
int main() {
    int in=5;
    int bit=1;
    int x=0, y=1;
    int& inRef = in;
    printf("in=%d\n",in);
    asm("lock bts %2,%0\ncmovc %3,%1" : "+m" (inRef), "+r"(y) : "r" (bit), "r"(x));
    printf("in=%d\n",in);
}

Compiler: icc (ICC) 13.1.0 20130121. Related question: bit test and set (BTS) on a tbb atomic variable

Answer 1: IIRC, first-gen Xeon Phi is