knights-landing

What is the most efficient way to clear a single or a few ZMM registers on Knights Landing?

孤街醉人 提交于 2020-01-23 06:06:50
问题 Say, I want to clear 4 zmm registers. Will the following code provide the fastest speed? vpxorq zmm0, zmm0, zmm0 vpxorq zmm1, zmm1, zmm1 vpxorq zmm2, zmm2, zmm2 vpxorq zmm3, zmm3, zmm3 On AVX2, if I wanted to clear ymm registers, vpxor was fastest, faster than vxorps, since vpxor could run on multiple units. On AVX512, we don't have vpxor for zmm registers, only vpxorq and vpxord. Is that an efficient way to clear a register? Is the CPU smart enough to not make false dependencies on previous

adding “-march=native” intel compiler flag to the compilation line leads to a floating point exception on KNL

前提是你 提交于 2019-12-08 02:06:00
问题 I have a code, which i launch on Intel Xeon Phi Knights Landing (KNL) 7210 (64 cores) processor (it is a PC, in native mode) and use the Intel c++ compiler (icpc) version 17.0.4. Also i launch the same code on Intel core i7 processor, where the version of icpc is 17.0.1. To be more correct, i compile the code on the machine i'm launching it (compiled on i7 and launched on i7, the same for KNL). I never make the binary file on one machine and bring it to another. The loops are parallelized and