Where is Clang's '_mm256_pow_ps' intrinsic?

Submitted by 回眸只為那壹抹淺笑 on 2019-11-26 18:35:11

Question


I can't seem to find the intrinsics for either _mm_pow_ps or _mm256_pow_ps, both of which are supposed to be included with 'immintrin.h'.

Does Clang not define these or are they in a header I'm not including?


Answer 1:


That's not an intrinsic; it's an Intel library function name that confusingly uses the same naming scheme as actual intrinsics. There's no vpowps instruction. (AVX512ER on Xeon Phi does have the semi-related vexp2ps instruction...)

For functions like that and _mm_sin_ps to be usable, you need Intel's Short Vector Math Library (SVML). Most people just avoid using them. If it has an implementation of something you want, though, it's worth looking into. IDK what other vector pow implementations exist.

In the intrinsics finder, you can avoid seeing these non-portable functions in your search results if you leave the SVML box unchecked.

There are some "composite" intrinsics like _mm_set_epi8() that typically compile to multiple loads and shuffles. Those are portable across compilers, and they do inline instead of being calls to library functions.

Also note that sqrtps is a native machine instruction, so _mm_sqrt_ps() is a real intrinsic. IEEE 754 specifies mul, div, add, sub, and sqrt as "basic" operations that are required to produce correctly-rounded results (error <= 0.5ulp), so sqrt() is special and does have direct hardware support, unlike most other "math library" functions.
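For contrast with the pow case, here is a minimal sketch of using a real intrinsic. The wrapper name sqrt4 is hypothetical. The _mm_sqrt_ps call inside it needs no extra library and normally compiles to a single sqrtps instruction.

```c
#include <assert.h>
#include <stddef.h>      /* NULL */
#include <xmmintrin.h>   /* SSE: _mm_loadu_ps, _mm_sqrt_ps, _mm_storeu_ps */

/* Hypothetical wrapper: square root of four floats at once. */
static void sqrt4(const float in[4], float out[4])
{
    assert(in != NULL && out != NULL);
    __m128 v = _mm_loadu_ps(in);         /* unaligned load of 4 floats */
    _mm_storeu_ps(out, _mm_sqrt_ps(v));  /* real intrinsic, no library call */
}
```

Because sqrt is correctly rounded, inputs that are perfect squares such as 1, 4, 9, 16 should come back as exactly 1, 2, 3, 4.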


There are various libraries of SIMD math functions. Some of them come with C++ wrapper libraries that allow a+b instead of _mm_add_ps(a,b).

  • glibc libmvec - since glibc 2.22, to support OpenMP 4.0 vector math functions. GCC knows how to auto-vectorize some functions like cos(), sin(), and probably pow() using it. This answer shows one inconvenient way of using it explicitly for manual vectorization. (Hopefully better ways are possible that don't have mangled names in the source code).

  • Agner Fog's VCL has some math functions like exp and log. (GPL licensed, not LGPL, so only usable in GPL-compatible projects).

  • https://github.com/microsoft/DirectXMath (MIT license) - I think portable to non-Windows, and doesn't require DirectX.
  • https://sleef.org/ - portable across Linux, macOS, and Windows; on Windows it apparently only supports MSVC.

  • Intel's own SVML (comes with ICC; ICC auto-vectorizes with SVML by default). Confusingly has its prototypes in immintrin.h along with actual intrinsics. Maybe they want to trick people into writing code that's dependent on Intel tools/libraries. Or maybe they think fewer includes are better and that everyone should use their compiler...

    Also related: Intel MKL (Math Kernel Library), with matrix BLAS functions.

  • AMD ACML - end-of-life closed-source freeware. I think it just has functions that loop over arrays/matrices (like Intel MKL), not functions for single SIMD vectors.

  • sse_mathfun (zlib license) SSE2 and ARM NEON. Hasn't been updated since about 2011 it seems. But does have implementations of single-vector math / trig functions.



Source: https://stackoverflow.com/questions/36636159/where-is-clangs-mm256-pow-ps-intrinsic
