Does anyone know an open-source C++ x86 SIMD intrinsics library?
Intel supplies exactly what I need in its Integrated Performance Primitives (IPP) library, but I can't use that because of the copyright restrictions all over it.
EDIT
I already know the intrinsics provided by the compilers. What I need is a convenient interface to use them.
Take a look at libsimdpp, a header-only C++ SIMD wrapper library.
The library supports several instruction sets via a single interface: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, XOP, FMA3/4, NEON, NEONv2, Altivec. Clang, GCC, MSVC and ICC are all supported.
Any differences between instruction sets are resolved by implementing the missing instructions as a combination of supported ones. As a bonus, it's possible to compile the same code for several instruction sets, link the resulting object files to a single executable and use a convenient dynamic dispatch mechanism to run the implementation most tailored to the current processor.
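To make the dynamic dispatch idea concrete, here is a minimal hand-rolled sketch. It is not libsimdpp's API: the names `sum_scalar`, `sum_sse2`, `select_sum` and `sum` are hypothetical, and the CPU check relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are assumed to be available. In a real setup each variant would normally live in its own object file compiled with its own instruction-set flags; here both sit in one file for brevity.

```cpp
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Portable fallback: works on any target.
float sum_scalar(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

#if defined(__SSE2__)
// SSE variant: accumulates four lanes at a time, then handles the tail.
float sum_sse2(const float* p, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; ++i) s += p[i];
    return s;
}
#endif

using sum_fn = float (*)(const float*, std::size_t);

// Picks the best implementation the current CPU can run.
sum_fn select_sum() {
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  // assumption: GCC/Clang x86 builtin
    if (__builtin_cpu_supports("sse2")) return sum_sse2;
#endif
    return sum_scalar;
}

// Public entry point: the selection runs once, on first use.
float sum(const float* p, std::size_t n) {
    static const sum_fn impl = select_sum();
    return impl(p, n);
}
```

Callers only ever see `sum(data, n)`; which variant runs is decided once at startup, which is the same shape of mechanism the answer describes, just done by hand.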
There are several libraries that have emerged in recent years to abstract explicit SIMD programming. The most important ones:
- Vc
- boost::simd (not actually in boost - part of NT²)
- Prof. Agner Fog's Vectorclass library
The most important thing to look for is a usable set of types that correctly abstract the best available SIMD registers and instructions for a given target, along with full portability to systems without SIMD support.
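As a rough illustration of what such a type abstraction looks like, here is a minimal sketch of a four-float vector type. It is not taken from Vc, boost::simd or Vectorclass; the name `float4`, the macro `F4_USE_SSE2` and the `store` member are hypothetical. It uses SSE intrinsics when the compiler targets SSE2 and falls back to plain scalar code otherwise, so the calling code stays identical on both kinds of target.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define F4_USE_SSE2 1
#else
#define F4_USE_SSE2 0
#endif

// Hypothetical four-lane float type with the same interface on every target.
struct float4 {
#if F4_USE_SSE2
    __m128 v;  // one 128-bit SSE register
    float4(float x, float y, float z, float w) : v(_mm_setr_ps(x, y, z, w)) {}
    explicit float4(__m128 r) : v(r) {}
    friend float4 operator+(float4 a, float4 b) { return float4(_mm_add_ps(a.v, b.v)); }
    friend float4 operator*(float4 a, float4 b) { return float4(_mm_mul_ps(a.v, b.v)); }
    void store(float* out) const { _mm_storeu_ps(out, v); }
#else
    float v[4];  // scalar fallback for targets without SSE2
    float4(float x, float y, float z, float w) : v{x, y, z, w} {}
    friend float4 operator+(float4 a, float4 b) {
        return float4(a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]);
    }
    friend float4 operator*(float4 a, float4 b) {
        return float4(a.v[0] * b.v[0], a.v[1] * b.v[1], a.v[2] * b.v[2], a.v[3] * b.v[3]);
    }
    void store(float* out) const {
        for (int i = 0; i < 4; ++i) out[i] = v[i];
    }
#endif
};

// User code is written once against float4 and never mentions intrinsics.
inline float4 multiply_add(float4 a, float4 b, float4 c) { return a * b + c; }
```

The libraries listed above go much further (wider registers, masks, integer types, math functions), but the core idea is the same: the type hides the register, and the operators hide the instructions.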
I wrote a GLSL-style library whose code compiles to near-optimal assembly.
A very common operation - cross product:
    vec4 cross(const vec4 &a, const vec4 &b)
    {
        return a.yzxw * b.zxyw - a.zxyw * b.yzxw;
    }
would be converted to this assembly code using glsl-sse2:
    _Z5crossRK4vec4S1_:
        movaps (%rsi), %xmm1
        movaps (%rdx), %xmm2
        pshufd $201, %xmm1, %xmm5
        pshufd $210, %xmm2, %xmm0
        pshufd $210, %xmm1, %xmm4
        pshufd $201, %xmm2, %xmm3
        mulps %xmm0, %xmm5
        mulps %xmm3, %xmm4
        subps %xmm4, %xmm5
        movaps %xmm5, (%rdi)
        ret
Please note that the library isn't perfect yet and most likely has undiscovered bugs, as it is still new.
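For comparison, here is roughly what the same cross product looks like when the two swizzles are written by hand with SSE intrinsics. This is a sketch, not glsl-sse2's implementation: the function name `cross_xyz`, the macro `CROSS_HAVE_SSE2` and the raw `float*` interface are hypothetical, and a scalar fallback is included so the snippet also builds on targets without SSE2. The two `static_assert`s show that the shuffle constants match the `$201` and `$210` immediates in the assembly above.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define CROSS_HAVE_SSE2 1
#endif

// Computes out = a.yzxw * b.zxyw - a.zxyw * b.yzxw on arrays of four floats.
// The first three lanes hold the cross product; the fourth is a.w*b.w - a.w*b.w.
void cross_xyz(const float* a, const float* b, float* out) {
#ifdef CROSS_HAVE_SSE2
    static_assert(_MM_SHUFFLE(3, 0, 2, 1) == 201, "yzxw swizzle");
    static_assert(_MM_SHUFFLE(3, 1, 0, 2) == 210, "zxyw swizzle");

    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);

    // Passing the same register twice turns _mm_shuffle_ps into a full permute.
    const __m128 a_yzxw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 0, 2, 1));
    const __m128 b_zxyw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 1, 0, 2));
    const __m128 a_zxyw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 1, 0, 2));
    const __m128 b_yzxw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 0, 2, 1));

    _mm_storeu_ps(out, _mm_sub_ps(_mm_mul_ps(a_yzxw, b_zxyw),
                                  _mm_mul_ps(a_zxyw, b_yzxw)));
#else
    // Scalar fallback with the same lane-by-lane arithmetic.
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
    out[3] = a[3] * b[3] - a[3] * b[3];
#endif
}
```

The swizzle syntax buys readability: the one-line `a.yzxw * b.zxyw - a.zxyw * b.yzxw` expresses the same four shuffles, two multiplies and one subtract without any of the shuffle-mask bookkeeping.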
Have a look at AMD's SSEPlus project; it might be what you're after.
Microsoft has just released its new "DirectXMath" library. It includes support for SSE2 and NEON intrinsics. Documentation looks decent too.
The DirectXMath API provides SIMD-friendly C++ types and functions for common linear algebra and graphics math operations common to DirectX applications. The library provides optimized versions for Windows 32-bit (x86), Windows 64-bit (x64), and Windows on ARM through SSE2 and ARM-NEON intrinsics support in the Visual Studio compiler.
Vc is another C++ library that implements vector classes and allows writing vectorized code that is independent from the actual instruction set that is used.
Source: https://stackoverflow.com/questions/4953121/c-sse-simd-framework