Is there a good double-precision small matrix SIMD library for x86?

Question

I'm looking for a SIMD library focused on small (4x4) matrix operations for graphics. There are lots of single-precision ones out there, but I need to support both single and double precision.

I've looked at Intel's IPP MX library, but I'd prefer something with source. I'm very interested in SSE3+ implementations of these particular operations:

Mat4 * Mat4
Mat4 * Vec4
Mat4 * Array of Mat4
Mat4 * Array of Vec4
Mat4 inversion (nice to have)

EDIT: No "premature optimization" answers please. Anyone who has worked with small matrices knows GCC does not vectorize these as well as hand optimized intrinsics or ASM. And in this case it's important, or I wouldn't be asking.

Answer 1:

Maybe the Eigen library?

It supports the SSE 2/3/4, ARM NEON and AltiVec instruction sets.

Answer 2:

Eigen supports fixed-size matrices. Small fixed-size matrices can be allocated on the stack for better performance. 4x4 is a good fit for SSE, since an SSE vector is 128 bits wide: a row or a column of 4 double-precision numbers fits evenly into two 128-bit SSE vectors. This makes a SIMD implementation easy.
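To make the "two 128-bit vectors per column" point concrete, here is a minimal SSE2 sketch of Mat4 * Vec4 in double precision. It is not Eigen's implementation; the types Mat4d and Vec4d, the function name mat4_mul_vec4 and the column-major layout are assumptions made for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (__m128d holds 2 doubles)

// Hypothetical types for this sketch; storage is column-major.
struct Vec4d { double v[4]; };
struct Mat4d { double m[16]; };  // element (row, col) lives at m[col * 4 + row]

// r = M * x, computed as the sum of M's columns scaled by the components of x.
// Each column of 4 doubles is handled as two 128-bit halves:
// rows 0-1 in `lo`, rows 2-3 in `hi`.
inline Vec4d mat4_mul_vec4(const Mat4d& M, const Vec4d& x) {
    __m128d lo = _mm_setzero_pd();
    __m128d hi = _mm_setzero_pd();
    for (int c = 0; c < 4; ++c) {
        const __m128d s = _mm_set1_pd(x.v[c]);  // broadcast x[c] to both lanes
        lo = _mm_add_pd(lo, _mm_mul_pd(_mm_loadu_pd(&M.m[c * 4]), s));
        hi = _mm_add_pd(hi, _mm_mul_pd(_mm_loadu_pd(&M.m[c * 4 + 2]), s));
    }
    Vec4d r;
    _mm_storeu_pd(&r.v[0], lo);
    _mm_storeu_pd(&r.v[2], hi);
    return r;
}
```

Unaligned loads and stores are used so the sketch works with any storage; if the data is known to be 16-byte aligned, _mm_load_pd and _mm_store_pd could be used instead.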

Another option is to code it yourself. Since your matrices are small and fit into L1 cache, you don't have to bother with the memory tiling needed for large matrices. You could use AVX for even better performance. Newer versions of GCC and Visual C++ 2010 support AVX intrinsics. An AVX vector is 256 bits wide, so it holds exactly 4 double-precision numbers.
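As a rough idea of what coding it yourself could look like, here is a sketch of Mat4 * Mat4 in double precision. It uses SSE2 rather than AVX so that it builds on any x86-64 compiler without extra flags; with AVX, each pair of __m128d halves would become a single __m256d. The Mat4d type, the function name mat4_mul and the column-major layout are assumptions for illustration, and no tiling is done because the whole problem fits in registers and L1 cache.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical type for this sketch; storage is column-major.
struct Mat4d { double m[16]; };  // element (row, col) lives at m[col * 4 + row]

// C = A * B. Column j of C is a linear combination of A's columns,
// weighted by the entries of column j of B.
inline Mat4d mat4_mul(const Mat4d& A, const Mat4d& B) {
    // Load A's four columns once, as two 128-bit halves each.
    __m128d a_lo[4], a_hi[4];
    for (int k = 0; k < 4; ++k) {
        a_lo[k] = _mm_loadu_pd(&A.m[k * 4]);      // rows 0-1 of column k
        a_hi[k] = _mm_loadu_pd(&A.m[k * 4 + 2]);  // rows 2-3 of column k
    }

    Mat4d C;
    for (int j = 0; j < 4; ++j) {
        __m128d lo = _mm_setzero_pd();
        __m128d hi = _mm_setzero_pd();
        for (int k = 0; k < 4; ++k) {
            const __m128d b = _mm_set1_pd(B.m[j * 4 + k]);  // broadcast B(k, j)
            lo = _mm_add_pd(lo, _mm_mul_pd(a_lo[k], b));
            hi = _mm_add_pd(hi, _mm_mul_pd(a_hi[k], b));
        }
        _mm_storeu_pd(&C.m[j * 4], lo);
        _mm_storeu_pd(&C.m[j * 4 + 2], hi);
    }
    return C;
}
```

The same inner loop applied to an array of vectors or matrices covers the "Mat4 * Array of ..." cases, since A's columns only need to be loaded once.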

Answer 3:

Not fully complete yet, but I wanted to pitch my own library - glsl-sse2.

Answer 4:

There's a 4x4 AVX implementation here. It's written as an example application but I'm sure it wouldn't be too hard for anyone to extract the interesting parts into a shared library. Thought I'd post this despite the age of the original question for anyone alighting here in the future.

Answer 5:

If you're using a modern compiler, you probably don't need to bother. Automatic vectorization in most compilers should be able to easily transform for loops with fixed bounds into SIMD code. GCC has had this for quite a while, and it is one of the main selling points of Intel's compiler (though you should be careful about using Intel's compiler if you might want to run on AMD chips).
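For reference, this is the kind of fixed-bound loop the claim is about, written as a plain C++ sketch with hypothetical types Mat4d and Vec4d in column-major layout. Whether a given compiler actually vectorizes it, and how the result compares with hand-written intrinsics, has to be checked per compiler and flag set, for example by reading the generated assembly or using GCC's -fopt-info-vec or Clang's -Rpass=loop-vectorize reports at -O3.

```cpp
// Hypothetical types for this sketch; storage is column-major.
struct Vec4d { double v[4]; };
struct Mat4d { double m[16]; };  // element (row, col) lives at m[col * 4 + row]

// r = M * x using only scalar code with compile-time loop bounds.
// The inner loop walks contiguous memory (one column), which is the
// access pattern an auto-vectorizer handles most readily.
inline Vec4d mat4_mul_vec4_scalar(const Mat4d& M, const Vec4d& x) {
    Vec4d r = {{0.0, 0.0, 0.0, 0.0}};
    for (int c = 0; c < 4; ++c) {
        for (int row = 0; row < 4; ++row) {
            r.v[row] += M.m[c * 4 + row] * x.v[c];
        }
    }
    return r;
}
```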

Source: https://stackoverflow.com/questions/5748660/is-there-a-good-double-precision-small-matrix-simd-library-for-x86

Tags

c++

sse

simd

matrix-multiplication