Fastest Implementation of Exponential Function Using AVX
I'm looking for an efficient (fast) approximation of the exponential function operating on AVX elements (single-precision floating point). Specifically, I want __m256 _mm256_exp_ps( __m256 x ) without SVML. Relative accuracy should be about 1e-6, or roughly 20 mantissa bits (1 part in 2^20). I'd be happy if it were written in C style with Intel intrinsics. The code should be portable across operating systems (Windows, macOS, Linux) and compilers (MSVC, ICC, GCC, etc.). This is similar to Fastest Implementation of Exponential Function Using SSE, but that question is looking for a very fast implementation with low precision (the current answer there gives about