I have been googling this question for the past hour, but the results only point to Taylor series or to sample code that is either too slow or does not compile at all. Well, most answe
I use the following CORDIC-style code to compute trigonometric functions in quadruple precision. The constant N sets the number of bits of precision (for example, N=26 gives single-precision accuracy). Depending on the desired accuracy, the precomputed tables can be small enough to fit in cache (for N=128 and 16-byte quad values, the three tables take about 6 KB). Apart from the one-time table setup, which also uses sqrt, evaluation needs only additions, subtractions, and multiplications, and it is very easy to vectorize.
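The algorithm precomputes sin and cos of theta_i = 0.5^i for i = 0, ..., N-1. It starts from the smallest angle, where sin x ≈ x and cos x ≈ 1 - x²/2, and repeatedly doubles the angle using sin 2x = 2 sin x cos x and cos 2x = sqrt(1 - sin² 2x), which is valid because every theta_i ≤ 1 < pi/2. To evaluate sin a, it writes |a| greedily as a sum of these angles and combines the precomputed values with the angle-addition formulas sin(x+y) = sin x cos y + cos x sin y and cos(x+y) = cos x cos y - sin x sin y. This gives sin and cos for any angle up to a resolution of 0.5^(N-1).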
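To see how N controls accuracy, here is a minimal double-precision sketch of the same scheme. The names `table_sin` and `max_error` are hypothetical and not part of the answer, and N is a run-time argument here purely for experimentation. Because the table starts at 0.5^0 = 1, the leftover angle after the greedy loop is smaller than 0.5^(N-1), so the error against `std::sin` should shrink with N until double rounding takes over.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not from the answer: the same table + angle-addition
// scheme in double precision, with N chosen at run time (assumes N >= 1).
double table_sin(double a, int N) {
  assert(N >= 1);
  std::vector<double> theta(N), s(N), c(N);
  double t = 1.0;
  for (int i = 0; i < N; i++) {  // theta[i] = 0.5^i
    theta[i] = t;
    t *= 0.5;
  }
  s[N - 1] = theta[N - 1];                   // sin x ~ x for tiny x
  c[N - 1] = 1.0 - s[N - 1] * s[N - 1] / 2;  // cos x ~ 1 - x^2/2
  for (int i = N - 2; i >= 0; i--) {         // angle doubling
    s[i] = 2.0 * s[i + 1] * c[i + 1];
    c[i] = std::sqrt(1.0 - s[i] * s[i]);     // cos > 0 because theta[i] <= 1
  }
  double r = std::fabs(a), sv = 0.0, cv = 1.0;
  for (int i = 0; i < N; i++) {
    while (theta[i] <= r) {  // rotate (sv, cv) by theta[i]
      double sv2 = sv * c[i] + cv * s[i];
      double cv2 = cv * c[i] - sv * s[i];
      sv = sv2;
      cv = cv2;
      r -= theta[i];
    }
  }
  return a < 0.0 ? -sv : sv;  // sin is odd
}

// Hypothetical helper: largest |table_sin(a, N) - std::sin(a)| on a grid over [-pi, pi].
double max_error(int N) {
  double worst = 0.0;
  for (int k = -1000; k <= 1000; k++) {
    double a = 3.141592653589793 * k / 1000.0;
    worst = std::fmax(worst, std::fabs(table_sin(a, N) - std::sin(a)));
  }
  return worst;
}
```

For example, `max_error(26)` stays below 1e-7, consistent with the single-precision claim, while for N around 53 the error should be limited mainly by double rounding rather than by the table.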
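```cpp
#include <cmath>
#include <vector>
template <class QuadReal_t>
QuadReal_t sin(const QuadReal_t a){
  using std::sqrt; // built-in types; ADL still finds sqrt for user-defined quad types
  const int N=128;
  static std::vector<QuadReal_t> theta;
  static std::vector<QuadReal_t> sinval;
  static std::vector<QuadReal_t> cosval;
  if(theta.size()==0){
    #pragma omp critical (QUAD_SIN)
    if(theta.size()==0){
      theta.resize(N);
      sinval.resize(N);
      cosval.resize(N);
      QuadReal_t t=1.0;
      for(int i=0;i<N;i++){ // theta[i] = 0.5^i
        theta[i]=t;
        t=t*0.5;
      }
      // smallest angle: sin(x) ~ x, cos(x) ~ 1 - x^2/2
      sinval[N-1]=theta[N-1];
      cosval[N-1]=1.0-sinval[N-1]*sinval[N-1]/2;
      for(int i=N-2;i>=0;i--){ // double the angle: sin(2x) = 2 sin(x) cos(x)
        sinval[i]=2.0*sinval[i+1]*cosval[i+1];
        cosval[i]=sqrt(1.0-sinval[i]*sinval[i]); // cos > 0 since theta[i] <= 1 < pi/2
      }
    }
  }
  // write |a| greedily as a sum of theta[i] and apply the angle-addition formulas
  QuadReal_t t=(a<0.0?-a:a);
  QuadReal_t sval=0.0;
  QuadReal_t cval=1.0;
  for(int i=0;i<N;i++){
    while(theta[i]<=t){
      QuadReal_t sval_=sval*cosval[i]+cval*sinval[i];
      QuadReal_t cval_=cval*cosval[i]-sval*sinval[i];
      sval=sval_;
      cval=cval_;
      t=t-theta[i];
    }
  }
  return (a<0.0?-sval:sval); // sin is odd
}
```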
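Since the rotation loop already tracks `cval`, a cos (or a combined sincos) comes almost for free. The sketch below is a hypothetical variant, not part of the answer: `SinCosTable` and `table_sincos` are made-up names, and N defaults to 64 as an assumption. It keeps the tables in a function-local static, which C++11 guarantees is initialized exactly once even when several threads call it concurrently, so it does not need the `#pragma omp critical` section or the double-checked `theta.size()==0` test. It also rejects infinite input, for which the greedy loop would never terminate.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical table type: theta[i] = 0.5^i with its sin and cos, built by angle doubling.
template <class Real, int N = 64>
struct SinCosTable {
  static_assert(N >= 1, "N must be positive");
  std::vector<Real> theta, s, c;
  SinCosTable() : theta(N), s(N), c(N) {
    using std::sqrt;
    Real t = 1.0;
    for (int i = 0; i < N; i++) {
      theta[i] = t;
      t = t * 0.5;
    }
    s[N - 1] = theta[N - 1];                   // sin x ~ x
    c[N - 1] = 1.0 - s[N - 1] * s[N - 1] / 2;  // cos x ~ 1 - x^2/2
    for (int i = N - 2; i >= 0; i--) {
      s[i] = 2.0 * s[i + 1] * c[i + 1];
      c[i] = sqrt(1.0 - s[i] * s[i]);          // theta[i] <= 1, so cos > 0
    }
  }
};

// Hypothetical function: computes sin(a) and cos(a) in one pass of the rotation loop.
template <class Real, int N = 64>
void table_sincos(Real a, Real& sin_out, Real& cos_out) {
  static const SinCosTable<Real, N> tab;  // built on first call, exactly once (C++11)
  Real t = (a < 0.0 ? -a : a);
  assert(t - t == 0);  // reject inf/NaN: for inf the loop below never ends
  Real sv = 0.0, cv = 1.0;
  for (int i = 0; i < N; i++) {
    while (tab.theta[i] <= t) {  // rotate (sv, cv) by theta[i]
      Real sv2 = sv * tab.c[i] + cv * tab.s[i];
      Real cv2 = cv * tab.c[i] - sv * tab.s[i];
      sv = sv2;
      cv = cv2;
      t = t - tab.theta[i];
    }
  }
  sin_out = (a < 0.0 ? -sv : sv);  // sin is odd
  cos_out = cv;                     // cos is even
}
```

For example, `double s, c; table_sincos(2.0, s, c);` yields both values from one pass. Like the original, it does no range reduction, so the number of rotations grows with |a| and very large arguments take proportionally longer.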