The literature on computing the elementary function sin
with tables refers to the formula:
sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h)
<
The implementation below partially answers the question, in that it is a single-precision implementation of sine, using the formula suggested in the question, that is accurate to 0.53 ULP over [0 … 1.57], and accurate to 0.5 ULP for 99.98% of its arguments over this range.
Specifically, I get the output:
error 285758762/536870912 ULP sin(2.11219326e-01) ref:2.09652290e-01 new:2.09652275e-01 differences: 176880 / 1070134723
Meaning the error was never more than 285/536 of an ULP (about 0.53 ULP), and 176880 is the number of arguments where the error was above 0.5 ULP, out of a total of 1070134723 arguments.
It does not seem possible to achieve this sort of result with the plain sin(Cn) * cos(h) + cos(Cn) * sin(h)
formula and only single-precision computations. The article cited in the question alludes to “the term c0*h
being evaluated in an extended precision” in order to achieve overall accuracy.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
float c_cos_sin[][3] = {
// 0x0.000000000p+0 /* 0.000000 */, 0x1.000000p+0, 0x0.000000p+0,
// 0x0.00fb76590p+2 /* 0.015348 */, 0x1.fff090p-1, 0x1.f6e7a4p-7,
// 0x0.01fd02f80p+2 /* 0.031068 */, 0x1.ffc0c0p-1, 0x1.fcee02p-6,
// 0x0.0302f6280p+2 /* 0.047056 */, 0x1.ff6eeap-1, 0x1.8156aap-5,
// 0x0.04029a400p+2 /* 0.062659 */, 0x1.fefec8p-1, 0x1.007b94p-4,
// 0x0.0500a9d80p+2 /* 0.078165 */, 0x1.fe6fcap-1, 0x1.3fd706p-4,
// 0x0.060215b80p+2 /* 0.093877 */, 0x1.fdbedcp-1, 0x1.7ff4e8p-4,
// 0x0.070225580p+2 /* 0.109506 */, 0x1.fceee8p-1, 0x1.bfa3fcp-4,
// 0x0.080460e00p+2 /* 0.125267 */, 0x1.fbfcf6p-1, 0x1.ffc0f6p-4,
// 0x0.08fed4a00p+2 /* 0.140554 */, 0x1.faf372p-1, 0x1.1ee830p-3,
// 0x0.0a0054100p+2 /* 0.156270 */, 0x1.f9c2d8p-1, 0x1.3ebd74p-3,
// 0x0.0afc8eb00p+2 /* 0.171665 */, 0x1.f87978p-1, 0x1.5dd872p-3,
0x0.0bff5db00p+2 /* 0.187461 */, 0x1.f707b0p-1, 0x1.7dad14p-3,
0x0.0cfe70200p+2 /* 0.203030 */, 0x1.f57bcep-1, 0x1.9cf438p-3,
0x0.0e024ef00p+2 /* 0.218891 */, 0x1.f3c87ap-1, 0x1.bcb7a0p-3,
0x0.0efeab400p+2 /* 0.234294 */, 0x1.f202ecp-1, 0x1.db74a8p-3,
0x0.10003da00p+2 /* 0.250015 */, 0x1.f014d0p-1, 0x1.fab664p-3,
0x0.110242c00p+2 /* 0.265763 */, 0x1.ee0660p-1, 0x1.0cf2f4p-2,
0x0.12055d400p+2 /* 0.281577 */, 0x1.ebd62ap-1, 0x1.1c8a4ap-2,
0x0.13025de00p+2 /* 0.297019 */, 0x1.e994c2p-1, 0x1.2bb212p-2,
0x0.13fc96600p+2 /* 0.312292 */, 0x1.e73c4ep-1, 0x1.3a9d34p-2,
0x0.15014c400p+2 /* 0.328204 */, 0x1.e4abbcp-1, 0x1.4a1472p-2,
0x0.15fe27a00p+2 /* 0.343637 */, 0x1.e210eep-1, 0x1.58fffep-2,
0x0.1703b1200p+2 /* 0.359600 */, 0x1.df4050p-1, 0x1.685884p-2,
0x0.180296e00p+2 /* 0.375158 */, 0x1.dc63e8p-1, 0x1.7736b2p-2,
0x0.18fc8a600p+2 /* 0.390414 */, 0x1.d9790cp-1, 0x1.85b472p-2,
0x0.19ffac000p+2 /* 0.406230 */, 0x1.d654fap-1, 0x1.94a1ecp-2,
0x0.1aff07c00p+2 /* 0.421816 */, 0x1.d31f26p-1, 0x1.a33e6ap-2,
0x0.1c0162800p+2 /* 0.437585 */, 0x1.cfc21ep-1, 0x1.b1ec42p-2,
0x0.1cfe63200p+2 /* 0.453027 */, 0x1.cc5a50p-1, 0x1.c0317ep-2,
0x0.1e0153a00p+2 /* 0.468831 */, 0x1.c8c0f4p-1, 0x1.ceb01ep-2,
0x0.1efe6d800p+2 /* 0.484279 */, 0x1.c52024p-1, 0x1.dcbe7ep-2,
0x0.1ffde5600p+2 /* 0.499872 */, 0x1.c15a92p-1, 0x1.ead0fcp-2,
0x0.20fa9ac00p+2 /* 0.515296 */, 0x1.bd83eap-1, 0x1.f89e82p-2,
0x0.220491000p+2 /* 0.531529 */, 0x1.b95c6cp-1, 0x1.038212p-1,
0x0.22ff9c800p+2 /* 0.546851 */, 0x1.b55542p-1, 0x1.0a3d7ap-1,
0x0.23faafc00p+2 /* 0.562176 */, 0x1.b133aep-1, 0x1.10e916p-1,
0x0.250a2cc00p+2 /* 0.578746 */, 0x1.ac9ed2p-1, 0x1.180d0ep-1,
0x0.25fee2800p+2 /* 0.593682 */, 0x1.a863d2p-1, 0x1.1e6bdep-1,
0x0.2700b4000p+2 /* 0.609418 */, 0x1.a3d498p-1, 0x1.251056p-1,
0x0.28025e000p+2 /* 0.625144 */, 0x1.9f2b7ap-1, 0x1.2ba13ap-1,
0x0.28f975400p+2 /* 0.640226 */, 0x1.9a9aa0p-1, 0x1.31db54p-1,
0x0.29fc6dc00p+2 /* 0.656032 */, 0x1.95b7ecp-1, 0x1.384ef4p-1,
0x0.2afc27c00p+2 /* 0.671640 */, 0x1.90cb6cp-1, 0x1.3e9a4ap-1,
0x0.2c0659c00p+2 /* 0.687888 */, 0x1.8b90c6p-1, 0x1.45127ap-1,
0x0.2d017dc00p+2 /* 0.703216 */, 0x1.868952p-1, 0x1.4b18dep-1,
0x0.2e04f3c00p+2 /* 0.719052 */, 0x1.813e8cp-1, 0x1.513d70p-1,
0x0.2efcb8800p+2 /* 0.734175 */, 0x1.7c19bcp-1, 0x1.5706f0p-1,
0x0.300642800p+2 /* 0.750382 */, 0x1.767dc8p-1, 0x1.5d2464p-1,
0x0.30ff5cc00p+2 /* 0.765586 */, 0x1.7123d0p-1, 0x1.62cb9cp-1,
0x0.3204f6c00p+2 /* 0.781553 */, 0x1.6b6d98p-1, 0x1.68a4d6p-1,
0x0.3303af000p+2 /* 0.797100 */, 0x1.65c70cp-1, 0x1.6e4010p-1,
0x0.34002f400p+2 /* 0.812511 */, 0x1.601740p-1, 0x1.73b86cp-1,
0x0.35080ac00p+2 /* 0.828616 */, 0x1.5a0f1cp-1, 0x1.79579cp-1,
0x0.35fda7800p+2 /* 0.843607 */, 0x1.545d16p-1, 0x1.7e7cc6p-1,
0x0.37040f800p+2 /* 0.859623 */, 0x1.4e31bep-1, 0x1.83e3aep-1,
0x0.3800eac00p+2 /* 0.875056 */, 0x1.482b1cp-1, 0x1.89002ap-1,
0x0.390737c00p+2 /* 0.891066 */, 0x1.41d5b8p-1, 0x1.8e3432p-1,
0x0.39fce7800p+2 /* 0.906061 */, 0x1.3bd3dep-1, 0x1.92fc2ap-1,
0x0.3b0596c00p+2 /* 0.922216 */, 0x1.3546c4p-1, 0x1.9808d0p-1,
0x0.3bf971c00p+2 /* 0.937100 */, 0x1.2f2b58p-1, 0x1.9c979ep-1,
0x0.3d0275800p+2 /* 0.953275 */, 0x1.2874c8p-1, 0x1.a17120p-1,
0x0.3e02c4400p+2 /* 0.968919 */, 0x1.21e3cap-1, 0x1.a60740p-1,
0x0.3ef759000p+2 /* 0.983847 */, 0x1.1b8ec4p-1, 0x1.aa4f02p-1,
0x0.3ff90a800p+2 /* 0.999575 */, 0x1.14d158p-1, 0x1.aeb732p-1,
0x0.40f703800p+2 /* 1.015077 */, 0x1.0e1baep-1, 0x1.b2f468p-1,
0x0.420693000p+2 /* 1.031651 */, 0x1.06dcb2p-1, 0x1.b75f2ap-1,
0x0.4300fb800p+2 /* 1.046935 */, 0x1.001dcep-1, 0x1.bb5678p-1,
0x0.440282800p+2 /* 1.062653 */, 0x1.f23bb2p-2, 0x1.bf4efcp-1,
0x0.44fb18000p+2 /* 1.077826 */, 0x1.e49a58p-2, 0x1.c3095ep-1,
0x0.45fe26000p+2 /* 1.093637 */, 0x1.d647a4p-2, 0x1.c6cfaap-1,
0x0.4700de800p+2 /* 1.109428 */, 0x1.c7dba0p-2, 0x1.ca77aap-1,
0x0.47fd2d800p+2 /* 1.124828 */, 0x1.b9af14p-2, 0x1.cdec48p-1,
0x0.48fd3c000p+2 /* 1.140456 */, 0x1.ab3138p-2, 0x1.d1515ep-1,
0x0.49f66d000p+2 /* 1.155666 */, 0x1.9cfd2cp-2, 0x1.d48338p-1,
0x0.4b05ec000p+2 /* 1.172236 */, 0x1.8d67dcp-2, 0x1.d7deb0p-1,
0x0.4bfebf800p+2 /* 1.187424 */, 0x1.7f0718p-2, 0x1.dad544p-1,
0x0.4cfa07800p+2 /* 1.202761 */, 0x1.706b10p-2, 0x1.ddb6e0p-1,
0x0.4e0324800p+2 /* 1.218942 */, 0x1.60e920p-2, 0x1.e0a1e6p-1,
0x0.4efdb7800p+2 /* 1.234236 */, 0x1.522b24p-2, 0x1.e34658p-1,
0x0.4ffb51000p+2 /* 1.249714 */, 0x1.432afap-2, 0x1.e5d57ep-1,
0x0.50fb6d000p+2 /* 1.265346 */, 0x1.33f0b2p-2, 0x1.e84ce2p-1,
0x0.5200bc000p+2 /* 1.281295 */, 0x1.24536ep-2, 0x1.eab19cp-1,
0x0.52fbc0800p+2 /* 1.296616 */, 0x1.1541a6p-2, 0x1.ece01ep-1,
0x0.54066d800p+2 /* 1.312892 */, 0x1.052cfep-2, 0x1.ef1104p-1,
0x0.550424000p+2 /* 1.328378 */, 0x1.eb9ff8p-3, 0x1.f1077cp-1,
0x0.55f93c800p+2 /* 1.343337 */, 0x1.cdd470p-3, 0x1.f2cfeap-1,
0x0.56fa9d000p+2 /* 1.359046 */, 0x1.ae6e42p-3, 0x1.f49074p-1,
0x0.57ff02000p+2 /* 1.374939 */, 0x1.8e8e2cp-3, 0x1.f63612p-1,
0x0.58f813000p+2 /* 1.390141 */, 0x1.6ff8f0p-3, 0x1.f7aaf6p-1,
0x0.5a0036800p+2 /* 1.406263 */, 0x1.4f722cp-3, 0x1.f915dcp-1,
0x0.5b005b800p+2 /* 1.421897 */, 0x1.2fd214p-3, 0x1.fa55aep-1,
0x0.5bfe01800p+2 /* 1.437378 */, 0x1.106e24p-3, 0x1.fb732ap-1,
0x0.5cf89f000p+2 /* 1.452675 */, 0x1.e2b3c0p-4, 0x1.fc6ea8p-1,
0x0.5e00f4800p+2 /* 1.468808 */, 0x1.a104e8p-4, 0x1.fd56eap-1,
0x0.5f0527000p+2 /* 1.484689 */, 0x1.60420cp-4, 0x1.fe1a64p-1,
0x0.5fffc8000p+2 /* 1.499987 */, 0x1.21cb4cp-4, 0x1.feb78ap-1,
0x0.610318000p+2 /* 1.515814 */, 0x1.c23092p-5, 0x1.ff39eep-1,
0x0.62032d800p+2 /* 1.531444 */, 0x1.424a9cp-5, 0x1.ff9a86p-1,
0x0.62fd46000p+2 /* 1.546709 */, 0x1.8a9d8cp-6, 0x1.ffd9fap-1,
0x0.63fbc1800p+2 /* 1.562241 */, 0x1.1856c2p-7, 0x1.fffb34p-1,
0x0.65011f000p+2 /* 1.578193 */, -0x1.e4c59ap-8, 0x1.fffc6ap-1,
};
/*@ requires 0 <= x <= 1.6 ; */
float my_sinf(float x)
{
const float offs = 0x0.0b8p+2f;
if (x < offs)
{
float xx = x * x;
/* Remez-optimized polynomial for relative accuracy on -0.164 .. 0.164,
Not the full -0.18 .. 0.18 where it is used, which makes it worse
on -0.164 .. 0.164. But even optimized without regard for 0.164 .. 0.18
It is better than the table entry + correction there so we use it there
*/
return x + x * xx * (-0.16666660487324f + xx * 8.3259065018069e-3f);
}
int i = (x - offs) * 64.0f;
float *p = c_cos_sin[i];
float F = p[0];
float C = p[1];
float S = p[2];
float h = x - F;
#if 0
float s = S * (cosl(h) - 1.0) + C * sinl(h); // ext-double computation
#endif
#if 1
// Two Remez-optimized polynomials for absolute accuracy on -0.008 .. 0.008
float s = h * (C + h * (-0.4999976959797f * S + h * -0.166666183241f * C));
#endif
return S + s;
}
unsigned int m, c, t;
uint64_t max_ulp;
int main(){
for (float f = 0.0f; f < 1.57f; f = nextafterf(f, 3.0f))
{
double rd = sin(f);
float r = rd;
float n = my_sinf(f);
t++;
if (r != n)
{
c++;
uint64_t in, ir;
double nd = n;
memcpy(&in, &nd, 8);
memcpy(&ir, &rd, 8);
uint64_t ulp = in > ir ? in - ir : ir - in;
if (ulp > max_ulp)
printf("error %" PRIu64 "/536870912 ULP sin(%.8e) ref:%.8e new:%.8e \n",
ulp, f, r, n);
if (ulp > max_ulp)
max_ulp = ulp;
}
}
printf("differences: %u / %u\n", c, t);
}
Well, this formula is a start. Then other transformations could be done, depending on the context. I agree that if the formula sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h)
is applied in the target precision, then the rounding error of sin(Cn) * cos(h)
is up to 1/2 ulp of the result, which is bad if the goal is to get an accurate result. However some terms are sometimes expressed in greater precision by using pseudo-expansions. For instance, a number can be represented by a pair (a,b) where b is much smaller than a and whose value is regarded as a+b. In such a case, cos(h) could be represented by a pair (1,h') and the computation would be equivalent to what you suggest.
Alternatively, the implementation can be detailed once the formulas to evaluate cos(h) and sin(h) are given. See Section 3.1 in Stehlé and Zimmermann's paper you cited: they define C*(h) = C(h) − 1, and use C* in the final formula, which is basically what you suggest.
Note: I'm note sure that using the above formula is the best choice. One could start with sin(x) = sin(Cn) + error_term
, and compute the error term in some other way.
You are meeting the boundary between theoretical mathematics and practical numerical calculations.
The trigonometric identity
sin(a + b) = sin(a) * cos( b ) + cos(a) * sin(b)
gives rise to the formula that you quote:
sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h)
when you substitute Cn + h for x. This formula is mathematically exact.
However, due to the limitations of practical numeric computation, i.e. floating point arithmetic in your case, we do not have the infinite precision available to numerically calculate such formula exactly. In practice we need to consider the accuracy that we can use to represent the values in the table and what errors are introduced when we perform limited accuracy calculations on these limited accuracy values. The mathematical discipline that deals with practical numerical computation is Numerical Analysis.
There is a very brief summary of Numerical Analysis on Wikipedia with many links to various topics within the subject. I think that you might find Computing values of functions and Interpolation, Extrapolation, and Regression of particular relevance.