Question
In a lot of situations I need not only the sine, but also the cosine of the same argument.
For C, there is the sincos function in the common Unix m math library. And actually, at least on i386, this should be a single assembly instruction, fsincos.
sincos, sincosf, sincosl - calculate sin and cos simultaneously
I guess these benefits exist because there is an obvious overlap in computing sine and cosine: sin(x)^2 + cos(x)^2 = 1. But AFAIK it does not pay off to try to shortcut this as cos = Math.sqrt(1 - sin*sin), as the sqrt function comes at a similar cost.
Is there any way to reap the same benefits in Java? I guess I'm going to pay a price for a double[] then, which may make all the effort moot because of the added garbage collection.
Or is the Hotspot compiler smart enough to recognize that I need both, and will it compile this to a sincos instruction? Can I test whether it recognizes it, and can I help it recognize this, e.g. by making sure the Math.sin and Math.cos calls are directly successive in my code? This would actually make the most sense from a Java language point of view: having the compiler optimize this to use the fsincos assembly instruction.
Collected from some assembler documentation:
Variations   8087      287       387       486       Pentium
fsin         -         -         122-771   257-354   16-126   NP
fsincos      -         -         194-809   292-365   17-137   NP
  Additional cycles required if operand > pi/4 (~3.141/4 = ~.785)
sqrt         180-186   180-186   122-129   83-87     70       NP
fsincos should need an extra pop, but that should cost only 1 clock cycle. Assuming that the CPU does not optimize this either, sincos should be almost twice as fast as calling sin twice (the second time to compute the cosine, so I figure it will need to do an addition). sqrt could be faster in some situations, but sine can be faster.
Update: I've done some experiments in C, but they are inconclusive. Interestingly enough, sincos seems to be even slightly faster than sin (without cos), and the GCC compiler will use fsincos when you compute both sin and cos, so it does what I'd like Hotspot to do (or does Hotspot, too?). I could not yet prevent the compiler from outsmarting me by using fsincos except by not using cos. It will then fall back to a C sin, not fsin.
Answer 1:
I have performed some microbenchmarks with caliper: 10000000 iterations over a (precomputed) array of random numbers in the range -4*pi .. 4*pi. I tried my best to get the fastest JNI solution I could come up with; it's a bit hard to predict whether you will actually get fsincos or some emulated sincos. Reported numbers are the best of 10 caliper trials (which in turn consist of 3-10 trials, the average of which is reported). So roughly it's 30-100 runs of the inner loop each.
I've benchmarked several variants:
- Math.sin only (reference)
- Math.cos only (reference)
- Math.sin + Math.cos
- sincos via JNI
- Math.sin + cos via Math.sqrt( (1+sin) * (1-sin) ) + sign reconstruction
- Math.cos + sin via Math.sqrt( (1+cos) * (1-cos) ) + sign reconstruction
(1+sin)*(1-sin) = 1-sin*sin mathematically, but if sin is close to 1 it should be more precise? The runtime difference is minimal; you save one addition.
Sign reconstruction is done via x %= TWOPI; if (x<0) x+=TWOPI; and then checking the quadrant. If you have an idea how to do this with less CPU, I'd be happy to hear it.
Numerical loss via sqrt seems to be okay, at least for common angles. In rough experiments it was on the order of 1e-10.
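For illustration, here is a minimal sketch of the sin + sqrt variant with sign reconstruction as described above. This is not the benchmarked code; the class name SinCosSqrt, the method sinCos, and the out-array layout are assumptions made for this example.

```java
// Sketch only: computes sin via Math.sin and reconstructs cos via sqrt.
// Class and method names are hypothetical, not from the benchmark.
final class SinCosSqrt {
    private static final double TWO_PI = 2.0 * Math.PI;
    private static final double HALF_PI = 0.5 * Math.PI;

    /** Writes sin(x) to out[0] and cos(x) to out[1]. */
    static void sinCos(double x, double[] out) {
        double s = Math.sin(x);
        // |cos(x)|; (1+s)*(1-s) equals 1 - s*s mathematically.
        double c = Math.sqrt((1.0 + s) * (1.0 - s));
        // Reduce the angle to [0, 2*pi) to find the quadrant.
        double r = x % TWO_PI;
        if (r < 0) {
            r += TWO_PI;
        }
        // Cosine is negative in the second and third quadrants.
        if (r > HALF_PI && r < 3.0 * HALF_PI) {
            c = -c;
        }
        out[0] = s;
        out[1] = c;
    }

    public static void main(String[] args) {
        double[] out = new double[2];
        sinCos(2.5, out);
        System.out.println(out[0] + " vs " + Math.sin(2.5));
        System.out.println(out[1] + " vs " + Math.cos(2.5));
    }
}
```

Passing the result array in avoids allocating a double[] per call. Near multiples of pi/2 away from zero the reconstructed cosine is close to 0 and loses relative precision, so treat this as a speed-for-accuracy trade.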
Sin          1,30  ==============
Cos          1,29  ==============
Sin, Cos     2,52  ============================
JNI sincos   1,77  ===================
SinSqrt      1,49  ================
CosSqrt      1,51  ================
The choice of sqrt(1-s*s) vs. sqrt((1+s)*(1-s)) makes about 0,01 difference. As you can see, the sqrt-based approach wins hands down against any of the others (as we can't currently access sincos in pure Java). The JNI sincos is better than computing sin and cos, but the sqrt approach is still faster. cos itself seems to be consistently a tick (0,01) better than sin, but the case distinction to reconstruct the sign has an extra > test. I don't think my results show that either sin+sqrt or cos+sqrt is clearly preferable, but they do save around 40% of the time compared to sin then cos.
If Java were extended with an intrinsic, optimized sincos, this would likely be even better. IMHO it is a common use case, e.g. in graphics. When used in AWT, Batik etc., numerous applications could benefit from it.
If I were to run this again, I would also add a JNI sin and a noop to estimate the cost of JNI. Maybe also benchmark the sqrt trick via JNI. Just to make sure that we actually do want an intrinsic sincos in the long run.
Answer 2:
Most sin and cos calculations are calls directly to the hardware. There isn't much of a faster way to calculate it than that. Specifically, in the range +- pi/4, the rates are extremely fast. If you use hardware acceleration in general, and try to limit the values to those specified, you should be fine. Source.
Answer 3:
Looking at the Hotspot code, I am rather convinced that the Oracle Hotspot VM does not optimize sin(a) + cos(a) into fsincos: See assembler_x86.cpp, line 7482ff.
However, I would suspect that the extra machine cycles for using fsin and fcos separately are easily overshadowed by other operations such as running the GC. I would use the standard Java features and profile the application. Only if a profile run indicates that significant time is spent in the sin/cos calls would I venture out to do something about it.
In this case, I would create a JNI wrapper that uses a 2-element jdoubleArray as an out parameter. If you have only one thread that uses the sincos JNI operations, you could use a statically initialized double[2] array in your Java code that would be reused over and over again.
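A rough sketch of the Java side of that out-parameter pattern might look like the following. The class name SinCosBuffer and the native method sincosNative are hypothetical; no native library is assumed here, so the demo path uses a pure-Java fallback instead of calling the native method.

```java
// Sketch only: reusable double[2] out-parameter, as suggested above.
// sincosNative is a hypothetical JNI entry point and is never called here.
final class SinCosBuffer {
    // Would be implemented in C on top of sincos(); declaration only.
    static native void sincosNative(double x, double[] out);

    // Shared result buffer: safe only if a single thread uses it.
    private static final double[] SHARED = new double[2];

    // Pure-Java stand-in with the same contract: out[0]=sin, out[1]=cos.
    static void sincosFallback(double x, double[] out) {
        out[0] = Math.sin(x);
        out[1] = Math.cos(x);
    }

    /** Returns the shared buffer; contents are overwritten by the next call. */
    static double[] sinCos(double x) {
        sincosFallback(x, SHARED);
        return SHARED;
    }

    public static void main(String[] args) {
        double[] r = sinCos(0.5);
        System.out.println("sin=" + r[0] + " cos=" + r[1]);
    }
}
```

Because the buffer is reused, callers have to copy the two values out before the next call; that is the price for avoiding a fresh allocation each time.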
Answer 4:
You can always profile.
Generally however, sqrt should come at the same speed as division, as the internal implementation of div and sqrt are very similar.
Sine and cosine, OTOH, are calculated with polynomials of degree up to 10 without any common coefficients, and possibly with a difficult modulo-2pi reduction; that reduction is the only common part shared in sincos (when not using CORDIC).
EDIT Revised profiling (with typo corrected) shows timing difference for
sin+cos: 1.580 1.580 1.840 (time for 200M iterations, 3 successive trials)
sincos: 1.080 0.900 0.920
sin+sqrt: 0.870 1.010 0.860
Answer 5:
There is no fsincos available in regular Java. Also, a JNI version may be slower than a double call to java.lang.Math.sin() and cos().
I guess you are concerned about the speed of sin(x)/cos(x). So here is a suggestion for fast trigonometric operations, as a replacement for fsincos: a lookup table. Below is my original post. I hope it helps you.
=====
I tried to achieve the best possible performance on trigonometric functions (sin and cos), using Look Up Tables (LUT).
What I have found:
- A LUT can be 20-25 times faster than java.lang.Math.sin()/cos(). Possibly as fast as native fsin / fcos. Maybe as fast as fsincos.
- But java.lang.Math.sin() and cos() are FASTER than any other way to calculate sin/cos, if you use angles between 0 and 45 degrees;
But notice that angles lower than 12 deg have sin(x) almost == x. It is even faster;
Some implementations use one float array to store sin and another one for cos. This is unnecessary. Just remember that:
cos(x) == sin(x + PI/2)
- That is, if you have sin(x) table you have cos(x) table for free.
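As a minimal sketch of that idea (not the code from the linked repository), a single sine table can serve both functions by shifting the index a quarter turn for cosine. The class name SinLut, the table size of 4096, and the use of radians are assumptions made for this example.

```java
// Sketch only: one sine table, cosine obtained by a quarter-period shift.
// Names and table size are hypothetical, not taken from the original post.
final class SinLut {
    private static final int SIZE = 4096;          // power of two for cheap wrap-around
    private static final int MASK = SIZE - 1;
    private static final double SCALE = SIZE / (2.0 * Math.PI);
    private static final float[] TABLE = new float[SIZE];

    static {
        for (int i = 0; i < SIZE; i++) {
            TABLE[i] = (float) Math.sin(i / SCALE);
        }
    }

    // Nearest table index for an angle in radians (before wrapping).
    private static int index(double rad) {
        return (int) Math.floor(rad * SCALE + 0.5);
    }

    static float sin(double rad) {
        // "& MASK" wraps negative and large indices into [0, SIZE).
        return TABLE[index(rad) & MASK];
    }

    static float cos(double rad) {
        // cos(x) == sin(x + PI/2): advance by a quarter of the table.
        return TABLE[(index(rad) + SIZE / 4) & MASK];
    }

    public static void main(String[] args) {
        System.out.println(sin(1.0) + " vs " + Math.sin(1.0));
        System.out.println(cos(1.0) + " vs " + Math.cos(1.0));
    }
}
```

With 4096 entries the worst-case error is about half a table step, roughly 8e-4, so this only fits uses that tolerate coarse results.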
I did some tests with sin() for angles in the range [0..45], using java.lang.Math.sin(); a naive lookup table with 360 positions; an optimized LUT90 with table values for the range [0..90], but expanded to work with [0..360]; and a lookup table with interpolation. Note that after warm-up, java.lang.Math.sin() is faster than the others:
Size test: 10000000
Angles range: [0.0...45.0]
Time in ms
Trial | Math.sin() | Lut sin() | LUT90.sin() | Lut sin2() [interpolation]
0       312,5879     25,2280     27,7313       36,4127
1       12,9468      19,5467     21,9396       34,2344
2       7,6811       16,7897     18,9646       32,5473
3       7,7565       16,7022     19,2343       32,8700
4       7,6634       16,9498     19,6307       32,8087
Sources available here GitHub
But if you need high performance in the range [-360..360], the java.lang.Math lib is slower. A lookup table (LUT) is around 20 times faster. If high precision is required, you can use a LUT with interpolation; it is a bit slower but still faster than java.lang.Math. See my sin2() in Math2.java, at the link above.
The numbers below are for the larger angle range:
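To show what a LUT with interpolation can look like, here is a small sketch using linear interpolation between neighbouring table entries. It is not the sin2() from Math2.java; the class name SinLutInterp, the table size of 1024, and the use of radians are assumptions for this example.

```java
// Sketch only: sine lookup table with linear interpolation.
// Not the author's sin2(); names and table size are hypothetical.
final class SinLutInterp {
    private static final int SIZE = 1024;
    private static final double TWO_PI = 2.0 * Math.PI;
    private static final double SCALE = SIZE / TWO_PI;
    // One extra entry so TABLE[i + 1] is always a valid index.
    private static final double[] TABLE = new double[SIZE + 1];

    static {
        for (int i = 0; i <= SIZE; i++) {
            TABLE[i] = Math.sin(i / SCALE);
        }
    }

    static double sin(double rad) {
        // Map the angle to a fractional table position in [0, SIZE).
        double pos = (rad % TWO_PI) * SCALE;
        if (pos < 0) {
            pos += SIZE;
        }
        if (pos >= SIZE) {
            pos -= SIZE; // guards against rounding up to exactly SIZE
        }
        int i = (int) pos;
        double frac = pos - i;
        // Blend the two neighbouring samples.
        return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i]);
    }

    public static void main(String[] args) {
        System.out.println(sin(1.0) + " vs " + Math.sin(1.0));
        System.out.println(sin(-4.0) + " vs " + Math.sin(-4.0));
    }
}
```

With 1024 entries the interpolation error stays below about 5e-6, at the cost of one extra table read and a multiply per call.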
Size test: 10000000
Angles range: [-360.0...360.0]
Time in ms
Trial | Math.sin() | Lut sin() | LUT90.sin() | Lut sin2() [interpolation]
0       942,7756     35,1488     47,4198       42,9466
1       915,3628     28,9924     37,9051       41,5299
2       430,3372     24,8788     34,9149       39,3297
3       428,3750     24,8316     34,5718       39,5187
Source: https://stackoverflow.com/questions/13460693/using-sincos-in-java