Using “sincos” in Java

萝らか妹 submitted on 2019-12-03 09:01:00

Question


In many situations I need not only the sine but also the cosine of the same argument.

In C, the common Unix math library (libm) provides a sincos function. At least on i386, this should actually map to a single assembly instruction, fsincos.

sincos, sincosf, sincosl - calculate sin and cos simultaneously

I guess these benefits exist because there is an obvious overlap in computing sine and cosine: sin(x)^2 + cos(x)^2 = 1. But AFAIK it does not pay off to shortcut this with cos = Math.sqrt(1 - sin*sin), because sqrt comes at a similar cost (and it also loses the sign of the cosine).

Is there any way to reap the same benefits in Java? Since a method can return only one value, I guess I would have to pay for allocating a double[], which might make the whole effort moot because of the added garbage collection.

Or is the Hotspot compiler smart enough to recognize that I need both, and compile this to an fsincos instruction? Can I test whether it recognizes this, and can I help it, e.g. by making sure the Math.sin and Math.cos calls directly follow each other in my code? From a Java language point of view this would actually make the most sense: having the compiler optimize this to use the fsincos instruction.
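Whether Hotspot emits fsincos can be checked empirically. The following approach is an assumption and was not tried in the post. The sketch uses a hypothetical SinCosProbe class whose hot method calls Math.sin and Math.cos back to back. Run it with the HotSpot diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which need the separately installed hsdis disassembler plugin. The VM then prints the JIT-compiled code, where you can look for an fsincos instruction or for two separate sin/cos calls.

```java
// Hypothetical probe: a hot method computing sin and cos of the same argument,
// so that the JIT compiles it and its machine code can be inspected.
final class SinCosProbe {

    // Sine and cosine of the same argument, directly one after the other.
    static double sinPlusCos(double x) {
        double s = Math.sin(x);
        double c = Math.cos(x);
        return s + c;
    }

    public static void main(String[] args) {
        double acc = 0.0;
        // Call the method often enough for the JIT compiler to compile it.
        for (int i = 0; i < 5_000_000; i++) {
            acc += sinPlusCos(i * 1e-6);
        }
        // Use the result so the loop cannot be removed as dead code.
        System.out.println(acc);
    }
}
```

The output depends on the JVM version and the CPU, so one run says nothing about other VMs.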

Clock cycles, collected from some assembler documentation:

Instruction  8087      287       387       486       Pentium   Pairing
fsin         -         -         122-771   257-354   16-126    NP
fsincos      -         -         194-809   292-365   17-137    NP
sqrt         180-186   180-186   122-129   83-87     70        NP

For fsin and fsincos, additional cycles are required if the operand > pi/4 (~3.141/4 = ~.785). NP: not pairable on the Pentium.

fsincos should need an extra pop, but that should cost only about one clock cycle. Assuming the CPU does not otherwise optimize this, fsincos should be almost twice as fast as calling fsin twice (the second time to compute the cosine, which I figure also needs an extra addition, since cos(x) = sin(x + pi/2)). sqrt could be faster in some situations, but sine can be faster in others (compare the cycle ranges above).

Update: I've done some experiments in C, but they are inconclusive. Interestingly, sincos seems to be even slightly faster than sin alone (without cos). GCC will use fsincos when you compute both sin and cos, so it does what I'd like Hotspot to do (or does Hotspot do this, too?). The only way I found to stop the compiler from outsmarting me with fsincos was to not use cos at all. It then falls back to the C library sin, not fsin.


Answer 1:


I have performed some microbenchmarks with Caliper: 10,000,000 iterations over a precomputed array of random numbers in the range -4*pi .. 4*pi. I tried my best to get the fastest JNI solution I could come up with; it's a bit hard to predict whether you will actually get fsincos or some emulated sincos. Reported numbers are the best of 10 Caliper trials. Each trial in turn consists of 3-10 runs, whose average is reported, so that is roughly 30-100 runs of the inner loop in total.

I've benchmarked several variants:

  • Math.sin only (reference)
  • Math.cos only (reference)
  • Math.sin + Math.cos
  • sincos via JNI
  • Math.sin + cos via Math.sqrt( (1+sin) * (1-sin) ) + sign reconstruction
  • Math.cos + sin via Math.sqrt( (1+cos) * (1-cos) ) + sign reconstruction

(1+sin)*(1-sin) equals 1-sin*sin mathematically, but shouldn't it be more precise when sin is close to 1? The runtime difference is minimal: the plain 1-sin*sin form saves just one addition.

Sign reconstruction is done via x %= TWOPI; if (x<0) x+=TWOPI; followed by a check of the quadrant. If you have an idea how to do this with fewer CPU cycles, I'd be happy to hear it.

The numerical loss from sqrt seems to be acceptable, at least for common angles: on the order of 1e-10 in rough experiments.
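Here is a minimal Java sketch of the SinSqrt variant. It assumes the sign reconstruction described above: reduce to [0, 2*pi), then test the quadrant. It is an illustrative reconstruction, not the benchmarked code, and SinCosSqrt and sinCos are hypothetical names.

```java
// Sketch of "Math.sin + cos via sqrt((1+sin)*(1-sin)) + sign reconstruction".
final class SinCosSqrt {
    private static final double TWO_PI = 2.0 * Math.PI;
    private static final double HALF_PI = 0.5 * Math.PI;
    private static final double THREE_HALF_PI = 1.5 * Math.PI;

    // Writes sin(x) to out[0] and cos(x) to out[1] using a single Math.sin call.
    static void sinCos(double x, double[] out) {
        double s = Math.sin(x);
        // |cos x| from sin^2 + cos^2 = 1, written as (1 + s) * (1 - s).
        double c = Math.sqrt((1.0 + s) * (1.0 - s));
        // Sign reconstruction: reduce x to [0, 2*pi), then check the quadrant.
        double r = x % TWO_PI;
        if (r < 0) r += TWO_PI;
        // cos is negative in the second and third quadrants.
        if (r > HALF_PI && r < THREE_HALF_PI) c = -c;
        out[0] = s;
        out[1] = c;
    }

    public static void main(String[] args) {
        double[] sc = new double[2];
        for (double x : new double[] {0.3, 2.0, -2.5, 4.0, 10.0}) {
            sinCos(x, sc);
            System.out.printf("x=%6.2f  cos via sqrt=%.12f  Math.cos=%.12f%n",
                    x, sc[1], Math.cos(x));
        }
    }
}
```

The square root amplifies the rounding error of s when s is close to ±1. The result is therefore least accurate near the zeros of the cosine (x near odd multiples of pi/2).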

Sin         1.30 ==============
Cos         1.29 ==============
Sin, Cos    2.52 ============================
JNI sincos  1.77 ===================
SinSqrt     1.49 ================
CosSqrt     1.51 ================

Choosing sqrt(1-s*s) vs. sqrt((1+s)*(1-s)) makes a difference of about 0.01. As you can see, the sqrt-based approach wins hands down against all the others (as we can't currently access sincos in pure Java). JNI sincos is better than computing sin and cos separately, but the sqrt approach is still faster. cos by itself seems to be consistently a tick (0.01) faster than sin, but the case distinction to reconstruct the sign needs an extra > test. I don't think my results show that either sin+sqrt or cos+sqrt is clearly preferable, but both save around 40% of the time compared to calling sin and then cos.

If Java were extended with an optimized sincos intrinsic, this would likely be even better. IMHO it is a common use case, e.g. in graphics. If AWT, Batik, etc. used it, numerous applications could benefit.

If I ran this again, I would also add a JNI sin and a JNI no-op to estimate the cost of the JNI call itself. I might also benchmark the sqrt trick via JNI, just to make sure that we actually want an intrinsic sincos in the long run.




Answer 2:


Most sin and cos calculations are direct calls to the hardware, and there isn't a much faster way to calculate them than that. In particular, they are extremely fast in the range ±pi/4. If you use hardware acceleration in general and try to keep your arguments within that range, you should be fine. Source.




Answer 3:


Looking at the Hotspot code, I am rather convinced that the Oracle Hotspot VM does not optimize sin(a) + cos(a) into fsincos: See assembler_x86.cpp, line 7482ff.

However, I suspect that the extra machine cycles from using fsin and fcos separately are easily overshadowed by other operations, such as running the GC. I would use the standard Java features and profile the application. Only if a profile run indicates that significant time is spent in the sin/cos calls would I venture to do something about it.

In that case, I would create a JNI wrapper that uses a 2-element jdoubleArray as an out parameter. If only one thread uses the sincos JNI operation, you could use a statically initialized double[2] array in your Java code and reuse it over and over again.
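On the Java side, the out-parameter pattern suggested above could look roughly like this. The sketch assumes a single calling thread, as the answer says. The sincos method here is a hypothetical pure-Java stand-in that calls Math.sin and Math.cos, so the example runs without a native library. In a real JNI wrapper it would be declared native and implemented in C. The class name and the rotate() caller are also hypothetical; rotating a point is just one example of needing sin and cos of the same angle.

```java
// Sketch: reusing one static double[2] as the out parameter of a sincos call.
final class SharedSinCosBuffer {
    // OUT[0] = sin, OUT[1] = cos. Reused on every call, so NOT thread-safe:
    // only valid if a single thread ever calls rotate().
    private static final double[] OUT = new double[2];

    // Hypothetical stand-in for a JNI method such as
    //   private static native void sincos(double x, double[] out);
    // Here it simply calls Math.sin and Math.cos so the sketch runs as-is.
    private static void sincos(double x, double[] out) {
        out[0] = Math.sin(x);
        out[1] = Math.cos(x);
    }

    // Rotates the point (px, py) by angle a (radians) around the origin
    // and writes the result into dst, without allocating any arrays.
    static void rotate(double px, double py, double a, double[] dst) {
        sincos(a, OUT);
        double s = OUT[0];
        double c = OUT[1];
        dst[0] = c * px - s * py;
        dst[1] = s * px + c * py;
    }
}
```

If several threads need this, each would need its own buffer (for example one passed in by the caller), because concurrent calls would overwrite a shared static array.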




Answer 4:


You can always profile.

Generally, however, sqrt should be about as fast as division, since the internal implementations of div and sqrt are very similar.

Sine and cosine, on the other hand, are calculated with polynomials of up to degree 10 that share no common coefficients. Before that, there is a possibly difficult modulo-2pi range reduction, which is the only part sincos can share (when not using CORDIC).
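To make the point about the shared part concrete, here is a toy sincos in Java. It performs one range reduction to [-pi/4, pi/4] and then evaluates two independent polynomials. It is only an illustrative sketch: it uses plain Taylor polynomials rather than the tuned polynomials a real math library would use, and its naive reduction loses accuracy for very large |x|. PolySinCos and its methods are hypothetical names.

```java
// Toy sincos: one shared range reduction, then two independent polynomials.
final class PolySinCos {
    private static final double HALF_PI = Math.PI / 2;

    // Taylor polynomial of sin up to r^11 (Horner form); error below ~1e-11 for |r| <= pi/4.
    static double sinPoly(double r) {
        double r2 = r * r;
        return r * (1 + r2 * (-1.0 / 6 + r2 * (1.0 / 120 + r2 * (-1.0 / 5040
                + r2 * (1.0 / 362880 + r2 * (-1.0 / 39916800))))));
    }

    // Taylor polynomial of cos up to r^10 (Horner form); error below ~2e-10 for |r| <= pi/4.
    static double cosPoly(double r) {
        double r2 = r * r;
        return 1 + r2 * (-0.5 + r2 * (1.0 / 24 + r2 * (-1.0 / 720
                + r2 * (1.0 / 40320 + r2 * (-1.0 / 3628800)))));
    }

    // Writes sin(x) to out[0] and cos(x) to out[1].
    static void sinCos(double x, double[] out) {
        // Shared step: x = k * (pi/2) + r with |r| <= pi/4 (naive reduction).
        long k = Math.round(x / HALF_PI);
        double r = x - k * HALF_PI;
        double s = sinPoly(r);
        double c = cosPoly(r);
        // Quadrant k mod 4 (k & 3 also works for negative k) selects and signs the results.
        switch ((int) (k & 3)) {
            case 0:  out[0] = s;  out[1] = c;  break;
            case 1:  out[0] = c;  out[1] = -s; break;
            case 2:  out[0] = -s; out[1] = -c; break;
            default: out[0] = -c; out[1] = s;  break;
        }
    }
}
```

Everything after the reduction is duplicated work. That is why a combined sincos only saves the reduction step unless it uses something like CORDIC.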

EDIT: Revised profiling (with a typo corrected) shows these timing differences (time for 200M iterations, 3 successive trials):

sin+cos:  1.580 1.580 1.840
sincos:   1.080 0.900 0.920
sin+sqrt: 0.870 1.010 0.860



Answer 5:


There is no fsincos available in regular Java. Also, a JNI version may be slower than simply calling java.lang.Math.sin() and cos().

I guess you are concerned about the speed of sin(x)/cos(x). So here is a suggestion for fast trigonometric operations as a replacement for fsincos: a lookup table. Below is my original post; I hope it helps.

=====

I tried to achieve the best possible performance for trigonometric functions (sin and cos) using lookup tables (LUT).

What I have found:

  • A LUT can be 20-25 times faster than java.lang.Math.sin()/cos(), possibly as fast as native fsin/fcos, maybe even as fast as fsincos.
  • But java.lang.Math.sin() and cos() are FASTER than any other way to calculate sin/cos if you use angles between 0 and 45 degrees.
  • Note also that for angles below 12 degrees, sin(x) is almost equal to x (in radians), which is even faster.

  • Some implementations use one float array to store sin and another one for cos. This is unnecessary. Just remember that:

cos(x) == sin(x + PI/2)

  • That is, if you have sin(x) table you have cos(x) table for free.
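A minimal sketch of the single-table idea in Java follows. It assumes angles in degrees, a resolution of 0.1 degree and nearest-entry lookup without interpolation. SinLut, sinDeg and cosDeg are hypothetical names, and this is not the author's code.

```java
// Sketch: one sine table serves both sin and cos, since cos(x) == sin(x + 90 degrees).
final class SinLut {
    // Assumed resolution: 10 entries per degree, i.e. 3600 entries for a full turn.
    private static final int STEPS_PER_DEGREE = 10;
    private static final int SIZE = 360 * STEPS_PER_DEGREE;
    private static final float[] SIN = new float[SIZE];

    static {
        for (int i = 0; i < SIZE; i++) {
            SIN[i] = (float) Math.sin(Math.toRadians(i / (double) STEPS_PER_DEGREE));
        }
    }

    // sin of an angle in degrees, using the nearest table entry (no interpolation).
    static float sinDeg(double deg) {
        int i = (int) (Math.round(deg * STEPS_PER_DEGREE) % SIZE);
        if (i < 0) i += SIZE;  // wrap negative angles into [0, SIZE)
        return SIN[i];
    }

    // cos via the same table, shifted by a quarter turn.
    static float cosDeg(double deg) {
        return sinDeg(deg + 90.0);
    }
}
```

With nearest-entry lookup the error is at most about half a table step, here roughly 9e-4 for sin. The modulo and the negative-index fix make the table usable for any angle, not just [0..360].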

I did some tests of sin() for angles in the range [0..45] degrees. I compared java.lang.Math.sin(); a naive lookup table with 360 positions; an optimized LUT90 with table values for the range [0..90], extended to work over [0..360]; and a lookup table with interpolation. Note that after warm-up, java.lang.Math.sin() is faster than the others:

Test size: 10,000,000
Angle range (degrees): [0.0...45.0]
Time in ms
Trial  Math.sin()  Lut sin()  LUT90.sin()  Lut sin2() [interpolation]
0      312.5879    25.2280    27.7313      36.4127
1       12.9468    19.5467    21.9396      34.2344
2        7.6811    16.7897    18.9646      32.5473
3        7.7565    16.7022    19.2343      32.8700
4        7.6634    16.9498    19.6307      32.8087

Sources are available here: GitHub

But if you need high performance over the range [-360..360], the java.lang.Math library is slower: a lookup table (LUT) is around 20 times faster. If high precision is required, you can use a LUT with interpolation. It is a bit slower but still faster than java.lang.Math. See my sin2() in Math2.java at the link above.

The numbers below are for the wider angle range:

Test size: 10,000,000
Angle range (degrees): [-360.0...360.0]
Time in ms
Trial  Math.sin()  Lut sin()  LUT90.sin()  Lut sin2() [interpolation]
0      942.7756    35.1488    47.4198      42.9466
1      915.3628    28.9924    37.9051      41.5299
2      430.3372    24.8788    34.9149      39.3297
3      428.3750    24.8316    34.5718      39.5187
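One common way to get more precision out of a small table is linear interpolation between neighboring entries, as in the LUT with interpolation mentioned above. The sketch below assumes one table entry per degree. It is an independent illustration, not the author's sin2() from Math2.java, and SinLutInterp is a hypothetical name.

```java
// Sketch: sine lookup table with linear interpolation between entries.
final class SinLutInterp {
    // Assumed resolution: one entry per degree, plus a duplicate of 360 degrees
    // so that index i + 1 is always valid.
    private static final float[] SIN = new float[361];

    static {
        for (int i = 0; i <= 360; i++) {
            SIN[i] = (float) Math.sin(Math.toRadians(i));
        }
    }

    // sin of an angle in degrees, linearly interpolated between neighboring entries.
    static float sinDeg(double deg) {
        double d = deg % 360.0;        // in (-360, 360)
        if (d < 0) d += 360.0;         // now in [0, 360]
        if (d >= 360.0) d = 0.0;       // rounding can land exactly on 360
        int i = (int) d;               // lower table index, 0..359
        float frac = (float) (d - i);  // fractional position between i and i + 1
        return SIN[i] + frac * (SIN[i + 1] - SIN[i]);
    }
}
```

With one-degree spacing, linear interpolation keeps the error below roughly 4e-5. Nearest-entry lookup at the same spacing gives about 9e-3, at the cost of one extra multiply-add per call.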


Source: https://stackoverflow.com/questions/13460693/using-sincos-in-java
