Approximation of π used to compare sequential vs. parallel speed in Java: why was .parallel() slower?

Backend · unresolved · 2 answers · 1319 views

Asked by 抹茶落季 on 2021-01-06 22:48

Can someone please explain to me why the sequential version of the π approximation was faster than the parallel one?

I can't figure it out.

I'm playing around with parallel streams.
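For context, here is a minimal sketch of the kind of Monte Carlo comparison being described, assuming Math.random() inside a LongStream (this is illustrative, not the exact original code):

    import java.util.stream.LongStream;

    public class PiApprox {
        static final long NUM_SAMPLES = 100_000_000L;

        public static void main(String[] args) {
            // Sequential: sample random points in the unit square and count
            // those that land inside the quarter circle x^2 + y^2 < 1.
            long seqCount = LongStream.rangeClosed(1, NUM_SAMPLES)
                                      .filter(e -> {
                                          double x = Math.random();
                                          double y = Math.random();
                                          return x * x + y * y < 1;
                                      })
                                      .count();

            // Parallel: the identical pipeline with .parallel() added.
            long parCount = LongStream.rangeClosed(1, NUM_SAMPLES)
                                      .parallel()
                                      .filter(e -> {
                                          double x = Math.random();
                                          double y = Math.random();
                                          return x * x + y * y < 1;
                                      })
                                      .count();

            System.out.println("sequential: PI ~ " + 4.0 * seqCount / NUM_SAMPLES);
            System.out.println("  parallel: PI ~ " + 4.0 * parCount / NUM_SAMPLES);
        }
    }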

2 Answers

Answered by 情书的邮戳 on 2021-01-06 23:24

    I get even worse results running in parallel on my machine (3.0 GHz Intel Core i7, two cores, four threads):

    sequential: PI ~ 3.14175124 calculated in  4952 msecs
      parallel: PI ~ 3.14167776 calculated in 21320 msecs
    

    I suspect the main reason is that Math.random() is thread-safe, and so it synchronizes around every call. Since there are multiple threads all trying to get random numbers at the same time, they're all contending for the same lock. This adds a tremendous amount of overhead. Note that the specification for Math.random() says the following:

    This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator.
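    Under the hood, Math.random() funnels every caller through a single JVM-wide generator; it behaves roughly like the sketch below (simplified, not the actual JDK source):

        // Simplified view of Math.random(): one shared generator for the
        // whole JVM, so every worker thread contends on its internal state.
        private static final java.util.Random GLOBAL = new java.util.Random();

        public static double random() {
            return GLOBAL.nextDouble();
        }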

    To avoid lock contention, use ThreadLocalRandom instead:

    long count = LongStream.rangeClosed(1, NUM_SAMPLES)
                           .parallel()
                           .filter(e -> {
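                               // current() returns this thread's own generator, so there is no shared lock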
                               ThreadLocalRandom cur = ThreadLocalRandom.current();
                               double x = cur.nextDouble();
                               double y = cur.nextDouble();
                               return x * x + y * y < 1;
                           })
                           .count();
    

    This gives the following results:

    sequential2: PI ~ 3.14169156 calculated in 1171 msecs
      parallel2: PI ~ 3.14166796 calculated in  648 msecs
    

    which is a 1.8x speedup, not too bad for a two-core machine. Note that this code is also faster when run sequentially, probably because there's no locking overhead at all.
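    For completeness, the hit count maps to the π estimate via the quarter-circle area ratio: the fraction of samples with x * x + y * y < 1 approaches π/4, so:

        double pi = 4.0 * count / NUM_SAMPLES;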

    Aside: Normally for benchmarks I'd suggest using JMH. However, this benchmark seems to run long enough that it gives a reasonable indication of relative speeds. For more precise results, though, I do recommend using JMH.
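    If you do reach for JMH, a minimal harness for the parallel case might look like this (class and method names here are illustrative; you'd run it via JMH's generated runner):

        import java.util.concurrent.ThreadLocalRandom;
        import java.util.stream.LongStream;
        import org.openjdk.jmh.annotations.Benchmark;
        import org.openjdk.jmh.annotations.Scope;
        import org.openjdk.jmh.annotations.State;

        @State(Scope.Benchmark)
        public class PiBench {
            static final long NUM_SAMPLES = 100_000_000L;

            @Benchmark
            public long parallelPi() {
                // Returning the count keeps JMH from dead-code-eliminating the work.
                return LongStream.rangeClosed(1, NUM_SAMPLES)
                                 .parallel()
                                 .filter(e -> {
                                     ThreadLocalRandom cur = ThreadLocalRandom.current();
                                     double x = cur.nextDouble();
                                     double y = cur.nextDouble();
                                     return x * x + y * y < 1;
                                 })
                                 .count();
            }
        }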

    UPDATE

    Here are additional results (requested by user3666197 in comments), using a NUM_SAMPLES value of 1_000_000_000 compared to the original 100_000_000. I've copied the results from above for easy comparison.

    NUM_SAMPLES = 100_000_000
    
    sequential:  PI ~ 3.14175124 calculated in    4952 msecs
    parallel:    PI ~ 3.14167776 calculated in   21320 msecs
    sequential2: PI ~ 3.14169156 calculated in    1171 msecs
    parallel2:   PI ~ 3.14166796 calculated in     648 msecs
    
    NUM_SAMPLES = 1_000_000_000
    
    sequential:  PI ~ 3.141572896 calculated in  47730 msecs
    parallel:    PI ~ 3.141543836 calculated in 228969 msecs
    sequential2: PI ~ 3.1414865   calculated in  12843 msecs
    parallel2:   PI ~ 3.141635704 calculated in   7953 msecs
    

    The sequential and parallel results come from (mostly) the same code as in the question, while sequential2 and parallel2 use my modified ThreadLocalRandom code. The new timings are overall roughly 10x longer, as one would expect. The larger parallel2 run isn't quite as fast as one might expect, though it's not totally out of line, showing about a 1.6x speedup on a two-core machine.
