Can someone please explain to me why the sequential version of the π approximation was faster than the parallel one?
I can't figure it out.
I'm playing around with parallel streams.
I get even worse results running in parallel on my machine (3.0 GHz Intel Core i7, two cores, four threads):
sequential: PI ~ 3.14175124 calculated in 4952 msecs
parallel: PI ~ 3.14167776 calculated in 21320 msecs
I suspect the main reason is that Math.random() is thread-safe, and so it synchronizes around every call. Since there are multiple threads all trying to get random numbers at the same time, they're all contending for the same lock. This adds a tremendous amount of overhead. Note that the specification for Math.random() says the following:
This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator.
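For reference, the contended parallel version presumably looks something like this. It's only a sketch, since the question's code isn't reproduced here, and NUM_SAMPLES stands in for whatever sample count is actually used:

long count = LongStream.rangeClosed(1, NUM_SAMPLES)   // NUM_SAMPLES is assumed, matching the original code
                       .parallel()
                       .filter(e -> {
                           // every thread goes through the single shared generator behind
                           // Math.random(), so parallel calls contend with each other
                           double x = Math.random();
                           double y = Math.random();
                           return x * x + y * y < 1;
                       })
                       .count();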
To avoid lock contention, use ThreadLocalRandom instead:
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.LongStream;

long count = LongStream.rangeClosed(1, NUM_SAMPLES)
                       .parallel()
                       .filter(e -> {
                           // each thread uses its own generator, so there is no shared lock to contend on
                           ThreadLocalRandom cur = ThreadLocalRandom.current();
                           double x = cur.nextDouble();
                           double y = cur.nextDouble();
                           return x * x + y * y < 1;
                       })
                       .count();
This gives the following results:
sequential2: PI ~ 3.14169156 calculated in 1171 msecs
parallel2: PI ~ 3.14166796 calculated in 648 msecs
which is a 1.8x speedup, not too bad for a two-core machine. Note that the ThreadLocalRandom version is also faster when run sequentially, probably because there's no locking overhead at all.
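The timing and the final π calculation aren't shown above, so in case it helps, here's a minimal, self-contained sketch of how such a run can be measured. The class name, sample count, and output format are my own choices, not necessarily the exact harness that produced the numbers above:

import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.LongStream;

public class PiEstimate {
    // assumed sample count; the runs above use 100_000_000
    static final long NUM_SAMPLES = 100_000_000L;

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long count = LongStream.rangeClosed(1, NUM_SAMPLES)
                               .parallel()
                               .filter(e -> {
                                   ThreadLocalRandom cur = ThreadLocalRandom.current();
                                   double x = cur.nextDouble();
                                   double y = cur.nextDouble();
                                   return x * x + y * y < 1;
                               })
                               .count();
        long elapsed = System.currentTimeMillis() - start;

        // the fraction of points inside the unit quarter circle approaches pi/4,
        // so multiplying by 4 gives the estimate
        double pi = 4.0 * count / NUM_SAMPLES;
        System.out.printf("parallel2: PI ~ %s calculated in %d msecs%n", pi, elapsed);
    }
}

Dropping the .parallel() call gives the sequential2 variant.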
Aside: Normally for benchmarks I'd suggest using JMH. However, this benchmark seems to run long enough that it gives a reasonable indication of relative speeds. For more precise results, though, I do recommend using JMH.
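If you do want JMH-level precision, a minimal benchmark might look roughly like this. The class and method names are mine, and you'd run it through the usual JMH Maven archetype or Gradle plugin setup:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class PiBenchmark {
    // hypothetical parameter; 100_000_000 matches the runs above
    @Param("100000000")
    long numSamples;

    @Benchmark
    public long parallelThreadLocalRandom() {
        return LongStream.rangeClosed(1, numSamples)
                         .parallel()
                         .filter(e -> {
                             ThreadLocalRandom cur = ThreadLocalRandom.current();
                             double x = cur.nextDouble();
                             double y = cur.nextDouble();
                             return x * x + y * y < 1;
                         })
                         .count();
    }
}

JMH takes care of warmup, forking, and measurement iterations, which is why it's preferable for benchmarks that don't run long enough to smooth out JIT and GC noise on their own.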
UPDATE
Here are additional results (requested by user3666197 in comments), using a NUM_SAMPLES value of 1_000_000_000 compared to the original 100_000_000. I've copied the results from above for easy comparison.
NUM_SAMPLES = 100_000_000
sequential: PI ~ 3.14175124 calculated in 4952 msecs
parallel: PI ~ 3.14167776 calculated in 21320 msecs
sequential2: PI ~ 3.14169156 calculated in 1171 msecs
parallel2: PI ~ 3.14166796 calculated in 648 msecs
NUM_SAMPLES = 1_000_000_000
sequential: PI ~ 3.141572896 calculated in 47730 msecs
parallel: PI ~ 3.141543836 calculated in 228969 msecs
sequential2: PI ~ 3.1414865 calculated in 12843 msecs
parallel2: PI ~ 3.141635704 calculated in 7953 msecs
The sequential and parallel runs use (mostly) the same code as in the question, while sequential2 and parallel2 use my modified ThreadLocalRandom code. The new timings are overall roughly 10x longer, as one would expect. The longer parallel2 run isn't quite as fast as one might hope, though it's not totally out of line, showing about a 1.6x speedup on a two-core machine.