When should streams be preferred over traditional loops for best performance? Do streams take advantage of branch-prediction?

前端未结

关注

 5  1169

旧时难觅i 2020-12-13 17:05

I just read about Branch-Prediction and wanted to try how this works with Java 8 Streams.

However the performance with Streams is always turning out to be wors

5条回答

春和景丽 (楼主)

2020-12-13 17:53

Everything is said, but I want to show you how your code should look like using JMH.

@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@Measurement(iterations = 10, timeUnit = TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Threads(1)
@Warmup(iterations = 5, timeUnit = TimeUnit.NANOSECONDS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark {

  private final int totalSize = 32_768;
  private final int filterValue = 1_280;
  private final int loopCount = 10_000;
  // private Random rnd;

  private int[] array;

  @Setup
  public void setup() {
    array = IntStream.range(0, totalSize).toArray();

    // rnd = new Random(0);
    // array = rnd.ints(totalSize).map(i -> i % 2560).toArray();
  }

  @Benchmark
  public long conditionalOperatorTime() {
    long sum = 0;
    for (int j = 0; j < loopCount; j++) {
      for (int c = 0; c < totalSize; ++c) {
        sum += array[c] >= filterValue ? array[c] : 0;
      }
    }
    return sum;
  }

  @Benchmark
  public long branchStatementTime() {
    long sum = 0;
    for (int j = 0; j < loopCount; j++) {
      for (int c = 0; c < totalSize; ++c) {
        if (array[c] >= filterValue) {
          sum += array[c];
        }
      }
    }
    return sum;
  }

  @Benchmark
  public long streamsTime() {
    long sum = 0;
    for (int j = 0; j < loopCount; j++) {
      sum += IntStream.of(array).filter(value -> value >= filterValue).sum();
    }
    return sum;
  }

  @Benchmark
  public long parallelStreamsTime() {
    long sum = 0;
    for (int j = 0; j < loopCount; j++) {
      sum += IntStream.of(array).parallel().filter(value -> value >= filterValue).sum();
    }
    return sum;
  }
}

The results for a sorted array:

Benchmark                            Mode  Cnt           Score           Error  Units
MyBenchmark.branchStatementTime      avgt   30   119833793,881 ±   1345228,723  ns/op
MyBenchmark.conditionalOperatorTime  avgt   30   118146194,368 ±   1748693,962  ns/op
MyBenchmark.parallelStreamsTime      avgt   30   499436897,422 ±   7344346,333  ns/op
MyBenchmark.streamsTime              avgt   30  1126768177,407 ± 198712604,716  ns/op

Results for unsorted data:

Benchmark                            Mode  Cnt           Score           Error  Units
MyBenchmark.branchStatementTime      avgt   30   534932594,083 ±   3622551,550  ns/op
MyBenchmark.conditionalOperatorTime  avgt   30   530641033,317 ±   8849037,036  ns/op
MyBenchmark.parallelStreamsTime      avgt   30   489184423,406 ±   5716369,132  ns/op
MyBenchmark.streamsTime              avgt   30  1232020250,900 ± 185772971,366  ns/op

I only can say that there are many possibilities of JVM optimizations and maybe branch-prediction is also involved. Now it is up to you to interpret the benchmark results.

0 讨论(0)

查看其它5个回答