Question
For the purpose of combining two sets of data in a stream:
Stream.concat(stream1, stream2).collect(Collectors.toSet());
Or
stream1.collect(Collectors.toSet())
.addAll(stream2.collect(Collectors.toSet()));
Which is more efficient and why?
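For reference, here is a self-contained sketch of the two variants (my own illustration; the class name and the literal values are made up, since stream1 and stream2 are not defined in the question):

import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CombineExample {
    public static void main(String[] args) {
        // Variant 1: concatenate, then collect once.
        Set<String> viaConcat = Stream.concat(Stream.of("a", "b"), Stream.of("b", "c"))
                .collect(Collectors.toSet());

        // Variant 2: collect each stream, then merge with addAll.
        // Note: answer 2 below explains why calling addAll on a toSet() result is risky.
        Set<String> viaAddAll = Stream.of("a", "b").collect(Collectors.toSet());
        viaAddAll.addAll(Stream.of("b", "c").collect(Collectors.toSet()));

        System.out.println(viaConcat); // [a, b, c] (iteration order may vary)
        System.out.println(viaAddAll); // [a, b, c] (iteration order may vary)
    }
}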
Answer 1:
For the sake of readability and intention, Stream.concat(a, b).collect(toSet()) is way clearer than the second alternative.

For the sake of the question, which is "what is the most efficient", here is a JMH test (I should say that I don't use JMH that much, and there might be some room to improve my benchmark). Using JMH, with the following code:
package stackoverflow;

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@Warmup(iterations = 2)
@Fork(1)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode({ Mode.AverageTime })
public class StreamBenchmark {
    private Set<String> s1;
    private Set<String> s2;

    @Setup
    public void setUp() {
        final Set<String> valuesForA = new HashSet<>();
        final Set<String> valuesForB = new HashSet<>();
        for (int i = 0; i < 1000; ++i) {
            valuesForA.add(Integer.toString(i));
            valuesForB.add(Integer.toString(1000 + i));
        }
        s1 = valuesForA;
        s2 = valuesForB;
    }

    @Benchmark
    public void stream_concat_then_collect_using_toSet(final Blackhole blackhole) {
        final Set<String> set = Stream.concat(s1.stream(), s2.stream()).collect(Collectors.toSet());
        blackhole.consume(set);
    }

    @Benchmark
    public void s1_collect_using_toSet_then_addAll_using_toSet(final Blackhole blackhole) {
        final Set<String> set = s1.stream().collect(Collectors.toSet());
        set.addAll(s2.stream().collect(Collectors.toSet()));
        blackhole.consume(set);
    }
}
You get these results (I omitted some parts for readability):
Result "s1_collect_using_toSet_then_addAll_using_toSet":
156969,172 ±(99.9%) 4463,129 ns/op [Average]
(min, avg, max) = (152842,561, 156969,172, 161444,532), stdev = 2952,084
CI (99.9%): [152506,043, 161432,301] (assumes normal distribution)
Result "stream_concat_then_collect_using_toSet":
104254,566 ±(99.9%) 4318,123 ns/op [Average]
(min, avg, max) = (102086,234, 104254,566, 111731,085), stdev = 2856,171
CI (99.9%): [99936,443, 108572,689] (assumes normal distribution)
# Run complete. Total time: 00:00:25
Benchmark Mode Cnt Score Error Units
StreamBenchmark.s1_collect_using_toSet_then_addAll_using_toSet avgt 10 156969,172 ± 4463,129 ns/op
StreamBenchmark.stream_concat_then_collect_using_toSet avgt 10 104254,566 ± 4318,123 ns/op
The version using Stream.concat(a, b).collect(toSet()) should perform faster (if I am reading the JMH numbers correctly).

On the other hand, I think this result is expected, because you don't create an intermediate set (this has some cost, even with HashSet), and, as said in a comment on the first answer, the Stream is lazily concatenated.
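To illustrate what "lazily concatenated" means, here is a minimal sketch (my own, not part of the benchmark, assuming the usual java.util.stream imports): Stream.concat only builds a composite stream; no element is traversed until the terminal collect runs.

Stream<String> a = Stream.of("1", "2").peek(e -> System.out.println("a: " + e));
Stream<String> b = Stream.of("3", "4").peek(e -> System.out.println("b: " + e));
Stream<String> combined = Stream.concat(a, b);             // builds the composite; prints nothing yet
Set<String> result = combined.collect(Collectors.toSet()); // only now do the "a: ..." and "b: ..." lines appear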
Using a profiler you might see in which part it is slower. You might also want to use toCollection(() -> new HashSet<>(1000)) instead of toSet() to see if the problem lies in growing the internal hash array of the HashSet.
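For example, a sketch of that pre-sized variant (my own, reusing the s1 and s2 sets from the benchmark; 4096 is an assumption: it exceeds 2000 / 0.75, the resize threshold at HashMap's default load factor, so no rehashing occurs):

final Set<String> set = Stream.concat(s1.stream(), s2.stream())
        .collect(Collectors.toCollection(() -> new HashSet<>(4096))); // pre-sized, never rehashes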
Answer 2:
First of all, it must be emphasized that the second variant is incorrect. The toSet() collector returns a Set with "no guarantees on the type, mutability, serializability, or thread-safety". If mutability is not guaranteed, it is not correct to invoke addAll on the resulting Set.

It happens to work with the current version of the reference implementation, where a HashSet will be created, but it might stop working in a future version or with alternative implementations. In order to fix this, you have to replace toSet() with toCollection(HashSet::new) for the first stream's collect operation.
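A sketch of the corrected second variant (my own, assuming the same s1 and s2 sets as in the benchmark above):

// The first collect must yield a Set that is guaranteed to be mutable,
// so toCollection(HashSet::new) replaces toSet() there.
final Set<String> set = s1.stream().collect(Collectors.toCollection(HashSet::new));
set.addAll(s2.stream().collect(Collectors.toSet())); // now safe: set is known to be a HashSet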
This leads to the situation that the second variant is not only less efficient with the current implementation, as shown in this answer; it might also prevent future optimizations made to the toSet() collector, by insisting on the result being of the exact type HashSet. Also, unlike the toSet() collector, the toCollection(…) collector has no way of detecting that the target collection is unordered, which might have performance relevance in future implementations.
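That difference is visible in the characteristics the two collectors advertise; here is a small check (my own sketch; the outputs in the comments are what the OpenJDK reference implementation reports and may vary by version):

import java.util.HashSet;
import java.util.stream.Collectors;

public class CollectorCharacteristics {
    public static void main(String[] args) {
        // toSet() advertises UNORDERED, so the pipeline knows encounter order is irrelevant.
        System.out.println(Collectors.toSet().characteristics());
        // e.g. [UNORDERED, IDENTITY_FINISH]

        // toCollection cannot know that the supplied collection type is unordered.
        System.out.println(Collectors.toCollection(HashSet::new).characteristics());
        // e.g. [IDENTITY_FINISH]
    }
}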
Answer 3:
Your question is an example of premature optimization. Never choose one syntax over the other just because you think it is faster. Always use the syntax that best expresses your intent and supports understanding of your logic.
You know nothing about the task I am working on – alan7678
That's true.
But I don't need to.
There are two general scenarios:
You develop an OLTP application. In this case the application should respond within a second or less. The user will not experience the performance difference between the variants you presented.
You develop some kind of batch processing which will run for a while unattended. In this case the performance difference "could" be important, but only if you are charged for the time your batch process runs.
Either way:
Real performance problems (where you speed up your application by multiples, not by fractions) are usually caused by the logic you implemented (e.g. excessive communication, "hidden loops", or excessive object creation).
These problems usually cannot be solved or prevented by choosing a certain syntax.
If you sacrifice readability for a performance gain, you make your application harder to maintain.
And changing a hard-to-maintain code base easily burns a multiple of the money that a less readable but slightly faster syntax could save over the lifetime of the application.
And without a doubt this question will matter in some cases for other people as well. – alan7678
No doubt, people are curious.
Luckily for me, the syntax I prefer seems to perform better as well. – alan7678
If you know, why did you ask?
And would you please be so kind as to share your measurement results along with your measuring setup?
And more importantly: will that still be valid with Java 9 or Java 10?
Java's performance basically comes from the JVM implementation, and that is subject to change. Of course, there is a better chance that newer syntax constructs (such as Java streams) will see performance gains in new Java versions. But there is no guarantee...
In my case the need for performance is greater than the difference in readability. – alan7678
Will you still be responsible for this application in 5 years? Or are you a consultant being paid to start a project and then switch to the next?
I never had a project where I could solve my performance problems at the syntax level.
But I constantly work with legacy code that has existed for 10+ years and that is hard to maintain because someone did not honor readability.
So your non-answer does not apply to me. – alan7678
It's a free world, take your pick.
Answer 4:
Use either.
If you profile your app and this section of code is a bottleneck, then benchmark the different implementations and use the one that works best.
Answer 5:
It is impossible to tell up front without a benchmark, but think about it: if there are many duplicates, then Stream.concat(stream1, stream2) must materialize one large result, because you are calling .collect().

Then toSet() must compare each occurrence against every previous one, probably with a fast hashing function, but there still might be a lot of elements.

On the other side, stream1.collect(Collectors.toSet()).addAll(stream2.collect(Collectors.toSet())) will create two smaller sets and then merge them. The memory footprint of this second option is potentially smaller than that of the first one.
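A toy illustration of that duplicate-heavy case (my own sketch, with made-up values; the first set is collected with toCollection(HashSet::new), per the mutability caveat in answer 2):

Set<String> left = Stream.of("a", "b", "a")
        .collect(Collectors.toCollection(HashSet::new)); // duplicates collapse: {a, b}
Set<String> right = Stream.of("b", "c", "b")
        .collect(Collectors.toSet());                    // duplicates collapse: {b, c}
left.addAll(right);                                      // merged result: {a, b, c}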
Edit:
I revisited this after reading @NoDataFound's benchmark. On a more sophisticated version of the test, Stream.concat indeed seems to perform consistently faster than Collection.addAll. I tried to take into account how many distinct elements there are and how big the initial streams are. I also excluded from the measurement the time required to create the input streams from the sets (which is negligible anyway). Here is a sample of the times I get with the code below.
Concat-collect 10000 elements, all distinct: 7205462 nanos
Collect-addAll 10000 elements, all distinct: 12130107 nanos
Concat-collect 100000 elements, all distinct: 78184055 nanos
Collect-addAll 100000 elements, all distinct: 115191392 nanos
Concat-collect 1000000 elements, all distinct: 555265307 nanos
Collect-addAll 1000000 elements, all distinct: 1370210449 nanos
Concat-collect 5000000 elements, all distinct: 9905958478 nanos
Collect-addAll 5000000 elements, all distinct: 27658964935 nanos
Concat-collect 10000 elements, 50% distinct: 3242675 nanos
Collect-addAll 10000 elements, 50% distinct: 5088973 nanos
Concat-collect 100000 elements, 50% distinct: 389537724 nanos
Collect-addAll 100000 elements, 50% distinct: 48777589 nanos
Concat-collect 1000000 elements, 50% distinct: 427842288 nanos
Collect-addAll 1000000 elements, 50% distinct: 1009179744 nanos
Concat-collect 5000000 elements, 50% distinct: 3317183292 nanos
Collect-addAll 5000000 elements, 50% distinct: 4306235069 nanos
Concat-collect 10000 elements, 10% distinct: 2310440 nanos
Collect-addAll 10000 elements, 10% distinct: 2915999 nanos
Concat-collect 100000 elements, 10% distinct: 68601002 nanos
Collect-addAll 100000 elements, 10% distinct: 40163898 nanos
Concat-collect 1000000 elements, 10% distinct: 315481571 nanos
Collect-addAll 1000000 elements, 10% distinct: 494875870 nanos
Concat-collect 5000000 elements, 10% distinct: 1766480800 nanos
Collect-addAll 5000000 elements, 10% distinct: 2721430964 nanos
Concat-collect 10000 elements, 1% distinct: 2097922 nanos
Collect-addAll 10000 elements, 1% distinct: 2086072 nanos
Concat-collect 100000 elements, 1% distinct: 32300739 nanos
Collect-addAll 100000 elements, 1% distinct: 32773570 nanos
Concat-collect 1000000 elements, 1% distinct: 382380451 nanos
Collect-addAll 1000000 elements, 1% distinct: 514534562 nanos
Concat-collect 5000000 elements, 1% distinct: 2468393302 nanos
Collect-addAll 5000000 elements, 1% distinct: 6619280189 nanos
Code
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamBenchmark {
    private Set<String> s1;
    private Set<String> s2;
    private long createStreamsTime;
    private long concatCollectTime;
    private long collectAddAllTime;

    public void setUp(final int howMany, final int distinct) {
        final Set<String> valuesForA = new HashSet<>(howMany);
        final Set<String> valuesForB = new HashSet<>(howMany);
        if (-1 == distinct) {
            for (int i = 0; i < howMany; ++i) {
                valuesForA.add(Integer.toString(i));
                valuesForB.add(Integer.toString(howMany + i));
            }
        } else {
            Random r = new Random();
            for (int i = 0; i < howMany; ++i) {
                int j = r.nextInt(distinct);
                valuesForA.add(Integer.toString(i));
                valuesForB.add(Integer.toString(distinct + j));
            }
        }
        s1 = valuesForA;
        s2 = valuesForB;
    }

    public void run(final int streamLength, final int distinctElements, final int times, boolean discard) {
        long startTime;
        setUp(streamLength, distinctElements);
        createStreamsTime = 0L;
        concatCollectTime = 0L;
        collectAddAllTime = 0L;
        for (int r = 0; r < times; r++) {
            startTime = System.nanoTime();
            Stream<String> st1 = s1.stream();
            Stream<String> st2 = s2.stream();
            createStreamsTime += System.nanoTime() - startTime;

            startTime = System.nanoTime();
            Set<String> set1 = Stream.concat(st1, st2).collect(Collectors.toSet());
            concatCollectTime += System.nanoTime() - startTime;

            st1 = s1.stream();
            st2 = s2.stream();
            startTime = System.nanoTime();
            Set<String> set2 = st1.collect(Collectors.toSet());
            set2.addAll(st2.collect(Collectors.toSet()));
            collectAddAllTime += System.nanoTime() - startTime;
        }
        if (!discard) {
            // System.out.println("Create streams " + streamLength + " elements, " + distinctElements + " distinct: " + createStreamsTime + " nanos");
            System.out.println("Concat-collect " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + concatCollectTime + " nanos");
            System.out.println("Collect-addAll " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + collectAddAllTime + " nanos");
            System.out.println();
        }
    }

    public static void main(String[] args) {
        StreamBenchmark test = new StreamBenchmark();
        final int times = 5;
        test.run(100000, -1, 1, true); // warm-up run, result discarded
        test.run(10000, -1, times, false);
        test.run(100000, -1, times, false);
        test.run(1000000, -1, times, false);
        test.run(5000000, -1, times, false);
        test.run(10000, 5000, times, false);
        test.run(100000, 50000, times, false);
        test.run(1000000, 500000, times, false);
        test.run(5000000, 2500000, times, false);
        test.run(10000, 1000, times, false);
        test.run(100000, 10000, times, false);
        test.run(1000000, 100000, times, false);
        test.run(5000000, 500000, times, false);
        test.run(10000, 100, times, false);
        test.run(100000, 1000, times, false);
        test.run(1000000, 10000, times, false);
        test.run(5000000, 50000, times, false);
    }
}
Source: https://stackoverflow.com/questions/41622027/performance-for-java-stream-concat-vs-collection-addall