I've already read this and this question, but I still doubt whether the observed behavior of Stream.skip
was intended by the JDK authors.
Let's have simple
@Ruben, you probably don't understand my question. Roughly, the problem is: why does unordered().collect(toCollection(HashSet::new)) behave differently from collect(toSet())? Of course I know that toSet() is unordered.
Probably, but, anyway, I will give it a second try.
Looking at the Javadoc of Collectors.toSet and Collectors.toCollection, we can see that toSet returns an unordered collector:
This is an {@link Collector.Characteristics#UNORDERED unordered} Collector.
i.e., a CollectorImpl with the UNORDERED Characteristic. Having a look at the Javadoc of Collector.Characteristics#UNORDERED we can read:
Indicates that the collection operation does not commit to preserving the encounter order of input elements
In the Javadocs of Collector we can also see:
For concurrent collectors, an implementation is free to (but not required to) implement reduction concurrently. A concurrent reduction is one where the accumulator function is called concurrently from multiple threads, using the same concurrently-modifiable result container, rather than keeping the result isolated during accumulation. A concurrent reduction should only be applied if the collector has the {@link Characteristics#UNORDERED} characteristics or if the originating data is unordered
This means to me that, if we set the UNORDERED characteristic, we do not care at all about the order in which the elements of the stream get passed to the accumulator, and, therefore, the elements can be extracted from the pipeline in any order.
Btw, you get the same behavior if you omit the unordered() in your example:
    System.out.println("skip-toSet: "
            + input.parallelStream().filter(x -> x > 0)
                    .skip(1)
                    .collect(Collectors.toSet()));
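If you want to run that snippet on its own, a self-contained sketch could look like the one below. The input list and the names SkipToSetDemo and skipToSet are my assumptions (the list from the question is not reproduced here). Which element gets dropped may vary between runs and JDK versions, so I am deliberately not showing an output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SkipToSetDemo {
    // Hypothetical input; the list used in the question is not reproduced here.
    public static final List<Integer> INPUT = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    // Same pipeline as above: no unordered() call, but toSet() is an UNORDERED collector.
    public static Set<Integer> skipToSet(List<Integer> input) {
        return input.parallelStream()
                .filter(x -> x > 0)
                .skip(1)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Exactly one element is dropped; it is not guaranteed to be the first one.
        System.out.println("skip-toSet: " + skipToSet(INPUT));
    }
}
```

The only thing you can rely on here is the size of the result: one element fewer than the filtered input.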
Furthermore, the skip() method in Stream gives us a hint:
While {@code skip()} is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines
and
Using an unordered stream source (such as {@link #generate(Supplier)}) or removing the ordering constraint with {@link #unordered()} may result in significant speedups
When using
Collectors.toCollection(HashSet::new)
you are creating a normal "ordered" Collector (one without the UNORDERED characteristic), which to me means that you do care about the ordering. Therefore, the elements are extracted in order and you get the expected behavior.
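You can check the difference between the two collectors yourself by asking each one for its characteristics. This is only a minimal sketch: the class name CollectorCharacteristicsDemo, the helper methods and the input list are made up for illustration, while characteristics() and Collector.Characteristics.UNORDERED are the regular JDK API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CollectorCharacteristicsDemo {
    // True if the collector declares the UNORDERED characteristic.
    public static boolean isUnordered(Collector<?, ?, ?> collector) {
        return collector.characteristics().contains(Collector.Characteristics.UNORDERED);
    }

    // Ordered source plus a collector without UNORDERED:
    // skip(1) has to drop the first element in encounter order.
    public static Set<Integer> skipToHashSet(List<Integer> input) {
        return input.parallelStream()
                .filter(x -> x > 0)
                .skip(1)
                .collect(Collectors.toCollection(HashSet::new));
    }

    public static void main(String[] args) {
        Collector<Integer, ?, Set<Integer>> toSet = Collectors.toSet();
        Collector<Integer, ?, HashSet<Integer>> toHashSet = Collectors.toCollection(HashSet::new);

        System.out.println("toSet unordered: " + isUnordered(toSet));            // true
        System.out.println("toCollection unordered: " + isUnordered(toHashSet)); // false

        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5); // hypothetical input
        // The result never contains 1, because the pipeline stays ordered.
        System.out.println("skip-toCollection: " + skipToHashSet(input));
    }
}
```

So both collectors end up filling a HashSet, but only toSet() tells the pipeline that encounter order does not matter.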