Java 8 Stream over a Set: consistency of the order


Let's start with an example. First, the obvious one, I think:

List<String> wordList = Arrays.asList("just", "a", "test");
Set<String> wordSet = new HashSet<>(wordList);

System.out.println(wordSet);

// grow the set so that its internal capacity increases
for (int i = 0; i < 100; i++) {
    wordSet.add("" + i);
}

// remove the extra entries again; the capacity stays enlarged
for (int i = 0; i < 100; i++) {
    wordSet.remove("" + i);
}

System.out.println(wordSet);

The output will show a different "order", because adding the 100 extra entries made the internal capacity bigger and the original entries were rehashed into different buckets. All three are still there, just in a different order (if that can be called order).

So, yes, once you modify your Set between stream operations, the "order" could change.

Since you say the Set will not be modified after creation, the order is preserved at the moment, under the current implementation (whatever that is). Or, more accurately, it is not internally randomized once the entries have been laid out in the Set.

But this is absolutely something not to rely on, ever. Things can change without notice, since the contract allows that: the docs make no guarantee about any order whatsoever; Sets are about uniqueness, after all.
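If a predictable iteration order actually matters for you, pick a Set implementation that documents one instead of relying on HashSet. A small sketch, not part of the original question, just for illustration:

// LinkedHashSet iterates in insertion order
Set<String> insertionOrdered = new LinkedHashSet<>(Arrays.asList("just", "a", "test"));
System.out.println(insertionOrdered); // [just, a, test]

// TreeSet iterates in sorted (natural) order
Set<String> sorted = new TreeSet<>(Arrays.asList("just", "a", "test"));
System.out.println(sorted); // [a, just, test]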

To give you an example of how things can change: the jdk-9 immutable Set and Map do have internal randomization, and the "order" will change from run to run:

Set<String> set = Set.of("just", "a", "test");
System.out.println(set);

This is allowed to print:

 [a, test, just] or [a, just, test]
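Note that within a single run the order stays stable, because (as the EDIT below shows) the randomizing salt is computed once per JVM; only a fresh JVM run may produce a different order. A quick check:

Set<String> set = Set.of("just", "a", "test");
System.out.println(set); // some order, e.g. [a, test, just]
System.out.println(set); // the same order again within this JVM run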

EDIT

Here is what the randomization looks like:

/**
 * A "salt" value used for randomizing iteration order. This is initialized once
 * and stays constant for the lifetime of the JVM. It need not be truly random, but
 * it needs to vary sufficiently from one run to the next so that iteration order
 * will vary between JVM runs.
 */
static final int SALT;
static {
    long nt = System.nanoTime();
    SALT = (int)((nt >>> 32) ^ nt);
}

What this does:

take a long, XOR its upper 32 bits with its lower 32 bits, and keep the lower 32 bits of the result (by casting to int). XOR is used because its output bits are evenly distributed (50% zeros and ones), so it does not bias the result.
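As a standalone illustration of that bit folding (not the JDK code itself, just the same manipulation spelled out):

long nt = System.nanoTime();
int high = (int) (nt >>> 32); // upper 32 bits of the long
int low  = (int) nt;          // lower 32 bits of the long
int salt = high ^ low;        // equivalent to (int) ((nt >>> 32) ^ nt)
System.out.println(Integer.toHexString(salt));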

How is that used (for a Set of two elements, for example):

// based on SALT, lay the elements out in a particular iteration "order"
if (SALT >= 0) {
    this.e0 = e0;
    this.e1 = e1;
} else {
    this.e0 = e1;
    this.e1 = e0;
}
My guess on why jdk-9 does this internal randomization was initially taken from here; the relevant part:

The final safety feature is the randomized iteration order of the immutable Set elements and Map keys. HashSet and HashMap iteration order has always been unspecified, but fairly stable, leading to code having inadvertent dependencies on that order. This causes things to break when the iteration order changes, which occasionally happens. The new Set/Map collections change their iteration order from run to run, hopefully flushing out order dependencies earlier in test or development

So it's basically there to break any code that relies on the order of a Set/Map. The same thing happened when people moved from java-7 to java-8 while relying on HashMap's iteration order: it changed because of the introduction of TreeNodes alongside the linked nodes. If you leave a behavior like that in place and people rely on it for years, it becomes hard to change it and apply optimizations (like HashMap moving to TreeNodes), because you are then forced to preserve that order even if you don't want to. But that is just a guess obviously, treat it as such please.

There are two aspects here. As Eugene correctly pointed out, you can't assume that a HashSet's iteration order stays the same; there is no such guarantee.

But the other aspect is the Stream implementation, which is not required to maintain the iteration order when the Spliterator doesn't report the ORDERED characteristic.

In other words, if a stream is unordered, skip(1) is not required to skip the first element, as there is no “first” element, but just to skip one element.
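You can check that characteristic yourself through the spliterator (a small illustration):

// a HashSet's spliterator does not report ORDERED
Spliterator<String> fromSet = new HashSet<>(Arrays.asList("just", "a", "test")).spliterator();
System.out.println(fromSet.hasCharacteristics(Spliterator.ORDERED)); // false

// a List's spliterator does report ORDERED
Spliterator<String> fromList = Arrays.asList("just", "a", "test").spliterator();
System.out.println(fromList.hasCharacteristics(Spliterator.ORDERED)); // true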

While streams are unlikely to introduce randomization, they do try to exploit characteristics to minimize the work. A plausible scenario would be a Stream implementation treating skip(n) for an unordered but SIZED source just like limit(size - n), as that also effectively skips n elements, with less work.

Such an optimization might not happen today, but could appear in the next version, breaking your batch processing scenario even if the HashSet's iteration order does not change.
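If the batch processing needs a stable, reproducible split, the safest approach is to impose an order first, for example by copying the Set into a sorted List and batching over that. A sketch under that assumption, reusing wordSet from the first snippet (the batch size of 2 is just a placeholder):

List<String> stableOrder = wordSet.stream()
        .sorted()                          // impose a well-defined order
        .collect(Collectors.toList());

int batchSize = 2; // placeholder batch size
for (int from = 0; from < stableOrder.size(); from += batchSize) {
    int to = Math.min(from + batchSize, stableOrder.size());
    System.out.println(stableOrder.subList(from, to));
}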
