Java 8 Stream over a Set: consistency of the order


Let's start with an example. First, the obvious one, I think:

List<String> wordList = Arrays.asList("just", "a", "test");
Set<String> wordSet = new HashSet<>(wordList);

System.out.println(wordSet);

// grow the set so that its internal capacity increases
for (int i = 0; i < 100; i++) {
    wordSet.add("" + i);
}

// remove the extra entries again; the capacity stays enlarged
for (int i = 0; i < 100; i++) {
    wordSet.remove("" + i);
}

System.out.println(wordSet);

The output will show a different "order", because adding the 100 extra entries made the internal capacity bigger and the original entries were rehashed into different buckets. All three are still there, just in a different order (if that can be called order).

So, yes, once you modify your Set between stream operations, the "order" could change.

Since you say the Set will not be modified after creation, the order is preserved at the moment, under the current implementation (whatever that is). Or, more accurately, it is not internally randomized once the entries have been laid out in the Set.

But this is absolutely something not to rely on, ever. Things can change without notice, since the contract allows that: the docs make no guarantee about any order whatsoever; Sets are about uniqueness, after all.
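If a predictable iteration order actually matters for you, pick a Set implementation that documents one instead of relying on HashSet. A small sketch, not part of the original question, just for illustration:

// LinkedHashSet iterates in insertion order
Set<String> insertionOrdered = new LinkedHashSet<>(Arrays.asList("just", "a", "test"));
System.out.println(insertionOrdered); // [just, a, test]

// TreeSet iterates in sorted (natural) order
Set<String> sorted = new TreeSet<>(Arrays.asList("just", "a", "test"));
System.out.println(sorted); // [a, just, test]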

To give you an example of how things can change: the jdk-9 immutable Set and Map do have internal randomization, and the "order" will change from run to run:

Set<String> set = Set.of("just", "a", "test");
System.out.println(set);

This is allowed to print:

 [a, test, just] or [a, just, test]
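Note that within a single run the order stays stable, because (as the EDIT below shows) the randomizing salt is computed once per JVM; only a fresh JVM run may produce a different order. A quick check:

Set<String> set = Set.of("just", "a", "test");
System.out.println(set); // some order, e.g. [a, test, just]
System.out.println(set); // the same order again within this JVM run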

EDIT

Here is what the randomization looks like:

/**
 * A "salt" value used for randomizing iteration order. This is initialized once
 * and stays constant for the lifetime of the JVM. It need not be truly random, but
 * it needs to vary sufficiently from one run to the next so that iteration order
 * will vary between JVM runs.
 */
static final int SALT;
static {
    long nt = System.nanoTime();
    SALT = (int)((nt >>> 32) ^ nt);
}

What this does:

take a long, XOR its upper 32 bits with its lower 32 bits, and keep the lower 32 bits of the result (by casting to int). XOR is used because its output bits are evenly distributed (50% zeros and ones), so it does not bias the result.
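As a standalone illustration of that bit folding (not the JDK code itself, just the same manipulation spelled out):

long nt = System.nanoTime();
int high = (int) (nt >>> 32); // upper 32 bits of the long
int low  = (int) nt;          // lower 32 bits of the long
int salt = high ^ low;        // equivalent to (int) ((nt >>> 32) ^ nt)
System.out.println(Integer.toHexString(salt));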

How is that used (for a Set of two elements, for example):

// based on SALT, lay the elements out in a particular iteration "order"
if (SALT >= 0) {
    this.e0 = e0;
    this.e1 = e1;
} else {
    this.e0 = e1;
    this.e1 = e0;
}
My guess on why jdk-9 does this internal randomization was initially taken from here; the relevant part:

The final safety feature is the randomized iteration order of the immutable Set elements and Map keys. HashSet and HashMap iteration order has always been unspecified, but fairly stable, leading to code having inadvertent dependencies on that order. This causes things to break when the iteration order changes, which occasionally happens. The new Set/Map collections change their iteration order from run to run, hopefully flushing out order dependencies earlier in test or development

So it's basically there to break any code that relies on the order of a Set/Map. The same thing happened when people moved from java-7 to java-8 while relying on HashMap's iteration order: it changed because of the introduction of TreeNodes alongside the linked nodes. If you leave a behavior like that in place and people rely on it for years, it becomes hard to change it and apply optimizations (like HashMap moving to TreeNodes), because you are then forced to preserve that order even if you don't want to. But that is just a guess obviously, treat it as such please.

There are two aspects here. As Eugene correctly pointed out, you can't assume that a HashSet's iteration order stays the same; there is no such guarantee.

But the other aspect is the Stream implementation, which is not required to maintain the iteration order when the Spliterator doesn't report the ORDERED characteristic.

In other words, if a stream is unordered, skip(1) is not required to skip the first element, as there is no “first” element, but just to skip one element.
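You can check that characteristic yourself through the spliterator (a small illustration):

// a HashSet's spliterator does not report ORDERED
Spliterator<String> fromSet = new HashSet<>(Arrays.asList("just", "a", "test")).spliterator();
System.out.println(fromSet.hasCharacteristics(Spliterator.ORDERED)); // false

// a List's spliterator does report ORDERED
Spliterator<String> fromList = Arrays.asList("just", "a", "test").spliterator();
System.out.println(fromList.hasCharacteristics(Spliterator.ORDERED)); // true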

While streams are unlikely to introduce randomization, they do try to exploit characteristics to minimize the work. A plausible scenario would be a Stream implementation treating skip(n) for an unordered but SIZED source just like limit(size - n), as that also effectively skips n elements, with less work.

Such an optimization might not happen today, but could appear in the next version, breaking your batch processing scenario even if the HashSet's iteration order does not change.
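If the batch processing needs a stable, reproducible split, the safest approach is to impose an order first, for example by copying the Set into a sorted List and batching over that. A sketch under that assumption, reusing wordSet from the first snippet (the batch size of 2 is just a placeholder):

List<String> stableOrder = wordSet.stream()
        .sorted()                          // impose a well-defined order
        .collect(Collectors.toList());

int batchSize = 2; // placeholder batch size
for (int from = 0; from < stableOrder.size(); from += batchSize) {
    int to = Math.min(from + batchSize, stableOrder.size());
    System.out.println(stableOrder.subList(from, to));
}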
