Why does a HashSet sort single alphabetic characters?

Posted by 感情迁移 on 2020-01-03 17:47:26

Question


As far as I know, a HashSet has no real sorting capabilities, unlike a SortedSet. However, I stumbled upon this:

When I run the following code:

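import java.util.HashSet;
import java.util.Set;

public class Main {
    public static void main(String[] args) {
        // Initial capacity of 2000, far more than the number of elements added
        Set<String> collection = new HashSet<String>(2000);
        String[] data = {"a", "c", "g", "f", "b", "f", "b", "d", "q", "r", "d", "m"};
        for (String input : data) {
            collection.add(input); // duplicates are silently ignored
        }
        System.out.println("Output: " + collection);
    }
}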

I get the following output: Output: [a, b, c, d, f, g, m, q, r]

This is alphabetically sorted. Why is that, given that a HashSet is not a sorted set?

So I tried with multi-character strings instead of single characters:

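import java.util.HashSet;
import java.util.Set;

public class Main {
    public static void main(String[] args) {
        // Same initial capacity as before, but multi-character strings
        Set<String> collection = new HashSet<String>(2000);
        String[] data = {"atjre", "crj", "gertj", "fertj", "berj"};
        for (String input : data) {
            collection.add(input);
        }
        System.out.println("Output: " + collection);
    }
}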

And I get the following output: Output: [crj, atjre, fertj, gertj, berj]

Now they are no longer sorted. Is there an explanation for this, or is it just a coincidence?


Answer 1:


HashSet implements the Set interface and, unlike a SortedSet, makes no guarantee about the order of its elements.

"This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time." (Source: the HashSet Javadoc)

Over time, as the set is modified, the iteration order can change, so you should never rely on it.

However, "no guarantee of ordering" does not imply "guaranteed random ordering". The exact answer to your question is:

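The hashCode method of the String class also comes into play here: for a single-character String, the hash code is just the int value of the one char in the String. Since the int values of chars of the same case are ordered alphabetically, so are the computed hashes of single-char Strings. In the usual OpenJDK implementation, a table sized for 2000 elements is large enough that each of these small hashes lands in its own bucket, and iteration walks the buckets in index order, which is why the output looks sorted.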
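To make the point about single-character hash codes concrete, here is a minimal sketch. The class name SingleCharHashDemo and the helper hashIsCharValue are made up for illustration; the only behavior relied on is String.hashCode() as documented in its Javadoc.

```java
class SingleCharHashDemo {
    // Hypothetical helper: true when s has exactly one char and its
    // hash code equals that char's int value.
    static boolean hashIsCharValue(String s) {
        return s.length() == 1 && s.hashCode() == s.charAt(0);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "b", "c", "d"}) {
            // Prints "a -> 97", "b -> 98", "c -> 99", "d -> 100"
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

Because the hash codes increase with the letters, any structure that iterates in hash order will appear to sort these strings.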




Answer 2:


As per the Java docs: https://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html

It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

What I think you are experiencing here is an artifact of how the hash function distributes its values. The hash function is used internally to map each of your strings to an integer, from which its position in the table is derived. For strings of length 1 there isn't much complexity, so the result is very regular. As you make your strings longer, the hash function has more to work with.
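The mapping from hash code to table position can be sketched as follows. This is only an approximation under stated assumptions: bucketIndex mirrors the spreading step used by HashMap in OpenJDK 8 and later (h ^ (h >>> 16), masked by the table size), and the table size of 2048 is what that implementation picks for an initial capacity of 2000. The names BucketIndexSketch, bucketIndex and predictedOrder are hypothetical, and other JDKs or versions may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BucketIndexSketch {
    // Assumption: same spreading as OpenJDK 8+ HashMap.
    // tableSize must be a power of two.
    static int bucketIndex(Object key, int tableSize) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & (tableSize - 1);
    }

    // Sorts the keys by bucket index, which approximates the iteration
    // order when no two keys share a bucket.
    static List<String> predictedOrder(List<String> keys, int tableSize) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparingInt(k -> bucketIndex(k, tableSize)));
        return sorted;
    }

    public static void main(String[] args) {
        int tableSize = 2048; // assumed table size for new HashSet<>(2000)
        List<String> data = Arrays.asList("atjre", "crj", "gertj", "fertj", "berj");
        for (String s : data) {
            System.out.println(s + " -> bucket " + bucketIndex(s, tableSize));
        }
        System.out.println("Predicted order: " + predictedOrder(data, tableSize));
    }
}
```

Under these assumptions the predicted order is [crj, atjre, fertj, gertj, berj], matching the output in the question, while single-character strings simply get their char value as bucket index.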

This goes back to the whole idea of a hash function: take a large set of possible values and map them as evenly as possible onto a smaller set of values. It just so happens that the hash function maps those single-character strings in alphabetical order. You would probably see the same thing with small consecutive numbers, and you start to see them unordered once more varied data is introduced.
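The remark about consecutive numbers can be tried with a small experiment. This is a sketch, not a statement about guaranteed behavior: the class name ConsecutiveIntsDemo and the helper hashEqualsValue are made up, and the printed order depends on the JDK in use. The only documented fact relied on is that Integer.hashCode() returns the int value itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ConsecutiveIntsDemo {
    // Integer.hashCode() is documented to return the primitive int value.
    static boolean hashEqualsValue(int i) {
        return Integer.valueOf(i).hashCode() == i;
    }

    public static void main(String[] args) {
        // Small consecutive values: hashes are the values themselves.
        Set<Integer> small = new HashSet<>(Arrays.asList(5, 3, 1, 4, 2));
        // Values larger than the table wrap around to low bucket indexes.
        Set<Integer> mixed = new HashSet<>(Arrays.asList(5, 100, 3, 33));
        // Iteration order is unspecified, so print it rather than rely on it.
        System.out.println("small: " + small);
        System.out.println("mixed: " + mixed);
    }
}
```

On a typical OpenJDK the first set is likely to print in ascending order and the second is not, but neither outcome is promised by the API.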



Source: https://stackoverflow.com/questions/50800499/why-does-a-hashset-sort-single-alphabetic-characters
