What causes the slightly unpredictable ordering of the iterator() for the java.util.HashSet and HashMap.keySet() classes?

情歌与酒 2020-12-15 10:45

Six years ago, I burned several days trying to hunt down where my perfectly deterministic framework was responding randomly. After meticulously chasing the entire framework

4 Answers
  •  孤街浪徒
    2020-12-15 11:27

    Short Answer

    There's a tradeoff. If you want amortized constant-time O(1) access to elements, the techniques available to date rely on a pseudo-random scheme such as hashing. If you want ordered access to elements, the best engineering tradeoff (a balanced search tree) gives you only O(log n) performance. For your case this may not matter, but the gap between constant and logarithmic time becomes noticeable even for relatively small structures.

    So yes, you can go look at the code and inspect carefully, but it boils down to a rather practical theoretical fact. Now is a good time to brush the dust off that copy of Cormen (or Googly Bookiness here) that's propping up the drooping corner of your house's foundation and take a look at Chapters 11 (Hash Tables) and 13 (Red-Black Trees). These will fill you in on the JDK's implementation of HashMap and TreeMap, respectively.
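    To make the mechanism concrete, here is a minimal sketch of how bucket selection works in OpenJDK 8 and later: HashMap XORs the high 16 bits of hashCode() into the low 16 bits, then masks the result with (table size - 1), which works because table sizes are powers of two. This mirrors an internal, non-public detail of that implementation (other versions or JDKs may differ), and the class and method names are hypothetical. Iteration walks the buckets in index order, so this index, together with insertion order inside a shared bucket, largely determines the order you observe.

```java
import java.util.List;

// Hypothetical helper that mirrors (it does not call) the bucket selection
// used internally by OpenJDK 8+ java.util.HashMap. Not a public API.
public class BucketIndexSketch {

    // Fold the high 16 bits of the hash into the low 16 bits,
    // the same spreading step HashMap applies before indexing.
    static int spread(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return h ^ (h >>> 16);
    }

    // Table sizes are powers of two, so (tableSize - 1) acts as a bit mask.
    static int bucketIndex(Object key, int tableSize) {
        return (tableSize - 1) & spread(key);
    }

    public static void main(String[] args) {
        List<Object> keys = List.of(1, 17, 33, "apple", "banana");
        for (Object key : keys) {
            // The same key can land in a different bucket once the table grows,
            // which is one reason iteration order shifts as a map fills up.
            System.out.printf("%-8s table=16 -> %2d, table=32 -> %2d%n",
                    key, bucketIndex(key, 16), bucketIndex(key, 32));
        }
    }
}
```

    With this arithmetic, 1, 17 and 33 all share bucket 1 in a 16-slot table, but 17 moves to bucket 17 once the table doubles to 32 slots, which is why a HashMap's iteration order can change as it grows.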

    Long Answer

    You don't want a Map or Set to return ordered lists of keys/members. That's not what they're for. Map and Set structures are unordered, just like the mathematical concepts they model, and they offer different performance than ordered structures. The objective of these data structures (as @thejh points out) is efficient amortized insert, contains, and get time, not maintaining an ordering. Studying how a hashed data structure is maintained will show you what the tradeoffs are. Take a look at the Wikipedia entries on Hash Functions and Hash Tables (ironically, the Wikipedia entry for "unordered map" redirects to the latter) or at a computer science / data structures text.

    Remember: Don't depend on properties of ADTs (and specifically collections) such as ordering, immutability, thread safety or anything else unless you look carefully at what the contract is. Note that for Map, the Javadoc says clearly:

    The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not.

    And Set.iterator() says something similar:

    Returns an iterator over the elements in this set. The elements are returned in no particular order (unless this set is an instance of some class that provides a guarantee).
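    You can see this contract at work with two HashSets that hold the same elements but were created with different initial capacities (the values 17 and 1 and the capacities 16 and 64 are arbitrary choices for illustration, and the class name is hypothetical). They compare equal, as Set.equals requires, yet nothing obliges them to iterate in the same order.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetOrderDemo {

    // Builds a HashSet with the given initial capacity and adds the
    // elements in the order supplied.
    static Set<Integer> build(int initialCapacity, List<Integer> elements) {
        Set<Integer> set = new HashSet<>(initialCapacity);
        set.addAll(elements);
        return set;
    }

    public static void main(String[] args) {
        List<Integer> elements = List.of(17, 1);
        Set<Integer> small = build(16, elements);
        Set<Integer> large = build(64, elements);

        System.out.println(small);               // iteration order is unspecified...
        System.out.println(large);               // ...and may differ from the line above
        System.out.println(small.equals(large)); // equality ignores order
    }
}
```

    On OpenJDK 8 and later the first two lines print [17, 1] and [1, 17]: with 16 buckets both values land in bucket 1 and keep their insertion order there, while with 64 buckets they fall into separate buckets 1 and 17. Another implementation would be free to print something else.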

    If you want an ordered view of these, use one of the following approaches:

    • If it's just a Set, maybe you really want a SortedSet such as a TreeSet
    • Use a TreeMap, which allows either natural ordering of keys or a specific ordering via a Comparator
    • Abstract your data structure (which is probably application-specific anyway if this is the behavior you want) and maintain both a SortedSet of keys and a Map, so lookups keep the Map's amortized constant time while iteration follows the SortedSet's order.
    • Get the Map.keySet() (or just the Set you're interested in) and put it into a SortedSet such as TreeSet, either using the natural ordering or a specific Comparator.
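    • Iterate over the entries in sorted order to access both keys and values. Note that Map.Entry does not implement Comparable, so new TreeSet(map.entrySet()) throws a ClassCastException at runtime; instead use e.g. for (Map.Entry<K, V> entry : new TreeMap<>(map).entrySet()) { } for natural key order, or copy map.entrySet() into a List and sort it with a Comparator such as Map.Entry.comparingByKey().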
    • If you only do this once in a while, you could just get an array of values out of your structure and use Arrays.sort(), which has a different performance profile (space and time).
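    Several of the approaches above, sketched in Java 8+ style. The map contents and the reverse-order Comparator are arbitrary examples, and the class and method names are hypothetical. Because Map.Entry is not Comparable, the entry-based approach sorts a List of entries with Map.Entry.comparingByKey() rather than relying on new TreeSet(map.entrySet()).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class OrderedViews {

    // Copy the keys into a TreeSet: natural ordering of the keys.
    static SortedSet<String> sortedKeys(Map<String, Integer> map) {
        return new TreeSet<>(map.keySet());
    }

    // Copy into a TreeMap that orders keys with an explicit Comparator.
    static SortedMap<String, Integer> copyWithComparator(Map<String, Integer> map,
                                                         Comparator<String> order) {
        SortedMap<String, Integer> sorted = new TreeMap<>(order);
        sorted.putAll(map);
        return sorted;
    }

    // Sort the entries themselves; Map.Entry is not Comparable,
    // so a comparator is supplied instead of using a TreeSet.
    static List<Map.Entry<String, Integer>> sortedEntries(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.comparingByKey());
        return entries;
    }

    // One-off: pull the values into an array and sort it in place.
    static Integer[] sortedValues(Map<String, Integer> map) {
        Integer[] values = map.values().toArray(new Integer[0]);
        Arrays.sort(values);
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);

        System.out.println(sortedKeys(map));                                    // [apple, fig, pear]
        System.out.println(copyWithComparator(map, Comparator.reverseOrder())); // {pear=3, fig=2, apple=1}
        for (Map.Entry<String, Integer> entry : sortedEntries(map)) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
        System.out.println(Arrays.toString(sortedValues(map)));                 // [1, 2, 3]
    }
}
```

    Each of these copies the data and pays O(n log n) to order it, so they suit occasional ordered output; if you need ordering all the time, keeping a TreeMap or TreeSet from the start is usually simpler.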

    Links to the Source

    If you would like to look at the source for j.u.HashSet and j.u.HashMap, they are available on GrepCode. Note that a HashSet is just sugar over a HashMap: it stores its elements as the map's keys. Why not always use the sorted versions? Well, as I allude to above, the performance differs, and that matters in some applications. See the related SO question here. You can also see some concrete performance numbers at the bottom here (I haven't looked closely to verify these are accurate, but they happen to substantiate my point, so I'll blithely pass along the link. :-)
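    The "a set is just a map's keys" idea can be illustrated with the real Collections.newSetFromMap factory (Java 6+), which turns any empty Map<E, Boolean> into a Set. This is not how HashSet itself is written (it keeps its own internal HashMap), and the class and method names below are hypothetical, but it shows the same principle: a set's iteration order is exactly the key order of whatever map backs it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MapBackedSets {

    // Returns a set that stores its elements as keys of the given (empty) map.
    static Set<String> backedBy(Map<String, Boolean> emptyMap, List<String> elements) {
        Set<String> set = Collections.newSetFromMap(emptyMap);
        set.addAll(elements);
        return set;
    }

    public static void main(String[] args) {
        List<String> elements = List.of("pear", "apple", "fig");

        // Hash-backed: order follows the HashMap's buckets (unspecified).
        System.out.println(backedBy(new HashMap<>(), elements));
        // Tree-backed: order follows the TreeMap's sorted keys.
        System.out.println(backedBy(new TreeMap<>(), elements)); // [apple, fig, pear]
    }
}
```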
