Which is faster, Hash lookup or Binary search?

后端 未结 17 2570
你的背包
你的背包 2020-12-02 06:47

When given a static set of objects (static in the sense that once loaded it seldom if ever changes) into which repeated concurrent lookups are needed with optimal performanc

17条回答
  •  猫巷女王i
    2020-12-02 07:10

    The only reasonable answer to this question is: It depends. It depends on the size of your data, the shape of your data, your hash implementation, your binary search implementation, and where your data lives (even though it's not mentioned in the question). A couple other answers say as much, so I could just delete this. However, it might be nice to share what I've learned from feedback to my original answer.

    1. I wrote, "Hash algorithms are O(1) while binary search is O(log n)." - As noted in the comments, Big O notation estimates complexity, not speed. This is absolutely true. It's worth noting that we usually use complexity to get a sense of an algorithm's time and space requirements. So, while it's foolish to assume complexity is strictly the same as speed, estimating complexity without time or space in the back of your mind is unusual. My recommendation: avoid Big O notation.
    2. I wrote, "So as n approaches infinity..." - This is about the dumbest thing I could have included in an answer. Infinity has nothing to do with your problem. You mention an upper bound of 10 million. Ignore infinity. As the commenters point out, very large numbers will create all sorts of problems with a hash. (Very large numbers don't make binary search a walk in the park either.) My recommendation: don't mention infinity unless you mean infinity.
    3. Also from the comments: beware default string hashes (Are you hashing strings? You don't mention.), database indexes are often b-trees (food for thought). My recommendation: consider all your options. Consider other data structures and approaches... like an old-fashioned trie (for storing and retrieving strings) or an R-tree (for spatial data) or a MA-FSA (Minimal Acyclic Finite State Automaton - small storage footprint).

    Given the comments, you might assume that people who use hash tables are deranged. Are hash tables reckless and dangerous? Are these people insane?

    Turns out they're not. Just as binary trees are good at certain things (in-order data traversal, storage efficiency), hash tables have their moment to shine as well. In particular, they can be very good at reducing the number of reads required to fetch your data. A hash algorithm can generate a location and jump straight to it in memory or on disk while binary search reads data during each comparison to decide what to read next. Each read has the potential for a cache miss which is an order of magnitude (or more) slower than a CPU instruction.

    That's not to say hash tables are better than binary search. They're not. It's also not to suggest that all hash and binary search implementations are the same. They're not. If I have a point, it's this: both approaches exist for a reason. It's up to you to decide which is best for your needs.

    Original answer:


    Hash algorithms are O(1) while binary search is O(log n). So as n approaches infinity, hash performance improves relative to binary search. Your mileage will vary depending on n, your hash implementation, and your binary search implementation.

    Interesting discussion on O(1). Paraphrased:

    O(1) doesn't mean instantaneous. It means that the performance doesn't change as the size of n grows. You can design a hashing algorithm that's so slow no one would ever use it and it would still be O(1). I'm fairly sure .NET/C# doesn't suffer from cost-prohibitive hashing, however ;)

提交回复
热议问题