Data structure: insert, remove, contains, get random element, all at O(1)

别跟我提以往 2020-11-29 14:53

I was given this problem in an interview. How would you have answered?

Design a data structure that offers the following operations in O(1) time:

  • insert(value)
  • remove(value)
  • contains(value)
  • get a random element
14 answers
  •  清酒与你 2020-11-29 15:12

    You might not like this, because they're probably looking for a clever solution, but sometimes it pays to stick to your guns... A hash table already satisfies the requirements - probably better overall than anything else will (albeit obviously in amortised constant time, and with different compromises to other solutions).

    The requirement that's tricky is the "random element" selection: in a hash table, you would need to scan or probe for such an element.

    For closed hashing / open addressing, the chance of any given bucket being occupied is size() / capacity(). Crucially, a hash-table implementation typically keeps that ratio within a constant multiplicative range (e.g. the table may be kept between 1.2x and ~10x larger than its current contents, depending on performance/memory tuning). On average we can therefore expect to examine 1.2 to 10 buckets - totally independent of the total size of the container: amortised O(1).
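
    Spelling out that expectation (my own working, under the simplifying assumption that each bucket is occupied independently with probability size()/capacity()): the number of buckets examined until the first occupied one is geometrically distributed, so

$$
\mathbb{E}[\text{buckets examined}] \;=\; \frac{1}{\text{size()}/\text{capacity()}} \;=\; \frac{\text{capacity()}}{\text{size()}} \;\in\; [1.2,\; 10].
$$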

    I can imagine two simple approaches (and a great many more fiddly ones); both are sketched in code just after this list:

    • search linearly from a random bucket

      • consider empty/value-holding buckets, à la "--AC-----B--D": the first "random" selection is fair even though it favours B, because B had no more chance than any other element of ending up in the favoured position; but if you make repeated "random" selections over the same contents, B keeps being favoured, which may be undesirable (nothing in the question demands even probabilities, though)
    • try random buckets repeatedly until you find a populated one

      • "only" capacity() / size() average buckets visited (as above) - but in practical terms more expensive because random number generation is relatively expensive, and infinitely bad if infinitely improbable worst-case behaviour...
        • a faster compromise would be to use a list of pre-generated random offsets from the initial randomly selected bucket, %-ing them into the bucket count
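
    To make both approaches concrete, here is a minimal sketch (my addition, not part of the original answer; it assumes C++17 and uses std::unordered_set's bucket interface - the standard containers actually use separate chaining rather than open addressing, but the same two strategies can be expressed through it, and the function names are illustrative only):

```cpp
#include <cassert>
#include <random>
#include <unordered_set>

// Strategy 1: start at a random bucket and scan forward (wrapping around)
// until a populated bucket is found; return its first element.
int random_element_linear_scan(const std::unordered_set<int>& s, std::mt19937& rng) {
    assert(!s.empty());
    std::uniform_int_distribution<std::size_t> pick(0, s.bucket_count() - 1);
    std::size_t b = pick(rng);
    while (s.bucket_size(b) == 0)          // expected ~capacity()/size() steps
        b = (b + 1) % s.bucket_count();
    return *s.begin(b);                    // local iterator into bucket b
}

// Strategy 2: keep trying random buckets until a populated one turns up.
int random_element_random_probe(const std::unordered_set<int>& s, std::mt19937& rng) {
    assert(!s.empty());
    std::uniform_int_distribution<std::size_t> pick(0, s.bucket_count() - 1);
    for (;;) {
        std::size_t b = pick(rng);
        if (s.bucket_size(b) != 0)
            return *s.begin(b);            // geometric number of tries, mean capacity()/size()
    }
}
```

    Both are biased, exactly as the caveat above says: the linear scan favours elements preceded by long runs of empty buckets (the "B" case), and taking the first element of a bucket under-samples anything that shares a bucket - acceptable only because the question doesn't demand uniformity.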

    Not a great solution, but may still be a better overall compromise than the memory and performance overheads of maintaining a second index array at all times.
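
    For contrast, the "second index array" alternative alluded to above usually looks something like this (again my sketch, not part of the original answer; the class and method names are illustrative): a dense vector of values for O(1) uniform random picks, plus a hash map from value to its position, with removal done by swapping the victim with the last element.

```cpp
#include <cassert>
#include <random>
#include <unordered_map>
#include <vector>

class RandomizedSet {
    std::vector<int> values;                      // dense array: O(1) uniform random pick
    std::unordered_map<int, std::size_t> index;   // value -> its position in `values`
    std::mt19937 rng{std::random_device{}()};
public:
    bool insert(int v) {
        if (index.count(v)) return false;
        index[v] = values.size();
        values.push_back(v);
        return true;
    }
    bool remove(int v) {
        auto it = index.find(v);
        if (it == index.end()) return false;
        std::size_t pos = it->second;
        values[pos] = values.back();              // move the last element into the hole
        index[values[pos]] = pos;                 // update its recorded position
        values.pop_back();
        index.erase(it);
        return true;
    }
    bool contains(int v) const { return index.count(v) != 0; }
    int getRandom() {                             // uniform over the current contents
        assert(!values.empty());
        std::uniform_int_distribution<std::size_t> d(0, values.size() - 1);
        return values[d(rng)];
    }
};
```

    The trade-off is the one described above: every insert and remove now pays to keep the two structures in sync, and the extra array costs memory, in exchange for a genuinely uniform getRandom().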
