Data structure: insert, remove, contains, get random element, all at O(1)

别跟我提以往 2020-11-29 14:53

I was given this problem in an interview. How would you have answered?

Design a data structure that offers the following operations in O(1) time:

  • insert(value)
  • remove(value)
  • contains(value)
  • get a random element
14 answers
  •  清酒与你 2020-11-29 15:12

    You might not like this, because they're probably looking for a clever solution, but sometimes it pays to stick to your guns... A hash table already satisfies the requirements - probably better overall than anything else will (albeit obviously in amortised constant time, and with different compromises to other solutions).

    The requirement that's tricky is the "random element" selection: in a hash table, you would need to scan or probe for such an element.

    For closed hashing / open addressing, the chance of any given bucket being occupied is size() / capacity(). Crucially, a hash-table implementation typically keeps that ratio within a constant multiplicative range (e.g. the table may be kept between 1.2x and ~10x larger than its current contents, depending on performance/memory tuning). On average we can therefore expect to examine 1.2 to 10 buckets - totally independent of the total size of the container: amortised O(1).
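
    Spelling out that expectation (my own working, under the simplifying assumption that each bucket is occupied independently with probability size()/capacity()): the number of buckets examined until the first occupied one is geometrically distributed, so

$$
\mathbb{E}[\text{buckets examined}] \;=\; \frac{1}{\text{size()}/\text{capacity()}} \;=\; \frac{\text{capacity()}}{\text{size()}} \;\in\; [1.2,\; 10].
$$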

    I can imagine two simple approaches (and a great many more fiddly ones); both are sketched in code just after this list:

    • search linearly from a random bucket

      • consider empty/value-holding buckets, à la "--AC-----B--D": the first "random" selection is fair even though it favours B, because B had no more chance than any other element of ending up in the favoured position; but if you make repeated "random" selections over the same contents, B keeps being favoured, which may be undesirable (nothing in the question demands even probabilities, though)
    • try random buckets repeatedly until you find a populated one

      • "only" capacity() / size() average buckets visited (as above) - but in practical terms more expensive because random number generation is relatively expensive, and infinitely bad if infinitely improbable worst-case behaviour...
        • a faster compromise would be to use a list of pre-generated random offsets from the initial randomly selected bucket, %-ing them into the bucket count
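
    To make both approaches concrete, here is a minimal sketch (my addition, not part of the original answer; it assumes C++17 and uses std::unordered_set's bucket interface - the standard containers actually use separate chaining rather than open addressing, but the same two strategies can be expressed through it, and the function names are illustrative only):

```cpp
#include <cassert>
#include <random>
#include <unordered_set>

// Strategy 1: start at a random bucket and scan forward (wrapping around)
// until a populated bucket is found; return its first element.
int random_element_linear_scan(const std::unordered_set<int>& s, std::mt19937& rng) {
    assert(!s.empty());
    std::uniform_int_distribution<std::size_t> pick(0, s.bucket_count() - 1);
    std::size_t b = pick(rng);
    while (s.bucket_size(b) == 0)          // expected ~capacity()/size() steps
        b = (b + 1) % s.bucket_count();
    return *s.begin(b);                    // local iterator into bucket b
}

// Strategy 2: keep trying random buckets until a populated one turns up.
int random_element_random_probe(const std::unordered_set<int>& s, std::mt19937& rng) {
    assert(!s.empty());
    std::uniform_int_distribution<std::size_t> pick(0, s.bucket_count() - 1);
    for (;;) {
        std::size_t b = pick(rng);
        if (s.bucket_size(b) != 0)
            return *s.begin(b);            // geometric number of tries, mean capacity()/size()
    }
}
```

    Both are biased, exactly as the caveat above says: the linear scan favours elements preceded by long runs of empty buckets (the "B" case), and taking the first element of a bucket under-samples anything that shares a bucket - acceptable only because the question doesn't demand uniformity.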

    Not a great solution, but may still be a better overall compromise than the memory and performance overheads of maintaining a second index array at all times.
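
    For contrast, the "second index array" alternative alluded to above usually looks something like this (again my sketch, not part of the original answer; the class and method names are illustrative): a dense vector of values for O(1) uniform random picks, plus a hash map from value to its position, with removal done by swapping the victim with the last element.

```cpp
#include <cassert>
#include <random>
#include <unordered_map>
#include <vector>

class RandomizedSet {
    std::vector<int> values;                      // dense array: O(1) uniform random pick
    std::unordered_map<int, std::size_t> index;   // value -> its position in `values`
    std::mt19937 rng{std::random_device{}()};
public:
    bool insert(int v) {
        if (index.count(v)) return false;
        index[v] = values.size();
        values.push_back(v);
        return true;
    }
    bool remove(int v) {
        auto it = index.find(v);
        if (it == index.end()) return false;
        std::size_t pos = it->second;
        values[pos] = values.back();              // move the last element into the hole
        index[values[pos]] = pos;                 // update its recorded position
        values.pop_back();
        index.erase(it);
        return true;
    }
    bool contains(int v) const { return index.count(v) != 0; }
    int getRandom() {                             // uniform over the current contents
        assert(!values.empty());
        std::uniform_int_distribution<std::size_t> d(0, values.size() - 1);
        return values[d(rng)];
    }
};
```

    The trade-off is the one described above: every insert and remove now pays to keep the two structures in sync, and the extra array costs memory, in exchange for a genuinely uniform getRandom().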
