Priority Queue with a find function - Fastest Implementation

Question

I am implementing a priority queue with one added requirement: a find/search function that tells whether an item is anywhere within the queue. So the operations will be insert, del-min and find.

I am unsure whether I should use a heap or a self-balancing binary search tree. It appears PQs are usually implemented with a heap, but I am wondering if there is any advantage in using a binary search tree, since I also need that find function.

Furthermore, on average I'll be doing more inserts than deletes. I am also considering a d-ary heap. Basically, every second counts.

Thanks!

Answer 1:

Why can't you just use a Priority Queue and a Set? When you enqueue something, you add it to the set. When you dequeue it, you remove it from the set. That way the set will tell you if something is in the queue.
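
A minimal sketch of that idea in Python, assuming items are hashable, mutually comparable, and unique (duplicates would need a counter instead of a plain set). The class and method names are made up for illustration; only heapq and set come from the standard library.

```python
import heapq


class SetBackedPriorityQueue:
    """Min-priority queue plus a set that mirrors its contents.

    Assumption: items are hashable, mutually comparable, and unique.
    """

    def __init__(self):
        self._heap = []        # binary min-heap managed by heapq
        self._members = set()  # same items, for O(1) average-case find

    def insert(self, item):
        if item in self._members:
            return False  # a duplicate would desynchronise the set
        heapq.heappush(self._heap, item)
        self._members.add(item)
        return True

    def del_min(self):
        item = heapq.heappop(self._heap)  # raises IndexError when empty
        self._members.remove(item)
        return item

    def find(self, item):
        return item in self._members

    def __len__(self):
        return len(self._heap)
```

insert and del-min stay O(log n) and find becomes O(1) on average, at the cost of storing every item twice.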

Answer 2:

If your find operation is relatively infrequent (and your heap fairly small), I'd just do a linear search. If it is relatively frequent, or the heap is enormous, consider tracking heap membership (to do your 'find' test) with a separate data structure or an object flag. The joy of external indexing is being able to put your object in as many containers as you like.
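
For the linear-search case, here is a rough sketch in Python, assuming the heap is a plain list maintained with heapq. The helper names are hypothetical. The second helper covers the situation where the item you found also has to be removed.

```python
import heapq


def contains(heap, item):
    """O(n) membership test: a heap is only partially ordered, so scan it all."""
    return item in heap


def remove_item(heap, item):
    """Remove one occurrence of item from a heapq-style list, if present.

    Returns True if something was removed. Runs in O(n): the scan is
    linear anyway, so re-heapifying afterwards does not change the bound.
    """
    try:
        i = heap.index(item)
    except ValueError:
        return False
    heap[i] = heap[-1]   # overwrite the hole with the last element
    heap.pop()           # drop the now-duplicated tail
    heapq.heapify(heap)  # restore the heap invariant
    return True
```

For a working set of a few hundred elements this is likely to be cheap enough, but it is worth measuring before reaching for anything more elaborate.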

If by 'find' you really mean 'find and modify' (I find I often need to delete things from priority queues independently of the typical insert/del-min), here are three approaches I've used:

Given a high rate of insert/del-min (100k/s continuous) and a low rate of find-delete (say 1/s) over a fairly small working set (500-1000), I did a linear search for the element and then deleted it from the tree in the standard way.

Given a high rate of insert/del-min plus fairly frequent find-deletes, I simply marked the deleted objects as "uninteresting" after finding them indirectly. The actual free was deferred until the object was dequeued as normal.

Given a small std::priority_queue (which has no access methods outside of insert/del-min) of only a few elements and fairly infrequent deletions, I just copied the entire queue to a temporary std::vector and copied the modified/desired part back into the queue. Then I cried myself to sleep.
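
The second approach above (mark now, discard at dequeue time) could look something like the following in Python. This is only a sketch under assumptions: items are hashable and unique while queued, the "indirect" lookup is a dict, and all names are invented for the example. It follows the same general pattern as the entry-invalidation recipe in the Python heapq documentation.

```python
import heapq
import itertools


class LazyDeletePriorityQueue:
    """Min-priority queue where deletion only marks an entry as dead.

    Assumption: items are hashable and unique while queued.
    """

    def __init__(self):
        self._heap = []                # entries: [priority, seq, item, alive]
        self._entries = {}             # item -> its live entry
        self._seq = itertools.count()  # tie-breaker so items are never compared

    def insert(self, item, priority):
        if item in self._entries:
            raise KeyError(f"{item!r} is already queued")
        entry = [priority, next(self._seq), item, True]
        self._entries[item] = entry
        heapq.heappush(self._heap, entry)

    def find(self, item):
        return item in self._entries

    def delete(self, item):
        """Mark item as uninteresting; the entry stays in the heap for now."""
        entry = self._entries.pop(item)  # raises KeyError if absent
        entry[3] = False

    def del_min(self):
        """Pop entries until a live one appears, discarding dead ones."""
        while self._heap:
            priority, _, item, alive = heapq.heappop(self._heap)
            if alive:
                del self._entries[item]
                return item, priority
        raise IndexError("del_min from an empty queue")
```

The trade-off is that dead entries keep occupying the heap until they bubble to the top, so memory use depends on how quickly deleted items would have been dequeued anyway.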

Answer 3:

If you need the benefits of more than one data structure, you can use them in composition. For example, if you need the benefits of a priority queue and a binary search tree, perform each operation on both of them.

If it's insert, insert the element into both of them.

If it's find, look the element up in the binary search tree; if it is found there, you can continue on to locate it in the priority queue.

If it's del-min, remove the minimum from the priority queue first; now that you know which element it is, remove it from the binary search tree too.

If it's del, first find the element in the binary search tree and remove it, then find it in the priority queue and remove it from there too.

It is assumed that the nodes of the binary tree and the nodes of the priority queue are pointers to your elements.
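
One way to sketch this kind of composition in Python is below. Note the substitution: instead of a binary search tree it uses a dict that maps each item to its position in the heap array, which also avoids the second search when deleting an arbitrary element. It assumes items are hashable, mutually comparable, and unique; the class name and methods are invented for the example.

```python
class IndexedMinHeap:
    """Binary min-heap with a dict that tracks each item's array position.

    Assumption: items are hashable, mutually comparable, and unique.
    """

    def __init__(self):
        self._a = []    # heap array
        self._pos = {}  # item -> index in self._a

    def find(self, item):
        return item in self._pos

    def insert(self, item):
        if item in self._pos:
            raise KeyError(f"{item!r} is already queued")
        self._a.append(item)
        self._pos[item] = len(self._a) - 1
        self._sift_up(len(self._a) - 1)

    def del_min(self):
        if not self._a:
            raise IndexError("del_min from an empty heap")
        return self._remove_at(0)

    def delete(self, item):
        self._remove_at(self._pos[item])  # raises KeyError if absent

    def _remove_at(self, i):
        removed = self._a[i]
        last = self._a.pop()
        del self._pos[removed]
        if i < len(self._a):  # the removed item was not the tail
            self._a[i] = last
            self._pos[last] = i
            self._sift_up(i)    # at most one of these two calls moves anything
            self._sift_down(i)
        return removed

    def _swap(self, i, j):
        a = self._a
        a[i], a[j] = a[j], a[i]
        self._pos[a[i]] = i
        self._pos[a[j]] = j

    def _sift_up(self, i):
        while i > 0 and self._a[i] < self._a[(i - 1) // 2]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self._a)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._a[child] < self._a[smallest]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest
```

With this layout find is O(1) on average, and insert, del-min and delete are all O(log n), because every swap keeps the position map in step with the array.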

Answer 4:

IIRC, search/find on a heap is O(n), whereas on a balanced tree it is O(log(n)), and the other standard PQ operations are the same.

Heaps are only empirically more efficient by some constant factor, so if it's a big queue a tree should be better; if it's small, you need to test and profile. It's all well and good to know in theory what's faster, but if those constant factors are large, it may be completely irrelevant for sufficiently small data sets.
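
To actually test and profile, a small timing harness along these lines may help. It is a sketch, not a benchmark suite: Python has no balanced tree in the standard library, so a sorted list searched with bisect stands in for the tree's O(log n) lookup, and the function name and parameters are hypothetical.

```python
import bisect
import heapq
import random
import timeit


def time_find(n, lookups=200, seed=0):
    """Return (heap_scan_seconds, sorted_search_seconds) for `lookups` finds.

    Compares an O(n) scan of a heap list with an O(log n) binary search
    over a sorted list holding the same n items.
    """
    rng = random.Random(seed)
    items = rng.sample(range(10 * n), n)
    heap = list(items)
    heapq.heapify(heap)
    ordered = sorted(items)
    probes = [rng.randrange(10 * n) for _ in range(lookups)]

    def in_sorted(x):
        i = bisect.bisect_left(ordered, x)
        return i < len(ordered) and ordered[i] == x

    scan = timeit.timeit(lambda: [p in heap for p in probes], number=1)
    search = timeit.timeit(lambda: [in_sorted(p) for p in probes], number=1)
    return scan, search
```

Calling it for a range of sizes, for example time_find(100), time_find(10_000) and time_find(1_000_000), should show roughly where the crossover lies on your own machine and data.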

Answer 5:

Radix trees with a min-heap property will provide the properties you need. This will actually give you effectively constant time complexities for your operations, because the cost is bounded by the key width. For example, if we look at this Haskell implementation, all three operations you mention have time complexity O(min(n,W)), where n is the number of elements and W is the number of bits in an int (32 or 64).

Answer 6:

Store your data in the fastest container you've tested and use a bloom filter to test if something is in the container.

I mated a bloom filter with a hash table in a previous project and it sped things up 400 times on hash tables with an average of roughly 10k items.

The bloom filter has a few interesting properties:

If the answer is no from a bloom filter, it's 100% reliable.
If the answer is yes, you have to check the other data structure to make sure the item is actually present.
Make sure you pick a good hash function :)
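
Here is a minimal Bloom filter sketch in Python to make those properties concrete. The sizes, the use of SHA-256 with double hashing, and hashing repr(item) are all assumptions chosen for the example, not a recommendation. One caveat for the queue use case: a plain Bloom filter cannot remove items, so after del-min it will keep answering "maybe" for the removed item and the confirming lookup remains necessary (a counting Bloom filter is the usual variant that supports removal).

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self._m = num_bits
        self._k = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item):
        # Derive k bit indexes from one SHA-256 digest via double hashing.
        # Assumption: repr(item) identifies the item uniquely and stably.
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        return [(h1 + i * h2) % self._m for i in range(self._k)]

    def add(self, item):
        for idx in self._indexes(item):
            self._bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item):
        return all(self._bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))


def find(item, bloom, container):
    """A 'no' from the filter is final; a 'yes' must be confirmed."""
    return bloom.might_contain(item) and item in container
```

The filter only pays off when most lookups are misses and the confirming lookup in the real container is expensive.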

Source: https://stackoverflow.com/questions/3974292/priority-queue-with-a-find-function-fastest-implementation

Tags: optimization, types, heap, binary-tree, priority-queue