What's the purpose of the extra std::list that boost::heap::d_ary_heap holds when configured for mutability?

Question

When configured for mutability, boost::heap::d_ary_heap uses a std::list in addition to the vector that holds the values of the heap nodes. I realize that the handles provided to make the mutable_heap_interface work are in fact iterators into this list, but I'm wondering why such an expensive solution was chosen, and whether there's a leaner way to achieve mutability with boost::heap::d_ary_heap.
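To make the structure described above concrete, here is a minimal, hypothetical sketch of a list-backed mutable heap. It is binary rather than d-ary for brevity, and it is not Boost's actual code. ListBackedHeap, Node and handle are made-up names. The values live in a std::list, whose iterators stay valid no matter how the heap vector is reshuffled, and each list node records its current index in that vector.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical sketch, not Boost's implementation: a max-heap whose handles are
// std::list iterators. List nodes never move, so a handle stays valid while the
// heap vector (which stores only iterators) is reordered.
template <typename T>
class ListBackedHeap {
public:
    struct Node {
        T value;
        std::size_t index;  // current position of this node's iterator in heap_
    };
    using handle = typename std::list<Node>::iterator;

    handle push(T v) {
        handle h = nodes_.insert(nodes_.end(), Node{std::move(v), heap_.size()});
        heap_.push_back(h);
        sift_up(h->index);
        return h;
    }

    const T& top() const { return heap_.front()->value; }
    std::size_t size() const { return heap_.size(); }
    void pop() { erase(heap_.front()); }

    // Change the value behind a handle and restore the heap property.
    void update(handle h, T v) {
        h->value = std::move(v);
        sift_down(sift_up(h->index));
    }

    // Remove an arbitrary element: fill its slot with the last one, then re-sift.
    void erase(handle h) {
        const std::size_t i = h->index;
        place(i, heap_.back());
        heap_.pop_back();
        if (i < heap_.size()) sift_down(sift_up(i));
        nodes_.erase(h);
    }

private:
    // Every write into heap_ goes through place(), which keeps Node::index in sync.
    void place(std::size_t i, handle h) { heap_[i] = h; h->index = i; }

    bool less(std::size_t a, std::size_t b) const { return heap_[a]->value < heap_[b]->value; }

    void swap_slots(std::size_t a, std::size_t b) {
        handle ha = heap_[a];
        place(a, heap_[b]);
        place(b, ha);
    }

    // Returns the final index of the element that started at i.
    std::size_t sift_up(std::size_t i) {
        while (i > 0 && less((i - 1) / 2, i)) {
            swap_slots(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
        return i;
    }

    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t best = i;
            for (std::size_t c = 2 * i + 1; c <= 2 * i + 2 && c < heap_.size(); ++c)
                if (less(best, c)) best = c;
            if (best == i) return;
            swap_slots(i, best);
            i = best;
        }
    }

    std::list<Node> nodes_;     // stable storage for the values
    std::vector<handle> heap_;  // the implicit binary heap of iterators
};
```

Every element costs a list node (value, index and two link pointers) on top of its vector slot, which is the overhead the question objects to. In exchange, the value type needs no cooperation at all: its constructors are never involved in keeping the index up to date.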

Mutability requires a way to find the index of a node in the heap vector, given the node itself, so some kind of backward pointer needs to be maintained. Can't this be achieved by storing the backward pointer in the node and maintaining it in the move/copy constructors and assignment operators of the value type?
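A variation on this idea can be sketched without any extra container. Each object carries its own position in the heap vector, and the heap writes that position whenever it moves a pointer, instead of relying on the value type's move constructors. This is a minimal, hypothetical sketch: Task, heap_index and IntrusiveHeap are made-up names, and the Task objects must outlive their membership in the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical managed object: it stores its own slot in the heap vector.
struct Task {
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);
    int prio = 0;
    std::size_t heap_index = npos;  // npos while the task is not in a heap
};

// Max-heap of Task pointers. Only the pointers move; the Tasks stay put.
class IntrusiveHeap {
public:
    void push(Task& t) {
        heap_.push_back(&t);
        t.heap_index = heap_.size() - 1;
        sift_up(t.heap_index);
    }
    Task& top() const { return *heap_.front(); }
    std::size_t size() const { return heap_.size(); }
    void pop() { erase(*heap_.front()); }

    // O(log n): the back-index says where the task is, so no search is needed.
    void update(Task& t, int prio) {
        t.prio = prio;
        sift_down(sift_up(t.heap_index));
    }

    void erase(Task& t) {
        const std::size_t i = t.heap_index;
        place(i, heap_.back());  // move the last pointer into the hole
        heap_.pop_back();
        t.heap_index = Task::npos;
        if (i < heap_.size()) sift_down(sift_up(i));
    }

private:
    // Every write into heap_ goes through place(), which keeps heap_index in sync.
    void place(std::size_t i, Task* t) { heap_[i] = t; t->heap_index = i; }
    bool less(std::size_t a, std::size_t b) const { return heap_[a]->prio < heap_[b]->prio; }
    void swap_slots(std::size_t a, std::size_t b) {
        Task* ta = heap_[a];
        place(a, heap_[b]);
        place(b, ta);
    }

    // Returns the final index of the element that started at i.
    std::size_t sift_up(std::size_t i) {
        while (i > 0 && less((i - 1) / 2, i)) {
            swap_slots(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
        return i;
    }

    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t best = i;
            for (std::size_t c = 2 * i + 1; c <= 2 * i + 2 && c < heap_.size(); ++c)
                if (less(best, c)) best = c;
            if (best == i) return;
            swap_slots(i, best);
            i = best;
        }
    }

    std::vector<Task*> heap_;
};
```

Here the per-element overhead is a single size_t inside the object. The trade-off is that the scheme is intrusive: the value type has to reserve room for the index.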

Is there a good reason why it needs to be as expensive as a doubly-linked list?

Answer 1:

This is partly an answer to my own question: it only speculates about why the Boost design is the way it is, and presents a partial solution to what I would have liked to get from the Boost data structure. I'm still interested in further insight into the rationale behind the Boost implementation, and of course also in feedback on the solution I present below.

Let me first explain the code below, then discuss its merits and problems, and finally comment on the Boost.Heap implementation: why it presumably is the way it is, and why I don't like it.

The code below is based on the venerable std::priority_queue. It splits each node managed by the priority queue into a handle and a body. The handle goes into the heap at the core of the priority_queue, and therefore moves around in the underlying vector as entries are added or removed. To keep moves cheap, the handle contains only the priority value and a pointer to the body. The body is a potentially large object that stays put in memory. It holds a backpointer to the handle, because the handle must be invalidated when the body's priority changes or the body goes away.

Since the handle moves around in the heap, the backpointer in the body must be updated each time the handle changes location. This is done in the handle's move constructor and move assignment operator. When a handle is invalidated, both its pointer to the body and the body's backpointer to it are set to null.

#include <limits>
#include <queue>

//! Priority queue that works with handles to managed objects.
//! Object must provide getLink()/setLink() to store a back-pointer to its Entry.
template<typename Prio, typename Object> struct PriorityQueue {
    //! Each heap entry is a handle, consisting of a pointer to the managed object and a priority value.
    struct Entry {
        Object *obj_;  // nullptr once the handle has been invalidated
        Prio val_;

        // Handles can only be moved, so each object has exactly one Entry to point back to.
        Entry(Entry const &) =delete;
        Entry &operator=(Entry const &) =delete;

        ~Entry() {
            if(obj_)
                obj_->setLink(nullptr);
        }

        Entry(Object &obj, Prio val)
            : obj_{&obj}
            , val_{val}
        {
            obj_->setLink(this);
        }

        // Whenever the heap moves a handle, re-point the object's back-pointer at the new location.
        Entry(Entry &&v) noexcept
            : obj_{v.obj_}
            , val_{v.val_}
        {
            if(obj_)
                obj_->setLink(this);
            v.obj_ = nullptr;
        }

        Entry &operator=(Entry &&v) noexcept {
            if(&v != this) {
                val_ = v.val_;
                if(obj_)
                    obj_->setLink(nullptr);  // the object we referred to loses its handle
                obj_ = v.obj_;
                if(obj_)
                    obj_->setLink(this);
                v.obj_ = nullptr;
            }
            return *this;
        }

        friend bool operator<(Entry const &a, Entry const &b) {
            return a.val_ < b.val_;
        }
    };

    //! Inserts obj (which must not already be queued; use update() for that)
    //! and returns the priority now at the top.
    Prio add(Object &obj, Prio val) {
        while(!heap_.empty() && !heap_.top().obj_)
            heap_.pop();
        heap_.emplace(obj, val);
        return heap_.top().val_;
    }

    //! Detaches obj and returns the priority now at the top,
    //! or std::numeric_limits<Prio>::max() if the queue is empty.
    Prio remove(Object &obj) {
        // We can't remove the entry straight away, so we null the pointer
        // and leave the entry in the heap; it is discarded once it reaches
        // the root position.
        if(obj.getLink()) {
            obj.getLink()->obj_ = nullptr;
            obj.setLink(nullptr);
        }
        while(!heap_.empty() && !heap_.top().obj_)
            heap_.pop();
        // numeric_limits works for any arithmetic Prio (INT64_MAX would not fit an int).
        return heap_.empty() ? std::numeric_limits<Prio>::max() : heap_.top().val_;
    }

    Prio update(Object &obj, Prio val) {
        remove(obj);
        return add(obj, val);
    }

    std::priority_queue<Entry> heap_;
};

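//! Example of a managed object.
struct MyObject {
    // Needed explicitly: declaring the copy constructor suppresses the implicit default constructor.
    MyObject() = default;
    MyObject(MyObject const &) =delete;
    MyObject &operator=(MyObject const &) =delete;

    PriorityQueue<int, MyObject>::Entry *getLink() const {
        return link_;
    }

    void setLink(PriorityQueue<int, MyObject>::Entry *link) {
        link_ = link;
    }

    // Back-pointer to this object's handle in the heap, or nullptr if it has none.
    PriorityQueue<int, MyObject>::Entry *link_ = nullptr;
};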

Unfortunately, std::priority_queue doesn't support mutability: you can't remove any entry other than the root. The fallback is to leave handles in the heap but invalidate them by breaking their relationship with the body. An invalidated handle keeps its priority, so it eventually reaches the root as higher-priority entries are popped, and it is discarded there. Obviously, that means invalidated handles inflate the heap needlessly, consuming additional memory and CPU time, which may or may not be significant. If std::priority_queue exposed its internal heap maintenance functions, it would be possible to delete or update entries directly.
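std::priority_queue does expose its container and comparator to derived classes, as the protected members c and comp, but the standard library offers no algorithm that re-sifts a single position. A minimal sketch under those constraints can therefore erase an arbitrary entry directly, but has to rebuild the heap with std::make_heap in O(n) instead of O(log n). Without handles, it also has to locate the entry by linear search. ErasablePriorityQueue is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical extension: std::priority_queue's underlying container `c` and
// comparator `comp` are protected members, so a derived class may edit them.
template <typename T, typename Compare = std::less<T>>
class ErasablePriorityQueue : public std::priority_queue<T, std::vector<T>, Compare> {
public:
    // Removes one element equal to `value`; returns false if there is none.
    // Cost: O(n) search plus O(n) std::make_heap, because the standard library
    // provides no way to re-sift a single position.
    bool erase(const T& value) {
        auto it = std::find(this->c.begin(), this->c.end(), value);
        if (it == this->c.end()) return false;
        this->c.erase(it);
        std::make_heap(this->c.begin(), this->c.end(), this->comp);
        return true;
    }
};
```

This works but gives up the logarithmic cost that mutability is supposed to buy. An O(log n) erase or update needs sift-up/sift-down at an arbitrary index, which has to be hand-written.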

It would be possible to reduce the handle size even further by holding the priority in the body rather than the handle, but then the body would have to be consulted for every priority comparison, which would destroy locality of reference. The chosen approach avoids this by keeping everything relevant to heap maintenance in the handle. Updating the backpointer in the body from the move constructor and move assignment operator is a write-only operation, which needn't hinder performance, since modern processors typically have store buffers that can absorb the associated latency.
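The trade-off can be made concrete with two hypothetical handle layouts. Body, FatHandle and ThinHandle are illustrative names, not part of the code above. The fat handle carries the priority (roughly like Entry), so a comparison reads only the two handles, which sit next to each other in the heap vector. The thin handle is smaller, but every comparison dereferences two bodies that may live anywhere in memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical large, stationary object; only handles move inside the heap.
struct Body {
    std::int64_t prio;
    char payload[256];  // stands in for a large object
};

// Priority stored in the handle: comparisons never touch a Body.
struct FatHandle {
    Body* body;
    std::int64_t prio;
};
inline bool less_fat(const FatHandle& a, const FatHandle& b) { return a.prio < b.prio; }

// Priority read through the pointer: a smaller handle, but each comparison
// loads from two Bodies that are likely on unrelated cache lines.
struct ThinHandle {
    Body* body;
};
inline bool less_thin(const ThinHandle& a, const ThinHandle& b) {
    return a.body->prio < b.body->prio;
}
```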

For better cache performance, one would prefer a d-ary heap over a binary heap, so that all children of a node (i.e. their handles), which are adjacent in the vector, fit in one cache line. Alas, std::priority_queue doesn't support that either.
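For reference, here is the index arithmetic of an implicit d-ary heap in a 0-based array. The children of node i occupy the contiguous slots d*i + 1 through d*i + d. With 16-byte handles (a pointer plus a 64-bit priority on a typical 64-bit target) and an assumed 64-byte cache line, d = 4 lets one line hold a whole sibling group. This requires slot 1 to start on a cache-line boundary, for instance by leaving d - 1 unused slots in front of the root in a 64-byte-aligned buffer. The Handle type and the constants are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Implicit d-ary heap stored in a 0-based array:
// the children of node i are d*i + 1 ... d*i + d, and they are contiguous.
constexpr std::size_t first_child(std::size_t i, std::size_t d) { return d * i + 1; }
constexpr std::size_t last_child(std::size_t i, std::size_t d) { return d * i + d; }
constexpr std::size_t parent(std::size_t i, std::size_t d) { return (i - 1) / d; }  // requires i > 0

// Hypothetical handle like Entry: object pointer plus priority.
struct Handle {
    void* obj;
    std::int64_t prio;
};

// Assumed cache-line size; 64 bytes is common on x86-64 and many ARM cores.
constexpr std::size_t kCacheLine = 64;
// Arity chosen so that one sibling group fills one cache line
// (4 when sizeof(Handle) == 16).
constexpr std::size_t kArity = kCacheLine / sizeof(Handle);
static_assert(kArity >= 2, "a heap needs an arity of at least 2");
```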

Boost.Heap would support the latter, but to also support mutability it introduces an additional std::list for managing the backpointers. I suspect this is rooted in the age of the library: it dates back to before C++11, when move semantics weren't yet available in the language. Presumably, only minimal maintenance has been done on it since. I'd welcome the maintainers bringing the library up to date and using the opportunity to provide leaner implementations.

So, the bottom line is that I have at least a plausible explanation in answer to my original question, and a design that addresses some of my goals, leaving me with a workable, though not yet optimal, solution based on the standard library.

Thanks go to the commenters. If you have additional insight to add, you're most welcome.

Source: https://stackoverflow.com/questions/62504206/whats-the-purpose-of-the-extra-stdlist-that-boostheapd-ary-heap-holds-whe

Tags: c++, boost, heap, priority-queue