Why do C++ data structures for graphs hide contiguous integer indices?

问题

Data structures for directed and undirected graphs are of fundamental importance. Well-known and widely-used implementations such as the Boost Graph Library and Lemon are designed such that the contiguous integer indices of nodes and edges are not exposed to the user via the interface.

Instead, the user identifies nodes and edges by (small) representative objects. One advantage is that these objects are updated automatically when the indices of nodes and edges change due to the removal of edges or nodes from the graph.

In my opinion (!), this advantage is overrated. Users will typically store the representative objects of nodes and/or edges in a container, e.g., an std::vector. Now, if nodes or edges are removed from the graph and their representative objects become invalid, the user needs to either ignore this or rearrange the vector so as to keep valid integer indices contiguous, i.e., do exactly the bookkeeping that the design was supposed to make unnecessary.

Hence, my question is: Does the design choice (of hiding the contiguous integer indices of nodes and edges from the user) have other advantages?

回答1:

_{(I'm at home in the Java world, but hope that it is OK to give an answer that is not focussed on the particular libraries in question)}

There are several possible advantages of such an abstraction. One of the most important ones was already mentioned in the question: The consistency when performing modifications in a graph is much harder to accomplish when indices have to be maintained.

The reason why this may be hard lies in the different possible graph representations: It may be easy to maintain consistent indices if the internal representation only (and always) consisted of random-access lists of Vertex and Edge objects. But for other representations, determining an index may be difficult.

This directly related to the second main advantage: One is free to use different implementations of the graph interface. The section "Graph Data Structures" in the Review of Elementary Graph Theory of the boost documentation lists several data structures that are already offered by the BGL (and everybody may add his own implementation). The running times for certain operations are given in Big-O-notation, and once can see that they vary greatly between the different data structures.

So one can easily imagine that different implementations are better suited for certain tasks. For example, consider an algorithm frequently has to check whether a particular vertex is contained in a graph. For an indexed (that is, list-based) vertex storage, this would require O(n) for each test. With a set-based storage of the vertices, this could be done in O(1) - but there simply are no sensible "indices" in this case.

Additionally, as mentioned in the Graph Concepts overview:

In fact, the BGL interface need not even be implemented using a data-structure, as for some problems it is easier or more efficient to define a graph implicitly based on some functions.

So suggesting that there is an "indexed access" even when the graph does not even exist in memory may hinder such a purely functional implementation.

回答2:

I can't speak for Lemon graph, but for boost graph I think the main goal is to be generic. So abstracting away the vertex (edge) access helps to achieve that goal.

It is stated in the documentation that boost graph is based on Dietmar Kühl's Masters Thesis on generic graph algorithms. (See my answer to Do property maps remain necessary for BGL?). So the main goal behind the library is to be generic and extensible. The choice of encapsulating access is part of abstracting the algorithms from the graph representation. To me, continuous integer indices are an implementation detail.

Boost doesn't make any assumptions on how you will use the graph or what performance trade offs are important to you. It lets you choose (or implement) the container that will best fit your needs.

If you want to break this encapsulation, you are free to do so. In fact, my most common use of boost graph involves vecS containers and a vector of structs. I usually work with graphs where the size is fixed. I could just as easily use a map of vertex_descriptors (or edge_descriptors) to objects to achieve the same goal.

So in summary, I would say that this not so much a design choice, but rather a consequence of achieving the broader goal of being generic. So the hiding of access has the benefit of being more generic.

来源：https://stackoverflow.com/questions/25641807/why-do-c-data-structures-for-graphs-hide-contiguous-integer-indices

标签

c++

graph

indexing

boost-graph

lemon-graph-library