问题
I have a tree where every node may have 0 to N children.
Use-case is the following query: Given pointers to two nodes: Are these nodes within the same branch of the tree?
Examples
q(2,7) => true
q(5,4) => false
By the book (slow)
The straight forward implementation would be to store a pointer to the parent and a pointer to a list of children at each node. But this would lead to bad performance because the tree would be fragmented in memory and therefor not cache-aware.
Question
What would be a good way to represent the tree in compact form? The whole tree has about 100,000 nodes. So it should be possible to find a way to make it fit completely in the CPU-cache.
Binary trees for example are often represented implicitly as an array and are therefor perfect to be completely stored in the CPU-cache (if small enough).
回答1:
You can pre-allocate a contiguous block of memory where you concatenate the information for all nodes.
Afterwards, each node would only need a way to retrieve the beginning of its information, and the length of that information.
In this case, the information for each node could be represented by the parent, followed by the list of children (let's assume that we use -1 when there is no parent, i.e. for the root).
For example, for the tree posted in the question, the information for node 1 would be: -1 2 3 4
, the information for node 2 is: 1 5
, and so on.
The contiguous array would be obtained by concatenating these arrays, resulting in something like:
-1 2 3 4 1 5 1 9 10 1 11 12 13 14 2 3 5 5 5 3 3 4 4 4 15 4
Each node would use some metadata to allow retrieving its associated information. As mentioned, this metadata would need to consist of a startIndex
and length
. E.g. for node 3, we would have startIndex = 6
, length = 3
, which allows to retrieve the 1 9 10
subarray, indicating that the parent is node 1, and its children are nodes 9 and 10.
In addition, the metadata information can also be stored in the contiguous memory block, at the beginning. The metadata has fixed length for each node (two values), thus we can easily obtain the position of the metadata for a certain node, given its index.
In this way, all the information about the graph will be stored in a contiguous, cache-friendly, memory block.
来源:https://stackoverflow.com/questions/43777859/cache-aware-tree-impementation