Removing duplicate subtrees from binary tree

可紊 提交于 2019-12-04 19:27:05


I have to design an algorithm under the additional homework. This algorithm have to compress binary tree by transforming it into DAG by removing repetitive subtrees and redirecting all these connections to one left original subtree. For instance I've got a tree (I'm giving the nodes preorder):

1 2 1 3 2 1 3

The algorithm have to remove right connection (right subtree that means 2 1 3) of 1 (root) and redirect it to left connection (because these substrees are the same and left was first in preorder so we leave only the left)

The way I see it: I'm passing the tree preorder. For current node 'w', I start recursion that have to detect (if there exist) the original subtree equals to the subtree with root 'w'. I'm cutting the recursion if I find equal subtree (and I do what must be done) or when I get to 'w' in my finding the same subtrees recursion. Of course I predict some small improvements like comparing only subtrees with equal number of nodes.

If I'm not wrong it gives complexity O(n^2) where n is number of nodes of given binary tree. Is there any chance to do it faster (I think it is). Is the linear algorithm possible?

Pity that my algorithm finally has complexity O(n^3). Your answers with hashing probably will be very useful for me after some time, when I will know much more.. For now it's too difficult for me..

The last question. Is there any chance to do it in O(n^2) using elementary techniques (not hashing)?


This happens when constructing oBDDs. The Idea is: put the tree into a canonical form, and construct a hashtable with an entry for every node. Hash function is a function of the node + the hash functions for the left/right child nodes. Complexity is O(N), but only if one can rely on the hashvalues being unique. The final compare (e.g. for Resolving collisions) will still cost o(N*N) for the recursive subtree <--> subtree compare. More on BDDs or the original Bryant paper

The hashfunction I currently use:

#define SHUFFLE(x,n) (((x) << (n))|((x) >>(32-(n))))
/* a node's hashvalue is based on its value
 * and (recursively) on it's children's hashvalues.
#define NODE_HASH2(l,r) ((SHUFFLE((l),5)^SHUFFLE((r),9)))
#define NODE_HASH3(v,l,r) ((0x54321u*(v) ^ NODE_HASH2((l),(r))))

Typical usage:

void node_sethash(NodeNum num)
if (NODE_IS_NULL(num)) return;

if (NODE_IS_TERMINAL(num)) switch (nodes[num].var) {
        case 0: nodes[num].hash.hash= HASH_FALSE; break;
        case 1: nodes[num].hash.hash= HASH_TRUE; break;
        case 2: nodes[num].hash.hash= HASH_FALSE^HASH_TRUE; break;
else if (NODE_IS_NAMED(num)) {
        NodeNum f,t;
        f = nodes[num].negative;
        t = nodes[num].positive;
        nodes[num].hash.hash = NODE_HASH3 (nodes[num].var, nodes[f].hash.hash, nodes[t].hash.hash);
return ;

Searching the hash table:

NodeNum *hash_hnd(NodeNum num, int want_exact)
unsigned slot;
NodeNum *ptr, this;
if (NODE_IS_NULL(num)) return NULL;

slot = nodes[num].hash.hash % COUNTOF(hash_nodes);

for (ptr = &hash_nodes[slot]; !NODE_IS_NULL(this= *ptr); ptr = &nodes[this] {
        if (this == num) break;
        if (want_exact) continue;
        if (nodes[this].hash.hash != nodes[num].hash.hash) continue;
        if (nodes[this].var != nodes[num].var) continue;
        if (node_compare( nodes[this].negative , nodes[num].negative)) continue;
        if (node_compare( nodes[this].positive , nodes[num].positive)) continue;
                /* duplicate node := same var+same children */
return ptr;

The recursive compare function:

int node_compare(NodeNum one, NodeNum two)
int rc;

if (one == two) return 0;

if (NODE_IS_NULL(one) && NODE_IS_NULL(two)) return 0;
if (NODE_IS_NULL(one) && !NODE_IS_NULL(two)) return -1;
if (!NODE_IS_NULL(one) && NODE_IS_NULL(two)) return 1;

if (NODE_IS_TERMINAL(one) && !NODE_IS_TERMINAL(two)) return -1;
if (!NODE_IS_TERMINAL(one) && NODE_IS_TERMINAL(two)) return 1;

if (VAR_RANK(nodes[one].var)  < VAR_RANK(nodes[two].var) ) return -1;
if (VAR_RANK(nodes[one].var)  > VAR_RANK(nodes[two].var) ) return 1;

rc = node_compare(nodes[one].negative,nodes[two].negative);
if (rc) return rc;
rc = node_compare(nodes[one].positive,nodes[two].positive);
if (rc) return rc;

return 0;


This is a problem commonly solved to do common sub-expression elimination in programming languages.

The approach is as follows (and is easily generalized to more than 2 children in a node):

Algorithm (Assumes mutable tree structure; You can easily build a new tree along the way):


    HASH = a new hash-table-based dictionary

    foreach subtree NODE in the tree // traverse this however you like

        if NODE is in HASH
            replace NODE with HASH[NODE]
            HASH[NODE] = N // insert the current node, N, in the dictionary

To compute the hash code for a node, you need to recursively compute the hash nodes until you reach the leaves of the tree.

Simply calculating these hash codes naively will bump up your runtime to O(n^2).

It is crucial that you store the results on your way down the tree to avoid repeated recursive calls and to improve the runtime to O(n).


I would go with a hashing approach.

A hash for a leaf is its value mod P_1. Hash for a node is (value+hash(left_son)*P_2+hash(right_son)*P_2^2) mod P_1, where P_1, P_2 are primes. If you count those hashes for at least 5 different big prime pairs(by big i mean something near 10^8-10^9, so you can do your math without overflowing), you can safely assume that nodes with same hashes are the same.

Then you can walk the tree, checking sons, first and do your transform. This will work in O(n) time.

NOTE that you can use other hash functions, like (value + hash(left_son)*P_2 + hash(right_son)*P_3) mod P_1, etc.