There are two binary trees T1 and T2 which store character data, duplicates allowed.
How can I find whether T2 is a subtree of T1?
T1 has millions of nodes and
I assume that your trees are immutable, so you never change any subtree (you don't do set-car! in Scheme parlance); you only construct new trees from leaves or from existing trees.
Then I would advise keeping a hash code in every node (or subtree). In C parlance, declare the trees as
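    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct tree_st {
        /* hash and isleaf are not const-qualified so that the constructors
           can fill them in after malloc; treat them as read-only afterwards */
        unsigned hash;
        bool isleaf;
        union {
            const char* leafstring;           // when isleaf is true
            struct {                          // when isleaf is false
                const struct tree_st* left;
                const struct tree_st* right;
            };
        };
    };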
Then compute the hash at construction time. When comparing nodes for equality, first compare their hashes; most of the time the hash codes will differ (and you won't need to compare content).
Here is a possible leaf construction function:
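    struct tree_st* make_leaf (const char* string) {
        assert (string != NULL);
        struct tree_st* t = malloc(sizeof(struct tree_st));
        if (!t) { perror("malloc"); exit(EXIT_FAILURE); }
        // hash_of_string is any string hash function of your choice (not shown here)
        t->hash = hash_of_string(string);
        t->isleaf = true;
        t->leafstring = string;   // the string is not copied; it must outlive the tree
        return t;
    }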
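hash_of_string is left unspecified above; any reasonable string hash would do. As one possible choice (this is my assumption, not a requirement of the approach), here is a minimal 32-bit FNV-1a sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed helper: 32-bit FNV-1a over a NUL-terminated string.
   Any other decent string hash could be substituted. */
unsigned hash_of_string(const char *s) {
    assert(s != NULL);
    uint32_t h = 2166136261u;        /* FNV-1a 32-bit offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;      /* mix in the next byte */
        h *= 16777619u;              /* multiply by the 32-bit FNV prime */
    }
    return (unsigned)h;
}
```

The only property the rest of the code relies on is that equal strings always produce equal hashes.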
The function that returns the hash code of a (possibly NULL) tree is
    unsigned tree_hash (const struct tree_st* t) {
        return (t==NULL) ? 0 : t->hash;
    }
The function to construct a node from two subtrees sleft & sright is
    struct tree_st* make_node (const struct tree_st* sleft,
                               const struct tree_st* sright) {
        struct tree_st* t = malloc(sizeof(struct tree_st));
        if (!t) { perror("malloc"); exit(EXIT_FAILURE); }
        /// some hashing composition, e.g.
        unsigned h = (tree_hash(sleft)*313) ^ (tree_hash(sright)*617);
        t->hash = h;
        t->isleaf = false;   // malloc does not zero memory, so set this explicitly
        t->left = sleft;
        t->right = sright;
        return t;
    }
The compare function (of two trees tx & ty) takes advantage of the fact that if the hash codes are different, the comparands are different:
    bool equal_tree (const struct tree_st* tx, const struct tree_st* ty) {
        if (tx==ty) return true;
        if (tree_hash(tx) != tree_hash(ty)) return false;
        if (!tx || !ty) return false;
        if (tx->isleaf != ty->isleaf) return false;
        if (tx->isleaf) return !strcmp(tx->leafstring, ty->leafstring);
        else return equal_tree(tx->left, ty->left)
                    && equal_tree(tx->right, ty->right);
    }
Most of the time the tree_hash(tx) != tree_hash(ty) test will succeed, so you won't have to recurse.
Read about hash consing.
Once you have such an efficient hash-based equal_tree function, you could use the techniques mentioned in other answers (or in handbooks).
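For instance, assuming "subtree" means some node of T1 together with all of its descendants, the simplest search could look like the sketch below. is_subtree is a name I made up for illustration, and hash_of_string is filled in with FNV-1a only as an assumption so that the example compiles on its own; the other definitions repeat the ones above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct tree_st {
    unsigned hash;
    bool isleaf;
    union {
        const char *leafstring;               /* when isleaf is true */
        struct {                              /* when isleaf is false */
            const struct tree_st *left;
            const struct tree_st *right;
        };
    };
};

/* Assumption: FNV-1a stands in for the unspecified hash_of_string. */
unsigned hash_of_string(const char *s) {
    unsigned h = 2166136261u;
    for (; *s != '\0'; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

unsigned tree_hash(const struct tree_st *t) { return t == NULL ? 0 : t->hash; }

struct tree_st *make_leaf(const char *string) {
    assert(string != NULL);
    struct tree_st *t = malloc(sizeof *t);
    if (!t) { perror("malloc"); exit(EXIT_FAILURE); }
    t->hash = hash_of_string(string);
    t->isleaf = true;
    t->leafstring = string;
    return t;
}

struct tree_st *make_node(const struct tree_st *sleft,
                          const struct tree_st *sright) {
    struct tree_st *t = malloc(sizeof *t);
    if (!t) { perror("malloc"); exit(EXIT_FAILURE); }
    t->hash = (tree_hash(sleft) * 313) ^ (tree_hash(sright) * 617);
    t->isleaf = false;
    t->left = sleft;
    t->right = sright;
    return t;
}

bool equal_tree(const struct tree_st *tx, const struct tree_st *ty) {
    if (tx == ty) return true;
    if (tree_hash(tx) != tree_hash(ty)) return false;   /* cheap rejection */
    if (!tx || !ty) return false;
    if (tx->isleaf != ty->isleaf) return false;
    if (tx->isleaf) return !strcmp(tx->leafstring, ty->leafstring);
    return equal_tree(tx->left, ty->left) && equal_tree(tx->right, ty->right);
}

/* Hypothetical helper: true if some node of t1, taken with all of its
   descendants, is structurally equal to t2. */
bool is_subtree(const struct tree_st *t1, const struct tree_st *t2) {
    if (equal_tree(t1, t2)) return true;          /* usually fails on the hash */
    if (t1 == NULL || t1->isleaf) return false;   /* nothing below to search */
    return is_subtree(t1->left, t2) || is_subtree(t1->right, t2);
}
```

This visits each node of T1 at most once, and the full content comparison only runs where the hashes happen to match. With millions of nodes in a badly unbalanced T1 the recursion could get deep, so an explicit stack may be safer there.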