How to find largest common sub-tree in the given two binary search trees?

后端 未结 4 1653
滥情空心
滥情空心 2020-12-24 04:16

Two BSTs (Binary Search Trees) are given. How to find largest common sub-tree in the given two binary trees?

EDIT 1: Here

4条回答
  •  夕颜
    夕颜 (楼主)
    2020-12-24 05:10

    I believe that I have an O(n + m)-time, O(n + m) space algorithm for solving this problem, assuming the trees are of size n and m, respectively. This algorithm assumes that the values in the trees are unique (that is, each element appears in each tree at most once), but they do not need to be binary search trees.

    The algorithm is based on dynamic programming and works with the following intution: suppose that we have some tree T with root r and children T1 and T2. Suppose the other tree is S. Now, suppose that we know the maximum common subtree of T1 and S and of T2 and S. Then the maximum subtree of T and S

    1. Is completely contained in T1 and r.
    2. Is completely contained in T2 and r.
    3. Uses both T1, T2, and r.

    Therefore, we can compute the maximum common subtree (I'll abbreviate this as MCS) as follows. If MCS(T1, S) or MCS(T2, S) has the roots of T1 or T2 as roots, then the MCS we can get from T and S is given by the larger of MCS(T1, S) and MCS(T2, S). If exactly one of MCS(T1, S) and MCS(T2, S) has the root of T1 or T2 as a root (assume w.l.o.g. that it's T1), then look up r in S. If r has the root of T1 as a child, then we can extend that tree by a node and the MCS is given by the larger of this augmented tree and MCS(T2, S). Otherwise, if both MCS(T1, S) and MCS(T2, S) have the roots of T1 and T2 as roots, then look up r in S. If it has as a child the root of T1, we can extend the tree by adding in r. If it has as a child the root of T2, we can extend that tree by adding in r. Otherwise, we just take the larger of MCS(T1, S) and MCS(T2, S).

    The formal version of the algorithm is as follows:

    1. Create a new hash table mapping nodes in tree S from their value to the corresponding node in the tree. Then fill this table in with the nodes of S by doing a standard tree walk in O(m) time.
    2. Create a new hash table mapping nodes in T from their value to the size of the maximum common subtree of the tree rooted at that node and S. Note that this means that the MCS-es stored in this table must be directly rooted at the given node. Leave this table empty.
    3. Create a list of the nodes of T using a postorder traversal. This takes O(n) time. Note that this means that we will always process all of a node's children before the node itself; this is very important!
    4. For each node v in the postorder traversal, in the order they were visited:
      1. Look up the corresponding node in the hash table for the nodes of S.
      2. If no node was found, set the size of the MCS rooted at v to 0.
      3. If a node v' was found in S:
        1. If neither of the children of v' match the children of v, set the size of the MCS rooted at v to 1.
        2. If exactly one of the children of v' matches a child of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the subtree rooted at that child.
        3. If both of the children of v' match the children of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the left subtree plus the size of the MCS of the right subtree.
    5. (Note that step (4) runs in expected O(n) time, since it visits each node in S exactly once, makes O(n) hash table lookups, makes n hash table inserts, and does a constant amount of processing per node).
    6. Iterate across the hash table and return the maximum value it contains. This step takes O(n) time as well. If the hash table is empty (S has size zero), return 0.

    Overall, the runtime is O(n + m) time expected and O(n + m) space for the two hash tables.

    To see a correctness proof, we proceed by induction on the height of the tree T. As a base case, if T has height zero, then we just return zero because the loop in (4) does not add anything to the hash table. If T has height one, then either it exists in T or it does not. If it exists in T, then it can't have any children at all, so we execute branch 4.3.1 and say that it has height one. Step (6) then reports that the MCS has size one, which is correct. If it does not exist, then we execute 4.2, putting zero into the hash table, so step (6) reports that the MCS has size zero as expected.

    For the inductive step, assume that the algorithm works for all trees of height k' < k and consider a tree of height k. During our postorder walk of T, we will visit all of the nodes in the left subtree, then in the right subtree, and finally the root of T. By the inductive hypothesis, the table of MCS values will be filled in correctly for the left subtree and right subtree, since they have height ≤ k - 1 < k. Now consider what happens when we process the root. If the root doesn't appear in the tree S, then we put a zero into the table, and step (6) will pick the largest MCS value of some subtree of T, which must be fully contained in either its left subtree or right subtree. If the root appears in S, then we compute the size of the MCS rooted at the root of T by trying to link it with the MCS-es of its two children, which (inductively!) we've computed correctly.

    Whew! That was an awesome problem. I hope this solution is correct!

    EDIT: As was noted by @jonderry, this will find the largest common subgraph of the two trees, not the largest common complete subtree. However, you can restrict the algorithm to only work on subtrees quite easily. To do so, you would modify the inner code of the algorithm so that it records a subtree of size 0 if both subtrees aren't present with nonzero size. A similar inductive argument will show that this will find the largest complete subtree.

    Though, admittedly, I like the "largest common subgraph" problem a lot more. :-)

提交回复
热议问题