Recursive set union: how does it work really?

梦谈多话 2020-12-13 14:29

I am currently taking the Scala course on Coursera in my free time after work, in an attempt to finally give functional programming a try. I am currently working on an as

6 Answers
  • 2020-12-13 14:59
      2
     / \  union  4
    1   3
    
    ((1 union 3) union 4) incl 2
      ^^^^^^^^^......................................assume it works
    
    (((E union E) union 3 incl 1) union 4) incl 2
       ^^^^^^^^^.....................................still E
    
    (E union E) union 3 incl 1 = E union 3 incl 1 = 3 incl 1
    

    The following subtree should be 3 incl 1

    (    3           ) 
    (   /    union 4 ) incl 2
    (  1             )
    
    
    (((1 union E) union 4) incl 3) incl 2
       ^^^^^^^^^.......................................expand
    
    (((( (E union E) union E) incl 1) union 4) incl 3) incl 2
          ^^^^^^^^^^^^^^^^^^^^^^^^^^..................still 1
    
    ((1 union 4) incl 3) incl 2
       ^^^^^^^^......................................continue
    
    ((((E union E) union 4) incl 1) incl 3) incl 2
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^..........expand 1 union 4
    
    ((4 incl 1) incl 3) incl 2
      ^^^^^^^^^^^^^^^^^^^^^^^^^............Final union result 
    

    Thanks to @Rex Kerr for drawing out the steps. I have replaced the second step with what actually happens at runtime, which may describe the Scala union function more clearly.

  • 2020-12-13 15:02
      A
     / \  union  D
    B   C
    
    ((B union C) union D) incl A
      ^^^^^^^^^......................................assume it works
    
    (  B             )
    (    \   union D ) incl A
    (     C          )
    
    (((0 union C) union D) incl B) incl A
       ^^^^^^^^^.....................................just C
    
    ((C union D) incl B) incl A
      ^^^^^^^^^......................................expand
    
    ((((0 union 0) union D) incl C) incl B) incl A
        ^^^^^^^^^....................................just 0
    
    (((0 union D) incl C) incl B) incl A
       ^^^^^^^^^.....................................just D
    
    ((D incl C) incl B) incl A
    ^^^^^^^^^^^^^^^^^^^^^^^^^^.......................all incl now
    

    Just write it out step by step. Now you can see that union reduces to a series of incl calls applied to the right-hand argument.

  • 2020-12-13 15:04

    I'm doing the same course, and the above implementation of union did turn out to be extremely inefficient.

    I came up with the following not-so-functional solution to creating a union of binary-tree sets, which is WAY more efficient:

    def union(that: BTSet): BTSet = {
      // Start from this set and insert every element of `that` in turn.
      var result: BTSet = this
      that.foreach(element => result = result.incl(element))
      result
    }
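To try the loop above in isolation, here is a minimal self-contained sketch. The `Empty` and `NonEmpty` classes, the `Int` element type, and the `toList` helper are assumptions made for illustration; the course's actual `BTSet` is not shown in this thread.

```scala
object ForeachUnionSketch {
  // Hypothetical minimal binary-tree set of Ints (not the course's class).
  sealed trait BTSet {
    def incl(x: Int): BTSet
    def foreach(f: Int => Unit): Unit

    // Same idea as the answer above: walk `that` and insert each element.
    def union(that: BTSet): BTSet = {
      var result: BTSet = this
      that.foreach(x => result = result.incl(x))
      result
    }

    // Helper for inspecting results: elements in ascending order.
    def toList: List[Int] = {
      val buf = scala.collection.mutable.ListBuffer.empty[Int]
      foreach(x => buf += x)
      buf.toList
    }
  }

  case object Empty extends BTSet {
    def incl(x: Int): BTSet = NonEmpty(x, Empty, Empty)
    def foreach(f: Int => Unit): Unit = ()
  }

  final case class NonEmpty(element: Int, left: BTSet, right: BTSet) extends BTSet {
    def incl(x: Int): BTSet =
      if (x < element) NonEmpty(element, left.incl(x), right)
      else if (x > element) NonEmpty(element, left, right.incl(x))
      else this // already present, so nothing changes

    // In-order traversal: left subtree, this element, right subtree.
    def foreach(f: Int => Unit): Unit = {
      left.foreach(f)
      f(element)
      right.foreach(f)
    }
  }
}
```

In this sketch the loop performs exactly one `incl` per element of `that`, so its cost depends on the size of the argument and the depth of the tree being built, not on repeated unions of intermediate results.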
    
  • 2020-12-13 15:07

    You can't understand recursive algorithms unless you look at the base case. In fact, the key often lies in understanding the base case first. Since the base case is not shown (probably because you didn't notice there is one in the first place), the recursion cannot be understood from the code given.
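For reference, here is a minimal sketch of what the two cases might look like, assuming the recursive definition traced in the other answers, `((left union right) union that) incl elem`, and an empty set that returns its argument unchanged. The class names, the `Int` element type, and `toList` are assumptions for illustration, not the course's code.

```scala
object RecursiveUnionSketch {
  // Hypothetical minimal binary-tree set of Ints.
  sealed trait BTSet {
    def incl(x: Int): BTSet
    def union(that: BTSet): BTSet
    def toList: List[Int]
  }

  case object Empty extends BTSet {
    def incl(x: Int): BTSet = NonEmpty(x, Empty, Empty)
    // Base case: an empty set contributes nothing, so the union is `that`.
    def union(that: BTSet): BTSet = that
    def toList: List[Int] = Nil
  }

  final case class NonEmpty(element: Int, left: BTSet, right: BTSet) extends BTSet {
    def incl(x: Int): BTSet =
      if (x < element) NonEmpty(element, left.incl(x), right)
      else if (x > element) NonEmpty(element, left, right.incl(x))
      else this // already present

    // Recursive case, written with dots instead of infix notation:
    // ((left union right) union that) incl element
    def union(that: BTSet): BTSet =
      left.union(right).union(that).incl(element)

    def toList: List[Int] = left.toList ::: (element :: right.toList)
  }
}
```

Every chain of `union` calls eventually reaches `Empty.union`, which is the only place where the recursion stops and `that` is handed back to be extended by `incl`.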

  • 2020-12-13 15:20

    So based on all the responses above, I think the real workhorse is incl and the recursive way of calling union is just for going through all the elements in the sets.

    I came up with the following implementation of union. Is this better?

    def union(other: BTSet): BTSet = right union (left union (other incl element))
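One way to explore that question is to count how many times each variant calls `union` on the same inputs. The sketch below is an illustration under assumptions: the class names, the `Int` element type, the method names `unionOriginal` and `unionAcc`, and the `calls` counter are all hypothetical, and the "original" variant is taken to be the `((left union right) union that) incl element` form traced in the other answers.

```scala
object UnionCallCountSketch {
  // Counts calls to either union variant; reset by `counted`.
  var calls = 0

  sealed trait BTSet {
    def incl(x: Int): BTSet
    def unionOriginal(that: BTSet): BTSet
    def unionAcc(other: BTSet): BTSet
    def toList: List[Int]
  }

  case object Empty extends BTSet {
    def incl(x: Int): BTSet = NonEmpty(x, Empty, Empty)
    def unionOriginal(that: BTSet): BTSet = { calls += 1; that }
    def unionAcc(other: BTSet): BTSet = { calls += 1; other }
    def toList: List[Int] = Nil
  }

  final case class NonEmpty(element: Int, left: BTSet, right: BTSet) extends BTSet {
    def incl(x: Int): BTSet =
      if (x < element) NonEmpty(element, left.incl(x), right)
      else if (x > element) NonEmpty(element, left, right.incl(x))
      else this

    // ((left union right) union that) incl element
    def unionOriginal(that: BTSet): BTSet = {
      calls += 1
      left.unionOriginal(right).unionOriginal(that).incl(element)
    }

    // right union (left union (other incl element))
    def unionAcc(other: BTSet): BTSet = {
      calls += 1
      right.unionAcc(left.unionAcc(other.incl(element)))
    }

    def toList: List[Int] = left.toList ::: (element :: right.toList)
  }

  // Evaluates `op` and returns its result with the number of union calls made.
  def counted(op: => BTSet): (BTSet, Int) = {
    calls = 0
    val result = op
    (result, calls)
  }
}
```

In this sketch the accumulating form visits each node and each empty subtree of the receiver once, while the original form also re-traverses the intermediate tree produced by `left union right`, so it makes more calls on anything larger than a single-element set.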
    
  • 2020-12-13 15:22

    I gather that incl inserts an element into an existing set? If so, that's where all the real work is happening.

    The definition of the union is the set that includes everything in either input set. Given two sets stored as binary trees, if you take the unions of the first set with the branches of the second, the only element in either that could be missing from the result is the element at the root node of the second tree, so if you insert that element you have the union of both input sets.

    It's just a very inefficient way of inserting each element from both sets into a new set which starts out empty. Presumably duplicates are discarded by incl, so the result is the union of the two inputs.


    Maybe it would help to ignore the tree structure for the moment; it's not really important to the essential algorithm. Say we have abstract mathematical sets. Given an input set with unknown elements, we can do two things:

    • Add an element to it (which does nothing if the element was already present)
    • Check whether the set is non-empty and, if so, decompose it into a single element and two disjoint subsets.

    To take the union of two sets {1,2} and {2,3}, we start by decomposing the first set into the element 1 and subsets {} and {2}. We recursively take the union of {}, {2}, and {2,3} using the same process, then insert 1 into the result.

    At each step, the problem is reduced from one union operation to two union operations on smaller inputs; a standard divide-and-conquer algorithm. When reaching the union of a singleton set {x} and empty set {}, the union is trivially {x}, which is then returned back up the chain.

    The tree structure is just used to both allow the case analysis/decomposition into smaller sets, and to make insertion more efficient. The same could be done using other data structures, such as lists that are split in half for decomposition and with insertion done by an exhaustive check for uniqueness. To take the union efficiently requires an algorithm that's a bit more clever, and takes advantage of the structure used to store the elements.
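The list-based variant mentioned above can be sketched as follows. This is only an illustration of the same decomposition, not an efficient union; the object name and the `Int` element type are assumptions.

```scala
object ListSetSketch {
  // Insert x unless it is already present (exhaustive uniqueness check).
  def incl(set: List[Int], x: Int): List[Int] =
    if (set.contains(x)) set else x :: set

  // Peel off one element, split the rest into two halves, union the
  // pieces recursively, then insert the peeled element into the result.
  def union(a: List[Int], b: List[Int]): List[Int] = a match {
    case Nil => b // base case: nothing left to add
    case x :: rest =>
      val (l, r) = rest.splitAt(rest.length / 2)
      incl(union(union(l, r), b), x)
  }
}
```

With this sketch, `ListSetSketch.union(List(1, 2), List(2, 3))` decomposes the first list into the element 1 and the pieces `Nil` and `List(2)`, as in the walkthrough above, and evaluates to `List(1, 2, 3)`.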
