I am currently taking the Scala course on Coursera in my free time after work, in an attempt to finally give functional programming a try. I am currently working on an as
    2
   / \    union  4
  1   3

((1 union 3) union 4) incl 2   <- assume union works; expand 1 union 3
((((E union E) union 3) incl 1) union 4) incl 2   <- E union E is still E
((E union E) union 3) incl 1 = (E union 3) incl 1 = 3 incl 1
The following subtree should be 3 incl 1:
(     3          )
(    /   union 4 ) incl 2
(   1            )
(((1 union E) union 4) incl 3) incl 2   <- expand 1 union E
(((((E union E) union E) incl 1) union 4) incl 3) incl 2   <- ((E union E) union E) incl 1 is still just 1
((1 union 4) incl 3) incl 2   <- continue: expand 1 union 4
((((E union E) union 4) incl 1) incl 3) incl 2   <- (E union E) union 4 is 4, so 1 union 4 is 4 incl 1
((4 incl 1) incl 3) incl 2   <- final union result
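To replay the trace, here is a minimal runnable sketch. It assumes the union being traced is the recursive `((left union right) union that) incl elem` definition, with `E union that = that` as the base case. The names `BTSet`, `E` and `Node` and the plain `Int` elements are hypothetical stand-ins, not the course's actual classes.

```scala
// Hypothetical binary-tree set of Ints; only the union definition is taken
// from the trace above.
abstract class BTSet {
  def incl(x: Int): BTSet
  def contains(x: Int): Boolean
  def union(that: BTSet): BTSet
}

// E plays the role of the empty set in the trace.
case object E extends BTSet {
  def incl(x: Int): BTSet = Node(x, E, E)
  def contains(x: Int): Boolean = false
  // Base case: E union that is just that.
  def union(that: BTSet): BTSet = that
}

case class Node(elem: Int, left: BTSet, right: BTSet) extends BTSet {
  // Standard binary-search-tree insertion; duplicates are ignored.
  def incl(x: Int): BTSet =
    if (x < elem) Node(elem, left incl x, right)
    else if (x > elem) Node(elem, left, right incl x)
    else this
  def contains(x: Int): Boolean =
    if (x < elem) left contains x
    else if (x > elem) right contains x
    else true
  // Recursive case: union the two branches, union that with `that`,
  // then insert this node's element.
  def union(that: BTSet): BTSet = ((left union right) union that) incl elem
}

// The left operand of the trace: root 2 with children 1 and 3.
val tree: BTSet = Node(2, Node(1, E, E), Node(3, E, E))
val result: BTSet = tree union (E incl 4)
```

With these definitions, `tree union (E incl 4)` builds a tree rooted at 4, which has the same shape as `((4 incl 1) incl 3) incl 2` at the end of the trace.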
Thanks to @Rex Kerr for drawing out the steps. I have replaced the second step with the actual runtime step, which may give a clearer description of the Scala union function.
Here 0 stands for the empty set.

    A
   / \    union  D
  B   C

((B union C) union D) incl A   <- assume union works
( B             )
(  \    union D ) incl A
(   C           )
(((0 union C) union D) incl B) incl A   <- 0 union C is just C
((C union D) incl B) incl A   <- expand C union D
((((0 union 0) union D) incl C) incl B) incl A   <- 0 union 0 is just 0
(((0 union D) incl C) incl B) incl A   <- 0 union D is just D
((D incl C) incl B) incl A   <- only incl calls remain
Just write it out step by step. Now you can see that union reduces to a bunch of incl calls applied to the right-hand argument.
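Here is a small sketch of that observation. It uses hypothetical `BTSet`/`E`/`Node` classes, which are assumed and not taken from the course, and a hypothetical `unionAsIncls` helper. Folding `incl` over the left operand's elements, starting from the right-hand argument, yields the same set as the recursive union. The `incl` calls can happen in a different order, so the two trees may differ in shape. For that reason only their sorted element lists are compared.

```scala
// Hypothetical binary-tree set of Ints with an in-order toList.
abstract class BTSet {
  def incl(x: Int): BTSet
  def union(that: BTSet): BTSet
  def toList: List[Int]
}

case object E extends BTSet {
  def incl(x: Int): BTSet = Node(x, E, E)
  def union(that: BTSet): BTSet = that
  def toList: List[Int] = Nil
}

case class Node(elem: Int, left: BTSet, right: BTSet) extends BTSet {
  def incl(x: Int): BTSet =
    if (x < elem) Node(elem, left incl x, right)
    else if (x > elem) Node(elem, left, right incl x)
    else this
  // The recursive union being traced.
  def union(that: BTSet): BTSet = ((left union right) union that) incl elem
  // In-order traversal, so the list comes out sorted.
  def toList: List[Int] = left.toList ++ (elem :: right.toList)
}

// "A bunch of incl calls applied to the right-hand argument":
// start from `that` and incl each element of `s` in turn.
def unionAsIncls(s: BTSet, that: BTSet): BTSet =
  s.toList.foldLeft(that)((acc, x) => acc incl x)

val lhs: BTSet = Node(2, Node(1, E, E), Node(3, E, E))
val rhs: BTSet = E incl 4
```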
I'm doing the same course, and the above implementation of union did turn out to be extremely inefficient.
I came up with the following not-so-functional solution to creating a union of binary-tree sets, which is WAY more efficient:
def union(that: BTSet): BTSet = {
  // Start from this set and insert every element of `that` into it.
  var result: BTSet = this
  that.foreach(element => result = result.incl(element))
  result
}
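One way to see the difference is to count insertions. The sketch below reuses the iterative `union` from this answer. For comparison, it adds the recursive `((left union right) union that) incl elem` version under the name `recursiveUnion`. `foreach`, `Counter`, `balanced`, `elems` and `counted` are hypothetical helpers. The counter only tracks the `incl` calls each union issues, not the work done inside `incl` itself.

```scala
// Hypothetical counter of the incl calls each union implementation issues.
object Counter { var inclCalls = 0 }

abstract class BTSet {
  def incl(x: Int): BTSet
  def foreach(f: Int => Unit): Unit
  def recursiveUnion(that: BTSet): BTSet

  // The iterative union from the answer above, with a counter added.
  def union(that: BTSet): BTSet = {
    var result: BTSet = this
    that.foreach { element =>
      Counter.inclCalls += 1
      result = result.incl(element)
    }
    result
  }
}

case object E extends BTSet {
  def incl(x: Int): BTSet = Node(x, E, E)
  def foreach(f: Int => Unit): Unit = ()
  def recursiveUnion(that: BTSet): BTSet = that
}

case class Node(elem: Int, left: BTSet, right: BTSet) extends BTSet {
  def incl(x: Int): BTSet =
    if (x < elem) Node(elem, left incl x, right)
    else if (x > elem) Node(elem, left, right incl x)
    else this
  // In-order traversal: visits elements in ascending order.
  def foreach(f: Int => Unit): Unit = { left.foreach(f); f(elem); right.foreach(f) }
  // Recursive version: one incl per call, but the intermediate
  // (left recursiveUnion right) tree is walked again afterwards.
  def recursiveUnion(that: BTSet): BTSet = {
    Counter.inclCalls += 1
    ((left recursiveUnion right) recursiveUnion that) incl elem
  }
}

// Builds a balanced tree holding lo..hi.
def balanced(lo: Int, hi: Int): BTSet =
  if (lo > hi) E
  else {
    val mid = (lo + hi) / 2
    Node(mid, balanced(lo, mid - 1), balanced(mid + 1, hi))
  }

def elems(s: BTSet): List[Int] = {
  val buf = scala.collection.mutable.ListBuffer.empty[Int]
  s.foreach(x => buf += x)
  buf.toList
}

// Runs `body` with a fresh counter and returns its result and the count.
def counted[A](body: => A): (A, Int) = {
  Counter.inclCalls = 0
  val result = body
  (result, Counter.inclCalls)
}

val a = balanced(1, 15)
val b = balanced(16, 30)
val (recursive, recursiveCount) = counted(a recursiveUnion b)
val (iterative, iterativeCount) = counted(b union a) // inserts a's 15 elements into b
```

When inserting the 15 elements of a balanced tree, the iterative version issues exactly 15 `incl` calls. The recursive version issues more, because every intermediate `left recursiveUnion right` tree is traversed again.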
You can't understand recursive algorithms unless you look at the base case. In fact, the key to understanding often lies in understanding the base case first. Since the base case is not shown (probably because you didn't notice there is one in the first place), there is no understanding possible.
So, based on all the responses above, I think the real workhorse is incl, and the recursive way of calling union is just for going through all the elements in the sets.
I came up with the following implementation of union. Is this better?
def union(other: BTSet): BTSet = right union (left union (other incl element))
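The proposed variant can be checked by running it and counting how many times `Node.union` runs. The sketch uses hypothetical `BTSet`/`E`/`Node` classes and a `Counter` object and a `balanced` helper that are not part of the course code. In this variant, every node of the receiver is visited exactly once and performs one `incl`. For a 15-element receiver, the count should therefore be 15.

```scala
// Hypothetical counter of Node.union calls (each does exactly one incl).
object Counter { var inclCalls = 0 }

abstract class BTSet {
  def incl(x: Int): BTSet
  def contains(x: Int): Boolean
  def union(other: BTSet): BTSet
}

case object E extends BTSet {
  def incl(x: Int): BTSet = Node(x, E, E)
  def contains(x: Int): Boolean = false
  def union(other: BTSet): BTSet = other
}

case class Node(element: Int, left: BTSet, right: BTSet) extends BTSet {
  def incl(x: Int): BTSet =
    if (x < element) Node(element, left incl x, right)
    else if (x > element) Node(element, left, right incl x)
    else this
  def contains(x: Int): Boolean =
    if (x < element) left contains x
    else if (x > element) right contains x
    else true
  // The proposed variant: insert this node's element into `other` first,
  // then fold in the left branch, then the right branch.
  def union(other: BTSet): BTSet = {
    Counter.inclCalls += 1
    right union (left union (other incl element))
  }
}

// Builds a balanced tree holding lo..hi.
def balanced(lo: Int, hi: Int): BTSet =
  if (lo > hi) E
  else {
    val mid = (lo + hi) / 2
    Node(mid, balanced(lo, mid - 1), balanced(mid + 1, hi))
  }

val a = balanced(1, 15)
val b = balanced(16, 30)
val u = a union b
```

The result is the same number of insertions as the foreach-based version, and it does not need a `var`.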
I gather that incl inserts an element into an existing set? If so, that's where all the real work is happening.
The definition of the union is the set that includes everything in either input set. Given two sets stored as binary trees, if you take the unions of the first set with the branches of the second, the only element in either that could be missing from the result is the element at the root node of the second tree, so if you insert that element you have the union of both input sets.
It's just a very inefficient way of inserting each element from both sets into a new set which starts out empty. Presumably duplicates are discarded by incl, so the result is the union of the two inputs.
Maybe it would help to ignore the tree structure for the moment; it's not really important to the essential algorithm. Say we have abstract mathematical sets. Given an input set with unknown elements, we can do two things:
To take the union of two sets {1,2} and {2,3}, we start by decomposing the first set into the element 1 and subsets {} and {2}. We recursively take the union of {}, {2}, and {2,3} using the same process, then insert 1 into the result.
At each step, the problem is reduced from one union operation to two union operations on smaller inputs, which is a standard divide-and-conquer algorithm. When the recursion reaches the union of a singleton set {x} and the empty set {}, the union is trivially {x}, which is then returned back up the chain.
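The abstract version can be sketched on Scala's immutable `Set`, with no tree at all. This illustration rests on a few assumptions. `+` stands in for insertion. Splitting the remaining elements around the chosen one with `<` mimics how a tree node divides its elements. The function name `union` is hypothetical.

```scala
// Divide and conquer on abstract sets: decompose a non-empty set into one
// element and two smaller subsets, union the pieces recursively, then
// insert the element.
def union(s: Set[Int], t: Set[Int]): Set[Int] =
  if (s.isEmpty) t // base case: {} union t = t
  else {
    val x = s.head
    // e.g. {1,2} with x = 1 decomposes into {} and {2}
    val (smaller, larger) = (s - x).partition(_ < x)
    union(union(smaller, larger), t) + x
  }
```

For example, `union(Set(1, 2), Set(2, 3))` gives `Set(1, 2, 3)`, whichever element `head` happens to pick.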
The tree structure is used both to allow the case analysis/decomposition into smaller sets and to make insertion more efficient. The same could be done using other data structures, such as lists that are split in half for decomposition, with insertion done by an exhaustive check for uniqueness. Taking the union efficiently requires a somewhat cleverer algorithm that takes advantage of the structure used to store the elements.
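Here is a sketch of the list-based alternative the paragraph mentions. It assumes plain `List[Int]` values with no duplicates in the right-hand argument. Decomposition takes one element and splits the rest in half with `splitAt`. Insertion scans the whole list with `contains` before prepending. The names `insert` and `union` are hypothetical. This is deliberately the slow version, not the cleverer algorithm.

```scala
// Insertion by an exhaustive uniqueness check: scan the whole list first.
def insert(x: Int, xs: List[Int]): List[Int] =
  if (xs.contains(x)) xs else x :: xs

// Union by divide and conquer on lists: take one element out, split the
// rest in half, union the halves with `ys`, then insert the element.
def union(xs: List[Int], ys: List[Int]): List[Int] = xs match {
  case Nil => ys
  case x :: rest =>
    val (front, back) = rest.splitAt(rest.length / 2)
    insert(x, union(union(front, back), ys))
}
```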