Catamorphism and tree-traversing in Haskell

心已入冬 提交于 2019-11-28 18:24:51

Not exactly sure what you're asking. But yeah, you feed a TreeAlgebra to foldTree corresponding to the computation you want to perform on the tree. For example, to sum all the elements in a tree of Ints you would use this algebra:

sumAlgebra :: TreeAlgebra Int Int
sumAlgebra = TreeAlgebra { leaf = id
                         , branch = (+) }

Which means, to get the sum of a leaf, apply id (do nothing) to the value in the leaf. To get the sum of a branch, add together the sums of each of the children.

The fact that we can say (+) for branch instead of, say, \x y -> sumTree x + sumTree y is the essential property of the catamorphism. It says that to compute some function f on some recursive data structure it suffices to have the values of f for its immediate children.

Haskell is a pretty unique language in that we can formalize the idea of catamorphism abstractly. Let's make a data type for a single node in your tree, parameterized over its children:

data TreeNode a child
    = Leaf a
    | Branch child child

See what we did there? We just replaced the recursive children with a type of our choosing. This is so that we can put the subtrees' sums there when we are folding.

Now for the really magical thing. I'm going to write this in pseudohaskell -- writing it in real Haskell is possible, but we have to add some annotations to help the typechecker which can be kind of confusing. We take the "fixed point" of a parameterized data type -- that is, constructing a data type T such that T = TreeNode a T. They call this operator Mu.

type Mu f = f (Mu f)

Look carefully here. The argument to Mu isn't a type, like Int or Foo -> Bar. It's a type constructor like Maybe or TreeNode Int -- the argument to Mu itself takes an argument. (The possibility of abstracting over type constructors is one of the things that makes Haskell's type system really stand out in its expressive power).

So the type Mu f is defined as taking f and filling in its type parameter with Mu f itself. I'm going to define a synonym to reduce some of the noise:

type IntNode = TreeNode Int

Expanding Mu IntNode, we get:

Mu IntNode = IntNode (Mu IntNode)
           = Leaf Int | Branch (Mu IntNode) (Mu IntNode)

Do you see how Mu IntNode is equivalent to your Tree Int? We have just torn the recursive structure apart and then used Mu to put it back together again. This gives us the advantage that we can talk about all Mu types at once. This gives us what we need to define a catamorphism.

Let's define:

type IntTree = Mu IntNode

I said the essential property of the catamorphism is that to compute some function f, it suffices to have the values of f for its immediate children. Let's call the type of the thing we are trying to compute r, and the data structure node (IntNode would be a possible instantiation of this). So to compute r on a particular node, we need the node with its children replaced with their rs. This computation has type node r -> r. So a catamorphism says that if we have one of these computations, then we can compute r for the entire recursive structure (remember recursion is denoted explicitly here with Mu):

cata :: (node r -> r) -> Mu node -> r

Making this concrete for our example, this looks like:

cata :: (IntNode r -> r) -> IntTree -> r

Restating, if we can take a node with rs for its children and compute an r, then we can compute an r for an entire tree.

In order to actually compute this, we need node to be a Functor -- that is we need to be able to map an arbitrary function over the children of a node.

fmap :: (a -> b) -> node a -> node b

This can be done straightforwardly for IntNode.

fmap f (Leaf x) = Leaf x                  -- has no children, so stays the same
fmap f (Branch l r) = Branch (f l) (f r)  -- apply function to each child

Now, finally, we can give a definition for cata (the Functor node constraint just says that node has a suitable fmap):

cata :: (Functor node) => (node r -> r) -> Mu node -> r
cata f t = f (fmap (cata f) t)

I used the parameter name t for the mnemonic value of "tree". This is an abstract, dense definition, but it is really very simple. It says: recursively perform cata f -- the computation we are doing over the tree -- on each of t's children (which are themselves Mu nodes) to get a node r, and then pass that result to f compute the result for t itself.

Tying this back to the beginning, the algebra you are defining is essentially a way of defining that node r -> r function. Indeed, given a TreeAlgebra, we can easily get the fold function:

foldFunction :: TreeAlgebra a r -> (TreeNode a r -> r)
foldFunction alg (Leaf a) = leaf alg a
foldFunction alg (Branch l r) = branch alg l r

Thus the tree catamorphism can be defined in terms of our generic one as follows:

type Tree a = Mu (TreeNode a)

treeCata :: TreeAlgebra a r -> (Tree a -> r)
treeCata alg = cata (foldFunction alg)

I'm out of time. I know that got really abstract really fast, but I hope it at least gave you a new viewpoint to help your learning. Good luck!

jbolden1517

I think you were were asking a question about the {}'s. There is an earlier question with a good discussion of {}'s. Those are called Haskell's record syntax. The other question is why construct the algebra. This is a typical function paradigm where you generalize data as functions.

The most famous example is Church's construction of the Naturals, where f = + 1 and z = 0, 0 = z, 1 = f z, 2 = f (f z), 3 = f (f (f z)), etc...

What you are seeing is essentially the same idea being applied to a tree. Work the church example and the tree will click.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!