This is a literate Haskell file, inspired by the paper
"Functional Programming with Overloading and Higher-Order Polymorphism" and Chapter 20 of TAPL.
In almost all programming languages, recursive types can be easily defined.
For example, a list of naturals can be defined as NatList = nil | cons Nat NatList.
This equation is not a simple definition, the right-hand side mentions the very name that we are defining.
This is just like how we ran into the difficulties of defining (anonymous) recursive functions!
Let us review how to define recursive functions first.
We can abstract this recursive equation as x = f x.
Suppose we have a magical fixed-point combinator, let's call it Y, that for any f, Y f = f (Y f).
If such Y exists, Y f specifies the fixed point of the recursive equation x = f x.
It turns out such Y does exist for untyped lambda calculus : Y = \f.(\x.f(x x)) (\x.f(x x))
For recursive types, we make up a new combinator Mu, such that Mu f = f (Mu f).
Note that f here denotes a type constructor instead of a function.
NatList is the fixed-point of the equation X = nil | cons Nat X.
So it can be defined as NatList = Mu (\X. (nil | cons Nat X)). ("\" is overloaded here for abstraction at type level)
We do not explicitly define Mu because we cannot abstract over type constructors in Haskell.
If we could, the type system would be undecidable, *I think*.
Instead we'll cheat and simply live with this definition: Mu f = f (Mu f).
Because Haskell requires a constructor for types, we have to define it in Haskell this way:
> newtype Mu f = In (f (Mu f))
This extra layer of In doesn't change the mathematical property, because we can view In as an isomorphism.
And here's the reverse isomorphism:
> out :: Mu f -> f (Mu f)
> out (In x) = x
Let's see how Nat plays out. We first define the equation, and then pass the equation to Mu:
> data NatF b = Zero | Succ b
> type Nat = Mu NatF
The elements in Nat are: In Zero, In $ Succ (In Zero), In $ Succ (In $ Succ (In Zero)), ...
This extra layer of indirection helps formalize recursive types. But does it have any practical applications?
The first application was illustrated in the paper "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire"
In fact, rumor has it that there was a PhD who ate so many bananas that his eyes bugged out, now he needs new lenses!
("The Evolution of a Haskell Programmer")
We note that NatF happens to be a functor:
> instance Functor NatF where
> fmap f Zero = Zero
> fmap f (Succ x) = Succ (f x)
Fold on lists generalizes the pattern of performing some repetitive operations on a homogeneous structure.
Why not generalize fold to any recursively defined structure?
As it turns out, we have catamorphism and anamorphism.
> cata :: Functor f => (f a -> a) -> Mu f -> a
> cata phi = phi . fmap (cata phi) . out
> ana :: Functor f => (a -> f a) -> a -> Mu f
> ana psi = In . fmap (ana psi) . psi
With bananas (cata) and lenses (ana) we can start defining many REALLY higher-order functions:
> zero :: Nat
> zero = In Zero
> elim z s Zero = z
> elim z s (Succ n) = s n
> intro z s 0 = z
> intro z s n = s (n-1)
> instance Enum Nat where
> succ = In . Succ
> toEnum = ana (intro Zero Succ)
> fromEnum = cata (elim 0 (1+))
> instance Show Nat where
> show = cata (elim "Zero" ("Succ "++))
try out:
zero, succ $ succ $ succ zero, toEnum 3 :: Nat, etc.
> instance Eq Nat where
> x == y = (fromEnum x) == (fromEnum y)
> instance Num Nat where
> (+) m = cata (elim m succ)
> (*) m = cata (elim zero (m+))
> fromInteger = ana (intro Zero Succ)
Let's see another example: polymorphic lists.
> data L a b = Nil | Cons a b
> type List a = Mu (L a)
> instance Functor (L a) where
> fmap f = lelim Nil (\a b -> Cons a (f b))
> lelim n c Nil = n
> lelim n c (Cons a b) = c a b
> toList = cata (lelim [] (:))
> instance Show a => Show (List a) where
> show = show . toList
try out:
In Nil, In $ Cons 3 $ In $ Cons 2 $ In Nil
Another benefit is one way to address the so called Expression Problem raised by Philip Wadler:
The Expression Problem is a new name for an old problem. The goal is
to define a datatype by cases, where one can add new cases to the
datatype and new functions over the datatype, without recompiling
existing code, and while retaining static type safety (e.g., no
casts). For the concrete example, we take expressions as the data
type, begin with one case (constants) and one function (evaluators),
then add one more construct (plus) and one more function (conversion
to a string).
Whether a language can solve the Expression Problem is a salient
indicator of its capacity for expression. One can think of cases as
rows and functions as columns in a table. In a functional language,
the rows are fixed (cases in a datatype declaration) but it is easy to
add new columns (functions). In an object-oriented language, the
columns are fixed (methods in a class declaration) but it is easy to
add new rows (subclasses). We want to make it easy to add either rows
or columns.