Wouter Swiestra’s paper [3] was a solution to the expression problem posed by Phil Wadler that became a Functional Pearl.
It demonstrates, using Haskell, that we can define an evaluator for arithmetic expressions that can be assembled from isolated independant components, and that the same technology can be used to combine free monads.
Modern refinements to Haskell’s type system, featuring closed type families, can improve on the type machinery proposed in the paper to track and inject co-products more safely and elegantly and these techniques have been extended to provide practical libraries.
The main utility of these techniques pertain to the development of embedded domain specific languages (eDSL) in Haskell.
[1] Mathew Pickering blog article
[2] Patrick Bahr, Tom Hvitved - compdata package
[3] Wouter Swierstra - Functional pearl
A simple expression language
data Expr
= Val Int
| Add Expr Expr
eval :: Expr -> Int
eval (Val x) = x
eval (Add x y) = eval x + eval y
Since the constructors are fixed in advance all functions dispatching
on Expr
must be modified if we were to add further constructors.
Phil Wadler and history of the expression problem
The proposal is to be able extend the AST, by adding both types and functions without recompilation, whilst retaining static type safety.
Here we will adapt the coproduct of functors approach first proposed in [3] and introduced to me in [1] from practical work in [2].
This first part essentially modernises and tidies up the type level machinery used in the paper using Glasgow Haskell (2020) (and may be skipped on first reading).
We will be using modern ghc to facilitate type level programming.
{-# LANGUAGE GADTSyntax #-}
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE DeriveFunctor #-}
module Examples.DTALC where
import Data.Kind (Type)
Expr
defines a recursive data type where the f
parameter
specifies all the type constructors (or signatures) that can be used
in an expression:
data Expr f = In (f (Expr f))
The type parameter f
can be thought of as supplying the signature
of the constructors; in that it takes a type parameter corresponding
to the expressions that occur in the subtrees of constructors.
This allows the expression tree to be extended with type constructors
formed into a co-product using a type level operator :+:
data Val e = Val Int deriving (Functor)
data Add e = Add e e deriving (Functor)
The key idea is to combine expressions by taking the co-products of their signatures.
Following [3] We start with defining the co-product of signature functors and then define type level machinery required to inject them at term level.
data (f :+: g) e
= Inl (f e)
| Inr (g e)
infixr 6 :+:
Will fmap
over left and right sub trees.
instance (Functor f, Functor g) => Functor (f :+: g) where
fmap f (Inl e) = Inl (fmap f e)
fmap f (Inr e) = Inr (fmap f e)
[2] Defines Foldable and Traversable instances for :+:
and much
more utility.
[1] Implementing the subtyping relation using type families brings out
quite a subtle point which is implicit in the original pearl. The type
class :<:
has two purposes: the first is to check whether we are able
to construct an injection from f
to g
by computing at the type
level. The second is to work out the correct injection from a
to b
at the term level. Type families make this dependency explicit.
Thus, our first step will be to check whether such an injection exists
from f
to g
.
As type class resolution operates without backtracking - we can’t express any kind of branching computation in type class instances.
This led to the convention where we were forced to make sure that :+:
associated to the right; hence, it was easy to think about our
composed data types as lists of types (with :+:
being a type level
cons and :<:
searching through the list).
As type families allow us to backtrack, this restriction is needless. Instead it is much easier to think about our constructed data types as trees.
Ed: See also note Sandy Maguire’s approach using type level lists.
To find the location of a type signature in the tree we use the following promoted types and closed type families:
data Where = Here | L Where | R Where
data Res = Found Where | NotFound
type family Elem (e :: Type -> Type) (f :: Type -> Type) :: Res where
Elem e e = Found Here
Elem e (l :+: r) = Choose (Elem e l) (Elem e r)
Elem e f = NotFound
type family Choose e f :: Res where
Choose (Found a) b = Found (L a)
Choose a (Found b) = Found (R b)
Choose a b = NotFound
[1] Now we have the path to the type, we must also construct the injection. …thanks to our additional work we don’t need to rely on convention nor OverlappingInstances (Ed: Here Pickering refers to a deficiency in original construction in [3])
class Subsume (res :: Res) (f :: Type -> Type) (g :: Type -> Type) where
inj' :: f a -> g a
instance Subsume (Found Here) f f where
inj' = id
instance Subsume (Found p) f l => Subsume (Found (L p)) f (l :+: r) where
inj' = Inl . inj' @(Found p)
instance Subsume (Found p) f r => Subsume (Found (R p)) f (l :+: r) where
inj' = Inr . inj' @(Found p)
Nota: this use of typeclasses is often an efficient way to deal with recursive structures as the methods are not actually recursive and can thus be inlined.
f
is a subtype of g
(is subsumed by); if there is an injection
from f
to g
.
infixl 5 :<:
type f :<: g = Subsume (Elem f g) f g
infixl 5 :=:
type f :=: g = (f :<: g, g :<: f)
inj :: forall f g a . (f :<: g) => f a -> g a
inj = inj' @(Elem f g)
We now have all the type level equipment we need to define our extensible eDSL by defining co-product types for the expressions.
Iff f
is a functor we can fold over any value of type Expr f
;
the fold generalises the fold over lists.
The first argument f a -> a
is called an algebra and determines
how the different constructors of a data type affect the final
outcome: it specifies one step of the recursion, turning f a
into
a
.
(This notion of algebra comes from the category theory generalisation of algebraic structure known as F-algebras.
foldExpr :: Functor f => (f a -> a) -> Expr f -> a
foldExpr f (In t) = f (fmap (foldExpr f) t)
Using a typeclass we can define and assemble algebras in a modular fashion. In this case the result of evaluation should be an integer
class Functor f => Eval f where
evalAlgebra :: f Int -> Int
To evaluate expressions of values and additions we need the following instances:
instance Eval Val where
evalAlgebra (Val x) = x
instance Eval Add where
evalAlgebra (Add x y) = x + y
We also need to evaluate functors built from coproducts
instance (Eval f, Eval g) => Eval (f :+: g) where
evalAlgebra (Inl x) = evalAlgebra x
evalAlgebra (Inr y) = evalAlgebra y
With all those ingredients in place we can define evaluation by folding over an expresion with the algebra defined above.
eval :: Eval f => Expr f -> Int
eval e = foldExpr evalAlgebra e
We can now define smart constructors for expressions using the type injection machinery:
injectExpr :: (g :<: f) => g (Expr f) -> Expr f
injectExpr = In . inj
val :: (Val :<: f) => Int -> Expr f
val x = injectExpr (Val x)
add :: (Add :<: f) => Expr f -> Expr f -> Expr f
add x y = injectExpr (Add x y)
infixl 6 ⊕
(⊕) :: (Add :<: f) => Expr f -> Expr f -> Expr f
x ⊕ y = add x y
Example of an addition of integer values (contrast this to the example without smart constructors).
myexp :: Expr (Val :+: Add)
myexp = val 2000 ⊕ val 20 ⊕ val 2
ugly :: Expr (Val :+: Add)
ugly = In (Inr (Add (In (Inl (Val 2000))) (In (Inl (Val 22)))))
And how do we ensure correct injections?
eval ugly == eval myexp
So can we extend the expressions with say multiplication?
data Mul x = Mul x x deriving (Functor)
Now we need to extend the evaluator to evaluate Mul
and define a
smart constructor for it.
instance Eval Mul where
evalAlgebra (Mul x y) = x * y
mul :: (Mul :<: f) => Expr f -> Expr f -> Expr f
mul x y = injectExpr (Mul x y)
infixl 7 ⊗
(⊗) :: (Mul :<: f) => Expr f -> Expr f -> Expr f
x ⊗ y = mul x y
expr2 :: Expr (Mul :+: Add :+: Val)
expr2 = val 4 ⊗ val 2000 ⊕ val 20 ⊕ val 2
eval expr2
This follows the example in [3] enabling expressions to be pretty printed. The first requirement is to create a class corresponding to the functions we wish to write.
The render method is sufficiently general to allow us to make recursive calls to the subtrees containing expression of different types.
class Render f where
render :: Render g => f (Expr g) -> String
pretty :: Render f => Expr f -> String
pretty (In t) = render t
Now we can define the desired behaviour for each instance of Render
we wish to handle.
instance (Render f, Render g) => Render (f :+: g) where
render (Inl x) = render x
render (Inr y) = render y
instance Render Val where
render (Val t) = show t
instance Render Add where
render (Add x y) = "(" <> pretty x <> " + " <> pretty y <> ")"
instance Render Mul where
render (Mul x y) = "(" <> pretty x <> " * " <> pretty y <> ")"
Cleary this function is extensible via the typeclass instances. QED.
We use the same constructors as the original DTALC paper but resort to
GADT syntax to make the constructor types explicit in defining the
free monad which consists of either pure values or an impure effect,
constructed using f
.
When f
is a functor Term f
is a monad
data Term f a where
Pure :: a -> Term f a
Impure :: f (Term f a) -> Term f a
instance Functor f => Functor (Term f) where
fmap f (Pure x) = Pure (f x)
fmap f (Impure t) = Impure (fmap f <$> t)
instance Functor f => Applicative (Term f) where
pure = Pure
Pure f <*> xs = f <$> xs
Impure g <*> xs = Impure ((<*> xs) <$> g)
instance Functor f => Monad (Term f) where
(Pure x) >>= f = f x
(Impure t) >>= f = Impure ((>>= f) <$> t)
In this instance the
Term
type is a higher-order functor that maps a functorf
to the monadTerm f
. ThisTerm
functor is left-adjoint to the forgetful functor from monads to their underlying functors.
E.g. the list monad and the state monad are not Free. Even though the
state monad is not a free monad we can use the Term
data type to
represent a language of stateful computations by interpreting them as
computations in the state monad.
data Incr t = Incr Int t deriving (Functor)
data Recall t = Recall (Int -> t) deriving (Functor)
injectTerm :: (g :<: f) => g (Term f a) -> Term f a
injectTerm = Impure . inj
incr :: (Incr :<: f) => Int -> Term f ()
incr i = injectTerm (Incr i (Pure ()))
recall :: (Recall :<: f) => Term f Int
recall = injectTerm (Recall Pure)
How cool is this?
Tick only works in the Term (Recall :+: Incr)
monad.
tick :: Term (Recall :+: Incr) Int
tick = do
y <- recall
incr 1
return y
There is a choice here (Recall :<:f, Incr :<: f) => Term f Int
is
a more general type for tick
where it would work in any Term that
supports the operations.
foldTerm :: Functor f => (a -> b) -> (f b -> b) -> Term f a -> b
foldTerm pur _ (Pure x) = pur x
foldTerm pur imp (Impure t) = imp (fmap (foldTerm pur imp) t)
Now we need an algebra to keep track of the state of the memory cell, to avoid confusion Wouter introduces a newtype:
newtype Mem = Mem Int deriving Show
To interpret our terms we need to define a run function with the following type:
run :: ... => Term f a -> Mem -> (a, Mem)
class Functor f => Run f where
runAlgebra :: f (Mem -> (a, Mem)) -> (Mem -> (a, Mem))
instance Run Incr where
runAlgebra (Incr k r) (Mem i) = r (Mem (i + k))
instance Run Recall where
runAlgebra (Recall r) (Mem i) = r i (Mem i)
instance (Run f, Run g) => Run (f :+: g) where
runAlgebra (Inl r) = runAlgebra r
runAlgebra (Inr r) = runAlgebra r
run :: Run f => Term f a -> Mem -> (a, Mem)
run = foldTerm (,) runAlgebra
run tick (Mem 4)
data Teletype a
= GetChar (Char -> a)
| PutChar Char a
deriving (Functor)
data FileSystem a
= ReadFile FilePath (String -> a)
| WriteFile FilePath String a
deriving (Functor)
exec :: Exec f => Term f a -> IO a
exec = foldTerm return execAlgebra
class Functor f => Exec f where
execAlgebra :: f (IO a) -> IO a
instance Exec Teletype where
execAlgebra (GetChar f) = Prelude.getChar >>= f
execAlgebra (PutChar c io) = Prelude.putChar c >> io
instance Exec FileSystem where
execAlgebra (ReadFile name f) = Prelude.readFile name >>= f
execAlgebra (WriteFile name s io) = Prelude.writeFile name s >> io
readFile' :: (FileSystem :<: f) => FilePath -> Term f String
readFile' name = injectTerm (ReadFile name Pure)
writeFile' :: (FileSystem :<: f) => FilePath -> String -> Term f ()
writeFile' name s = injectTerm (WriteFile name s (Pure ()))
copyFile :: FilePath -> FilePath -> Term (FileSystem) ()
copyFile from to = do
contents <- readFile' from
writeFile' to contents
return ()
exec $ copyFile "DTALC.hs" "CopyOfDTALC.hs"
Write/test some examples using them and co-product with FileSystem
Write execAlgebra, smart constructors and examples with co-product types.
The simplified presentation of co-product injections given above suffers from many omissions:
- inj partial inverse AFAIK this is a left inverse or retraction to
the pre-image of inj (called prj in the original paper [3]) can we
define a partial function for the class:
(sub :<: sup) => ret :: sup a -> Maybe (sub a)
And how would this be useful? (30 mins) - One of these is co-product type ambiguity e.g. what does the following expression mean?
eval (val 40 :: Expr (Val :+: Val))
How can we fix the implementation of our type analysis above to disallow such ambigous constructions? (30 mins)
- How can we allow more complex subtyping relations to hold e.g.: (1hr)?
:kind! (Add :+: Mul :<: Add :+: Mul :+: Val)
- Add product types (:*:)? (2hr)
See: Patrick Bahr’s compdata package for practical implementations of above for guidance and solutions.