2009-08-30

Fun with (Type) Class

I'd like to motivate OCaml's object-oriented features (the "O" in "OCaml") by showing an encoding of (first-order) Haskell type classes. This will require making use of almost everything in OCaml's object buffet: class types, virtual classes, inheritance of class types and of classes, double dispatch and coercion.

Haskell could be said to have the best type system among all languages in wide use at the moment, striking a delicate balance between power and practicality without compromising purity (neither conceptual nor functional). In the words of Oleg Kiselyov, it sports "attractive types". A great feature is the presence of type-indexed values via the type-class mechanism. In a nutshell, type-indexed values are functions from types to values. A classic example is overloaded operators, where for instance the same symbol + denotes addition over the integers, floating-point and rational numbers, depending on the type of its parameters. Haskell provides "type magic" to automatically infer the intended function (the value) based on the types of the arguments to which it is applied, achieving a form of type-safe "ad-hoc" polymorphism.

The ML family of languages doesn't have the machinery to attain such expressive power. There are, however, a number of ways to encode type-indexed families with an explicit parameter encoding the selector type. The oldest, and most complicated technique is that of Danvy (see also this complete example) which is not really equivalent to Haskell's type classes but a sophisticated application of phantom types. Another technique is using an explicit dictionary (Yaron Minsky has a worked example) from which to select the appropriate operation. Normally these dictionaries are records; the labels provide type-safe instantiation of the generic value family. The problem with records is that they are not extensible, so it is not immediate how to encode the equivalent of inheritance of type classes. OCaml's object-oriented sublanguage is especially well-suited for the task, allowing for a natural encoding that shows that the "class" in Haskell's "type classes" is more object-oriented than it seems at first sight.

I'll start with a bit of Haskell's standard prelude, as shown in the Report in a nice graph. I'll focus on the two most common nontrivial classes, Eq and Ord. The first is defined as:

class  Eq a  where
    (==), (/=)  ::  a -> a -> Bool

    x /= y  = not (x == y)
    x == y  = not (x /= y)

I encode a Haskell type class in OCaml with… (surprise!) a class type and associated functions:

class type ['a] eq = object
  method equal     : 'a -> 'a -> bool
  method not_equal : 'a -> 'a -> bool
end

Since the type class has a type parameter a, my class type also has a type parameter 'a. Each member of the type class corresponds to a (named) method and to a regular function taking the class type as a dictionary parameter:

let (===) x y (eq : 'a #eq) = eq#equal x y
and (<=>) x y (eq : 'a #eq) = eq#not_equal x y

(I've opted to not shadow the regular operators in Pervasives). The weird order of the parameters (you'd expect the dictionary to come first) is such that I can write:

    if x === y $ eq_int then …

where ($) is the application operator. Haskell provides default implementations for both members; as the Report notes:

This declaration gives default method declarations for both /= and ==, each being defined in terms of the other. If an instance declaration for Eq defines neither == nor /=, then both will loop.

I find that unsettling, so I'll just define one via a virtual class (not class type):

class virtual ['a] eq_base = object (self)
  method virtual equal : 'a -> 'a -> bool
  method not_equal x y = not (self#equal x y)
end

In this case, equal is left virtual. It is important that the class be left virtual even if equal is defined, so that it cannot be instantiated. This means that it cannot be coerced to the class type eq, since it is illegal to shadow virtual members that were declared non-virtual. Now come the instances of the type class. In Haskell, a very simple instance is that for unit which is built-in (via deriving) but could be declared as:

instance  Eq ()  where
    () == () = true

To encode instances, I use an immediate object bound to an identifier:

let eq_unit : unit eq = object
  inherit [unit] eq_base
  method equal     () () = true
  method not_equal () () = false
end

Note the coercion to the corresponding monomorphic instance of the class type. I inherit from the default implementation in the class base, even if in this case I explicitly implement all methods for efficiency. The same encoding works for booleans too:

let eq_bool : bool eq = object
  inherit [bool] eq_base
  method equal     = (==)
  method not_equal = (!=)
end

OCaml has built-in ad-hoc polymorphic comparisons that I could use directly:

class ['a] eq_poly : ['a] eq = object
  inherit ['a] eq_base
  method equal     = Pervasives.(=)
  method not_equal = Pervasives.(<>)
end

With that I can have type classes for the primitive types:

let eq_int    = (new eq_poly :> int    eq)
and eq_float  = (new eq_poly :> float  eq)
and eq_char   = (new eq_poly :> char   eq)
and eq_string = (new eq_poly :> string eq)

Here I instantiate objects of the eq_poly class coerced to the correct monomorphic instance of eq (recall that in OCaml coercion is always explicit). The type class instance for pairs requires type classes for each of its components. In Haskell, we'd have:

instance  (Eq a, Eq b) => Eq ((,) a b)  where …

(not really, but let me gloss over that). Encoding that in OCaml also requires a witness for each type parameter as explicit arguments to the constructor:

let eq_pair (l : 'a eq) (r : 'b eq) : ('a * 'b) eq = object (self)
  inherit ['a * 'b] eq_base
  method equal (p, q) (p', q') = l#equal p p' && r#equal q q'
end

An inefficiency of this encoding is that it creates a new object with every invocation of the witness function. For instance, in general eq_pair eq_unit eq_bool != eq_pair eq_unit eq_bool and so on for every witness. The same is needed to encode an equality witness for option types:

let eq_option (o : 'a eq) : 'a option eq = object
  inherit ['a option] eq_base
  method equal x y = match x, y with
    None  , None   -> true
  | Some x, Some y -> o#equal x y
  | _              -> false
end

And for lists:

let eq_list (o : 'a eq) : 'a list eq = object
  inherit ['a list] eq_base
  method equal =
    let rec equal x y = match x, y with
    | [], [] -> true
    | _ , []
    | [], _  -> false
    | x :: xs, y :: ys when o#equal x y -> equal xs ys
    | _      -> false
    in equal
end

(I use explicit closed recursion for efficiency). In all these cases, I encoded a derived instance via an explicit witness passed as a parameter. Derived type classes require genuine inheritance of class types, as seen in the encoding of Haskell's Ord type class:

class  (Eq a) => Ord a  where
    compare              :: a -> a -> Ordering
    (<), (<=), (>), (>=) :: a -> a -> Bool
    max, min             :: a -> a -> a

(I omit showing the default implementations). I use the SML ordering type for fidelity:

type ordering = LT | EQ | GT

let of_comparison c = if c < 0 then LT else if c > 0 then GT else EQ
and to_comparison   = function LT -> -1 | EQ -> 0 | GT -> -1

The class type encoding the Ord type class is:

class type ['a] ord = object
  inherit ['a] eq
  method compare       : 'a -> 'a -> ordering
  method less_equal    : 'a -> 'a -> bool
  method less_than     : 'a -> 'a -> bool
  method greater_equal : 'a -> 'a -> bool
  method greater_than  : 'a -> 'a -> bool
  method max           : 'a -> 'a -> 'a
  method min           : 'a -> 'a -> 'a
end

Again, operators take an explicit witness as a dictionary:

let (<==) x y (ord : 'a #ord) = ord#less_equal    x y
and (<<<) x y (ord : 'a #ord) = ord#less_than     x y
and (>==) x y (ord : 'a #ord) = ord#greater_equal x y
and (>>>) x y (ord : 'a #ord) = ord#greater_than  x y

And I provide a virtual class with default implementations:

class virtual ['a] ord_base = object (self)
  inherit ['a] eq_base
  method virtual compare : 'a -> 'a -> ordering
  method equal         x y = match self#compare x y with EQ -> true  | _ -> false
  method less_equal    x y = match self#compare x y with GT -> false | _ -> true
  method less_than     x y = match self#compare x y with LT -> true  | _ -> false
  method greater_equal x y = match self#compare x y with LT -> false | _ -> true
  method greater_than  x y = match self#compare x y with GT -> true  | _ -> false
  method max           x y = match self#compare x y with LT -> y | _ -> x
  method min           x y = match self#compare x y with GT -> y | _ -> x
end

Note how the class type inheritance ord :> eq translates into class inheritance ord_base :> eq_base. The instances for unit and bool use inheritance and selective overriding for efficiency:

let ord_unit : unit ord = object
  inherit [unit] ord_base
  method equal     = eq_unit#equal
  method not_equal = eq_unit#not_equal
  method compare () () = EQ
  method max     () () = ()
  method min     () () = ()
end

let ord_bool : bool ord = object
  inherit [bool] ord_base
  method equal     = eq_bool#equal
  method not_equal = eq_bool#not_equal
  method compare b b' = match b, b' with
  | false, true -> LT
  | true, false -> GT
  | _           -> EQ
  method max = (||)
  method min = (&&)
end

In order to profit from the efficient implementation of equal and not_equal I explicitly delegate on the methods in the corresponding eq instances. Now as before I can use the ad-hoc polymorphic operators in Pervasives to speed writing the type classes for atomic types:

class ['a] ord_poly : ['a] ord = object
  inherit ['a] ord_base
  method compare x y   = of_comparison (Pervasives.compare x y)
  method less_equal    = Pervasives.(<=)
  method less_than     = Pervasives.(< )
  method greater_equal = Pervasives.(>=)
  method greater_than  = Pervasives.(> )
  method max           = Pervasives.max
  method min           = Pervasives.min
end

let ord_int    = (new ord_poly :> int    ord)
and ord_float  = (new ord_poly :> float  ord)
and ord_char   = (new ord_poly :> char   ord)
and ord_string = (new ord_poly :> string ord)

(Note the use of the injection into ordering). The case for pairs requires appropriate witnesses as before:

let ord_pair (l : 'a ord) (r : 'b ord) : ('a * 'b) ord = object
  inherit ['a * 'b] ord_base
  method equal = (eq_pair (l :> 'a eq) (r :> 'b eq))#equal
  method compare (p, q) (p', q') =
    match l#compare p p' with EQ -> r#compare q q' | c -> c
end

In this case, delegating equal requires passing the witnesses to eq_pair explicitly coerced to the eq class type. Finally, the classes for options and lists are straightforward applications of the encoding:

let ord_option (o : 'a ord) : 'a option ord = object
  inherit ['a option] ord_base
  method equal = (eq_option (o :> 'a eq))#equal
  method compare x y = match x with
    None   -> (match y with None -> EQ | Some _ -> LT)
  | Some x -> (match y with None -> GT | Some y -> o#compare x y)
end

let ord_list (o : 'a ord) : 'a list ord = object
  inherit ['a list] ord_base
  method equal = (eq_list (o :> 'a eq))#equal
  method compare =
    let rec compare x y = match x, y with
    | [], [] -> EQ
    | _ , [] -> GT
    | [], _  -> LT
    | x :: xs, y :: ys -> match o#compare x y with
    | EQ -> compare xs ys
    | c  -> c
    in compare
end

As it is clear, the encoding shows a direct correspondence between Haskell type classes and OCaml's classes and objects:

  • type classes into class types with top-level functions dispatching on a witness object of that class type
  • derived type classes into inheriting class types
  • default implementations into virtual classes
  • instances into objects inheriting from the default virtual class
  • derived instances into objects having witnesses of the appropriate class type

In every case, the witness objects are used purely as closed vtables without state. It would be interesting to investigate which use cases would open using row polymorphism or data members. Also, note that there is no need to abstract over the object types, as each witness carries with it the intended implementation in a type-safe manner.

This technique is analogous to the one using functors to encode type classes, but then why have both in the same language?, in the words of Jacques Garrigue. One reason is that functors are not first-class values in OCaml but objects are, which makes using them as type witnesses much more lightweight. On the other hand, one limitation that functors don't have is that this encoding is limited to first-order type classes. In other words, it is not clear what would be required to implement Haskell type classes like:

instance  (Monad m) => Functor m  where
    fmap f x = bind x (return . f)

where m is a type constructor with sort * → * using the encoding presented here, whereas it is straightforward to concoct a (ML) functor:

module MonadFunctor (M : MONAD) : FUNCTOR = struct
  type 'a t = 'a M.t
  let fmap f x = M.bind x (fun v -> M.return (f v))
end

Another limitation is the need to pass a witness object to the type-indexed functions. One would ideally want to write something like:

let (===) (x : eq) = x#equal

where the value is its own witness, but obviously this would require that every value be wrapped into an object, requiring, for instance, code like the following:

if (new eq_int 5) === (new eq_int x) then …

Moreover, in order to actually operate on the wrapped values, the classes would need an explicit accessor. OCaml supports double dispatch making possible forcing the types of x and y to be equal:

class type ['a] eq = object ('self)
  method value     : 'a
  method equal     : 'self -> bool
  method not_equal : 'self -> bool
end

class eq_int x : [int] eq = object (self : 'self)
  method value                    = x
  method equal     (that : 'self) = self#value == that#value
  method not_equal (that : 'self) = self#value != that#value
end

Canonicalizing the witnesses for finite types like bool is very tedious:

let eq_bool : bool -> bool eq =
  let _t : bool eq = object
    method value          = true
    method equal     that =      that#value
    method not_equal that = not (that#value)
  end and _f : bool eq = object
    method value          = false
    method equal     that = not (that#value)
    method not_equal that =      that#value
  end in fun b -> if b then _t else _f

(essentially, a Church-encoded boolean). More importantly, this encoding is much more dynamic, making it rather remote with respect to the original, fully static Haskell semantics (this, incidentally, is a definite advantage of the functorial encoding).

In closing, it would be interesting to see how far this technique can be carried, especially to implement a numeric tower analogous to Haskell's, or maybe a better type class hierarchy to encode would be that of the Numeric Prelude. Another necessary avenue of investigation is measuring the abstraction penalty incurred by this encoding. In any case, I believe this is a neat showcase of the often-forgotten object-oriented features in OCaml.