2009-12-24

Functional Macramé

What is the meaning of an expression like

let rec e = f … e …

known in Haskell as "knot-tying"? As Lloyd Allison explains:

A circular program creates a data structure whose computation depends upon itself or refers to itself. The technique is used to implement the classic data structures circular and doubly-linked lists, threaded trees and queues, in a functional programming language. These structures are normally thought to require updatable variables found in imperative languages…

But first, a bit of motivation. OCaml allows for the construction of cyclic values going through a data constructor. For instance, the following is legal:

let rec ones = 1 :: ones in ones

since the expression is a data constructor application (a cons cell). This extends naturally to mutually recursive values. For instance, a grammar can be built with:

type 'a symbol = Terminal of 'a | Nonterminal of 'a symbol list list

let arith_grammar =
  let rec s  = Nonterminal [[add]]
  and add    = Nonterminal [[mul]; [add; Terminal "+"; mul]]
  and mul    = Nonterminal [[term]; [mul; Terminal "*"; term]]
  and term   = Nonterminal [[number]; [Terminal "("; s; Terminal ")"]]
  and number = Nonterminal [[digit]; [digit; number]]
  and digit  = Nonterminal (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
  in s
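
(The range function enumerates an integer interval; it is not defined in this entry, so here is a minimal inclusive version, my assumption of its intent:)

let rec range i j = if i > j then [] else i :: range (succ i) j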

The grammar is a perfectly valid recursive value in OCaml and a direct translation of the code in this article. The problem is that an implementation of the Omega monad for breadth-first enumeration of a list of lists requires laziness in an essential way. I'll use a stub (mock?) sequence type:

module Seq = struct
  type 'a cons = Nil | Cons of 'a * 'a t
   and 'a t  = 'a cons Lazy.t
  let rec map f q = lazy (match Lazy.force q with
  | Nil -> Nil
  | Cons (x, q) -> Cons (f x, map f q))
  let rec of_list l = lazy (match l with
  | [] -> Nil
  | x :: xs -> Cons (x, of_list xs))
end

Unfortunately this doesn't work:

type 'a symbol = Terminal of 'a | Nonterminal of 'a symbol Seq.t Seq.t

let rule ss = Nonterminal (Seq.map Seq.of_list (Seq.of_list ss))

let arith_grammar =
  let rec s  = rule [[add]]
  and add    = rule [[mul]; [add; Terminal "+"; mul]]
  and mul    = rule [[term]; [mul; Terminal "*"; term]]
  and term   = rule [[number]; [Terminal "("; s; Terminal ")"]]
  and number = rule [[digit]; [digit; number]]
  and digit  = rule (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
  in s

The dreaded "This kind of expression is not allowed as right-hand side of `let rec'" error raises its ugly head: rule is a function, not a constructor, and OCaml rightly complains that it cannot lazily evaluate a bunch of strict definitions involving computation. In a lazy language, in contrast, the right-hand-side expression is not evaluated until it is needed. So, again, what is the meaning of an expression like

let rec e = f … e …

When the value of e is required, the function f is called with an unevaluated e. If it doesn't use it, the result is well-defined; if it does, this results in a recursive call to f. In a strict language we must be explicit in the delaying and forcing of thunks:

let f' … e … =
  …
  if e_is_needed then … Lazy.force e …
  …
in
let rec e = lazy (f' … e …)
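
To make the pattern concrete, here is a minimal instance (my example, not in the original) using the Seq stub above: the infinite sequence of ones, with the knot tied through an explicit lazy:

let ones : int Seq.t =
  (* cons_one never forces its argument, so the recursion is productive *)
  let cons_one tail = Seq.Cons (1, tail) in
  let rec l = lazy (cons_one l) in
  l

Forcing one step shows that the knot is tied: Lazy.force ones returns Cons (1, t) with t == ones.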

Alas, if f' itself is lazy, the expected code won't work:

let f' … e … = lazy (
  …
  if e_is_needed then … Lazy.force e …
  …
)
in
let rec e = f' … e …

because lazy works syntactically as a constructor in OCaml only when it appears literally in the right-hand side; hidden inside f', it cannot guide the compiler, and again we're told that "This kind of expression is not allowed as right-hand side of `let rec'". This means that we cannot use knot-tying with lazy abstract data types like infinite lists and streams without going explicitly through lazy.

Stepping back from the problem at hand, let's revisit Allison's example of circular lists. He gives essentially this:

let circ p f g x =
  let rec c = build x
  and build y = f y :: if p y then c else build (g y)
  in c

(which is equivalent to an unfold followed by a knot-tying). This unsurprisingly doesn't work, but using lazy as outlined above does:

let circ p f g x =
  let rec c = lazy (build x)
  and build y = Seq.Cons (f y, if p y then c else lazy (build (g y)))
  in c

and the knot is tied by actually making a reference to the value as desired:

let x = circ (fun _ -> true) id id 0 in let Seq.Cons (_, y) = Lazy.force x in x == y ;;
- : bool = true

A bit of lambda-lifting to make the binding and value recursion distinct and separate gives:

let circ p f g x =
  let rec build y c = Seq.Cons (f y, if p y then c else lazy (build (g y) c)) in
  let rec c = lazy (build x c) in c

This hints at what appears to be a limitation of strict languages, namely that circular computations seem to require explicit binding management in an essential way, either threaded by hand as in this code or via a method like Dan Piponi's Löb functor. Applying this technique to our grammar makes for tedious work: all the mutually recursive references must be lambda-lifted, and the knot tied simultaneously through lazy:

let rule ss = Lazy.force (Seq.map Seq.of_list (Seq.of_list ss))

let arith_grammar =
  let make_expr   exp add mul trm num dig = rule [[add]]
  and make_add    exp add mul trm num dig = rule [[mul]; [add; Terminal "+"; mul]]
  and make_mul    exp add mul trm num dig = rule [[trm]; [mul; Terminal "*"; trm]]
  and make_term   exp add mul trm num dig = rule [[num]; [Terminal "("; exp; Terminal ")"]]
  and make_number exp add mul trm num dig = rule [[dig]; [dig; num]]
  and make_digit  exp add mul trm num dig = rule (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
  in
  let rec exp = Nonterminal (lazy (make_expr   exp add mul trm num dig))
  and add = Nonterminal (lazy (make_add    exp add mul trm num dig))
  and mul = Nonterminal (lazy (make_mul    exp add mul trm num dig))
  and trm = Nonterminal (lazy (make_term   exp add mul trm num dig))
  and num = Nonterminal (lazy (make_number exp add mul trm num dig))
  and dig = Nonterminal (lazy (make_digit  exp add mul trm num dig)) in
  exp

This can become unworkable pretty quickly, but is a solution! Note that the type of sequences forces me to use an explicit evaluation discipline: rule must return an evaluated expression, but the evaluation itself is delayed inside the Nonterminal constructor.

Allison's paper ends with an alternative for strict imperative languages like Pascal: using an explicit reference for the circular structure, something like this:

let f' … e … =
  …
  if e_is_needed then … Ref.get e …
  …
in
let e_ref = Ref.make () in
let e = f' … e_ref … in
Ref.set e_ref e

where Ref.make has type unit -> 'a, that is, it is magic. Unfortunately, Xavier Leroy himself stated that Obj.magic is not part of the OCaml language :-) (although a quick look shows many a would-be apprentice at work). And in this case it is true that no amount of magic would make this work in the general case, since e_ref must refer to an otherwise dummy value of the appropriate type which gets overwritten with the final result, in effect reserving memory of the necessary size. In specific cases, however, this can be made to work with a bit of care.

Merry Christmas!

2009-12-23

Gained in Translation

First of all, I'd like to apologize for the infrequent updates and the lightness of the last few entries. I seldom have time of late for anything but the quickest of finger exercises, but I wanted to put something in writing before the year is over. What better inspiration than one of Remco Niemeijer's terse solutions to the daily Programming Praxis. This week's exercise asks for an implementation of Parnas's permuted indices, and Remco's solution is minimal enough. I translated his Haskell code to almost-verbatim OCaml, interjecting the necessary definitions to make the code read essentially the same way. For example, I needed to translate:

rot xs = [(unwords a, unwords b) | (a, b) <- init $
          zip (inits xs) (tails xs), notElem (head b) stopList]

(n.b: this is Haskell). The function inits returns all the initial segments of a list, so that inits "abc" = ["", "a", "ab", "abc"]. Conversely, tails returns all the tails of a list, so that tails "abc" = ["abc", "bc", "c", ""]. The zip of both lists is the list of all the ways in which you can split a list, so that with our example:

zip (inits "abc") (tails "abc") = [
  ("", "abc"),
  ("a", "bc"),
  ("ab", "c"),
  ("abc", "")
]

and the init of that is every element on that list except for the last one. After writing the necessary infrastructure, the equivalent solution was simple to write, but then I noticed that I could refactor it into something a bit terser. The first opportunity for compression I found was to use an ad-hoc function for splitting a list in every way possible except the last, in effect subsuming init $ zip (inits xs) (tails xs) into a single recursive function:

let rec split_all l = match l with
| []      -> []
| x :: xs -> ([], l) :: List.map (fun (hs, ts) -> x :: hs, ts) (split_all xs)
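
For instance:

# split_all [1; 2; 3] ;;
- : (int list * int list) list = [([], [1; 2; 3]); ([1], [2; 3]); ([1; 2], [3])]

Note that the final split ([1; 2; 3], []) is absent, exactly as init $ zip (inits xs) (tails xs) requires.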

A classic of text-processing tasks in Haskell is the use of functions converting text into lists and vice versa; this required writing some simple helper functions:

let words = Str.split (Str.regexp " ")
and lines = Str.split (Str.regexp "\n")
and unwords = String.concat " "

The permuted index construction must filter a number of stop words:

let stop_list = words "a an and by for if in is of on the to"

As Remco explains, the core function for generating a permuted index finds all the splittings of a given sentence, and uses the head as the context for the tail. The function he gives is a typical generation—filtering—reduction pipeline expressed as a list comprehension. I initially wrote the comprehension as a right fold (this is always possible), and in a second phase I rewrote that into a point-free function more directly expressing the reduction. For that I needed a number of combinators:

let ( % ) f g x = f (g x)

let cross f (x, y) = (f x, f y)

let distrib f (x, y) (z, t) = (f x z, f y t)

let flip f x y = f y x

The composition operator % is an old friend of this blog. The combinator cross lifts a function over a pair, and distrib distributes a binary function over two pairs. The combinator flip swaps the arguments to a curried function. All of this is standard and it allowed me to write:

let rot = List.map (cross unwords) % List.filter (not % flip List.mem stop_list % List.hd % snd) % split_all

This function takes a list of words, splits it every which way, throws away those pairs whose second component (the tail) begins with a stop word, and joins each component of the resulting pairs of words into sentence fragments. The function pretty-printing an index underwent a similar compression: instead of finding the longest fragment separately on each component, I did it in one pass over the list of pairs:

let pp_index xs =
  let l1, l2 = List.fold_right (distrib (max % String.length)) xs (0, 0) in
  List.iter (fun (a, b) -> Printf.printf "%*s   %-*s\n" l1 a l2 b) xs

The function max % String.length composes on the first argument of the curried max, and thus has type string → int → int; in order to distribute over pairs and make the types come out right I needed to use a right fold instead of the more natural (in OCaml) left fold, but this is otherwise straightforward. The pretty printing of the index is exactly like in Remco's code, as both OCaml's and Haskell's printf implement the same formats. Putting everything together needs a sort on the second components; I use a (very inefficient) helper function implementing case-insensitive sort:

let ci_compare a b = compare (String.lowercase a) (String.lowercase b)

let permute_index =
  pp_index % List.sort (fun (_, a) (_, b) -> ci_compare a b) % List.concat % List.map (rot % words) % lines

The text is decomposed into lines, each line is further decomposed into words and an index is built for it, the fragmentary indexes are collated and sorted into the final result which is finally printed out. A test gives out the expected result:

let () = permute_index "All's well that ends well.\nNature abhors a vacuum.\nEvery man has a price.\n"
              Nature   abhors a vacuum.
                       All's well that ends well.
     All's well that   ends well.
                       Every man has a price.
           Every man   has a price.
               Every   man has a price.
                       Nature abhors a vacuum.
     Every man has a   price.
          All's well   that ends well.
     Nature abhors a   vacuum.
               All's   well that ends well.
All's well that ends   well.

A point to keep in mind is that permute_index and especially rot would probably have been clearer written in a monadic style, as it emphasizes an "element-at-a-time" view of list processing as I've written before. The downside would have been the need to name every intermediate value being transformed:

let rot xs =
  split_all xs >>= fun (hs, ts) ->
  guard (not (List.mem (List.hd ts) stop_list)) >>
  return (unwords hs, unwords ts)

It seems that, in this sense, monadic beats recursion but point-free beats monadic for conciseness. As it is, the 30 lines comprising this code fit in one short page. Not bad.

2009-11-06

Reflecting on One-Liners

The delightful Futility Closet posted a simple puzzler about the next year after 1961 that reads the same upside down. It is quite easy to see that it will be 6009, and a dozen lines of code suffice to convince oneself that indeed no smaller number exists.

The key to concision is to lift the syntax and semantics of monads into the problem. Since not every digit is itself a digit upon rotation by 180° (I admit to taking poetic license with the title), it is natural to work with failing computations, otherwise known as the option monad:

let return x = Some x
let (>>=) m f = match m with None -> None | Some x -> f x

Of all the natural operations on monads known and loved by Haskell practitioners, I'll need just a bit of support:

let fmap f m = m >>= fun x -> return (f x)

let lift2m f m n = m >>= fun x -> n >>= fun y -> return (f x y)

let sequence ms = List.fold_right (lift2m (fun x xs -> x :: xs)) ms (return [])
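
A quick check of sequence (mine, not in the original): the resulting list computation succeeds only if every element does:

# sequence [Some 1; Some 2; Some 3] ;;
- : int list option = Some [1; 2; 3]
# sequence [Some 1; None; Some 3] ;;
- : int list option = None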

As the last bit of scaffolding, I need a monadized lookup function:

let find l e = try Some (List.assoc e l) with Not_found -> None

These functions are a very well-known part of Haskell's standard library, so I'm already six over par. I'm looking for digits that read as digits upon a rotation of 180°. An association list maps those digits to their rotations:

let rotations = [0, 0; 1, 1; 6, 9; 8, 8; 9, 6]

If these numbers were intended to be read on seven-segment displays, I could have added the (2, 5) and (5, 2) pairs. By repeatedly dividing by 10 I can get the list of decimal digits of a number:

let to_digits = let rec go l n = if n = 0 then l else go (n mod 10 :: l) (n / 10) in go []

The opposite operation, building a number from the list of its decimal digits is an application of Horner's Rule:

let of_digits l = List.fold_left (fun x y -> 10 * x + y) 0 l
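
For instance:

# to_digits 1961 ;;
- : int list = [1; 9; 6; 1]
# of_digits [1; 9; 6; 1] ;;
- : int = 1961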

Reflection is a compact and dense point-free one-liner:

let reflect = fmap (of_digits % List.rev) % sequence % List.map (find rotations) % to_digits

For each digit I find its rotation, if it exists. The result is a list of digit computations, some of those possibly failed. I turn that into a list computation that succeeds only if all its components succeed with sequence. The desired result is built out of the reversed list of rotated digits. Now a number is symmetric if it can be read upside down as itself:

let is_symmetric n = match reflect n with Some m -> n = m | _ -> false
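
A quick check (mine): 1961 reads as itself upside down, 1962 doesn't:

# is_symmetric 1961 ;;
- : bool = true
# is_symmetric 1962 ;;
- : bool = false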

The first year after 1961 that is symmetric is 6009, as expected:

let year = List.hd $ List.filter is_symmetric (range 1962 10_000)

That's it, twelve lines. Composition and range are left as an exercise for the reader.

2009-08-30

Fun with (Type) Class

I'd like to motivate OCaml's object-oriented features (the "O" in "OCaml") by showing an encoding of (first-order) Haskell type classes. This will require making use of almost everything in OCaml's object buffet: class types, virtual classes, inheritance of class types and of classes, double dispatch and coercion.

Haskell could be said to have the best type system among all languages in wide use at the moment, striking a delicate balance between power and practicality without compromising purity (neither conceptual nor functional). In the words of Oleg Kiselyov, it sports "attractive types". A great feature is the presence of type-indexed values via the type-class mechanism. In a nutshell, type-indexed values are functions from types to values. A classic example is overloaded operators, where for instance the same symbol + denotes addition over the integers, floating-point and rational numbers, depending on the type of its parameters. Haskell provides "type magic" to automatically infer the intended function (the value) based on the types of the arguments to which it is applied, achieving a form of type-safe "ad-hoc" polymorphism.

The ML family of languages doesn't have the machinery to attain such expressive power. There are, however, a number of ways to encode type-indexed families with an explicit parameter encoding the selector type. The oldest, and most complicated, technique is that of Danvy (see also this complete example), which is not really equivalent to Haskell's type classes but a sophisticated application of phantom types. Another technique uses an explicit dictionary (Yaron Minsky has a worked example) from which to select the appropriate operation. Normally these dictionaries are records; the labels provide type-safe instantiation of the generic value family. The problem with records is that they are not extensible, so it is not immediately obvious how to encode the equivalent of inheritance of type classes. OCaml's object-oriented sublanguage is especially well-suited for the task, allowing for a natural encoding that shows that the "class" in Haskell's "type classes" is more object-oriented than it seems at first sight.

I'll start with a bit of Haskell's standard prelude, as shown in the Report in a nice graph. I'll focus on the two most common nontrivial classes, Eq and Ord. The first is defined as:

class  Eq a  where
    (==), (/=)  ::  a -> a -> Bool

    x /= y  = not (x == y)
    x == y  = not (x /= y)

I encode a Haskell type class in OCaml with… (surprise!) a class type and associated functions:

class type ['a] eq = object
  method equal     : 'a -> 'a -> bool
  method not_equal : 'a -> 'a -> bool
end

Since the type class has a type parameter a, my class type also has a type parameter 'a. Each member of the type class corresponds to a (named) method and to a regular function taking the class type as a dictionary parameter:

let (===) x y (eq : 'a #eq) = eq#equal x y
and (<=>) x y (eq : 'a #eq) = eq#not_equal x y

(I've opted to not shadow the regular operators in Pervasives). The weird order of the parameters (you'd expect the dictionary to come first) is such that I can write:

    if x === y $ eq_int then …

where ($) is the application operator. Haskell provides default implementations for both members; as the Report notes:

This declaration gives default method declarations for both /= and ==, each being defined in terms of the other. If an instance declaration for Eq defines neither == nor /=, then both will loop.

I find that unsettling, so I'll just define one via a virtual class (not class type):

class virtual ['a] eq_base = object (self)
  method virtual equal : 'a -> 'a -> bool
  method not_equal x y = not (self#equal x y)
end

In this case, equal is left virtual. It is important that the class be left virtual even if equal is defined, so that it cannot be instantiated. This means that it cannot be coerced to the class type eq, since it is illegal to shadow virtual members that were declared non-virtual. Now come the instances of the type class. In Haskell, a very simple instance is that for unit which is built-in (via deriving) but could be declared as:

instance  Eq ()  where
    () == () = True

To encode instances, I use an immediate object bound to an identifier:

let eq_unit : unit eq = object
  inherit [unit] eq_base
  method equal     () () = true
  method not_equal () () = false
end

Note the coercion to the corresponding monomorphic instance of the class type. I inherit from the default implementation in the class base, even if in this case I explicitly implement all methods for efficiency. The same encoding works for booleans too:

let eq_bool : bool eq = object
  inherit [bool] eq_base
  method equal     = (==)
  method not_equal = (!=)
end
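
With these instances in hand, the dictionary-passing operators read naturally; a quick check (mine, not in the original, using the ($) application operator mentioned above):

# true === true $ eq_bool ;;
- : bool = true
# true <=> false $ eq_bool ;;
- : bool = true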

OCaml has built-in ad-hoc polymorphic comparisons that I could use directly:

class ['a] eq_poly : ['a] eq = object
  inherit ['a] eq_base
  method equal     = Pervasives.(=)
  method not_equal = Pervasives.(<>)
end

With that I can have type classes for the primitive types:

let eq_int    = (new eq_poly :> int    eq)
and eq_float  = (new eq_poly :> float  eq)
and eq_char   = (new eq_poly :> char   eq)
and eq_string = (new eq_poly :> string eq)

Here I instantiate objects of the eq_poly class coerced to the correct monomorphic instance of eq (recall that in OCaml coercion is always explicit). The type class instance for pairs requires type classes for each of its components. In Haskell, we'd have:

instance  (Eq a, Eq b) => Eq ((,) a b)  where …

(not really, but let me gloss over that). Encoding that in OCaml also requires a witness for each type parameter as explicit arguments to the constructor:

let eq_pair (l : 'a eq) (r : 'b eq) : ('a * 'b) eq = object (self)
  inherit ['a * 'b] eq_base
  method equal (p, q) (p', q') = l#equal p p' && r#equal q q'
end

An inefficiency of this encoding is that it creates a new object with every invocation of the witness function. For instance, in general eq_pair eq_unit eq_bool != eq_pair eq_unit eq_bool and so on for every witness. The same is needed to encode an equality witness for option types:

let eq_option (o : 'a eq) : 'a option eq = object
  inherit ['a option] eq_base
  method equal x y = match x, y with
    None  , None   -> true
  | Some x, Some y -> o#equal x y
  | _              -> false
end

And for lists:

let eq_list (o : 'a eq) : 'a list eq = object
  inherit ['a list] eq_base
  method equal =
    let rec equal x y = match x, y with
    | [], [] -> true
    | _ , []
    | [], _  -> false
    | x :: xs, y :: ys when o#equal x y -> equal xs ys
    | _      -> false
    in equal
end

(I use explicit closed recursion for efficiency). In all these cases, I encoded a derived instance via an explicit witness passed as a parameter. Derived type classes require genuine inheritance of class types, as seen in the encoding of Haskell's Ord type class:

class  (Eq a) => Ord a  where
    compare              :: a -> a -> Ordering
    (<), (<=), (>), (>=) :: a -> a -> Bool
    max, min             :: a -> a -> a

(I omit showing the default implementations). I use the SML ordering type for fidelity:

type ordering = LT | EQ | GT

let of_comparison c = if c < 0 then LT else if c > 0 then GT else EQ
and to_comparison   = function LT -> -1 | EQ -> 0 | GT -> 1

The class type encoding the Ord type class is:

class type ['a] ord = object
  inherit ['a] eq
  method compare       : 'a -> 'a -> ordering
  method less_equal    : 'a -> 'a -> bool
  method less_than     : 'a -> 'a -> bool
  method greater_equal : 'a -> 'a -> bool
  method greater_than  : 'a -> 'a -> bool
  method max           : 'a -> 'a -> 'a
  method min           : 'a -> 'a -> 'a
end

Again, operators take an explicit witness as a dictionary:

let (<==) x y (ord : 'a #ord) = ord#less_equal    x y
and (<<<) x y (ord : 'a #ord) = ord#less_than     x y
and (>==) x y (ord : 'a #ord) = ord#greater_equal x y
and (>>>) x y (ord : 'a #ord) = ord#greater_than  x y

And I provide a virtual class with default implementations:

class virtual ['a] ord_base = object (self)
  inherit ['a] eq_base
  method virtual compare : 'a -> 'a -> ordering
  method equal         x y = match self#compare x y with EQ -> true  | _ -> false
  method less_equal    x y = match self#compare x y with GT -> false | _ -> true
  method less_than     x y = match self#compare x y with LT -> true  | _ -> false
  method greater_equal x y = match self#compare x y with LT -> false | _ -> true
  method greater_than  x y = match self#compare x y with GT -> true  | _ -> false
  method max           x y = match self#compare x y with LT -> y | _ -> x
  method min           x y = match self#compare x y with GT -> y | _ -> x
end

Note how the class type inheritance ord :> eq translates into class inheritance ord_base :> eq_base. The instances for unit and bool use inheritance and selective overriding for efficiency:

let ord_unit : unit ord = object
  inherit [unit] ord_base
  method equal     = eq_unit#equal
  method not_equal = eq_unit#not_equal
  method compare () () = EQ
  method max     () () = ()
  method min     () () = ()
end

let ord_bool : bool ord = object
  inherit [bool] ord_base
  method equal     = eq_bool#equal
  method not_equal = eq_bool#not_equal
  method compare b b' = match b, b' with
  | false, true -> LT
  | true, false -> GT
  | _           -> EQ
  method max = (||)
  method min = (&&)
end

In order to profit from the efficient implementation of equal and not_equal I explicitly delegate to the methods in the corresponding eq instances. Now as before I can use the ad-hoc polymorphic operators in Pervasives to speed up writing the type classes for atomic types:

class ['a] ord_poly : ['a] ord = object
  inherit ['a] ord_base
  method compare x y   = of_comparison (Pervasives.compare x y)
  method less_equal    = Pervasives.(<=)
  method less_than     = Pervasives.(< )
  method greater_equal = Pervasives.(>=)
  method greater_than  = Pervasives.(> )
  method max           = Pervasives.max
  method min           = Pervasives.min
end

let ord_int    = (new ord_poly :> int    ord)
and ord_float  = (new ord_poly :> float  ord)
and ord_char   = (new ord_poly :> char   ord)
and ord_string = (new ord_poly :> string ord)
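
A quick check of the dictionary-passing comparisons (my example, not in the original):

# 1 <<< 2 $ ord_int ;;
- : bool = true
# ord_string#compare "abc" "abd" ;;
- : ordering = LT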

(Note the use of the injection into ordering). The case for pairs requires appropriate witnesses as before:

let ord_pair (l : 'a ord) (r : 'b ord) : ('a * 'b) ord = object
  inherit ['a * 'b] ord_base
  method equal = (eq_pair (l :> 'a eq) (r :> 'b eq))#equal
  method compare (p, q) (p', q') =
    match l#compare p p' with EQ -> r#compare q q' | c -> c
end

In this case, delegating equal requires passing the witnesses to eq_pair explicitly coerced to the eq class type. Finally, the classes for options and lists are straightforward applications of the encoding:

let ord_option (o : 'a ord) : 'a option ord = object
  inherit ['a option] ord_base
  method equal = (eq_option (o :> 'a eq))#equal
  method compare x y = match x with
    None   -> (match y with None -> EQ | Some _ -> LT)
  | Some x -> (match y with None -> GT | Some y -> o#compare x y)
end

let ord_list (o : 'a ord) : 'a list ord = object
  inherit ['a list] ord_base
  method equal = (eq_list (o :> 'a eq))#equal
  method compare =
    let rec compare x y = match x, y with
    | [], [] -> EQ
    | _ , [] -> GT
    | [], _  -> LT
    | x :: xs, y :: ys -> match o#compare x y with
    | EQ -> compare xs ys
    | c  -> c
    in compare
end

As should be clear by now, the encoding exhibits a direct correspondence between Haskell type classes and OCaml's classes and objects:

  • type classes into class types with top-level functions dispatching on a witness object of that class type
  • derived type classes into inheriting class types
  • default implementations into virtual classes
  • instances into objects inheriting from the default virtual class
  • derived instances into objects having witnesses of the appropriate class type

In every case, the witness objects are used purely as closed vtables without state. It would be interesting to investigate which use cases would open up by using row polymorphism or data members. Also, note that there is no need to abstract over the object types, as each witness carries with it the intended implementation in a type-safe manner.

This technique is analogous to the one using functors to encode type classes; but then, in the words of Jacques Garrigue, why have both in the same language? One reason is that functors are not first-class values in OCaml but objects are, which makes using them as type witnesses much more lightweight. On the other hand, one limitation that functors don't have is that this encoding is restricted to first-order type classes. In other words, it is not clear what would be required to implement Haskell type classes like:

instance  (Monad m) => Functor m  where
    fmap f x = bind x (return . f)

where m is a type constructor of kind * → *, using the encoding presented here, whereas it is straightforward to concoct an (ML) functor:

module MonadFunctor (M : MONAD) : FUNCTOR = struct
  type 'a t = 'a M.t
  let fmap f x = M.bind x (fun v -> M.return (f v))
end

Another limitation is the need to pass a witness object to the type-indexed functions. One would ideally want to write something like:

let (===) (x : eq) = x#equal

where the value is its own witness, but obviously this would require that every value be wrapped into an object, requiring, for instance, code like the following:

if (new eq_int 5) === (new eq_int x) then …

Moreover, in order to actually operate on the wrapped values, the classes would need an explicit accessor. OCaml supports double dispatch, which makes it possible to force the types of x and y to be equal:

class type ['a] eq = object ('self)
  method value     : 'a
  method equal     : 'self -> bool
  method not_equal : 'self -> bool
end

class eq_int x : [int] eq = object (self : 'self)
  method value                    = x
  method equal     (that : 'self) = self#value == that#value
  method not_equal (that : 'self) = self#value != that#value
end

Canonicalizing the witnesses for finite types like bool is very tedious:

let eq_bool : bool -> bool eq =
  let _t : bool eq = object
    method value          = true
    method equal     that =      that#value
    method not_equal that = not (that#value)
  end and _f : bool eq = object
    method value          = false
    method equal     that = not (that#value)
    method not_equal that =      that#value
  end in fun b -> if b then _t else _f

(essentially, a Church-encoded boolean). More importantly, this encoding is much more dynamic, making it rather remote with respect to the original, fully static Haskell semantics (this, incidentally, is a definite advantage of the functorial encoding).

In closing, it would be interesting to see how far this technique can be carried, especially to implement a numeric tower analogous to Haskell's, or maybe a better type class hierarchy to encode would be that of the Numeric Prelude. Another necessary avenue of investigation is measuring the abstraction penalty incurred by this encoding. In any case, I believe this is a neat showcase of the often-forgotten object-oriented features in OCaml.

2009-07-21

(Dis)Functional Bowling

It is saddening to see people invested so heavily in cargo-cult programming practices that they cannot see natural solutions to simple problems after having their thoughts twisted by object-orientation as an end in itself and TDD as a fighting technique. Uncle Bob is struggling (and failing) to approach functional programming in the natural way, hindered by what seem to be his biases and commercial interests. His Bowling Game Kata (warning, PPT; right-click to download) asks for the score of a 10-frame bowling game:

The game consists of 10 frames as shown above. In each frame the player has two opportunities to knock down 10 pins. The score for the frame is the total number of pins knocked down, plus bonuses for strikes and spares.

A spare is when the player knocks down all 10 pins in two tries. The bonus for that frame is the number of pins knocked down by the next roll. So in frame 3 above, the score is 10 (the total number knocked down) plus a bonus of 5 (the number of pins knocked down on the next roll.)

A strike is when the player knocks down all 10 pins on his first try. The bonus for that frame is the value of the next two balls rolled.

In the tenth frame a player who rolls a spare or strike is allowed to roll the extra balls to complete the frame. However no more than three balls can be rolled in tenth frame.

In order to sidestep the problem of validating games for length, I'll consider a bowling game as an infinite number of balls, a finite prefix of which is non-zero:

let repeat x = let rec l = x :: l in l

let game strikes = strikes @ repeat 0

Scoring a frame according to the rules is a matter of pattern matching on the first few balls. According to the result, a complete frame is removed and the rest of the game returned:

let score_frame = function
| 10 :: (n :: m :: _ as g) -> 10 + n + m, g
| n :: m :: (r :: _ as g) when n + m = 10 -> 10 + r, g
| n :: m :: g -> n + m, g

To score a game, I keep track of how many frames I'm into the game, and stop at the tenth:

let rec score frame g =
 if frame = 10 then 0 else
 let n, g = score_frame g in
 n + score (succ frame) g

The game begins at the zeroth frame, obviously:

# score 0 (game [1; 4; 4; 5; 6; 4; 5; 5; 10; 0; 1; 7; 3; 6; 4; 10; 2; 8; 6]) ;;
- : int = 133
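
As an extra sanity check (mine, not in the original), a perfect game of twelve strikes scores 300:

# score 0 (game [10; 10; 10; 10; 10; 10; 10; 10; 10; 10; 10; 10]) ;;
- : int = 300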

10 minutes of thinking, and some looking around to realize that I had mistyped the scorecard. What TDD do you need, Uncle Bob?

2009-07-19

Monadic Golf

Reading Bonsai Code's solutions to Programming Praxis's weekly puzzles (which I can't recommend highly enough) makes me acutely aware of how verbose OCaml is, and how inadequate its standard library is when compared to Haskell's. However, I've found that the latest puzzles yield concisely to a monadic style over lists.

Breaking with my usual literate top-down presentation I'll concentrate on the code required to solve the puzzles and leave the obvious scaffolding to the reader. I'll still putt over the par, especially if scored against Remco's brutally terse solutions, but I hope that what is missing is straightforward to fill in. I've written about point-free style and its limitations in the face of OCaml's value restriction. In this case I'll use monadic style for the solutions as a way to show that procedural expression too has its place in functional programming.

This week's problems are drawn from the International Mathematical Olympiads and are very much in the spirit of Project Euler's problems, yielding nicely to brute-force search. The first:

Determine all three-digit numbers N having the property that N is divisible by 11, and N/11 is equal to the sum of the squares of the digits of N.

can be solved simply by:

let imo_1960_01 =
  range 1 99 >>= fun i ->
  let n = 11 * i in
  guard (sum % fmap square % digits $ n = i) >>
  return n
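
Running it (my check, not in the original) yields the two known solutions:

# imo_1960_01 ;;
- : int list = [550; 803]

Indeed, 550 = 11 · 50 = 11 · (5² + 5² + 0²) and 803 = 11 · 73 = 11 · (8² + 0² + 3²).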

In this solution and the ones that follow I express the problem in monadic terms via fmap, return, bind (written >>=, with >> as its value-discarding variant) and guard. Here % is composition, and range, sum, square and digits are obvious. Equally pithy is the solution to the second problem:

Find the smallest natural number n which has the following properties:

  1. Its decimal representation has 6 as the last digit.
  2. If the last digit 6 is erased and placed in front of the remaining digits, the resulting number is four times as large as the original number n.

Since n = 10*k + 6, the condition is equivalent to asking that 4*(10*k + 6) = 6*10^b + k, where b = ν(k) the number of decimal digits in k. Simplifying, the problem is equivalent to finding the smallest integer k = 2/13*(10^b - 4) with exactly b digits. In code:

let imo_1962_01 =
  range 1 9 >>= fun b ->
  let e = pow 10 b in
  guard (e mod 13 = 4) >>
  let k = (e - 4) / 13 * 2 in
  guard (List.length % digits $ k = b) >>
  return (10 * k + 6)
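
Running it (my check) yields the classic answer:

# imo_1962_01 ;;
- : int list = [153846]

Indeed, 4 × 153846 = 615384, which is 153846 with the final 6 moved to the front.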

The number is not too large, and a 31-bit version of pow is sufficient. The third problem will require more scaffolding:

Five students, A, B, C, D, E, took part in a contest. One prediction was that the contestants would finish in the order ABCDE. This prediction was very poor. In fact no contestant finished in the position predicted, and no two contestants predicted to finish consecutively actually did so. A second prediction had the contestants finishing in the order DAECB. This prediction was better. Exactly two of the contestants finished in the places predicted, and two disjoint pairs of students predicted to finish consecutively actually did so. Determine the order in which the contestants finished.

This is more of a word problem than a combinatorial one, and as such it is not very straightforward; to brute-force it I'll need a number of auxiliary functions. First, a way to list all permutations:

let rec selections = function
| [] -> []
| x :: xs ->
  (x, xs) :: List.fold_right (fun (y, ys) l ->
    (y, x :: ys) :: l) (selections xs) []

let rec permutations = function
| ([] | [_]) as l -> [l]
| l ->
  List.fold_right (fun (y, ys) ->
    List.fold_right (fun zs l -> (y :: zs) :: l)
    (permutations ys)) (selections l) []

The first condition asks for permutations having no fixed points, or derangements. I need a way to single derangements out:

let count_fixpoints l p =
  List.fold_left (fun s (x, y) ->
    if x = y then succ s else s) 0 (List.combine l p)

let is_derangement l p = count_fixpoints l p = 0

Lastly, in order to filter consecutive positions, I need a way to generate them and filter them out:

let intersect l m = List.filter (fun x -> List.mem x l) m

let rec pairup = function
| [] | [_] -> []
| x :: (y :: _ as xs) -> (x, y) :: pairup xs

The solution to the problem is a word-for-word translation of the problem's conditions:

let imo_1963_01 =
  let prediction  = ['D'; 'A'; 'E'; 'C'; 'B'] in
  let contestants = List.sort compare prediction in
  let all_pairs   = pairup contestants
  and pred_pairs  = pairup prediction in
  permutations contestants >>= fun p ->
  guard (is_derangement  contestants p
      && count_fixpoints prediction  p = 2) >>
  let pp = pairup p in
  guard (List.length (intersect all_pairs pp) = 0) >>
  guard (match intersect pred_pairs pp with
  | [(x, y); (z, t)] -> y <> z && t <> x
  | _ -> false) >>
  return p

that is, the solution is to be found among all the permutations of the contestants which are derangements and have exactly two positions in common with the prediction. Of these candidates, the solution must have no pair in common with the pairs in the sorted list of contestants, and must have two disjoint pairs in common with the prediction.
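
Running it (my check, not in the original) produces the expected unique finishing order, EDACB:

# imo_1963_01 ;;
- : char list list = [['E'; 'D'; 'A'; 'C'; 'B']]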

Some would argue, I'm sure, that monadic code is not purely functional, or that it is too rigidly laid out by the sequential nature of monadic binding. I think it is ideal for solving these word problems, since I find that the solution closely follows the conditions laid out in the statement. All in all I've left out less than 40 lines of perfectly obvious support code, and gave solutions of 5, 7 and 30-odd lines. It was a fun exercise.

2009-06-17

The Tree Nursery

On my last post, Benedikt Grundmann asked if changing representations for trie nodes, in particular eliminating 'a option-boxed references, would represent a speed-up or not. I started tinkering with a direct representation:

type 'a t =
{ value : 'a option;
  split : char;
  lokid : 'a t;
  eqkid : 'a t;
  hikid : 'a t; }

(call this representation trie1). This forces me to use a sentinel node to represent empty trees:

let rec empty = { value = None; split = '\255'; lokid = empty; eqkid = empty; hikid = empty; }

let is_empty t = t == empty

(the split value is entirely arbitrary and never relied upon). This recursive structure means that I cannot use pattern matches to discriminate and deconstruct nodes, and all comparisons must be strictly by reference equality on the carefully preserved unique sentinel node. This represented a rough 10% speedup on my original code.

Meanwhile, Mauricio Fernández tried his hand at it, and came up with a tantalizing result:

in the lapse between the present post and the preceding one, I implemented ternary trees with different constructors for the nodes with and without a value. In my tests, construction is 50% faster, and lookups are between 20% and 80% faster. The code is much less elegant than the one presented here, because it moreover includes some special-casing to handle the "" (empty string) key, which the above code chokes at.

(It's true that the empty string cannot be a key in both the original and the above implementation. I thought it obvious by the tradeoffs I listed in selecting the representation for the key, but truth is I never guarded the operations against it. It's a bug, not a feature). The numbers are too exciting not to try this approach. By carefully teasing apart each case, I arrived at the following data type:

type 'a t = E
          | L of 'a
          | B of      char * 'a t * 'a t * 'a t
          | K of 'a * char * 'a t * 'a t * 'a t

(call this representation trie2). The first constructor E represents an empty tree. The second constructor L represents a leaf with value α. The third constructor B represents a branching node without value. The fourth constructor K represents a marked, valued branching node. Functions over this type require eight cases in general: for each of the four constructors, one for the middle of the key and one for its end.

I'll present the benchmark results first, then the code. In every case I've run the benchmark three times in sequence from the command line and retained the last measurement. My test machine is an MSI U100 with a 1.6 GHz Intel Atom and 2 GB RAM running Mac OS X 10.5.7, and the code is compiled with ocamlopt -g -inline 10 -w Aelz -pp camlp4o version 3.11.0. Each run comprises three tests using the standard height-balanced Map tree and the trie1 and trie2 ternary trees, with a Gc.full_major collection at the start of each. The test data sets are the 234936-word /usr/share/dict/words file in Mac OS X, and an 89546-word Spanish lexicon, with 3.91% overlap between both. The tests measure:

  1. Insertion of all the English words into an empty tree via a fold_left
  2. Retrieval of the list of all inserted words via a S.fold (with validation of length and sortedness outside the timing)
  3. Retrieval of all the English words
  4. Retrieval of all the Spanish words
  5. Deletion of all the English words from the tree via a fold_left (with validation that the resulting tree is empty)

These are the results for the list of words in dictionary order:

Test of 89546-word set in dictionary order. In ops/s.
Data Structure    Insert        List All        Find All      Find Sp.        Delete All
Map               167,105.77    2,056,784.27    254,101.92      225,067.88    802,020.62
trie1             184,619.90      373,794.66    981,049.90    1,201,493.12    273,188.70
trie2             232,617.76      468,132.42    791,158.14      984,052.05    371,658.90

These are the results for the list of words in median order, that is the list with the median first followed by the left and right sublists both in median order:

Test of 89546-word set in median order. In ops/s.
Data Structure    Insert        List All        Find All        Find Sp.        Delete All
Map               168,873.28    2,100,585.38      251,426.31      224,190.06    318,687.33
trie1             270,556.35      352,670.96    1,669,707.66    2,139,124.18    707,988.31
trie2             329,939.38      462,198.78    1,385,399.36    1,692,227.60    804,071.63

These are the combined results, normalized so that the corresponding operations on Map are 1:

Test of 89546-word set both in dictionary and in median order. Relative to Map
Data Structure    Insert     List All    Find All    Find Sp.    Delete All
trie1 (dict)      1.10481    0.181737    3.86085     5.33836     0.340626
trie2 (dict)      1.39204    0.227604    3.11355     4.37225     0.463403
trie1 (median)    1.60213    0.167892    6.64094     9.54157     2.22158
trie2 (median)    1.95377    0.220033    5.51016     7.54818     2.52307

Some things are easy to see and to explain:

  • The insertion code is faster for ternary trees than for the height-balanced tree, as there is no need to maintain expensive shape invariants
  • The ternary trees highly favor lookups, especially by non-present keys
  • Insertion in sorted order is pessimal for ternary trees. Even so, they remain very fast, as claimed by Bentley and Sedgewick
  • The fold reconstructs the keys one character at a time by appending to the current prefix, which is quadratic in the prefix length. This accounts for the speed being 20% of the corresponding Map.fold
  • The shape of the ternary trees depends on the order of key insertion in a dramatic way
  • The shape of height-balanced trees also depends heavily on the order of key insertion

Some other things are not so easy to understand. In particular, it is not easy to see why trie2 is 20%–30% faster than trie1 except for lookups, where it is 20% slower.

For comparison, here's trie1's lookup:

let lookup k t =
  let n = String.length k in
  if n == 0 then failwith "lookup" else
  let rec go i t =
    if is_empty t then None else
    let c = k.[i] in
    if t.split > c then go  i    t.lokid else
    if t.split < c then go  i    t.hikid else
    if i + 1   < n then go (i+1) t.eqkid else
                        t.value
  in go 0 t

And here's trie2's:

let lookup k t =
  let n = String.length k in
  let rec go i = function
  | E                             -> None
  | L  v              when i == n -> Some v
  | L  _                          -> None
  | B (   _, _, _, _) when i == n -> None
  | B (   c, l, q, h)             ->
    let c' = k.[i] in
    if  c' < c then go  i    l else
    if  c' > c then go  i    h else
                    go (i+1) q
  | K (v, _, _, _, _) when i == n -> Some v
  | K (_, c, l, q, h)             ->
    let c' = k.[i] in
    if  c' < c then go  i    l else
    if  c' > c then go  i    h else
                    go (i+1) q
  in go 0 t

(It's not that ugly). Twice the number of lines, a slew of cases, it should be obvious why the latter is slower, right? Then compare and contrast: here's trie1's insert:

let add k v t =
  let n = String.length k in
  if n == 0 then failwith "add" else
  let rec go i t =
    if is_empty t then
      if i+1 < n then { empty with split = k.[i]; eqkid = go (i+1) t }
                 else { empty with split = k.[i]; value = Some v     }
    else
    let c = k.[i] in
    if t.split > c then { t with lokid = go  i    t.lokid } else
    if t.split < c then { t with hikid = go  i    t.hikid } else
    if i + 1   < n then { t with eqkid = go (i+1) t.eqkid } else
                        { t with value = Some v           }
  in go 0 t

Here's trie2's:

let add k v t =
  let n = String.length k in
  let rec go i = function
  | E                 when i == n -> L  v
  | E                             -> B (   k.[i], E, go (i+1) E, E)
  | L  _              when i == n -> L  v
  | L  v                          -> K (v, k.[i], E, go (i+1) E, E)
  | B (   c, l, q, h) when i == n -> K (v, c    , l, q         , h)
  | B (   c, l, q, h)             ->
    let c' = k.[i] in
    if  c' < c then B (   c, go i l,          q,      h) else
    if  c' > c then B (   c,      l,          q, go i h) else
                    B (   c,      l, go (i+1) q,      h)
  | K (_, c, l, q, h) when i == n -> K (v, c    , l, q         , h)
  | K (v, c, l, q, h)             ->
    let c' = k.[i] in
    if  c' < c then K (v, c, go i l,          q,      h) else
    if  c' > c then K (v, c,      l,          q, go i h) else
                    K (v, c,      l, go (i+1) q,      h)
  in go 0 t

(Now this is not as pretty). They are exactly the same in shape and in their relation to one another. Some things I've observed:

  • OCaml is not as clever as it should be at compiling or-patterns. Many of the branches in trie2's functions could be coalesced into those, but the speed hit is very consistently noticeable
  • Refining the constructors for each code path pays off in algorithmic ways: the normalization required in remove can be restricted to those branches statically known to have modified children, and the sharing can be made explicit:
    let remove k t =
      let prune = function
      | B (   _, E, E, E) -> E
      | K (v, _, E, E, E) -> L v
      | t                 -> t
      in
      let n = String.length k in
      let rec go i t = match t with
      | E                             -> t
      | L  _              when i == n -> E
      | L  _                          -> t
      | B (   _, _, _, _) when i == n -> t
      | B (   c, l, q, h)             ->
        let c' = k.[i] in
        if  c' < c then prune (B (   c, go i l,          q,      h)) else
        if  c' > c then prune (B (   c,      l,          q, go i h)) else
                        prune (B (   c,      l, go (i+1) q,      h))
      | K (_, c, l, q, h) when i == n -> B (c, l, q, h)
      | K (v, c, l, q, h) ->
        let c' = k.[i] in
        if  c' < c then prune (K (v, c, go i l,          q,      h)) else
        if  c' > c then prune (K (v, c,      l,          q, go i h)) else
                        prune (K (v, c,      l, go (i+1) q,      h))
      in go 0 t
    

    Furthermore, the duality between add and remove is easy to see: the former's base cases introduce L and K nodes, and the latter's turn them back into E and B nodes, respectively.

In any case, I look forward to reading Mauricio's analysis and code. All this goes to show that these ternary trees are excellent data structures to represent string maps, as claimed by Bentley and Sedgewick.

2009-06-15

Simple, Efficient Trie Maps

I became interested in implementing Bentley and Sedgewick's Ternary Trees by reading the challenge proposed on Programming Praxis (a great blog, by the way). I thought the algorithms would be a good fit for a purely functional data structure, and I am happy with the results. Here is my implementation.

I follow the Map.S signature, so that this can be a drop-in replacement for string-keyed maps; the interface (.mli file) is simply:

include Map.S with type key = string

For the implementation I restrict the key space to strings:

type key = string

Trees are optional references to nodes, and these are records containing the mapped value whenever this node represents a full key, the character suffix for the tree and the three branches:

type 'a t = 'a node option
and 'a node =
{ value : 'a option;
  split : char;
  lokid : 'a t;
  eqkid : 'a t;
  hikid : 'a t; }

In this I follow closely the paper, except for a crucial detail: in it, the authors make use of the terminator inherent to every C string, the null byte '\0'. This forces a design decision:

  • emulate the termination character. This has the distinct disadvantage of having to prohibit it from occurring in string keys
  • use a char option as the key. This forces boxing the key into a constructed type and complicates the logic in all the comparisons, which has a negative impact in the inner loops
  • use a char list or string as the key. This has the advantage that it can be adapted to compress the search paths, but it requires generating substrings of everything
  • guard on the string length. This is the choice I took. It complicates the comparison logic somewhat, but it has the advantage of making the node representation more regular and compact, and allowing the full range of characters in string keys

An empty tree is represented by a None reference:

let empty = None

Consequently, the test for an empty tree is a match on the option:

let is_empty = function None -> true | Some _ -> false

This means that I'll be forced to normalize trees on deletion, of which more later. The workhorse function is lookup:

let lookup k t =
  let n = String.length k in
  let rec go i = function
  | None   -> None
  | Some t ->
    let cmp = Char.compare k.[i] t.split in
    if cmp < 0 then go  i    t.lokid else
    if cmp > 0 then go  i    t.hikid else
    if i+1 < n then go (i+1) t.eqkid else
                    t.value
  in go 0 t

It is as simple as it can be given the choices I outlined above. The crucial bit is that the presence or absence of the value indicates whether the key ending at this node is present in the tree. The signature's retrieval and membership functions are simply wrappers around this:

let find k t = match lookup k t with
| Some x -> x
| None   -> raise Not_found

let mem k t = match lookup k t with
| Some _ -> true
| None   -> false

Inserting a key-value pair in the tree is a bit more involved:

let add k v t =
  let n = String.length k in
  let rec go i = function
  | None when i+1 < n ->
    Some { value = None  ; split = k.[i]; lokid = None; eqkid = go (i+1) None; hikid = None }
  | None              ->
    Some { value = Some v; split = k.[i]; lokid = None; eqkid =          None; hikid = None }
  | Some t            ->
    let cmp = Char.compare k.[i] t.split in
    if cmp < 0 then Some { t with lokid = go  i    t.lokid } else
    if cmp > 0 then Some { t with hikid = go  i    t.hikid } else
    if i+1 < n then Some { t with eqkid = go (i+1) t.eqkid } else
                    Some { t with value = Some v }
  in go 0 t

The case when the end of a branch is reached is inlined, so that a descending branch keyed on the key's suffix is built until reaching its last character, which stores the keyed value. Otherwise, the suffix is compared to the node key, and the appropriate child is built. Note that the recursively built subtree replaces the corresponding child in the node; this is what makes this data structure purely functional and persistent. Removing a mapping is similar, but complicated by the need to normalize the tree so that an empty tree is represented by None:

let remove k t =
  let prune = function
  | { value = None; lokid = None; eqkid = None; hikid = None } -> None
  | t -> Some t
  in
  let n = String.length k in
  let rec go i = function
  | None   -> None
  | Some t ->
    let cmp = Char.compare k.[i] t.split in
    if cmp < 0 then prune { t with lokid = go  i    t.lokid } else
    if cmp > 0 then prune { t with hikid = go  i    t.hikid } else
    if i+1 < n then prune { t with eqkid = go (i+1) t.eqkid } else
                    prune { t with value = None             }
  in go 0 t

In this case, prune works somewhat as a smart constructor for node references.

The big advantage of ternary search trees is that, since they are modeled on three-way quicksort, they provide ordered key storage with performance relatively insensitive to the order of insertions. The big disadvantage is that they don't have a canonical form for a given key set, that is, their shape depends on the order in which the keys are inserted. This means that they provide a cheap fold but no easy way to compare trees:

let fold f t e =
  let rec go prefix e = function
  | None   -> e
  | Some t ->
    let e   = go prefix e t.lokid in
    let key = prefix ^ String.make 1 t.split in
    let e   = match t.value with None -> e | Some v -> f key v e in
    let e   = go key    e t.eqkid in
              go prefix e t.hikid
  in go "" e t

Note that the key is built character by character but only on the equal kids. Note also that the resulting fold is guaranteed to be ordered on the keys by the sequence in which the recursion is carried out. The rest of the iteration functions in the signature are trivial wrappers:

let iter f t = fold (fun k v () ->        f k v ) t ()
and map  f t = fold (fun k v    -> add k (f v  )) t empty
and mapi f t = fold (fun k v    -> add k (f k v)) t empty
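
A short session exercises the interface (my example; I assume the module is compiled as Trie). Note that the fold yields the keys in ascending order regardless of insertion order:

# let t = Trie.add "tea" 1 (Trie.add "ten" 2 (Trie.add "in" 3 Trie.empty)) ;;
val t : int Trie.t = <abstr>
# Trie.find "ten" t ;;
- : int = 2
# Trie.fold (fun k _ ks -> k :: ks) t [] ;;
- : string list = ["ten"; "tea"; "in"]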

As I indicated above, ternary search trees have no canonical form, so to compare two trees you have to somehow traverse them in synchronization. One easy but expensive way would be to turn both into key-value lists and compare those; this would require O(min(n, m)) time and O(n + m) space. The best way would be to turn the fold into a continuation-passing iterator that lazily streams keys and traverse both in tandem. I leave it as an exercise to the reader:

let compare _ _ _ = failwith "Not comparable"

let equal eq t t' =
  0 == compare (fun v w -> if eq v w then 0 else 1) t t'

2009-06-14

Algorithm-conscious, cache-oblivious

No matter how credentialed a writer is, he is bound to make a mistake in print sooner or later. Raymond Chen's latest entry was for me oddly synchronous and oddly at variance with my own investigations on Bentley and Sedgewick's Ternary Trees. His apologia for the hash table is technically sound, but only on the surface: there is not a single measurement to back up his assertions. Here is my data.

I pitted four different data structures intended as finite maps on strings: the standard Hashtbl, my implementation of Cuckoo Hash, the standard height-balanced Map and my own implementation of ternary trees as Trie Maps. I used Mac OS X 10.5's /usr/share/dict/words file against a Spanish lexicon of 90,000 entries. I tested both successful and unsuccessful searches by looking up every entry in the original file first, and every entry in the alternate file second. In both cases the files were read into memory and then processed with the algorithm benchmarks. I used a bit of generic programming to have the same benchmark code operate on the four data structures, and in each case the value type was unit, treating the maps effectively as sets.

Here are the results for reading the big file and testing against the small file:

Test of 234936-word set against 89546 queries (3.91% success rate). In ops/s.

Data Structure        Insert        Read      Lookup
Hashtbl           333,993.49  798,238.10  621,221.64
Cuckoo            133,952.08  904,978.43  952,688.49
Map               167,080.10  256,780.34  226,653.87
Trie Map          118,932.34  770,795.15  991,782.69

As you can see, Hashtbl is competitive at inserting data, and Cuckoo Hash is smoking fast at looking it up. This is not a very fair comparison, as these are imperative data structures, whereas both the standard Map and the Trie Map are purely functional. Even so, the Trie Map is only slightly (3.5%) slower in successful searches, and substantially (60%) faster in unsuccessful searches, than Hashtbl.

Unfortunately the champion apparent, the Cuckoo Hash, has a rather fragile disposition. The Spanish lexicon thoroughly broke its insertion routine.

Aside: I used to view with suspicion the lack of termination analysis for the insertion routine, and the very scarce code floating around implementing Cuckoo Hashes. After seeing here a practical case of non-termination for insertion with a dataset of fewer than 4,000 keys, I consider this algorithm completely unsuitable for use. I have confidence that the universal hash I used is essentially random, as it passes standard tests for this data set, and that I implemented the published algorithm faithfully.
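
For reference, the heart of the problem is the displacement loop, which need not terminate unless the number of kicks is explicitly bounded. A schematic sketch with illustrative names, not my actual implementation:

let cuckoo_insert table h1 h2 key =
  let max_kicks = 500 in                    (* arbitrary safety bound *)
  let rec kick key slot n =
    if n > max_kicks then failwith "probable cycle: grow and rehash" else
    match table.(slot) with
    | None        -> table.(slot) <- Some key
    | Some victim ->
      table.(slot) <- Some key;
      (* the evicted key moves to its alternate slot, possibly cycling *)
      let slot' = if slot = h1 victim then h2 victim else h1 victim in
      kick victim slot' (succ n)
  in kick key (h1 key) 0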

Even so, the Ternary Trie remains the victor in both successful and unsuccessful searches, being 60% faster than Hashtbls (and almost 4 times as fast as height-balanced trees) in the latter:

Test of 89546-word set against 234936 queries (1.49% success rate). In ops/s.

Data Structure        Insert        Read        Lookup
Hashtbl           283,416.41  882,628.51    808,735.29
Cuckoo                   n/a         n/a           n/a
Map               191,458.29  280,610.94    263,735.97
Trie Map          108,718.90  889,076.24  1,300,517.50

What is the take-away lesson in this? First, do not let expert advice lull you into complacency. Take the time to experiment and analyze for yourself: it is not only intellectually challenging and rewarding, it might be a store of surprises (and of blogging material) for you.

2009-06-12

Hash 'n Mix

How good is the hash function in your language of choice, particularly for strings? Chances are, not very. I applied the classical statistical tests popularized by Knuth and I was very surprised to see how bad OCaml's hash function on strings is.

I applied the chi-square test on the distribution of hashes over /usr/share/dict/words, for almost 235,000 words. I binned the integers from the most significant bit down, taking successively more and more bits into account, and tested the resulting distribution against a uniform distribution. Here are the results:

χ² test of 234936 hashes with Hashtbl.hash

Bins (ν)              X²   Pr[χ² ≤ X²]
       2    5250.0017026     1.0000000
       4    6491.6332618     1.0000000
       8    9310.9851194     1.0000000
      16   20682.9648245     1.0000000
      32   42563.4542173     1.0000000
      64   88265.1637552     1.0000000
     128   94121.1397827     1.0000000
     256  155277.5725134     1.0000000
     512  209363.2757857     1.0000000
    1024  317909.0541084     1.0000000
    2048  389422.3450812     1.0000000
    4096  448253.5283822     1.0000000
    8192  663614.9166071     1.0000000
   16384  887812.2319610     1.0000000
   32768 1037118.9427589     1.0000000

The first column indicates how many bins were used, which gives the degrees of freedom (ν) of the reference distribution. The second column shows the computed statistic X². The third column shows the probability that a χ² deviate with ν degrees of freedom would be at most this value. Under Knuth's criterion, an entry fails if this probability lies in the first or last percentile, and is suspect if it lies in the first five or last five percentiles; by that standard, every single row above fails.
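
For concreteness, each row's statistic can be computed along these lines. A sketch under my assumptions: the hashes land in [0, 2^30), as the masking below suggests, and each row bins the k most significant bits.

let chi_square k hashes =
  let nbins = 1 lsl k in
  let count = Array.make nbins 0 in
  List.iter (fun h ->
    let b = h lsr (30 - k) in          (* top k bits of a 30-bit hash *)
    count.(b) <- count.(b) + 1) hashes;
  let n = float (List.length hashes) in
  let e = n /. float nbins in          (* expected bin count if uniform *)
  Array.fold_left (fun s o -> s +. (float o -. e) ** 2. /. e) 0. count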

Pretty dismal, isn't it? The Kolmogorov-Smirnov test is equally appalling: the probability that the hashes deviate this far above a uniform distribution is p+ = 0.0016593, and the probability that they deviate this far below it is p- = 1.0000000, both of which are extreme.

Postprocessing the hashes with a simple mixing function helps matter considerably. I use the Murmur hash specialized to a single 32-bit value:

let murmur_hash_32 k h =
  let m = 0x5bd1e995l in
  let k = Int32.mul k m in 
  let k = Int32.logxor k (Int32.shift_right_logical k 24) in 
  let k = Int32.mul k m in 
  let h = Int32.mul h m in 
  let h = Int32.logxor h k in
  let h = Int32.logxor h (Int32.shift_right_logical h 13) in
  let h = Int32.mul h m in
  let h = Int32.logxor h (Int32.shift_right_logical h 15) in
  h

let universal_hash i h =
  let r = murmur_hash_32 (Int32.of_int h) (Int32.of_int (succ i)) in
  Int32.to_int (Int32.logand r 0x3fffffffl)
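
In use, the i-th member of the family applied to a string would then be, schematically (an illustrative helper, not from the original code):

let hash_string i s = universal_hash i (Hashtbl.hash s)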

Since Murmur hash achieves full 32-bit avalanche, I can use it to construct a universal hash function. The results of applying this mixing function to Hashtbl.hash are much more encouraging:

χ² test of 234936 hashes with mixing function

Bins (ν)              X²   Pr[χ² ≤ X²]
       2       0.0360268     0.1505399
       4       4.9407498     0.8238125
       8      11.4643648     0.8803928
      16      17.1057309     0.6874168
      32      25.6769844     0.2633473
      64      55.6071781     0.2655752
     128     121.7178466     0.3843221
     256     250.5078489     0.4322981
     512     480.1555487     0.1675241
    1024     934.7566997     0.0230148
    2048    1983.7491061     0.1614639
    4096    3951.4456363     0.0550442
    8192    8032.6611503     0.1074982
   16384   16249.4879286     0.2308936
   32768   32526.7885722     0.1741126

Much better, except maybe for the distribution on 1024 bins (the 10 most significant bits). The Kolmogorov-Smirnov test shows that the probability that the hashes deviate this far above a uniform distribution is p+ = 0.7688490, and the probability that they deviate this far below it is p- = 0.4623243, which are much more balanced.

Know your hash function. It should be random.

2009-06-05

Euler's Power Riffs

Ed Sandifer is an exegete of Euler's methods. In his latest "How Euler Did It", he shows the method through which he arrived at closed polynomials for the sums of n-th powers of the first x integers:

In general, Euler has us multiply the terms of Σxⁿ by (n + 1)/(n + 2) x, (n + 1)/(n + 1) x, (n + 1)/n x, … (n + 1)/2 x and (n + 1)/1 x, respectively, and add the appropriate linear term to get the formula for Σxⁿ⁺¹.

where Σxⁿ is Euler's notation for the sum in question. The algorithm is very easily translated to a short OCaml function. First, I'll need a way to generate an integer range:

let range i j =
  let rec go l j =
    if j < i then l else
    go (j :: l) (pred j)
  in go [] j

Simple, eager, tail-call; almost imperative, definitely dirty. With that, Sandifer's rendition of Euler's method is simply:

open Num

let rec eusum n =
  if n = 0 then [Int 0; Int 1] else
  let l = eusum (pred n)
  (* n/1 n/2 n/3 ... n/(n+1) *)
  and k = List.map (fun d -> Int n // Int d) (range 1 (n + 1)) in
  (* the lowest coefficient is that of x^1 and always 0 *)
  let _ :: xs = List.map2 ( */ ) k l in
  (* normalize the sum of coefficients to 1 *)
  let x = List.fold_left ( -/ ) (Int 1) xs in
  Int 0 :: x :: xs

I want to clarify a couple of subtleties in the implementation. First, I use lists of Nums representing dense polynomials, least-order coefficient first. The base case is the sum of x ones (exponent n = 0), namely x itself. As the article points out, the constant term is always zero, because a sum of zero terms is zero.

Then, the general case is built out of Σxⁿ (called l) and the terms with which to multiply it (called k, in reverse order to that reported by Sandifer, which follows Euler in putting the highest power first). Both polynomials are multiplied term by term with map2; the degree, however, is not yet raised at this stage. Also, the lowest term is always zero by the previous observation, and I discard it with a (knowingly non-exhaustive) pattern match (I could have used List.tl, but I prefer not to, in general). A simple fold_left computes the coefficient α of the linear term, and the polynomial is completed with it and the zero constant term, thus raising the degree by one (one term discarded, two added).

That's it, seven lines of code. Regrettably, it is not of much use as it is:

# eusum 4 ;;
- : Num.num list =
[Int 0; Ratio <abstr>; Int 0; Ratio <abstr>; Ratio <abstr>; Ratio <abstr>]

Bummer, abstract data types. It turns out, pretty-printing polynomials is surprisingly tedious to get right:

let pp_print_poly pp l =
  let pow = ref 0
  and any = ref false in
  Format.pp_open_box pp 1;
  List.iter (fun k ->
    if k <>/ Int 0 then begin
      if !any then Format.pp_print_space pp ();
      Format.pp_open_hbox pp ();
      (* Print coefficient *)
      if k </ Int 0 then Format.pp_print_char pp '-' else
      if !any then Format.pp_print_char pp '+';
      if !any then Format.pp_print_space pp ();
      let n = Num.abs_num k in
      if n <>/ Int 1 || !pow = 0 then Format.pp_print_string pp (Num.string_of_num n);
      (* Print formal power *)
      if n <>/ Int 1 && !pow > 0 then Format.pp_print_char pp '*';
      if !pow > 0 then Format.pp_print_char pp 'x';
      if !pow > 1 then Format.fprintf pp "^%d" !pow;
      Format.pp_close_box pp ();
      any := true
    end;
    incr pow
  ) l;
  Format.pp_close_box pp ()

I follow some very general rules for writing pretty printers. First, always accept the pretty printing channel pp as a parameter for maximum generality (you can then use it with #install_printer in the top-level). Second, always box everything. In this case I use a vertical-or-horizontal box around the entire polynomial, with one space for continuation lines, and a horizontal-only box around monomials.

As is evident here, the usual rules for polynomials make the function logic-heavy: the first term takes the sign of its coefficient, but subsequent monomials are added or subtracted with the operation sign set off by spaces, and the coefficient taken in absolute value. A coefficient of 1 is not shown, unless it belongs to the constant term; the variable is shown only for powers greater than zero, and the exponent only for powers greater than one. If both the coefficient and the variable are shown, they are separated by the multiplication sign.

With this it is easy to recreate the table in page 2 of the paper:

# let () =
    Format.open_box 1;
    List.iter (fun i ->
      pp_print_poly Format.std_formatter (eusum i);
      Format.print_newline ()) (range 0 8);
    Format.close_box ()
  ;;
x
1/2*x + 1/2*x^2
1/6*x + 1/2*x^2 + 1/3*x^3
1/4*x^2 + 1/2*x^3 + 1/4*x^4
-1/30*x + 1/3*x^3 + 1/2*x^4 + 1/5*x^5
-1/12*x^2 + 5/12*x^4 + 1/2*x^5 + 1/6*x^6
1/42*x - 1/6*x^3 + 1/2*x^5 + 1/2*x^6 + 1/7*x^7
1/12*x^2 - 7/24*x^4 + 7/12*x^6 + 1/2*x^7 + 1/8*x^8
-1/30*x + 2/9*x^3 - 7/15*x^5 + 2/3*x^7 + 1/2*x^8 + 1/9*x^9

As noted by Euler, the coefficient of the linear term is a Bernoulli number. There are more efficient ways to compute them, but in a pinch this serves:

let bernoulli n = let _ :: b :: _ = eusum n in b
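
For instance, matching the linear coefficient of the n = 6 row in the table above:

# Num.string_of_num (bernoulli 6) ;;
- : string = "1/42"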

Immutability Redux

My last post generated a ton of very interesting discussion. I'd like to summarize it here to bring the topic to closure.

In view of the excellent analysis performed by Mauricio Fernández, Ketan's comment:

At this point, you don't have the data to say what you said. You could just as easily say OCaml penalizes mutable data, or, less judgmentally, that it is less well-optimized for that use case.

is entirely justified. His first observation, that "[I have] more data to support that the performance disparity is due to the idiom rather than the programming language", was as misunderstood by me as it was insightful.

Mauricio did an in-depth analysis of the disparity and reported about it first on reddit, then on his own blog (which I cannot recommend highly enough). His follow-up is fuller than his first analysis, but I'll paraphrase it anyway for completeness's sake. OCaml's garbage collector is of the generational variety. This means that it works under the generational hypothesis: the most recently created objects are also those most likely to become unreachable quickly. In order to exploit this fact, it divides the heap into at least two regions: the young space, with newly created objects that mostly point to themselves and to older objects, and the old space, comprising objects that have survived a number of collection phases. Any time an old object points to a young object, it must be added to the remembered set, a list of roots that allows the collector to quickly decide what is reachable without scanning the entire old space.

Mauricio shows that my "synthetic business entity update benchmark" violates the generational hypothesis: since floats are in general boxed, updating a mutable float field allocates a fresh box holding the new value. After enough allocations, the long-lived mutable record is promoted to the old space, from which it points to the young space containing the box. Every mutation then adds to the remembered set via the caml_modify write barrier. His post shows ways to ameliorate this penalty by exploiting the conditions under which OCaml unboxes floats, in particular by using records comprised entirely of floats, mutable or immutable (in this case, just one field suffices). He followed up with an in-depth analysis of the write barrier and a simple optimization that gains up to 50% in speed on synthetic benchmarks, and an excellent 32% on this particular one.
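
Concretely, the amelioration rests on OCaml's flat representation of records whose fields are all floats. A minimal sketch of the shape of the fix, with illustrative names:

(* an all-float record is stored unboxed, so this mutation is a plain
   in-place store and never goes through caml_modify *)
type account = { mutable balance : float }

let deposit a x = a.balance <- a.balance +. x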

It is hoped that the OCaml team continues improving the performance of the garbage collector (indeed, some people are asking very vocally for different kinds of overhaul to it). Meanwhile, instead of recommending workarounds or speed-ups, I'd say "go with the idioms best supported by the language". In this case, indeed, it pays off to use immutable records in OCaml. This shouldn't be extrapolated to a general methodology applicable to other programming languages.

2009-05-26

Is Immutable Data Slow?

Is immutable data always slower than mutable state? I was challenged in a discussion over at Reddit about this issue, and I decided to perform a little experiment to back up a rough estimate with some numbers. I was surprised and delighted to see that I could make my case very strong indeed by showing that no, immutable data can be twice as fast as mutable data in a modern functional language.

I concocted a trivial synthetic benchmark using simple records intended to emulate typical business entities:

module P0 = struct
  type person = { name: string; age: int; balance: float }
end

module P1 = struct
  type person = { name: string; mutable age: int; mutable balance: float }
end

let test0 =
  let module E = struct
    open P0
    let rec test p iters =
      if iters == 0 then p.balance else
      test { p with balance = p.balance +. 1. } (iters - 1)
  end in E.test { P0.name="John Doe"; P0.age=19; P0.balance=100.}

let test1 =
  let module E = struct
    open P1
    let test p iters =
      for i = 1 to iters do
        p.balance <- p.balance +. 1.
      done
  end in E.test { P1.name="John Doe"; P1.age=19; P1.balance=100.}

The module P0 encapsulates an immutable business record; test0 is the attendant test that uses functional record update and recursion in a classically pure idiom. The module P1, on the other hand, with its corresponding test test1, shows the typical imperative use of a mutable record. The test and timing harness for both functions is simple enough:

let test times f iters =
  let best = ref infinity in
  for i = 1 to times do
    let t = Sys.time () in
    let _ = f iters in
    let t = Sys.time () -. t in
    if !best > t then
      best := t
  done;
  float iters /. !best

let () =
  let iters = 1_000_000 in
  Printf.printf "Immutable record: %f updates/s\n"
    (test 5 test0 iters);
  Printf.printf "Mutable record:   %f updates/s\n"
    (test 5 test1 iters)

I tested this code with the OCaml bytecode compiler on a 1.6/2.0 GHz Intel Atom computer with Windows XP, and compiled with both the bytecode and native compilers, without optimization, on a 2.2 GHz Core2 Duo MacBook Pro computer with Mac OS X 10.5.7. The results are as follows:

System                              Immutable (upd/s)  Mutable (upd/s)  Speedup (Δ%)
1.6 GHz Atom/Win XP/Bytecode                3,048,780       3,558,719          16.73
2.0 GHz Atom/Win XP/Bytecode                4,000,000       4,566,210          14.16
2.2 GHz Core2 Duo/Mac OS/Bytecode          17,750,324      20,418,581          15.03
2.2 GHz Core2 Duo/Mac OS/Native           128,882,588      58,400,981         -54.69

The speedup afforded by imperative mutation in bytecode is modest, in all cases around 15%. Under the native compiler, on the other hand, mutation is penalized: the immutable version runs more than twice as fast.

Of course, my correspondent was right in that the situation with current object-oriented languages is very different. I translated the benchmark straightforwardly into Java, and the result is what "common sense" would let us expect: Java is heavily oriented to mutation, or rather, it penalizes immutability quite heavily. I tried both client and server VMs, under the Windows environment detailed above:

System                        Immutable (upd/s)  Mutable (upd/s)  Speedup (×)
1.6 GHz Atom/Win XP/Client           22,831,050     107,526,882         4.71
1.6 GHz Atom/Win XP/Server           37,735,849     322,580,645         8.55

The difference is dramatic: mutation is between 4.7 and 8.5 times faster than functional update in Java, and even more than twice as fast as native OCaml code running on a faster machine. Moral of the story: test your assumptions with experiments. The benchmark code is:

public final class MutTest {
  private static final class Person0 {
    public final String name;
    public final int age;
    public final double balance;
    
    public Person0(String name, int age, double balance) {
      this.name = name;
      this.age = age;
      this.balance = balance;
    }
    
    public Person0 deposit(double amount) {
      return new Person0(this.name, this.age, this.balance + amount);
    }
  }
  
  private static final class Person1 {
    public String name;
    public int age;
    public double balance;
    
    public Person1(String name, int age, double balance) {
      this.name = name;
      this.age = age;
      this.balance = balance;
    }
    
    public void deposit(double amount) {
      this.balance += amount;
    }  
  }
  
  private interface Test {
    double test(int iters);
  }

  private static final Test TEST0 = new Test() {
    public double test(int iters) {
      Person0 person = new Person0("John Doe", 19, 100.0);
      for (int i = 0; i != iters; i++) {
        person = person.deposit(1.0);
      }
      return person.balance;
    }
  };
  
  private static final Test TEST1 = new Test() {
    public double test(int iters) {
      Person1 person = new Person1("John Doe", 19, 100.0);
      for (int i = 0; i != iters; i++) {
        person.deposit(1.0);
      }
      return person.balance;
    }
  };
  
  private static double test(int times, Test test, int iters) {
    long best = Long.MAX_VALUE;
    double balance;
    for (int i = 0; i != times; i++) {
      long now = System.currentTimeMillis();
      balance = test.test(iters);
      now = System.currentTimeMillis() - now;
      if (best > now)
        best = now;
    }
    return (double) iters / ((double) best / 1000.0);
  }

  public static void main(String[] args) {
    final int iters = 10000000;
    System.out.printf("Immutable record: %f updates/s\n", test(5, TEST0, iters));
    System.out.printf("Mutable record:  %f updates/s\n", test(5, TEST1, iters));
  }
}

2009-05-14

A Polymorphic Question

Chris Conway asked a question to which I replied, unwittingly, with a reference to a thread he had started on the OCaml mailing list exactly two years ago: what is the difference between the types 'a . int -> 'a vec -> 'a and int -> 'a vec -> 'a?

Strictly speaking, the latter is, in isolation, ill-formed: the type variable 'a is unbound! The convention is to implicitly universally quantify all unbound type variables in the type. This is the so-called prenex restriction:

The most popular of these is the let-polymorphism of ML […] which is sometimes called prenex polymorphism because it can be viewed as a fragment of System F in which type variables range only over quantifier-free types (monotypes) and in which quantified types (polytypes, or type schemes) are not allowed to appear on the left-hand side of arrows.

Pierce, Benjamin C. Types and Programming Languages, p. 356.

Suppose the latter type were a complete type abbreviation, to avoid dealing with the label:

type 'a t = int -> 'a vec -> 'a 

which as a TaPL-style type (a "tree" type) would be equivalent to a definition

∀X.(T = int→Vec(X)→X)

You cannot directly use such a definition as is; the only thing you can do is to instantiate it with ("apply it" to) some type before expanding the definition, for instance:

   (∀X.(T = int→Vec(X)→X))[X:=U]
→β
   T = int→Vec(U)→U

In other words, there is no way to expand the right-hand side of T while keeping the variable X uninstantiated.

The first type, on the other hand, written as a type abbreviation would be:

type t = 'a . int -> 'a vec -> 'a 

which as a tree type would be equivalent to the definition:

T = ∀X.int→Vec(X)→X

Now with this equation you can immediately replace T by its right-hand side, which would introduce a universal quantification as a subterm of the type being substituted into.

Seen in another way, and to bring it closer to the actual examples being discussed, labels are functions from records to the corresponding type. For instance:

type 'a t = { index : int -> 'a vec -> 'a }

introduces the function index: ∀X.T→int→Vec(X)→X (note the position of the quantification). This is exactly the first definition I wrote, but with definitional equality replaced by the functional arrow. Now a polymorphic record type

type t = { index: 'a . int -> 'a vec -> 'a }

introduces a different function index: T→∀X.(int→Vec(X)→X), which corresponds term by term to the second "tree" definition.
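
To see the same difference in code rather than in trees, here is a small self-contained illustration; lists stand in for vectors, and all the names are mine:

type 'a vec = 'a list

(* prenex: the record type is polymorphic, but each value fixes 'a once *)
type 'a indexer = { index_mono : int -> 'a vec -> 'a }

(* quantified field: a single record value works at every 'a *)
type indexer_poly = { index_poly : 'a . int -> 'a vec -> 'a }

let p = { index_poly = fun i v -> List.nth v i }

(* the same p is used at int and at string; with 'a indexer this
   could not type-check, as its 'a is fixed at instantiation *)
let pair = (p.index_poly 0 [1; 2; 3], p.index_poly 1 ["a"; "b"])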

To see it in a whole different way, game semantics help in understanding the differences between linear formulas. In game theoretic terms, the type ∀X.T→int→Vec(X)→X corresponds to the game: "You get to choose and fix an X whatsoever. I give you a function. You give me a T, and I'll give you another function. You give me an int and I'll give you yet another function. You give me a Vec of the Xs you yourself chose, and I'll give you a certain X."

The second type T→∀X.(int→Vec(X)→X) corresponds to a different game: "You give me a T, and I give you back a thing, with the proviso that I reserve the right to choose and fix an X, which if given to you would result in a function." Only then can you give me back an int, and so on. This is what makes X opaque to you.

I hope this helps.

2009-05-13

Okasaki's Random Access Lists

To close the chapter on Okasaki's Random Access Lists, I must record that there is an erratum at the bottom of page 145 of his Purely Functional Data Structures. The argument to the recursive call to update must read Zero (update (i div 2, p, ps)) end. It took me a while to figure this out, thinking that I had made a mistake.

For completeness, here's the translation for the chapter's pseudo-SML into functional, typesafe OCaml:

module Vec : sig
  type 'a t
  val nil : 'a t
  val cons : 'a -> 'a t -> 'a t
  val head : 'a t -> 'a
  val tail : 'a t -> 'a t
  val length : 'a t -> int
  val index : int -> 'a t -> 'a
  val update : int -> 'a -> 'a t -> 'a t
end = struct
  type 'a t = Nil | Zero of ('a * 'a) t | One  of 'a * ('a * 'a) t

  let nil = Nil

  type cons = { cons : 'a . 'a -> 'a t -> 'a t }
  let cons =
    let rec cons = { cons = fun x l -> match l with
    | Nil         -> One (x, Nil)
    | Zero    ps  -> One (x, ps)
    | One (y, ps) -> Zero (cons.cons (x, y) ps)
    } in cons.cons

  type uncons = { uncons : 'a . 'a t -> 'a * 'a t }
  let uncons =
    let rec uncons = { uncons = function
    | Nil          -> failwith "uncons"
    | One (x, Nil) -> (x, Nil)
    | One (x, ps ) -> (x, Zero ps)
    | Zero    ps   ->
      let ((x, y), ps) = uncons.uncons ps in (x, One (y, ps))
    } in uncons.uncons

  let head l = fst (uncons l)
  and tail l = snd (uncons l)

  type length = { length : 'a . 'a t -> int }
  let length v =
    let rec length = { length = function
    | Nil         -> 0
    | One (_, ps) -> 1 + length.length (Zero ps)
    | Zero    ps  -> 2 * length.length ps
    } in length.length v

  type index = { index : 'a . int -> 'a t -> 'a }
  let index n =
    let rec index = { index = fun n l -> match l with
    | Nil                    -> failwith "index"
    | One (x, ps) when n = 0 -> x
    | One (x, ps)            -> index.index (n - 1) (Zero ps)
    | Zero    ps             ->
      let (l, r) = index.index (n / 2) ps in
      if n mod 2 = 0 then l else r
    } in index.index n

  type update = { update : 'a . int -> ('a -> 'a) -> 'a t -> 'a t }
  let update n e =
    let rec update = { update = fun n f l -> match l with
    | Nil                    -> failwith "update"
    | One (x, ps) when n = 0 -> One (f x, ps)
    | One (x, ps)            -> cons x (update.update (pred n) f (Zero ps))
    | Zero ps                ->
      let g (x, y) = if n mod 2 = 0 then (f x, y) else (x, f y) in
      Zero (update.update (n / 2) g ps)
    } in update.update n (fun _ -> e)
end
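
A quick sanity check of the interface (a hypothetical session, not from the original post):

let v = Vec.cons 1 (Vec.cons 2 (Vec.cons 3 Vec.nil))
let () = assert (Vec.length v = 3)
let () = assert (Vec.index 1 v = 2)
let w = Vec.update 1 5 v
let () = assert (Vec.index 1 w = 5 && Vec.index 1 v = 2)  (* persistence *)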

Enjoy!