I've committed a number of errors and inaccuracies in my last post that warrant clarification. The most egregious and urgent is that the definition I gave for `length`

is utterly wrong: the type encodes the length from the inside out, so that its "most significant bit" is the inner constructor. The recursion must take the form of a right fold, not of a left fold as I wrote. The correct function is then:

type length = { length : 'a . 'a vec -> int } let length v = let rec length = { length = function Nil -> 0 | One (_, ps) -> 1 + length.length (Zero ps) | Zero ps -> 2 * length.length ps } in length.length v

Second, the option `-rectypes` is not necessary in OCaml 3.11.0 in order to use non-uniform recursion with this technique. The language permits non-uniform type definitions just fine. But, as Okasaki remarks, "[a]lthough Standard ML allows the definition of non-uniform recursive datatypes, the type system disallows most of the interesting functions on such types." (Okasaki, Chris. Purely Functional Data Structures, p. 143). What `-rectype` does is to disable the so-called occurs check in the typechecking algorithm that forces an error if a type is found to be defined in terms of itself when reconstructing the type of an expression via unification. This is why the types resulting in the compilation of a function using non-uniform recursion have the form

, for a type expression `E`[`α`] as `α``E`.

Third, I spoke of the type of a non-uniform parameter as being existential, which is technically true in this particular case but not the usual characterization of such a type. The way in which the type

is more commonly described is as rank-2:`t`→∀`α`.`E`[`α`]

A type is said to be of rank 2 if no path from its root to a ∀ quantifier passes to the left of 2 or more arrows, when the type is drawn as a tree. For example,

`(∀X.X→X)→Nat`

is of rank 2, as are`Nat→Nat`

and`Nat→(∀X.X→X)→Nat→Nat`

, but`((∀X.X→X)→Nat)→Nat`

is not.(Pierce, Benjamin C. Types and Programming Languages, p. 359)

A type definition `type `

is a function from `α` t = …`α` to the declared type, so it counts as an arrow. Furthermore, there is an encoding of existential types in terms of rank-2 types that for this application suffices. This is the source of my confusion. Abstract signatures implement existential types in OCaml, which is why I said that this technique can be applied by means of a module. Also, OCaml allows rank-2 polymorphism in exactly two places: in record type declarations and in class declarations. So I missed a third way to encode non-uniform recursion in OCaml, namely via objects.

In a comment to the last post, **bluestorm** shows examples of the three encodings, using local module definitions to tidy up the global namespace of types and record labels. I want to compare the efficiency of the three approaches. The following function serves as a simple timing loop:

let test lap f e = let now = Sys.time () in let rec go cnt = let dt = Sys.time () -. now in if dt > lap then dt /. float cnt else (ignore (f e); go (succ cnt)) in go 0

(the toplevel in my machine has an overhead of 4.9µs to execute the empty loop). Note that the result of the function is thrown away; fortunately, OCaml is a strict language. The test would measure the time it takes for different versions of `index`

to find an element. This will require some test data. The simplest way to build a long list is with `range`

:

let range m n = let rec go l i = if i < m then l else go (cons i l) (pred i) in go Nil n

Now everything works as expected:

# length (range 1 500_000) ;;- : int = 500000

The first version of `index`

uses records to encode the rank-2 type:

let index1 n v = let module Index = struct type index = { index : 'a . int -> 'a vec -> 'a } let rec index = { index = fun n l -> match l with Nil -> failwith "index" | One (x, ps) when n = 0 -> x | One (x, ps) -> index.index (n - 1) (Zero ps) | Zero ps -> let (l, r) = index.index (n / 2) ps in if n mod 2 = 0 then l else r } end in Index.index.Index.index n v

(Malkovich Malkovich Malkovich). This is what I wrote in the last post, but now using a local module to encapsulate the type definitions, as per **bluestorm**'s suggestion. The second version uses an object to encode the rank-2 type:

let index2 n v = let index = object (self) method index : 'a . int -> 'a vec -> 'a = fun n l -> match l with Nil -> failwith "index" | One (x, ps) when n = 0 -> x | One (x, ps) -> self#index (n - 1) (Zero ps) | Zero ps -> let (l, r) = self#index (n / 2) ps in if n mod 2 = 0 then l else r end in index#index n v

Objects in OCaml are always self-recursive. The third version uses a recursive module to encode the non-uniform recursion by forcing the type of the second argument to remain generic:

let index3 n v = let module M = struct module rec Index : sig val index : int -> 'a vec -> 'a end = struct let index n l = match l with Nil -> failwith "index" | One (x, ps) when n = 0 -> x | One (x, ps) -> Index.index (n - 1) (Zero ps) | Zero ps -> let (l, r) = Index.index (n / 2) ps in if n mod 2 = 0 then l else r end end in M.Index.index n v

(alas, local modules cannot be recursive). Now, in order to eliminate interference by the garbage collector, I predeclare my test data:

let data = range 1 1_000_000 ;;

I test each function for two seconds:

# test 2. (index1 999_999) data ;;- : float = 1.45106870782848256e-05# test 2. (index2 999_999) data ;;- : float = 1.60240840297084448e-05# test 2. (index3 999_999) data ;;- : float = 1.91896990107750189e-05

The encoding using an object is 10% slower than the one using a record, which is to be expected as the open recursion is performed through a method table. On the other hand, the encoding using a recursive module is almost 25% slower than the one using a record, which is completely unexpected as I would believe the call to be statically resolved. Note also that there is no need to use a local module if the code is in a module bound by an abstract signature. All in all, this shows that these lists are *extremely* fast: It takes less than 15µs to find the last element in the list.

## 5 comments:

Thanks for these posts. Rank-2/existential types in OCaml are sorely under-documented and I've been curious about them for a long time.

Maybe I'm being dense, but could you explain why "type index = { index : 'a . int -> 'a vec -> 'a }" has rank-2 but the plain function type "int -> 'a vec -> 'a" doesn't?

Thanks. That was really interesting posts about these not so much documented nested datatypes.

@Chris: the snippet I quoted from TaPL is not very clear, or is it? Wikipedia doesn't help much either, but it's a starting point for defining terminology.

It is an abuse of language to speak of records as "rank-2 polymorphic". What is polymorphic (with or without rank restriction) are the types of expressions and values. For instance, a function taking or returning an

indexwould have a type where one of its components is quantified. For instance, a functionf : int -> indexwould violate the prenex restriction in that it would have a universally quantified variable in a subterm (the one corresponding toindexif you were to expand the definition), but would be rank-2 polymorphic.Properly speaking, records and objects in OCaml are polymorphic, period, and can appear anywhere in a type. The resulting type doesn't necessarily follow the restrictions for rank-2 polymorphism, so they could be used to encode unbounded rank polymorphism.

As to why polymorphic types can be used to encode existentials, this post is a good overview of the logic behind the equivalence.

Matias: Note that I started that thread on the OCaml list! And here we are, two years later, and I'm still confused! :S

I think I understand Pierce's definition and the examples he gives. What I'm missing is what the tree representation for a record type should be, and how the quantifier inside a record type is different from the (implicit) quantifier on regular polymorphic types. I.e., what is the difference between "'a . int -> 'a vec -> 'a" and just "int -> 'a vec -> 'a" (which would be the type of a naive, uniformly polymorphic implementation of "index")?

@Chris: I can't believe how moronic the comment filter is. I'll post my reply to you as a new entry, as it broke when posted as a comment.

Post a Comment