This blog post and the previous one about functor packs covers two RFCs currently developed by OCamlPro and Jane Street. We previously introduced functor packs, a new feature adding the possiblity to compile packs as functors, allowing the user to implement functors as multiple source files or even parameterized libraries.
In this blog post, we will cover the other aspect of the packs rehabilitation: allowing anyone to implement recursive compilation units using packs (as described formally in the RFC#20). Our previous post introduced briefly how packs were compiled and why we needed some bits of closure conversion to effectively implement big functors. Once again, to implement recursive packs we will need to encode modules through this technique, as such we advise the reader to check at least the introduction and the compilation part of functor packs.
Recursive modules through recursive packs
Recursive modules are a feature long available in the compiler, but restricted to modules, not compilation units. As such, it is impossible to write two files that depend on each other, except by using scripts that tie up these modules into a single compilation file. Due to the internal representation of recursive modules, it would be difficult to implement recursive (and mutually recursive) compilation units. However, we could use packs to implement these.
One common example of recursive modules are trees whose nodes are represented by sets. To implement such a data structure with the standard library we need recursive modules:
Set is a functor that takes as parameter a module describing the values embedded in the set, but in our case the type needs the already applied functor.
module rec T : sig type t = Leaf of int | Node of TSet.t val compare : t -> t -> int end = struct type t = Leaf of int | Node of TSet.t let compare t1 t2 = match t1, t2 with Leaf v1, Leaf v2 -> Int.compare v1 v2 | Node s1, Node s2 -> TSet.compare s1 s2 | Leaf _, Node _ -> -1 | Node _, Leaf _ -> 1 end and TSet : Set.S with type elt = T.t = Set.Make(T)
With recursive pack, we can simply put
TSet into their respective files (
tSet.ml), and tie them into one module (let’s name it
P). Signature of recursive modules cannot be infered, as such we also need to define
tSet.mli. Both must be compiled simultaneously since they refer to each other. The result of the compilation is the following:
ocamlopt -c -for-pack P -recursive t.mli tSet.mli ocamlopt -c -for-pack P -pack-is-recursive P t.ml ocamlopt -c -for-pack P -pack-is-recursive P tSet.ml ocamlopt -o p.cmx -recursive-pack t.cmx tSet.cmx
We have three new compilation options:
-recursiveindicates to the compiler to typecheck all the given
mlis simultaneously, as recursive modules.
-pack-is-recursiveindicates which pack(s) in the hierarchy are meant to be recursive. This is necessary since it determines how the module must be compiled (i.e if we will need to apply closure conversion).
recursive-packgenerates a pack that deals with the initialization of its modules, as for recursive modules.
Recursives modules compilation
One may be wondering why we need packs to compile recursive modules. Let’s take a look at how they are encoded. We will craft a naive example that is simple enough once compiled:
module rec Even : sig val test: int -> bool end = struct let test i = if i-1 <= 0 then false else Odd.test (i-1) end and Odd : sig val test: int -> bool end = struct let test i = if i-1 <= 0 then true else Even.test (i-1) end
It defines two modules
Odd, that both test whether an integer is even or odd, and if that is not the case calls the test function from the other module. Not a really interesting use of recursive modules obviously. The compilation schema for recursive modules is the following:
- First, it allocates empty blocks for each module according to its shape (how many values are bound and what size they need in the block, if the module is a functor and what are its values, etc).
- Then these blocks are filled with the implementation.
In our case, in a pseudo-code that is a bit higher order than Lambda (the first intermediate language of ocaml) it would translate as:
module Even = <allocation of the shape of even.cmx> module Odd = <allocation of the shape of odd.cmx>
Even := <struct let test = .. end> Odd := <struct let test = .. end>
This ensures that every reference to
Odd (and vice-versa) are valid pointers. To respect this schema, we will use packs to tie recursive modules together. Without packs, this means we would generate this code when linking the units into an executable which can be tricky. The pack can simply do it as initialization code.
Compiling modules for recursive pack
If we tried to compile these modules naively, we would end up in the same situation than for the functor pack: the compilation units would refer to identifiers that do not exist at the time they are generated. Moreover, the initialization part needs to know the shape of the compilation unit to be able to allocate precisely the block that will contain the recursive module. In order to implement recursive compilation units into packs, we extends the compilation units in two ways:
- The shape of the unit is computed and stored in the
- As for functor pack, we apply closure conversion on the free variables that are modules from the same pack or from packs above in the hierarchy as long as they are recursive.
As an example, we will reuse our
Odd example above and split it into two units
odd.ml, and compile them into a recursive pack
P. Both have the same shape: a module with a single value.
Even refers to a free variable
Odd, which is in the same recursive pack, and vice-versa. The result of the closure conversion is a function that will take the pointer resulting from the initialization. Since the module is also recursive itself, it takes its own pointer resulting from its initialization. The result will look as something like:
(* even.cmx *) module Even_rec (Even: <even.mli>
)(Odd: <odd.mli> ) = .. (* odd.cmx *) module Odd_rec (Odd: <odd.mli> )(Even: <even.mli>) = .. (* p.cmx *) module Even = <allocation of the shape of even.cmx> module Odd = <allocation of the shape of odd.cmx> Even := Even_rec(Even)(Odd) Odd := Odd_rec(Odd)(Even)
Under the hood, these new features come with some refactoring in the pack implementation which follows work done for RFC on the representation of symbols in the middle-end of the compiler. Packs were not really used anymore and were deprecated by module aliases, this work makes them relevant again. These RFCs improve the OCaml ecosystem in multiple ways:
- Compilation units are now on par with modules, since they can be functors.
- Functor packs allow developers to implement parameterized libraries, without having to rely on scripts to produce multiple libraries linked with different backends (for example, Cohttp can use Lwt or Async as backend, and provides two libraries, one for each of these).
- Recursive packs allow the implementation of recursive modules into separate files.
We hope that such improvements will benefit the users and library developers. Having a way to implement parameterize libraries without having to describe big functors by hand, or use mutually recursive compilation units without using scripts to generate a unique
ml file will certainly introduce new workflows.