OCamlPro has a long history of dedicated efforts to support the development of the OCaml compiler, through sponsorship or direct contributions from Flambda Team. An important one is the Flambda intermediate representation designed for optimizations, and in the future its next iteration Flambda 2. This work is funded by JaneStreet.
Packs in the OCaml ecosystem are kind of an outdated concept (options
-for-pack in the OCaml manual), and their main utility has been overtaken by the introduction of module aliases in OCaml 4.02. What if we tried to redeem them and give them a new youth and utility by adding the possibility to generate functors or recursive packs?
This blog post covers the functor units and functor packs, while the next one will be centered around recursive packs. Both RFCs are currently developed by JaneStreet and OCamlPro. This idea was initially introduced by functor packs (Fabrice Le Fessant) and later generalized by functorized namespaces (Pierrick Couderc et al.).
Packs for the masses
First of all let’s take a look at what packs are, and how they fixed some issues that arose when the ecosystem started to grow and the number of libraries got quite large.
One common problem in any programming language is how names are treated and disambiguated. For example, look at this small piece of code:
let x = "something" let x = "something else"
We declare two variables
x, but actually the first one is shadowed by the second, and is now unavailable for the rest of the program. It is perfectly valid in OCaml. Let’s try to do the same thing with modules:
module M = struct end module M = struct end
The compiler rejects it with the following error:
File "m.ml", line 3, characters 0-21: 3 | module M = struct end ^^^^^^^^^^^^^^^^^^^^^ Error: Multiple definition of the module name M. Names must be unique in a given structure or signature.
This also applies with programs linking two compilation units of the same name. Imagine you are using two libraries (here
lib_b), that both define a module named
ocamlopt -o prog.asm -I lib_a -I lib_b lib_a.cmxa lib_b.cmxa prog.ml File "prog.ml", line 1: Error: The files lib_a/a.cmi and lib_b/b.cmi make inconsistent assumptions over interface Misc
At link time, the compiler will reject your program since you are trying to link two modules with the same name but different implementations. The compiler is unable to differentiate the two compilation units since they define some identical symbols, as such cannot link the program. Enforcing unique module names in the same namespace (i.e. a signature) is consistent with the inability to link two modules of the same name in the same program.
Misc is a common name for a module in any project. How can we avoid that? As a user of the libraries there is nothing you can do, since you cannot rename the modules (you will eventually need to link two files named
misc.cmx). As the developer, you need to ensure that your module names are unique enough to be used along any other libraries. One solution would be to use prefixes for each of your compilation units, for example by naming your files
mylib_misc.ml, with the drawback that you will need to use those long module names inside your library. Another solution is packing your units.
A pack is simply a generated module that appends all your compilation units into one. For example, suppose you have two files
b.ml, you can generate a pack (i.e. a module)
mylib.cmx that is equivalent to:
module A = struct <content of a.ml> end module B = struct <content of b.ml> end
B can retain their original module name, and be accessed from the outside as
Mylib.B. It uses the namespacing induced by the module system. A developer can simply generate a pack for its library, assuming its library name will be unique enough to be linked with other modules without the risk of name clashing. However it has one big downside: suppose you use a library with many modules but only use one. Without packs the compiler will only link the necessary compilation units from this library, but since the pack is one big compilation unit this means your program embeds the complete library.
This problem is fixed using module aliases and the compiler option
-no-alias-deps since OCaml 4.02, and the result for the user is equivalent to a pack, making them more or less deprecated.
Functorizing packs, or how to parameterize a library
Packs being modules representing libraries, a useful feature would be to be able to produce libraries that take modules as parameters, just like functors. Another usage would be to split a huge functor into multiple files. In other words, we want our pack
Mylib to be compiled as:
functor (P : sig .. end) -> struct module A = struct <content of a.ml> end module B = struct <content of b.ml> end end
B would use the parameter
P as a module, and
Mylib instantiated later as
module Mylib = Mylib(Some_module_with_sig_compatible_with_P)
One can notice that our pack is indeed a functor, and not simply a module that binds a functor. To be able to do that, we also extends classical compilation units to be compiled as functors. Such functors are not expressed in the language, we do not provide a syntax for that, they are a matter of options at compile-time. For example:
ocamlopt -c -parameter P m.ml
m.ml as a functor that has a parameter
P whose interface is described in
p.cmi in the compilation path. Similarly, our pack
Mylib can be produced by the following compilation steps:
ocamlopt -c -parameter-of Mylib p.mli ocamlopt -c -for-pack "Mylib(P)" a.ml ocamlopt -c -for-pack "MyLib(P)" b.ml ocamlopt -pack -o mylib.cmx -parameter P a.cmx b.cmx
- The parameter is compiled with the flag
-parameter-of Mylib, as such it won’t be used as the interface of an implementation.
- The two modules packed are compiled with the flag
Expressing the parameter of the pack is mandatory since
Pmust be known as a functor parameter (we will see why in the next section).
- The pack is compiled with
-parameter P, which will indeed produce a functorized compilation unit.
Functors are not limited to a unique parameter, as such they can be compiled with multiple
-parameter options and multiple arguments in
-for-pack. This implementation being on the build system side, it does not need to change the syntax of the language. We expect build tools like dune to provide supports for this feature, making it maybe more easier to use. Moreover, it makes compilation units on par with modules which can have a functor type. One downside however is that we cannot express type equalities between two parameters or with the functor body type as we would do with substitutions in module types.
Functor packs under the hood
In terms of implementation, packs should be seen as a concatenation of the compilation units then a rebinding of each of them in the newly created one. For example, a pack
P of two units
n.cmx is actually compiled as something like:
module P__M = <code of m.cmx> module P__N = <code of n.cmx> module M = P__M module N = P__N
According to this representation, if we tried to naively implement our previous functor pack
Mylib(P) we would end up with a functor looking like this:
module Mylib__A = <code of a.cmx, with references to P> module Mylib__B = <code of b.cmx, with references to P> functor (P : <signature of p.cmi>) -> struct module A = Mylib__A module B = Mylib__B end
Unfortunately, this encoding of functor packs is wrong:
P is free in
b.cmx and its identifier cannot correspond to the one generated for the functor retrospectively. The solution is actually quite simple and relies on a transformation known as closure conversion. In other words we will transform our modules into functors that takes as parameters their free variables, which in our case are the parameters of the functor pack and the dependencies from the same pack. Let’s do it on a concrete functor equivalent to Mylib:
module Mylib' (P : P_SIG) = struct module A = struct .. <references to P> end module B = struct .. <references to P> <references to A> end end
Our goal here is to move
B outside of the functor, as such out of the scope of
P, which is done by transforming those two modules into functors that takes a parameter
P' with the same signature as
module A_funct (P' : P_SIG) = struct .. <references to P as P'> end module B_funct (P' : P_SIG) = struct module A' = A_funct(P') .. <references to P as P'> <references to A as A'> end module Mylib' (P : P_SIG) = struct module A = A_funct(P) module B = B_funct(P) end
While this code compiles it is not semantically equivalent.
A_funct is instantiated twice, its side effects are computed twice: the first time when instantiating
A in the functor, and the second when instantiating
B. The solution is simply to go further with closure conversion and make the result of applying
P an argument of
module A_funct (P' : P_SIG) = struct .. <references to P as P'> end module B_funct (P' : P_SIG)(A': module type of A_funct(P'))= struct .. <references to P as P'> <references to A as A'> end module Mylib' (P : P_SIG) = struct module A = A_funct(P) module B = B_funct(P)(A) end
This represents exactly how our functor pack
Mylib is encoded. Since we need to compile modules in a specific way if they belong to a functor pack, the compiler has to know in the argument
-for-pack that the pack is a functor, and what are its parameters.
Functor packs applied to
What we described is a functional prototype of functor packs, implemented on OCaml 4.10, as described in RFC#11. In practice, we already have one usage that we could benefit of in the future: cross-compilation of native code. At the moment the compiler is configured to target the architecture which it is compiled on. The modules relative to the current architecture are linked symbolically into the backend folder and the backend is compiled as if it only supported one architecture. One downside of this approach is that changes into the interface of the backend that need some modifications in each architecture are not detected at compile time, but only for the current one. You need to reconfigure the OCaml compiler and rebuild it to check if another architecture still compiles. One interesting property is that each architecture backend defines the same set of modules with compatible interfaces. In other words, these modules could simply be parameters of a functor, that is instantiated for a given architecture.
Following this idea, we implemented a prototype of native compiler whose backend is indeed packed into a functor, and instantiated at the initialization of the compiler. With this approach, we can easily switch the targeted architecture, and moreover we can be sure that each architecture is compiled, leveraging the fact that some necessary refactoring is always done when changes happen in the backend interface. Implementing such a functor is mainly a matter of adapting the build system to produce a functor pack, writing few signatures for the functor and its parameters, and instantiating the backend at the right time.
This proof of concept shows how functor packs can ease some complicated build system and allows new workflow.
Making packs useful again
Packs were an old concept mainly outdated by module aliases. They were not practical as they are some sort of monolithic libraries shaped into a unique module containing sub modules. While they perfectly use the module system for its namespacing properties, their usage enforces the compiler to link an entire library even if only one module is actually used. This improvement allows programmers to define big functors, functors that are split among multiple files, resulting in what we can view as a way to implement some form of parameterized libraries.
In the second part, we will cover another aspect of the rehabilitation of packs: using packs to implement mutually recursive compilation units.