Jul
11
2013

As announced some time ago, I am working on a new intermediate language within the OCaml compiler to improve its inlining strategy. After some time of bug squashing, I prepared a testable version of the patchset, available either on Github (branch flambda_experiments), or through OPAM, in the following repository:

opam repo add inlining https://github.com/OCamlPro/opam-compilers-repository.git
opam switch flambda
opam install inlining-benchs

The series of patches is not ready for benchmarking against real applications, as no cross module information is propagated yet (this is more practical for development because it simplifies debugging a lot), but it already works quite well on single-file code. Some very simple benchmark examples are available in the inlining-benchs package.

The series of patches implements a set of 'reasonable' compilation passes, that do not try anything too complicated, but combined, generates quite efficient code.

Current Status

As said in the previous post, I decided to design a new intermediate language to implement better inlining heuristics in the compiler. This intermediate language, called flambda, lies between the lambda code and the Clambda code. It has an explicit representation of closures, making them easier to manipulate, and modules do not appear in it anymore (they have already been compiled to static structures).

I then started to implement new inlining heuristics as functions from the lambda code to the flambda code. The following features are already present:

  • intra function value analysis

  • variable rebinding

  • dead code elimination (which needs purity analysis)

  • known match / if branch elimination

In more detail, the chosen strategy is divided into two passes, which can be described by the following pseudo-code:

1st:
if function is at toplevel
then if applied to at least one constant OR small enough
     then inline
else if applied to at least one constant AND small enough
     then inline

2nd:
if function is small enough
   AND does not contain local function declarations
then inline

The first pass eliminates most functor applications and functions of the kind:

let iter f x =
  let rec aux x = ... f ... in
  aux x

The second pass eliminates the same kind of functions as Ocaml 4.01, but being after the first pass, it can also inline functions revealed by inlining functors.

Benchmarks

I ran a few benchmarks to ensure that there were no obvious miscompilations (and there were, but they are now fixed). On benchmarks that were too carefully written there was not much gain, but I got interesting results on some examples: those illustrate quite well the improvements, and can be seen at $(opam config var lib)/inlining-benchs (binaries at $(opam congfig var bin)/bench-*).

The Knuth-Bendix Benchmark (single-file)

Performance gains against OCaml 4.01 are around 20%. The main difference is that exceptions are compiled to constants, hence not allocated when raised. In that particular example, this halves the allocations.

In general, constant exceptions can be compiled to constants when predefined (Not_found, Failure, ...). They cannot yet when user defined: to improve this a few things need to be changed in translcore.ml to annotate values created by exceptions.

The Noiz Benchmark:

Performance gains are around 30% against OCaml 4.01. This code uses a lot of higher order functions of the kind:

let map_triple f (a,b,c) = (f a, f b, f c)

OCaml 4.01 can inline map_triple itself but then cannot inline the parameter f. Moreover, when writing:

let (x,y,z) = map_triple f (1,2,3)

the tuples are not really used, and after inlining their allocations can be eliminated (thanks to rebinding and dead code elimination)

The Set Example

Performance gains are around 20% compared to OCaml 4.01. This example shows how inlining can help defunctorization: when inlining the Set functor, the provided comparison function can be inlined in Set.add, allowing direct calls everywhere.

Known Bugs

Recursive Values

A problem may arise in a rare case of recursive values where a field access can be considered to be a constant. Something that would look like (if it were allowed):

type 'a v = { v : 'a }

let rec a = { v = b }
and b = (a.v, a.v)

I have a few solutions, but not sure yet which one is best. This probably won't appear in any normal test. This bug manifests through a segmentation fault (cmmgen fails to compile that recursive value reasonably).

Pattern-Matching

The new passes assume that every identifier is declared only once in a given module, but this assumption can be broken on some rare pattern matching cases. I will have to dig through matching.ml to add a substitution in these cases. (the only non hand-built occurence that I found is in ocamlnet)

known Mis-compilations

  • since there is no cross-module information at the moment, calls to functions from other modules are always slow.

  • In some rare cases, there could be functions with more values in their closure, thus resulting in more allocations.

What's next ?

I would now like to add back cross-module information, and after a bit of cleanup the first series of patches should be ready to propose upstream.