OCamlPro Highlights: November 2013

Date: 2013-12-03
Category: OCaml
Tags: ocaml, tooling



New Team Members

We are pleased to welcome three new members in our OCamlPro team since the beginning of November:

  • Benjamin Canou started working at OCamlPro on the Richelieu project, an effort to bring better safety and performance to the Scilab language. He is in charge of a type inference algorithm that will serve both as a developper tool and in coordination with a JIT. He spent his first month understanding the darkest corners of the language, and then writing a versatile AST with a parser to build it. Actually, this is not an easy task, because the language gives different statuses to characters (including spaces) depending on the context, leading to non-trivial lexing. But the real source of problems is the fact that the original lexparser is intermingled with the interpreter inside a big bunch of venerable FORTRAN code. This old fellow makes parsing choices depending on the dynamic typing context, allows its users to catch syntax errors at runtime, among other fun things. The new OCaml lexer and parser is handwritten in around a thousand lines, has performance comparable to a Lex and Yacc generated one, and is resilient to errors so it could be integrated into an IDE to detect errors on the fly without stopping on the first one. Once again, it’s OCaml to the rescue of the weak and elderly!An example of the kind of code that can be written in Scilab:
if return = while then [ 12..
34.. … .. …
56 } ; else ‘”‘”
end

which is parsed into:

— parsed in 0.000189–
(script (if (== !return !while) (matrix (row 123456)) “‘”))
— messages
1.10:1.11: use of deprecated operator ‘=’
— end
  • Gregoire Henry started working at OCamlPro on the Bware project. He is tackling the optimization of memory performance of automatic provers written in OCaml, in collaboration with Cagdas Bozman. One of his first contributions after joining us was to exhume his internship work of 2004, an implementation of Graphics for Mac OS X that we are going to use for our online OCaml IDE!

  • Thomas Blanc started a PhD at OCamlPro after his summer internship with us. He is going to continue his work on whole-program analysis, especially as a way to detect uncaught exceptions. We hope his tool will be a good replacement for the ocamlexn tool written by Francois Pessaux.

Compiler Updates

On the compiler optimization front, Pierre Chambart got direct access to the OCaml SVN, so that he will soon upload his work directly into an SVN branch, easier for reviewing and integration into the official compiler. A current set of optimizations is already scheduled for the new branch, and is now working on inlining recursive functions, such List.map, by inlining the function definition at the call site, when at least one of its arguments is invariant during recursion.

A function that can benefit a lot from that transformation is:

let f l = List.fold_left (+) 0 l
camlTest__f_1013:
.L102:
movq %rax, %rdi
movq $1, %rbx
jmp camlTest__fold_left_1017@PLT

camlTest__fold_left_1017:
.L101:
cmpq $1, %rdi
je .L100
movq 8(%rdi), %rsi
movq (%rdi), %rdi
leaq -1(%rbx, %rdi), %rbx
movq %rsi, %rdi
jmp .L101
.align 4
.L100:
movq %rbx, %rax
ret

Development Tools

Release of OPAM 1.1

After lots of testing and fixing, the official version 1.1.0 of OPAM has finally been released. It features lots of stability improvements, and a reorganized and cleaner repo now hosted at https://opam.ocaml.org. Work goes on on OPAM as we’ll release opam-installer soon, a small script that enables using and testing .install files. This is a step toward a better integration of OPAM with existing build tools, and are experimenting with ways to ease usage for Coq packages, to generate binary packages, and to enhance portability.

Binary Packages for OPAM

We also started to experiment with binary packages. We developed a very small tool, ocp-bin, that monitors the compilation of every OPAM package, takes a snapshot of OPAM files before and after the compilation and installation, and generates a binary archive for the package. The next time the package is re-installed, with the same dependencies, the archive is used instead of compiling the package again.

For a typical package, the standard OPAM file:

build: [
[ “./configure” “–prefix” “%{prefix}%”]
[ “make]
[ make “install”]
]
remove: [
[ make “uninstall” ]
]

has to be modified in:

build: [
[ “ocp-bin” “begin” “%{package}%” “%{version}%” “%{compiler}%” “%{prefix}%”
“-opam” “-depends” “%{depends}%” “-hash” “%{hash}%”
“-nodeps” “ocamlfind.” ]
[ “ocp-bin” “–” “./configure” “–prefix” “%{prefix}%”]
[ “ocp-bin” “–” make]
[ “ocp-bin” “–” make “install”]
[ “ocp-bin” “end” ]
]
remove: [
[ “!” “ocp-bin” “uninstall”
“%{package}%” “%{version}%” “%{compiler}%” “%{prefix}%” ]

Such a transformation would be automated in the future by adding a field ocp-bin: true. Note that, since ocp-bin takes care of the deinstallation of the package, it would ensure a complete and correct deinstallation of all packages.

We also implemented a client-server version of ocp-bin, to be able to share binary packages between users. The current limitation with this approach is that many binary packages are not relocatable: if packages are compiled by Bob to be installed in /home/bob/.opam/4.01.0, the same packages will only be usable on a different computer by a user with the same home path! Although it can still be useful for a user with several computers, we plan to investigate now on how to build relocatable packages for OCaml.

Stable Release of ocp-index

Always looking for a way to provide better tools to the OCaml programmer, we are happy to announce the first stable release of ocp-index, which provides quick access to the installed interfaces and documentation as well as some source-browsing features (lookup ident definition, search for uses of an ident, etc).

Profiling Alt-Ergo with ocp-memprof: The Killer App

One of the most exciting events this month is the use of the ocp-memprof tool to profile an execution of Alt-Ergo on a big formula generated by the Cubicle model checker. The story is the following:

The formula was generated from a transition system modeling the FLASH coherence cache protocol, plus additional information computed by Cubicle during the verification of FLASH’s safety. It contains a sub-formula made of nested conjunctions of 999 elements. Its proof requires reasoning in the combination of the free theory of equality, enumerated data types and quantifiers. Alt-Ergo was able to discharge it in only 10 seconds. However, Alain Mebsout — who is doing his Phd thesis on Cubicle — noticed that Alt-Ergo allocates more than 60 MB during its execution.

In order to localize the source of this abnormal memory consumption, we installed the OCaml Memory Profiler runtime, version 4.00.1+memprof (available in the private OPAM repository of OCamlPro) and compiled Alt-Ergo using -bin-annot option in order to dump .cmt files. We then executed the prover on Alain’s example as shown below, without any instrumentation of Alt-Ergo’s code.

$ OCAMLRUNPARAM=m ./alt-ergo.opt formula.mlw

This execution caused the modified OCaml compiler to dump a snapshot of the typed heap at every major collection of the GC. The names of dumped files are of the form memprof.<PID>.<DUMP-NAME>.<image-number>.dump, where PID is a natural number that identifies the set of dumped files during a particular execution.

Dumped files were then fed to the ocp-memprof tool (available in the TypeRex-Pro toolbox) using the syntax below. The synthesis of this step (.hp file) was then converted to a .ps file thanks to hp2ps command. At the end, we obtained the diagram shown in the figure below.

$ ./ocp-memprof -loc -sizes PID

alt-ergo-memprof-before.png

From the figure above, one can extract the following information:

  • there were 15 major collections of OCaml’s GC during the above execution (the x-axis),

  • Alt-Ergo allocated more than 60 MB during its execution (the y-axis),

  • Some function in file src/preprocess/why_typing.ml is allocating a lot of data of type Parsed.pp_desc at line 868 (the first square of the legend).

The third point corresponds to a piece of code used in a recursive function that performs alpha renaming on parsed formulas to avoid variable captures. This code is the following:

let rec alpha_renaming_b s f =
…

| PPinfix(f1, op, f2) -> (* ‘op’ may be the AND operator *)
let ff1 = alpha_renaming_b s f1 in
let ff2 = alpha_renaming_b s f2 in
PPinfix(ff1, op, ff2) (* line 868 *)

…

Actually, in 99% there are no capture problems and the function just reconstructs a new value PPinfix(ff1, op, ff2) that is structurally equal (=) to its argument f. In case of very big formulas (recall that Alain’s formula contains a nested conjunction of 999 elements), this causes Alt-Ergo to allocate a lot.

Fixing this behavior was straightforward. We only had to check whether recursive calls to alpha renaming function returned modified values using physical equality ==. If not, no renaming is performed and we safely return the formula given in the argument. This way, the function will never allocate for formulas without capture issues. For instance, the piece of code given above is fixed as follows:

let rec alpha_renaming_b s f =
…

| PPinfix(f1, op, f2) ->
let ff1 = alpha_renaming_b s f1 in
let ff2 = alpha_renaming_b s f2 in
if ff1 == f1 && ff2 == f2 then f (* no renaming performed by recursive calls ? *)
else PPinfix(ff1, op, ff2)

…

Once we applied the patch on the hole function alpha_renaming_b, Alt-Ergo only needed 2 seconds and less than 2.2MB memory to prove our formula. Profiling an execution of patched version of the prover with OCaml 4.00.1+memprof and ocp-memprof produced the diagram below. The difference with the first drawing was really impressive.

alt-ergo-memprof-after.png

Other R&D Projects

Scilint, the Scilab Style-Checker

This month our work on Richelieu was mainly focused on improving Scilint. After some discussions with Scilab knowledgeable users, we chose a new set of warnings to implement. Among other things those warnings analyze primitive fonctions and their arguments as well as loop variables. Another important thing was to allow SciNotes, Scilab’s editor, to display our warnings. This has been done by implementing support for Firehose. Finally some minor bugs were also fixed.



About OCamlPro:

OCamlPro is a R&D lab founded in 2011, with the mission to help industrial users benefit from experts with a state-of-the-art knowledge of programming languages theory and practice.

  • We provide audit, support, custom developer tools and training for both the most modern languages, such as Rust, Wasm and OCaml, and for legacy languages, such as COBOL or even home-made domain-specific languages;
  • We design, create and implement software with great added-value for our clients. High complexity is not a problem for our PhD-level experts. For example, we helped the French Income Tax Administration re-adapt and improve their internally kept M language, we designed a DSL to model and express revenue streams in the Cinema Industry, codename Niagara, and we also developed the prototype of the Tezos proof-of-stake blockchain from 2014 to 2018.
  • We have a long history of creating open-source projects, such as the Opam package manager, the LearnOCaml web platform, and contributing to other ones, such as the Flambda optimizing compiler, or the GnuCOBOL compiler.
  • We are also experts of Formal Methods, developing tools such as our SMT Solver Alt-Ergo (check our Alt-Ergo Users' Club) and using them to prove safety or security properties of programs.

Please reach out, we'll be delighted to discuss your challenges: contact@ocamlpro.com or book a quick discussion.