wxOCaml, camlidl and Class Modules

Authors: Çagdas Bozman
Date: 2015-04-13
Category: Tooling



A few months ago, a memory leak in the Scanf.fscanf function of OCaml’s standard library was reported on the OCaml mailing list. The following “minimal” example reproduces this misbehavior:

for i = 0 to 100_000 do
  let ic = open_in "some_file.txt" in
  Scanf.fscanf ic "%s" (fun _s -> ());
  close_in ic
done;;

read_line ();;

Let us see how to identify the origin of the leak and fix it with our OCaml memory profiler.

Installing the OCaml Memory Profiler

We first install our modified OCaml compiler and the memory profiling tool with the following opam commands:

$ opam remote add memprof http://memprof.typerex.org/opam
$ opam update
$ opam switch 4.01.0+ocp1-20150202
$ opam install ocp-memprof
$ eval `opam config env`

That’s all! Installation is done after only five (opam) commands.

Compiling and Executing the Example

The second step consists in compiling the example above and profiling it. This is simply achieved with the commands:

$ ocamlopt scanf_leak.ml -o scanf.x
$ ocp-memprof --exec scanf.x

You may notice that no instrumentation of the source is needed to enable profiling.

Visualizing the Results

In the last command above, scanf.x dumps a lot of memory-occupation data during its execution. Our “OCaml Memory Profiler” then analyzes these dumps and generates a “human readable” graph showing how memory consumption evolves after each OCaml garbage collection. Concretely, this yields the graph below (the interactive graph generated by ocp-memprof is available here). As you can see, memory consumption grows abnormally and exceeds 240 MB! Note that we stopped scanf.x after 90 seconds.
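To get a feeling for what "memory consumption after each garbage collection" means, here is a rough sketch that uses only the standard Gc module. It is not how ocp-memprof works internally, and it gives no per-module breakdown; the helper live_words and the simulated leak are assumptions made for illustration.

```ocaml
(* Live words in the major heap, measured right after a full collection.
   [live_words] is a hypothetical helper, not part of ocp-memprof. *)
let live_words () =
  Gc.full_major ();
  (Gc.stat ()).Gc.live_words

let () =
  (* Simulated leak: every block stays reachable through [retained]. *)
  let retained = ref [] in
  let before = live_words () in
  for round = 1 to 5 do
    for _i = 1 to 10_000 do
      retained := Bytes.create 64 :: !retained
    done;
    Printf.printf "round %d: +%d live words\n" round (live_words () - before)
  done
```

If the printed figure keeps climbing from one round to the next, some data structure is holding on to the allocated blocks, which is the same symptom the graph shows for scanf.x.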

Playing With (Some of) ocp-memprof Capabilities

ocp-memprof can group and display the data contained in the graph according to several criteria. For instance, data are grouped by “Modules” in the capture below. This allows us to deduce that most allocations are performed in the Scanf and Buffer modules.

In addition to aggregation capabilities, the interactive graph generated by ocp-memprof also makes it possible to “zoom” in on particular data. For instance, by looking at Scanf, we obtain the graph below, which shows the different functions that allocate in this module. We notice that the function that allocates the most is Scanf.Scanning.from_ic. Let us have a look at this function.

From Profiling Graphs to Source Code

The function from_ic, which is responsible for most of the allocation in Scanf, is called from memo_from_ic, whose code is the following:

let memo_from_ic =
  let memo = ref [] in
  (fun scan_close_ic ic ->
     try
       List.assq ic !memo
     with
     | Not_found ->
       let ib = from_ic scan_close_ic (From_channel ic) ic in
       memo := (ic, ib) :: !memo;
       ib)
;;

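It looks like the leak is caused by the memo list, which associates a lookahead buffer, resulting from the call to from_ic, with each input channel. Entries are added to this list but never removed, so every buffer stays reachable even after its channel has been closed.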
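The pattern can be reproduced in isolation. The sketch below is a simplified model, not the actual Scanf code: the types channel and buffer and the function memo_lookup are hypothetical stand-ins for the input channel, the lookahead buffer and memo_from_ic.

```ocaml
(* Hypothetical stand-ins for in_channel and the scanning buffer. *)
type channel = int ref
type buffer = Bytes.t

(* Association list keyed by physical identity, as in memo_from_ic. *)
let memo : (channel * buffer) list ref = ref []

let memo_lookup (ch : channel) : buffer =
  try List.assq ch !memo
  with Not_found ->
    let buf = Bytes.create 1024 in
    memo := (ch, buf) :: !memo;   (* added, but never removed *)
    buf

let () =
  for i = 1 to 1_000 do
    let ch = ref i in             (* a fresh "channel" at each iteration *)
    ignore (memo_lookup ch)
  done;
  (* Every buffer is still reachable through [memo]. *)
  Printf.printf "entries kept alive: %d\n" (List.length !memo)
```

Since each iteration creates a new key, List.assq never finds it, and the list gains one entry (and one buffer) per iteration, just as the example program gains one per opened file.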

Patching the Code

Benoit Vaugon quickly sent a patch based on weak pointers that seems to solve the problem. He modified the code as follows:

  • he put the key in a weak set (IcMemo) in order to test whether it is gone;
  • he created a pair that stores the key and the associated value;
  • he put this pair in a second weak set (PairMemo), where it will be reclaimed at the next GC, since a weak set does not keep its elements alive;
  • he added a finalizer on the pair that adds the pair to the weak set again at each GC, as long as the key is still alive.

let memo_from_ic =
  let module IcMemo = Weak.Make (
    struct
      type t = Pervasives.in_channel
      let equal ic1 ic2 = ic1 = ic2
      let hash ic = Hashtbl.hash ic
    end) 
  in
  let module PairMemo = Weak.Make (
    struct
      type t = Pervasives.in_channel * in_channel
      let equal (ic1, _) (ic2, _) = ic1 = ic2
      let hash (ic, _) = Hashtbl.hash ic
    end) 
  in
  let ic_memo = IcMemo.create 16 in
  let pair_memo = PairMemo.create 16 in
  let rec finaliser ((ic, _) as pair) =
    if IcMemo.mem ic_memo ic then (
      Gc.finalise finaliser pair;
      PairMemo.add pair_memo pair) in
  (fun scan_close_ic ic ->
     try snd (PairMemo.find pair_memo (ic, stdin)) with
     | Not_found ->
       let ib = from_ic scan_close_ic (From_channel ic) ic in
       let pair = (ic, ib) in
       IcMemo.add ic_memo ic;
       Gc.finalise finaliser pair;
       PairMemo.add pair_memo pair;
       ib)
;;
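
For readers unfamiliar with Weak.Make, the following sketch shows the behavior the patch relies on: a weak set does not keep its elements alive. The modules Key and KeySet and the helper add_fresh_keys are hypothetical names chosen for this illustration; they do not come from the patch.

```ocaml
(* Hypothetical key module: boxed values compared by physical identity. *)
module Key = struct
  type t = int ref
  let equal (a : t) (b : t) = a == b
  let hash (k : t) = Hashtbl.hash !k
end

module KeySet = Weak.Make (Key)

(* Adds [n] fresh keys to [set] and returns them to the caller. *)
let add_fresh_keys set n =
  let keys = List.init n (fun i -> ref i) in
  List.iter (KeySet.add set) keys;
  keys

let () =
  let set = KeySet.create 16 in
  (* The returned keys are dropped at once: only the weak set knows them. *)
  ignore (add_fresh_keys set 100);
  Gc.full_major ();
  Printf.printf "entries left after a major GC: %d\n" (KeySet.count set)
```

After the full major collection, one would expect the count to drop (typically to 0), because nothing but the weak set refers to the keys. This is why the patch needs the finalizer: without it, the pair would disappear from PairMemo at the next GC even while the channel is still in use.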

Checking the Fixed Version

Curious to see the memory behavior after applying this patch? The graph below shows the memory consumption of the patched version of the Scanf module. Again, the interactive version is available here. After each iteration of the for-loop, the memory is released as expected, and memory consumption does not exceed 2.1 MB during any iteration.

Conclusion

This example is online in our gallery of examples if you want to see and explore the graphs (with the leak and without the leak).

Do not hesitate to use ocp-memprof on your applications. Of course, all feedback and suggestions on using ocp-memprof are welcome: just send us an email!
