Apr 13, 2015

A few months ago, a memory leak in the Scanf.fscanf function of OCaml's standard library was reported on the OCaml mailing list. The following "minimal" example reproduces the misbehavior:

(* in file scanf_leak.ml *)
for i = 0 to 100_000 do
   let ic = open_in "some_file.txt" in
   Scanf.fscanf ic "%s" (fun _s -> ());
   close_in ic
done;;
        
read_line ();;

Let us see how to identify the origin of the leak and fix it with our OCaml memory profiler.

Installing the OCaml Memory Profiler

We first install our modified OCaml compiler and the memory profiling tool with the following opam commands:

#### Add memprof repository ####
$ opam remote add memprof http://memprof.typerex.org/opam
$ opam update
    
#### Install the patched compiler and ocp-memprof ####
$ opam switch 4.01.0+ocp1-20150202
$ opam install ocp-memprof
$ eval `opam config env`

That's all! Installation takes only five (opam) commands.

Compiling and Executing the Example

The second step is to compile the example above and profile it, using the following commands:

#### Compile your example ####
$ ocamlopt scanf_leak.ml -o scanf.x
    
#### Execute your program with ocp-memprof ####
$ ocp-memprof --exec scanf.x

You may notice that no instrumentation of the source is needed to enable profiling.

Visualizing the Results

In the last command above, scanf.x dumps a lot of information about memory occupation during its execution. Our "OCaml Memory Profiler" then analyzes these dumps and generates a "human readable" graph that shows the evolution of memory consumption after each OCaml garbage collection. Concretely, this yields the graph below (the interactive graph generated by ocp-memprof is available here). As you can see, memory consumption grows abnormally and exceeds 240 MB! Note that we stopped scanf.x after 90 seconds.

Ocaml-fscanf-function-with-leak
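
For a coarse cross-check without the profiler, heap size can also be sampled from inside the program with the standard Gc module. The sketch below is only an illustration and is not how ocp-memprof works: the helper heap_mb, the samples list and the allocation loop are hypothetical names and workloads chosen for the example. It records the size of the major heap at the end of each major collection, which is the kind of "after each GC" data point the graph plots.

```ocaml
(* Size of the major heap, in megabytes, as reported by the runtime. *)
let heap_mb () =
  let st = Gc.quick_stat () in
  float_of_int st.Gc.heap_words *. float_of_int (Sys.word_size / 8) /. 1e6

(* Heap sizes recorded so far, most recent first. *)
let samples : float list ref = ref []

(* The runtime calls this function at the end of each major GC cycle. *)
let alarm = Gc.create_alarm (fun () -> samples := heap_mb () :: !samples)

let () =
  (* Hypothetical workload: keep every allocated buffer reachable. *)
  let retained = ref [] in
  for i = 1 to 200 do
    retained := Bytes.create 10_000 :: !retained;
    if i mod 50 = 0 then Gc.full_major ()
  done;
  Gc.delete_alarm alarm;
  List.iter
    (fun mb -> Printf.printf "heap after a major GC: %.2f MB\n" mb)
    (List.rev !samples);
  (* Keep [retained] alive until the end of the measurement. *)
  ignore (Sys.opaque_identity !retained)
```

The exact figures depend on the OCaml version and on GC settings, so no expected output is given. Unlike ocp-memprof, this approach requires modifying the program and says nothing about which module or function allocated the memory.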

Playing With (Some of) ocp-memprof Capabilities

ocp-memprof can group and display the data contained in the graph according to several criteria. For instance, data are grouped by "Modules" in the capture below. This allows us to deduce that most allocations are performed in the Scanf and Buffer modules.

Ocaml-fscanf-function-with-leak

In addition to aggregation capabilities, the interactive graph generated by ocp-memprof also lets you "zoom" in on particular data. For instance, by looking at Scanf, we obtain the graph below, which shows the different functions that allocate in this module. We notice that the function that allocates the most is Scanf.Scanning.from_ic. Let us have a look at this function.

Ocaml-fscanf-function-with-leak

From Profiling Graphs to Source Code

The function from_ic, which is responsible for most of the allocation in Scanf, is called through memo_from_ic, whose code is the following:

let memo_from_ic =
        let memo = ref [] in
        (fun scan_close_ic ic ->
        try List.assq ic !memo with
        | Not_found ->
                let ib = from_ic scan_close_ic (From_channel ic) ic in
                memo := (ic, ib) :: !memo;
                ib)
;;

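It looks like the leak is caused by the memo list, which associates a lookahead buffer, resulting from the call to from_ic, with each input channel. Entries are added to this list but never removed, so the buffer of every channel ever scanned stays reachable, even after the channel is closed.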
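
The retention pattern can be reproduced in isolation. The following is a minimal sketch, not the Scanf code itself: the types channel and buffer and the function memo_from_channel are hypothetical stand-ins for the input channel, the lookahead buffer and memo_from_ic.

```ocaml
(* Hypothetical stand-ins: [channel] plays the role of an input channel,
   [buffer] the role of the lookahead buffer built by from_ic. *)
type channel = { id : int }
type buffer = { data : Bytes.t }

let memo : (channel * buffer) list ref = ref []

(* Same shape as memo_from_ic: look the channel up by physical equality,
   then allocate and remember a fresh buffer on a miss. *)
let memo_from_channel ch =
  try List.assq ch !memo with
  | Not_found ->
    let buf = { data = Bytes.create 1024 } in
    memo := (ch, buf) :: !memo;
    buf

let () =
  for i = 1 to 1_000 do
    (* A fresh channel per iteration, as in the for-loop of scanf_leak.ml. *)
    let ch = { id = i } in
    ignore (memo_from_channel ch)
  done;
  (* Every channel was distinct, so no lookup ever hit and nothing was freed. *)
  Printf.printf "entries retained: %d\n" (List.length !memo)
  (* prints "entries retained: 1000" *)
```

Because the list holds a strong reference to each (channel, buffer) pair, the garbage collector can never reclaim them, and the list itself gets slower to search as it grows.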

Patching the Code

Benoit Vaugon quickly sent a patch based on weak pointers that seems to solve the problem. He modified the code as follows:

  • he put the key in a weak set (IcMemo) in order to test whether it is gone;

  • he created a pair that stores the key and the associated value;

  • he put this pair in a weak set (PairMemo), where it will be reclaimed at the next GC;

  • he added a finalizer on the pair that adds the pair to the weak set again at each GC, as long as the key is still in IcMemo.

let memo_from_ic =
        let module IcMemo = Weak.Make (struct
                type t = Pervasives.in_channel
                let equal ic1 ic2 = ic1 = ic2
                let hash ic = Hashtbl.hash ic
        end) in
        let module PairMemo = Weak.Make (struct
                type t = Pervasives.in_channel * in_channel
                let equal (ic1, _) (ic2, _) = ic1 = ic2
                let hash (ic, _) = Hashtbl.hash ic
        end) in
        let ic_memo = IcMemo.create 16 in
        let pair_memo = PairMemo.create 16 in
        let rec finaliser ((ic, _) as pair) =
                if IcMemo.mem ic_memo ic then (
                        Gc.finalise finaliser pair;
                        PairMemo.add pair_memo pair) in
        (fun scan_close_ic ic ->
        try snd (PairMemo.find pair_memo (ic, stdin)) with
        | Not_found ->
                let ib = from_ic scan_close_ic (From_channel ic) ic in
                let pair = (ic, ib) in
                IcMemo.add ic_memo ic;
                Gc.finalise finaliser pair;
                PairMemo.add pair_memo pair;
        ib)
;;
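
The patch relies on one property of weak sets: they do not keep their elements alive. The sketch below illustrates only that property, not the patch itself; the module StrSet and the values set and kept are hypothetical names chosen for the example.

```ocaml
(* A weak hash set of strings, built with the same Weak.Make functor
   that the patch uses for IcMemo and PairMemo. *)
module StrSet = Weak.Make (struct
  type t = string
  let equal (a : string) (b : string) = a = b
  let hash (s : string) = Hashtbl.hash s
end)

let set = StrSet.create 16

(* [kept] is referenced from a global, so it stays alive. *)
let kept = String.concat "-" ["kept"; "alive"]

let () =
  StrSet.add set kept;
  (* These strings are referenced only from the weak set. *)
  for i = 1 to 100 do
    StrSet.add set (string_of_int i ^ "-temporary")
  done;
  Printf.printf "elements before a full GC: %d\n" (StrSet.count set);
  Gc.full_major ();
  Printf.printf "elements after a full GC:  %d\n" (StrSet.count set);
  (* The element that is still referenced elsewhere must survive. *)
  assert (StrSet.mem set kept)
```

The printed counts depend on when the collector runs, so they are not given here; the temporary strings are expected to disappear once a collection has found them unreachable, while kept remains. This is also why the patch needs the finalizer: without it, a pair stored only in PairMemo would be dropped at the next GC even though its channel is still in use.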

Checking the Fixed Version

Curious to see the memory behavior after applying this patch? The graph below shows the memory consumption of the patched version of the Scanf module. Again, the interactive version is available here. After each iteration of the for-loop, the memory is released as expected, and memory consumption does not exceed 2.1 MB during any iteration.

Ocaml-fscanf-function-without-leak

Conclusion

This example is online in our gallery of examples if you want to see and explore the graphs (with the leak and without the leak).

Do not hesitate to use ocp-memprof on your applications. Of course, all feedback and suggestions on using ocp-memprof are welcome: just send us an email!

More information: