A few months ago, a memory leak in the Scanf.fscanf function of OCaml’s standard library was reported on the OCaml mailing list. The following “minimal” example reproduces this misbehavior:
for i = 0 to 100_000 do
  let ic = open_in "some_file.txt" in
  Scanf.fscanf ic "%s" (fun _s -> ());
  close_in ic
done;;
read_line ();;
Let us see how to identify the origin of the leak with our OCaml memory profiler, and how to fix it.
Installing the OCaml Memory Profiler
We first install our modified OCaml compiler and the memory profiling tool with the following opam commands:
$ opam remote add memprof http://memprof.typerex.org/opam
$ opam update
$ opam switch 4.01.0+ocp1-20150202
$ opam install ocp-memprof
$ eval `opam config env`
That’s all! Installation takes only five (opam) commands.
Compiling and Executing the Example
The second step consists of compiling the example above (saved as scanf_leak.ml) and profiling it. This is done with the following commands:
$ ocamlopt scanf_leak.ml -o scanf.x
$ ocp-memprof --exec scanf.x
You may notice that no instrumentation of the source is needed to enable profiling.
Visualizing the Results
When run through the last command above, scanf.x dumps a lot of information about memory occupation during its execution. Our “OCaml Memory Profiler” then analyzes these dumps and generates a “human readable” graph that shows the evolution of memory consumption after each OCaml garbage collection. Concretely, this yields the graph below (the interactive graph generated by ocp-memprof is available here). As you can see, memory consumption grows abnormally and exceeds 240 MB! Note that we stopped scanf.x after 90 seconds.
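Without ocp-memprof, a much cruder view of the same trend can be obtained from the GC itself. The sketch below is our own illustration, not how ocp-memprof works: it uses Gc.create_alarm to record the major heap size (the heap_words field of Gc.quick_stat) at the end of each major GC cycle while running a shortened version of the loop. It assumes a temporary file in place of some_file.txt, and the names samples, start_sampling and run_loop are hypothetical.

```ocaml
(* Hypothetical helper: major heap sizes (in words), one per major GC cycle. *)
let samples : int list ref = ref []

(* Register an alarm that runs at the end of every major GC cycle and
   records the current major heap size. *)
let start_sampling () : Gc.alarm =
  Gc.create_alarm (fun () ->
      let st = Gc.quick_stat () in
      samples := st.Gc.heap_words :: !samples)

(* Same loop as the example, but on a temporary file (an assumption, so the
   sketch runs on its own) and with a configurable number of iterations. *)
let run_loop n =
  let file = Filename.temp_file "scanf_leak" ".txt" in
  let oc = open_out file in
  output_string oc "hello\n";
  close_out oc;
  for _i = 0 to n do
    let ic = open_in file in
    Scanf.fscanf ic "%s" (fun _s -> ());
    close_in ic
  done;
  Sys.remove file

let () =
  let alarm = start_sampling () in
  run_loop 10_000;
  Gc.full_major ();
  Gc.delete_alarm alarm;
  List.iter (fun w -> Printf.printf "major heap: %d words\n" w) (List.rev !samples)
```

On a compiler where the leak is present, the printed heap sizes should keep growing with the number of iterations; ocp-memprof additionally tells you which modules and functions the memory belongs to.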
Playing With (Some of) ocp-memprof Capabilities
ocp-memprof lets you group and display the data contained in the graph according to several criteria. For instance, data are grouped by “Modules” in the capture below. This allows us to deduce that most allocations are performed in the Scanf module.
In addition to aggregation, the interactive graph generated by ocp-memprof also lets you zoom in on particular data. For instance, by zooming in on Scanf, we obtain the graph below, which shows the functions that allocate in this module. We notice that the function allocating the most is Scanf.Scanning.from_ic. Let us have a look at this function.
From Profiling Graphs to Source Code
The function from_ic, which is responsible for most of the allocation in Scanf, is called from memo_from_ic, whose code is the following:
let memo_from_ic =
  let memo = ref [] in
  (fun scan_close_ic ic ->
     try List.assq ic !memo with
     | Not_found ->
       let ib = from_ic scan_close_ic (From_channel ic) ic in
       memo := (ic, ib) :: !memo;
       ib)
;;
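It looks like the leak is caused by the memo list, which associates a lookahead buffer, created by the call to from_ic, with each input channel. Entries are added to this list but never removed, so every channel and its buffer stay reachable, even after close_in, and the GC can never reclaim them.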
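To see why this pattern leaks, the sketch below reproduces the same memoization scheme outside Scanf. The buffer type and make_buffer are hypothetical stand-ins for Scanf's lookahead buffer and from_ic; only the shape of memo_buffer mirrors memo_from_ic. Each freshly opened channel is physically distinct, so List.assq never finds it, a new entry is pushed, and that entry stays reachable through memo after the channel is closed.

```ocaml
(* Hypothetical stand-in for Scanf's lookahead buffer. *)
type buffer = { data : Bytes.t }

(* Hypothetical stand-in for from_ic: allocates a 1024-byte buffer. *)
let make_buffer (_ic : in_channel) : buffer = { data = Bytes.create 1024 }

(* Same structure as memo_from_ic: an association list that is only ever
   extended, so every (channel, buffer) pair stays reachable from [memo]. *)
let memo : (in_channel * buffer) list ref = ref []

let memo_buffer (ic : in_channel) : buffer =
  try List.assq ic !memo
  with Not_found ->
    let ib = make_buffer ic in
    memo := (ic, ib) :: !memo;
    ib

let () =
  let file = Filename.temp_file "memo" ".txt" in
  for _i = 1 to 1_000 do
    let ic = open_in file in
    ignore (memo_buffer ic);
    close_in ic
  done;
  Sys.remove file;
  let bytes = List.fold_left (fun acc (_, b) -> acc + Bytes.length b.data) 0 !memo in
  Printf.printf "entries still reachable: %d, buffer bytes: %d\n"
    (List.length !memo) bytes
```

This prints "entries still reachable: 1000, buffer bytes: 1024000": all 1,000 closed channels and their buffers are still live.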
Patching the Code
Benoit Vaugon quickly sent a patch based on weak pointers that seems to solve the problem. He modified the code as follows:
- he put the key (the channel) in a weak set (IcMemo) in order to test whether it is gone;
- he created a pair that stores the key and the associated value ((ic, ib));
- he put this pair in a weak set (PairMemo), where it would be reclaimed at the next GC, since nothing else points to it;
- he added a finalizer on the pair that adds the pair back to the weak set at each GC, as long as the key is still alive.
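let memo_from_ic =
  (* Weak set of channels, used to test whether a channel is still alive. *)
  let module IcMemo = Weak.Make (struct
    type t = Pervasives.in_channel
    let equal ic1 ic2 = ic1 = ic2
    let hash ic = Hashtbl.hash ic
  end) in
  (* Weak set of (channel, buffer) pairs, compared on the channel only;
     the second [in_channel] is Scanning's own buffer type. *)
  let module PairMemo = Weak.Make (struct
    type t = Pervasives.in_channel * in_channel
    let equal (ic1, _) (ic2, _) = ic1 = ic2
    let hash (ic, _) = Hashtbl.hash ic
  end) in
  let ic_memo = IcMemo.create 16 in
  let pair_memo = PairMemo.create 16 in
  (* Called when a pair becomes unreachable: if its channel is still alive,
     re-register the finalizer and put the pair back in the weak set. *)
  let rec finaliser ((ic, _) as pair) =
    if IcMemo.mem ic_memo ic then (
      Gc.finalise finaliser pair;
      PairMemo.add pair_memo pair) in
  (fun scan_close_ic ic ->
     (* [stdin] is only a placeholder: lookups compare channels only. *)
     try snd (PairMemo.find pair_memo (ic, stdin)) with
     | Not_found ->
       let ib = from_ic scan_close_ic (From_channel ic) ic in
       let pair = (ic, ib) in
       IcMemo.add ic_memo ic;
       Gc.finalise finaliser pair;
       PairMemo.add pair_memo pair;
       ib)
;;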
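The patch above targets OCaml 4.01, where weak pointers and finalizers are the only tools available. Since OCaml 4.03, the standard library's Ephemeron module provides hash tables whose bindings keep their data alive only as long as their key is alive. The following is a minimal sketch of the same memoization on top of Ephemeron.K1.Make; it is not Benoit Vaugon's patch, and the buffer type, make_buffer and memo_from_channel are hypothetical. The buffer deliberately keeps a reference to its channel, to check that this does not prevent the binding from being tied to the channel's lifetime.

```ocaml
(* Table keyed by channels through ephemerons: a binding can be dropped once
   its channel is unreachable, even though the bound buffer refers to it. *)
module ChanTbl = Ephemeron.K1.Make (struct
  type t = in_channel
  let equal = ( == ) (* one binding per channel object *)
  let hash = Hashtbl.hash
end)

(* Hypothetical lookahead buffer that refers back to its channel. *)
type buffer = { source : in_channel; data : Bytes.t }

let make_buffer ic = { source = ic; data = Bytes.create 1024 }

(* Hypothetical replacement for memo_from_ic (requires OCaml >= 4.03). *)
let memo_from_channel : in_channel -> buffer =
  let table = ChanTbl.create 16 in
  fun ic ->
    try ChanTbl.find table ic
    with Not_found ->
      let ib = make_buffer ic in
      ChanTbl.add table ic ib;
      ib

let () =
  let file = Filename.temp_file "memo" ".txt" in
  let ic = open_in file in
  let b1 = memo_from_channel ic in
  let b2 = memo_from_channel ic in
  Printf.printf "same buffer: %b, refers to its channel: %b, size: %d\n"
    (b1 == b2) (b1.source == ic) (Bytes.length b1.data);
  close_in ic;
  Sys.remove file
```

Compared with the weak-set version, this needs neither a separate set of keys nor a finalizer that re-registers itself at each GC; dead bindings are purged by the table itself.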
Checking the Fixed Version
Curious to see the memory behavior after applying this patch? The graph below shows the memory consumption of the patched version of the Scanf module. Again, the interactive version is available here. After each iteration of the for-loop, the memory is released as expected, and memory consumption does not exceed 2.1 MB.
Do not hesitate to use ocp-memprof on your own applications. Of course, all feedback and suggestions on using ocp-memprof are welcome: just send us an email!