Debugging memory leaks in Python

Recently I noticed that a Python service on our embedded device was leaking memory. It was a long-running service containing thousands of lines of code, with a number of library dependencies in Python and C. I don’t consider myself a Python expert, so I wasn’t entirely sure where to start.

After a quick Google search I turned up a number of packages that looked like they might help, but none of them turned out to be quite what I needed. Some were too basic and just gave an overall summary of memory usage, when I really wanted to see counts of which objects were leaking. Guppy hasn’t been updated in some time and seems quite complex. Pympler seemed to work, but with an excessive amount of runtime overhead that made it impractical on a resource-constrained system. At this point I was close to giving up on the tools and embarking on a time-consuming code review to try to track down the problem. Luckily, before I disappeared down that rabbit hole I came across the tracemalloc package.

Tracemalloc is a package added in Python 3.4 that traces memory allocations made by Python. It’s part of the standard library so it should be available anywhere there is a modern Python installation. It can take snapshots of the traced allocations, grouped by call site with sizes and counts, which was just what I needed. There is some runtime overhead which results in spikes in CPU time, presumably when walking the heap, but it seems more efficient than the alternatives.
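
The memory side of that overhead can be checked from within Python. The following is a minimal sketch, not a benchmark: measure() and build_list() are hypothetical names made up for illustration, and only the tracemalloc calls themselves come from the standard library.

```python
import tracemalloc


def measure(workload, nframe=1):
    """Run workload() under tracemalloc and report memory figures in bytes."""
    tracemalloc.start(nframe)  # nframe = stack frames stored per allocation
    try:
        result = workload()
        # Memory currently traced, and the peak since start().
        current, peak = tracemalloc.get_traced_memory()
        # Memory tracemalloc itself uses to store its traces.
        overhead = tracemalloc.get_tracemalloc_memory()
    finally:
        tracemalloc.stop()
    stats = {"current": current, "peak": peak, "tracemalloc_overhead": overhead}
    return stats, result


def build_list():
    # Hypothetical workload: allocate a few thousand small strings.
    return [str(i) for i in range(10000)]


stats, data = measure(build_list)
print(stats)
```

Raising nframe stores deeper tracebacks per allocation, which should increase the bookkeeping figure, so on a small device it is probably worth keeping it low.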

An example of how you can use the tracemalloc package:

import time
import tracemalloc

snapshot = None

def trace_print():
    global snapshot
    snapshot2 = tracemalloc.take_snapshot()
    snapshot2 = snapshot2.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
        tracemalloc.Filter(False, tracemalloc.__file__)
    ))
    
    if snapshot is not None:
        print("================================== Begin Trace:")
        top_stats = snapshot2.compare_to(snapshot, 'lineno', cumulative=True)
        for stat in top_stats[:10]:
            print(stat)
    snapshot = snapshot2

l = []

def leaky_func(x):
    global l
    l.append(x)

if __name__=='__main__':
    i = 0
    tracemalloc.start()
    while True:
        leaky_func(i)
        i += 1
        time.sleep(1)
        if i % 10 == 0:
            trace_print()

This should print a snapshot every 10 seconds of the state of the heap such as the one below:

leak.py:27: size=576 B (+112 B), count=1 (+0), average=576 B
/usr/lib64/python3.5/posixpath.py:52: size=256 B (+64 B), count=4 (+1), average=64 B
/usr/lib64/python3.5/re.py:246: size=4672 B (+0 B), count=73 (+0), average=64 B
/usr/lib64/python3.5/sre_parse.py:528: size=1792 B (+0 B), count=28 (+0), average=64 B
/usr/lib64/python3.5/sre_compile.py:553: size=1672 B (+0 B), count=4 (+0), average=418 B
/usr/lib64/python3.5/sre_parse.py:72: size=736 B (+0 B), count=4 (+0), average=184 B
/usr/lib64/python3.5/fnmatch.py:70: size=704 B (+0 B), count=4 (+0), average=176 B
/usr/lib64/python3.5/sre_parse.py:524: size=560 B (+0 B), count=1 (+0), average=560 B

The numbers can be interpreted as follows:

size, the total size of allocations at this call site
count, the total number of allocations at this call site
average, the average size of allocations at this call site

The numbers in parentheses indicate the amount by which the value increased or decreased since the last snapshot. So to look for our memory leak we would expect to see a positive number in the parentheses next to size or count. In the example above we can see that the size at leak.py line 27 has grown by 112 bytes, which matches up with our leaky function. The count there stays at 1 because the list's storage is a single allocation that is resized as the list grows.
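
The same check can be done in code rather than by eye. This is a small sketch assuming you only care about call sites whose total size grew; growing_sites() and demo() are hypothetical helpers, while size_diff is an attribute of the StatisticDiff objects returned by compare_to().

```python
import tracemalloc


def growing_sites(old, new, limit=5):
    """Return the call sites whose total allocated size grew between snapshots."""
    diffs = new.compare_to(old, "lineno")
    grown = [d for d in diffs if d.size_diff > 0]
    # Largest growth first.
    grown.sort(key=lambda d: d.size_diff, reverse=True)
    return grown[:limit]


def demo():
    leaked = []
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for i in range(1000):
        # Hypothetical leak: these strings are never released.
        leaked.append("item %d" % i)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return growing_sites(before, after), leaked


if __name__ == "__main__":
    sites, _ = demo()
    for site in sites:
        print(site)
```

Filtering on size_diff rather than count_diff also catches the list case above, where a single allocation keeps getting bigger.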

Using SystemTap userspace static probes

One of the new features in glibc 2.19 was a set of SystemTap static probes in the malloc subsystem to allow a better view into its inner workings. SystemTap static probe points expand to only a single nop instruction when not enabled and take a fixed number of arguments, which are passed through to your SystemTap probe. I wanted to use these probes to analyze the performance of a malloc workload, so I wrote a SystemTap script to log events in the malloc subsystem.

To get this script to work on Fedora 20 I had to install the git version of SystemTap or some of the probes failed to parse their arguments correctly. The script can be run like this:

# stap malloc.stp -c /usr/bin/ls

It’s also possible to run this script using a non-installed version of glibc if you modify the globs in the script to match the path to your libc and run it with the appropriate library path:

# stap malloc.stp -c "env 'LD_LIBRARY_PATH=.../glibc-build:.../glibc-build/nptl' /usr/bin/ls"

The script is very simple and just prints a timestamp, the name of the probe point and the arguments but I hope someone will find it useful.

probe process("/lib*/libc.so.*").mark("memory_heap_new") {
  printf("%d:memory_heap_new heap %x size %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_heap_more") {
  printf("%d:memory_heap_more heap %x size %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_heap_less") {
  printf("%d:memory_heap_less heap %x size %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_heap_free") {
  printf("%d:memory_heap_free heap %x size %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_arena_new") {
  printf("%d:memory_arena_new arena %x size %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_arena_reuse_free_list") {
  printf("%d:memory_arena_reuse_free_list free_list %x\n",
         gettimeofday_ms(), $arg1)
}
probe process("/lib*/libc.so.*").mark("memory_arena_reuse_wait") {
  printf("%d:memory_arena_reuse_wait mutex %d arena %x avoid_arena %x\n",
         gettimeofday_ms(), $arg1, $arg2, $arg3)
}
probe process("/lib*/libc.so.*").mark("memory_arena_reuse") {
  printf("%d:memory_arena_reuse arena %x avoid_arena %x\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_arena_retry") {
  printf("%d:memory_arena_retry arena %x bytes %d\n",
         gettimeofday_ms(), $arg2, $arg1)
}
probe process("/lib*/libc.so.*").mark("memory_sbrk_more") {
  printf("%d:memory_sbrk_more brk %x change %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_sbrk_less") {
  printf("%d:memory_sbrk_less brk %x change %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_malloc_retry") {
  printf("%d:memory_malloc_retry bytes %d\n",
         gettimeofday_ms(), $arg1)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_free_dyn_thresholds") {
  printf("%d:memory_mallopt_free_dyn_thresholds mmap %d trim %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_realloc_retry") {
  printf("%d:memory_realloc_retry bytes %d oldmem %x\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_memalign_retry") {
  printf("%d:memory_memalign_retry bytes %d alignment %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_calloc_retry") {
  printf("%d:memory_calloc_retry bytes %d\n",
         gettimeofday_ms(), $arg1)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt") {
  printf("%d:memory_mallopt param %d value %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_mxfast") {
  printf("%d:memory_mallopt_mxfast new %d old %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_trim_threshold") {
  printf("%d:memory_mallopt_trim_threshold new %d old %d dyn_threshold %d\n",
         gettimeofday_ms(), $arg1, $arg2, $arg3)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_top_pad") {
  printf("%d:memory_mallopt_top_pad new %d old %d dyn_threshold %d\n",
         gettimeofday_ms(), $arg1, $arg2, $arg3)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_mmap_threshold") {
  printf("%d:memory_mallopt_mmap_threshold new %d old %d dyn_threshold %d\n",
         gettimeofday_ms(), $arg1, $arg2, $arg3)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_mmap_max") {
  printf("%d:memory_mallopt_mmap_max new %d old %d dyn_threshold %d\n",
         gettimeofday_ms(), $arg1, $arg2, $arg3)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_check_action") {
  printf("%d:memory_mallopt_check_action new %d old %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_perturb") {
  printf("%d:memory_mallopt_perturb new %d old %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_arena_test") {
  printf("%d:memory_mallopt_arena_test new %d old %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
probe process("/lib*/libc.so.*").mark("memory_mallopt_arena_max") {
  printf("%d:memory_mallopt_arena_max new %d old %d\n",
         gettimeofday_ms(), $arg1, $arg2)
}
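
Since every probe prints a line of the form timestamp:probe_name followed by name/value pairs, the log is easy to post-process. The following is a rough Python sketch of one way to do it; parse_line() and count_events() are hypothetical helpers, and the line layout is inferred from the printf format strings in the script rather than from captured output.

```python
import re
from collections import Counter

# "<ms timestamp>:<probe name> <name value name value ...>"
LINE_RE = re.compile(r"^(\d+):(memory_\w+)\s*(.*)$")


def parse_line(line):
    """Split one log line into (timestamp_ms, probe, fields), or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. output of the traced program itself
    timestamp, probe, rest = match.groups()
    tokens = rest.split()
    # Values stay as strings: some are printed with %x, others with %d.
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    return int(timestamp), probe, fields


def count_events(lines):
    """Count how many times each probe fired."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

Lines that do not match are skipped because, when the script is run with -c, the output of the traced command is mixed in with the probe output.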

Canon PIXMA MG6350 drivers for Linux

I recently bought a Canon PIXMA MG6350 printer for my home office. Before buying it I found Canon had a set of drivers available for Linux so assumed it was reasonably well supported. However the binary packages available from the Canon support site had out of date dependencies for Fedora 20 so weren’t installable, but there was a source package available so I grabbed that.

On the positive side, Canon have provided a mostly GPL CUPS printer driver package for Linux, which is to be commended. Unfortunately it doesn’t build out of the box on modern systems and contains a handful of proprietary binary libraries. I spent a bit of time hacking it to build and fixing some compile warnings, and pushed the result to GitHub:

https://github.com/willnewton/cnijfilter

The following commands will build an RPM for the MG6350; you will need to modify them slightly for other printers in the family:

# git archive --prefix=cnijfilter-source-3.80-2/ -o ~/rpmbuild/SOURCES/cnijfilter-source-3.80-2.tar.gz HEAD
# rpmbuild -ba cnijfilter-common.spec --define="MODEL mg6300" --define="MODEL_NUM 408" --with build_common_package

As I mentioned above, there are unfortunately some binary libraries in the package which seem to be essential, and the code quality in general seems pretty poor. A number of compiler warnings remain that point to moderately serious issues in the code in question. There’s a lot of copy and paste reuse and the code is full of fixed size buffers and dangerous assumptions. It lets me print documents from my laptop so I am not entirely unhappy, although it would be nice if Canon would engage with the community on getting these drivers fully open sourced and integrated properly into CUPS.

setcontext and signal handlers

setcontext is a C library function that along with getcontext allows you to perform a non-local jump from one context to another. They are often used when implementing coroutines or custom threading libraries. longjmp and setjmp provide similar functionality but setcontext was an attempt to fix the shortcomings of these functions and standardize behaviour, although in POSIX 2008 the specification of setcontext and related functions was removed due to the difficulty of implementing them in a truly portable manner.

In the beginning there were setjmp and longjmp. setjmp would capture the current register values into a data structure and longjmp would restore those values at a later point in program execution, causing the control flow to jump back to the point where setjmp was called. This works fine unless the place where you are jumping from is a signal handler. In this case the problem you have is that longjmp will not restore the signal mask, so the signal you were handling will not be unmasked. To fix this, sigsetjmp and siglongjmp, which save and restore the signal mask, were introduced.

However, this doesn’t mean it is necessarily safe to jump out of a signal handler even if you are using siglongjmp. The problem you will often hit is that if the signal was delivered at an arbitrary point in program execution there may be locks or other global resources that need to be released. The only way to avoid this is to block the appropriate signal across any part of the code that may not behave well when interrupted in this way. Unfortunately, without auditing all third party libraries, that probably means you can only enable handling of such a signal across very small regions of your program.
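
The blocking pattern itself is not specific to C. As an illustration only, here is the same idea sketched in Python, whose standard library exposes pthread_sigmask on Unix; signal_blocked() and demo() are hypothetical names, SIGUSR1 is an arbitrary choice, and the sketch assumes a single-threaded process running on the main thread.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def signal_blocked(signum):
    """Block signum for the duration of the with-block, then restore the mask."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        yield
    finally:
        # Restoring the old mask lets any pending signal be delivered.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def demo():
    events = []

    def handler(signum, frame):
        events.append("handler")

    previous = signal.signal(signal.SIGUSR1, handler)
    try:
        with signal_blocked(signal.SIGUSR1):
            # The signal is raised here but stays pending while blocked.
            os.kill(os.getpid(), signal.SIGUSR1)
            events.append("critical section done")
    finally:
        signal.signal(signal.SIGUSR1, previous)
    return events
```

The handler only runs once the mask is restored, so code inside the with-block is never interrupted by that signal. In C the equivalent calls would be sigprocmask or pthread_sigmask around the sensitive region.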

setcontext and getcontext also restore and save the signal mask, and setcontext can also be used to exit from a signal handler with the caveats expressed above. However there is another way in which signal handling and setcontext interact. The context structure used by setcontext and getcontext is of type ucontext_t. If you installed your signal handler using sigaction with the SA_SIGINFO flag and the sa_sigaction field of struct sigaction, you get a third argument to the signal handler which is of type void * but can be cast to ucontext_t *. So what happens if you pass this structure to setcontext?

Well the answer is, on Linux at least, “probably nothing good”. The original definition of setcontext specified that “program execution continues with the program instruction following the instruction interrupted by the signal”, but more recent versions of the standards removed that requirement so the result is now unspecified. Some glibc ports such as powerpc, mips and tile do support restoring signal handler created contexts in the spirit of the original specification, but the rest, including x86 and ARM do not. As such it is not possible to rely on being able to restore a signal handler created context with setcontext on Linux. It would be interesting to know if any proprietary Unixes support restoring these contexts and if any applications actually use the functionality.

Debugging and profiling libtool binaries

libtool takes care of a lot of the details of building binaries and shared libraries for you, but one of the side effects of this is that until you install your binaries the binaries in your build directory are actually shell script wrappers. Running gdb or profiling tools on the shell script won’t work:

# gdb ./mybinary 
GNU gdb (GDB) Fedora 7.5.1-42.fc18
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
"/home/will/mybinary": not in executable format: File format not recognized
(gdb)

However, you can get libtool to help you:

# libtool --mode=execute gdb ./mybinary
GNU gdb (GDB) Fedora 7.5.1-42.fc18
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/will/.libs/lt-mybinary...done.
(gdb)

This trick will also work for other tools like gprof and perf.
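
If you are not sure whether a file in your build directory is a real binary or one of these wrappers, looking at its first few bytes is enough to tell. The following is a small Python sketch; classify() is a hypothetical helper, and it only distinguishes ELF files from "#!" scripts.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def classify(path):
    """Return 'elf', 'script' or 'unknown' based on the start of the file."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == ELF_MAGIC:
        return "elf"
    if head[:2] == b"#!":
        return "script"  # e.g. a libtool wrapper
    return "unknown"


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, classify(path))
```

Run against the paths from the gdb sessions above, this would be expected to report the wrapper in the build directory as a script and the file under .libs as ELF.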

ARM C++ constructors are different

Programmers who have, like me, come to ARM from other architectures may find one or two things surprising.

For example, the following code is quite simple C++:

class A {
public:
    A() {}
    ~A() {}
};

A a;

int main(void)
{
    return 0;
}

But if we compile it and examine it with gdb, there’s something a bit unexpected:

$ gcc -g constructor.cpp -o constructor
$ gdb ./constructor 
GNU gdb (GDB) 7.5.91.20130417-cvs-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/linaro/constructor...done.
(gdb) ptype A::A
type = class A {
  public:
    A(void);
    ~A(int);
} *(A * const)
(gdb) ptype A::~A
type = void *(A * const)
(gdb)

The types of the constructor and destructor are not quite what we might expect. Traditionally a C++ constructor or destructor does not return a value, however on ARM things are different – the constructor returns a pointer to class A, and the destructor returns a pointer to void.

Why is this the case? On ARM, constructors and destructors are specified differently in order to provide scope for optimizing calls to a chain of constructors or destructors while minimizing the pushing of stack frames (tail call optimization). There’s a very helpful document called the C++ ABI for the ARM Architecture which details the differences between the ARM ABI and the Generic GNU C++ ABI, including this little quirk.