Debugging memory leaks in Python

Recently I noticed that a Python service on our embedded device was leaking memory. It was a long-running service with thousands of lines of code and a number of library dependencies written in Python and C. I don’t consider myself a Python expert, so I wasn’t entirely sure where to start.

A quick Google search turned up a number of packages that looked like they might help, but none of them turned out to be quite what I needed. Some were too basic and only gave an overall summary of memory usage, when I really wanted to see counts of which objects were leaking. Guppy hadn’t been updated in some time and seemed quite complex. Pympler seemed to work, but with so much runtime overhead that it was impractical on a resource-constrained system. At this point I was close to giving up on the tools and embarking on a time-consuming code review to try to track down the problem. Luckily, before I disappeared down that rabbit hole, I came across the tracemalloc module.

The tracemalloc module was added in Python 3.4 and traces the memory blocks allocated by Python. It’s part of the standard library, so it should be available anywhere there is a modern Python installation. It can take snapshots of the traced allocations and report their total size and count per source line, which was just what I needed. There is some runtime overhead, which shows up as spikes in CPU time, presumably while a snapshot is being collected, but it seems more efficient than the alternatives.
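
Since overhead matters on a resource-constrained device, it can be worth checking what the tracing itself costs. The sketch below is not taken from the service described here. It relies on tracemalloc.get_traced_memory() and tracemalloc.get_tracemalloc_memory() from the standard library; the tracing_overhead helper and the throwaway workload are made up for illustration.

```python
import tracemalloc


def tracing_overhead():
    """Return (traced_bytes, peak_bytes, tracemalloc_bytes) for this process."""
    # Current and peak size of the memory blocks tracemalloc is tracking.
    current, peak = tracemalloc.get_traced_memory()
    # Memory tracemalloc itself uses to store the traces it has collected.
    own = tracemalloc.get_tracemalloc_memory()
    return current, peak, own


if __name__ == "__main__":
    tracemalloc.start()  # default of one frame per trace, the cheapest setting
    workload = [bytes(1000) for _ in range(1000)]  # hypothetical workload
    current, peak, own = tracing_overhead()
    print(f"traced: {current} B, peak: {peak} B, tracemalloc itself: {own} B")
    tracemalloc.stop()
```

If the last figure grows uncomfortably large, taking snapshots less often or keeping the default of a single stored frame are the obvious knobs to try first.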

An example of how you can use the tracemalloc module:

import time
import tracemalloc

# Previous snapshot; each report is a diff against it.
snapshot = None

def trace_print():
    global snapshot
    snapshot2 = tracemalloc.take_snapshot()
    # Exclude import machinery, unknown frames and tracemalloc itself.
    snapshot2 = snapshot2.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
        tracemalloc.Filter(False, tracemalloc.__file__)
    ))

    if snapshot is not None:
        print("================================== Begin Trace:")
        top_stats = snapshot2.compare_to(snapshot, 'lineno', cumulative=True)  # diff per file and line
        for stat in top_stats[:10]:
            print(stat)
    snapshot = snapshot2

leaked = []  # grows forever: this is the deliberate leak

def leaky_func(x):
    leaked.append(x)  # no 'global' needed: the list is mutated, not rebound

if __name__ == '__main__':
    i = 0
    tracemalloc.start()
    while True:
        leaky_func(i)
        i += 1
        time.sleep(1)
        if i % 10 == 0:  # report every 10 iterations (about 10 seconds)
            trace_print()

Every 10 seconds this should print a comparison of the current snapshot against the previous one, such as the output below:

leak.py:27: size=576 B (+112 B), count=1 (+0), average=576 B
/usr/lib64/python3.5/posixpath.py:52: size=256 B (+64 B), count=4 (+1), average=64 B
/usr/lib64/python3.5/re.py:246: size=4672 B (+0 B), count=73 (+0), average=64 B
/usr/lib64/python3.5/sre_parse.py:528: size=1792 B (+0 B), count=28 (+0), average=64 B
/usr/lib64/python3.5/sre_compile.py:553: size=1672 B (+0 B), count=4 (+0), average=418 B
/usr/lib64/python3.5/sre_parse.py:72: size=736 B (+0 B), count=4 (+0), average=184 B
/usr/lib64/python3.5/fnmatch.py:70: size=704 B (+0 B), count=4 (+0), average=176 B
/usr/lib64/python3.5/sre_parse.py:524: size=560 B (+0 B), count=1 (+0), average=560 B

The numbers can be interpreted as follows:

  • size, the total size of allocations at this call site
  • count, the total number of allocations at this call site
  • average, the average size of allocations at this call site
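
The same fields are available as attributes if you would rather process them than read the printed line. This is a minimal sketch, assuming the default of one stored frame per trace; describe is a hypothetical helper and the bytearray allocations are only there to give the snapshot something to report.

```python
import tracemalloc


def describe(stat):
    """Return (location, size, count, average) for one tracemalloc Statistic."""
    frame = stat.traceback[0]  # 'lineno' grouping leaves a single frame
    location = f"{frame.filename}:{frame.lineno}"
    # The printed "average" is simply size divided by count.
    average = stat.size / stat.count if stat.count else 0
    return location, stat.size, stat.count, average


if __name__ == "__main__":
    tracemalloc.start()
    payload = [bytearray(256) for _ in range(100)]  # hypothetical allocations
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    for stat in snapshot.statistics("lineno")[:3]:
        print(describe(stat))
```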

The numbers in parentheses indicate the amount by which each value increased or decreased since the last snapshot. So to look for our memory leak we would expect to see a positive number in the parentheses next to size or count. In the example above the size at leak.py line 27 has grown by 112 B, which matches up with our leaky function. The count there stays at 1, presumably because the list keeps its items in a single block that is reallocated as it grows.
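
On a real service the output is much noisier, so it can help to keep only the call sites that actually grew. The following is a rough sketch rather than the code used on the device: growing_sites is a hypothetical helper, and it relies on the size_diff and count_diff attributes of the StatisticDiff objects that compare_to() returns.

```python
import tracemalloc


def growing_sites(old_snapshot, new_snapshot, limit=10):
    """Return the StatisticDiff entries whose total size grew, largest first."""
    diffs = new_snapshot.compare_to(old_snapshot, "lineno")
    grown = [d for d in diffs if d.size_diff > 0]
    grown.sort(key=lambda d: d.size_diff, reverse=True)
    return grown[:limit]


if __name__ == "__main__":
    tracemalloc.start()
    leaked = []
    before = tracemalloc.take_snapshot()
    leaked.extend(bytearray(1024) for _ in range(100))  # simulated leak
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    for diff in growing_sites(before, after):
        print(f"{diff.traceback}: +{diff.size_diff} B, {diff.count_diff:+d} blocks")
```

A site that shows up in this list snapshot after snapshot is a much stronger leak candidate than one that appears once, since caches and lazy imports also grow during normal start-up.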
