calloc versus malloc and memset

The standard C library provides two ways to allocate memory from the heap, calloc and malloc. They differ superficially in their arguments but more fundamentally in the guarantees they provide about the memory they return – calloc promises to fill any memory returned with zeros but malloc does not.

It would appear that the following two code snippets are equivalent:

/* Allocating with calloc. */
void *ptr = calloc(1024, 1024);
/* Allocating with malloc. */
void *ptr = malloc(1024*1024);
memset(ptr, 0, 1024*1024);

Functionally the two snippets of code are the same – they allocate 1MB of memory full of zeros from the heap – but in one subtle way they may not be.

The standard allocator on Linux is ptmalloc, part of glibc, which for large allocations of 128kB or above may use the mmap system call to allocate memory. The mmap system call returns a mapping of zeroed pages, and ptmalloc avoids calling memset for calloc allocations made with mmap to avoid the overhead involved. So what does this mean?

In order to return an allocation of zeroed pages the kernel plays a trick. Instead of allocating however many pages are required and writing zeros into them, it allocates a single page filled with zeroes, and every time it needs a page full of zeros it maps the same one.

zero_page_mapping

Only one page is allocated initially and subsequent pages are allocated as needed when they are written to (copy on write), saving memory and allowing the allocation to be completed quickly. The allocation starts out looking like the picture on the left, and then as pages are written to would eventually end up looking like the picture on the right.

Normally this behaviour is beneficial to the performance and memory usage of your system however it caught me out when running some code to benchmark memory copy routines. Allocating a source buffer for a copy benchmark with calloc is not the same as using initialized memory, the zero page is a single physical page, so a physically tagged cache can quite comfortably contain the whole of your source buffer even if the buffer you allocated was much larger than the cache.

So in special situations like benchmarking it is best to think carefully before using calloc, but in general it can be a useful tool to improve the memory usage and performance of your code.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s