Thoughts on Dell XPS 13 as a developer laptop

Recently I started using a Dell XPS 13 9360. It has an i7 CPU and a 256GB SSD, so it seemed like it would be a great fit for development work and a significant upgrade over my old laptop, a Lenovo Thinkpad Carbon X1 with an i5.

I chose the XPS 13 based on the pretty much unanimously glowing reviews to be found online.

Based on these reviews it seemed like a no-brainer to pick the XPS 13 over the equivalent Thinkpad Carbon X1, which was several hundred pounds more expensive.

The Dell sales experience is always pretty slick, but once I got hold of the laptop and started trying to do real work with it my experience pretty quickly started to sour. Now, I don’t want to suggest it’s a bad product: it clearly packs a lot of modern technology into a small package and is in many ways better specced than the Lenovo product I was looking at. The screen in particular is very nice, but beyond that I don’t have too much positive to say about it.

The Keyboard

This is the biggest problem for me. The keyboard is just nowhere near as good as the Lenovo’s. The key response is soft and the travel is low.

The two images above show the depth of the keys on the two laptops (Dell left, Lenovo right). The images don’t give an indication of the travel on the keys – the Lenovo is much firmer with a longer travel, which to my hands is much more comfortable to type on. You can also see that the Ctrl and Fn keys are swapped. This is an arbitrary choice, but the Ctrl key on the Dell is also considerably smaller than the Lenovo’s, and as an emacs user I much prefer a larger Ctrl key as I’m hitting it a lot.

On the other side of the keyboard there is another awkward design choice. The PgUp and PgDn keys on the Dell require the Fn key to be pressed rather than being standalone keys, which makes them, in my opinion, quite useless. There’s even space on the keyboard that could have been used for physical keys like on the Thinkpad, so this seems an odd choice.

When I’m working at home I use a Dell wireless keyboard which is something of an improvement over the laptop keyboard but still not in the same class as, for example, the Microsoft keyboards.

USB-C

The only video output available is USB-C. I bought a Dell monitor with the laptop, but it doesn’t come with a usable cable. Adapter cables are available but they can be expensive, so it seems unfortunate that Dell don’t help their customers here. Technically USB-C is a more flexible connector than, for example, HDMI, but this doesn’t seem to be practically useful with the current range of adapters on the market.

If you do presentations, you will also need to make sure you carry the right kind of adapter with you. Most venues will have adapters for HDMI and DisplayPort, but I have yet to find anywhere that provides USB-C.

Webcam

The webcam is positioned underneath the screen, just above the Esc key. This is a really awkward place for a camera for video conferencing: it means you need to push the screen quite far back, otherwise the picture is of your chest, and even then the angle is quite odd, with your correspondent getting the feeling of looking up your nose.

Performance

Going from a four-year-old laptop to a new one, and switching from an i5 to an i7, I was expecting a serious performance boost. In practical terms, however, that doesn’t seem to have materialised, probably due to thermal throttling. My dmesg is full of messages like:

[1847368.552137] CPU1: Core temperature above threshold, cpu clock throttled (total events = 97580)
[1847368.552138] CPU3: Core temperature above threshold, cpu clock throttled (total events = 97581)
[1847368.552139] CPU2: Package temperature above threshold, cpu clock throttled (total events = 116408)
[1847368.552140] CPU0: Package temperature above threshold, cpu clock throttled (total events = 116405)
[1847368.552142] CPU3: Package temperature above threshold, cpu clock throttled (total events = 116407)
[1847368.552145] CPU1: Package temperature above threshold, cpu clock throttled (total events = 116405)

The fan is also very prone to coming on and is quite loud, so you have to get used to that noise if you’re going to be doing a lot of builds.

Build Quality

Overall the Dell build quality feels weaker than the Lenovo’s. It starts with the body, which is a mix of metal and plastic where the Carbon X1 feels more like a single piece of material. The keys on the Dell are soft and slightly loose, adding to the plasticky feel. The USB-C circuitry also seems to be implicated in some singing capacitors – writing large amounts of output to the screen can be heard as a high-pitched tone.

I’m also not much of a fan of the Dell power plugs (on the left, Lenovo right). For a UK plug they are very thin, and the triangular shape makes them difficult to hold and apply any force to. It’s a small detail, but I would much rather have the functional and solid plug that Lenovo supply.

Conclusion

I’ve raised a number of issues I have with the XPS 13, but it is still a decent laptop that packs a lot of modern tech in at a competitive price. However, it still feels to me like a high-end mid-range laptop rather than a genuine contender against the Thinkpad Carbon X1 or the Apple laptops, and if you have the budget I would definitely recommend the Thinkpad, even if the spec is a little lower.


Debugging memory leaks in Python

Recently I noticed a Python service on our embedded device was leaking memory. It was a long-running service with a number of library dependencies in Python and C, and contained thousands of lines of code. I don’t consider myself a Python expert so I wasn’t entirely sure where to start.

After a quick Google search I turned up a number of packages that looked like they might help, but none of them turned out to be quite what was needed. Some were too basic and just gave an overall summary of memory usage, when what I really wanted was counts of which objects were leaking. Guppy hasn’t been updated in some time and seems quite complex. Pympler seemed to work, but with an excessive amount of runtime overhead that made it impractical on a resource-constrained system. At this point I was close to giving up on the tools and embarking on a time-consuming code review to try to track down the problem. Luckily, before I disappeared down that rabbit hole I came across the tracemalloc package.

Tracemalloc is a package added in Python 3.4 that allows tracing of allocations from Python. It’s part of the standard library, so it should be available anywhere there is a modern Python installation. It has functionality to provide snapshots of object counts on the heap, which was just what I needed. There is some runtime overhead, which results in spikes in CPU time, presumably when walking the heap, but it seems more efficient than the alternatives.

An example of how you can use the tracemalloc package:

import time
import tracemalloc

snapshot = None

def trace_print():
    global snapshot
    snapshot2 = tracemalloc.take_snapshot()
    snapshot2 = snapshot2.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
        tracemalloc.Filter(False, tracemalloc.__file__)
    ))
    
    if snapshot is not None:
        print("================================== Begin Trace:")
        top_stats = snapshot2.compare_to(snapshot, 'lineno', cumulative=True)
        for stat in top_stats[:10]:
            print(stat)
    snapshot = snapshot2

l = []

def leaky_func(x):
    global l
    l.append(x)

if __name__=='__main__':
    i = 0
    tracemalloc.start()
    while True:
        leaky_func(i)
        i += 1
        time.sleep(1)
        if i % 10 == 0:
            trace_print()

This should print a snapshot of the state of the heap every 10 seconds, such as the one below:

leak.py:27: size=576 B (+112 B), count=1 (+0), average=576 B
/usr/lib64/python3.5/posixpath.py:52: size=256 B (+64 B), count=4 (+1), average=64 B
/usr/lib64/python3.5/re.py:246: size=4672 B (+0 B), count=73 (+0), average=64 B
/usr/lib64/python3.5/sre_parse.py:528: size=1792 B (+0 B), count=28 (+0), average=64 B
/usr/lib64/python3.5/sre_compile.py:553: size=1672 B (+0 B), count=4 (+0), average=418 B
/usr/lib64/python3.5/sre_parse.py:72: size=736 B (+0 B), count=4 (+0), average=184 B
/usr/lib64/python3.5/fnmatch.py:70: size=704 B (+0 B), count=4 (+0), average=176 B
/usr/lib64/python3.5/sre_parse.py:524: size=560 B (+0 B), count=1 (+0), average=560 B

The numbers can be interpreted as follows:

  • size, the total size of allocations at this call site
  • count, the total number of allocations at this call site
  • average, the average size of allocations at this call site

The numbers in brackets indicate the amount by which each value increased or decreased since the last snapshot. So to look for our memory leak we would expect to see a positive number in the brackets next to size and count. In the example above we can see a positive size increase next to leak.py line 27, which matches up with our leaky function.

Go Toolchain Primer

A toolchain is a package composed of the compiler and ancillary tools, libraries and runtime for a language, which together allow you to build and run code written in that language. The GNU toolchain is the most commonly used toolchain on Linux and allows building programs written in C, C++, Fortran and a host of other languages too.

gc

The first Go toolchain to be made available, and the one most people are referring to when they talk about Go, is gc. gc (not to be confused with GC, which usually refers to the garbage collector) is the compiler and toolchain which evolved from the Plan 9 toolchain and includes its own compiler, assembler, linker and tools, as well as the Go runtime and standard library. With Go 1.5 the parts of the toolchain that were written in C have been rewritten in Go, so the Plan 9 legacy is gone in terms of code but remains in spirit. The toolchain supports i386, x86_64, arm, arm64 and powerpc64, and the code is BSD licensed.

gccgo

gccgo extends the gcc project to support Go. gcc is a widely used compiler that, along with GNU binutils for the linker and assembler, supports a large number of processor architectures. gccgo currently only supports Go 1.4, but Go 1.5 support is in the works. The advantage of being able to use the gcc compiler infrastructure is that, as well as supporting more processor architectures than gc, gccgo can take advantage of the more advanced middle-end and backend optimizations that gcc has developed over the years, which could lead to faster generated code. The GNU toolchain and gccgo are GPLv3 licensed, which some people may find problematic, but they are what many of the Linux distributions use to support Go on architectures not supported by gc, like SPARC, S/390 or MIPS.

llgo

LLVM is a compiler infrastructure project similar to gcc, with a couple of key differences. Firstly, it was developed from the ground up in C++ at a much later date than gcc, so the code is generally more modern in style and has a clearer structure. Secondly, it is BSD licensed, which has attracted a number of large companies such as Apple and Google to get heavily involved (in fact Apple employs the project’s founder). llgo is a Go compiler built on top of the LLVM compiler infrastructure. It is BSD licensed and supports nearly as many architectures as gccgo, but it feels like a less mature project and fewer people seem to be using it, at least publicly.

Why so many toolchains?

One thing that may appear odd is that all three toolchains are predominantly developed by Google engineers. All the toolchains contain some of the same components – the runtime and libraries are largely shared between the projects, gccgo and llgo share a compiler frontend (language parser), and all the compilers are similarly ahead-of-time and generate native code. Perhaps Google feels that diversity of implementations is good for the language – I would be inclined to agree – and it looks like Google is spending its efforts relatively sparingly on gccgo and llgo, so the development cost of that strategy may not be that high.

I would suggest most people should just stick with gc for their Go development needs, but it will be interesting to see in which directions the various toolchains develop. In later posts I will go into a bit more depth about the relative pros and cons of each.

Channel buffering in Go

Go provides channels as a core concurrency primitive. Channels can be used to pass objects between goroutines safely and to construct quite complex concurrent data structures. However, they are still quite a low-level construct and it is not always clear how best to use them.

I was reminded of this when I saw Evan Huus’s excellent talk on Complex Concurrency Patterns in Go at the Golang UK conference. One of the points he made was that on his team infinite buffering was considered a “code smell”. This is something I find interesting, particularly as someone who has written a bit of Erlang code. In the Erlang concurrency model there are no channels, but every process (processes in Erlang are lightweight, like goroutines) has an input message queue, which is effectively unbounded in size. In Go channels are first-class objects and are created with a finite buffer size, which defaults to zero.

On the face of it the Go approach is appealing. Nobody wants runaway resource allocation in their program and Go provides a useful way of, for example, throttling excessively fast producers in a producer-consumer system to prevent them filling up memory with unprocessed data. But how large should you make channel buffers?

Unbuffered channels are the default in Go but can have some undesirable side effects. For example, a producer and consumer connected by an unbuffered channel can suffer reduced parallelism, as the producer blocks waiting for the consumer to finish working and the consumer blocks waiting for the producer to produce output. In the following code, increasing the channel buffer from zero to one causes a speedup, at least on my machine:

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func producer(ch chan bool) {
	for i := 0; i < 100000; i++ {
		time.Sleep(time.Duration(10 * rand.Float64() * float64(time.Microsecond)))
		ch <- true
	}
	close(ch)
}

func consumer(ch chan bool) {
	for _ = range ch {
		time.Sleep(time.Duration(10 * rand.Float64() * float64(time.Microsecond)))
	}
}

func unbuffered() {
	ch := make(chan bool, 0)
	go producer(ch)
	consumer(ch)
}

func buffered() {
	ch := make(chan bool, 1)
	go producer(ch)
	consumer(ch)
}

func main() {
	startTime := time.Now()
	unbuffered()
	fmt.Printf("unbuffered: %v\n", time.Since(startTime))
	startTime = time.Now()
	buffered()
	fmt.Printf("buffered: %v\n", time.Since(startTime))
}

Unbuffered channels are also more prone to deadlock. It can be argued whether this is a good thing or a bad thing: deadlocks that may not be apparent with buffered channels become visible with unbuffered channels, which allows them to be fixed, and the behaviour of an unbuffered system is generally simpler and easier to reason about.

So if we decide we need to create a buffered channel, how large should that buffer be? That question turns out to be pretty hard to answer. Channel buffers are allocated with malloc when the channel is created, so the buffer size is not an upper bound on the size of the buffer but the actual allocated size – the larger the buffer, the more system memory it will consume and the worse cache locality it will have. This means we can’t just use a very large number and treat the channel as if it were infinitely buffered.

In the example below, which is inspired by an open source project, we have a logging API that internally has a channel that is used to pass log messages from the API entry point to the log backend which writes the log messages to a file or network based on the configuration:

type Logger struct {
	logChan chan string
}

func NewLogger() *Logger {
	return &Logger{logChan: make(chan string, 100)}
}

func (l *Logger) WriteString(str string) {
	l.logChan <- str
}

In this case the channel is providing two things. Firstly, it is providing synchronization so multiple goroutines can call into the API safely at the same time. This is a fairly uncontentious use of a channel. Secondly, it is providing a buffer to prevent callers of the API from blocking unduly when many logging calls are made around the same time – it won’t increase throughput but will help if logging is bursty. But is this a good use of a channel? And is 100 the right size to use?

The two questions are somewhat intertwined in my opinion. Channels make good buffers for small buffer sizes. As mentioned above, the storage for a channel is allocated up front when the channel is created, so a large buffer can be implemented more space-efficiently in other ways. The right size of buffer depends on many things: in this example, the number of goroutines logging, how bursty the logging is, the tolerance for blocking, the size of the machine’s memory and probably many others. For this reason the API should provide a way of setting the size of the channel buffer, so users can adapt it to their needs rather than relying on a hardcoded size.

So a good size for a buffer could be zero, one or some other small number – say five or less – or a variable size that can be set via the API, but it is unlikely, in my opinion, to be a large constant value like 100. Infinite buffer sizes – that is, a buffer limited only by available memory – are sometimes useful and not something we should dismiss altogether, although they are obviously not possible to implement directly with channels as provided by Go. It is possible to create a buffer goroutine with an input channel and an output channel that implements a memory-efficient variable-sized buffer, for example as Evan does here, and this is the best choice if you expect your buffering requirements to be large.

Using SystemTap userspace static probes

One of the new features in glibc 2.19 was a set of SystemTap static probes in the malloc subsystem to allow a better view into its inner workings. SystemTap static probe points expand to only a single nop instruction when not enabled, and each takes a fixed number of arguments which are passed through to your SystemTap probe. I wanted to use these probes to analyze the performance of a malloc workload, so I wrote a SystemTap script to log events in the malloc subsystem.
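
For context, this is roughly what a static probe point looks like from the application side. The sketch below uses the STAP_PROBE macros from sys/sdt.h (packaged as systemtap-sdt-devel on Fedora); the "myapp" provider and "buffer_resize" probe names are made up for illustration, and glibc defines its malloc probes through its own wrapper around the same mechanism.

#include <stdlib.h>
#include <sys/sdt.h>

static char *resize_buffer(char *buf, size_t new_size)
{
    char *p = realloc(buf, new_size);
    /* Expands to a single nop plus an ELF note describing the probe and
     * its two arguments; it costs essentially nothing unless a probe is
     * attached. */
    STAP_PROBE2(myapp, buffer_resize, p, new_size);
    return p;
}

int main(void)
{
    char *buf = resize_buffer(NULL, 4096);
    free(buf);
    return 0;
}

A probe point declared like this can be attached with probe process("./a.out").mark("buffer_resize"), in the same way as the glibc probes used by the script below.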

To get this script to work on Fedora 20 I had to install the git version of SystemTap, as otherwise some of the probes failed to parse their arguments correctly. The script can be run like this:


# stap malloc.stp -c /usr/bin/ls

It’s also possible to run this script against a non-installed version of glibc if you modify the globs in the script to match the path to your libc and run it with the appropriate library path:


# stap malloc.stp -c "env 'LD_LIBRARY_PATH=.../glibc-build:.../glibc-build/nptl' /usr/bin/ls"

The script is very simple and just prints a timestamp, the name of the probe point and the arguments, but I hope someone will find it useful.


probe process("/lib*/libc.so.*").mark("memory_heap_new") {
printf("%d:memory_heap_new heap %x size %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_heap_more") {
printf("%d:memory_heap_more heap %x size %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_heap_less") {
printf("%d:memory_heap_less heap %x size %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_heap_free") {
printf("%d:memory_heap_free heap %x size %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_arena_new") {
printf("%d:memory_arena_new arena %x size %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_arena_reuse_free_list") {
printf("%d:memory_arena_reuse_free_list free_list %xn",
gettimeofday_ms(), $arg1)
}

probe process("/lib*/libc.so.*").mark("memory_arena_reuse_wait") {
printf("%d:memory_arena_reuse_wait mutex %d arena %x avoid_arena %xn",
gettimeofday_ms(), $arg1, $arg2, $arg3)
}

probe process("/lib*/libc.so.*").mark("memory_arena_reuse") {
printf("%d:memory_arena_reuse arena %x avoid_arena %xn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_arena_retry") {
printf("%d:memory_arena_retry arena %x bytes %dn",
gettimeofday_ms(), $arg2, $arg1)
}

probe process("/lib*/libc.so.*").mark("memory_sbrk_more") {
printf("%d:memory_sbrk_more brk %x change %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_sbrk_less") {
printf("%d:memory_sbrk_less brk %x change %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_malloc_retry") {
printf("%d:memory_malloc_retry bytes %dn",
gettimeofday_ms(), $arg1)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_free_dyn_thresholds") {
printf("%d:memory_mallopt_free_dyn_thresholds mmap %d trim %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_realloc_retry") {
printf("%d:memory_realloc_retry bytes %d oldmem %xn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_memalign_retry") {
printf("%d:memory_memalign_retry bytes %d alignment %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_calloc_retry") {
printf("%d:memory_calloc_retry bytes %dn",
gettimeofday_ms(), $arg1)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt") {
printf("%d:memory_mallopt param %d value %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_mxfast") {
printf("%d:memory_mallopt_mxfast new %d old %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_trim_threshold") {
printf("%d:memory_mallopt_trim_threshold new %d old %d dyn_threshold %dn",
gettimeofday_ms(), $arg1, $arg2, $arg3)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_top_pad") {
printf("%d:memory_mallopt_top_pad new %d old %d dyn_threshold %dn",
gettimeofday_ms(), $arg1, $arg2, $arg3)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_mmap_threshold") {
printf("%d:memory_mallopt_mmap_threshold new %d old %d dyn_threshold %dn",
gettimeofday_ms(), $arg1, $arg2, $arg3)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_mmap_max") {
printf("%d:memory_mallopt_mmap_max new %d old %d dyn_threshold %dn",
gettimeofday_ms(), $arg1, $arg2, $arg3)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_check_action") {
printf("%d:memory_mallopt_check_action new %d old %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_perturb") {
printf("%d:memory_mallopt_perturb new %d old %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_arena_test") {
printf("%d:memory_mallopt_arena_test new %d old %dn",
gettimeofday_ms(), $arg1, $arg2)
}

probe process("/lib*/libc.so.*").mark("memory_mallopt_arena_max") {
printf("%d:memory_mallopt_arena_max new %d old %dn",
gettimeofday_ms(), $arg1, $arg2)
}
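
ls doesn’t exercise the allocator very hard, so it can be useful to run the script against a small test program instead. The one below is a made-up example, not part of the original script, that just allocates, touches and frees blocks of varying sizes to produce a steadier stream of heap events:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Allocate and free blocks of varying sizes to trigger heap growth
     * and trim probes in the output. */
    for (int i = 0; i < 1000; i++) {
        size_t size = (size_t)(i % 64 + 1) * 4096;
        char *p = malloc(size);
        if (p != NULL) {
            memset(p, 0, size); /* touch the memory so pages are really used */
            free(p);
        }
    }
    return 0;
}

Compile it and run the script with stap malloc.stp -c ./malloc-test (the binary name is arbitrary).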

Canon PIXMA MG6350 drivers for Linux

I recently bought a Canon PIXMA MG6350 printer for my home office. Before buying it I found Canon had a set of drivers available for Linux, so I assumed it was reasonably well supported. However, the binary packages available from the Canon support site had out-of-date dependencies for Fedora 20 so weren’t installable, but there was a source package available, so I grabbed that.

On the positive side Canon have provided a mostly GPL CUPS printer driver package for Linux, which is to be commended, but unfortunately it doesn’t build out of the box on modern systems and contains a handful of proprietary binary libraries. I spent a bit of time hacking it to build and fixing some compile warnings, and pushed the result to GitHub:

https://github.com/willnewton/cnijfilter

The following commands will build an RPM for the MG6350; you will need to modify them slightly for other printers in the family:


# git archive --prefix=cnijfilter-source-3.80-2/ -o ~/rpmbuild/SOURCES/cnijfilter-source-3.80-2.tar.gz HEAD
# rpmbuild -ba cnijfilter-common.spec --define="MODEL mg6300" --define="MODEL_NUM 408" --with build_common_package

As I mentioned above, there are unfortunately some binary libraries in the package which seem to be essential, and the code quality in general seems pretty poor. A number of compiler warnings remain that point to moderately serious issues in the bits of code in question. There’s a lot of copy-and-paste reuse, and the code is full of fixed-size buffers and dangerous assumptions. It lets me print documents from my laptop so I am not entirely unhappy, although it would be nice if Canon would engage with the community on getting these drivers fully open sourced and integrated properly into CUPS.

setcontext and signal handlers

setcontext is a C library function that, along with getcontext, allows you to perform a non-local jump from one context to another. These functions are often used when implementing coroutines or custom threading libraries. longjmp and setjmp provide similar functionality, but setcontext was an attempt to fix the shortcomings of these functions and standardize behaviour, although in POSIX 2008 the specification of setcontext and related functions was removed due to the difficulty of implementing them in a truly portable manner.
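
As a reminder of what these functions are normally used for, here is a minimal coroutine-style sketch (not yet involving signals): a second context is created with getcontext and makecontext, given its own stack, and switched to and from with swapcontext.

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];

static void coroutine(void)
{
    printf("in coroutine\n");
    swapcontext(&co_ctx, &main_ctx);   /* yield back to main */
    printf("coroutine resumed\n");
    /* returning resumes the context in uc_link, i.e. main_ctx */
}

int main(void)
{
    getcontext(&co_ctx);               /* initialise the context structure */
    co_ctx.uc_stack.ss_sp = co_stack;  /* give the coroutine its own stack */
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, coroutine, 0);

    swapcontext(&main_ctx, &co_ctx);   /* run the coroutine until it yields */
    printf("back in main\n");
    swapcontext(&main_ctx, &co_ctx);   /* resume it until it finishes */
    printf("done\n");
    return 0;
}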

In the beginning there were setjmp and longjmp. setjmp captures the current register values into a data structure and longjmp restores those values at a later point in program execution, causing control flow to jump back to the point where setjmp was called. This works fine unless the place you are jumping from is a signal handler. In that case the problem is that longjmp will not restore the signal mask, so the signal you were handling will not be unmasked. To fix this, sigsetjmp and siglongjmp were introduced, which save and restore the signal mask.
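
A minimal sketch of that pattern: the second argument of 1 tells sigsetjmp to save the current signal mask, so that siglongjmp restores it when jumping out of the handler. SIGALRM here is just a stand-in signal.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static sigjmp_buf env;

static void handler(int sig)
{
    (void)sig;
    siglongjmp(env, 1);  /* jump back to main, restoring the saved signal mask */
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_handler = handler;
    sigaction(SIGALRM, &sa, NULL);

    if (sigsetjmp(env, 1) == 0) {  /* 1 = save the signal mask */
        alarm(1);                  /* deliver SIGALRM in one second */
        pause();                   /* wait for the signal */
    } else {
        printf("jumped out of the signal handler\n");
    }
    return 0;
}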

However, this doesn’t mean it is necessarily safe to jump out of a signal handler even if you are using siglongjmp. The problem you will often hit is that if the signal was delivered at an arbitrary point in program execution there may be locks held or other global resources that need to be released. The only way to avoid this is to block the appropriate signal across any part of the code that may not behave well when interrupted in this way. Unfortunately, without auditing all third-party libraries, that probably means you can only enable handling of such a signal across very small regions of your program.
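
A minimal sketch of that blocking approach, using sigprocmask to hold back the signal (again SIGALRM as a stand-in) while a critical section runs; in a multi-threaded program pthread_sigmask is the equivalent call:

#include <signal.h>

void critical_section(void)
{
    sigset_t block, old;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);

    /* Keep SIGALRM from being delivered while we hold locks or touch
     * global state that must not be abandoned by a non-local jump. */
    sigprocmask(SIG_BLOCK, &block, &old);

    /* ... take locks, update shared data structures ... */

    /* Restore the previous mask; a pending SIGALRM is delivered now. */
    sigprocmask(SIG_SETMASK, &old, NULL);
}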

setcontext and getcontext also restore and save the signal mask, and setcontext can be used to exit from a signal handler with the caveats expressed above. However, there is another way in which signal handling and setcontext interact. The context structure used by setcontext and getcontext is of type ucontext_t. If you install your signal handler using sigaction with the SA_SIGINFO flag and the sa_sigaction field of struct sigaction, your handler gets a third argument of type void * which can be cast to ucontext_t *, describing the context that was interrupted by the signal. So what happens if you pass this structure to setcontext?
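
A minimal sketch of how that third argument arrives, with SIGUSR1 standing in for a real signal; whether the commented-out setcontext call would actually do anything sensible is exactly the question below:

#include <signal.h>
#include <ucontext.h>

static void handler(int sig, siginfo_t *info, void *ctx)
{
    /* The interrupted context, saved when the signal was delivered. */
    ucontext_t *uc = (ucontext_t *)ctx;
    (void)sig;
    (void)info;
    (void)uc;
    /* setcontext(uc);  -- unspecified behaviour on most Linux ports, see below */
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGUSR1, &sa, NULL);
    raise(SIGUSR1);
    return 0;
}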

Well the answer is, on Linux at least, “probably nothing good”. The original definition of setcontext specified that “program execution continues with the program instruction following the instruction interrupted by the signal”, but more recent versions of the standards removed that requirement, so the result is now unspecified. Some glibc ports, such as powerpc, mips and tile, do support restoring signal-handler-created contexts in the spirit of the original specification, but the rest, including x86 and ARM, do not. As such it is not possible to rely on being able to restore a signal-handler-created context with setcontext on Linux. It would be interesting to know if any proprietary Unixes support restoring these contexts and whether any applications actually use the functionality.