Recently I was trying to diagnose an issue with recording audio via a USB to I2S bridge. We had a Silicon Labs CP2114 connected to a reasonably powerful quad-core Cortex-A7 processor running Android via USB with I2S audio coming into the bridge from an external TI codec. Everything seemed to be working fine – we could play audio back and we could record it – but after a while I noticed our recorded audio contained very occasional small clicks. At the native 48kHz frequency of the CP2114 this was very rarely noticeable but when down-sampled to lower frequencies in software it became much more noticeable.
Closer analysis of the clicks revealed discontinuities in the audio sample stream leading me to suspect there was some kind of buffer overrun condition occurring – my theory was the application (or audio subsystem) was failing to read the sample data quickly enough and samples were being dropped on the floor. That type of problem should be relatively easy to fix. First, I checked the system was not overloaded. It was quite easy to see with top that it wasn’t and we had four relatively powerful cores mostly idle. This made me suspect that maybe there was a problem with latency rather than throughput, so I tried adding more buffers, to give the software more time to hit its deadlines. However the glitches remained, very puzzling!
At this point I decided that taking random stabs in the dark was not working so I should go and try to gather some data. I added logging at various layers from the kernel, audioflinger and the Android audio HAL but no problems were reported. I found a patch to the Linux USB audio driver that fixed xrun reporting and applied that. Still nothing.
So what could be the problem? I dug a bit further into the USB Audio Class specification and noticed that there were a couple endpoint descriptors that looked relevant. These were bInterval and wMaxPacketSize. The CP2114 has a fixed configuration of 48kHz sample rate, stereo with 16bit samples. This means that the data rate is 192,000 bytes per second. The value of bInterval was set to 1ms and the value of wMaxPacketSize was set to 192, which makes sense – 1000 packets of 192 bytes adds up to 192,000 bytes per second. This got me thinking about the size of the audio buffers we were sending to the kernel.
We had a very simple custom Android audio HAL that set a period size of 1024 bytes, which is a pretty large buffer but does not divide by the 192 byte packet size of the hardware. As an experiment I tried reducing this to 768 bytes (192 * 4) and the audio glitches vanished! This was surprising to me as the ALSA audio device advertised a wide range of period sizes and no problems were being reported by any of the software involved. Luckily the fix was simple in the end but the process of getting there involved more luck than science.