Sunday, September 02, 2012

Audio over USB: back to the age of V42bis modems?

A long time ago I used to run a BBS.  When I first started, I only had a 2400 bps modem.  This modem was a card plugged into an ISA slot on the motherboard, and you configured its COM port via jumpers.  Naturally, on the software side, the serial port was set to the maximum speed of the modem, i.e. 2400 bps.  Because the hardware interface raised an interrupt for every byte received, the CPU was going to be interrupted up to 300 times per second as transmission happened: 2400 bits per second divided by 8 bits per byte.  Bytes moved one IRQ at a time.

These old machines, basically a 386 running DOS, could deal with this for the most part.  Then came modems with a baseline 9600 bps transmission rate, plus two major improvements: V42 and V42bis.  The first introduced packetized transmission between the modems, with retries and everything, such that line noise and data corruption would (for the most part) become invisible to the modem user.  The second introduced data compression of the Lempel-Ziv variety, with a maximum compression ratio of 4.  In other words, while a modem with a physical data rate of 9600 bps would at first glance require a CPU interrupt rate of 1200 times per second, that interrupt rate would be insufficient for compressed data: the CPU would not be able to take data off the modem's hands quickly enough.  Since the compression ratio could reach 4, you would set the modem's serial port to 4 × 9600 = 38400 bps to avoid defeating V42bis.
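The arithmetic above can be checked with a quick sketch.  Like the figures in the text, it ignores start/stop framing bits, so treat the numbers as upper bounds:

```python
# Back-of-the-envelope IRQ rates for one-interrupt-per-byte UARTs (8250 era).
# Framing overhead (start/stop bits) is deliberately ignored, matching the
# simplified figures in the text above.

BITS_PER_BYTE = 8

def irqs_per_second(bps):
    """One interrupt per received byte."""
    return bps // BITS_PER_BYTE

print(irqs_per_second(2400))      # 2400 bps modem  -> 300 IRQs/s
print(irqs_per_second(9600))      # 9600 bps line   -> 1200 IRQs/s

# V42bis compresses up to 4:1, so the serial link must run 4x faster
# than the physical line to keep up with the decompressed stream:
print(irqs_per_second(4 * 9600))  # 38400 bps       -> 4800 IRQs/s
```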

Ah, those computers.  There was no way they could handle 4800 serial IRQs per second while also writing to disk, paying attention to the keyboard, or whatever else the software was doing.  Bytes started getting overwritten at the serial port interface, and data errors became visible to applications.  In other words, since the hardware wasn't fast enough, by trying to avoid defeating V42bis with a higher serial port speed, you would defeat V42 as well.

The solution to this problem was to replace the 8250 UART with the 16550.  The 16550 came with a FIFO buffer capable of holding (OOoooohhh!) 16 bytes.  In other words, the modem could send up to 16 bytes to the FIFO buffer, and no data would be lost as long as the CPU got around to emptying the buffer before it filled.  This required some changes in applications to take advantage of the new hardware, and it solved the dropped-byte problem completely.  Now you could set your serial port to 115200 or even 230400 bps if you wanted, and things would continue to work just fine.  Nice, isn't it?
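The effect of the FIFO on interrupt load is easy to sketch.  For simplicity this assumes the CPU is interrupted once per full 16-byte FIFO (real 16550s trigger at configurable levels of 1, 4, 8 or 14 bytes, so actual rates sit somewhat higher), and framing bits are again ignored:

```python
# How a receive FIFO changes the interrupt load: one interrupt per
# (nearly) full FIFO instead of one per byte.  Assumes interrupts fire
# on a full 16-byte FIFO; real 16550 trigger levels are 1/4/8/14 bytes.

def irqs_per_second(bps, fifo_bytes=1):
    bytes_per_second = bps // 8
    return bytes_per_second // fifo_bytes

print(irqs_per_second(38400))        # 8250, no FIFO:      4800 IRQs/s
print(irqs_per_second(38400, 16))    # 16550, 16-byte FIFO: 300 IRQs/s
print(irqs_per_second(115200, 16))   # even at 115200 bps:  900 IRQs/s
```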

Let's fast forward 20 years.  We have vastly more powerful computers these days, capable of emulating a whole DOS machine in software without breaking a sweat.  And yet we still seem not to have learned the FIFO lesson.  It's quite embarrassing, frankly.  The main symptom I see is that you cannot reliably use USB for audio.  But why, right?  USB 2.0 can easily move 30 megabytes per second to an external hard drive; what could a couple hundred kilobytes per second do to it?  Well, on OS X in particular, a lot.

When things don't work on OS X, what you hear are electronic glitches, cutouts, pops and other artifacts that do not belong.  Sometimes audio gets into a state from which it doesn't even recover.  For example, if you are in a Skype conversation, your voice may become garbled until you hang up and call again.  Something similar may happen with USB microphones in e.g. GarageBand, where audio input will sound wrong no matter what you do until you unplug the USB device and plug it back in.  In everyday use, these problems come and go without an apparent pattern.  However, there are hints that what is behind these issues is the old serial FIFO buffer problem from 20 years ago, which we still have not learned to fix.  See for example this, this, this, and this.  Google has an exhaustive etcetera available as well.  In those, we see plenty of discussion about buffer sizes and buffer underruns.  We're still discussing how large the FIFO buffer should be.  In other words,

  • We still have not fixed the decades old FIFO problem, and
  • We're just asking the user to fix it instead of providing a proper solution.
The above means that at no point can you be absolutely sure that the magic number you put in some configuration dialog box will be enough to stop the FIFO buffer problem.  This is even with today's super powerful machines.  Moreover, since you could conceivably experience artifacts that are not immediately obvious, and since there is no monitor that can tell you e.g. when and by how much the buffer was insufficient, you're left in the dark in two ways.
  • Unless you can prove beyond any reasonable doubt that no buffer issue occurred, you cannot say none did, and
  • You are not given the tools to prove or disprove that there were buffer issues.
But it gets better.  Under OS X, it is up to applications to set their own buffer size if they want something other than the default.  There is no configuration dialog to change the default buffer size which, according to pages such as this, is tiny.  Sometimes, it's on the order of a few hundred bytes.  Consequently, that means sending hundreds of kilobytes per second a few hundred bytes at a time.  Are we really talking about potentially interrupting the CPU several thousand times per second, something even MSDN says is not necessarily a good idea?
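A quick sketch puts numbers on this.  The stream parameters are illustrative (44.1 kHz, stereo, 16-bit samples, i.e. roughly the "couple hundred kilobytes per second" mentioned above), not anything I've confirmed a particular driver uses:

```python
# Wakeup/callback rate for a USB audio stream as a function of buffer size.
# Illustrative stream: 44.1 kHz, 2 channels, 16-bit samples -> 176400 B/s.

SAMPLE_RATE = 44100
CHANNELS = 2
BYTES_PER_SAMPLE = 2

bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE  # 176400

for buffer_bytes in (128, 512, 4096):
    wakeups = bytes_per_second / buffer_bytes
    print(f"{buffer_bytes:5d}-byte buffer -> {wakeups:7.1f} wakeups/s")

# A few-hundred-byte buffer means on the order of a thousand wakeups
# per second; a 4 KB buffer cuts that to roughly 43.
```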

So what happens when you use GarageBand?  You get two options: "small buffer" and "large buffer".  But what is small, and what is large?  The answer to that question seems to be "well my dear, why would you care?  Just go buy another app at the app store, will you?".  And what if you use Skype?  As far as I know, Skype has no ability to configure buffer sizes at all.  So, in that case, you are completely out of luck.  And it's still your problem, because you are the one who buys the hardware and uses the software.

And there's more.  In OS X 10.4, the process priority model let you use the nice and renice tools to set the priority of processes to whatever you wanted.  So, if you wanted the computer to do some background batch processing, you could set it to the lowest priority and it wouldn't interfere significantly with anything else you were doing.  This ability was removed in OS X 10.5, and as a result applications cannot run with the equivalent of lowest priority.  They will consume significant CPU time no matter what, and they will continue doing so even if you go to the command line and use the (now basically useless) renice command.  After you change a process's priority with renice, the OS X scheduler seemingly applies a balancing adjustment you have no access to, and in the end nothing changes.  You can see this in action with software such as BOINC.

Why is this important?  Because the audio buffer problem is exacerbated when the CPUs are busy.  So if you keep your however-many cores super busy, like you should, then I'm sorry, but you will experience audio problems no matter what you do.  So really the only way to do USB audio is to keep your super fast machine idle(!).  And even that is not a guarantee of anything.  For emphasis: as I write this very paragraph, a short Firefox CPU spike caused the USB microphone I was monitoring to start producing static.  The machine has plenty of computing power and is otherwise not doing anything of consequence.

And why would this be a problem?  Why can't OS X simply raise the priority of the processes that need it, so they can do USB audio in tiny little packets?  Well, it could, and even then this approach wouldn't necessarily work.  This is because, according to OS X docs (and the technical note linked above), the audio drivers provided by the OS predict when applications should be notified that something needs to be done with audio, and issue a software interrupt when action is needed.  The problem with these predictions is that they are guesses, and if you are forced to guess, sooner or later you are forced to guess wrong.  So what happens when the machine is busy?  Well, basically, too bad.  But can you fix it?  No:
  • You cannot reconfigure the default buffer size used by the OS drivers.
  • Apparently, the OS drivers do not reconfigure themselves when there is a problem, as evidenced by failure modes that require resetting hardware to clean up.
  • Applications are supposed to specify buffer sizes, but most don't because the user is not expected to understand what is going on.
  • And even if applications provide a way to configure things, it is up to users to guess numbers large enough that the problems seem to go away.  However, users cannot prove that their chosen number is actually large enough, because there are no diagnostic facilities to help determine the right course of action.
  • If there are systemic problems with the way the machine is operated and applications do not respond to the audio driver's attention request within the predicted (guessed) time, from the point of view of the OS it is always possible to blame applications for the problem.
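The tradeoff users are being asked to guess at can be made concrete.  A larger buffer gives the application more time to respond before an underrun, but every extra frame of buffering is extra latency.  The numbers below are illustrative (a 44.1 kHz stream), not measurements of any particular driver:

```python
# Buffer size vs. the deadline an application gets before an underrun.
# The deadline is also the latency the buffer adds.  44.1 kHz assumed.

SAMPLE_RATE = 44100

def deadline_ms(buffer_frames):
    """Time available to refill the buffer -- also the added latency."""
    return 1000.0 * buffer_frames / SAMPLE_RATE

for frames in (64, 512, 4096):
    print(f"{frames:5d} frames -> {deadline_ms(frames):6.2f} ms")

#   64 frames -> ~1.45 ms: any scheduling hiccup causes an audible glitch.
#  512 frames -> ~11.61 ms: survives small CPU spikes; latency noticeable
#                mostly in live monitoring.
# 4096 frames -> ~92.88 ms: robust, but a clearly perceptible delay.
```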
So that seems to be why you get hundreds of thousands of hits for "usb audio glitches" in Google.  Apparently, that's also why you shouldn't do audio over USB at all.  Can't we please fix this problem already?  Or, at the very least, provide the following so we can do something about it?
  • Provide enough user visible diagnostic information to determine where the problem is.
  • Allow OS drivers to provide default configuration values, determined by the user, to applications that would otherwise not provide configuration capabilities.  This would be somewhat similar to the situation with graphics drivers, where the driver more or less imposes a particular configuration on applications at the user's request.
Would this be that hard?

Well, the above sounds authoritative and all that.  Nevertheless, it is merely the best explanation I've been able to piece together so far.  If you know a real solution to this problem, or you know something I missed, please let me know.  TIA!