Posted on Thu 08 February 2007

FOMS/LCA Recap

Finally, here's my linux.conf.au 2007 and FOMS 2007 recap. Maybe a little late, but better late than never.

FOMS was a very well organized conference with a packed schedule and a lot of high-profile attendees. To my surprise, PulseAudio was accepted by the attendees without any opposition (at least none was expressed aloud). After a few "discussions" on a few mailing lists (including GNOME MLs) and some personal emails I got, I had expected more people to be opposed to the idea of having a userspace sound daemon for the desktop. Apparently, I was overly pessimistic. Good news, that!

During the FOMS conference we discussed the problems audio on Linux currently has. One of the major issues is that we're still lacking a cross-platform PCM audio API everyone agrees on. ALSA is Linux-specific and complicated to use. The only real contender is PortAudio. However, PortAudio has its share of problems and hasn't reached wide adoption yet. Right now most larger software projects implement an audio abstraction layer of some kind, and mostly in a very dirty, simplistic and limited fashion. MPlayer does it, Xine does it, Flash does it. Everyone does it, and it sucks. (Note: this is only a very short overview of why audio on Linux sucks right now. For a longer one, please have a look at the first 15 minutes of my PulseAudio talk at LCA, linked below.)

Several people asked why we don't simply make the PulseAudio API the new "standard" PCM API for Linux. For several reasons that would be a bad idea. First of all, the PulseAudio API cannot be used with anything but PulseAudio. While PulseAudio has been ported to Win32, Vista already ships a userspace desktop sound server, so running PulseAudio on top of that doesn't make much sense. Thus the API is not exactly cross-platform. Secondly, I - as the guy who designed it - am not happy with the current PulseAudio API. While it is very powerful, it is also very difficult to use and easy to misuse, mostly due to its fully asynchronous nature. In addition, it is not exactly the smallest API around.
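
To give an idea of what that asynchronous nature means in practice, here is a minimal, hedged sketch (error handling trimmed) of what it takes just to connect to the daemon with the current client API: everything, including establishing the connection, happens through callbacks driven by a main loop, and a real client would additionally have to register callbacks for stream state changes, write requests and so on.

    #include <stdio.h>
    #include <pulse/pulseaudio.h>

    /* Called whenever the context changes state; we only care about
     * READY and the failure cases. */
    static void context_state_cb(pa_context *c, void *userdata) {
        pa_mainloop_api *api = userdata;

        switch (pa_context_get_state(c)) {
            case PA_CONTEXT_READY:
                printf("Connected to the sound server.\n");
                /* A real client would now create a pa_stream and set up
                 * yet more callbacks for its state and write requests. */
                api->quit(api, 0);
                break;
            case PA_CONTEXT_FAILED:
            case PA_CONTEXT_TERMINATED:
                api->quit(api, 1);
                break;
            default:
                break; /* still connecting */
        }
    }

    int main(void) {
        pa_mainloop *m = pa_mainloop_new();
        pa_mainloop_api *api = pa_mainloop_get_api(m);
        pa_context *c = pa_context_new(api, "async-example");
        int ret = 1;

        pa_context_set_state_callback(c, context_state_cb, api);
        pa_context_connect(c, NULL, 0, NULL);   /* NULL = default server */

        pa_mainloop_run(m, &ret);               /* everything happens in callbacks */

        pa_context_unref(c);
        pa_mainloop_free(m);
        return ret;
    }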

So, what could be done about this? We agreed on a - maybe - controversial solution: defining yet another abstracted PCM audio API. Yes, fixing the problem that we have too many conflicting, competing sound systems by defining yet another API sounds like a paradox, but I do believe this is the right path to follow. Why? Because none of the currently available solutions is suitable for all application areas we have on Linux. Either the current APIs are not portable, or they are horribly difficult to use properly, or have a strange license, or are too simple in their functionality. MacOSX managed to establish a single audio API (CoreAudio) that makes almost everyone happy on that system - and we should be able to do the same for Linux. Secondly, none of the current APIs has been designed with network sound servers in mind. However, proper networking support reflects back into the API, and in a non-trivial way. An API which works well in a networked environment needs to eliminate round trips where possible, be open for time interpolation and have flexible buffering (besides other, minor things). Thirdly, none of the current APIs offers enough functionality to properly support all the needs of modern desktop sound systems, such as per-stream volumes, stream names and notifications about external state changes.
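
To make the round-trip point a bit more concrete: the existing PulseAudio client API already does this kind of time interpolation. If a playback stream is connected with the PA_STREAM_INTERPOLATE_TIMING and PA_STREAM_AUTO_TIMING_UPDATE flags, querying the current playback time becomes a purely local operation instead of a request/reply exchange with the server. The helper below is only an illustrative sketch of mine, assuming such a stream:

    #include <pulse/pulseaudio.h>

    /* Hypothetical helper: assumes 's' was connected for playback with
     * PA_STREAM_INTERPOLATE_TIMING | PA_STREAM_AUTO_TIMING_UPDATE.
     * Returns the current playback time in microseconds without asking
     * the server: the client library interpolates from the last timing
     * update it received, so no network round trip is needed. */
    static pa_usec_t current_playback_time(pa_stream *s) {
        pa_usec_t t = 0;

        if (pa_stream_get_time(s, &t) < 0)
            return 0; /* no timing info available yet */

        return t;
    }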

During FOMS and LCA, Mikko Leppanen (from Nokia), Jean-Marc Valin (from Xiph) and I sat down and designed a draft of the API, covering the functionality we would like to see in it. For the time being we dubbed it libsydney, after the city where we started this project. I plan to eventually make this the only supported audio API for PulseAudio. Thus, if you code against PulseAudio you will get cross-platform support for free. In addition, because PulseAudio is now being integrated into the major distributions (at least Ubuntu and Fedora), this library will be made available on most systems through the backdoor.

So, what will this new API offer? Firstly, the buffering model is much more powerful than that of any current sound API. It mostly follows PulseAudio's internal buffering model, which (theoretically) can offer zero-latency streaming and was pioneered by Jim Gettys' AF sound server. It allows you to seek around in the playback buffer very flexibly. This is very useful for reacting quickly to the user's playback control commands while still allowing large buffers, which are good for dealing with high network lag. In addition it is very handy for the programmer, for example when implementing streaming clients where packets may arrive out of order. The API will emulate this buffering model on top of traditional audio devices, and when used on top of PulseAudio it will use its native implementation. The API will also clearly define which sound formats are guaranteed to be available, thus making it a lot easier to code without having to think about different hardware supporting different formats all the time. Of course, the API will be easier to use than PulseAudio's current API. It will be very portable, scaling from FPU-less architectures to pro-audio machines with a massive number of synchronised channels. There will be several modes available to deal with XRUNs semi-automatically, one of them guaranteeing that the time axis stays linear and monotonic under all circumstances.
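
As a small taste of what this buffering model allows, here is a sketch using the current PulseAudio streaming API, whose model the new API will follow: a client that has already queued audio can seek back within the playback buffer and overwrite part of it, for example to react instantly to a user command or to a late-arriving packet. The helper function is hypothetical and assumes an already-connected, ready playback stream:

    #include <stddef.h>
    #include <pulse/pulseaudio.h>

    /* Hypothetical helper: overwrite the last 'nbytes' of audio that were
     * queued but not yet played, instead of waiting for them to drain. */
    static int rewrite_queued_audio(pa_stream *s, const void *data, size_t nbytes) {
        /* A negative relative offset seeks back from the current write
         * index; the old data is simply replaced before it is played. */
        return pa_stream_write(s, data, nbytes, NULL,
                               -(int64_t) nbytes, PA_SEEK_RELATIVE);
    }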

The list of features of this new API is much longer, but enough of these grand plans! We haven't written any real code for this yet. To make sure that this project doesn't become another one of those that are announced grandiosely but never produce any code, I will stop listing features here. We will eventually publish a first draft of our C API for public discussion. Stay tuned.

Side by side with libsydney, I discussed an abstract API for desktop event sounds with Mikko (i.e. those annoying "bing" sounds you get when you click a button and the like). Dubbed libcanberra (named after the city which one of the developers visited after Sydney), this will hopefully be for the PulseAudio sample cache API what libsydney is for the PulseAudio streaming API: a total replacement.
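
Just to show how simple such an event sound API can be for the application programmer, here is a rough sketch in the style of the interface libcanberra eventually ended up with (ca_context_create()/ca_context_play()); the specific event name is just an example following the desktop sound naming conventions:

    #include <canberra.h>

    int main(void) {
        ca_context *c = NULL;

        /* Create a context and trigger a named event sound; which sample
         * actually gets played is decided by the sound theme, not by the
         * application. */
        ca_context_create(&c);
        ca_context_play(c, 0,
                        CA_PROP_EVENT_ID, "button-pressed",
                        CA_PROP_EVENT_DESCRIPTION, "Button pressed",
                        NULL);

        /* In a real program you would wait for the sound to finish before
         * destroying the context. */
        ca_context_destroy(c);
        return 0;
    }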

As a by-product of the libsydney discussion, Jean-Marc coded a fast C resampling library supporting both floating point and fixed point, licensed under BSD. (In contrast to libsamplerate, which is GPL and floating-point-only, but probably has better quality.) PulseAudio will make use of this new library, as will libsydney. And I sincerely hope that ALSA, GStreamer and other projects replace their crappy home-grown resamplers with this one!
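
For reference, this resampler is the one that later shipped as part of Speex. A minimal usage sketch of that API (speex_resampler_init()/speex_resampler_process_float()) looks roughly like this; the rates, quality level and buffer sizes are arbitrary:

    #include <speex/speex_resampler.h>

    #define IN_RATE   48000
    #define OUT_RATE  44100
    #define CHANNELS  1
    #define QUALITY   5      /* 0 = fastest ... 10 = best quality */

    int main(void) {
        float in[IN_RATE] = { 0 }, out[OUT_RATE + 16];   /* one second of mono silence */
        spx_uint32_t in_len = IN_RATE, out_len = sizeof(out) / sizeof(out[0]);
        int err = 0;

        SpeexResamplerState *st =
            speex_resampler_init(CHANNELS, IN_RATE, OUT_RATE, QUALITY, &err);
        if (!st)
            return 1;

        /* in[] would normally hold real samples; out_len is updated to the
         * number of output samples actually produced. */
        speex_resampler_process_float(st, 0, in, &in_len, out, &out_len);

        speex_resampler_destroy(st);
        return 0;
    }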

For PulseAudio I was looking for a codec we could use to encode audio if we have to transfer it over the network. Such a codec would need to have low CPU requirements and allow low-latency operation, while providing hi-fi audio. Compression ratio is not such a high priority. Unfortunately, it seems no such codec exists, especially not a "Free" one. However, the Xiph people recommended hacking up a special version of FLAC for this task. FLAC is fast, has (obviously) good quality and, if hacked up, could provide low-latency encoding. However, FLAC doesn't compress that well. Current PulseAudio thin-client installations require about 170 kB/s of network bandwidth per client if hi-fi audio is used (which is essentially uncompressed CD-quality stereo). Encoding this with FLAC could cut that in half. Not perfect, but better than nothing.

So, that was FOMS! FOMS is definitely a highly recommended conference. If you have the chance to attend next year, don't miss it! I've never been to a more productive, packed conference in my life!

At LCA I met fellow Avahi coder Trent Lloyd for the first time. Our talk about Avahi went very well. During my flights to and from .au I hacked up avahi-ui, which I also announced during that talk. Also, in related news, tedp started to work on an implementation of NAT-PMP (aka "reverse firewall piercing"; both client and server) for inclusion in Avahi. This will hopefully make the upcoming Wide-Area DNS support in Avahi much more useful.

linux.conf.au was a very exciting conference. As a speaker you're treated like a rock star, with things like the speakers' dinner, the speakers' adventure (climbing on top of Sydney's AMP tower) and the Penguin Dinner. Heck, the organizers even picked me up at the airport, something I really didn't expect when I landed in Sydney, but which is quite nice after a 27h flight.

Two talks I particularly enjoyed at LCA:

And just for the sake of completeness, here are the links to my presentations:

Ok, that's it for now. Thanks go to Silvia Pfeiffer, the rest of the FOMS team and the Seven Team for organizing these two amazing conferences!
