レナート   TBFKAYIBYNYAAYB   ﻟﻴﻨﺎﺭﺕ

Thu, 08 Feb 2007

FOMS/LCA Recap

Finally, here's my linux.conf.au 2007 and FOMS 2007 recap. Maybe a little bit late, but better late than never.

FOMS was a very well organized conference with a packed schedule and a lot of high-profile attendees. To my surprise PulseAudio was accepted by the attendees without any opposition (at least none was expressed aloud). After a few "discussions" on a few mailing lists (including GNOME MLs) and some personal emails I got, I had thought that more people were opposed to the idea of having a userspace sound daemon for the desktop. Apparently, I was overly pessimistic. Good news, that!

During the FOMS conference we discussed the problems audio on Linux currently has. One of the major issues still is that we're lacking a cross-platform PCM audio API everyone agrees on. ALSA is Linux-specific and complicated to use. The only real contender is PortAudio. However, PortAudio has its share of problems and hasn't reached wide adoption yet. Right now most larger software projects implement an audio abstraction layer of some kind, and mostly in a very dirty, simplistic and limited fashion. MPlayer does it, Xine does it, Flash does it. Everyone does it, and it sucks. (Note: this is only a very short overview of why audio on Linux sucks right now. For a longer one, please have a look at the first 15mins of my PulseAudio talk at LCA, linked below.)

Several people were asking why not make the PulseAudio API the new "standard" PCM API for Linux. For several reasons that would be a bad idea. First of all, the PulseAudio API cannot be used with anything but PulseAudio. While PulseAudio has been ported to Win32, Vista already has a userspace desktop sound server, hence running PulseAudio on top of that doesn't make much sense. Thus the API is not exactly cross-platform. Secondly, I - as the guy who designed it - am not happy with the current PulseAudio API. While it is very powerful it is also very difficult to use and easy to misuse, mostly due to its fully asynchronous nature. In addition it is also not exactly the smallest API around.

So, what could be done about this? We agreed on a - maybe - controversial solution: defining yet another abstracted PCM audio API. Yes, fixing the problem that we have too many conflicting, competing sound systems by defining yet another API sounds like a paradox, but I do believe this is the right path to follow. Why? Because none of the currently available solutions is suitable for all application areas we have on Linux. Either the current APIs are not portable, or they are horribly difficult to use properly, or have a strange license, or are too simple in their functionality. MacOSX managed to establish a single audio API (CoreAudio) that makes almost everyone happy on that system - and we should be able to do the same for Linux. Secondly, none of the current APIs has been designed with network sound servers in mind. However, proper networking support reflects back into the API, and in a non-trivial way. An API which works fine in a networked environment needs to eliminate roundtrips where possible, be open for time interpolation and have flexible buffering (besides other minor things). Thirdly, none of the current APIs offers enough functionality to properly support all the needs of modern desktop sound systems, such as per-stream volumes, stream names and notifications about external state changes.
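To make the "time interpolation" point a bit more concrete, here is a minimal sketch in Python. It is not libsydney or PulseAudio code; the class and method names are made up for illustration. It assumes the client only receives an occasional timing update from the sound server and estimates the playback position in between from a local clock, instead of paying a network roundtrip for every query.

```python
class InterpolatedClock:
    """Estimates playback position between infrequent server timing updates."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self.last_position = 0      # frames played, as last reported by the server
        self.last_local_time = 0.0  # local clock (seconds) when that report arrived

    def update(self, position_frames, local_time):
        # Called whenever a timing update arrives (hypothetical server message).
        self.last_position = position_frames
        self.last_local_time = local_time

    def position(self, local_time):
        # No roundtrip: extrapolate from the last report using the local clock.
        elapsed = max(0.0, local_time - self.last_local_time)
        return self.last_position + int(elapsed * self.sample_rate)


clock = InterpolatedClock(sample_rate=44100)
clock.update(position_frames=44100, local_time=10.0)
estimate = clock.position(local_time=10.5)  # half a second after the last report
```

A real implementation would also have to smooth over clock drift and jitter between the local and the server clock; this sketch ignores both.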

During FOMS and LCA, Mikko Leppanen (from Nokia), Jean-Marc Valin (from Xiph) and I sat down and designed a draft API for the functionality we would like to see in this API. For the time being we dubbed it libsydney, after the city where we started this project. I plan to make this the only supported audio API for PulseAudio, eventually. Thus, if you code against PulseAudio you will get cross-platform support for free. In addition, because PulseAudio is now being integrated into the major distributions (at least Ubuntu and Fedora), this library will be made available on most systems through the backdoor.

So, what will this new API offer? Firstly, the buffering model is much more powerful than that of any current sound API. The buffering model mostly follows PulseAudio's internal buffering model which (theoretically) can offer zero-latency streaming and has been pioneered by Jim Gettys' AF sound server. It allows you to seek around in the playback buffer very flexibly. This is very useful to allow very fast reaction to the user's playback control commands while still allowing large buffers, which are good to deal with high network lag. In addition it is very handy for the programmer, such as when implementing streaming clients where packets may arrive out-of-order. The API will emulate this buffering model on top of traditional audio devices, and when used on top of PulseAudio it will use its native implementation. The API will also clearly define which sound formats are guaranteed to be available, thus making it a lot easier to code without thinking of different hardware supporting different formats all the time. Of course, the API will be easier to use than PulseAudio's current API. It will be very portable, scaling from FPU-less architectures to pro-audio machines with a massive number of synchronised channels. There are several modes available to deal with XRUNs semi-automatically, one of them guaranteeing that the time axis stays linear and monotonic in all cases.
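As a rough illustration of the seekable buffering idea, here is a toy sketch in Python. It is not PulseAudio's actual implementation and not the libsydney API; all names are hypothetical. It assumes a buffer addressed by absolute sample index, so that packets can be written out of order, already queued audio can be replaced after a seek, and an underrun is played as silence so the read position keeps advancing linearly.

```python
class PlaybackBuffer:
    """Toy playback buffer addressed by absolute sample index."""

    def __init__(self):
        self.samples = {}    # absolute index -> sample value
        self.read_index = 0  # next sample the "device" will play

    def write(self, index, data):
        # Writes may land anywhere at or after the read index, in any order;
        # rewriting a region replaces what was queued there.
        for i, sample in enumerate(data):
            if index + i >= self.read_index:
                self.samples[index + i] = sample

    def read(self, count):
        # Missing samples (an underrun) are played as silence, so the read
        # index always advances by exactly `count`: time stays linear.
        out = [self.samples.pop(self.read_index + i, 0) for i in range(count)]
        self.read_index += count
        return out


buf = PlaybackBuffer()
buf.write(4, [5, 6, 7, 8])   # second packet arrives first
buf.write(0, [1, 2, 3, 4])   # first packet arrives late, but still in time
first = buf.read(4)          # plays samples 0..3 in the right order
buf.write(6, [70, 80])       # playback control: replace audio already queued
second = buf.read(6)         # samples 4..9; the last two are missing -> silence
```

Because the writer can overwrite audio close to the read index, the buffer can be large (good against network lag) while reactions to playback commands stay fast.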

The list of features of this new API is much longer. However, enough of these grand plans! We didn't write any real code for this yet. To make sure that this project is not another one of those which are announced grandiosely without ever producing any code, I will stop listing features here now. We will eventually publish a first draft of our C API for public discussion. Stay tuned.

Side-by-side with libsydney I discussed an abstract API for desktop event sounds with Mikko (i.e. those annoying "bing" sounds when you click a button and the like). Dubbed libcanberra (named after the city which one of the developers visited after Sydney), this will hopefully be for the PulseAudio sample cache API what libsydney is for the PulseAudio streaming API: a total replacement.

As a by-product of the libsydney discussion Jean-Marc coded a fast C resampling library supporting both floating point and fixed point and being licensed under BSD. (In contrast to libsamplerate which is GPL and floating-point-only, but which probably has better quality). PulseAudio will make use of this new library, as will libsydney. And I sincerely hope that ALSA, GStreamer and other projects replace their crappy home-grown resamplers with this one!

For PulseAudio I was looking for a CODEC which we could use to encode audio if we have to transfer it over the network. Such a CODEC would need to have low CPU requirements and allow low-latency operation, while providing hifi audio. Compression ratio is not such a high requirement. Unfortunately, it seems no such CODEC exists, especially not a "Free" one. However, the Xiph people recommended hacking up a special version of FLAC for this task. FLAC is fast, has (obviously) good quality and if hacked up could provide low-latency encoding. However, FLAC doesn't compress that well. Current PulseAudio thin-client installations require 170kB/s of network bandwidth for each client if hifi audio is used. Encoding it in FLAC could cut this in half. Not perfect, but better than nothing.
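The 170kB figure above is consistent with uncompressed CD-quality PCM. The following back-of-the-envelope calculation assumes 44.1 kHz, 16 bit, stereo (the exact format is an assumption, it is not stated above) and takes "cut this in half" as a rough 2:1 estimate, not a measured FLAC ratio.

```python
# Assumed format: CD-quality PCM (not stated in the post).
SAMPLE_RATE = 44100      # frames per second
CHANNELS = 2
BYTES_PER_SAMPLE = 2     # 16 bit

raw_bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
raw_kib_per_second = raw_bytes_per_second / 1024

# "Cut this in half" is the rough estimate from the text, not a measured ratio.
ASSUMED_FLAC_RATIO = 0.5
flac_kib_per_second = raw_kib_per_second * ASSUMED_FLAC_RATIO

print(f"raw:  {raw_kib_per_second:.0f} KiB/s per client")
print(f"FLAC: {flac_kib_per_second:.0f} KiB/s per client (assuming a 2:1 ratio)")
```

Under these assumptions the raw stream comes to about 172 KiB/s per client, and the FLAC estimate to about 86 KiB/s.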

So, that was FOMS! FOMS is definitely a highly recommended conference. If you have the chance to attend next year, don't miss it! I've never been to a more productive, packed conference in my life!

At LCA I met fellow Avahi coder Trent Lloyd for the first time. Our talk about Avahi went very well. During my flights to and back from .au I hacked up avahi-ui which I also announced during that talk. Also, in related news, tedp started to work on an implementation of NAT-PMP (aka "reverse firewall piercing"; both client and server) for inclusion in Avahi. This will hopefully make the upcoming Wide-Area DNS support in Avahi much more useful.

linux.conf.au was a very exciting conference. As a speaker you're treated like a rock star, with stuff like the speakers dinner, the speakers adventure (climbing on top of Sydney's AMP tower) and the penguin dinner. Heck, the organizers even picked me up at the airport, something I really didn't expect when I landed in Sydney, which however is quite nice after a 27h flight.

Two talks I particularly enjoyed at LCA:

And just for the sake of completeness, here are the links to my presentations:

Ok, that's it for now. Thanks go to Silvia Pfeiffer, the rest of the FOMS team and the Seven Team for organizing these two amazing conferences!

posted at: 21:51 | path: /projects | permanent link to this entry | 6 comments


Posted by Jakob Petsovits at Sun Feb 18 19:12:37 2007
No offence, but this goal (a soundsystem-abstracting, cross-platform audio API) seems to me like a duplication of what KDE's Phonon started two years ago, and is readily available in SVN now.

Wouldn't it make sense to check it on your requirements, extend/adapt it a bit and make C bindings for it? That would really standardize Linux audio, plus there are already backends for NMM and Xine available, and a GStreamer one is being worked on.

Maybe there's something that I'm missing out on, but this appears to me like another duplication of effort just because another project isn't using the "right" libs or programming language.

Posted by Lennart at Sun Feb 18 19:17:17 2007
Jakob: Phonon is something completely different, it tries to abstract media players which is much more high-level than what we try to do with libsydney.

libsydney tries to provide a powerful abstraction for PCM playback.

Phonon tries to abstract media pipeline frameworks such as GStreamer, Helix and so on.

Comparing these two things is like comparing apples and pears.

Posted by Jakob Petsovits at Mon Feb 19 16:43:57 2007
Ok... I don't quite get the difference, but if you say there is one then I guess you're right, as I don't have much low-level knowledge of how audio libs work.

Sorry for the riot, and good luck with libsydney :)

Posted by Oli at Tue Mar 20 14:37:14 2007
you could also hack Wavpack, which is slightly faster than flac iirc and open source, too.

Posted by Azrael Nightwalker at Sun Feb 24 04:51:01 2008
What about OSS4? It's now licensed under GPL/CDDL/BSD. Is it useless? Can it be used somehow?

Posted by triton at Mon Mar 3 08:51:59 2008
Just leave it to the user (between OGG and FLAC/Wavpack), it seems the best option.

As for the lib, wouldn't it be easier to fix PortAudio? Why reinvent yet another audio lib? NIH syndrome?



It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.

Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I exercise this liberty quite aggressively. So yes, if you comment here, I might censor you. If you don't want to be censored you are welcome to comment on your own blog instead.


Lennart Poettering <mzoybt (at) 0pointer (dot) net>