レナート   Wunschkonzert, Ponyhof und Abenteuerspielplatz   ﻟﻴﻨﺎﺭﺕ

Sun, 19 Apr 2009

All About Fragments

In my ongoing series Writing Better Audio Applications for Linux, here's another installment: a little explanation of how fragments/periods and buffer sizes should be chosen when doing audio playback with traditional audio APIs such as ALSA and OSS. This originates from some emails I exchanged with the Ekiga folks. In the last few weeks I kept copying this explanation to various other folks. I guess it makes sense to post this on my blog too, to reach a wider audience. So here it is, mostly unedited:

Yes. You shouldn't misuse the fragments logic of sound devices. It's
like this:

   The latency is defined by the buffer size.
   The wakeup interval is defined by the fragment size.

The buffer fill level will oscillate between 'full buffer' and 'full
buffer minus 1x fragment size minus OS scheduling latency'. Setting
smaller fragment sizes will increase the CPU load and decrease battery
life since you force the CPU to wake up more often. OTOH it increases
drop-out safety, since you fill up the playback buffer earlier. Choosing
the fragment size is hence something which you should do balancing out
your needs between power consumption and drop-out safety. With modern
processors and a good OS scheduler like the Linux one, setting the
fragment size to anything other than half the buffer size does not
make much sense.
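
To put some numbers on this, here is a little Python sketch. It is
plain arithmetic, not ALSA code; the function name, the 2 ms
scheduling latency and the buffer sizes are assumptions made up for
illustration. It computes the latency, the wakeup rate and the lowest
fill level described above for two fragment sizes.

```python
def playback_timing(rate_hz, buffer_frames, fragment_frames, sched_latency_us=0):
    """Estimate latency, wakeup rate and lowest fill level of a playback buffer.

    All sizes are in sample frames. Hypothetical helper, not part of ALSA.
    """
    if not 0 < fragment_frames <= buffer_frames:
        raise ValueError("fragment size must be positive and fit into the buffer")
    # Frames the hardware keeps playing while we wait to get scheduled.
    sched_frames = sched_latency_us * rate_hz // 1_000_000
    return {
        # The latency is defined by the buffer size.
        "latency_ms": buffer_frames * 1000 / rate_hz,
        # The wakeup interval is defined by the fragment size.
        "wakeups_per_s": rate_hz / fragment_frames,
        # 'full buffer minus 1x fragment size minus OS scheduling latency'
        "min_fill_frames": buffer_frames - fragment_frames - sched_frames,
    }


# 100 ms buffer at 48 kHz, assuming 2 ms of scheduling latency.
for fragment in (2400, 240):
    print(fragment, playback_timing(48000, 4800, fragment, sched_latency_us=2000))
```

With these example values both settings have the same 100 ms latency,
but the half-buffer fragment wakes the CPU 20 times a second while
the small one wakes it 200 times a second.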

Your [Ekiga's ptlib driver that is] ALSA output is configured
to set the fragment size to the size of your codec audio
frames. And that's a bad idea. Because the codec frame size has not
been chosen based on power consumption or drop-out safety
reasoning. It has been chosen by the codec designers based on
different reasoning, such as latency.

You probably configured your backend this way because the ALSA
library docs say that it is recommended to write to the sound card in
multiples of the fragment size. However, deducing from this that you
should hence configure the fragment size to the codec frame size is
wrong!

The best way to implement playback these days for ALSA is to write as
much as snd_pcm_avail() tells you to each time you wake up due to
POLLOUT on the sound card. If that is not a multiple of your codec
frame size then you need to buffer the remainder of the decoded
data yourself in system memory.
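
The remainder handling could look roughly like the following Python
sketch. Everything in it is a stand-in: RemainderBuffer and
make_fake_decoder are hypothetical names, the "samples" are just
consecutive integers, and the value passed to take() plays the role
of what snd_pcm_avail() would report. It also assumes the decoder
always has data, which a real VoIP application cannot rely on.

```python
from collections import deque


class RemainderBuffer:
    """Holds decoded samples that did not fit into the last device write."""

    def __init__(self, decode_frame):
        # decode_frame: callable returning the samples of one codec frame.
        self._decode_frame = decode_frame
        self._pending = deque()

    def take(self, avail):
        """Return exactly `avail` samples, decoding more frames if needed."""
        while len(self._pending) < avail:
            self._pending.extend(self._decode_frame())
        # Whatever is left over stays queued in system memory.
        return [self._pending.popleft() for _ in range(avail)]

    def pending(self):
        """Number of decoded samples kept for the next wakeup."""
        return len(self._pending)


def make_fake_decoder(frame_size):
    """Hypothetical decoder that emits consecutive integers as samples."""
    state = {"next": 0}

    def decode_frame():
        start = state["next"]
        state["next"] = start + frame_size
        return list(range(start, start + frame_size))

    return decode_frame


# Codec frames of 160 samples, device asks for 441 samples per wakeup.
buf = RemainderBuffer(make_fake_decoder(160))
for wakeup in range(3):
    chunk = buf.take(441)  # real code: as much as snd_pcm_avail() reports
    print(wakeup, len(chunk), chunk[0], buf.pending())
```

The point of the design is that the amount written is dictated by the
device, while the codec frame size only determines how much ends up
in the leftover queue.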

The ALSA fragment size you should normally set as large as possible
given your latency constraints, but such that you have at least two
fragments in your buffer.
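
As a rough sketch of that rule in Python (choose_sizes is a
hypothetical helper, and the rates and latency budgets are example
values): derive the buffer size from the latency you can tolerate,
then make the fragment as large as possible while still fitting the
requested number of fragments, two by default. A real driver may
round the sizes you ask for, so treat the result as a request, not a
guarantee.

```python
def choose_sizes(rate_hz, max_latency_ms, fragments=2):
    """Return (buffer_frames, fragment_frames) for a latency budget.

    Hypothetical helper: sizes are in sample frames, no hardware
    constraints are taken into account.
    """
    if fragments < 2:
        raise ValueError("need at least two fragments per buffer")
    # Largest buffer that still meets the latency constraint.
    budget_frames = rate_hz * max_latency_ms // 1000
    # Largest fragment such that `fragments` of them fit into the budget.
    fragment_frames = budget_frames // fragments
    if fragment_frames < 1:
        raise ValueError("latency budget too small for this sample rate")
    # Keep the buffer an exact multiple of the fragment size.
    return fragment_frames * fragments, fragment_frames


print(choose_sizes(48000, 100))
print(choose_sizes(44100, 125))
```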

I hope this explains a bit how frag_size/buffer_size should be
chosen. If you have questions, just ask.

(Oh, ALSA uses the term 'period' for what I call 'fragment'
above. The two are synonymous.)

posted at: 01:34 | path: /projects | 7 comments


Posted by Thomas at Sun Apr 19 12:06:27 2009
From a systems point of view, I am not sure I agree. Ekiga is a soft real time application, so the main constraint is that the audio data is (almost) always at the hardware end in time. The objective is to reduce latency, which is caused by buffering on any level.

So this is a problem of managing the buffer level, and I would take a very different approach to it. There is no harm in setting a big buffer size, as long as it is not usually full. The logical approach would be to set a large buffer size, and then write the data (and push it to hardware) as soon as it becomes available. There is no need for a trigger from the audio (output) hardware, because Ekiga is driven by network data.

Of course there also needs to be some logic to control the amount of buffered data, and thus the latency. In a naive implementation, delays or drifts could increase latency over time, and then measures should be available to compensate (such as dropping data).

Posted by alsa newbies at Sun Apr 19 15:04:00 2009
Your definition of latency seems different from the ALSA developers':


Playback latency: buffer size
Recording latency: period size


http://www.alsa-project.org/~iwai/suselabs2003-audio-latency.pdf

Posted by Lennart at Sun Apr 19 17:25:16 2009
alsa newbies: no, it is not different. I refer to the playback case only where the definitions are identical.

But yes, I didn't say that explicitly and especially for Ekiga the record case is relevant too.

Posted by Lennart at Sun Apr 19 17:31:30 2009
Thomas: when you do audio playback things are always driven by the sound card clock. You claim that for the Ekiga case playback would be driven by the network source. Which is not correct. It is still the hardware clock that drives the playback. Since the hardware and the network clock differ, resampling has to take place, so that the hw buffer can be filled based on the hw clock. But it always boils down to the fact that the hw clock is the one to use for filling up the hw buffer. There is no way around that. As I understand your suggestion you'd put the jitter buffer directly in the hw buffer. Which I don't think makes much sense, since resampling has to happen after the jitter buffer and before the hw buffer.

Posted by Thomas at Sun Apr 19 22:29:10 2009
> But it always boils down to the fact that the hw clock is the one to use for filling up the hw buffer.

I can believe that is how it is done, but I wonder about the reasoning behind it.

The question is whether the writing to hardware in Ekiga should be driven by the available data (i.e. the network), or by the demand of the sound card. To me the answer seems obvious: data should be processed and written as soon as possible, and that means network driven. I see no point in buffering the data in Ekiga, when it can be done in the kernel or even in hardware. (In other words: you can avoid the buffer in Ekiga, and avoid some latency.) Ekiga can take care of differences in speed at this point by compressing or stretching the available data if necessary.

The only exception is if the buffer runs dry. Then someone (Ekiga? Ekiga in advance? Or the kernel?) has to "invent" some data - zero in the worst case, or some kind of filtered extrapolation.

I know that this may be quite a different solution, and it will also lead to different audio data if the buffer runs dry, because the next sequence of audio data is not known (technically this results in a phase lag, but phase is usually not audible). The advantage is less latency.

Posted by alsa newbies at Tue Apr 21 04:27:13 2009
" Choosing
the fragment size is hence something which you should do balancing out
your needs between power consumption and drop-out safety."

For a desktop PC, power consumption is not really a concern at all.

The requirements of the application are the main factor in choosing the buffer size.

Interactive war games (e.g. Counter-Strike) and VoIP need the lowest latency (without underruns) so that the user can respond as quickly as they hear the sound (e.g. a gunshot).


"The ALSA fragment size you should normally set as large as possible
given your latency constraints, but such that you have at least two
fragments in your buffer."


But why does the glitch-free PulseAudio server use one period per buffer for snd-intel8x0 and snd-emu10k1?

aplay does not allow 1 period per buffer; it returns the error "Can't use period equal to buffer size".

Modern sound cards such as HDA support multiple streams: the internal mic and the external mic can be recorded by two different applications at the same time.

card 1: Intel [HDA Intel], device 0: AD198x Analog [AD198x Analog]
  Subdevices: 3/3
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2


Applications which need real-time recording (e.g. Ekiga) can open hw:0,0,2 while other applications (e.g. the PulseAudio server) use hw:0,0,0 or hw:0,0,1.

Posted by alsa newbies at Thu Apr 23 13:33:20 2009
"Because the codec frame size has not
been chosen based on power consumption or drop-out safety
reasoning. It has been chosen by the codec designers based on
different reasoning, such as latency."

Refer to the specification of the most common sound card type nowadays, the High Definition Audio Specification, section 4.5.1, Stream Data In Memory:

  The buffer must start on a 128-byte
boundary and must contain at least one sample of data. For highest efficiency, the following
guidelines should be met in buffer allocation:
• The buffer should have a length which is a multiple of 128.
• The buffer should contain at least one full packet of information.



"If that is not a multiple of your codec
frame size then you need to buffer the remainder of the decoded
data yourself in system memory."


Does the alsa-pulse plugin require the buffer size to be a multiple of the period size?

22050 does not seem to be a multiple of 5512:

aplay -v test.wav
Playing WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
ALSA <-> PulseAudio PCM I/O Plugin
Its setup is:
  stream  : PLAYBACK
  access  : RW_INTERLEAVED
  format  : S16_LE
  subformat  : STD
  channels  : 2
  rate  : 44100
  exact rate  : 44100 (44100/1)
  msbits  : 16
  buffer_size  : 22050
  period_size  : 5512
  period_time  : 125000
  tstamp_mode  : NONE
  period_step  : 1
  avail_min  : 5512
  period_event : 0
  start_threshold  : 22050
  stop_threshold  : 22050
  silence_threshold: 0
  silence_size : 0
  boundary  : 1445068800

It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.

Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I exercise this liberty quite aggressively. So yes, if you comment here, I might censor you. If you don't want to be censored you are welcome to comment on your own blog instead.


Lennart Poettering <mzoybt (at) 0pointer (dot) net>