レナート   TBFKAYIBYNYAAYB   ﻟﻴﻨﺎﺭﺕ

Fri, 13 Nov 2009

On OOM

Building on what Havoc wrote two years ago about the fallacies of OOM (Out Of Memory) safety in user code, I'd like to point you to this little mail I just posted to jack-devel, which tries to give you the bigger picture. Should be interesting for non-audio folks, too.

Say NO to OOM safety!

posted at: 02:25 | path: /projects | 10 comments


Posted by Perry Lorier at Fri Nov 13 03:03:55 2009
While I completely agree that you shouldn't bother dealing with OOM -- the kernel will deal to you before you get a chance anyway -- I once saw a very clever system for testing malloc(): have your own implementation of malloc() (in an LD_PRELOAD) that does a fork(), then returns the result of the libc malloc() in one branch and returns NULL in the other, thus testing every possible combination of allocating and failing to allocate memory.  Clever, I thought.
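
The fork-based trick can be sketched in C roughly as follows. This is a minimal illustration, not the system described in the comment: `malloc_both_ways` is a hypothetical wrapper that callers invoke explicitly, whereas a real LD_PRELOAD shim would have to interpose malloc() itself and guard against re-entrancy. The parent waits for the child here, so the failure branch finishes before the success branch continues.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper (not a libc or LD_PRELOAD API): every call splits
 * the run in two. The child sees the allocation fail; the parent sees it
 * succeed once the child has finished. */
void *malloc_both_ways(size_t size)
{
    fflush(NULL);                  /* avoid duplicated stdio buffers after fork() */

    pid_t pid = fork();
    if (pid == 0)
        return NULL;               /* child: pretend malloc() failed */

    if (pid > 0) {
        int status;
        waitpid(pid, &status, 0);  /* let the failure branch run to completion first */
    }

    /* Parent, or fork() itself failed: fall back to the real allocator. */
    return malloc(size);
}
```

Every call doubles the number of paths being explored, so a program with n allocations can end up running up to 2^n times. That is only practical for small test programs.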

Posted by DDD at Fri Nov 13 11:49:54 2009
Wow...  It's so obvious but I never thought about it.  I just realised that I've been living in the 70s with only 64k of RAM.

The primary causes of OOM for me are memory leaks or lack of limiting during DoS attacks, which are usually unrecoverable anyway.

Posted by Ed Avis at Fri Nov 13 13:15:02 2009
Perhaps the effort spent on checking malloc() should be redirected to doing something sensible when an OOM condition occurs, by some other mechanism.  From a user's point of view it is very puzzling and annoying to have an application suddenly go pop for no apparent reason.  This does happen from time to time on Linux desktops, though it could be segfaults rather than OOM conditions.

Perhaps another sentinel process needs to monitor each application and at the very least give an error dialogue and some way to restart it?
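
A sentinel of this kind could be as small as a parent process that starts the application and inspects how it ended. The sketch below assumes a POSIX system; `run_supervised` and its return convention are made up for illustration, and a real desktop sentinel would show a dialog and offer a restart instead of printing to stderr.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] as a child process and wait for it.
 * Returns the exit code for a normal exit, the negated signal number if
 * the child was killed by a signal, or INT_MIN if supervising failed. */
int run_supervised(char *const argv[])
{
    fflush(NULL);                       /* do not duplicate buffered output */

    pid_t pid = fork();
    if (pid < 0)
        return INT_MIN;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* only reached if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return INT_MIN;
    }

    if (WIFSIGNALED(status)) {
        /* This is where a user-facing error report would go. */
        fprintf(stderr, "%s was killed by signal %d\n",
                argv[0], WTERMSIG(status));
        return -WTERMSIG(status);
    }
    return WEXITSTATUS(status);
}
```

A process terminated by the Linux OOM killer shows up here as killed by SIGKILL, while a segfault shows up as SIGSEGV, so the sentinel can at least tell the user which of the two happened.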

Posted by John Levon at Fri Nov 13 15:40:56 2009
Whilst I agree with you in general, I should point out that your comments are Linux-specific. In particular, on Solaris, you can recover from a NULL return from malloc(), and it can indeed indicate OOM rather than address space exhaustion. Indeed, I went to some effort to make vmstat and friends able to cope with such scenarios: in an OOM situation, this is often one of the tools you really, really don't want to quit:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/stat/common/acquire.c#418

Posted by Lennart at Fri Nov 13 20:46:08 2009
Perry: nice idea, but that will break as soon as there is some external resource (such as a file) that would then be accessed by both processes at the same time, possibly conflicting. Also, fork()ing does not copy over threads. And for any non-trivial application this seems like a fork bomb anyway.

Ed: bug-buddy is supposed to handle segfaults and show you an informational dialog, and offers to submit the generated data upstream.

John: I'd bet Solaris is no different from Linux in this respect, and certainly not in a way that would make retrying the malloc() help.

Posted by Ben at Fri Nov 13 21:32:38 2009
It's true that Solaris does not overcommit, and thus malloc will frequently return NULL once memory runs out:
http://developers.sun.com/solaris/articles/subprocess/subprocess.html#overcom

This is very different from standard desktop Linux configurations.  It's also true that once memory runs out, there's usually not much you can do.

On systems where malloc can return NULL, two things are important:

1. Unchecked mallocs can create security bugs if you attempt to dereference NULL+offset.  Thus, you still often need to check that malloc has returned non-NULL, even if you just die when it's NULL.

2. Once malloc starts returning NULL, something pretty much has to die... but you really want that to be some application, not a system process.  One way to achieve this is to make the system processes resilient to malloc failures, and let the applications die when their mallocs fail.

Posted by Anonymous at Sun Nov 15 23:51:38 2009
I agree entirely that OOM handling does not belong in anything except select daemons and the libraries they use.  However, that doesn't mean people should just expect malloc to succeed; rather, it means programs should have and use an appropriate malloc_or_die function.
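
One possible shape for such a helper, as a sketch. The name `malloc_or_die` comes from the comment above; the error message and the choice of abort() are assumptions, and some projects prefer exit() or a custom handler.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate size bytes or terminate the process. Callers never see NULL
 * for a non-zero request, so they cannot dereference NULL+offset. */
void *malloc_or_die(size_t size)
{
    void *p = malloc(size);

    /* malloc(0) may legitimately return NULL, so only a failed
     * non-zero request counts as out of memory. */
    if (p == NULL && size != 0) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        abort();
    }
    return p;
}
```

Call sites then stay as short as an unchecked malloc(), for example `int *v = malloc_or_die(n * sizeof *v);`, while a failed allocation ends the program at a well-defined point.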

Posted by John Levon at Mon Nov 16 19:03:45 2009
Lennart, you are wrong. I wouldn't have written the code if I hadn't tested its usefulness successfully.

Posted by Alexander Larsson at Tue Nov 17 13:20:55 2009
John:
If your code uses that, it will be "safe" against malloc failure to some extent, i.e. it will block until there is memory available.

However, it's not truly "safe", because it doesn't do anything to make progress. If all apps did that, the entire system would start blocking and doing nothing when memory was low. So it's not a general solution.
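
The blocking behaviour described here can be sketched as a retry loop. This is not the Solaris code linked earlier, only an illustration of the general pattern; `malloc_retry`, its parameters, and the bounded-retry option are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: keep retrying malloc() until it succeeds.
 * max_attempts == 0 means "retry forever", i.e. block until memory
 * becomes available. delay_ms is the pause between attempts. */
void *malloc_retry(size_t size, unsigned max_attempts, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (unsigned attempt = 1; ; attempt++) {
        void *p = malloc(size);

        /* A NULL result for a zero-sized request is not a failure. */
        if (p != NULL || size == 0)
            return p;

        if (max_attempts != 0 && attempt >= max_attempts)
            return NULL;            /* give up and let the caller decide */

        nanosleep(&delay, NULL);    /* wait for other processes to free memory */
    }
}
```

With `max_attempts` set to 0 the caller simply waits. Nothing in the loop frees memory, so it only helps if some other process eventually releases some, which is the objection raised above.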

Posted by Christian Henz at Sat Nov 21 12:55:58 2009
In my experience with OOM situations due to memory leaks, on systems with swap the problem is not that your application "suddenly goes pop", but rather that your system becomes totally unresponsive and you wait through ten minutes of 100% disk I/O, HOPING that something finally goes pop (ideally the leaky application). So the more important question to me is not how to properly handle OOM (once malloc returns NULL it is already too late), but how to avoid it in the first place.



It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.

Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I exercise this liberty quite aggressively. So yes, if you comment here, I might censor you. If you don't want to be censored, you are welcome to comment on your own blog instead.


Lennart Poettering <mzoybt (at) 0pointer (dot) net>