レナート   TBFKAYIBYNYAAYB   ﻟﻴﻨﺎﺭﺕ

Wed, 29 Oct 2008

Automatic Backtrace Generation

Ubuntu has Apport. Fedora has nothing. That sucks big time.

Here's the result of a few minutes of hacking up something similar to Apport based on the awesome (and much underused) Frysk debugging tool kit. It doesn't post any backtraces on any Internet servers and has no fancy UI -- but it automatically dumps a stacktrace of every crashing process on the system to syslog and stores all kinds of data in /tmp/core.*/ for later inspection.

#!/bin/bash
set -e
export PATH=/sbin:/bin:/usr/sbin:/usr/bin
DIR="/tmp/core.$1.$2"
umask 077
mkdir "$DIR"
cat > "$DIR/core"
exec &> "$DIR/dump.log"
set +e
echo "$1" > "$DIR/pid"
echo "$2" > "$DIR/timestamp"
echo "$3" > "$DIR/uid"
echo "$4" > "$DIR/gid"
echo "$5" > "$DIR/signal"
echo "$6" > "$DIR/hostname"
set -x 
fauxv "$DIR/core" > "$DIR/auxv"
fexe "$DIR/core" > "$DIR/exe"
fmaps "$DIR/core" > "$DIR/maps"
# Collect the RPM packages owning the files that fdebuginfo marks with "---", then install their debuginfo
PKGS=$(/usr/bin/fdebuginfo "$DIR/core" | grep "\-\-\-" | cut -d ' ' -f 1 | sort -u | grep '^/' | xargs rpm -qf | sort -u)
[ -n "$PKGS" ] && debuginfo-install -y $PKGS
fstack -rich "$DIR/core" > "$DIR/fstack"
set +x
(
	echo "Application `cat "$DIR/exe"` (pid=$1,uid=$3,gid=$4) crashed with signal $5."
	echo "Stack trace follows:"
	cat "$DIR/fstack"
	echo "Auxiliary vector:"
	cat "$DIR/auxv"
	echo "Maps:"
	cat "$DIR/maps"
	echo "For details check $DIR"
) | logger -p local6.info -t "frysk-core-dump-$1"

Copy that into a file $SOMEWHERE/frysk-core-dump. Then do a chmod +x $SOMEWHERE/frysk-core-dump and a chown root:root $SOMEWHERE/frysk-core-dump. Now, tell the kernel that core dumps should be handed to this script:

# echo "|$SOMEWHERE/frysk-core-dump %p %t %u %g %s %h" > /proc/sys/kernel/core_pattern
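For reference, here is a minimal sketch of how the % specifiers in that pattern line up with the script's positional parameters. The helper name core_pattern_line and the default install directory are made up for illustration; the post leaves $SOMEWHERE up to you.

```shell
#!/bin/bash
# Hypothetical install directory; substitute wherever you copied the script.
SOMEWHERE="${SOMEWHERE:-/usr/local/bin}"

# Build the core_pattern value. The kernel expands the specifiers as:
#   %p -> $1 pid of the crashing process
#   %t -> $2 time of the dump (seconds since the epoch)
#   %u -> $3 uid       %g -> $4 gid
#   %s -> $5 signal number
#   %h -> $6 hostname
core_pattern_line() {
    printf '|%s/frysk-core-dump %%p %%t %%u %%g %%s %%h\n' "$1"
}

# Print the line only; writing it needs root, e.g.:
#   core_pattern_line "$SOMEWHERE" > /proc/sys/kernel/core_pattern
core_pattern_line "$SOMEWHERE"
```

The leading | is what tells the kernel to pipe the core to a program rather than write it to a file.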

Finally, increase RLIMIT_CORE to actually enable core dumps. ulimit -c unlimited is a good idea. This will enable them only for your shell and everything it spawns. In /etc/security/limits.conf you can enable them for all users. I haven't found out yet how to enable them globally in Fedora though, i.e. for every single process that is started after boot including system daemons.
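A small sketch for checking and raising the limit in the current shell. The function name core_limit_status is made up here, and the limits.conf line in the trailing comment is a typical entry, not something taken from a Fedora default.

```shell
#!/bin/bash
# Describe the (soft) core size limit of the current shell.
core_limit_status() {
    local limit
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "core dumps disabled"
    else
        echo "core dumps enabled (limit: $limit)"
    fi
}

# Raise the soft limit for this shell and everything it spawns.
# This can fail if the hard limit is lower, so the error is ignored here.
ulimit -S -c unlimited 2>/dev/null || true
core_limit_status

# A typical /etc/security/limits.conf entry for all users
# (applied by pam_limits when a session is opened):
#   *    soft    core    unlimited
```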

You can test this by running sleep 4711 and then dumping core with C-\. The stacktrace should appear right away in /var/log/messages.
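If you would rather trigger the test from a script than from the keyboard, something like the following sketch should do. It sends SIGSEGV instead of SIGQUIT, because a non-interactive shell starts background jobs with SIGQUIT ignored; both signals dump core by default. The function name crash_test is made up.

```shell
#!/bin/bash
# Start a throwaway process, crash it, and report how it died.
crash_test() {
    sleep 4711 &
    local pid=$!
    sleep 1                # give the child a moment to start
    kill -SEGV "$pid"      # default action: terminate and dump core
    wait "$pid"
    echo "pid=$pid status=$?"   # status is 128 + signal number
}

crash_test
```

After that, the entry tagged frysk-core-dump-PID should show up in syslog, provided the handler is installed and the core limit allows dumps.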

This script will automatically try to install the debugging symbols for the crashing application via yum. In some cases it hence might take a while until the backtrace appears in syslog.

Don't forget to install Frysk before trying this script!
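A quick preflight check can tell you whether everything the script calls is actually installed. This is only a sketch; the function name missing_tools is made up, and the tool list is simply read off the script above.

```shell
#!/bin/bash
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Use the same PATH the crash handler sets for itself.
export PATH=/sbin:/bin:/usr/sbin:/usr/bin

# Frysk utilities plus the system tools the handler relies on.
missing=$(missing_tools fauxv fexe fmaps fdebuginfo fstack debuginfo-install rpm logger)
if [ -n "$missing" ]; then
    echo "missing:" $missing
else
    echo "all tools found"
fi
```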

You can't believe how useful this script is. Something crashed and the backtrace is already waiting for you! It's a bugfixer's wet dream.

I am a bit surprised though that no one else came up with this before me. Or maybe I am just too dumb to use Google properly?

posted at: 23:05 | path: /projects | permanent link to this entry | 15 comments


Posted by Michael "Apport" Howell at Thu Oct 30 00:19:06 2008
Perhaps you can use Apport in Fedora? I'm not sure how hard that would be, but of course it's always good to share code ;).

Posted by Lennart at Thu Oct 30 00:23:17 2008
Michael: There was actually some discussion about that:

http://fedoraproject.org/wiki/Features/Apport

I am happy with Apport. But apparently some people in Fedora prefer Socorro. I don't.

Posted by James at Thu Oct 30 04:11:38 2008
Dear Lennart,
Why do you feel it necessary to have a NIH approach to every problem you try to solve?

Just use apport.

Love,
someone-who-is-a-bit-cranky-but-loves-your-work

Posted by Colin Walters at Thu Oct 30 04:27:36 2008
Because the source code for the Apport server isn't (as far as I know) public, and even if it was it'd probably be highly tied in with Launchpad which itself isn't free (yet).

Posted by Lennart at Thu Oct 30 04:49:32 2008
James: I already made clear that I'd be happy with adopting Apport. You are barking up the wrong tree, my friend!

Posted by Martin Pitt at Thu Oct 30 11:33:52 2008
Huh? Apport is fully GPL, as it says in its license, and it is written with portability in mind, too. See https://wiki.ubuntu.com/Apport for some intro.

Will Woods started to port it to Fedora a while ago (https://code.launchpad.net/~wwoods/apport/fedora) which primarily meant to implement an RPM version of the abstract packaging backend. There is even a Fedora spec about it:
https://fedoraproject.org/wiki/Features/Apport

Also, there is lots of activity in the OpenSUSE branch: https://code.launchpad.net/~apport-opensuse/apport/opensuse

Posted by Martin Pitt at Thu Oct 30 11:54:57 2008
Just some further explanations about this mystical "apport server": There are really two parts:

- A place to store the crash reports, which is just a bug tracker; apport has an abstract class CrashDB for this, with a Launchpad implementation, and the beginnings of a Bugzilla one. So yes, the storage of those bugs/crash reports for Ubuntu actually uses Launchpad, but they are not more or less free than any other bug report.

- A bot which grabs newly reported bugs, and recombines the core dumps and debug symbol packages to symbolic stack traces, checks for duplicates, and updates the bug report. The code is entirely free and contained in the public bzr repo. It has nothing to do with Launchpad, it's just running in the Canonical data center because that has good bandwidth and well administered servers. That part is highly distro specific by nature, of course, since it needs to create chroots, install packages, etc.

Posted by FACORAT Fabrice at Thu Oct 30 18:56:59 2008
Mandriva also has a tool for automatic crash reports: drakbug. However, it only covers Mandriva's own tools.
http://club.mandriva.com/xwiki/documentation/2008-spring/Mastering-Manual-EN.html/Mastering-Manual.html/drakbug.html#drakbackup-what-other

Posted by Will Woods at Thu Oct 30 19:03:13 2008
One of the reasons we didn't go forward with Apport in Fedora is that it required reimplementing a bunch of Ubuntu-specific code to work properly, and we've been working on doing all of that upstream.

An example: there were some kernel patches to twiddle core_rlimit and pass along data about the crashing process.

Part of my response to that was to talk Neil Horman into fixing up core_pattern so it could pass a proper **argv to the pipe-command, so now you can pass the pid etc. as arguments. (As you use in your example, Lennart.) That feature landed upstream in.. 2.6.24, I think, so it wasn't available until Fedora 9.

Another piece: abstraction for the packaging system stuff to fetch debuginfo, get file lists, etc. Well, again in Fedora 9, we gained PackageKit, which is a nicely cross-distro abstraction for all that stuff.

That brings us to the retracing server and the backend to send the crash reports. I started writing a bugzilla backend (which has now expanded into python-bugzilla), but quickly realized that authentication (or the lack thereof) makes this a serious security/privacy problem.

So instead we've been working on a server that exports all the debuginfo files on a WebDAV share, so you can just mount the share and retrace data on your own system without needing to install debuginfo packages at all. Furthermore the amount of data transferred should be much smaller - WebDAV supports seek(), after all.

So, yeah, it sucks that we don't have something for automatic backtraces yet. But I'm not convinced that "Just make Apport work!!" is the right way to spend my time.

Posted by Kevin Kofler at Fri Oct 31 03:45:55 2008
What about a simple solution like KCrash/DrKonqi? Of course that solution is KDE-specific, but IMHO the easiest solution to that problem would be to implement something equivalent in upstream GNOME.

How it works: KCrash (which is automatically instantiated by KApplication) hooks the crash vectors (SIGSEGV, SIGILL, SIGABRT etc.). When a crash is intercepted, it launches DrKonqi (using fork/exec directly to be as failsafe as possible), which brings up a crash dialog, attaches GDB to the crashed process (if GDB is not installed, you get a message that you have to install GDB instead of a backtrace) and uses it to create a backtrace. You get the option to copy the backtrace to the clipboard or save it to a file, so you can paste it into a bug report.

That solution needs no server, no core dumps (so the default ulimit setting not to create them need not be touched) and no interface to a distro-specific bug tracker (it just uses the clipboard). It may not be the perfect solution, but it is very easy to set up, so it could be done in upstream GNOME (just like it is in upstream KDE, which also means that Fedora gets it for free for KDE apps, wouldn't it be nice if the same were true for GNOME?).

Posted by Lennart at Fri Oct 31 09:12:28 2008
Kevin: we already have that in GNOME. It's called Bug Buddy.

It is useful only for GNOME applications. I am more interested in system level stuff and non-GNOME stuff.

Posted by Anonymous at Sat Nov 1 05:00:56 2008
1) Nice work.

2) Insecure temporary file vulnerability. :)

Posted by Anon at Sat Nov 1 13:09:24 2008
I believe that for kernel oopses Fedora led the way with kerneloops, which reports oopses found in logs to kerneloops.org. However it is a shame to see Red Hat's Fedora lagging behind this curve for general programs given that RH9 was the first Linux distro (that I used) to support external debugging symbols (a great day that was too).

Microsoft took things even further with XP where you can sometimes get a message back saying things like "you need this patch with these drivers" after submitting issues...

Posted by Lennart at Sat Nov 1 15:16:21 2008
Anonymous: Regarding 2). Not that it would matter, but the only problem I see here is a DoS by guessing the right name beforehand. Which doesn't really matter, I'd say. The umask 077 and set -e should make sure that if the script succeeds at all the data stored is trustworthy.
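To illustrate the mkdir argument, here is a sketch that is not part of the original script; the helper name make_private_dir and the directory names are made up. mkdir refuses to reuse a path that already exists, even if it is only a symlink planted by someone else, so under set -e the handler stops before writing anything there.

```shell
#!/bin/bash
# Succeeds only if the directory was freshly created by us, with mode 0700.
make_private_dir() {
    ( umask 077 && mkdir "$1" ) 2>/dev/null
}

base=$(mktemp -d)

# Normal case: the name is free, so the directory is created privately.
make_private_dir "$base/core.1234.1225317900" && echo "created"
stat -c '%a' "$base/core.1234.1225317900"

# Attack case: the name was planted beforehand as a symlink.
ln -s /nonexistent "$base/core.5678.1225317900"
make_private_dir "$base/core.5678.1225317900" || echo "refused: name already taken"

rm -rf "$base"
```

The second case is the DoS: the dump for that pid and timestamp is lost, but nothing is written to a location the attacker controls.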

Posted by grfgguvf@gmail.com at Thu Nov 27 07:41:59 2008
While the current situation certainly sucks inasmuch as nothing is left behind when something crashes, making this script the default in Fedora would instead load down the system whenever something crashes. You say the stack trace should appear in syslog right away; in reality, when something like apache or firefox crashes, this script can easily run for half an hour.

Frysk tools are unnecessarily slow. The WebDAV debuginfo share does sound like a step in the right direction.

Maybe you and most Red Hat engineers work on quad core systems with 4GB memory and you don't notice it, but most RH software is really sloooooow.

PackageKit -- with the in-out-fading icon during transactions hogs X.
prelink provides no noticeable benefit while hogging the disk every night.
Completely Fair Scheduler caused great degradation in desktop responsiveness -- has improved somewhat since.
PulseAudio -- what's the purpose of it? Before, I could just listen to music, the music player took about 2% CPU and when I ran an upgrade or a compile, I ran it niced. Everything fine. Now the music player is starved of CPU because PulseAudio takes it all and has a higher priority... I don't wanna run the player as root. Setting a higher priority by hand every time from sudo is clumsy. I could script that..... but what does PulseAudio do again? I don't think anyone ever really needed mixing, surely you don't want annoying dings and clicks from the Desktop Env. when listening to music... that said, OSS kernelspace mixing (at least in FreeBSD) imposes a few percent overhead at most. It's not because it is kernel space either...
SELinux -- while the inkernel parts are fine, why does installing an upgraded policy take half an hour?
Yum -- why does it even exist? The original argument was that APT is complex and hard to fix/port. Seeing that it took about 4 years to get yum up to an acceptable speed, surely it would have been easier to just go with APT/up2date.
...and so on...

I can understand that these mostly are mooted by heavy corporate server hardware, but what about desktop/laptop users? They made Linux possible, but now I guess they aren't customers so they don't matter!

Lennart Poettering <mzoybt (at) 0pointer (dot) net>