レナート   TBFKAYIBYNYAAYB   ﻟﻴﻨﺎﺭﺕ

Tue, 15 Sep 2009

Measuring Lock Contention

When naively profiling multi-threaded applications the time spent waiting for mutexes is not necessarily visible in the generated output. However lock contention can have a big impact on the runtime behaviour of applications. On Linux valgrind's drd can be used to track down mutex contention. Unfortunately running applications under valgrind/drd slows them down massively, often having the effect of itself generating many of the contentions one is trying to track down. Also due to its slowness it is very time consuming work.

To improve the situation if have now written a mutex profiler called mutrace. In contrast to valgrind/drd it does not virtualize the CPU instruction set, making it a lot faster. In fact, the hooks mutrace relies on to profile mutex operations should only minimally influence application runtime. mutrace is not useful for finding synchronizations bugs, it is solely useful for profiling locks.

Now, enough of this introductionary blabla. Let's have a look on the data mutrace can generate for you. As an example we'll look at gedit as a bit of a prototypical Gnome application. Gtk+ and the other Gnome libraries are not really known for their heavy use of multi-threading, and the APIs are generally not thread-safe (for a good reason). However, internally subsytems such as gio do use threading quite extensibly. And as it turns out there are a few hotspots that can be discovered with mutrace:

$ LD_PRELOAD=/home/lennart/projects/mutrace/libmutrace.so gedit
mutrace: 0.1 sucessfully initialized.

gedit is now running and its mutex use is being profiled. For this example I have now opened a file with it, typed a few letters and then quit the program again without saving. As soon as gedit exits mutrace will print the profiling data it gathered to stderr. The full output you can see here. The most interesting part is at the end of the generated output, a breakdown of the most contended mutexes:

mutrace: 10 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]       Type
      35   368268      407      275      120,822        0,000        0,894     normal
       5   234645      100       21       86,855        0,000        0,494     normal
      26   177324       47        4       98,610        0,001        0,150     normal
      19    55758       53        2       23,931        0,000        0,092     normal
      53      106       73        1        0,769        0,007        0,160     normal
      25    15156       70        1        6,633        0,000        0,019     normal
       4      973       10        1        4,376        0,004        0,174     normal
      75       68       62        0        0,038        0,001        0,004     normal
       9     1663       52        0        1,068        0,001        0,412     normal
       3   136553       41        0       61,408        0,000        0,281     normal
     ...      ...      ...      ...          ...          ...          ...        ...

mutrace: Total runtime 9678,142 ms.

(Sorry, LC_NUMERIC was set to de_DE.UTF-8, so if you can't make sense of all the commas, think s/,/./g!)

For each mutex a line is printed. The 'Locked' column tells how often the mutex was locked during the entire runtime of about 10s. The 'Changed' column tells us how often the owning thread of the mutex changed. The 'Cont.' column tells us how often the lock was already taken when we tried to take it and we had to wait. The fifth column tell us for how long during the entire runtime the lock was locked, the sixth tells us the average lock time, and the seventh column tells us the longest time the lock was held. Finally, the last column tells us what kind of mutex this is (recursive, normal or otherwise).

The most contended lock in the example above is #35. 275 times during the runtime a thread had to wait until another thread released this mutex. All in all more then 120ms of the entire runtime (about 10s) were spent with this lock taken!

In the full output we can now look up which mutex #35 actually is:

Mutex #35 (0x0x7f48c7057d28) first referenced by:
	/home/lennart/projects/mutrace/libmutrace.so(pthread_mutex_lock+0x70) [0x7f48c97dc900]
	/lib64/libglib-2.0.so.0(g_static_rw_lock_writer_lock+0x6a) [0x7f48c674a03a]
	/lib64/libgobject-2.0.so.0(g_type_init_with_debug_flags+0x4b) [0x7f48c6e38ddb]
	/usr/lib64/libgdk-x11-2.0.so.0(gdk_pre_parse_libgtk_only+0x8c) [0x7f48c853171c]
	/usr/lib64/libgtk-x11-2.0.so.0(+0x14b31f) [0x7f48c891831f]
	/lib64/libglib-2.0.so.0(g_option_context_parse+0x90) [0x7f48c67308e0]
	/usr/lib64/libgtk-x11-2.0.so.0(gtk_parse_args+0xa1) [0x7f48c8918021]
	/usr/lib64/libgtk-x11-2.0.so.0(gtk_init_check+0x9) [0x7f48c8918079]
	/usr/lib64/libgtk-x11-2.0.so.0(gtk_init+0x9) [0x7f48c89180a9]
	/usr/bin/gedit(main+0x166) [0x427fc6]
	/lib64/libc.so.6(__libc_start_main+0xfd) [0x7f48c5b42b4d]
	/usr/bin/gedit() [0x4276c9]

As it appears in this Gtk+ program the rwlock type_rw_lock (defined in glib's gobject/gtype.c) is a hotspot. GLib's rwlocks are implemented on top of mutexes, so an obvious attempt in improving this could be to actually make them use the operating system's rwlock primitives.

If a mutex is used often but only ever by the same thread it cannot starve other threads. The 'Changed.' column lists how often a specific mutex changed the owning thread. If the number is high this means the risk of contention is also high. The 'Cont.' column tells you about contention that actually took place.

Due to the way mutrace works we cannot profile mutexes that are used internally in glibc, such as those used for synchronizing stdio and suchlike.

mutrace is implemented entirely in userspace. It uses all kinds of exotic GCC, glibc and kernel features, so you might have a hard time compiling and running it on anything but a very recent Linux distribution. I have tested it on Rawhide but it should work on slightly older distributions, too.

Make sure to build your application with -rdynamic to make the backtraces mutrace generates useful.

As of now, mutrace only profiles mutexes. Adding support for rwlocks should be easy to add though. Patches welcome.

The output mutrace generates can be influenced by various MUTRACE_xxx environment variables. See the sources for more information.

And now, please take mutrace and profile and speed up your application!

You may find the sources in my git repository.

posted at: 00:07 | path: /projects | permanent link to this entry | 29 comments


Posted by Hongli Lai at Tue Sep 15 00:52:57 2009
Very nice tool!

Posted by Christian Hergert at Tue Sep 15 01:08:56 2009
I love it!  I hope to use this to do some profiling in my concurrency toolkit.

Posted by Boudewijn Rempt at Tue Sep 15 08:43:14 2009
Just what I needed! Now I only have to try to make it work -- on 32 bits opensuse 11.1, running

boud@valdiesalie:~/src/mutrace> LD_PRELOAD=/home/boud/src/mutrace/libmutrace.so krita


gives me a segmentation fault. Where can I report a bug for this issue?

Posted by ebl at Tue Sep 15 10:33:30 2009
It gives me also a segmentation fault while testing Bacula programs.

eric@zog4:/tmp/regress$ ldd bin/drivetype
  linux-vdso.so.1 =>  (0x00007f04c9ffd000)
  libbacfind.so.1 => /tmp/regress/bin/libbacfind.so.1 (0x00007f04c9bd6000)
  libbac.so.1 => /tmp/regress/bin/libbac.so.1 (0x00007f04c9980000)
  libwrap.so.0 => /lib/libwrap.so.0 (0x00007f04c9777000)
  libpthread.so.0 => /lib/libpthread.so.0 (0x00007f04c955b000)
  libdl.so.2 => /lib/libdl.so.2 (0x00007f04c9357000)
  libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f04c910d000)
  libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f04c8d8d000)
  libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f04c8a82000)
  libm.so.6 => /lib/libm.so.6 (0x00007f04c8801000)
  libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f04c85f3000)
  libc.so.6 => /lib/libc.so.6 (0x00007f04c8291000)
  libnsl.so.1 => /lib/libnsl.so.1 (0x00007f04c8078000)
  /lib64/ld-linux-x86-64.so.2 (0x00007f04c9de2000)
  libz.so.1 => /usr/lib/libz.so.1 (0x00007f04c7e61000)

Posted by Gareth Foster at Tue Sep 15 12:58:21 2009
Can this work on Solaris?

Posted by Lennart at Tue Sep 15 14:53:15 2009
Boudewijn, ebl: interesting. I don't have a bug tracker for this though. However, please send me a backtrace for these segfaults to by email! Thanks.

Gareth: No clue, I am not a Solaris guy. I used a lot of GCC'isms, ELF'isms, Linux'isms, though. Using it on Solaris might need some porting.

Posted by Lennart at Tue Sep 15 15:48:49 2009
Boudewijn, ebl: so this turns out to be a generaly incompat of mutrace and C++ static constructors. I have an idea though how to fix this.

Posted by Anton at Tue Sep 15 21:09:48 2009
For Solaris, just use lockstat.

Posted by Mike at Wed Sep 16 04:42:35 2009
Can this work out for deamons? I mean, can I use this to see the stats on running processes without quitting them?

Posted by Lennart at Wed Sep 16 04:47:16 2009
Mike: No, I don't see how that could work with LD_PRELOAD.

Posted by Mike at Wed Sep 16 07:51:28 2009
if x is an exe, then

mv x x.orig
cat > x
export LD_PRELOAD="..."
x.orig $*
^D

`execute x`

Posted by Martin Schöpf at Wed Sep 16 12:48:35 2009
Your the man!

Very nice tool! ;) ... initially we had the same idea with preloading, but we never finished it to this degree.

Cheers

Posted by Philip Van Hoof at Wed Sep 16 15:07:04 2009
Very useful, I will definitely use this.

A suggestion: make it easy to allow an IDE to use the output. For example if -g was added during compilation print the file.c:linenumber instead of just the functions' names.

Posted by Grammer Nazi at Wed Sep 16 18:02:46 2009
Bananam in manu habeo.

singular accusative, need an "m".

Posted by Lennart at Wed Sep 16 18:08:10 2009
Philip: unfortunately it's not that easy getting the actual debug symbol information, so we don't print this at all. The current backtraces are the unmodified output glibc's backtrace_symbols() generates.

Posted by hexa at Wed Sep 16 20:39:04 2009
Thanks nice tool ! :)

Way better then my ugly profiling patch to glib hehe

Posted by anon at Wed Sep 16 23:12:18 2009
This is good stuff, thanks!

Posted by Lawrence D'Oliveiro at Thu Sep 17 11:08:12 2009
Would it be useful to show, not just the number of times a lock was contended, but also the time spent waiting for a contended lock to be freed? Wouldn't that be a more realistic measure of performance impact than just the number of contentions?

Posted by Lennart at Thu Sep 17 17:36:05 2009
Lawrence: we already show the time the lock was held for. It might indeed make sense to show the time a lock was waited for, too.

Posted by pavlinux at Thu Sep 17 22:41:51 2009
./bootstrap.sh


Remember to add `AC_PROG_LIBTOOL' to `configure.ac'.
You should update your `aclocal.m4' by running aclocal.
....

configure.ac:28: warning: The macro `AC_FOREACH' is obsolete.
configure.ac:28: You should run autoupdate.
../../lib/autoconf/general.m4:196: AC_FOREACH is expanded from...
aclocal.m4:700: _AM_SET_OPTIONS is expanded from...
aclocal.m4:393: AM_INIT_AUTOMAKE is expanded from...
configure.ac:28: the top level
configure.ac:91: warning: The macro `AC_LANG_SAVE' is obsolete.
configure.ac:91: You should run autoupdate.
../../lib/autoconf/lang.m4:167: AC_LANG_SAVE is expanded from...
m4/acx_pthread.m4:79: AX_PTHREAD is expanded from...
configure.ac:91: the top level
configure.ac:91: warning: The macro `AC_LANG_C' is obsolete.
configure.ac:91: You should run autoupdate.
../../lib/autoconf/c.m4:73: AC_LANG_C is expanded from...
configure.ac:91: warning: The macro `AC_TRY_LINK' is obsolete.
configure.ac:91: You should run autoupdate.
../../lib/autoconf/general.m4:2414: AC_TRY_LINK is expanded from...
configure.ac:91: warning: The macro `AC_LANG_RESTORE' is obsolete.
configure.ac:91: You should run autoupdate.
../../lib/autoconf/lang.m4:176: AC_LANG_RESTORE is expanded from...

....
Makefile.am:27: Libtool library used but `LIBTOOL' is undefined
Makefile.am:27:  The usual way to define `LIBTOOL' is to add `AC_PROG_LIBTOOL'
Makefile.am:27:  to `configure.ac' and run `aclocal' and `autoconf' again.
Makefile.am:27:  If `AC_PROG_LIBTOOL' is in `configure.ac', make sure
Makefile.am:27:  its definition is in aclocal's search path.

...

./configure: line 5335: syntax error near unexpected token `2.2'
./configure: line 5335: `LT_PREREQ(2.2)

Posted by Lennart at Thu Sep 17 23:29:10 2009
pavlinux: this is not a bugtracker! that said, you need to install libtool 2.2.

Posted by Brian at Fri Sep 18 18:39:16 2009
This is useful, however the back tracing is a touch weak.  I tried compling with and without debug.  I'm assuming this only reports shared library entry points?

Mutex #42 (0x0x7f06ac000f88) first referenced by:
  /usr/local/lib/libmutrace.so(pthread_mutex_init+0x1a6) [0x7f06beac1f3e]
  ./uNormalSystem [0x4773ed]
  ./uNormalSystem [0x443914]
  ./uNormalSystem [0x406ba8]
  ./uNormalSystem [0x40443f]
  /lib/libc.so.6(__libc_start_main+0xfd) [0x7f06be56ba3d]
  ./uNormalSystem [0x403d99]

Posted by Lennart at Fri Sep 18 19:30:57 2009
Brian: make sure to compile your stuff with -rdynamic. That will make the backtrace more useful. Note however that we rely on glibc's backtrace_symbols() call for backtraces which doesn't read debug info, so line numbers won't be availabe then either.

Posted by Brian at Sun Sep 20 00:35:19 2009
That worked, rdynamic is a link flag, not a build flag.

Unfortunately std::ios_base and the locale BS shows up as #2 in my list of contentions.  I was aware of this being a problem before and made a few changes in critical areas but it's still a problem apparently.

Posted by Xuekun at Thu Oct 22 11:28:42 2009
what is the limitation of mutrace? I mean under what kind of situation, mutrace couldn't get mutex contention data?

I used mutrace on some small applications (like several threads created), it worked fine.

However when I tried to use on large application (like mysql, :-) ), the application just hang. I used gdb to attach to the application, and found it is blocked at __lll_mutex_lock_wait. What's wrong?

#0  0x00002b8e7d973888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b8e7d96f8a5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2  0x00002b8e7d96f333 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002b8e7d73c8c3 in pthread_mutex_lock () from /usr/local/lib/libmutrace.so
#4  0x00002aaaaaadb5cb in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#5  0x00002aaaaaad85b5 in _Unwind_GetIPInfo () from /lib64/libgcc_s.so.1
#6  0x00002aaaaaad95db in _Unwind_GetIPInfo () from /lib64/libgcc_s.so.1
#7  0x00002aaaaaad980a in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#8  0x00002b8e7e751bc8 in backtrace () from /lib64/libc.so.6
#9  0x00002b8e7d73be6e in generate_stacktrace () from /usr/local/lib/libmutrace.so
#10 0x00002b8e7d73c15d in mutex_info_add () from /usr/local/lib/libmutrace.so
#11 0x00002b8e7d73c524 in pthread_mutex_init () from /usr/local/lib/libmutrace.so
#12 0x000000000084002d in my_thread_global_init () at my_thr_init.c:142
#13 0x0000000000822932 in my_init () at my_init.c:81
#14 0x00000000005a2ab2 in main (argc=-1429347456, argv=0x0) at mysqld.cc:4267

Sorry, if post at wrong place, since I couldn't find its support forum or bug tracker. :-)

Posted by Lennart at Thu Oct 22 15:15:29 2009
Xuekun: mutrace has no problem with processes of any size. However, it does have problems with C++ apps that throw exceptions. This should be easy to fix though, however my lowe-level C++ skills are not good enough for this.

Posted by Anton Ivanov at Mon Feb 8 14:56:53 2010
It`s will be usefull to support google perftools.

currently it shows error:

ai@aivanov2:~/DevelMail/drweb-maild/Engine$ LD_PRELOAD=/usr/local/lib/libmutrace.so ./drweb-maild
mutrace: Application appears to be compiled without -rdynamic. It might be a
mutrace: good idea to recompile with -rdynamic enabled since this produces more
mutrace: useful stack traces.

mutrace: Detected non-glibc memory allocator. Your program uses some
mutrace: alternative memory allocator (jemalloc?) which is not compatible with
mutrace: mutrace. Please rebuild your program with the standard memory
mutrace: allocator or fix mutrace to handle yours correctly.

Posted by Eero Tamminen at Thu Apr 29 16:34:43 2010
E.g. __sync_fetch_and_add() doesn't seem to be available on ARM.

Maybe mutrace could in future use libatomic-ops:
http://packages.debian.org/squeeze/libatomic-ops-dev
?

Posted by Gunther Piez at Fri Jun 18 10:22:10 2010
Rather useful tool. It did help me track down why my program suffered unusual high system load.

It turns out that a condition variable used in multiple threads was the culprit. I used notify_all() to wake mutliple worker threads at the same time, where a separate condition var on each thread and a notify_one() would have been better.

The high congestion on the condition mutex lead me to the right trace.

BTW, mutrace shouldn't hard exit, if a non glibc allocator is found. I did have one (probably from Qt), and it worked nevertheless. I just changed the "exit(1)" at the error message to a "exit(0)".

Leave a Comment:

Your Name:


Your E-mail (optional):


Comment:


As a protection against comment spam, please type the following number into the field on the right:
Secret Number Image

Please note that this is neither a support forum nor a bug tracker! Support questions or bug reports posted here will be ignored and not responded to!


It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.

Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I excercise this liberty quite agressively. So yes, if you comment here, I might censor you. If you don't want to be censored your are welcome to comment on your own blog instead.


Lennart Poettering <mzoybt (at) 0pointer (dot) net>
Syndicated on Planet GNOME, Planet Fedora, planet.freedesktop.org, Planet Debian Upstream. feed RSS 0.91, RSS 2.0
Archives: 2005, 2006, 2007, 2008, 2009, 2010

Valid XHTML 1.0 Strict!   Valid CSS!