レナート   Wunschkonzert, Ponyhof und Abenteuerspielplatz   ﻟﻴﻨﺎﺭﺕ

Fri, 19 Nov 2010

systemd for Administrators, Part IV

Here's the fourth installment of my ongoing series about systemd for administrators.

Killing Services

Killing a system daemon is easy, right? Or is it?

Sure, as long as your daemon consists of only a single process this might actually be somewhat true. You type killall rsyslogd and the syslog daemon is gone. However, that is a bit dirty, since it kills every process that happens to have that name, including any that an unlucky user might have named that way by accident. A slightly more correct version would be to read the .pid file, i.e. kill `cat /var/run/syslogd.pid`. That already gets us much further, but still, is this really what we want?
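For comparison, here is roughly what the pid-file variant amounts to, as a minimal Python sketch. The helper name kill_from_pidfile is made up for illustration, and /var/run/syslogd.pid is just the path from the example above. Note what it can and cannot do: it signals exactly one PID, so anything that process forked is left untouched, and if the pid file is stale the PID may by now belong to an unrelated process.

```python
import os
import signal


def kill_from_pidfile(pidfile: str, sig: int = signal.SIGTERM) -> int:
    """Signal the PID recorded in a daemon's .pid file (hypothetical helper).

    Roughly equivalent to: kill `cat /var/run/syslogd.pid`
    Only this one process is signalled; its children are not touched.
    """
    with open(pidfile) as f:
        pid = int(f.read().strip())  # a pid file holds a single decimal PID
    os.kill(pid, sig)
    return pid
```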

More often than not it actually isn't. Consider services like Apache, crond, or atd, which spawn child processes as part of their usual operation: arbitrary, user-configurable child processes, such as cron or at jobs, CGI scripts, or even full application servers. If you kill the main apache/crond/atd process, this might or might not pull down the child processes too; it's up to those processes whether they stay around or go down as well. In practice that means that terminating Apache might very well leave its CGI scripts running, reparented to init and difficult to track down.

systemd to the rescue: With systemctl kill you can easily send a signal to all processes of a service. Example:

# systemctl kill crond.service

This will ensure that SIGTERM is delivered to all processes of the crond service, not just the main process. Of course, you can also send a different signal if you wish. For example, if you are bad-ass you might want to go for SIGKILL right away:

# systemctl kill -s SIGKILL crond.service

And there you go, the service will be brutally slaughtered in its entirety, regardless of how many times it forked, and regardless of whether it tried to escape supervision by double forking or fork bombing.
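How can systemd find every process, no matter how often it forked? It puts each service into its own kernel cgroup, and the kernel keeps a forked child in its parent's cgroup. The following Python sketch shows the basic idea under stated assumptions: it is not systemd's code, kill_cgroup is a hypothetical name, and it assumes a cgroup v2 directory such as /sys/fs/cgroup/system.slice/crond.service whose cgroup.procs file lists one PID per line (2010-era systemd used a different directory layout). Because the service may keep forking while we signal it, the loop re-reads the list until a pass finds no PID it has not already signalled.

```python
import os
import signal


def kill_cgroup(cgroup_dir: str, sig: int = signal.SIGTERM) -> set[int]:
    """Send `sig` to every PID listed in <cgroup_dir>/cgroup.procs.

    Hypothetical sketch: keeps re-reading the PID list until a pass
    turns up nothing new, so processes forked mid-iteration are caught.
    Returns the set of PIDs that were signalled.
    """
    signalled: set[int] = set()
    while True:
        with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
            pids = {int(line) for line in f if line.strip()}
        new = pids - signalled
        if not new:
            return signalled
        for pid in new:
            try:
                os.kill(pid, sig)
            except ProcessLookupError:
                pass  # exited between reading the list and signalling it
            signalled.add(pid)
```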

Sometimes all you need is to send a specific signal to the main process of a service, maybe because you want to trigger a reload via SIGHUP. Instead of going via the PID file, here's an easier way to do this:

# systemctl kill -s HUP --kill-who=main crond.service
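For comparison, this is what the same thing looks like if you fetch the main PID yourself with systemctl show -p MainPID (which prints a single MainPID=<n> line) and then signal it from Python. The function names are hypothetical. Doing it this way has a small race, as one commenter points out below: the main process may exit, and its PID be reused, between the query and the kill. Using systemctl kill --kill-who=main keeps both the lookup and the signal inside systemd.

```python
import os
import signal
import subprocess


def parse_main_pid(show_output: str) -> int:
    """Parse the "MainPID=<n>" line printed by `systemctl show -p MainPID`."""
    key, _, value = show_output.strip().partition("=")
    if key != "MainPID":
        raise ValueError(f"unexpected output: {show_output!r}")
    return int(value)  # 0 means the unit currently has no main process


def signal_main_only(unit: str, sig: int = signal.SIGHUP) -> int:
    """Hypothetical helper: look up a unit's main PID, then signal it."""
    out = subprocess.run(
        ["systemctl", "show", "-p", "MainPID", unit],
        check=True, capture_output=True, text=True,
    ).stdout
    pid = parse_main_pid(out)
    if pid == 0:
        raise RuntimeError(f"{unit} has no main process")
    os.kill(pid, sig)  # racy: the PID may have changed since the query
    return pid
```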

So again, what is so new and fancy about killing services in systemd? Well, for the first time on Linux we can actually do that properly. Previous solutions always depended on the daemons cooperating to bring down everything they spawned when they themselves terminated. However, if you want to use SIGTERM or SIGKILL, it is usually precisely because they are not cooperating properly with you.
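The reason this works is not spelled out in the post itself; as Lennart explains in the comments, systemd creates a kernel cgroup for each service, and a process's cgroup membership is inherited by every child it forks, whatever that child calls itself. You can inspect that membership through /proc/<pid>/cgroup, which is something name-based tools like pkill or killall never look at. A minimal Python sketch, with hypothetical helper names, that parses this file:

```python
import os


def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Parse /proc/<pid>/cgroup into {controller_list: cgroup_path}.

    Each line has the form "hierarchy-ID:controller-list:path"; on a pure
    cgroup v2 system there is one line with an empty controller list,
    e.g. "0::/system.slice/crond.service".
    """
    result = {}
    for line in text.splitlines():
        if not line:
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def cgroups_of(pid: int) -> dict[str, str]:
    """Read and parse the cgroup membership of a running process (Linux)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return parse_proc_cgroup(f.read())
```

On a pure cgroup v2 system the dictionary has a single entry keyed by the empty string, whose value is the process's cgroup path.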

How does this relate to systemctl stop? kill goes directly and sends a signal to every process in the group, whereas stop goes through the officially configured way to shut down a service, i.e. it invokes the stop command configured with ExecStop= in the service file. Usually stop should be sufficient. kill is the tougher version, for cases where you either don't want the official shutdown command of a service to run, or when the service is hosed and hung in other ways.
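Somewhere between stop and kill -s SIGKILL sits the classic escalation that Will P asks about in the comments: send SIGTERM, give the processes a grace period, then SIGKILL whatever is still alive. The post does not describe a systemctl command for this, so here is a hypothetical Python sketch that works on a given list of PIDs (for example, the PIDs in a service's cgroup). It assumes Linux, because it reads the process state from /proc/<pid>/stat so that a process which has already exited but not yet been reaped (a zombie) counts as gone.

```python
import os
import signal
import time
from collections.abc import Iterable


def _alive(pid: int) -> bool:
    """True if pid exists and is not a zombie (assumes Linux /proc)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return False
    # The state letter follows the ")" that closes the command name.
    return stat.rsplit(")", 1)[1].split()[0] != "Z"


def terminate_then_kill(pids: Iterable[int], timeout: float = 15.0,
                        poll: float = 0.1) -> set[int]:
    """SIGTERM all pids, wait up to `timeout` s, then SIGKILL survivors.

    Returns the set of PIDs that had to be SIGKILLed.
    """
    pids = list(pids)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
    deadline = time.monotonic() + timeout
    remaining = {p for p in pids if _alive(p)}
    while remaining and time.monotonic() < deadline:
        time.sleep(poll)
        remaining = {p for p in remaining if _alive(p)}
    killed = set()
    for pid in remaining:
        try:
            os.kill(pid, signal.SIGKILL)
            killed.add(pid)
        except ProcessLookupError:
            pass  # exited right after the last check
    return killed
```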

(BTW, it's up to you whether you specify signal names on the -s switch with or without the SIG prefix. Both work.)
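To make the accepted spellings concrete: the post says the SIG prefix is optional, and Lennart confirms in the comments that numeric signals work too. A hypothetical Python helper that normalizes all three forms with the standard signal module could look like this; it is only an illustration, not systemctl's actual parser.

```python
import signal


def parse_signal(spec: str) -> signal.Signals:
    """Accept "SIGKILL", "KILL" or "9" and return the matching signal."""
    if spec.isdigit():
        return signal.Signals(int(spec))  # raises ValueError if unknown
    name = spec if spec.startswith("SIG") else "SIG" + spec
    try:
        return signal.Signals[name]
    except KeyError:
        raise ValueError(f"unknown signal: {spec!r}") from None
```

On Linux, parse_signal("HUP"), parse_signal("SIGHUP") and parse_signal("1") all yield signal.SIGHUP.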

It's a bit surprising that we have come so far on Linux without even being able to properly kill services. systemd for the first time enables you to do this properly.

posted at: 18:17 | path: /projects | 36 comments


Posted by alex at Fri Nov 19 20:07:52 2010
> This will ensure that SIGTERM is delivered to all processes of the crond service, not just the main process

boy, you know that you don't need to restart cron? It rereads its config files right away once they change. And even if it's needed, sending HUP to the parent is enough. Anyway. cron, bind, apache... anyway.

Lennart, with all due respect, your knowledge of daemons is so screwed up. You should read Evi Nemeth's book before touching these things.

Posted by bochecha at Fri Nov 19 20:32:50 2010
@alex: I lost count of the times I had a stalled cron daemon that kept spawning children that would never complete, bringing the host to its knees.

Stopping such a cron daemon is usually not enough, and when you kill it, all its child processes remain alive and get reparented to init, so you have to « kill -9 » them all individually.

I for one welcome « systemctl kill » heartily. :)

Posted by Michael at Fri Nov 19 20:35:50 2010
@alex:
There are different cron implementations. The one on Debian (vixie-cron based) indeed does pick up changes to configuration files automatically.

From what I could find out, Fedora uses a different cron implementation, which does not automatically reload on configuration changes.

Please also note that Lennart used --kill-who=main for the SIGHUP example, exactly for the reason you mentioned: only the main process (what you called the parent) needs this signal.

Posted by Jeffrey W. Baker at Fri Nov 19 22:31:40 2010
Might I suggest abbreviating the syntax of these two things to 'systemctl kill' and 'systemctl killall'?  That will be a bit nicer than --kill-who=whatever.

I agree with @bochecha that the ability to kill all of the user-spawned children of crond is a miracle. It is quite difficult to write a proper cron job that will never launch in parallel with itself, and most users will screw it up.

Posted by Jakub Narębski at Sat Nov 20 00:04:36 2010
+1 for 'systemctl kill' and 'systemctl killall'.

The --kill-who switch doesn't make for a nice API.

Posted by Lennart at Sat Nov 20 01:31:42 2010
alex: you seem a little bit confused: sending SIGTERM triggers a shutdown of a process, not a restart. Also note that HUP in most daemons actually triggers a reload, not a restart, which is quite a distinction. Finally, different cron implementations work differently. With the advent of inotify more and more daemons now automatically reload their configuration files if they change (although for the cron case you don't even really need that), but that's a more recent development. I won't comment on who of us has the screwed up knowledge here...

Jeffrey: definitely an interesting idea, however I am not 100% convinced we really want this. After all I want to be able to read the command line as if it was a sentence, and "kill foo.service" kinda tells me that this will kill this service, but "killall" would suggest there was more than one service by the same name? The killall command we all know and love works like that: it iterates through the process tree and kills everything that matches the name. If we reuse this verb in this context here, then I believe this would be slightly misleading.

Posted by Horst H. von Brand at Sat Nov 20 01:52:50 2010
Why is one SIGxxx and the other plain HUP?

Posted by Lennart at Sat Nov 20 02:05:54 2010
Horst: just to make the point that you may write the full name including the SIGxxx prefix or leave it out. Since I myself can never remember which tools want the full name and which tools take the unprefixed name, I just made all systemd tools take both. While I generally believe that too much redundancy in configuration languages is a bad idea, I thought that in this case it's fine. (Note that I actually wrote pretty much this in the blog story, in the second to last paragraph.)

Posted by someone at Sat Nov 20 02:57:47 2010
Do these commands work?

systemctl kill -9 crond.service
systemctl kill -s 9 crond.service
systemctl kill -s 0 crond.service
systemctl kill -SIGKILL crond.service
systemctl kill -KILL crond.service

i.e. it would be great if the syntax was exactly the same as kill (except drop the -l case).

Posted by Lennart at Sat Nov 20 03:29:09 2010
someone: you have to specify the '-s', but yes, otherwise all three possible syntaxes are accepted (with and without the SIG prefix, and numeric)

Posted by David Weinehall at Sat Nov 20 12:02:36 2010
@Lennart: only the "-s <NO SIG>" syntax is POSIX-compliant though; omitting "-s" or including "SIG" is implementation specific and is not guaranteed to be supported.

Posted by Grahame Bowland at Sat Nov 20 12:44:52 2010
A bit of a minor thing, but why do all the systemd commands require you to type 'crond.service' rather than just 'crond'? It's a bit cumbersome and seems unnecessary.

Posted by Lennart at Sat Nov 20 16:47:33 2010
David, huh? systemctl is my brainchild; it's definitely not POSIX compliant, since it was defined by me, not by POSIX.

Grahame: since we maintain not only services, but also sockets, devices, mount points, automount points, timers, inotify triggers, and more. The suffix encodes what kind of object it is you deal with.

Posted by strcmp at Sun Nov 21 13:49:39 2010
systemctl kill ssh.service looks dangerous on remote systems; in this case --kill-who=main should be the default... Because of that I vote for something like kill/killall.

Posted by Jon at Sun Nov 21 14:54:25 2010
foo.service seems backwards, in that case.  It would seem more logical to me (at least) that you start with the widest scope and narrow down, e.g. service.ssh (or socket.ssh or whatever).

See also: hierarchical include mechanisms in 4GLs, e.g. Java; Don't see also: The domain name system.

Posted by Lennart at Sun Nov 21 14:59:50 2010
strcmp: note that user sessions are moved into their own cgroups anyway, and ssh sessions would hence not be killed by killing the sshd daemon itself.

Jon: well, file names tend to have the type at the end, and since our unit names are actually identical to the names of the files their configuration is stored in we chose to do <name>.<suffix> instead of the reversed order.

Posted by Holger at Sun Nov 21 21:40:44 2010
So I guess it would be possible to use it to send STOP/CONT to a service including all its children, assuming CONT does the "wakeup" in the reverse order of STOP, right?

Posted by Lennart at Sun Nov 21 21:46:28 2010
Holger: yes, you can send STOP/CONT, but the order of the delivery is actually undefined.

Posted by Andreas at Mon Nov 22 00:30:28 2010
Typing "kill --kill-who=XXX" feels a bit redundant. Could it not be shorter, like just "kill --who=XXX"?

Posted by yhdezalvarez at Mon Nov 22 14:08:15 2010
@Andreas

> Could it not be shorter, like just "kill --who=XXX"?

why not "kill --target=XXX"? "who" sounds a little odd.

Posted by Andreas at Mon Nov 22 14:33:36 2010
Yeah, --who= was just the first thing which I thought of.

Posted by Will P at Thu Nov 25 13:34:43 2010
It would be really nice to have a way of shutting down a service by specifying a time-delay after TERM before sending KILL, so you can give the service time to gracefully shut down, but then forcibly kill it if it hasn't shut down on its own.  Having the logic present in the systemctl command would let it wait out the full duration or exit early if the service completed its shutdown before the time expired.

maybe:
systemctl killwait -w 15 nfsd
systemctl stopwait -w 15 nfsd

The killwait would use SIGTERM, then SIGKILL... The stopwait would use the ExecStop method, followed by SIGKILL.

This is a feature that the init script 'killproc' function provides primitive support for.  (It's in /etc/init.d/functions on my fedora14 system).  Having the more exact knowledge from systemd about the actual state of the service processes would make this a much more robust method than what 'killproc' tries to do.

For historical perspective, there are internal process management tools at some companies that provide this same functionality, which use process groups to implement the same kind of "service" management, with this 'killproc' delay behavior between TERM and KILL.

If this feature already exists, then bravo! If not, then what do you think of adding it?

Posted by sysitos at Thu Nov 25 13:40:02 2010
@Lennart, yes you are right, killall would lead to confusion, but if you kill the whole tree, then the right name would be killtree ;)

So my suggestion:
systemctl kill crond.service -> would only kill the single crond service
systemctl killtree crond.service -> would kill crond and all of its children

And so even the systemctl kill command couldn't be confused with the well-known kill command.

CU sysitos

Posted by sysitos at Thu Nov 25 13:56:27 2010
@me and Lennart, some addition.

You could then even add:
systemctl killall service1.service -> would kill all instances of service1
systemctl killalltree service1.service -> would kill all instances and children of service1

But a question: what happens when a service was started multiple times and is now running multiple times? Which instance is then killed by the first kill? The first one, the last one?

Thanks.
CU sysitos

Posted by Berniyh at Fri Nov 26 01:02:09 2010
I would propose these two commands:
systemctl killmain foo.service # Kill the main service (see --kill-who=main)
systemctl killcgroup foo.service # Kill the service and all of its children. Could be killcg, too.

If I understand the above text correctly, what is actually killed in a "complete" kill is everything in the daemon's cgroup. So this would actually make the command more intuitive.

Posted by Ralph Corderoy at Sun Dec 5 18:15:58 2010
--kill-who?  I think that should be --kill-whom!  Would --victim be more fun and avoid the confusion for those that do or don't speak English natively?  :-)

Posted by Bjoern Michaelsen at Mon Jan 3 00:43:58 2011
I have not looked at all at systemd yet, so I wonder if it is possible to just use it to query for PIDs?

systemctl kill -s HUP --kill-who=main crond.service

looks very un-unixy to me as it needlessly mixes multiple tasks. How about something like:

systemctl getpid --main crond.service | xargs kill -s HUP

"one tool, one job" and all that jazz ...

(I just found the "systemctl status" in the first installment of the series, but its output does not seem to be easily parseable or pipeable. It would be a shame to have to revert to major shell or perl voodoo to handle some basic scenario not covered by systemctl's "convenience functions".)

Posted by ck at Mon Jan 3 08:28:45 2011
Um, nobody uses pkill?

$ pkill rsyslogd
$ pkill -u bob rsyslogd
$ pkill -1 rsyslogd
...

What does "systemd" know that "pkill" doesn't?

Posted by Lennart at Mon Jan 3 21:02:26 2011
ck: you didn't even make it to the second paragraph of "Killing Services", have you? killall and pkill do the same thing, and that paragraph tells you why it is ugly. And it doesn't cover the CGI usecase anyway...

Posted by Lennart at Mon Jan 3 21:07:12 2011
Bjoern: there is "systemctl show" which you can use to query particular properties of a service, which is easily parsable and pipable.

Posted by ck at Tue Jan 4 00:23:00 2011
Lennart: I /did/ make it to the 2nd part, thank you very much. It went on to explain how processes spawn child processes and that "killall" cannot cope with that. I'm not using "killall" any more (hint: try "killall" on a Solaris box :)). But I fail to see what "systemctl" knows that "pkill" or "killall" do not. If process xy or one of its child processes is not "registered" with systemd, won't "systemctl" miss them too? How is "systemctl" superior to pkill? I don't see it. I guess what I fail to understand from your article is how the magic is done.
Thanks.

Posted by Lennart at Tue Jan 4 00:36:44 2011
ck: systemd creates a kernel cgroup for each service. Processes do not have to register with systemd: they become members of the cgroup automatically, and so do their children, regardless of whether they fork, rename themselves, or try anything else to escape supervision.

Posted by Bjoern Michaelsen at Tue Jan 4 02:40:01 2011
Lennart: Thanks for the reply, sounds great!

systemctl show -p MainPID crond.service | cut -d= -f2 | xargs kill -s HUP

(just guessing by some man page found somewhere on the interwebz)

Posted by ck at Tue Jan 4 12:14:24 2011
cgroups, OK. This might make sense after all. Thanks for the response, Lennart.

Posted by pal at Sun Jan 9 04:11:59 2011
Bjoern Michaelsen:
your pipe example has a race condition; that's why it's so unixy

Posted by spuk at Tue May 10 08:56:46 2011
I can't say I like systemd so far, but it looks like a decent new process manager..

re 'cron.service' vs 'crond': systemctl could infer what you're referring to when the name given maps unambiguously to an existing unit name... usually 'cron' would refer only to 'cron.service', not anything else.

Also.. I assume the standard GNU utils should be patched without much trouble to act on cgroups (i.e. systemd managed things)? Like making kill and killall work on cgroups the same as systemctl kill etc., no? Would DBUS be required for that?



It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.

Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I exercise this liberty quite aggressively. So yes, if you comment here, I might censor you. If you don't want to be censored you are welcome to comment on your own blog instead.


Lennart Poettering <mzoybt (at) 0pointer (dot) net>
Syndicated on Planet GNOME, Planet Fedora, planet.freedesktop.org, Planet Debian Upstream. feed RSS 0.91, RSS 2.0