レナート   Wunschkonzert, Ponyhof und Abenteuerspielplatz   ﻟﻴﻨﺎﺭﺕ

Thu, 03 May 2012

Boot & Base OS Miniconf at Linux Plumbers Conference 2012, San Diego

Linux Plumbers Conference Logo

We are working on putting together a miniconf on the topic of Boot & Base OS for the Linux Plumbers Conference 2012 in San Diego (Aug 29-31). And we need your submission!

Are you working on some exciting project related to Boot and Base OS and would like to present your work? Then please submit something following these guidelines, but please CC Kay Sievers and Lennart Poettering.

I hope that at this point the Linux Plumbers Conference needs little introduction, so I will spare any further prose on how great and useful and the best conference ever it is for everybody who works on the plumbing layer of Linux. However, there's one conference that will be co-located with LPC that is still little known, because it happens for the first time: The C Conference, organized by Brandon Philips and friends. It covers all things C, and they are still looking for more topics, in a reverse CFP. Please consider submitting a proposal and registering to the conference!

C
Conference Logo

posted at: 20:42 | path: /projects | permanent link to this entry | 0 comments


Tue, 01 May 2012

The Most Awesome, Least-Advertised Fedora 17 Feature

There's one feature In the upcoming Fedora 17 release that is immensly useful but very little known, since its feature page 'ckremoval' does not explicitly refer to it in its name: true automatic multi-seat support for Linux.

A multi-seat computer is a system that offers not only one local seat for a user, but multiple, at the same time. A seat refers to a combination of a screen, a set of input devices (such as mice and keyboards), and maybe an audio card or webcam, as individual local workplace for a user. A multi-seat computer can drive an entire class room of seats with only a fraction of the cost in hardware, energy, administration and space: you only have one PC, which usually has way enough CPU power to drive 10 or more workplaces. (In fact, even a Netbook has fast enough to drive a couple of seats!) Automatic multi-seat refers to an entirely automatically managed seat setup: whenever a new seat is plugged in a new login screen immediately appears -- without any manual configuration --, and when the seat is unplugged all user sessions on it are removed without delay.

In Fedora 17 we added this functionality to the low-level user and device tracking of systemd, replacing the previous ConsoleKit logic that lacked support for automatic multi-seat. With all the ground work done in systemd, udev and the other components of our plumbing layer the last remaining bits were surprisingly easy to add.

Currently, the automatic multi-seat logic works best with the USB multi-seat hardware from Plugable you can buy cheaply on Amazon (US). These devices require exactly zero configuration with the new scheme implemented in Fedora 17: just plug them in at any time, login screens pop up on them, and you have your additional seats. Alternatively you can also assemble your seat manually with a few easy loginctl attach commands, from any kind of hardware you might have lying around. To get a full seat you need multiple graphics cards, keyboards and mice: one set for each seat. (Later on we'll probably have a graphical setup utility for additional seats, but that's not a pressing issue we believe, as the plug-n-play multi-seat support with the Plugable devices is so awesomely nice.)

Plugable provided us for free with hardware for testing multi-seat. They are also involved with the upstream development of the USB DisplayLink driver for Linux. Due to their positive involvement with Linux we can only recommend to buy their hardware. They are good guys, and support Free Software the way all hardware vendors should! (And besides that, their hardware is also nicely put together. For example, in contrast to most similar vendors they actually assign proper vendor/product IDs to their USB hardware so that we can easily recognize their hardware when plugged in to set up automatic seats.)

Currently, all this magic is only implemented in the GNOME stack with the biggest component getting updated being the GNOME Display Manager. On the Plugable USB hardware you get a full GNOME Shell session with all the usual graphical gimmicks, the same way as on any other hardware. (Yes, GNOME 3 works perfectly fine on simpler graphics cards such as these USB devices!) If you are hacking on a different desktop environment, or on a different display manager, please have a look at the multi-seat documentation we put together, and particularly at our short piece about writing display managers which are multi-seat capable.

If you work on a major desktop environment or display manager and would like to implement multi-seat support for it, but lack the aforementioned Plugable hardware, we might be able to provide you with the hardware for free. Please contact us directly, and we might be able to send you a device. Note that we don't have unlimited devices available, hence we'll probably not be able to pass hardware to everybody who asks, and we will pass the hardware preferably to people who work on well-known software or otherwise have contributed good code to the community already. Anyway, if in doubt, ping us, and explain to us why you should get the hardware, and we'll consider you! (Oh, and this not only applies to display managers, if you hack on some other software where multi-seat awareness would be truly useful, then don't hesitate and ping us!)

Phoronix has this story about this new multi-seat support which is quite interesting and full of pictures. Please have a look.

Plugable started a Pledge drive to lower the price of the Plugable USB multi-seat terminals further. It's full of pictures (and a video showing all this in action!), and uses the code we now make available in Fedora 17 as base. Please consider pledging a few bucks.

Recently David Zeuthen added multi-seat support to udisks as well. With this in place, a user logged in on a specific seat can only see the USB storage plugged into his individual seat, but does not see any USB storage plugged into any other local seat. With this in place we closed the last missing bit of multi-seat support in our desktop stack.

With this code in Fedora 17 we cover the big use cases of multi-seat already: internet cafes, class rooms and similar installations can provide PC workplaces cheaply and easily without any manual configuration. Later on we want to build on this and make this useful for different uses too: for example, the ability to get a login screen as easily as plugging in a USB connector makes this not useful only for saving money in setups for many people, but also in embedded environments (consider monitoring/debugging screens made available via this hotplug logic) or servers (get trivially quick local access to your otherwise head-less server). To be truly useful in these areas we need one more thing though: the ability to run a simply getty (i.e. text login) on the seat, without necessarily involving a graphical UI.

The well-known X successor Wayland already comes out of the box with multi-seat support based on this logic.

Oh, and BTW, as Ubuntu appears to be "focussing" on "clarity" in the "cloud" now ;-), and chose Upstart instead of systemd, this feature won't be available in Ubuntu any time soon. That's (one detail of) the price Ubuntu has to pay for choosing to maintain it's own (largely legacy, such as ConsoleKit) plumbing stack.

Multi-seat has a long history on Unix. Since the earliest days Unix systems could be accessed by multiple local terminals at the same time. Since then local terminal support (and hence multi-seat) gradually moved out of view in computing. The fewest machines these days have more than one seat, the concept of terminals survived almost exclusively in the context of PTYs (i.e. fully virtualized API objects, disconnected from any real hardware seat) and VCs (i.e. a single virtualized local seat), but almost not in any other way (well, server setups still use serial terminals for emergency remote access, but they almost never have more than one serial terminal). All what we do in systemd is based on the ideas originally brought forward in Unix; with systemd we now try to bring back a number of the good ideas of Unix that since the old times were lost on the roadside. For example, in true Unix style we already started to expose the concept of a service in the file system (in /sys/fs/cgroup/systemd/system/), something where on Linux the (often misunderstood) "everything is a file" mantra previously fell short. With automatic multi-seat support we bring back support for terminals, but updated with all the features of today's desktops: plug and play, zero configuration, full graphics, and not limited to input devices and screens, but extending to all kinds of devices, such as audio, webcams or USB memory sticks.

Anyway, this is all for now; I'd like to thank everybody who was involved with making multi-seat work so nicely and natively on the Linux platform. You know who you are! Thanks a ton!

posted at: 23:07 | path: /projects | permanent link to this entry | 38 comments


Sat, 21 Apr 2012

systemd Status Update

It has been way too long since my last status update on systemd. Here's another short, incomprehensive status update on what we worked on for systemd since then.

We have been working hard to turn systemd into the most viable set of components to build operating systems, appliances and devices from, and make it the best choice for servers, for desktops and for embedded environments alike. I think we have a really convincing set of features now, but we are actively working on making it even better.

Here's a list of some more and some less interesting features, in no particular order:

  1. We added an automatic pager to systemctl (and related tools), similar to how git has it.
  2. systemctl learnt a new switch --failed, to show only failed services.
  3. You may now start services immediately, overrding all dependency logic by passing --ignore-dependencies to systemctl. This is mostly a debugging tool and nothing people should use in real life.
  4. Sending SIGKILL as final part of the implicit shutdown logic of services is now optional and may be configured with the SendSIGKILL= option individually for each service.
  5. We split off the Vala/Gtk tools into its own project systemd-ui.
  6. systemd-tmpfiles learnt file globbing and creating FIFO special files as well as character and block device nodes, and symlinks. It also is capable of relabelling certain directories at boot now (in the SELinux sense).
  7. Immediately before shuttding dow we will now invoke all binaries found in /lib/systemd/system-shutdown/, which is useful for debugging late shutdown.
  8. You may now globally control where STDOUT/STDERR of services goes (unless individual service configuration overrides it).
  9. There's a new ConditionVirtualization= option, that makes systemd skip a specific service if a certain virtualization technology is found or not found. Similar, we now have a new option to detect whether a certain security technology (such as SELinux) is available, called ConditionSecurity=. There's also ConditionCapability= to check whether a certain process capability is in the capability bounding set of the system. There's also a new ConditionFileIsExecutable=, ConditionPathIsMountPoint=, ConditionPathIsReadWrite=, ConditionPathIsSymbolicLink=.
  10. The file system condition directives now support globbing.
  11. Service conditions may now be "triggering" and "mandatory", meaning that they can be a necessary requirement to hold for a service to start, or simply one trigger among many.
  12. At boot time we now print warnings if: /usr is on a split-off partition but not already mounted by an initrd; if /etc/mtab is not a symlink to /proc/mounts; CONFIG_CGROUPS is not enabled in the kernel. We'll also expose this as tainted flag on the bus.
  13. You may now boot the same OS image on a bare metal machine and in Linux namespace containers and will get a clean boot in both cases. This is more complicated than it sounds since device management with udev or write access to /sys, /proc/sys or things like /dev/kmsg is not available in a container. This makes systemd a first-class choice for managing thin container setups. This is all tested with systemd's own systemd-nspawn tool but should work fine in LXC setups, too. Basically this means that you do not have to adjust your OS manually to make it work in a container environment, but will just work out of the box. It also makes it easier to convert real systems into containers.
  14. We now automatically spawn gettys on HVC ttys when booting in VMs.
  15. We introduced /etc/machine-id as a generalization of D-Bus machine ID logic. See this blog story for more information. On stateless/read-only systems the machine ID is initialized randomly at boot. In virtualized environments it may be passed in from the machine manager (with qemu's -uuid switch, or via the container interface).
  16. All of the systemd-specific /etc/fstab mount options are now in the x-systemd-xyz format.
  17. To make it easy to find non-converted services we will now implicitly prefix all LSB and SysV init script descriptions with the strings "LSB:" resp. "SYSV:".
  18. We introduced /run and made it a hard dependency of systemd. This directory is now widely accepted and implemented on all relevant Linux distributions.
  19. systemctl can now execute all its operations remotely too (-H switch).
  20. We now ship systemd-nspawn, a really powerful tool that can be used to start containers for debugging, building and testing, much like chroot(1). It is useful to just get a shell inside a build tree, but is good enough to boot up a full system in it, too.
  21. If we query the user for a hard disk password at boot he may hit TAB to hide the asterisks we normally show for each key that is entered, for extra paranoia.
  22. We don't enable udev-settle.service anymore, which is only required for certain legacy software that still hasn't been updated to follow devices coming and going cleanly.
  23. We now include a tool that can plot boot speed graphs, similar to bootchartd, called systemd-analyze.
  24. At boot, we now initialize the kernel's binfmt_misc logic with the data from /etc/binfmt.d.
  25. systemctl now recognizes if it is run in a chroot() environment and will work accordingly (i.e. apply changes to the tree it is run in, instead of talking to the actual PID 1 for this). It also has a new --root= switch to work on an OS tree from outside of it.
  26. There's a new unit dependency type OnFailureIsolate= that allows entering a different target whenever a certain unit fails. For example, this is interesting to enter emergency mode if file system checks of crucial file systems failed.
  27. Socket units may now listen on Netlink sockets, special files from /proc and POSIX message queues, too.
  28. There's a new IgnoreOnIsolate= flag which may be used to ensure certain units are left untouched by isolation requests. There's a new IgnoreOnSnapshot= flag which may be used to exclude certain units from snapshot units when they are created.
  29. There's now small mechanism services for changing the local hostname and other host meta data, changing the system locale and console settings and the system clock.
  30. We now limit the capability bounding set for a number of our internal services by default.
  31. Plymouth may now be disabled globally with plymouth.enable=0 on the kernel command line.
  32. We now disallocate VTs when a getty finished running (and optionally other tools run on VTs). This adds extra security since it clears up the scrollback buffer so that subsequent users cannot get access to a user's session output.
  33. In socket units there are now options to control the IP_TRANSPARENT, SO_BROADCAST, SO_PASSCRED, SO_PASSSEC socket options.
  34. The receive and send buffers of socket units may now be set larger than the default system settings if needed by using SO_{RCV,SND}BUFFORCE.
  35. We now set the hardware timezone as one of the first things in PID 1, in order to avoid time jumps during normal userspace operation, and to guarantee sensible times on all generated logs. We also no longer save the system clock to the RTC on shutdown, assuming that this is done by the clock control tool when the user modifies the time, or automatically by the kernel if NTP is enabled.
  36. The SELinux directory got moved from /selinux to /sys/fs/selinux.
  37. We added a small service systemd-logind that keeps tracks of logged in users and their sessions. It creates control groups for them, implements the XDG_RUNTIME_DIR specification for them, maintains seats and device node ACLs and implements shutdown/idle inhibiting for clients. It auto-spawns gettys on all local VTs when the user switches to them (instead of starting six of them unconditionally), thus reducing the resource foot print by default. It has a D-Bus interface as well as a simple synchronous library interface. This mechanism obsoletes ConsoleKit which is now deprecated and should no longer be used.
  38. There's now full, automatic multi-seat support, and this is enabled in GNOME 3.4. Just by pluging in new seat hardware you get a new login screen on your seat's screen.
  39. There is now an option ControlGroupModify= to allow services to change the properties of their control groups dynamically, and one to make control groups persistent in the tree (ControlGroupPersistent=) so that they can be created and maintained by external tools.
  40. We now jump back into the initrd in shutdown, so that it can detach the root file system and the storage devices backing it. This allows (for the first time!) to reliably undo complex storage setups on shutdown and leave them in a clean state.
  41. systemctl now supports presets, a way for distributions and administrators to define their own policies on whether services should be enabled or disabled by default on package installation.
  42. systemctl now has high-level verbs for masking/unmasking units. There's also a new command (systemctl list-unit-files) for determining the list of all installed unit file files and whether they are enabled or not.
  43. We now apply sysctl variables to each new network device, as it appears. This makes /etc/sysctl.d compatible with hot-plug network devices.
  44. There's limited profiling for SELinux start-up perfomance built into PID 1.
  45. There's a new switch PrivateNetwork= to turn of any network access for a specific service.
  46. Service units may now include configuration for control group parameters. A few (such as MemoryLimit=) are exposed with high-level options, and all others are available via the generic ControlGroupAttribute= setting.
  47. There's now the option to mount certain cgroup controllers jointly at boot. We do this now for cpu and cpuacct by default.
  48. We added the journal and turned it on by default.
  49. All service output is now written to the Journal by default, regardless whether it is sent via syslog or simply written to stdout/stderr. Both message streams end up in the same location and are interleaved the way they should. All log messages even from the kernel and from early boot end up in the journal. Now, no service output gets unnoticed and is saved and indexed at the same location.
  50. systemctl status will now show the last 10 log lines for each service, directly from the journal.
  51. We now show the progress of fsck at boot on the console, again. We also show the much loved colorful [ OK ] status messages at boot again, as known from most SysV implementations.
  52. We merged udev into systemd.
  53. We implemented and documented interfaces to container managers and initrds for passing execution data to systemd. We also implemented and documented an interface for storage daemons that are required to back the root file system.
  54. There are two new options in service files to propagate reload requests between several units.
  55. systemd-cgls won't show kernel threads by default anymore, or show empty control groups.
  56. We added a new tool systemd-cgtop that shows resource usage of whole services in a top(1) like fasion.
  57. systemd may now supervise services in watchdog style. If enabled for a service the daemon daemon has to ping PID 1 in regular intervals or is otherwise considered failed (which might then result in restarting it, or even rebooting the machine, as configured). Also, PID 1 is capable of pinging a hardware watchdog. Putting this together, the hardware watchdogs PID 1 and PID 1 then watchdogs specific services. This is highly useful for high-availability servers as well as embedded machines. Since watchdog hardware is noawadays built into all modern chipsets (including desktop chipsets), this should hopefully help to make this a more widely used functionality.
  58. We added support for a new kernel command line option systemd.setenv= to set an environment variable system-wide.
  59. By default services which are started by systemd will have SIGPIPE set to ignored. The Unix SIGPIPE logic is used to reliably implement shell pipelines and when left enabled in services is usually just a source of bugs and problems.
  60. You may now configure the rate limiting that is applied to restarts of specific services. Previously the rate limiting parameters were hard-coded (similar to SysV).
  61. There's now support for loading the IMA integrity policy into the kernel early in PID 1, similar to how we already did it with the SELinux policy.
  62. There's now an official API to schedule and query scheduled shutdowns.
  63. We changed the license from GPL2+ to LGPL2.1+.
  64. We made systemd-detect-virt an official tool in the tool set. Since we already had code to detect certain VM and container environments we now added an official tool for administrators to make use of in shell scripts and suchlike.
  65. We documented numerous interfaces systemd introduced.

Much of the stuff above is already available in Fedora 15 and 16, or will be made available in the upcoming Fedora 17.

And that's it for now. There's a lot of other stuff in the git commits, but most of it is smaller and I will it thus spare you.

I'd like to thank everybody who contributed to systemd over the past years.

Thanks for your interest!

posted at: 00:17 | path: /projects | permanent link to this entry | 25 comments


Tue, 10 Apr 2012

Control Groups vs. Control Groups

TL;DR: systemd does not require the performance-sensitive bits of Linux control groups enabled in the kernel. However, it does require some non-performance-sensitive bits of the control group logic.

In some areas of the community there's still some confusion about Linux control groups and their performance impact, and what precisely it is that systemd requires of them. In the hope to clear this up a bit, I'd like to point out a few things:

Control Groups are two things: (A) a way to hierarchally group and label processes, and (B) a way to then apply resource limits to these groups. systemd only requires the former (A), and not the latter (B). That means you can compile your kernel without any control group resource controllers (B) and systemd will work perfectly on it. However, if you in addition disable the grouping feature entirely (A) then systemd will loudly complain at boot and proceed only reluctantly with a big warning and in a limited functionality mode.

At compile time, the grouping/labelling feature in the kernel is enabled by CONFIG_CGROUPS=y, the individual controllers by CONFIG_CGROUP_FREEZER=y, CONFIG_CGROUP_DEVICE=y, CONFIG_CGROUP_CPUACCT=y, CONFIG_CGROUP_MEM_RES_CTLR=y, CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y, CONFIG_CGROUP_MEM_RES_CTLR_KMEM=y, CONFIG_CGROUP_PERF=y, CONFIG_CGROUP_SCHED=y, CONFIG_BLK_CGROUP=y, CONFIG_NET_CLS_CGROUP=y, CONFIG_NETPRIO_CGROUP=y. And since (as mentioned) we only need the former (A), not the latter (B) you may disable all of the latter options while enabling CONFIG_CGROUPS=y, if you want to run systemd on your system.

What about the performance impact of these options? Well, every bit of code comes at some price, so none of these options come entirely for free. However, the grouping feature (A) alters the general logic very little, it just sticks hierarchial labels on processes, and its impact is minimal since that is usually not in any hot path of the OS. This is different for the various controllers (B) which have a much bigger impact since they influence the resource management of the OS and are full of hot paths. This means that the kernel feature that systemd mandatorily requires (A) has a minimal effect on system performance, but the actually performance-sensitive features of control groups (B) are entirely optional.

On boot, systemd will mount all controller hierarchies it finds enabled in the kernel to individual directories below /sys/fs/cgroup/. This is the official place where kernel controllers are mounted to these days. The /sys/fs/cgroup/ mount point in the kernel was created precisely for this purpose. Since the control group controllers are a shared facility that might be used by a number of different subsystems a few projects have agreed on a set of rules in order to avoid that the various bits of code step on each other's toes when using these directories.

systemd will also maintain its own, private, controller-less, named control group hierarchy which is mounted to /sys/fs/cgroup/systemd/. This hierarchy is private property of systemd, and other software should not try to interfere with it. This hierarchy is how systemd makes use of the naming and grouping feature of control groups (A) without actually requiring any kernel controller enabled for that.

Now, you might notice that by default systemd does create per-service cgroups in the "cpu" controller if it finds it enabled in the kernel. This is entirely optional, however. We chose to make use of it by default to even out CPU usage between system services. Example: On a traditional web server machine Apache might end up having 100 CGI worker processes around, while MySQL only has 5 processes running. Without the use of the "cpu" controller this means that Apache all together ends up having 20x more CPU available than MySQL since the kernel tries to provide every process with the same amount of CPU time. On the other hand, if we add these two services to the "cpu" controller in individual groups by default, Apache and MySQL get the same amount of CPU, which we think is a good default.

Note that if the CPU controller is not enabled in the kernel systemd will not attempt to make use of the "cpu" hierarchy as described above. Also, even if it is enabled in the kernel it is trivial to tell systemd not to make use of it: Simply edit /etc/systemd/system.conf and set DefaultControllers= to the empty string.

Let's discuss a few frequently heard complaints regarding systemd's use of control groups:

  • systemd mounts all controllers to /sys/fs/cgroup/ even though my software requires it at /dev/cgroup/ (or some other place)! The standardization of /sys/fs/cgroup/ as mount point of the hierarchies is a relatively recent change in the kernel. Some software has not been updated yet for it. If you cannot change the software in question you are welcome to unmount the hierarchies from /sys/fs/cgroup/ and mount them wherever you need them instead. However, make sure to leave /sys/fs/cgroup/systemd/ untouched.
  • systemd makes use of the "cpu" hierarchy, but it should leave its dirty fingers from it! As mentioned above, just set the DefaultControllers= option of systemd to the empty string.
  • I need my two controllers "foo" and "bar" mounted into one hierarchy, but systemd mounts them in two! Use the JoinControllers= setting in /etc/systemd/system.conf to mount several controllers into a single hierarchy.
  • Control groups are evil and they make everything slower! Well, please read the text above and understand the difference between "control-groups-as-in-naming-and-grouping" (A) and "cgroups-as-in-controllers" (B). Then, please turn off all controllers in you kernel build (B) but leave CONFIG_CGROUPS=y (A) enabled.
  • I have heard some kernel developers really hate control groups and think systemd is evil because it requires them! Well, there are a couple of things behind the dislike of control groups by some folks. Primarily, this is probably caused because the hackers in question do not distuingish the naming-and-grouping bits of the control group logic (A) and the controllers that are based on it (B). Mainly, their beef is with the latter (which systemd does not require, which is the key point I am trying to make in the text above), but there are other issues as well: for example, the code of the grouping logic is not the most beautiful bit of code ever written by man (which is thankfully likely to get better now, since the control groups subsystem now has an active maintainer again). And then for some developers it is important that they can compare the runtime behaviour of many historic kernel versions in order to find bugs (git bisect). Since systemd requires kernels with basic control group support enabled, and this is a relatively recent feature addition to the kernel, this makes it difficult for them to use a newer distribution with all these old kernels that predate cgroups. Anyway, the summary is probably that what matters to developers is different from what matters to users and administrators.

I hope this explanation was useful for a reader or two! Thank you for your time!

posted at: 19:09 | path: /projects | permanent link to this entry | 2 comments


GUADEC 2012 CFP Ending Soon!

In case you haven't submitted your talk proposal for GUADEC 2012 in A Coruña, Spain yet, hurry: the deadline is on April 14th, i.e. this saturday! Read der Call for Participation! Submit a proposal!

posted at: 17:40 | path: /projects | permanent link to this entry | 0 comments


Wed, 28 Mar 2012

/tmp or not /tmp?

A number of Linux distributions have recently switched (or started switching) to /tmp on tmpfs by default (ArchLinux, Debian among others). Other distributions have plans/are discussing doing the same (Ubuntu, OpenSUSE). Since we believe this is a good idea and it's good to keep the delta between the distributions minimal we are proposing the same for Fedora 18, too. On Solaris a similar change has already been implemented in 1994 (and other Unixes have made a similar change long ago, too). Yet, not all of our software is written in a way that it works nicely together with /tmp on tmpfs.

Another Fedora feature (for Fedora 17) changed the semantics of /tmp for many system services to make them more secure, by isolating the /tmp namespaces of the various services. Handling of temporary files in /tmp has been security sensitive since it has been introduced since it traditionally has been a world writable, shared namespace and unless all user code safely uses randomized file names it is vulnerable to DoS attacks and worse.

In this blog story I'd like to shed some light on proper usage of /tmp and what your Linux application should use for what purpose. We'll not discuss why /tmp on tmpfs is a good idea, for that refer to the Fedora feature page. Here we'll just discuss what /tmp should be used for and for what it shouldn't be, as well as what should be used instead. All that in order to make sure your application remains compatible with these new features introduced to many newer Linux distributions.

/tmp is (as the name suggests) an area where temporary files applications require during operation may be placed. Of course, temporary files differ very much in their properties:

  • They can be large, or very small
  • They might be used for sharing between users, or be private to users
  • They might need to be persistent across boots, or very volatile
  • They might need to be machine-local or shared on the network

Traditionally, /tmp has not only been the place where actual temporary files are stored, but some software used to place (and often still continues to place) communication primitives such as sockets, FIFOs, shared memory there as well. Notably X11, but many others too. Usage of world-writable shared namespaces for communication purposes has always been problematic, since to establish communication you need stable names, but stable names open the doors for DoS attacks. This can be corrected partially, by establishing protected per-app directories for certain services during early boot (like we do for X11), but this only fixes the problem partially, since this only works correctly if every package installation is followed by a reboot.

Besides /tmp there are various other places where temporary files (or other files that traditionally have been stored in /tmp) can be stored. Here's a quick overview of the candidates:

  • /tmp, POSIX suggests this is flushed as boot, FHS says that files do not need to be persistent between two runs of the application. Old files are often cleaned up automatically after a time ("aging"). Usually it is recommended to use $TMPDIR if it is set before falling back to /tmp directly. As mentioned, this is a tmpfs on many Linuxes/Unixes (and most likely will be for most soon), and hence should be used only for small files. It's generally a shared namespace, hence the only APIs for using it should be mkstemp(), mkdtemp() (and friends) to be entirely safe.[1] Recently, improvements have been made to turn this shared namespace into a private namespace (see above), but that doesn't relieve developers from writing secure code that is also safe if /tmp is a shared namespace. Because /tmp is no longer necessarily a shared namespace it is generally unsuitable as a location for communication primitives. It is machine-private and local. It's usually fully featured (locking, ...). This directory is world writable and thus available for both privileged and unprivileged code.
  • /var/tmp, according to FHS "more persistent" than /tmp, and is less often cleaned up (it's persistent across reboots, for example). It's not on a tmpfs, but on a real disk, and hence can be used to store much larger files. The same namespace problems apply as with /tmp, hence also exclusively use mkstemp()/mkdtemp() for this directory. It is also automatically cleaned up by time. It is machine-private. It's not necessarily fully featured (no locking, ...). This directory is world writable and thus available for both privileged and unprivileged code. We suggest to also check $TMPDIR before falling back to /var/tmp. That way if $TMPDIR is set this overrides usage of both /tmp and /var/tmp.
  • /run (traditionally /var/run) where privileged daemons can store runtime data, such as communication primitives. This is where your daemon should place its sockets. It's guaranteed to be a shared namespace, but is only writable by privileged code and hence very safe to use. This file system is guaranteed to be a tmpfs and is hence automatically flushed at boots. No automatic clean-up is done beyond that. It is machine-private and local. It is fully-featured, and provides all functionality the local OS can provide (locking, sockets, ...).
  • $XDG_RUNTIME_DIR where unprivileged user software can store runtime data, such as communication primitives. This is similar to /run but for user applications. It's a user private namespace, and hence very safe to use. It's cleaned up automatically at logout and also is cleaned up by time via "aging". It is machine-private and fully featured. In GLib applications use g_get_user_runtime_dir() to query the path of this directory.
  • $XDG_CACHE_HOME where unprivileged user software can store non-essential data. It's a private namespace of the user. It might be shared between machines. It is not automatically cleaned up, and not fully featured (no locking, and so on, due to NFS). In GLib applications use g_get_user_cache_dir() to query this directory.
  • $XDG_DOWNLOAD_DIR where unprivileged user software can store downloads and downloads in progress. It should only be used for downloads, and is a private namespace fo the user, but might be shared between machines. It is not automatically cleaned up and not fully featured. In GLib applications use g_get_user_special_dir() to query the path of this directory.

Now that we have introduced the contestants, here's a rough guide how we suggest you (a Linux application developer) pick the right directory to use:

  1. You need a place to put your socket (or other communication primitive) and your code runs privileged: use a subdirectory beneath /run. (Or beneath /var/run for extra compatibility.)
  2. You need a place to put your socket (or other communication primitive) and your code runs unprivileged: use a subdirectory beneath $XDG_RUNTIME_DIR.
  3. You need a place to put your larger downloads and downloads in progress and run unprivileged: use $XDG_DOWNLOAD_DIR.
  4. You need a place to put cache files which should be persistent and run unprivileged: use $XDG_CACHE_HOME.
  5. Nothing of the above applies and you need to place a small file that needs no persistency: use $TMPDIR with a fallback on /tmp. And use mkstemp(), and mkdtemp() and nothing homegrown.
  6. Otherwise use $TMPDIR with a fallback on /var/tmp. Also use mkstemp()/mkdtemp().

Note that these rules above are only suggested by us. These rules take into account everything we know about this topic and avoid problems with current and future distributions, as far as we can see them. Please consider updating your projects to follow these rules, and keep them in mind if you write new code.

One thing we'd like to stress is that /tmp and /var/tmp more often than not are actually not the right choice for your usecase. There are valid uses of these directories, but quite often another directory might actually be the better place. So, be careful, consider the other options, but if you do go for /tmp or /var/tmp then at least make sure to use mkstemp()/mkdtemp().

Thank you for your interest!

Oh, and if you now complain that we don't understand Unix, and that we are morons and worse, then please read this again, and you might notice that this is just a best practice guide, not a specification we have written. Nothing that introduces anything new, just something that explains how things are.

If you want to complain about the tmp-on-tmpfs or ServicesPrivateTmp feature, then this is not the right place either, because this blog post is not really about that. Please direct this to fedora-devel instead. Thank you very much.

Footnotes

[1] Well, or to turn this around: unless you have a PhD in advanced Unixology and are not using mkstemp()/mkdtemp() but use /tmp nonetheless it's very likely you are writing vulnerable code.

posted at: 14:04 | path: /projects | permanent link to this entry | 11 comments


Mon, 13 Feb 2012

/etc/os-release

One of the new configuration files systemd introduced is /etc/os-release. It replaces the multitude of per-distribution release files[1] with a single one. Yesterday we decided to drop support for systems lacking /etc/os-release in systemd since recently the majority of the big distributions adopted /etc/os-release and many small ones did, too[2]. It's our hope that by dropping support for non-compliant distributions we gently put some pressure on the remaining hold-outs to adopt this scheme as well.

I'd like to take the opportunity to explain a bit what the new file offers, why application developers should care, and why the distributions should adopt it. Of course, this file is pretty much a triviality in many ways, but I guess it's still one that deserves explanation.

So, you ask why this all?

  • It relieves application developers who just want to know the distribution they are running on to check for a multitude of individual release files.
  • It provides both a "pretty" name (i.e. one to show to the user), and machine parsable version/OS identifiers (i.e. for use in build systems).
  • It is extensible, can easily learn new fields if needed. For example, since we want to print a welcome message in the color of your distribution at boot we make it possible to configure the ANSI color for that in the file.

FAQs

There's already the lsb_release tool for this, why don't you just use that? Well, it's a very strange interface: a shell script you have to invoke (and hence spawn asynchronously from your C code), and it's not written to be extensible. It's an optional package in many distributions, and nothing we'd be happy to invoke as part of early boot in order to show a welcome message. (In times with sub-second userspace boot times we really don't want to invoke a huge shell script for a triviality like showing the welcome message). The lsb_release tool to us appears to be an attempt of abstracting distribution checks, where standardization of distribution checks is needed. It's simply a badly designed interface. In our opinion, it has its use as an interface to determine the LSB version itself, but not for checking the distribution or version.

Why haven't you adopted one of the generic release files, such as Fedora's /etc/system-release? Well, they are much nicer than lsb_release, so much is true. However, they are not extensible and are not really parsable, if the distribution needs to be identified programmatically or a specific version needs to be verified.

Why didn't you call this file /etc/bikeshed instead? The name /etc/os-release sucks! In a way, I think you kind of answered your own question there already.

Does this mean my distribution can now drop our equivalent of /etc/fedora-release? Unlikely, too much code exists that still checks for the individual release files, and you probably shouldn't break that. This new file makes things easy for applications, not for distributions: applications can now rely on a single file only, and use it in a nice way. Distributions will have to continue to ship the old files unless they are willing to break compatibility here.

This is so useless! My application needs to be compatible with distros from 1998, so how could I ever make use of the new file? I will have to continue using the old ones! True, if you need compatibility with really old distributions you do. But for new code this might not be an issue, and in general new APIs are new APIs. So if you decide to depend on it, you add a dependency on it. However, even if you need to stay compatible it might make sense to check /etc/os-release first and just fall back to the old files if it doesn't exist. The least it does for you is that you don't need 25+ open() attempts on modern distributions, but just one.

You evil people are forcing my beloved distro $XYZ to adopt your awful systemd schemes. I hate you! You hate too much, my friend. Also, I am pretty sure it's not difficult to see the benefit of this new file independently of systemd, and it's truly useful on systems without systemd, too.

I hate what you people do, can I just ignore this? Well, you really need to work on your constant feelings of hate, my friend. But, to a certain degree yes, you can ignore this for a while longer. But already, there are a number of applications making use of this file. You lose compatibility with those. Also, you are kinda working towards the further balkanization of the Linux landscape, but maybe that's your intention?

You guys add a new file because you think there are already too many? You guys are so confused! None of the existing files is generic and extensible enough to do what we want it to do. Hence we had to introduce a new one. We acknowledge the irony, however.

The file is extensible? Awesome! I want a new field XYZ= in it! Sure, it's extensible, and we are happy if distributions extend it. Please prefix your keys with your distribution's name however. Or even better: talk to us and we might be able update the documentation and make your field standard, if you convince us that it makes sense.

Anyway, to summarize all this: if you work on an application that needs to identify the OS it is being built on or is being run on, please consider making use of this new file, we created it for you. If you work on a distribution, and your distribution doesn't support this file yet, please consider adopting this file, too.

If you are working on a small/embedded distribution, or a legacy-free distribution we encourage you to adopt only this file and not establish any other per-distro release file.

Read the documentation for /etc/os-release.

Footnotes

[1] Yes, multitude, there's at least: /etc/redhat-release, /etc/SuSE-release, /etc/debian_version, /etc/arch-release, /etc/gentoo-release, /etc/slackware-version, /etc/frugalware-release, /etc/altlinux-release, /etc/mandriva-release, /etc/meego-release, /etc/angstrom-version, /etc/mageia-release. And some distributions even have multiple, for example Fedora has already four different files.

[2] To our knowledge at least OpenSUSE, Fedora, ArchLinux, Angstrom, Frugalware have adopted this. (This list is not comprehensive, there are probably more.)

posted at: 19:46 | path: /projects | permanent link to this entry | 24 comments


Thu, 26 Jan 2012

The Case for the /usr Merge

One of the features of Fedora 17 is the /usr merge, put forward by Harald Hoyer and Kay Sievers[1]. In the time since this feature has been proposed repetitive discussions took place all over the various Free Software communities, and usually the same questions were asked: what the reasons behind this feature were, and whether it makes sense to adopt the same scheme for distribution XYZ, too.

Especially in the Non-Fedora world it appears to be socially unacceptable to actually have a look at the Fedora feature page (where many of the questions are already brought up and answered) which is very unfortunate. To improve the situation I spent some time today to summarize the reasons for the /usr merge independently. I'd hence like to direct you to this new page I put up which tries to summarize the reasons for this, with an emphasis on the compatibility point of view:

The Case for the /usr Merge

Note that even though this page is in the systemd wiki, what it covers is mostly orthogonal to systemd. systemd supports both systems with a merged /usr and with a split /usr, and the /usr merge should be interesting for non-systemd distributions as well.

Primarily I put this together to have a nice place to point all those folks who continue to write me annoyed emails, even though I am actually not even working on all of this...

Enjoy the read!

Footnotes:

[1] And not actually by me, I am just a supportive spectator and am not doing any work on it. Unfortunately some tech press folks created the false impression I was behind this. But credit where credit is due, this is all Harald's and Kay's work.

posted at: 22:29 | path: /projects | permanent link to this entry | 16 comments


Fri, 20 Jan 2012

Plumbers Wishlist, The Third Edition, a.k.a. "The Thank You Edition"

Last October we published a wishlist for plumbing related features we'd like to see added to the Linux kernel. Three months later it's time to publish a short update, and explain what has been implemented in the kernel, what people have started working on, and what's still missing.

The full, updated list is available on Google Docs.

In general, I must say that the list turned out to be a great success. It shows how awesome the Open Source community is: Just ask nicely and there's a good chance they'll fulfill your wishes! Thank you very much, Linux community!

We'd like to thank everybody who worked on any of the features on that list: Lucas De Marchi, Andi Kleen, Dan Ballard, Li Zefan, Kirill A. Shutemov, Davidlohr Bueso, Cong Wang, Lennart Poettering, Kay Sievers.

Of the items on the list 5 have been fully implemented and are already part of a released kernel, or already merged for inclusion for the next kernels being released.

For 4 further items patches have been posted, and I am hoping they'll get merged eventually. Davidlohr, Wang, Zefan, Kirill, it would be great if you'd continue working on your patches, as we think they are following the right approach[1] even if there was some opposition to them on LKML. So, please keep pushing to solve the outstanding issues and thanks for your work so far!

Footnotes

[1] Yes, I still believe that tmpfs quota should be implemented via resource limits, as everything else wouldn't work, as we don't want to implement complex and fragile userspace infrastructure to racily upload complex quota data for all current and future UIDs ever used on the system into each tmpfs mount point at mount time.

posted at: 21:26 | path: /projects | permanent link to this entry | 5 comments


systemd for Administrators, Part XII

Here's the twelfth installment of my ongoing series on systemd for Administrators:

Securing Your Services

One of the core features of Unix systems is the idea of privilege separation between the different components of the OS. Many system services run under their own user IDs thus limiting what they can do, and hence the impact they may have on the OS in case they get exploited.

This kind of privilege separation only provides very basic protection however, since in general system services run this way can still do at least as much as a normal local users, though not as much as root. For security purposes it is however very interesting to limit even further what services can do, and shut them off a couple of things that normal users are allowed to do.

A great way to limit the impact of services is by employing MAC technologies such as SELinux. If you are interested to secure down your server, running SELinux is a very good idea. systemd enables developers and administrators to apply additional restrictions to local services independently of a MAC. Thus, regardless whether you are able to make use of SELinux you may still enforce certain security limits on your services.

In this iteration of the series we want to focus on a couple of these security features of systemd and how to make use of them in your services. These features take advantage of a couple of Linux-specific technologies that have been available in the kernel for a long time, but never have been exposed in a widely usable fashion. These systemd features have been designed to be as easy to use as possible, in order to make them attractive to administrators and upstream developers:

  • Isolating services from the network
  • Service-private /tmp
  • Making directories appear read-only or inaccessible to services
  • Taking away capabilities from services
  • Disallowing forking, limiting file creation for services
  • Controlling device node access of services

All options described here are documented in systemd's man pages, notably systemd.exec(5). Please consult these man pages for further details.

All these options are available on all systemd systems, regardless if SELinux or any other MAC is enabled, or not.

All these options are relatively cheap, so if in doubt use them. Even if you might think that your service doesn't write to /tmp and hence enabling PrivateTmp=yes (as described below) might not be necessary, due to today's complex software it's still beneficial to enable this feature, simply because libraries you link to (and plug-ins to those libraries) which you do not control might need temporary files after all. Example: you never know what kind of NSS module your local installation has enabled, and what that NSS module does with /tmp.

These options are hopefully interesting both for administrators to secure their local systems, and for upstream developers to ship their services secure by default. We strongly encourage upstream developers to consider using these options by default in their upstream service units. They are very easy to make use of and have major benefits for security.

Isolating Services from the Network

A very simple but powerful configuration option you may use in systemd service definitions is PrivateNetwork=:

...
[Service]
ExecStart=...
PrivateNetwork=yes
...

With this simple switch a service and all the processes it consists of are entirely disconnected from any kind of networking. Network interfaces became unavailable to the processes, the only one they'll see is the loopback device "lo", but it is isolated from the real host loopback. This is a very powerful protection from network attacks.

Caveat: Some services require the network to be operational. Of course, nobody would consider using PrivateNetwork=yes on a network-facing service such as Apache. However even for non-network-facing services network support might be necessary and not always obvious. Example: if the local system is configured for an LDAP-based user database doing glibc name lookups with calls such as getpwnam() might end up resulting in network access. That said, even in those cases it is more often than not OK to use PrivateNetwork=yes since user IDs of system service users are required to be resolvable even without any network around. That means as long as the only user IDs your service needs to resolve are below the magic 1000 boundary using PrivateNetwork=yes should be OK.

Internally, this feature makes use of network namespaces of the kernel. If enabled a new network namespace is opened and only the loopback device configured in it.

Service-Private /tmp

Another very simple but powerful configuration switch is PrivateTmp=:

...
[Service]
ExecStart=...
PrivateTmp=yes
...

If enabled this option will ensure that the /tmp directory the service will see is private and isolated from the host system's /tmp. /tmp traditionally has been a shared space for all local services and users. Over the years it has been a major source of security problems for a multitude of services. Symlink attacks and DoS vulnerabilities due to guessable /tmp temporary files are common. By isolating the service's /tmp from the rest of the host, such vulnerabilities become moot.

For Fedora 17 a feature has been accepted in order to enable this option across a large number of services.

Caveat: Some services actually misuse /tmp as a location for IPC sockets and other communication primitives, even though this is almost always a vulnerability (simply because if you use it for communication you need guessable names, and guessable names make your code vulnerable to DoS and symlink attacks) and /run is the much safer replacement for this, simply because it is not a location writable to unprivileged processes. For example, X11 places it's communication sockets below /tmp (which is actually secure -- though still not ideal -- in this exception since it does so in a safe subdirectory which is created at early boot.) Services which need to communicate via such communication primitives in /tmp are no candidates for PrivateTmp=. Thankfully these days only very few services misusing /tmp like this remain.

Internally, this feature makes use of file system namespaces of the kernel. If enabled a new file system namespace is opened inheritng most of the host hierarchy with the exception of /tmp.

Making Directories Appear Read-Only or Inaccessible to Services

With the ReadOnlyDirectories= and InaccessibleDirectories= options it is possible to make the specified directories inaccessible for writing resp. both reading and writing to the service:

...
[Service]
ExecStart=...
InaccessibleDirectories=/home
ReadOnlyDirectories=/var
...

With these two configuration lines the whole tree below /home becomes inaccessible to the service (i.e. the directory will appear empty and with 000 access mode), and the tree below /var becomes read-only.

Caveat: Note that ReadOnlyDirectories= currently is not recursively applied to submounts of the specified directories (i.e. mounts below /var in the example above stay writable). This is likely to get fixed soon.

Internally, this is also implemented based on file system namspaces.

Taking Away Capabilities From Services

Another very powerful security option in systemd is CapabilityBoundingSet= which allows to limit in a relatively fine grained fashion which kernel capabilities a service started retains:

...
[Service]
ExecStart=...
CapabilityBoundingSet=CAP_CHOWN CAP_KILL
...

In the example above only the CAP_CHOWN and CAP_KILL capabilities are retained by the service, and the service and any processes it might create have no chance to ever acquire any other capabilities again, not even via setuid binaries. The list of currently defined capabilities is available in capabilities(7). Unfortunately some of the defined capabilities are overly generic (such as CAP_SYS_ADMIN), however they are still a very useful tool, in particular for services that otherwise run with full root privileges.

To identify precisely which capabilities are necessary for a service to run cleanly is not always easy and requires a bit of testing. To simplify this process a bit, it is possible to blacklist certain capabilities that are definitely not needed instead of whitelisting all that might be needed. Example: the CAP_SYS_PTRACE is a particularly powerful and security relevant capability needed for the implementation of debuggers, since it allows introspecting and manipulating any local process on the system. A service like Apache obviously has no business in being a debugger for other processes, hence it is safe to remove the capability from it:

...
[Service]
ExecStart=...
CapabilityBoundingSet=~CAP_SYS_PTRACE
...

The ~ character the value assignment here is prefixed with inverts the meaning of the option: instead of listing all capabalities the service will retain you may list the ones it will not retain.

Caveat: Some services might react confused if certain capabilities are made unavailable to them. Thus when determining the right set of capabilities to keep around you need to do this carefully, and it might be a good idea to talk to the upstream maintainers since they should know best which operations a service might need to run successfully.

Caveat 2: Capabilities are not a magic wand. You probably want to combine them and use them in conjunction with other security options in order to make them truly useful.

To easily check which processes on your system retain which capabilities use the pscap tool from the libcap-ng-utils package.

Making use of systemd's CapabilityBoundingSet= option is often a simple, discoverable and cheap replacement for patching all system daemons individually to control the capability bounding set on their own.

Disallowing Forking, Limiting File Creation for Services

Resource Limits may be used to apply certain security limits on services being run. Primarily, resource limits are useful for resource control (as the name suggests...) not so much access control. However, two of them can be useful to disable certain OS features: RLIMIT_NPROC and RLIMIT_FSIZE may be used to disable forking and disable writing of any files with a size > 0:

...
[Service]
ExecStart=...
LimitNPROC=1
LimitFSIZE=0
...

Note that this will work only if the service in question drops privileges and runs under a (non-root) user ID of its own or drops the CAP_SYS_RESOURCE capability, for example via CapabilityBoundingSet= as discussed above. Without that a process could simply increase the resource limit again thus voiding any effect.

Caveat: LimitFSIZE= is pretty brutal. If the service attempts to write a file with a size > 0, it will immeidately be killed with the SIGXFSZ which unless caught terminates the process. Also, creating files with size 0 is still allowed, even if this option is used.

For more information on these and other resource limits, see setrlimit(2).

Controlling Device Node Access of Services

Devices nodes are an important interface to the kernel and its drivers. Since drivers tend to get much less testing and security checking than the core kernel they often are a major entry point for security hacks. systemd allows you to control access to devices individually for each service:

...
[Service]
ExecStart=...
DeviceAllow=/dev/null rw
...

This will limit access to /dev/null and only this device node, disallowing access to any other device nodes.

The feature is implemented on top of the devices cgroup controller.

Other Options

Besides the easy to use options above there are a number of other security relevant options available. However they usually require a bit of preparation in the service itself and hence are probably primarily useful for upstream developers. These options are RootDirectory= (to set up chroot() environments for a service) as well as User= and Group= to drop privileges to the specified user and group. These options are particularly useful to greatly simplify writing daemons, where all the complexities of securely dropping privileges can be left to systemd, and kept out of the daemons themselves.

If you are wondering why these options are not enabled by default: some of them simply break seamntics of traditional Unix, and to maintain compatibility we cannot enable them by default. e.g. since traditional Unix enforced that /tmp was a shared namespace, and processes could use it for IPC we cannot just go and turn that off globally, just because /tmp's role in IPC is now replaced by /run.

And that's it for now. If you are working on unit files for upstream or in your distribution, please consider using one or more of the options listed above. If you service is secure by default by taking advantage of these options this will help not only your users but also make the Internet a safer place.

posted at: 02:26 | path: /projects | permanent link to this entry | 8 comments


Mon, 16 Jan 2012

PulseAudio vs. AudioFlinger

Arun put an awesome article up, detailing how PulseAudio compares to Android's AudioFlinger in terms of power consumption and suchlike. Suffice to say, PulseAudio rocks, but go and read the whole thing, it's worth it.

Apparently, AudioFlinger is a great choice if you want to shorten your battery life.

posted at: 16:31 | path: /projects | permanent link to this entry | 2 comments


Fri, 18 Nov 2011

Introducing the Journal

In the past weeks we have been working on a major new addition to systemd that will hopefully positively change the Linux ecosystem in a number of ways. But see for yourself, check out the full explanation on what we have implemented on the design document we put up on Google Docs.

posted at: 16:28 | path: /projects | permanent link to this entry | 25 comments


Mon, 07 Nov 2011

Kernel Hackers Panel

At LinuxCon Europe/ELCE I had the chance to moderate the kernel hackers panel with Linus Torvalds, Alan Cox, Paul McKenney and Thomas Gleixner on stage. I like to believe it went quite well, but check it out for yourself, as a video recording is now available online:

For me personally I think the most notable topic covered was Control Groups, and the clarification that they are something that is needed even though their implementation right now is in many ways less than perfect. But in the end there is no reasonable way around it, and much like SMP, technology that complicates things substantially but is ultimately unavoidable.

Other videos from ELCE are online now, too.

posted at: 16:53 | path: /projects | permanent link to this entry | 3 comments


Tue, 01 Nov 2011

libabc

At the Kernel Summit in Prague last week Kay Sievers and I lead a session on developing shared userspace libraries, for kernel hackers. More and more userspace interfaces of the kernel (for example many which deal with storage, audio, resource management, security, file systems or a number of other subsystems) nowadays rely on a dedicated userspace component. As people who work primarily in the plumbing layer of the Linux OS we noticed over and over again that these libraries written by people who usually are at home on the kernel side of things make the same mistakes repeatedly, thus making life for the users of the libraries unnecessarily difficult. In our session we tried to point out a number of these things, and in particular places where the usual kernel hacking style translates badly into userspace shared library hacking. Our hope is that maybe a few kernel developers have a look at our list of recommendations and consider the points we are raising.

To make things easy we have put together an example skeleton library we dubbed libabc, whose README file includes all our points in terse form. It's available on kernel.org:

The git repository and the README.

This list of recommendations draws inspiration from David Zeuthen's and Ulrich Drepper's well known papers on the topic of writing shared libraries. In the README linked above we try to distill this wealth of information into a terse list of recommendations, with a couple of additions and with a strict focus on a kernel hacker background.

Please have a look, and even if you are not a kernel hacker there might be something useful to know in it, especially if you work on the lower layers of our stack.

If you have any questions or additions, just ping us, or comment below!

posted at: 01:46 | path: /projects | permanent link to this entry | 45 comments


Sun, 23 Oct 2011

Prague

If you make it to Prague the coming week for the LinuxCon/ELCE/GStreamer/Kernel Summit/... superconference, make sure not to miss:

All of that at the Clarion Hotel. See you in Prague!

posted at: 01:31 | path: /projects | permanent link to this entry | 0 comments


Thu, 20 Oct 2011

Plumbers Wishlist, The Second Edition

Two weeks ago we published a Plumber's Wishlist for Linux. So far, this has already created lively discussions in the community (as reported on LWN among others), and patches for a few of the items listed have already been posted (thanks a lot to those who worked on this, your contributions are much appreciated!).

We have now prepared a second version of the wish list. It includes a number of additions (tmpfs quota! hostname change notifications! and more!) and updates to the previous items, including links to patches, and references to other interesting material.

We hope to update this wishlist from time, so stay tuned!

And now, go and read the new wishlist!

posted at: 20:41 | path: /projects | permanent link to this entry | 3 comments


Mon, 17 Oct 2011

Google doesn't like my name

Nice one, Google suspended my Google+ account because I created it under, well, my name, which is "Lennart Poettering", and Google+ thinks that wasn't my name, even though it says so in my passport, and almost every document I own and I was never aware I had any other name. This is ricidulous. Google, give me my name back! This is a really uncool move.

posted at: 18:50 | path: /projects | permanent link to this entry | 18 comments


Your Questions for the Kernel Developer Panel at LinuxCon in Prague

I am currently collecting questions for the kernel developer panel at LinuxCon in Prague. If there's something you'd like the panelists to respond to, please post it on the thread, and I'll see what I can do. Thank you!

posted at: 15:38 | path: /projects | permanent link to this entry | comments

Fri, 14 Oct 2011

A Big Loss

Google announced today that they'll be shutting down Google Code Search in January. I am quite sure that this would be a massive loss for the Free Software community. The ability to learn from other people's code is a key idea of Free Software. There's simply no better way to do that than with a source code search engine. The day Google Code Search will be shut down will be a sad day for the Free Software community.

Of course, there are a couple of alternatives around, but they all have one thing in common: they, uh, don't even remotely compare to the completeness, performance and simplicity of the Google Code Search interface, and have serious usability issues. (For example: koders.com is really really slow, and splits up identifiers you search for at underscores, which kinda makes it useless for looking for almost any kind of code.)

I think it must be of genuine interest to the Free Software community to have a capable replacement for Google Code Search, for the day it is turned off. In fact, it probably should be something the various foundations which promote Free Software should be looking into, like the FSF or the Linux Foundation. There are very few better ways to get Free Software into the heads and minds of engineers than by examples -- examples consisting of real life code they can find with a source code search engine. I believe a source code search engine is probably among the best vehicles to promote Free Software towards engineers. In particular if it itself was Free Software (in contrast to Google Code Search).

Ideally, all software available on web sites like SourceForge, Freshmeat, or github should be indexed. But there's also a chance for distributions here: indexing the sources of all packages a distribution like Debian or Fedora include would be a great tool for developers. In fact, a distribution offering this functionality might benefit from such functionality, as it attracts developer interest in the distribution.

It's sad that Google Code Search will be gone soon. But maybe there's something positive in the bad news here, and a chance to create something better, more comprehensive, that is free, and promotes our ideals better than Google ever could. Maybe there's a chance here for the Open Source foundations, for the distributions and for the communities to create a better replacement!

posted at: 23:05 | path: /projects | permanent link to this entry | 16 comments


Fri, 07 Oct 2011

A Plumber's Wish List for Linux

Here's a mail we just sent to LKML, for your consideration. Enjoy:

Subject: A Plumber’s Wish List for Linux

We’d like to share our current wish list of plumbing layer features we
are hoping to see implemented in the near future in the Linux kernel and
associated tools. Some items we can implement on our own, others are not
our area of expertise, and we will need help getting them implemented.

Acknowledging that this wish list of ours only gets longer and not
shorter, even though we have implemented a number of other features on
our own in the previous years, we are posting this list here, in the
hope to find some help.

If you happen to be interested in working on something from this list or
able to help out, we’d be delighted. Please ping us in case you need
clarifications or more information on specific items.

Thanks,
Kay, Lennart, Harald, in the name of all the other plumbers


An here’s the wish list, in no particular order:

* (ioctl based?) interface to query and modify the label of a mounted
FAT volume:
A FAT labels is implemented as a hidden directory entry in the file
system which need to be renamed when changing the file system label,
this is impossible to do from userspace without unmounting. Hence we’d
like to see a kernel interface that is available on the mounted file
system mount point itself. Of course, bonus points if this new interface
can be implemented for other file systems as well, and also covers fs
UUIDs in addition to labels.

* CPU modaliases in /sys/devices/system/cpu/cpuX/modalias:
useful to allow module auto-loading of e.g. cpufreq drivers and KVM
modules. Andy Kleen has a patch to create the alias file itself. CPU
‘struct sysdev’ needs to be converted to ‘struct device’ and a ‘struct
bus_type cpu’ needs to be introduced to allow proper CPU coldplug event
replay at bootup. This is one of the last remaining places where
automatic hardware-triggered module auto-loading is not available. And
we’d like to see that fix to make numerous ugly userspace work-arounds
to achieve the same go away.

* expose CAP_LAST_CAP somehow in the running kernel at runtime:
Userspace needs to know the highest valid capability of the running
kernel, which right now cannot reliably be retrieved from header files
only. The fact that this value cannot be detected properly right now
creates various problems for libraries compiled on newer header files
which are run on older kernels. They assume capabilities are available
which actually aren’t. Specifically, libcap-ng claims that all running
processes retain the higher capabilities in this case due to the
“inverted” semantics of CapBnd in /proc/$PID/status.

* export ‘struct device_type fb/fbcon’ of ‘struct class graphics’
Userspace wants to easily distinguish ‘fb’ and ‘fbcon’ from each other
without the need to match on the device name.

* allow changing argv[] of a process without mucking with environ[]:
Something like setproctitle() or a prctl() would be ideal. Of course it
is questionable if services like sendmail make use of this, but otoh for
services which fork but do not immediately exec() another binary being
able to rename this child processes in ps is of importance.

* module-init-tools: provide a proper libmodprobe.so from
module-init-tools:
Early boot tools, installers, driver install disks want to access
information about available modules to optimize bootup handling.

* fork throttling mechanism as basic cgroup functionality that is
available in all hierarchies independent of the controllers used:
This is important to implement race-free killing of all members of a
cgroup, so that cgroup member processes cannot fork faster then a cgroup
supervisor process could kill them. This needs to be recursive, so that
not only a cgroup but all its subgroups are covered as well.

* proper cgroup-is-empty notification interface:
The current call_usermodehelper() interface is an unefficient and an
ugly hack. Tools would prefer anything more lightweight like a netlink,
poll() or fanotify interface.

* allow user xattrs to be set on files in the cgroupfs (and maybe
procfs?)

* simple, reliable and future-proof way to detect whether a specific pid
is running in a CLONE_NEWPID container, i.e. not in the root PID
namespace. Currently, there are available a few ugly hacks to detect
this (for example a process wanting to know whether it is running in a
PID namespace could just look for a PID 2 being around and named
kthreadd which is a kernel thread only visible in the root namespace),
however all these solutions encode information and expectations that
better shouldn’t be encoded in a namespace test like this. This
functionality is needed in particular since the removal of the the ns
cgroup controller which provided the namespace membership information to
user code.

* allow making use of the “cpu” cgroup controller by default without
breaking RT. Right now creating a cgroup in the “cpu” hierarchy that
shall be able to take advantage of RT is impossible for the generic case
since it needs an RT budget configured which is from a limited resource
pool. What we want is the ability to create cgroups in “cpu” whose
processes get an non-RT weight applied, but for RT take advantage of the
parent’s RT budget. We want the separation of RT and non-RT budget
assignment in the “cpu” hierarchy, because right now, you lose RT
functionality in it unless you assign an RT budget. This issue severely
limits the usefulness of “cpu” hierarchy on general purpose systems
right now.

* Add a timerslack cgroup controller, to allow increasing the timer
slack of user session cgroups when the machine is idle.

* An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or
something like that), i.e. a way to attach sender cgroup membership to
messages sent via AF_UNIX. This is useful in case services such as
syslog shall be shared among various containers (or service cgroups),
and the syslog implementation needs to be able to distinguish the
sending cgroup in order to separate the logs on disk. Of course stm
SCM_CREDENTIALS can be used to look up the PID of the sender followed by
a check in /proc/$PID/cgroup, but that is necessarily racy, and actually
a very real race in real life.

* SCM_COMM, with a similar use case as SCM_CGROUPS. This auxiliary
control message should carry the process name as available
in /proc/$PID/comm.

posted at: 01:22 | path: /projects | permanent link to this entry | 11 comments


Thu, 06 Oct 2011

What You Need to Know When Becoming a Free Software Hacker

Earlier today I gave a presentation at the Technical University Berlin about things you need to know, things you should expect and things you shouldn't expect when your are aspiring to become a successful Free Software Hacker.

I have put my slides up on Google Docs in case you are interested, either because you are the target audience (i.e. a university student) or because you need inspiration for a similar talk about the same topic.

The first two slides are in German language, so skip over them. The interesting bits are all in English. I hope it's quite comprehensive (though of course terse). Enjoy:

In case your feed reader/planet messes this up, here's the non-embedded version.

Oh, and thanks to everybody who reviewed and suggested additions to the the slides on +.

posted at: 22:05 | path: /projects | permanent link to this entry | 5 comments


Tue, 27 Sep 2011

PulseAudio 1.0

PulseAudio 1.0 is out now. It's awesome. Get it while it is hot!

I'd like to thank Colin Guthrie and Arun Raghavan (and all the others involved) for getting this release out of the door!

posted at: 16:07 | path: /projects | permanent link to this entry | comments

Mon, 26 Sep 2011

systemd for Administrators, Part XI

Here's the eleventh installment of my ongoing series on systemd for Administrators:

Converting inetd Services

In a previous episode of this series I covered how to convert a SysV init script to a systemd unit file. In this story I hope to explain how to convert inetd services into systemd units.

Let's start with a bit of background. inetd has a long tradition as one of the classic Unix services. As a superserver it listens on an Internet socket on behalf of another service and then activate that service on an incoming connection, thus implementing an on-demand socket activation system. This allowed Unix machines with limited resources to provide a large variety of services, without the need to run processes and invest resources for all of them all of the time. Over the years a number of independent implementations of inetd have been shipped on Linux distributions. The most prominent being the ones based on BSD inetd and xinetd. While inetd used to be installed on most distributions by default, it nowadays is used only for very few selected services and the common services are all run unconditionally at boot, primarily for (perceived) performance reasons.

One of the core feature of systemd (and Apple's launchd for the matter) is socket activation, a scheme pioneered by inetd, however back then with a different focus. Systemd-style socket activation focusses on local sockets (AF_UNIX), not so much Internet sockets (AF_INET), even though both are supported. And more importantly even, socket activation in systemd is not primarily about the on-demand aspect that was key in inetd, but more on increasing parallelization (socket activation allows starting clients and servers of the socket at the same time), simplicity (since the need to configure explicit dependencies between services is removed) and robustness (since services can be restarted or may crash without loss of connectivity of the socket). However, systemd can also activate services on-demand when connections are incoming, if configured that way.

Socket activation of any kind requires support in the services themselves. systemd provides a very simple interface that services may implement to provide socket activation, built around sd_listen_fds(). As such it is already a very minimal, simple scheme. However, the traditional inetd interface is even simpler. It allows passing only a single socket to the activated service: the socket fd is simply duplicated to STDIN and STDOUT of the process spawned, and that's already it. In order to provide compatibility systemd optionally offers the same interface to processes, thus taking advantage of the many services that already support inetd-style socket activation, but not yet systemd's native activation.

Before we continue with a concrete example, let's have a look at three different schemes to make use of socket activation:

  1. Socket activation for parallelization, simplicity, robustness: sockets are bound during early boot and a singleton service instance to serve all client requests is immediately started at boot. This is useful for all services that are very likely used frequently and continously, and hence starting them early and in parallel with the rest of the system is advisable. Examples: D-Bus, Syslog.
  2. On-demand socket activation for singleton services: sockets are bound during early boot and a singleton service instance is executed on incoming traffic. This is useful for services that are seldom used, where it is advisable to save the resources and time at boot and delay activation until they are actually needed. Example: CUPS.
  3. On-demand socket activation for per-connection service instances: sockets are bound during early boot and for each incoming connection a new service instance is instantiated and the connection socket (and not the listening one) is passed to it. This is useful for services that are seldom used, and where performance is not critical, i.e. where the cost of spawning a new service process for each incoming connection is limited. Example: SSH.

The three schemes provide different performance characteristics. After the service finishes starting up the performance provided by the first two schemes is identical to a stand-alone service (i.e. one that is started without a super-server, without socket activation), since the listening socket is passed to the actual service, and code paths from then on are identical to those of a stand-alone service and all connections are processes exactly the same way as they are in a stand-alone service. On the other hand, performance of the third scheme is usually not as good: since for each connection a new service needs to be started the resource cost is much higher. However, it also has a number of advantages: for example client connections are better isolated and it is easier to develop services activated this way.

For systemd primarily the first scheme is in focus, however the other two schemes are supported as well. (In fact, the blog story I covered the necessary code changes for systemd-style socket activation in was about a service of the second type, i.e. CUPS). inetd primarily focusses on the third scheme, however the second scheme is supported too. (The first one isn't. Presumably due the focus on the third scheme inetd got its -- a bit unfair -- reputation for being "slow".)

So much about the background, let's cut to the beef now and show an inetd service can be integrated into systemd's socket activation. We'll focus on SSH, a very common service that is widely installed and used but on the vast majority of machines probably not started more often than 1/h in average (and usually even much less). SSH has supported inetd-style activation since a long time, following the third scheme mentioned above. Since it is started only every now and then and only with a limited number of connections at the same time it is a very good candidate for this scheme as the extra resource cost is negligble: if made socket-activatable SSH is basically free as long as nobody uses it. And as soon as somebody logs in via SSH it will be started and the moment he or she disconnects all its resources are freed again. Let's find out how to make SSH socket-activatable in systemd taking advantage of the provided inetd compatibility!

Here's the configuration line used to hook up SSH with classic inetd:

ssh stream tcp nowait root /usr/sbin/sshd sshd -i

And the same as xinetd configuration fragment:

service ssh {
        socket_type = stream
        protocol = tcp
        wait = no
        user = root
        server = /usr/sbin/sshd
        server_args = -i
}

Most of this should be fairly easy to understand, as these two fragments express very much the same information. The non-obvious parts: the port number (22) is not configured in inetd configuration, but indirectly via the service database in /etc/services: the service name is used as lookup key in that database and translated to a port number. This indirection via /etc/services has been part of Unix tradition though has been getting more and more out of fashion, and the newer xinetd hence optionally allows configuration with explicit port numbers. The most interesting setting here is the not very intuitively named nowait (resp. wait=no) option. It configures whether a service is of the second (wait) resp. third (nowait) scheme mentioned above. Finally the -i switch is used to enabled inetd mode in SSH.

The systemd translation of these configuration fragments are the following two units. First: sshd.socket is a unit encapsulating information about a socket to listen on:

[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=22
Accept=yes

[Install]
WantedBy=sockets.target

Most of this should be self-explanatory. A few notes: Accept=yes corresponds to nowait. It's hopefully better named, referring to the fact that for nowait the superserver calls accept() on the listening socket, where for wait this is the job of the executed service process. WantedBy=sockets.target is used to ensure that when enabled this unit is activated at boot at the right time.

And here's the matching service file sshd@.service:

[Unit]
Description=SSH Per-Connection Server

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket

This too should be mostly self-explanatory. Interesting is StandardInput=socket, the option that enables inetd compatibility for this service. StandardInput= may be used to configure what STDIN of the service should be connected for this service (see the man page for details). By setting it to socket we make sure to pass the connection socket here, as expected in the simple inetd interface. Note that we do not need to explicitly configure StandardOutput= here, since by default the setting from StandardInput= is inherited if nothing else is configured. Important is the "-" in front of the binary name. This ensures that the exit status of the per-connection sshd process is forgotten by systemd. Normally, systemd will store the exit status of a all service instances that die abnormally. SSH will sometimes die abnormally with an exit code of 1 or similar, and we want to make sure that this doesn't cause systemd to keep around information for numerous previous connections that died this way (until this information is forgotten with systemctl reset-failed).

sshd@.service is an instantiated service, as described in the preceeding installment of this series. For each incoming connection systemd will instantiate a new instance of sshd@.service, with the instance identifier named after the connection credentials.

You may wonder why in systemd configuration of an inetd service requires two unit files instead of one. The reason for this is that to simplify things we want to make sure that the relation between live units and unit files is obvious, while at the same time we can order the socket unit and the service units independently in the dependency graph and control the units as independently as possible. (Think: this allows you to shutdown the socket independently from the instances, and each instance individually.)

Now, let's see how this works in real life. If we drop these files into /etc/systemd/system we are ready to enable the socket and start it:

# systemctl enable sshd.socket
ln -s '/etc/systemd/system/sshd.socket' '/etc/systemd/system/sockets.target.wants/sshd.socket'
# systemctl start sshd.socket
# systemctl status sshd.socket
sshd.socket - SSH Socket for Per-Connection Servers
	  Loaded: loaded (/etc/systemd/system/sshd.socket; enabled)
	  Active: active (listening) since Mon, 26 Sep 2011 20:24:31 +0200; 14s ago
	Accepted: 0; Connected: 0
	  CGroup: name=systemd:/system/sshd.socket

This shows that the socket is listening, and so far no connections have been made (Accepted: will show you how many connections have been made in total since the socket was started, Connected: how many connections are currently active.)

Now, let's connect to this from two different hosts, and see which services are now active:

$ systemctl --full | grep ssh
sshd@172.31.0.52:22-172.31.0.4:47779.service  loaded active running       SSH Per-Connection Server
sshd@172.31.0.52:22-172.31.0.54:52985.service loaded active running       SSH Per-Connection Server
sshd.socket                                   loaded active listening     SSH Socket for Per-Connection Servers

As expected, there are now two service instances running, for the two connections, and they are named after the source and destination address of the TCP connection as well as the port numbers. (For AF_UNIX sockets the instance identifier will carry the PID and UID of the connecting client.) This allows us to invidiually introspect or kill specific sshd instances, in case you want to terminate the session of a specific client:

# systemctl kill sshd@172.31.0.52:22-172.31.0.4:47779.service

And that's probably already most of what you need to know for hooking up inetd services with systemd and how to use them afterwards.

In the case of SSH it is probably a good suggestion for most distributions in order to save resources to default to this kind of inetd-style socket activation, but provide a stand-alone unit file to sshd as well which can be enabled optionally. I'll soon file a wishlist bug about this against our SSH package in Fedora.

A few final notes on how xinetd and systemd compare feature-wise, and whether xinetd is fully obsoleted by systemd. The short answer here is that systemd does not provide the full xinetd feature set and that is does not fully obsolete xinetd. The longer answer is a bit more complex: if you look at the multitude of options xinetd provides you'll notice that systemd does not compare. For example, systemd does not come with built-in echo, time, daytime or discard servers, and never will include those. TCPMUX is not supported, and neither are RPC services. However, you will also find that most of these are either irrelevant on today's Internet or became other way out-of-fashion. The vast majority of inetd services do not directly take advantage of these additional features. In fact, none of the xinetd services shipped on Fedora make use of these options. That said, there are a couple of useful features that systemd does not support, for example IP ACL management. However, most administrators will probably agree that firewalls are the better solution for these kinds of problems and on top of that, systemd supports ACL management via tcpwrap for those who indulge in retro technologies like this. On the other hand systemd also provides numerous features xinetd does not provide, starting with the individual control of instances shown above, or the more expressive configurability of the execution context for the instances. I believe that what systemd provides is quite comprehensive, comes with little legacy cruft but should provide you with everything you need. And if there's something systemd does not cover, xinetd will always be there to fill the void as you can easily run it in conjunction with systemd. For the majority of uses systemd should cover what is necessary, and allows you cut down on the required components to build your system from. In a way, systemd brings back the functionality of classic Unix inetd and turns it again into a center piece of a Linux system.

And that's all for now. Thanks for reading this long piece. And now, get going and convert your services over! Even better, do this work in the individual packages upstream or in your distribution!

posted at: 20:46 | path: /projects | permanent link to this entry | 23 comments


systemd for Administrators, Part X

Here's the tenth installment of my ongoing series on systemd for Administrators:

Instantiated Services

Most services on Linux/Unix are singleton services: there's usually only one instance of Syslog, Postfix, or Apache running on a specific system at the same time. On the other hand some select services may run in multiple instances on the same host. For example, an Internet service like the Dovecot IMAP service could run in multiple instances on different IP ports or different local IP addresses. A more common example that exists on all installations is getty, the mini service that runs once for each TTY and presents a login prompt on it. On most systems this service is instantiated once for each of the first six virtual consoles tty1 to tty6. On some servers depending on administrator configuration or boot-time parameters an additional getty is instantiated for a serial or virtualizer console. Another common instantiated service in the systemd world is fsck, the file system checker that is instantiated once for each block device that needs to be checked. Finally, in systemd socket activated per-connection services (think classic inetd!) are also implemented via instantiated services: a new instance is created for each incoming connection. In this installment I hope to explain a bit how systemd implements instantiated services and how to take advantage of them as an administrator.

If you followed the previous episodes of this series you are probably aware that services in systemd are named according to the pattern foobar.service, where foobar is an identification string for the service, and .service simply a fixed suffix that is identical for all service units. The definition files for these services are searched for in /etc/systemd/system and /lib/systemd/system (and possibly other directories) under this name. For instantiated services this pattern is extended a bit: the service name becomes foobar@quux.service where foobar is the common service identifier, and quux the instance identifier. Example: serial-getty@ttyS2.service is the serial getty service instantiated for ttyS2.

Service instances can be created dynamically as needed. Without further configuration you may easily start a new getty on a serial port simply by invoking a systemctl start command for the new instance:

# systemctl start serial-getty@ttyUSB0.service

If a command like the above is run systemd will first look for a unit configuration file by the exact name you requested. If this service file is not found (and usually it isn't if you use instantiated services like this) then the instance id is removed from the name and a unit configuration file by the resulting template name searched. In other words, in the above example, if the precise serial-getty@ttyUSB0.service unit file cannot be found, serial-getty@.service is loaded instead. This unit template file will hence be common for all instances of this service. For the serial getty we ship a template unit file in systemd (/lib/systemd/system/serial-getty@.service) that looks something like this:

[Unit]
Description=Serial Getty on %I
BindTo=dev-%i.device
After=dev-%i.device systemd-user-sessions.service

[Service]
ExecStart=-/sbin/agetty -s %I 115200,38400,9600
Restart=always
RestartSec=0

(Note that the unit template file we actually ship along with systemd for the serial gettys is a bit longer. If you are interested, have a look at the actual file which includes additional directives for compatibility with SysV, to clear the screen and remove previous users from the TTY device. To keep things simple I have shortened the unit file to the relevant lines here.)

This file looks mostly like any other unit file, with one distinction: the specifiers %I and %i are used at multiple locations. At unit load time %I and %i are replaced by systemd with the instance identifier of the service. In our example above, if a service is instantiated as serial-getty@ttyUSB0.service the specifiers %I and %i will be replaced by ttyUSB0. If you introspect the instanciated unit with systemctl status serial-getty@ttyUSB0.service you will see these replacements having taken place:

$ systemctl status serial-getty@ttyUSB0.service
serial-getty@ttyUSB0.service - Getty on ttyUSB0
	  Loaded: loaded (/lib/systemd/system/serial-getty@.service; static)
	  Active: active (running) since Mon, 26 Sep 2011 04:20:44 +0200; 2s ago
	Main PID: 5443 (agetty)
	  CGroup: name=systemd:/system/getty@.service/ttyUSB0
		  └ 5443 /sbin/agetty -s ttyUSB0 115200,38400,9600

And that is already the core idea of instantiated services in systemd. As you can see systemd provides a very simple templating system, which can be used to dynamically instantiate services as needed. To make effective use of this, a few more notes:

You may instantiate these services on-the-fly in .wants/ symbolic links in the file system. For example, to make sure the serial getty on ttyUSB0 is started automatically at every boot, create a symlink like this:

# ln -s /lib/systemd/system/serial-getty@.service /etc/systemd/system/getty.target.wants/serial-getty@ttyUSB0.service

systemd will instantiate the symlinked unit file with the instance name specified in the symlink name.

You cannot instantiate a unit template without specifying an instance identifier. In other words systemctl start serial-getty@.service will necessarily fail since the instance name was left unspecified.

Sometimes it is useful to opt-out of the generic template for one specific instance. For these cases make use of the fact that systemd always searches first for the full instance file name before falling back to the template file name: make sure to place a unit file under the fully instantiated name in /etc/systemd/system and it will override the generic templated version for this specific instance.

The unit file shown above uses %i at some places and %I at others. You may wonder what the difference between these specifiers are. %i is replaced by the exact characters of the instance identifier. For %I on the other hand the instance identifier is first passed through a simple unescaping algorithm. In the case of a simple instance identifier like ttyUSB0 there is no effective difference. However, if the device name includes one or more slashes ("/") this cannot be part of a unit name (or Unix file name). Before such a device name can be used as instance identifier it needs to be escaped so that "/" becomes "-" and most other special characters (including "-") are replaced by "\xAB" where AB is the ASCII code of the character in hexadecimal notation[1]. Example: to refer to a USB serial port by its bus path we want to use a port name like serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0. The escaped version of this name is serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0. %I will then refer to former, %i to the latter. Effectively this means %i is useful wherever it is necessary to refer to other units, for example to express additional dependencies. On the other hand %I is useful for usage in command lines, or inclusion in pretty description strings. Let's check how this looks with the above unit file:

# systemctl start 'serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service'
# systemctl status 'serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service'
serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service - Serial Getty on serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0
	  Loaded: loaded (/lib/systemd/system/serial-getty@.service; static)
	  Active: active (running) since Mon, 26 Sep 2011 05:08:52 +0200; 1s ago
	Main PID: 5788 (agetty)
	  CGroup: name=systemd:/system/serial-getty@.service/serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0
		  └ 5788 /sbin/agetty -s serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0 115200 38400 9600

As we can see the while the instance identifier is the escaped string the command line and the description string actually use the unescaped version, as expected.

(Side note: there are more specifiers available than just %i and %I, and many of them are actually available in all unit files, not just templates for service instances. For more details see the man page which includes a full list and terse explanations.)

And at this point this shall be all for now. Stay tuned for a follow-up article on how instantiated services are used for inetd-style socket activation.

Footnotes

[1] Yupp, this escaping algorithm doesn't really result in particularly pretty escaped strings, but then again, most escaping algorithms don't help readability. The algorithm we used here is inspired by what udev does in a similar case, with one change. In the end, we had to pick something. If you'll plan to comment on the escaping algorithm please also mention where you live so that I can come around and paint your bike shed yellow with blue stripes. Thanks!

posted at: 05:11 | path: /projects | permanent link to this entry | 11 comments


Sat, 17 Sep 2011

Boot/Init LPC MC Summary at LWN

Make sure to read the summary of the Boot & Init Microconf at the Linux Plumbers Conference 2011 In Santa Rosa, CA. It was a fantastic conference (at the social event we took busses from the appetizers to the mains...), and this summary should give you quite a good idea what we discussed there. Highly recommended read.

posted at: 17:56 | path: /projects | permanent link to this entry | 0 comments


Thu, 01 Sep 2011

systemd US Tour Dates

Kay Sievers, Harald Hoyer and I will tour the US in the next weeks. If you have any questions on systemd, udev or dracut (or any of the related technologies), then please do get in touch with us on the following occasions:

Linux Plumbers Conference, Santa Rosa, CA, Sep 7-9th
Google, Googleplex, Mountain View, CA, Sep 12th
Red Hat, Westford, MA, Sep 13-14th

As usual LPC is going to rock, so make sure to be there!

posted at: 15:37 | path: /projects | permanent link to this entry | 3 comments


Tue, 30 Aug 2011

How to Write syslog Daemons Which Cooperate Nicely With systemd

I just finished putting together a text on the systemd wiki explaining what to do to write a syslog service that is nicely integrated with systemd, and does all the right things. It's supposed to be a checklist for all syslog hackers:

Read it now.

rsyslog already implements everything on this list afaics, and that's pretty cool. If other implementations want to catch up, please consider following these recommendations, too.

I put this together since I have changed systemd 35 to set StandardOutput=syslog as default, so that all stdout/stderr of all services automatically ends up in syslog. And since that change requires some (minimal) changes to all syslog implementations I decided to document this all properly (if you are curious: they need to set StandardOutput=null to opt out of this default in order to avoid logging loops).

Anyway, please have a peek and comment if you spot a mistake or something I forgot. Or if you have questions, just ask.

posted at: 23:14 | path: /projects | permanent link to this entry | 8 comments


Fri, 19 Aug 2011

How to Behave Nicely in the cgroup Trees

The Linux cgroup hierarchies of the various kernel controllers are a shared resource. Recently many components of Linux userspace started making use of these hierarchies. In order to avoid that the various programs step on each others toes while manipulating this shared resource we have put together a list of recommendations. Programs following these guidelines should work together nicely without interfering with other users of the hierarchies.

These guidelines are available in the systemd wiki. I'd be very interested in feedback, and would like to ask you to ping me in case we forgot something or left something too vague.

And please, if you are writing software that interfaces with the cgroup tree consider following these recommendations. Thank you.

posted at: 16:25 | path: /projects | permanent link to this entry | 2 comments


Tue, 02 Aug 2011

The Desktop Summit Wiki Is Full Of Interesting Stuff

Just wanted to draw your attention to the Desktop Summit Wiki. If you are attending the Desktop Summit in Berlin you might find some interesting information in the Wiki.

Go to the main page of the Wiki here. You are welcome to edit and add additional information to the Wiki. To edit the Wiki authenticate with the same credentials you used to sign up for the conference at the Desktop Summit web site.

See you on friday!

posted at: 22:56 | path: /projects | permanent link to this entry | 1 comments


Mon, 01 Aug 2011

Desktop Summit Announcements, Part II

Read the first part of the announcements.

And now there are more exciting announcements:

Only 5 days are now left to beginning of the conference. The first event will already take place on Friday August 5th, at c-base at U/S Jannowitzbrücke, starting at 4pm. The conference programme itself will begin on Saturday August 6th, 10am (though do come earlier, for registration, if you didn't register at the c-base event already!). Note that the primary entrance to the Desktop Summit is in the north-eastern corner of the main building of Humboldt University. That's on Dorotheenstr./Hegelplatz, and not on Unter den Linden.

See you on Friday at c-base!

posted at: 18:08 | path: /projects | permanent link to this entry | 3 comments


Fri, 22 Jul 2011

Desktop Summit Announcements

In case you missed them, there have been a couple of exciting announcements around the Desktop Summit in Berlin, Germany.

And there will be more exciting announcements coming!

See you in 14 days! Oh, and if you still haven't registered, do so now. It's free, and if you don't register you might not get on the WLAN at the conference right-away.

posted at: 21:15 | path: /projects | permanent link to this entry | 0 comments


Mon, 18 Jul 2011

systemd for Administrators, Part IX

Here's the ninth installment of my ongoing series on systemd for Administrators:

On /etc/sysconfig and /etc/default

So, here's a bit of an opinion piece on the /etc/sysconfig/ and /etc/default directories that exist on the various distributions in one form or another, and why I believe their use should be faded out. Like everything I say on this blog what follows is just my personal opinion, and not the gospel and has nothing to do with the position of the Fedora project or my employer. The topic of /etc/sysconfig has been coming up in discussions over and over again. I hope with this blog story I can explain a bit what we as systemd upstream think about these files.

A few lines about the historical context: I wasn't around when /etc/sysconfig was introduced -- suffice to say it has been around on Red Hat and SUSE distributions since a long long time. Eventually /etc/default was introduced on Debian with very similar semantics. Many other distributions know a directory with similar semantics too, most of them call it either one or the other way. In fact, even other Unix-OSes sported a directory like this. (Such as SCO. If you are interested in the details, I am sure a Unix greybeard of your trust can fill in what I am leaving vague here.) So, even though a directory like this has been known widely on Linuxes and Unixes, it never has been standardized, neither in POSIX nor in LSB/FHS. These directories very much are something where distributions distuingish themselves from each other.

The semantics of /etc/default and /etc/sysconfig are very losely defined only. What almost all files stored in these directories have in common though is that they are sourcable shell scripts which primarily consist of environment variable assignments. Most of the files in these directories are sourced by the SysV init scripts of the same name. The Debian Policy Manual (9.3.2) and the Fedora Packaging Guidelines suggest this use of the directories, however both distributions also have files in them that do not follow this scheme, i.e. that do not have a matching SysV init script -- or not even are shell scripts at all.

Why have these files been introduced? On SysV systems services are started via init scripts in /etc/rc.d/init.d (or a similar directory). /etc/ is (these days) considered the place where system configuration is stored. Originally these init scripts were subject to customization by the administrator. But as they grew and become complex most distributions no longer considered them true configuration files, but more just a special kind of programs. To make customization easy and guarantee a safe upgrade path the customizable bits hence have been moved to separate configuration files, which the init scripts then source.

Let's have a quick look what kind of configuration you can do with these files. Here's a short incomprehensive list of various things that can be configured via environment settings in these source files I found browsing through the directories on a Fedora and a Debian machine:

  • Additional command line parameters for the daemon binaries
  • Locale settings for a daemon
  • Shutdown time-out for a daemon
  • Shutdown mode for a daemon
  • System configuration like system locale, time zone information, console keyboard
  • Redundant system configuration, like whether the RTC is in local timezone
  • Firewall configuration data, not in shell format (!)
  • CPU affinity for a daemon
  • Settings unrelated to boot, for example including information how to install a new kernel package, how to configure nspluginwrap or whether to do library prelinking
  • Whether a specific service should be started or not
  • Networking configuration
  • Which kernel modules to statically load
  • Whether to halt or power-off on shutdown
  • Access modes for device nodes (!)
  • A description string for the SysV service (!)
  • The user/group ID, umask to run specific daemons as
  • Resource limits to set for a specific daemon
  • OOM adjustment to set for a specific daemon

Now, let's go where the beef is: what's wrong with /etc/sysconfig (resp. /etc/default)? Why might it make sense to fade out use of these files in a systemd world?

  • For the majority of these files the reason for having them simply does not exist anymore: systemd unit files are not programs like SysV init scripts were. Unit files are simple, declarative descriptions, that usually do not consist of more than 6 lines or so. They can easily be generated, parsed without a Bourne interpreter and understood by the reader. Also, they are very easy to modify: just copy them from /lib/systemd/system to /etc/systemd/system and edit them there, where they will not be modified by the package manager. The need to separate code and configuration that was the original reason to introduce these files does not exist anymore, as systemd unit files do not include code. These files hence now are a solution looking for a problem that no longer exists.
  • They are inherently distribution-specific. With systemd we hope to encourage standardization between distributions. Part of this is that we want that unit files are supplied with upstream, and not just added by the packager -- how it has usually been done in the SysV world. Since the location of the directory and the available variables in the files is very different on each distribution, supporting /etc/sysconfig files in upstream unit files is not feasible. Configuration stored in these files works against de-balkanization of the Linux platform.
  • Many settings are fully redundant in a systemd world. For example, various services support configuration of the process credentials like the user/group ID, resource limits, CPU affinity or the OOM adjustment settings. However, these settings are supported only by some SysV init scripts, and often have different names if supported in multiple of them. OTOH in systemd, all these settings are available equally and uniformly for all services, with the same configuration option in unit files.
  • Unit files know a large number of easy-to-use process context settings, that are more comprehensive than what most /etc/sysconfig files offer.
  • A number of these settings are entirely questionnabe. For example, the aforementiond configuration option for the user/group ID a service runs as is primarily something the distributor has to take care of. There is little to win for administrators to change these settings, and only the distributor has the broad overview to make sure that UID/GID and name collisions do not happen.
  • The file format is not ideal. Since the files are usually sourced as shell scripts, parse errors are very hard to decypher and are not logged along the other configuration problems of the services. Generally, unknown variable assignments simply have no effect but this is not warned about. This makes these files harder to debug than necessary.
  • Configuration files sources from shell scripts are subject to the execution parameters of the interpreter, and it has many: settings like IFS or LANG tend to modify drastically how shell scripts are parsed and understood. This makes them fragile.
  • Interpretation of these files is slow, since it requires spawning of a shell, which adds at least one process for each service to be spawned at boot.
  • Often, files in /etc/sysconfig are used to "fake" configuration files for daemons which do not support configuration files natively. This is done by glueing together command line arguments from these variable assignments that are then passed to the daemon. In general proper, native configuration files in these daemons are the much prettier solution however. Command line options like "-k", "-a" or "-f" are not self-explanatory and have a very cryptic syntax. Moreover the same switches in many daemons have (due to the limited vocabulary) often very much contradicting effects. (On one daemon -f might cause the daemon to daemonize, while on another one this option turns exactly this behaviour off.) Command lines generally cannot include sensible comments which most configuration files however can.
  • A number of configuration settings in /etc/sysconfig are entirely redundant: for example, on many distributions it can be controlled via /etc/sysconfig files whether the RTC is in UTC or local time. Such an option already exists however in the 3rd line of the /etc/adjtime (which is known on all distributions). Adding a second, redundant, distribution-specific option overriding this is hence needless and complicates things for no benefit.
  • Many of the configuration settings in /etc/sysconfig allow disabling services. By this they basically become a second level of enabling/disabling over what the init system already offers: when a service is enabled with systemctl enable or chkconfig on these settings override this, and turn the daemon of even though the init system was configured to start it. This of course is very confusing to the user/administrator, and brings virtually no benefit.
  • For options like the configuration of static kernel modules to load: there are nowadays usually much better ways to load kernel modules at boot. For example, most modules may now be autoloaded by udev when the right hardware is found. This goes very far, and even includes ACPI and other high-level technologies. One of the very few exceptions where we currently do not do kernel module autoloading is CPU feature and model based autoloading which however will be supported soon too. And even if your specific module cannot be auto-loaded there's usually a better way to statically load it, for example by sticking it in /etc/load-modules.d so that the administrator can check a standardized place for all statically loaded modules.
  • Last but not least, /etc already is intended to be the place for system configuration ("Host-specific system configuration" according to FHS). A subdirectory beneath it called sysconfig to place system configuration in is hence entirely redundant, already on the language level.

What to use instead? Here are a few recommendations of what to do with these files in the long run in a systemd world:

  • Just drop them without replacement. If they are fully redundant (like the local/UTC RTC setting) this is should be a relatively easy way out (well, ignoring the need for compatibility). If systemd natively supports an equivalent option in the unit files there is no need to duplicate these settings in sysconfig files. For a list of execution options you may set for a service check out the respective man pages: systemd.exec(5) and systemd.service(5). If your setting simply adds another layer where a service can be disabled, remove it to keep things simple. There's no need to have multiple ways to disable a service.
  • Find a better place for them. For configuration of the system locale or system timezone we hope to gently push distributions into the right direction, for more details see previous episode of this series.
  • Turn these settings into native settings of the daemon. If necessary add support for reading native configuration files to the daemon. Thankfully, most of the stuff we run on Linux is Free Software, so this can relatively easily be done.

Of course, there's one very good reason for supporting these files for a bit longer: compatibility for upgrades. But that's is really the only one I could come up with. It's reason enough to keep compatibility for a while, but I think it is a good idea to phase out usage of these files at least in new packages.

If compatibility is important, then systemd will still allow you to read these configuration files even if you otherwise use native systemd unit files. If your sysconfig file only knows simple options EnvironmentFile=-/etc/sysconfig/foobar (See systemd.exec(5) for more information about this option.) may be used to import the settings into the environment and use them to put together command lines. If you need a programming language to make sense of these settings, then use a programming language like shell. For example, place an short shell script in /usr/lib/<your package>/ which reads these files for compatibility, and then exec's the actual daemon binary. Then spawn this script instead of the actual daemon binary with ExecStart= in the unit file.

And this is all for now. Thank you very much for your interest.

posted at: 00:34 | path: /projects | permanent link to this entry | 17 comments


Wed, 13 Jul 2011

Wake up!

If you plan to attend Desktop Summit in Berlin this year, then please REGISTER NOW!

If you do not register, then this means you will have to wait in the signup queue at the conference for substantially longer and might miss a talk or two. You will not get onto the conference WLAN right from the beginning of the conference (access is authenticated and personalized, only people who sign up will get access credentials). Your personal badge will not be ready right-away. If not enough people register we will also have to cut down on the available catering and the parties. We rely on the registration numbers to plan, and if you come but don't sign up before you make it very hard for us to plan. Registration is free, so what are you waiting for?

I am pretty sure you want to avoid all of this right? For your own benefit and for the benefit of everybody else attending the conference, go and register for the conference right-away!

Also, we are still looking for more volunteers for session chairs and runners at the conference. This is your chance to introduce your favourite Open Source hacker on stage! Please consider volunteering and read the Call for Volunteers. Add yourself to the list on the wiki page, today. If you sign up you'll earn yourself the gratitude of the GNOME and KDE communities, and you'll receive the exclusive team T-shirts!

Thank you!

posted at: 00:49 | path: /projects | permanent link to this entry | 3 comments


Tue, 05 Jul 2011

Yet another interview

Here's yes another interview with yours truly. It's on LinuxFR, so I hope you understand some fr_FR.

posted at: 15:28 | path: /projects | permanent link to this entry | 6 comments


systemd for Developers II

It has been way to long since I posted the first episode of my systemd for Developers series. Here's finally the second part. Make sure you read the first episode of the series before you start with this part since I'll assume the reader grokked the wonders of socket activation.

Socket Activation, Part II

This time we'll focus on adding socket activation support to real-life software, more specifically the CUPS printing server. Most current Linux desktops run CUPS by default these days, since printing is so basic that it's a must have, and must just work when the user needs it. However, most desktop CUPS installations probably don't actually see more than a handful of print jobs each month. Even if you are a busy office worker you'll unlikely to generate more than a couple of print jobs an hour on your PC. Also, printing is not time critical. Whether a job takes 50ms or 100ms until it reaches the printer matters little. As long as it is less than a few seconds the user probably won't notice. Finally, printers are usually peripheral hardware: they aren't built into your laptop, and you don't always carry them around plugged in. That all together makes CUPS a perfect candidate for lazy activation: instead of starting it unconditionally at boot we just start it on-demand, when it is needed. That way we can save resources, at boot and at runtime. However, this kind of activation needs to take place transparently, so that the user doesn't notice that the print server was not actually running yet when he tried to access it. To achieve that we need to make sure that the print server is started as soon at least one of three conditions hold:

  1. A local client tries to talk to the print server, for example because a GNOME application opened the printing dialog which tries to enumerate available printers.
  2. A printer is being plugged in locally, and it should be configured and enabled and then optionally the user be informed about it.
  3. At boot, when there's still an unprinted print job lurking in the queue.

Of course, the desktop is not the only place where CUPS is used. CUPS can be run in small and big print servers, too. In that case the amount of print jobs is substantially higher, and CUPS should be started right away at boot. That means that (optionally) we still want to start CUPS unconditionally at boot and not delay its execution until when it is needed.

Putting this all together we need four kind of activation to make CUPS work well in all situations at minimal resource usage: socket based activation (to support condition 1 above), hardware based activation (to support condition 2), path based activation (for condition 3) and finally boot-time activation (for the optional server usecase). Let's focus on these kinds of activation in more detail, and in particular on socket-based activation.

Socket Activation in Detail

To implement socket-based activation in CUPS we need to make sure that when sockets are passed from systemd these are used to listen on instead of binding them directly in the CUPS source code. Fortunately this is relatively easy to do in the CUPS sources, since it already supports launchd-style socket activation, as it is used on MacOS X (note that CUPS is nowadays an Apple project). That means the code already has all the necessary hooks to add systemd-style socket activation with minimal work.

To begin with our patching session we check out the CUPS sources. Unfortunately CUPS is still stuck in unhappy Subversion country and not using git yet. In order to simplify our patching work our first step is to use git-svn to check it out locally in a way we can access it with the usual git tools:

git svn clone http://svn.easysw.com/public/cups/trunk/ cups

This will take a while. After the command finished we use the wonderful git grep to look for all occurences of the word "launchd", since that's probably where we need to add the systemd support too. This reveals scheduler/main.c as main source file which implements launchd interaction.

Browsing through this file we notice that two functions are primarily responsible for interfacing with launchd, the appropriately named launchd_checkin() and launchd_checkout() functions. The former acquires the sockets from launchd when the daemon starts up, the latter terminates communication with launchd and is called when the daemon shuts down. systemd's socket activation interfaces are much simpler than those of launchd. Due to that we only need an equivalent of the launchd_checkin() call, and do not need a checkout function. Our own function systemd_checkin() can be implemented very similar to launchd_checkin(): we look at the sockets we got passed and try to map them to the ones configured in the CUPS configuration. If we got more sockets passed than configured in CUPS we implicitly add configuration for them. If the CUPS configuration includes definitions for more listening sockets those will be bound natively in CUPS. That way we'll very robustly listen on all ports that are listed in either systemd or CUPS configuration.

Our function systemd_checkin() uses sd_listen_fds() from sd-daemon.c to acquire the file descriptors. Then, we use sd_is_socket() to map the sockets to the right listening configuration of CUPS, in a loop for each passed socket. The loop corresponds very closely to the loop from launchd_checkin() however is a lot simpler. Our patch so far looks like this.

Before we can test our patch, we add sd-daemon.c and sd-daemon.h as drop-in files to the package, so that sd_listen_fds() and sd_is_socket() are available for use. After a few minimal changes to the Makefile we are almost ready to test our socket activated version of CUPS. The last missing step is creating two unit files for CUPS, one for the socket (cups.socket), the other for the service (cups.service). To make things simple we just drop them in /etc/systemd/system and make sure systemd knows about them, with systemctl daemon-reload.

Now we are ready to test our little patch: we start the socket with systemctl start cups.socket. This will bind the socket, but won't start CUPS yet. Next, we simply invoke lpq to test whether CUPS is transparently started, and yupp, this is exactly what happens. We'll get the normal output from lpq as if we had started CUPS at boot already, and if we then check with systemctl status cups.service we see that CUPS was automatically spawned by our invocation of lpq. Our test succeeded, socket activation worked!

Hardware Activation in Detail

The next trigger is hardware activation: we want to make sure that CUPS is automatically started as soon as a local printer is found, regardless whether that happens as hotplug during runtime or as coldplug during boot. Hardware activation in systemd is done via udev rules. Any udev device that is tagged with the systemd tag can pull in units as needed via the SYSTEMD_WANTS= environment variable. In the case of CUPS we don't even have to add our own udev rule to the mix, we can simply hook into what systemd already does out-of-the-box with rules shipped upstream. More specifically, it ships a udev rules file with the following lines:

SUBSYSTEM=="printer", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target"
SUBSYSTEM=="usb", KERNEL=="lp*", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target"
SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ENV{ID_USB_INTERFACES}=="*:0701??:*", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target"

This pulls in the target unit printer.target as soon as at least one printer is plugged in (supporting all kinds of printer ports). All we now have to do is make sure that our CUPS service is pulled in by printer.target and we are done. By placing WantedBy=printer.target line in the [Install] section of the service file, a Wants dependency is created from printer.target to cups.service as soon as the latter is enabled with systemctl enable. The indirection via printer.target provides us with a simple way to use systemctl enable and systemctl disable to manage hardware activation of a service.

Path-based Activation in Detail

To ensure that CUPS is also started when there is a print job still queued in the printing spool, we write a simple cups.path that activates CUPS as soon as we find a file in /var/spool/cups.

Boot-based Activation in Detail

Well, starting services on boot is obviously the most boring and well-known way to spawn a service. This entire excercise was about making this unnecessary, but we still need to support it for explicit print server machines. Since those are probably the exception and not the common case we do not enable this kind of activation by default, but leave it to the administrator to add it in when he deems it necessary, with a simple command (ln -s /lib/systemd/system/cups.service /etc/systemd/system/multi-user.target.wants/ to be precise).

So, now we have covered all four kinds of activation. To finalize our patch we have a closer look at the [Install] section of cups.service, i.e. the part of the unit file that controls how systemctl enable cups.service and systemctl disable cups.service will hook the service into/unhook the service from the system. Since we don't want to start cups at boot we do not place WantedBy=multi-user.target in it like we would do for those services. Instead we just place an Also= line that makes sure that cups.path and cups.socket are automatically also enabled if the user asks to enable cups.service (they are enabled according to the [Install] sections in those unit files).

As last step we then integrate our work into the build system. In contrast to SysV init scripts systemd unit files are supposed to be distribution independent. Hence it is a good idea to include them in the upstream tarball. Unfortunately CUPS doesn't use Automake, but Autoconf with a set of handwritten Makefiles. This requires a bit more work to get our additions integrated, but is not too difficult either. And this is how our final patch looks like, after we commited our work and ran git format-patch -1 on it to generate a pretty git patch.

The next step of course is to get this patch integrated into the upstream and Fedora packages (or whatever other distribution floats your boat). To make this easy I have prepared a patch for Tim that makes the necessary packaging changes for Fedora 16, and includes the patch intended for upstream linked above. Of course, ideally the patch is merged upstream, however in the meantime we can already include it in the Fedora packages.

Note that CUPS was particularly easy to patch since it already supported launchd-style activation, patching a service that doesn't support that yet is only marginally more difficult. (Oh, and we have no plans to offer the complex launchd API as compatibility kludge on Linux. It simply doesn't translate very well, so don't even ask... ;-))

And that finishes our little blog story. I hope this quick walkthrough how to add socket activation (and the other forms of activation) to a package were interesting to you, and will help you doing the same for your own packages. If you have questions, our IRC channel #systemd on freenode and our mailing list are available, and we are always happy to help!

posted at: 00:46 | path: /projects | permanent link to this entry | 18 comments


Sun, 03 Jul 2011

Another interview

Here's another interview with yours truly. It's on IBM developerWorks Brasil, so I hope you understand some pt_BR.

posted at: 16:33 | path: /projects | permanent link to this entry | 0 comments


Wed, 29 Jun 2011

Reminder!

GNOMErs, the Desktop Summit in Berlin, Germany is approaching quickly!

Submit your BoF for the Desktop Summit BoF days NOW! Deadline is July 3rd, this sunday!

Sign up as a volunteer for the Desktop Summit NOW! Deadline is July 18th!

posted at: 23:20 | path: /projects | permanent link to this entry | 0 comments


Thu, 23 Jun 2011

Call For Volunteers

The Desktop Summit 2011 in Berlin, Germany Needs Your Help!

Are you attending the Desktop Summit? Are you interested in helping the GNOME and KDE communities organize this year's Summit? Do you want to work with other Free Software enthusiasts to make the Desktop Summit rock? Would you like to own one of the exclusive Desktop Summit Team T-Shirts?

If so, please sign up as a volunteer for the Desktop Summit!

Read the full Call For Volunteers!

Read Patricia's original announcement.

Or go directly and sign up as a volunteer.

posted at: 17:32 | path: /projects | permanent link to this entry | 0 comments


Mon, 20 Jun 2011

Desktop Summit Workshops and BoFs Call for Participation

The Desktop Summit schedule for the talks and presentations has been published a couple of weeks ago. Now it is time to open the 2nd Call for Participation, this time for Workshops and BoFs.

If you'd like to run a workshop, BoF, hack session or training/teaching session, then please submit it here. If you do it will appear in the printed schedule and get a prominent time slot assigned. BoFs, workshops, hack sessions and training/teaching sessions can also be added after the deadline of July 3rd, and even be registered ad-hoc at the conference, but if you register your slot in advance we can make sure people will find it in the printed schedule, will know about it, can plan to attend it and we can do everything to make sure a lot of people show up.

Note that BoF/workshop proposals are unrestricted, i.e. there is no program committee that will accept or reject submissions: we have a lot of room and we'll accept liberally what is submitted.

For GNOMErs: this part of the conference is supposed to be much like the Boston GNOME summit, but with a printed schedule. So please be welcome to submit your sessions like you'd want to have them take place at the GNOME summit as well.

Also see Jonnor's original announcement.

So, hurry, file your session request right-away and before July 3rd!

posted at: 09:59 | path: /projects | permanent link to this entry | 0 comments