<?xml version="1.0" encoding="utf-8"?>
<!-- name="generator" content="pyblosxom/1.4.3 01/10/2008" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
<channel>
<title>Wunschkonzert, Ponyhof und Abenteuerspielplatz   </title>
<link>http://0pointer.de/blog</link>
<description>Lennart's Blog</description>
<language>en</language>
<item>
  <title>It&apos;s Time Again!</title>
  <link>http://0pointer.de/blog/projects/berlin-open-source-meetup-4.html</link>
  <description><![CDATA[

<p>My fellow Berliners! There's another <a
href="https://plus.google.com/events/cnikpv83amqf0mr8cf0ag7f2qus">Berlin
Open Source Meetup</a> scheduled for this Sunday! You are invited!</p>

<p>See you on Sunday!</p>

]]></description>
</item>

<item>
  <title>What Are We Breaking Now?</title>
  <link>http://0pointer.de/blog/projects/brno.html</link>
  <description><![CDATA[

<p>At the end of February, <a href="http://www.devconf.cz/">devconf.cz</a>
took place in Brno, Czech Republic. At the conference, Kay Sievers,
Harald Hoyer and I gave two presentations about our work on <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>
and about the systemd Journal. These talks were recorded, and the
recordings are now available online.</p>

<p>First, here's our talk <a
href="https://www.youtube.com/watch?v=_rrpjYD373A"><i>What Are We
Breaking Now?</i></a>, in which we try to give an overview of what we
are currently working on in the systemd context, and what we expect to
do in the next few months. We cover <a
href="http://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames">Predictable
Network Interface Names</a>, the <a
href="http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec">Boot
Loader Spec</a>, kdbus, the Apps framework, and more.</p>

<object width="420" height="315"><param name="movie"
value="http://www.youtube.com/v/_rrpjYD373A?hl=en_US&amp;version=3"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/_rrpjYD373A?hl=en_US&amp;version=3"
type="application/x-shockwave-flash" width="420" height="315"
allowscriptaccess="always" allowfullscreen="true"></embed></object>

<p>Then I gave my second talk, <a
href="https://www.youtube.com/watch?v=i4CACB7paLc"><i>The systemd
Journal</i></a>, with a focus on how to make practical use of
<tt>journalctl</tt> as a day-to-day tool for administrators (these practical
bits start around 28:40). The commands demoed there are all explained in an <a
href="http://0pointer.de/blog/projects/journalctl.html">earlier blog story of
mine</a>.</p>

<object width="420" height="315"><param name="movie"
value="http://www.youtube.com/v/i4CACB7paLc?hl=en_US&amp;version=3"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/i4CACB7paLc?hl=en_US&amp;version=3"
type="application/x-shockwave-flash" width="420" height="315"
allowscriptaccess="always" allowfullscreen="true"></embed></object>

<p>Unfortunately, the audience questions are sometimes hard or
impossible to understand from the videos, and sometimes the text on
the slides is hard to read, but I still believe that the two talks are
quite interesting.</p>

]]></description>
</item>

<item>
  <title>systemd Hackfest!</title>
  <link>http://0pointer.de/blog/projects/hackfest.html</link>
  <description><![CDATA[

<p>Hey, you, systemd hacker, Fedora hacker! Listen up! This Thu/Fri is the <a
href="https://plus.google.com/u/0/events/cnklef88b85tb6tgf6ue3hn32lg">systemd
Hackfest</a> in Brno, Czech Republic, right before <a
href="http://www.devconf.cz/">devconf.cz</a>! On Thursday we'll talk about
(and hack on) all things systemd. On Friday the hackfest is going to be a <a
href="https://fedoraproject.org/wiki/FAD_systemd_2013">Fedora Activity Day</a>,
so we'll focus on systemd integration into Fedora.</p>

<p>You are invited!</p>

<p>See you in Brno!</p>

]]></description>
</item>

<item>
  <title>The Biggest Myths</title>
  <link>http://0pointer.de/blog/projects/the-biggest-myths.html</link>
  <description><![CDATA[

<p>Since we first proposed <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>
for inclusion in the distributions, it has been frequently discussed in
many forums, mailing lists and conferences. In these discussions one
can often hear certain myths about systemd that are repeated over and
over again, but certainly don't gain any truth by constant
repetition. Let's take the time to debunk a few of them:</p>

<ol>

<li><p><b>Myth: systemd is monolithic.</b></p>

<p>If you build systemd with all configuration options enabled you
will build 69 individual binaries. These binaries all serve different
tasks, and are neatly separated for a number of reasons. For example,
we designed systemd with security in mind, hence most daemons run with
minimal privileges (using kernel capabilities, for example) and are
responsible for very specific tasks only, to minimize their attack
surface and impact. Also, systemd parallelizes the boot more than any
prior solution. This parallelization happens by running more processes
in parallel. Thus it is essential that systemd is nicely split up into
many binaries and thus processes. In fact, many of these
binaries<sup>[1]</sup> are separated out so nicely that they are very
useful outside of systemd, too.</p>
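<p>To illustrate why knowing the dependencies lets a service manager start things in parallel, here is a minimal Python sketch. It is a toy model, not systemd's actual job engine: it groups units into "waves", where every unit in a wave has all of its ordering requirements met by earlier waves, so the members of one wave could be started at the same time. The unit names and the <tt>after</tt> mapping (a rough stand-in for <tt>After=</tt> ordering) are hypothetical; the sketch uses the standard library's <tt>graphlib</tt> module (Python 3.9 or newer).</p>

```python
from graphlib import TopologicalSorter


def start_waves(after: dict[str, set[str]]) -> list[list[str]]:
    """Group units into waves; all units in one wave could start in parallel.

    `after` maps each unit to the set of units it must be started after.
    """
    sorter = TopologicalSorter(after)
    sorter.prepare()  # raises graphlib.CycleError on an ordering cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every predecessor is done
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave started successfully
    return waves


if __name__ == "__main__":
    # Hypothetical units and ordering constraints.
    after = {
        "network.service": {"udev.service"},
        "sshd.service": {"network.service"},
        "httpd.service": {"network.service"},
        "journald.service": set(),
        "udev.service": set(),
    }
    for number, wave in enumerate(start_waves(after)):
        print(number, wave)
```

<p>With the sample graph, the first wave holds the two units without requirements, and the last wave contains both services that only wait for <tt>network.service</tt>. An ordering cycle makes <tt>prepare()</tt> raise <tt>graphlib.CycleError</tt>.</p>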

<p>A package involving 69 individual binaries can hardly be called
<i>monolithic</i>. What is different from prior solutions, however,
is that we ship more components in a single tarball, and maintain them
upstream in a single repository with a unified release cycle.</p></li>

<li><p><b>Myth: systemd is about speed.</b></p>

<p>Yes, systemd is fast (<a
href="https://plus.google.com/108087225644395745666/posts/LyPQgKdntgA">A
pretty complete userspace boot-up in ~900ms, anyone?</a>), but that's
primarily just a side-effect of doing things <i>right</i>. In fact, we
never really sat down and squeezed the last tiny bit of performance
out of systemd. Instead, we frequently and knowingly picked
slightly slower code paths in order to keep the code more
readable. This doesn't mean being fast was irrelevant for us, but
reducing systemd to its speed is quite a misconception,
since speed is not anywhere near the top of our list of
goals.</p></li>

<li><p><b>Myth: systemd's fast boot-up is irrelevant for
servers.</b></p>

<p>That is simply not true. Many administrators are actually
keen on reduced downtimes during maintenance windows. In High
Availability setups it's kinda nice if the failed machine comes back
up really fast. In cloud setups with a large number of VMs or
containers the price of slow boots multiplies with the number of
instances. Spending minutes of CPU and IO on really slow boots of
hundreds of VMs or containers reduces your system's density
drastically; heck, it even costs you more energy. Slow boots can be
quite expensive financially. Fast booting of containers also allows
you to implement logic such as <a
href="http://0pointer.de/blog/projects/socket-activated-containers.html">socket
activated containers</a>, allowing you to drastically increase the
density of your cloud system.</p>

<p>Of course, in many server setups boot-up is indeed irrelevant, but
systemd is supposed to cover the whole range. And yes, I am aware
that often it is the server firmware that takes the most time at
boot-up, and that the OS is fast anyway compared to that, but well, systemd
is still supposed to cover the whole range (see above...), and no,
not all servers have such bad firmware, and certainly not VMs and
containers, which are servers of a kind, too.<sup>[2]</sup></p></li>

<li><p><b>Myth: systemd is incompatible with shell scripts.</b></p>

<p>This is entirely bogus. <i>We</i> just don't use them for the boot
process, because we believe they aren't the best tool for that
specific purpose, but that doesn't mean systemd is incompatible with
them. You can easily run shell scripts as systemd services; heck, you
can run scripts written in <i>any</i> language as systemd services, since
systemd doesn't care the slightest bit what's inside your
executable. Moreover, we heavily use shell scripts for our own
purposes: for installing, building and testing systemd. And you can stick
your scripts in the early boot process, use them for normal services,
or run them at the very end of shutdown; there are practically no
limits.</p></li>

<li><p><b>Myth: systemd is difficult.</b></p>

<p>This, too, is complete nonsense. A systemd platform is actually much
simpler than traditional Linuxes because it unifies
system objects and their dependencies as systemd units. The
configuration file language is very simple, and we got rid of redundant
configuration files. We provide uniform tools for much
of the configuration of the system. The system is much less of a
conglomerate than traditional Linuxes are. We also have pretty
comprehensive documentation (<a
href="http://www.freedesktop.org/wiki/Software/systemd">all linked
from the homepage</a>) about pretty much every detail of systemd, and
this not only covers admin/user-facing interfaces, but also developer
APIs.</p>

<p>systemd certainly comes with a learning curve. Everything
does. However, we like to believe that it is actually simpler to
understand systemd than a shell-based boot for most people. Surprised
we say that? Well, as it turns out, shell is not a pretty language to
learn; its syntax is arcane and complex. systemd unit files are
substantially easier to understand: they do not expose a programming
language, but are simple and declarative by nature. That all said, if
you are experienced in shell, then yes, adopting systemd will take a
bit of learning.</p>
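<p>To show how little there is to the format, here is a minimal Python sketch of a parser for unit-file-style text: <tt>[Section]</tt> headers, <tt>Key=Value</tt> assignments, and comment lines starting with <tt>#</tt> or <tt>;</tt>. It is a simplification and not systemd's own parser: it ignores line continuations, quoting and the per-setting semantics systemd applies (for example, how repeated assignments of a particular setting are interpreted), and simply collects repeated keys in order. The sample unit runs a hypothetical shell script, <tt>/usr/local/bin/backup.sh</tt>.</p>

```python
def parse_unit(text: str) -> dict[str, dict[str, list[str]]]:
    """Parse unit-file-style text into {section: {key: [values...]}}."""
    sections: dict[str, dict[str, list[str]]] = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        key, _, value = line.partition("=")
        # Keep repeated keys in order instead of overwriting them.
        current.setdefault(key.strip(), []).append(value.strip())
    return sections


# Hypothetical unit that runs a shell script once.
UNIT = """\
[Unit]
Description=Nightly backup script

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
"""

if __name__ == "__main__":
    print(parse_unit(UNIT))
```

<p>The whole service description is plain key/value data; there is no control flow to trace, which is the point being made about declarative unit files.</p>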

<p>To make learning easy we tried hard to provide the maximum
compatibility to previous solutions. But not only that, on many
distributions you'll find that some of the traditional tools will now
even tell you -- while executing what you are asking for -- how you
could do it with the newer tools instead, in a possibly nicer way.</p>

<p>Anyway, the take-away is probably that systemd is as
simple as such a system can be, and that we try hard to make it easy
to learn. But yes, if you know sysvinit then adopting systemd will
require a bit of learning, but quite frankly if you mastered sysvinit,
then systemd should be easy for you.</p></li>

<li><p><b>Myth: systemd is not modular.</b></p>

<p>Not true at all. At compile time you have a number of
<tt>configure</tt> switches to select what you want to build, and what
not. And <a
href="http://freedesktop.org/wiki/Software/systemd/MinimalBuilds">we
document</a> how you can select in even more detail what you need,
going beyond our configure switches.</p>

<p>This modularity is not unlike that of the Linux kernel,
where you can select many features individually at compile time. If the
kernel is modular enough for you then systemd should be pretty close,
too.</p></li>

<li><p><b>Myth: systemd is only for desktops.</b></p>

<p>That is certainly not true. With systemd we try to cover pretty
much the same range as Linux itself does. While we care for desktop
uses, we also care pretty much the same way for server uses, and
embedded uses as well. You can bet that Red Hat wouldn't make it a
core piece of RHEL7 if it wasn't the best option for managing services
on servers.</p>

<p>People from numerous companies work on systemd. Car manufacturers
build it into cars, Red Hat uses it for a server operating system, and
GNOME uses many of its interfaces for improving the desktop. You find
it in toys, in space telescopes, and in wind turbines.</p>

<p>Most of the features I worked on most recently are probably relevant
primarily on servers, such as <a
href="http://0pointer.de/blog/projects/socket-activated-containers.html">container
support</a>, <a
href="http://0pointer.de/blog/projects/resources.html">resource
management</a> or the <a
href="http://0pointer.de/blog/projects/security.html">security
features</a>. We cover desktop systems pretty well already, and there
are a number of companies doing systemd development for embedded; some
even offer consulting services for it.</p></li>

<li><p><b>Myth: systemd was created as result of the NIH syndrome.</b></p>

<p>This is not true. Before we began working on systemd we were
pushing for Canonical's Upstart to be widely adopted (and Fedora/RHEL
used it too for a while). However, we eventually came to the
conclusion that its design was inherently flawed at its core (at least
in our eyes: most fundamentally, it leaves dependency management to
the admin/developer, instead of solving this hard problem in code),
and if something's wrong in the core you'd better replace it rather
than fix it. This was hardly the only reason; other things
came into play, such as the licensing/contribution agreement mess
around it. NIH wasn't one of the reasons, though...<sup>[3]</sup></p></li>

<li><p><b>Myth: systemd is a freedesktop.org project.</b></p>

<p>Well, systemd is certainly hosted at fdo, but freedesktop.org is
little else but a repository for code and documentation. Pretty much
any coder can request a repository there and dump his stuff there (as
long as it's somewhat relevant for the infrastructure of free
systems). There's no cabal involved, no "standardization" scheme, no
project vetting, nothing. It's just a nice, free, reliable place to
have your repository. In that regard it's a bit like SourceForge,
GitHub, kernel.org, just not commercial and without over-the-top
requirements, and hence a good place to keep our stuff.</p>

<p>So yes, we host our stuff at fdo, but the implied assumption of
this myth, that there is a group of people who meet and then agree
on what the future of free systems looks like, is entirely bogus.</p></li>

<li><p><b>Myth: systemd is not UNIX.</b></p>

<p>There's certainly some truth in that. systemd's sources do not
contain a single line of code originating from original UNIX. However,
we derive inspiration from UNIX, and thus there's a ton of UNIX in
systemd. For example, the UNIX idea of "everything is a file" is
reflected in the fact that systemd exposes all services at runtime in a
kernel file system, the <tt>cgroupfs</tt>. Then, one of the original
features of UNIX was multi-seat support, based on built-in terminal
support. Text terminals are hardly the state of the art of how you
interface with your computer these days, however. With systemd we
brought native <a
href="http://0pointer.de/blog/projects/multi-seat.html">multi-seat</a>
support back, but this time with full support for today's hardware,
covering graphics, mice, audio, webcams and more, and all that fully
automatic, hotplug-capable and without configuration. In fact, the
design of systemd as a suite of integrated tools that each have their
individual purposes, but when used together are more than just the sum
of the parts, is pretty much at the core of the UNIX philosophy. Then,
the way our project is handled (i.e. maintaining much of the core OS
in a single git repository) is much closer to the BSD model of doing
things (BSD being a true UNIX, unlike Linux, and keeping most of the core OS
in a single CVS/SVN repository) than things on Linux ever
were.</p>

<p>Ultimately, UNIX is something different for everybody. For us
systemd maintainers it is something we derive inspiration from. For
others it is a religion, and much like the other world religions there
are different readings and understandings of it. Some define UNIX
based on specific pieces of code heritage, others see it just as a set
of ideas, others as a set of commands or APIs, and still others as a
definition of behaviours. Of course, it is impossible to ever make all
these people happy.</p>

<p>Ultimately the question whether something is UNIX or not matters
very little. Being technically excellent is hardly exclusive to
UNIX. For us, UNIX is a major influence (heck, the biggest one), but
we also have other influences. Hence in some areas systemd will be
very UNIXy, and in others a little bit less.</p></li>

<li><p><b>Myth: systemd is complex.</b></p>

<p>There's certainly some truth in that. Modern computers are complex
beasts, and the OS running on them will hence have to be complex
too. However, systemd is certainly not more complex than prior
implementations of the same components. If anything, it's simpler, and
has less redundancy (see above). Moreover, building a simple OS based
on systemd will involve far fewer packages than a traditional Linux
did. Fewer packages make it easier to build your system, and get rid of
interdependencies and of much of the different behaviour of every
component involved.</p></li>

<li><p><b>Myth: systemd is bloated.</b></p>

<p>Well, <i>bloated</i> certainly has many different definitions. But in
most definitions systemd is probably the opposite of bloat. Since
systemd components share a common code base, they tend to share much
more code for common code paths. Here's an example: in a traditional
Linux setup, sysvinit, start-stop-daemon, inetd, cron, dbus, all
implemented a scheme to execute processes with various configuration
options in a certain, hopefully clean environment. On systemd the code
paths for all of this, for the configuration parsing, as well as the
actual execution is shared. This means less code, fewer places for
mistakes, less memory and cache pressure, and is thus a very good
thing. And as a side-effect you actually get a ton more functionality
for it...</p>
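<p>The shared-code-path idea can be sketched in a few lines of Python. This is a hypothetical illustration, not systemd code: <tt>ExecContext</tt>, <tt>execute()</tt>, <tt>start_service()</tt> and <tt>run_scheduled_job()</tt> are invented names, and real execution settings cover far more (users, resource limits, namespaces and so on). The point is only that every caller goes through one function that applies the settings and spawns the process, instead of each daemon carrying its own copy of that logic.</p>

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExecContext:
    """Hypothetical bundle of execution settings shared by all callers."""
    working_directory: Optional[str] = None
    environment: Dict[str, str] = field(default_factory=dict)
    timeout_sec: Optional[float] = None


def execute(argv: List[str], ctx: ExecContext) -> subprocess.CompletedProcess:
    """The single code path that applies the settings and spawns argv."""
    return subprocess.run(
        argv,
        cwd=ctx.working_directory,
        env=dict(ctx.environment),  # child gets only the variables in ctx
        timeout=ctx.timeout_sec,
        capture_output=True,
        text=True,
        check=False,
    )


def start_service(argv: List[str], ctx: ExecContext) -> subprocess.CompletedProcess:
    # An init-style caller: no private spawning logic of its own.
    return execute(argv, ctx)


def run_scheduled_job(argv: List[str], ctx: ExecContext) -> subprocess.CompletedProcess:
    # A cron-style caller reusing exactly the same code path.
    return execute(argv, ctx)


if __name__ == "__main__":
    ctx = ExecContext(environment={"GREETING": "hello"})
    probe = [sys.executable, "-c", "import os; print(os.environ.get('GREETING'))"]
    print(start_service(probe, ctx).stdout.strip())
    print(run_scheduled_job(probe, ExecContext()).stdout.strip())
```

<p>A bug fixed or a feature added in <tt>execute()</tt> benefits both callers at once, which is the "less code, more functionality" effect described above.</p>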

<p>As mentioned above, systemd is also pretty modular. You can choose
at build time which components you need, and which you don't
need. People can hence specifically choose the level of "bloat" they
want.</p>

<p>When you build systemd, it only requires three dependencies: glibc,
libcap and dbus. That's it. It can make use of more dependencies, but
these are entirely optional.</p>

<p>So, yeah, whichever way you look at it, it's really not
<i>bloated</i>.</p></li>

<li><p><b>Myth: systemd being Linux-only is not nice to the BSDs.</b></p>

<p>Completely wrong. The BSD folks are pretty much uninterested in
systemd. If systemd was portable, this would change nothing, they
still wouldn't adopt it. And the same is true for the other Unixes in
the world. Solaris has SMF, BSD has their own "rc" system, and they
always maintained it separately from Linux. The init system is very
close to the core of the entire OS. And these other operating systems
hence define themselves among other things by their core
userspace. The assumption that they'd adopt our core userspace if we
just made it portable is completely without any foundation.</p></li>

<li><p><b>Myth: systemd being Linux-only makes it impossible for Debian to adopt it as default.</b></p>

<p>Debian supports non-Linux kernels in their distribution. systemd
won't run on those. Is that a problem though, and should that hinder
them from adopting systemd as default? Not really. The folks who ported
Debian to these other kernels were willing to invest time in a massive
porting effort: they set up test and build systems, and patched and
built numerous packages for their goal. The maintenance of both a
systemd unit file and a classic init script for the packaged services
is a negligible amount of work compared to that, especially since
those scripts more often than not exist already.</p></li>

<li><p><b>Myth: systemd could be ported to other kernels if its maintainers just wanted to.</b></p>

<p>That is simply not true. Porting systemd to other kernels is not
feasible. We just use too many Linux-specific interfaces. For a few
one might find replacements on other kernels, some features one might
want to turn off, but for most this is not really possible. Here's a
small, very incomplete list: <tt>cgroups, fanotify, umount2(),
/proc/self/mountinfo </tt>(including notification)<tt>, /dev/swaps </tt>(same)<tt>,
udev, netlink, </tt>the structure of<tt> /sys, /proc/$PID/comm,
/proc/$PID/cmdline, /proc/$PID/loginuid, /proc/$PID/stat,
/proc/$PID/session, /proc/$PID/exe, /proc/$PID/fd, tmpfs, devtmpfs,
</tt>capabilities, namespaces of all kinds, various<tt> prctl()s, </tt>numerous<tt>
ioctls, </tt>the<tt> mount() </tt>system call and its semantics<tt>, selinux, audit,
inotify, statfs, O_DIRECTORY, O_NOATIME, /proc/$PID/root, waitid(),
SCM_CREDENTIALS, SCM_RIGHTS, mkostemp(), /dev/input, ...</tt></p>
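<p>To give a feel for what even one entry on that list involves, here is a minimal Python sketch that parses a line of <tt>/proc/self/mountinfo</tt>, following the field layout documented in the <tt>proc(5)</tt> man page: mount ID, parent ID, <tt>major:minor</tt>, root, mount point, mount options, zero or more optional fields, a lone <tt>-</tt> separator, file system type, mount source and super-block options. It only parses the text; watching the file for changes, which the list above also mentions, is not shown. The <tt>MountInfo</tt> type and the function names are made up for this example.</p>

```python
import os
import re
from typing import List, NamedTuple


class MountInfo(NamedTuple):
    mount_id: int
    parent_id: int
    major_minor: str
    root: str
    mount_point: str
    mount_options: str
    optional_fields: List[str]
    fs_type: str
    source: str
    super_options: str


def _unescape(field: str) -> str:
    # The kernel writes space, tab, newline and backslash as \ooo octal escapes.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo_line(line: str) -> MountInfo:
    fields = line.split()
    sep = fields.index("-", 6)  # the optional fields end at a lone "-"
    return MountInfo(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        major_minor=fields[2],
        root=_unescape(fields[3]),
        mount_point=_unescape(fields[4]),
        mount_options=fields[5],
        optional_fields=fields[6:sep],
        fs_type=fields[sep + 1],
        source=_unescape(fields[sep + 2]),
        super_options=fields[sep + 3],
    )


if __name__ == "__main__":
    path = "/proc/self/mountinfo"  # only exists on Linux
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                try:
                    entry = parse_mountinfo_line(line)
                except (ValueError, IndexError):
                    continue  # skip lines this simple parser cannot handle
                print(entry.mount_point, entry.fs_type)
```

<p>There is no direct equivalent of this file on the BSDs or Solaris, and that is before adding change notification, privilege handling and all the other entries on the list.</p>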

<p>And no, if you look at this list and pick out the few where you can
think of obvious counterparts on other kernels, then think again, and
look at the others you didn't pick, and the complexity of replacing
them.</p></li>

<li><p><b>Myth: systemd is not portable for no reason.</b></p>

<p>Nonsense! We use the Linux-specific functionality because we need
it to implement what we want. Linux has so many features that
UNIX/POSIX didn't have, and we want to empower the user with
them. These features are incredibly useful, but only if they are
actually exposed in a friendly way to the user, and that's what we do
with systemd.</p></li>

<li><p><b>Myth: systemd uses binary configuration files.</b></p>

<p>No idea who came up with this crazy myth, but it's absolutely not
true. systemd is configured pretty much exclusively via simple text
files. A few settings you can also alter with the kernel command line
and via environment variables. There's nothing binary in its
configuration (not even XML). Just plain, simple, easy-to-read text
files.</p></li>

<li><p><b>Myth: systemd is a feature creep.</b></p>

<p>Well, systemd certainly covers more ground than it used to. It's
not just an init system anymore, but the basic userspace building
block to build an OS from, but we carefully make sure to keep most of
the features optional. You can turn a lot off at compile time, and
even more at runtime. Thus you can choose freely how much feature
creeping you want.</p></li>

<li><p><b>Myth: systemd forces you to do something.</b></p>

<p>systemd is not the mafia. It's Free Software, you can do with it
whatever you want, and that includes not using it. That's pretty much
the opposite of "forcing".</p></li>

<li><p><b>Myth: systemd makes it impossible to run syslog.</b></p>

<p>Not true, we carefully made sure when <a
href="http://0pointer.de/blog/projects/the-journal.html">we introduced
the journal</a> that all data is also passed on to any syslog daemon
running. In fact, if anything changed, it is only that syslog now gets
more complete data than it did before, since we now cover early
boot stuff as well as STDOUT/STDERR of any system service.</p></li>

<li><p><b>Myth: systemd is incompatible.</b></p>

<p>We try very hard to provide the best possible compatibility with
sysvinit. In fact, the vast majority of init scripts should work just
fine on systemd, unmodified. However, there actually are indeed a few
incompatibilities, but we try to <a
href="http://www.freedesktop.org/wiki/Software/systemd/Incompatibilities">document
these</a> and explain what to do about them. Ultimately every system
that is not actually sysvinit itself will have a certain amount of
incompatibilities with it, since it will not share the exact same code
paths.</p>
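<p>Part of what makes running unmodified init scripts workable is the metadata many of them already carry: an LSB "INIT INFO" comment block declaring, among other things, what the script provides and what it must start after, which systemd's SysV compatibility support takes into account. Here is a minimal Python sketch that only extracts those keywords; it is not systemd's compatibility code, it handles just the basic layout (continuation lines start with <tt>#</tt> followed by a tab or at least two spaces), and the sample script is hypothetical.</p>

```python
def parse_lsb_header(script: str) -> dict[str, str]:
    """Return the keywords of an LSB 'INIT INFO' block as a dict of strings."""
    info: dict[str, str] = {}
    inside = False
    last_key = None
    for line in script.splitlines():
        stripped = line.strip()
        if stripped == "### BEGIN INIT INFO":
            inside = True
        elif stripped == "### END INIT INFO":
            break
        elif inside and line.startswith("#"):
            if last_key and line.startswith(("#\t", "#  ")):
                # Continuation of the previous keyword's value.
                info[last_key] += " " + line[1:].strip()
                continue
            key, sep, value = line[1:].partition(":")
            if sep:
                last_key = key.strip()
                info[last_key] = value.strip()
    return info


# Hypothetical init script with an LSB header.
SAMPLE = """#!/bin/sh
### BEGIN INIT INFO
# Provides:          exampled
# Required-Start:    $remote_fs $syslog
# Default-Start:     2 3 4 5
# Description:       Starts the example daemon,
#                    a hypothetical service.
### END INIT INFO
exec /usr/sbin/exampled
"""

if __name__ == "__main__":
    for key, value in parse_lsb_header(SAMPLE).items():
        print(f"{key}: {value}")
```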

<p>It is our goal to ensure that differences between the various
distributions are kept at a minimum. That means unit files usually
work just fine on a different distribution than the one you wrote them on, which
is a big improvement over classic init scripts which are very hard to
write in a way that they run on multiple Linux distributions, due to
numerous incompatibilities between them.</p></li>

<li><p><b>Myth: systemd is not scriptable, because of its D-Bus use.</b></p>

<p>Not true. Pretty much every single D-Bus interface systemd provides
is also available in a command line tool, for example in <a
href="http://www.freedesktop.org/software/systemd/man/systemctl.html"><tt>systemctl</tt></a>,
<a
href="http://www.freedesktop.org/software/systemd/man/loginctl.html"><tt>loginctl</tt></a>,
<a
href="http://www.freedesktop.org/software/systemd/man/timedatectl.html"><tt>timedatectl</tt></a>,
<a
href="http://www.freedesktop.org/software/systemd/man/hostnamectl.html"><tt>hostnamectl</tt></a>,
<a
href="http://www.freedesktop.org/software/systemd/man/localectl.html"><tt>localectl</tt></a>
and suchlike. You can easily call these tools from shell scripts, they
open up pretty much the entire API from the command line with
easy-to-use commands.</p>
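<p>As a small example of driving these tools from a script, here is a minimal Python sketch that asks <tt>systemctl is-active</tt> for a unit's state. <tt>systemctl is-active</tt> prints the unit's active state and exits with a non-zero status when the unit is not active, so the wrapper reads standard output rather than treating a non-zero exit as an error. The function name is made up, and the <tt>runner</tt> parameter exists only so the function can be exercised without a running systemd.</p>

```python
import subprocess
from typing import Callable


def unit_state(unit: str, runner: Callable = subprocess.run) -> str:
    """Return the state systemctl reports for `unit`, e.g. "active" or "inactive"."""
    result = runner(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means "not active"
    )
    return result.stdout.strip()


if __name__ == "__main__":
    import shutil

    if shutil.which("systemctl"):
        print("sshd.service:", unit_state("sshd.service"))
```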

<p>That said, D-Bus actually has bindings for almost any scripting
language this world knows. Even from the shell you can invoke
arbitrary D-Bus methods with <a
href="http://dbus.freedesktop.org/doc/dbus-send.1.html">dbus-send</a>
or <a
href="http://developer.gnome.org/gio/unstable/gdbus.html">gdbus</a>. If
anything, this improves scriptability due to the good support of D-Bus
in the various scripting languages.</p></li>

<li><p><b>Myth: systemd requires you to use some arcane configuration
tools instead of allowing you to edit your configuration files
directly.</b></p>

<p>Not true at all. We offer some configuration tools, and using them
gets you a bit of additional functionality (for example, command line
completion for all settings!), but there's no need at all to use
them. You can always edit the files in question directly if you wish,
and that's fully supported. Of course sometimes you need to explicitly
reload configuration of some daemon after editing the configuration,
but that's pretty much true for most UNIX services.</p></li>

<li><p><b>Myth: systemd is unstable and buggy.</b></p>

<p>Certainly not according to our data. We have been monitoring the
Fedora bug tracker (and some others) closely for a long, long time. The
number of bugs is very low for such a central component of the OS,
especially if you discount the numerous RFE bugs we track for the
project. We are pretty good in keeping systemd out of the list of
blocker bugs of the distribution. We have a relatively fast
development cycle with mostly incremental changes to keep quality and
stability high.</p></li>

<li><p><b>Myth: systemd is not debuggable.</b></p>

<p>False. Some people try to imply that the shell was a good
debugger. Well, it isn't really. In systemd we provide you with actual
debugging features instead. For example: interactive debugging,
verbose tracing, the ability to mask any component during boot, and
more. Also, we provide <a
href="http://freedesktop.org/wiki/Software/systemd/Debugging">documentation
for it</a>.</p>

<p>It's certainly well debuggable, we needed that for our own
development work, after all. But we'll grant you one thing: it uses
different debugging tools, we believe more appropriate ones for the
purpose, though.</p></li>

<li><p><b>Myth: systemd makes changes for the changes' sake.</b></p>

<p>Very much untrue. We pretty much exclusively have technical
reasons for the changes we make, and we explain them in the various
pieces of documentation, wiki pages, blog articles, mailing list
announcements. We try hard to avoid making incompatible changes, and
if we do we try to document the why and how in detail. And if you
wonder about something, just ask us!</p></li>

<li><p><b>Myth: systemd is a Red-Hat-only project, is private property
of some smart-ass developers, who use it to push their views to the
world.</b></p>

<p>Not true. Currently, there are 16 hackers with commit powers to the
systemd git tree. Of these 16 only six are employed by Red Hat. The 10
others are folks from ArchLinux, from Debian, from Intel, even from
Canonical, Mandriva, Pantheon and a number of community folks with
full commit rights. And they frequently commit big stuff, major
changes. Then, there are 374 individuals with patches in our tree, and
they too came from a number of different companies and backgrounds,
and many of those have way more than one patch in the tree. The
discussions about where we want to take systemd are done in the open,
on our IRC channel (<tt>#systemd</tt> on freenode, you are always
welcome), on our <a
href="http://lists.freedesktop.org/mailman/listinfo/systemd-devel">mailing
list</a>, and on public hackfests (<a
href="https://plus.google.com/events/cnklef88b85tb6tgf6ue3hn32lg">such
as our next one in Brno</a>, you are invited). We regularly attend
various conferences, to collect feedback, to explain what we are doing
and why, like few others do. We <a
href="http://0pointer.de/blog">maintain blogs</a>, engage in social
networks (<a
href="https://plus.google.com/104232583922197692623/posts">we actually
have some pretty interesting content on Google+</a>, and our <a
href="https://plus.google.com/communities/114587707547576757881">Google+
Community is pretty alive, too</a>.), and try really hard to explain
the why and the how of what we do, and to listen to feedback and
figure out where the current issues are (for example, from that
feedback we compiled this list of often-heard myths about
systemd...).</p>

<p>What most systemd contributors probably share is a rough idea of what a
good OS should look like, and the desire to make it happen. However,
by the very nature of the project being Open Source and rooted in the
community, systemd is just what people want it to be, and if it's not
what they want then they can drive the direction with patches and
code, and if that's not feasible, then there are numerous other
options to use, too; systemd is never exclusive.</p>

<p>One goal of systemd is to unify the dispersed Linux landscape a
bit. We try to get rid of many of the more pointless differences of
the various distributions in various areas of the core OS. As part of
that we sometimes adopt schemes that were previously used by only one
of the distributions and push them to a level where they become the default of
systemd, trying to gently push everybody towards the same set of basic
configuration. This is never exclusive though, distributions can
continue to deviate from that if they wish, however, if they end-up
using the well-supported default their work becomes much easier and
they might gain a feature or two. Now, as it turns out, more
frequently than not we actually adopted schemes that were Debianisms,
rather than Fedoraisms/Redhatisms, as the best-supported scheme in
systemd. For example, systems running systemd now generally store
their hostname in <tt>/etc/hostname</tt>, something that used to be
specific to Debian and now is used across distributions.</p>

<p>One thing we'll grant you though, we sometimes can be
smart-asses. We try to be prepared whenever we open our mouth, in
order to be able to back-up with facts what we claim. That might make
us appear as smart-asses.</p>

<p>But in general, yes, some of the more influential contributors of
systemd work for Red Hat, but they are in the minority, and systemd is
a healthy, open community with different interests, different
backgrounds, just unified by a few rough ideas where the trip should
go, a community where code and its design counts, and certainly not
company affiliation.</p></li>

<li><p><b>Myth: systemd doesn't support <tt>/usr</tt> split from the root directory.</b></p>

<p>Nonsense. Since its beginnings systemd has supported the
<tt>--with-rootprefix=</tt> option to its <tt>configure</tt> script
which allows you to tell systemd to neatly split up the stuff needed
for early boot and the stuff needed for later on. All this logic is
fully present and we keep it up-to-date right there in systemd's build
system.</p>

<p>Of course, we still don't think that <a
href="http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken">actually
booting with <tt>/usr</tt> unavailable is a good idea</a>, but we
support this just fine in our build system. This won't fix the
inherent problems of the scheme that you'll encounter all across the
board, but you can't blame that on systemd, because in systemd we
support this just fine.</p></li>

<li><p><b>Myth: systemd doesn't allow you to replace its components.</b></p>

<p>Not true, you can turn off and replace pretty much any part of
systemd, with very few exceptions. And those exceptions (such as
journald) generally allow you to run an alternative side by side to
it, while cooperating nicely with it.</p></li>

<li><p><b>Myth: systemd's use of D-Bus instead of sockets makes it opaque.</b></p>

<p>This claim is already contradictory in itself: D-Bus uses sockets
as transport, too. Hence whenever D-Bus is used to send something
around, a socket is used for that too. D-Bus is mostly a standardized
serialization of messages to send over these sockets. If anything this
makes it more transparent, since this serialization is well
documented, understood and there are numerous tracing tools and
language bindings for it. This is very much unlike the usual
homegrown protocols the various classic UNIX daemons use to
communicate locally.</p></li>

</ol>

<p>Hmm, did I write I just wanted to debunk a "few" myths? Maybe these
were more than just a few... Anyway, I hope I managed to clear up a
couple of misconceptions. Thanks for your time.</p>

<p><small><b>Footnotes</b></small></p>

<p><small>[1] For example, <a
href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"><tt>systemd-detect-virt</tt></a>,
<a
href="http://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"><tt>systemd-tmpfiles</tt></a>,
<a href="http://www.freedesktop.org/software/systemd/man/systemd-udevd.service.html"><tt>systemd-udevd</tt></a> are.</small></p>

<p><small>[2] Also, we are trying to do our little part on maybe
making this better. By exposing boot-time performance of the firmware
more prominently in systemd's boot output we hope to shame the
firmware writers to clean up their stuff.</small></p>

<p><small>[3] And anyways, guess which project includes a library "lib<i>nih</i>" -- Upstart or systemd?<sup>[4]</sup></small></p>

<p><small>[4] Hint: it's not systemd!</small></p>

]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XX</title>
  <link>http://0pointer.de/blog/projects/socket-activated-containers.html</link>
  <description><![CDATA[

<p> <a
href="http://0pointer.de/blog/projects/detect-virt.html">This is</a> <a
href="http://0pointer.de/blog/projects/resources.html">no</a> <a
href="http://0pointer.de/blog/projects/journalctl.html">time</a> <a
href="http://0pointer.de/blog/projects/serial-console.html">for</a> <a
href="http://0pointer.de/blog/projects/watchdog.html">procrastination,</a>
<a
href="http://0pointer.de/blog/projects/self-documented-boot.html">here</a>
<a
href="http://0pointer.de/blog/projects/systemctl-journal.html">is</a>
<a href="http://0pointer.de/blog/projects/security.html">already</a> <a
href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">twentieth</a>
<a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>Socket Activated Internet Services and OS Containers</h4>

<p><a
href="http://0pointer.de/blog/projects/socket-activation.html">Socket</a>
<a
href="http://0pointer.de/blog/projects/socket-activation2.html">Activation</a>
is an important feature of <a
href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a>. When
we <a href="http://0pointer.de/blog/projects/systemd.html">first
announced</a> systemd we already tried to make the point of how great
socket activation is for increasing parallelization and robustness of
socket services, but also for simplifying the dependency logic of the
boot. In this episode I'd like to explain why socket activation is an
important tool for drastically improving how many services and even
containers you can run on a single system with the same resource
usage. Or in other words, how you can drive up the density of customer
sites on a system while spending less on new hardware.</p>

<h5>Socket Activated Internet Services</h5>

<p>First, let's take a step back. What was <i>socket activation</i> again? --
Basically, socket activation simply means that systemd sets up
listening sockets (IP or otherwise) on behalf of your services
(without these running yet), and then starts (<i>activates</i>) the
services as soon as the first connection comes in. Depending on the
technology the services might idle for a while after having processed
the connection and possibly follow-up connections before they exit on
their own, so that systemd will again listen on the sockets and
activate the services again the next time they are connected to. For
the client it is not visible whether the service it is interested in
is currently running or not. The service's IP socket stays continuously
connectable, no connection attempt ever fails, and all connections will
be processed promptly.</p>
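<p>The protocol between systemd and an activated service is quite small. systemd passes the listening sockets as file descriptors starting at 3. It also sets two environment variables: <tt>$LISTEN_PID</tt> (the PID the sockets are meant for) and <tt>$LISTEN_FDS</tt> (how many sockets were passed). The C sketch below parses these by hand, purely to illustrate the convention. <tt>parse_listen_fds()</tt> and <tt>listen_fds_count()</tt> are made-up names. Real code should call <tt>sd_listen_fds(3)</tt> from sd-daemon instead, which also marks the descriptors close-on-exec.</p>

<pre>#include &lt;errno.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;unistd.h&gt;

/* First passed descriptor; sd-daemon.h calls this SD_LISTEN_FDS_START. */
#define LISTEN_FDS_START 3

/* Hypothetical helper: returns the number of passed sockets, 0 if none
 * were passed to the process with PID self, or -1 on malformed values. */
static int parse_listen_fds(const char *pid_str, const char *fds_str, pid_t self) {
        char *end;
        long pid, n;

        if (!pid_str || !fds_str)
                return 0;                       /* not socket activated */

        errno = 0;
        pid = strtol(pid_str, &amp;end, 10);
        if (errno != 0 || end == pid_str || *end != '\0' || pid &lt;= 0)
                return -1;
        if ((pid_t) pid != self)
                return 0;                       /* meant for another process */

        errno = 0;
        n = strtol(fds_str, &amp;end, 10);
        if (errno != 0 || end == fds_str || *end != '\0' || n &lt; 0)
                return -1;

        /* the sockets are fds LISTEN_FDS_START .. LISTEN_FDS_START + n - 1 */
        return (int) n;
}

/* Reads the variables systemd set for the current process. */
static int listen_fds_count(void) {
        return parse_listen_fds(getenv("LISTEN_PID"), getenv("LISTEN_FDS"), getpid());
}</pre>

<p>A daemon would call <tt>listen_fds_count()</tt> at startup. If it returns a positive number, the daemon uses those descriptors instead of creating and binding its own sockets.</p>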

<p>A setup like this lowers resource usage: as services are only
running when needed they only consume resources when required. Many
internet sites and services can benefit from that. For example, web
site hosters will have noticed that of the multitude of web sites that
are on the Internet only a tiny fraction gets a continuous stream of
requests: the huge majority of web sites still needs to be available
all the time but gets requests only very infrequently. With a scheme
like socket activation you can take advantage of this. Hosting many of
these sites on a single system like this and only activating their
services as necessary allows a large degree of over-commit: you can
run more sites on your system than the available resources actually
allow. Of course, one shouldn't over-commit too much, to avoid
contention during peak times.</p>

<p>Socket activation like this is easy to use in systemd. Many modern
Internet daemons already support socket activation out of the box (and
for those which don't yet it's <a
href="http://0pointer.de/blog/projects/socket-activation.html">not</a>
<a
href="http://0pointer.de/blog/projects/socket-activation2.html">hard</a>
to add). Together with systemd's <a
href="http://0pointer.de/blog/projects/instances.html">instantiated
units support</a> it is easy to write a pair of service and socket
templates that then may be instantiated multiple times, once for each
site. Then, (optionally) make use of some of the <a
href="http://0pointer.de/blog/projects/security.html">security
features</a> of systemd to nicely isolate the customer's site's
services from each other (think: each customer's service should only
see the home directory of the customer, everybody else's directories
should be invisible), and there you go: you now have a highly scalable
and reliable server system, that serves a maximum of securely
sandboxed services at a minimum of resources, and all nicely done with
built-in technology of your OS.</p>

<p>This kind of setup is already in production use in a number of
companies. For example, the great folks at <a
href="https://www.getpantheon.com/">Pantheon</a> are running their
scalable instant Drupal system on a setup that is similar to this. (In
fact, Pantheon's David Strauss pioneered this scheme. David, you
rock!)</p>

<h5>Socket Activated OS Containers</h5>

<p>All of the above can already be done with older versions of
systemd. If you use a distribution that is based on systemd, you can
right-away set up a system like the one explained above. But let's
take this one step further. With systemd 197 (to be included in Fedora
19), we added support for socket activating not only individual
services, but <i>entire</i> OS containers. And I really have to say it
at this point: this is stuff I am really excited
about. ;-)</p>

<p>Basically, with socket activated OS containers, the host's systemd
instance will listen on a number of ports on behalf of a container,
for example one for SSH, one for web and one for the database, and as
soon as the first connection comes in, it will spawn the container
this is intended for, and pass to it all three sockets. Inside of the
container, another systemd is running and will accept the sockets and
then distribute them further, to the services running inside the
container using normal socket activation. The SSH, web and database
services will only see the inside of the container, even though they
have been activated by sockets that were originally created on the
host! Again, to the clients this all is not visible. That an entire OS
container is spawned, triggered by a simple network connection, is entirely
transparent to the client side.<sup>[1]</sup></p>

<p>The OS containers may contain (as the name suggests) a full
operating system, that might even be a different distribution than is
running on the host. For example, you could run your host on Fedora,
but run a number of Debian containers inside of it. The OS containers
will have their own systemd init system, their own SSH instances,
their own process tree, and so on, but will share a number of other
facilities (such as memory management) with the host.</p>

<p>For now, only systemd's own trivial container manager, <a
href="http://0pointer.de/blog/projects/changing-roots">systemd-nspawn</a>
has been updated to support this kind of socket activation. We hope
that <a href="http://libvirt.org/drvlxc.html">libvirt-lxc</a> will
soon gain similar functionality. At this point, let's see in more
detail how such a setup is configured in systemd using nspawn:</p>

<p>First, please use a tool such as <tt>debootstrap</tt> or yum's
<tt>--installroot</tt> to set up a container OS
tree<sup>[2]</sup>. The details of that are a bit out of scope
for this story; there's plenty of documentation around on how to do
this. Of course, make sure you have systemd v197 installed inside
the container. For accessing the container from the command line,
consider using <a
href="http://0pointer.de/blog/projects/changing-roots">systemd-nspawn</a>
itself. After you have configured everything properly, try to boot it up
from the command line with systemd-nspawn's <tt>-b</tt> switch.</p>

<p>Assuming you now have a working container that boots up fine, let's
write a service file for it, to turn the container into a systemd
service on the host you can start and stop. Let's create
<tt>/etc/systemd/system/mycontainer.service</tt> on the host:</p>

<pre>
[Unit]
Description=My little container

[Service]
ExecStart=/usr/bin/systemd-nspawn -jbD /srv/mycontainer 3
KillMode=process
</pre>

<p>This service can already be started and stopped via <tt>systemctl
start</tt> and <tt>systemctl stop</tt>. However, there's no nice way
to actually get a shell prompt inside the container. So let's add SSH
to it, and even more: let's configure SSH so that a connection to the
container's SSH port will socket-activate the entire container. First,
let's begin with telling the host that it shall now listen on the SSH
port of the container. Let's create
<tt>/etc/systemd/system/mycontainer.socket</tt> on the host:</p>

<pre>
[Unit]
Description=The SSH socket of my little container

[Socket]
ListenStream=23
</pre>

<p>If we start this unit with <tt>systemctl start</tt> on the host
then it will listen on port 23, and as soon as a connection comes in
it will activate our container service we defined above. We pick port
23 here, instead of the usual 22, as our host's SSH is already
listening on that. nspawn virtualizes the process list and the file
system tree, but does not actually virtualize the network stack, hence
we just pick different ports for the host and the various containers
here.</p>

<p>Of course, the system inside the container doesn't yet know what to
do with the socket it gets passed due to socket activation. If you'd
now try to connect to the port, the container would start-up but the
incoming connection would be immediately closed since the container
can't handle it yet. Let's fix that!</p>

<p>All that's necessary for that is to teach the SSH server inside the container
about socket activation. For that let's simply write a pair of socket and
service units for SSH. Let's create
<tt>/etc/systemd/system/sshd.socket</tt> in the container:</p>

<pre>[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=23
Accept=yes</pre>

<p>Then, let's add the matching SSH service file
<tt>/etc/systemd/system/sshd@.service</tt> in the container:</p>

<pre>[Unit]
Description=SSH Per-Connection Server for %I

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket</pre>

<p>Then, make sure to hook <tt>sshd.socket</tt> into the
<tt>sockets.target</tt> so that unit is started automatically when the
container boots up:</p>

<pre>ln -s /etc/systemd/system/sshd.socket /etc/systemd/system/sockets.target.wants/</pre>

<p>And that's it. If we now activate <tt>mycontainer.socket</tt> on
the host, the host's systemd will bind the socket and we can connect
to it. If we do this, the host's systemd will activate the container,
and pass the socket in to it. The container's systemd will then take
the socket and match it up with <tt>sshd.socket</tt> inside the
container. As there's still our incoming connection queued on it, it
will then immediately trigger an instance of <tt>sshd@.service</tt>,
and we'll have our login.</p>
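<p>With <tt>Accept=yes</tt>, systemd accepts each incoming connection itself and spawns one <tt>sshd@.service</tt> instance per connection, handing it the connected socket. <tt>StandardInput=socket</tt> makes that socket the instance's standard input, and in this setup standard output ends up on the same socket too. This is the classic inetd convention that <tt>sshd -i</tt> expects. As a rough sketch of what such a per-connection server boils down to, here is a trivial echo service in C that only talks to file descriptors 0 and 1. <tt>echo_stream()</tt> is a made-up name, not part of any systemd API:</p>

<pre>#include &lt;errno.h&gt;
#include &lt;unistd.h&gt;

/* Copies everything read from in_fd to out_fd until EOF.
 * Returns 0 on success, -1 on error. */
static int echo_stream(int in_fd, int out_fd) {
        char buf[4096];

        for (;;) {
                ssize_t n = read(in_fd, buf, sizeof buf);
                if (n == 0)
                        return 0;               /* peer closed the connection */
                if (n &lt; 0) {
                        if (errno == EINTR)
                                continue;
                        return -1;
                }

                /* write() may accept less than requested, so loop */
                for (ssize_t off = 0; off &lt; n; ) {
                        ssize_t w = write(out_fd, buf + off, (size_t) (n - off));
                        if (w &lt; 0) {
                                if (errno == EINTR)
                                        continue;
                                return -1;
                        }
                        off += w;
                }
        }
}</pre>

<p>The service's <tt>main()</tt> would then just be <tt>return echo_stream(STDIN_FILENO, STDOUT_FILENO) == 0 ? 0 : 1;</tt>. It would be paired with a socket unit using <tt>Accept=yes</tt> and a service template using <tt>StandardInput=socket</tt>, just like the SSH units above. Since the socket is plain stdin/stdout, the same binary can also be tried out from a terminal.</p>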

<p>And that's already everything there is to it. You can easily add
more sockets for
<tt>mycontainer.socket</tt> to listen on. Everything listed therein will be passed
to the container on activation, and will be matched up as well as
possible with all socket units configured inside the
container. Sockets that cannot be matched up will be closed, and
sockets that aren't passed in but are configured for listening will be
bound by the container's systemd instance.</p>
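<p>How does the container's systemd know which passed socket belongs to which socket unit? It compares what each descriptor is bound to with the listening addresses configured in its own socket units. Below is a much simplified C sketch of the inspection half of that. It queries the local port of a passed IP socket and checks whether it is a listening stream socket. The helper names are made up. systemd's real matching also takes socket type, address family and <tt>AF_UNIX</tt> paths into account:</p>

<pre>#include &lt;arpa/inet.h&gt;
#include &lt;netinet/in.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;sys/types.h&gt;

/* Returns the local port an IPv4/IPv6 socket is bound to,
 * or -1 if fd is not an IP socket (or not a socket at all). */
static int socket_local_port(int fd) {
        struct sockaddr_storage ss;
        socklen_t len = sizeof ss;

        if (getsockname(fd, (struct sockaddr *) &amp;ss, &amp;len) &lt; 0)
                return -1;
        if (ss.ss_family == AF_INET)
                return ntohs(((struct sockaddr_in *) &amp;ss)-&gt;sin_port);
        if (ss.ss_family == AF_INET6)
                return ntohs(((struct sockaddr_in6 *) &amp;ss)-&gt;sin6_port);
        return -1;
}

/* Returns 1 if fd is a SOCK_STREAM socket in listening state, 0 otherwise. */
static int is_listening_stream(int fd) {
        int type = 0, listening = 0;
        socklen_t len = sizeof type;

        if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &amp;type, &amp;len) &lt; 0 || type != SOCK_STREAM)
                return 0;
        len = sizeof listening;
        if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &amp;listening, &amp;len) &lt; 0)
                return 0;
        return listening != 0;
}</pre>

<p>In spirit, this is how the container's port 23 socket ends up at <tt>sshd.socket</tt>: loop over the passed descriptors (starting at 3) and compare <tt>socket_local_port()</tt> with the port in each socket unit's <tt>ListenStream=</tt>.</p>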

<p>So, let's take a step back again. What did we gain through all of
this? Well, basically, we can now offer a number of full OS containers
on a single host, and the containers can offer their services without
running continuously. The density of OS containers on the host can
hence be increased drastically.</p>

<p>Of course, this only works for same-kernel virtualization (containers), not for
hardware virtualization, i.e. something like this can only be
implemented on systems such as libvirt-lxc or nspawn, but not in
qemu/kvm.</p>

<p>If you have a number of containers set up like this, here's one
cool thing the journal allows you to do. If you pass <tt>-m</tt> to
<tt>journalctl</tt> on the host, it will automatically discover the
journals of all local containers and interleave them on
display. Nifty, eh?</p>

<p>With systemd 197 you have everything to set up your own socket
activated OS containers on-board. However, there are a couple of
improvements we're likely to add soon: for example, right now even if
all services inside the container exit on idle, the container still
will stay around, and we really should make it exit on idle too, if
all its services exited and no logins are around. As it turns out we
already have much of the infrastructure for this around: we can reuse
the auto-suspend functionality we added for laptops: detecting when a
laptop is idle and suspending it then is a very similar problem to
detecting when a container is idle and shutting it down then.</p>
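<p>For an individual service, exiting on idle is easy to implement. Wait for activity on the listening socket with a timeout, and exit once the timeout expires. As described above, systemd keeps listening on the socket in the meantime and activates the service again on the next connection. Below is a minimal C sketch using <tt>poll(2)</tt>. The helper names and the caller-supplied <tt>handle()</tt> callback are assumptions made for illustration:</p>

<pre>#include &lt;errno.h&gt;
#include &lt;poll.h&gt;
#include &lt;stddef.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;unistd.h&gt;

/* Waits until fd becomes readable (for a listening socket: a connection
 * is pending) or idle_ms milliseconds pass without activity.
 * Returns 1 on activity, 0 on idle timeout, -1 on error. */
static int wait_for_activity(int fd, int idle_ms) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        for (;;) {
                int r = poll(&amp;pfd, 1, idle_ms);
                if (r &lt; 0 &amp;&amp; errno == EINTR)
                        continue;               /* signal: retry, restarting the timeout */
                if (r &lt; 0)
                        return -1;
                return r &gt; 0;                   /* 0 means the timeout expired */
        }
}

/* Serves connections until the socket has been idle for idle_ms, then
 * returns 0 so the process can exit and leave listening to systemd.
 * Returns -1 on error. */
static int serve_until_idle(int listen_fd, int idle_ms, void (*handle)(int conn_fd)) {
        for (;;) {
                int r = wait_for_activity(listen_fd, idle_ms);
                if (r &lt;= 0)
                        return r;
                int conn = accept(listen_fd, NULL, NULL);
                if (conn &lt; 0)
                        continue;               /* e.g. the client already gave up */
                handle(conn);
                close(conn);
        }
}</pre>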

<p>Anyway, this blog story is already way too long. I hope I haven't
lost you half-way already with all this talk of virtualization,
sockets, services, different OSes and stuff. I hope this blog story is
a good starting point for setting up powerful highly scalable server
systems. If you want to know more, consult the documentation and drop
by our IRC channel. Thank you!</p>

<p><small><b>Footnotes</b></small></p>

<p><small>[1] And BTW, <a
href="https://plus.google.com/115547683951727699051/posts/cVrLAJ8HYaP">this
is another reason</a> why fast boot times the way systemd offers them
are actually a really good thing on servers, too.</small></p>

<p><small>[2] To make it easy: you need a command line such as <tt>yum
--releasever=19 --nogpg --installroot=/srv/mycontainer/ --disablerepo='*'
--enablerepo=fedora install systemd passwd yum fedora-release vim-minimal </tt>
to install Fedora, and <tt>debootstrap --arch=amd64 unstable
/srv/mycontainer/</tt> to install Debian. Also see the bottom of <a
href="http://www.freedesktop.org/software/systemd/man/systemd-nspawn.html">systemd-nspawn(1)</a>.
Also note that auditing is currently broken for containers, and if enabled in
the kernel will cause all kinds of errors in the container. Use
<tt>audit=0</tt> on the host's kernel command line to turn it off.</small></p>

]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XIX</title>
  <link>http://0pointer.de/blog/projects/detect-virt.html</link>
  <description><![CDATA[

<p> <a
href="http://0pointer.de/blog/projects/resources.html">Happy</a> <a
href="http://0pointer.de/blog/projects/journalctl.html">new</a> <a
href="http://0pointer.de/blog/projects/serial-console.html">year</a>
<a href="http://0pointer.de/blog/projects/watchdog.html">2013!</a> <a
href="http://0pointer.de/blog/projects/self-documented-boot.html">Here</a>
<a
href="http://0pointer.de/blog/projects/systemctl-journal.html">is</a>
<a href="http://0pointer.de/blog/projects/security.html">now</a> <a
href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">nineteenth</a>
<a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>Detecting Virtualization</h4>

<p>When we started working on <a
href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a>
we had a closer look at what the various existing init scripts used on
Linux were actually doing. Among other things we noticed that a
number of them were explicitly checking whether they were running in
a virtualized environment (i.e. in a kvm, VMWare, LXC guest or
suchlike) or not. Some init scripts disabled themselves in such
cases<sup>[1]</sup>, others enabled themselves only in such
cases<sup>[2]</sup>. Frequently, it would probably have been a better
idea to check for other conditions rather than explicitly checking for
virtualization, but after looking at this from all sides we came to
the conclusion that in many cases explicitly conditionalizing services
based on detected virtualization is a valid thing to do. As a result
we added a new configuration option to systemd that can be used to
conditionalize services this way: <a
href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html"><tt>ConditionVirtualization</tt></a>;
we also added a small tool that can be used in shell scripts to detect
virtualization: <a
href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"><tt>systemd-detect-virt(1)</tt></a>;
and finally, we added a minimal bus interface to query this from other
applications.</p>

<p>Detecting whether your code is run inside a virtualized environment
<a
href="http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/virt.c#n30">is
actually not that hard</a>. Depending on what precisely you want to
detect it's little more than running the CPUID instruction and maybe
checking a few files in <tt>/sys</tt> and <tt>/proc</tt>. The
complexity is mostly about knowing the strings to look for, and
keeping this list up-to-date. Currently, the virtualization
detection code in systemd can detect the following virtualization
systems:</p>

<ul><li><p>Hardware virtualization (i.e. VMs):</p>
<ul><li>qemu</li>
<li>kvm</li>
<li>vmware</li>
<li>microsoft</li>
<li>oracle</li>
<li>xen</li>
<li>bochs</li>
</ul></li>
<li><p>Same-kernel virtualization (i.e. containers):</p>
<ul><li>chroot</li>
<li>openvz</li>
<li>lxc</li>
<li>lxc-libvirt</li>
<li><a href="http://0pointer.de/blog/projects/changing-roots">systemd-nspawn</a></li>
</ul></li></ul>
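<p>Here's a stripped-down C sketch of the CPUID part. It checks the "hypervisor present" bit (bit 31 of ECX for CPUID leaf 1) and then reads the 12-byte vendor signature from leaf 0x40000000. The sketch has several limits. It only covers x86, knows just four signatures, and skips the DMI and <tt>/proc</tt> checks, so it is no substitute for <tt>systemd-detect-virt</tt>. The function names are made up, and <tt>&lt;cpuid.h&gt;</tt> is specific to GCC and Clang:</p>

<pre>#include &lt;stddef.h&gt;
#include &lt;string.h&gt;
#if defined(__i386__) || defined(__x86_64__)
#include &lt;cpuid.h&gt;
#endif

/* Maps a 12-byte CPUID hypervisor vendor signature to a short ID,
 * or NULL if unknown. Only a few well-known signatures are listed. */
static const char *hypervisor_id_from_signature(const char *sig) {
        static const struct {
                const char sig[13];     /* 12 signature bytes + terminating NUL */
                const char *id;
        } table[] = {
                { "KVMKVMKVM\0\0\0", "kvm"       },
                { "VMwareVMware",    "vmware"    },
                { "Microsoft Hv",    "microsoft" },
                { "XenVMMXenVMM",    "xen"       },
        };

        for (size_t i = 0; i &lt; sizeof table / sizeof table[0]; i++)
                if (memcmp(sig, table[i].sig, 12) == 0)
                        return table[i].id;
        return NULL;
}

/* Returns 1 and fills sig[0..11] if the CPU reports that a hypervisor is
 * present (CPUID leaf 1, ECX bit 31), 0 otherwise. Always 0 on non-x86. */
static int cpuid_hypervisor_signature(char *sig) {
#if defined(__i386__) || defined(__x86_64__)
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &amp;eax, &amp;ebx, &amp;ecx, &amp;edx) || !(ecx &amp; (1u &lt;&lt; 31)))
                return 0;

        /* Leaf 0x40000000 returns the vendor signature in EBX, ECX, EDX */
        __cpuid(0x40000000, eax, ebx, ecx, edx);
        memcpy(sig, &amp;ebx, 4);
        memcpy(sig + 4, &amp;ecx, 4);
        memcpy(sig + 8, &amp;edx, 4);
        return 1;
#else
        (void) sig;
        return 0;
#endif
}</pre>

<p>To combine the two, a caller would do something like <tt>if (cpuid_hypervisor_signature(sig)) id = hypervisor_id_from_signature(sig);</tt>. A <tt>NULL</tt> result would then mean "some hypervisor we don't know by name".</p>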

<p>Let's have a look at how one may make use of this functionality.</p>

<h5>Conditionalizing Units</h5>

<p>Adding <a
href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html"><tt>ConditionVirtualization</tt></a>
to the <tt>[Unit]</tt> section of a unit file is enough to
conditionalize it depending on which virtualization is used or whether
one is used at all. Here's an example:</p>

<pre>[Unit]
Description=My Foobar Service (runs only on guests)
ConditionVirtualization=yes

[Service]
ExecStart=/usr/bin/foobard</pre>

<p>Instead of specifiying "<tt>yes</tt>" or "<tt>no</tt>" it is possible
to specify the ID of a specific virtualization solution (Example:
"<tt>kvm</tt>", "<tt>vmware</tt>", ...), or either
"<tt>container</tt>" or "<tt>vm</tt>" to check whether the kernel is
virtualized or the hardware. Also, checks can be prefixed with an exclamation mark ("!") to invert a check. For further details see the <a
href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html">manual page</a>.</p>

<h5>In Shell Scripts</h5>

<p>In shell scripts it is easy to check for virtualized systems with
the <a
href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"><tt>systemd-detect-virt(1)</tt></a>
tool. Here's an example:</p>

<pre>
if systemd-detect-virt -q ; then
        echo "Virtualization is used:" `systemd-detect-virt`
else
        echo "No virtualization is used."
fi</pre>

<p>If this tool is run it will return with an exit code of zero
(success) if a virtualization solution has been found, non-zero
otherwise. It will also print a short identifier of the used
virtualization solution, which can be suppressed with
<tt>-q</tt>. Also, with the <tt>-c</tt> and <tt>-v</tt> parameters it is
possible to detect only container (same-kernel) or only hardware virtualization
environments. For further details see the <a
href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html">manual
page</a>.</p>
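<p>The same check is easy to do from C by running the tool and looking at its exit status and output. Below is a rough sketch using <tt>popen(3)</tt>. <tt>run_first_line()</tt> is a made-up helper. For real programs, the bus interface described below is usually the nicer option:</p>

<pre>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/wait.h&gt;

/* Runs cmd through the shell, stores the first line of its output
 * (without the newline) in buf and returns the command's exit status,
 * or -1 if it could not be run or did not exit normally. */
static int run_first_line(const char *cmd, char *buf, size_t size) {
        FILE *f = popen(cmd, "r");
        int c, status;

        if (!f)
                return -1;

        buf[0] = '\0';
        if (fgets(buf, (int) size, f))
                buf[strcspn(buf, "\n")] = '\0';

        /* Drain remaining output so the child can't get SIGPIPE on exit */
        while ((c = fgetc(f)) != EOF)
                ;

        status = pclose(f);
        if (status == -1 || !WIFEXITED(status))
                return -1;
        return WEXITSTATUS(status);
}</pre>

<p>For example, if <tt>run_first_line("systemd-detect-virt", id, sizeof id)</tt> returns 0, some virtualization was detected and <tt>id</tt> holds its identifier. This follows the exit code convention described above.</p>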

<h5>In Programs</h5>

<p>Which virtualization solution is used, if any, is also exported on the system bus:</p>

<pre>$ gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1 --method org.freedesktop.DBus.Properties.Get org.freedesktop.systemd1.Manager Virtualization
(&lt;'systemd-nspawn'&gt;,)</pre>

<p>This property contains the empty string if no virtualization is
detected. Note that some container environments cannot be detected
directly from unprivileged code. That's why we expose this property on
the bus rather than providing a library -- the bus implicitly solves
the privilege problem quite nicely.</p>

<p>Note that all of this will only ever detect and return information
about the "inner-most" virtualization solution. If you stack
virtualization ("We must go deeper!") then these interfaces will
expose the one the code is most directly interfacing
with. Specifically that means that if a container solution is used
inside of a VM, then only the container is generally detected and
returned.</p>

<p><small><b>Footnotes</b></small></p>

<p><small>[1] For example: running certain device management services in a
container environment that has no access to any physical hardware makes little sense.</small></p>

<p><small>[2] For example: some VM solutions work best if certain
vendor-specific userspace components are running that connect the
guest with the host in some way.</small></p>

]]></description>
</item>

<item>
  <title>Third Berlin Open Source Meetup</title>
  <link>http://0pointer.de/blog/projects/berlin-open-source-meetup-3.html</link>
  <description><![CDATA[

<p>The Third <a href="https://plus.google.com/u/0/events/c3f3a8go99cn72n8rsosbj7djks">Berlin Open Source Meetup</a> is going to take place on Sunday, January 20th. You are invited!</p>

<p>It's a public event, so everybody is welcome, and please feel free to invite others!</p>

]]></description>
</item>

<item>
  <title>foss.in Needs Your Funding!</title>
  <link>http://0pointer.de/blog/projects/fossin2012-2.html</link>
  <description><![CDATA[

<p>One of the most exciting conferences in the Free Software world, <a
href="http://foss.in/">foss.in</a> in Bangalore, India has <a
href="http://atulchitnis.net/2012/sponsoring-foss-in/">trouble finding enough
sponsorship</a> for this year's edition. <a
href="http://foss.in/2012/take-one-speakers-at-foss-in2012">Many speakers from
all around the Free Software world</a> (including yours truly) have signed up
to present at the event, and the conference would appreciate any corporate
funding they can get!</p>

<p><a href="http://atulchitnis.net/2012/sponsoring-foss-in/">Please check if
your company can help</a> and <a href="http://foss.in/sponsors">contact the
organizers</a> for details!</p>

<p>See you in Bangalore!</p>

<p><a href="http://foss.in"><img src="http://foss.in/wp-content/uploads/2008/11/speaking_250px.jpg" alt="FOSS.IN" width="250" height="250" border="0" /></a></p>

]]></description>
</item>

<item>
  <title>systemd for Developers III</title>
  <link>http://0pointer.de/blog/projects/journal-submit.html</link>
  <description><![CDATA[

<p>Here's the third episode of <a href="http://0pointer.de/blog/projects/socket-activation.html">of my</a>
<a href="http://0pointer.de/blog/projects/socket-activation2.html"><i>systemd for Developers</i></a> series.</p>

<h4>Logging to the Journal</h4>

<p>In a <a
href="http://0pointer.de/blog/projects/journalctl.html">recent blog
story</a> intended for administrators I shed some light on how to use
the <a
href="http://www.freedesktop.org/software/systemd/man/journalctl.html">journalctl(1)</a>
tool to browse and search the systemd journal. In this blog story for developers
I want to explain a little how to get log data into the <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>
Journal in the first place.</p>

<p>The good thing is that getting log data into the Journal is not
particularly hard, since there's a good chance the Journal already
collects it anyway and writes it to disk. The journal collects:</p>

<ol>
<li>All data logged via libc <tt>syslog()</tt></li>
<li>The data from the kernel logged with <tt>printk()</tt></li>
<li>Everything written to STDOUT/STDERR of any system service</li>
</ol>

<p>This covers pretty much all of the traditional log output of a
Linux system, including messages from the kernel initialization phase,
the initial RAM disk, the early boot logic, and the main system
runtime.</p>

<h4>syslog()</h4>

<p>Let's have a quick look at how <tt>syslog()</tt> is used. Let's
write a journal message using this call:</p>

<pre>#include &lt;syslog.h&gt;

int main(int argc, char *argv[]) {
        syslog(LOG_NOTICE, "Hello World");
        return 0;
}</pre>

<p>This is C code, of course. Many higher level languages provide APIs
that allow writing local syslog messages. Regardless which language
you choose, all data written like this ends up in the Journal.</p>

<p>Let's have a look at how this looks after it has been written into the
journal (this is the <a
href="http://www.freedesktop.org/wiki/Software/systemd/json">JSON
output</a> <tt>journalctl -o json-pretty</tt> generates):</p>

<pre>{
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "_TRANSPORT" : "syslog",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "SYSLOG_FACILITY" : "1",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "_PID" : "3068",
        "SYSLOG_IDENTIFIER" : "test-journal-submit",
        "MESSAGE" : "Hello World!",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351126905014938"
}</pre>

<p>This nicely shows how the Journal implicitly augmented our little
log message with various metadata fields which describe in more
detail the context our message was generated from. For an explanation
of the various fields, please refer to <a
href="http://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html">systemd.journal-fields(7)</a>.</p>

<h4>printf()</h4>

<p>If you are writing code that is run as a systemd service, generating journal
messages is even easier:</p>

<pre>#include &lt;stdio.h&gt;

int main(int argc, char *argv[]) {
        printf("Hello World\n");
        return 0;
}</pre>

<p>Yupp, that's easy, indeed.</p>

<p>The printed string in this example is logged at a default log
priority of LOG_INFO<sup>[1]</sup>. Sometimes it is useful to change
the log priority for such a printed string. When systemd parses
STDOUT/STDERR of a service it will look for priority values enclosed
in &lt; &gt; at the beginning of each line<sup>[2]</sup>, following the scheme
used by the kernel's <tt>printk()</tt> which in turn took
inspiration from the BSD syslog network serialization of messages. We
can make use of this systemd feature like this:</p>

<pre>#include &lt;stdio.h&gt;

#define PREFIX_NOTICE "&lt;5&gt;"

int main(int argc, char *argv[]) {
        printf(PREFIX_NOTICE "Hello World\n");
        return 0;
}</pre>

<p>Nice! Logging with nothing but <tt>printf()</tt> but we still get
log priorities!</p>

<p>This scheme works with any programming language, including, of course, shell:</p>

<pre>#!/bin/bash

echo "&lt;5&gt;Hellow world"</pre>

<h4>Native Messages</h4>

<p>Now, what I explained above is not particularly exciting: the
take-away is pretty much only that things end up in the journal if
they are output using the traditional message printing APIs. Yaaawn!</p>

<p>Let's make this more interesting, let's look at what the Journal
provides as native APIs for logging, and let's see what its benefits
are. Let's translate our little example into the 1:1 counterpart
using the Journal's logging API <a
href="http://0pointer.de/public/systemd-man/sd_journal_print.html"><tt>sd_journal_print(3)</tt></a>:</p>

<pre>#include &lt;systemd/sd-journal.h&gt;

int main(int argc, char *argv[]) {
        sd_journal_print(LOG_NOTICE, "Hello World");
        return 0;
}</pre>

<p>This doesn't look much more interesting than the two examples
above, right? After compiling this with <tt>`pkg-config --cflags
--libs libsystemd-journal`</tt> appended to the compiler parameters,
let's have a closer look at the JSON representation of the journal
entry this generates:</p>

<pre>{
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
<b>        "CODE_FUNC" : "main",</b>
        "_TRANSPORT" : "journal",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
<b>        "CODE_FILE" : "src/journal/test-journal-submit.c",</b>
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "MESSAGE" : "Hello World",
<b>        "CODE_LINE" : "4",</b>
        "_PID" : "3516",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351128226954170"
}</pre>

<p>This looks pretty much the same, right? Almost! I highlighted three new
lines compared to the earlier output. Yes, you guessed it, by using
<tt>sd_journal_print()</tt> meta information about the generating
source code location is implicitly appended to each
message<sup>[3]</sup>, which is helpful for a developer to identify
the source of a problem.</p>

<p>The primary reason for using the Journal's native logging APIs is
not just the source code location, however: it is to allow passing
additional structured log data from the program into the
journal. This additional log data may then be used to search the
journal, is available for consumption by other programs, and might
help the administrator track down issues beyond what is expressed in
the human-readable message text. Here's an example of how to do that
with <tt>sd_journal_send()</tt>:</p>

<pre>#include &lt;systemd/sd-journal.h&gt;
#include &lt;unistd.h&gt;
#include &lt;stdlib.h&gt;

int main(int argc, char *argv[]) {
        sd_journal_send("MESSAGE=Hello World!",
                        "MESSAGE_ID=52fb62f99e2c49d89cfbf9d6de5e3555",
                        "PRIORITY=5",
                        "HOME=%s", getenv("HOME"),
                        "TERM=%s", getenv("TERM"),
                        "PAGE_SIZE=%li", sysconf(_SC_PAGESIZE),
                        "N_CPUS=%li", sysconf(_SC_NPROCESSORS_ONLN),
                        NULL);
        return 0;
}</pre>

<p>This will write a log message to the journal much like the earlier
examples. However, this time a few additional, structured fields are
attached:</p>

<pre>{
        "__CURSOR" : "s=ac9e9c423355411d87bf0ba1a9b424e8;i=5930;b=5335e9cf5d954633bb99aefc0ec38c25;m=16544f875b;t=4ccd863cdc4f0;x=896defe53cc1a96a",
        "__REALTIME_TIMESTAMP" : "1351129666274544",
        "__MONOTONIC_TIMESTAMP" : "95903778651",
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "CODE_FUNC" : "main",
        "_TRANSPORT" : "journal",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "CODE_FILE" : "src/journal/test-journal-submit.c",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "MESSAGE" : "Hello World!",
        "_PID" : "4049",
        "CODE_LINE" : "6",
<b>        "MESSAGE_ID" : "52fb62f99e2c49d89cfbf9d6de5e3555",</b>
<b>        "HOME" : "/home/lennart",</b>
<b>        "TERM" : "xterm-256color",</b>
<b>        "PAGE_SIZE" : "4096",</b>
<b>        "N_CPUS" : "4",</b>
        "_SOURCE_REALTIME_TIMESTAMP" : "1351129666241467"
}</pre>

<p>Awesome! Our simple example worked! The five meta data fields we
attached to our message appeared in the journal. We used <a
href="http://0pointer.de/public/systemd-man/sd_journal_print.html"><tt>sd_journal_send()</tt></a>
for this, which works much like <tt>sd_journal_print()</tt> but takes a
NULL-terminated list of format strings, each followed by its
arguments. Each format string must begin with the field name followed
by '=', and then the value.</p>
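<p>The next sketch illustrates the naming rules with a checker for a single <tt>FIELD=value</tt> string. It assumes the rules stated in <tt>sd_journal_print(3)</tt>: field names consist only of uppercase letters, digits and underscores, and must not begin with an underscore. Such names are reserved for the trusted fields the journal adds itself, like <tt>_PID</tt> above. The function names are hypothetical, and this is not part of the Journal's API.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: return true if the n bytes at name form a valid
 * user-supplied field name: uppercase letters, digits and underscores
 * only, and no leading underscore (reserved for trusted fields). */
bool journal_field_name_valid(const char *name, size_t n) {
        if (n == 0 || name[0] == '_')
                return false;
        for (size_t i = 0; i < n; i++) {
                char c = name[i];
                bool ok = (c >= 'A' && c <= 'Z') ||
                          (c >= '0' && c <= '9') ||
                          c == '_';
                if (!ok)
                        return false;
        }
        return true;
}

/* Hypothetical: check one already-expanded "FIELD=value" string, i.e.
 * what a format string passed to sd_journal_send() turns into after
 * printf-style expansion. Only the part before the first '=' is the
 * field name; the value may contain further '=' characters. */
bool journal_assignment_valid(const char *s) {
        const char *eq = strchr(s, '=');
        if (eq == NULL)
                return false;
        return journal_field_name_valid(s, (size_t) (eq - s));
}
```

<p>This only mirrors the documented naming rule. The journal may apply further limits (for example on name length) that this sketch does not check.</p>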

<p>Our little structured message included seven fields. The first three we passed are well-known fields:</p>

<ol>
<li><tt>MESSAGE=</tt> is the actual human readable message part of the structured message.</li>
<li><tt>PRIORITY=</tt> is the numeric message priority value as known from BSD syslog, formatted as an integer string.</li>
<li><tt>MESSAGE_ID=</tt> is a 128-bit ID that identifies our specific
message call, formatted as a hexadecimal string. We randomly generated
this string with <tt>journalctl --new-id128</tt>. Applications can use it
to track down all occurrences of this specific
message. The 128-bit ID can be a UUID, but this is neither required nor enforced.</li></ol>
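<p>A message ID is simply 128 bits printed as 32 hexadecimal digits. The sketch below shows that encoding and a matching validity check. It assumes lowercase output like the ID used above, and it leaves out obtaining the 16 random bytes. In practice, <tt>journalctl --new-id128</tt> does all of this for you, and libsystemd offers <tt>sd_id128_randomize()</tt> and <tt>sd_id128_to_string()</tt> for doing it from C. The two helper names below are hypothetical.</p>

```c
#include <stddef.h>

/* Hypothetical: encode 16 raw bytes as 32 lowercase hexadecimal digits
 * plus a terminating NUL, the textual form used for MESSAGE_ID=. */
void id128_to_hex(const unsigned char bytes[16], char out[33]) {
        static const char digits[] = "0123456789abcdef";
        for (size_t i = 0; i < 16; i++) {
                out[2 * i]     = digits[bytes[i] >> 4];
                out[2 * i + 1] = digits[bytes[i] & 0x0f];
        }
        out[32] = '\0';
}

/* Hypothetical: return 1 if s consists of exactly 32 hex digits, else 0. */
int id128_hex_valid(const char *s) {
        size_t i;
        for (i = 0; s[i] != '\0'; i++) {
                char c = s[i];
                int is_hex = (c >= '0' && c <= '9') ||
                             (c >= 'a' && c <= 'f') ||
                             (c >= 'A' && c <= 'F');
                if (!is_hex || i >= 32)
                        return 0;
        }
        return i == 32;
}
```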

<p>Applications may relatively freely define additional fields as they
see fit (we defined four pretty arbitrary ones in our example). A
complete list of the currently well-known fields is available in <a
href="http://0pointer.de/public/systemd-man/systemd.journal-fields.html">systemd.journal-fields(7)</a>.</p>

<p>Let's see how the message ID helps us find this message and all
its occurrences in the journal:</p>

<pre>
$ journalctl MESSAGE_ID=52fb62f99e2c49d89cfbf9d6de5e3555
-- Logs begin at Thu, 2012-10-18 04:07:03 CEST, end at Thu, 2012-10-25 04:48:21 CEST. --
Oct 25 03:47:46 epsilon test-journal-se[4049]: Hello World!
Oct 25 04:40:36 epsilon test-journal-se[4480]: Hello World!
</pre>

<p>Seems I already invoked this example tool twice!</p>

<p>Many messages systemd itself generates <a
href="http://cgit.freedesktop.org/systemd/systemd/plain/src/systemd/sd-messages.h">have
message IDs</a>. This is useful, for example, to find all occasions
where a program dumped core (<tt>journalctl
MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1</tt>), or when a user
logged in (<tt>journalctl
MESSAGE_ID=8d45620c1a4348dbb17410da57c60c66</tt>). If your application
generates a message that might be interesting to recognize in the
journal stream later on, we recommend attaching such a message ID to
it. You can easily allocate a new one for your message with <tt>journalctl
--new-id128</tt>.</p>

<p>This example shows how we can use the Journal's native APIs to
generate structured, recognizable messages. You can do much more than
this with the C API. For example, you may store binary data in journal
fields as well, which is useful to attach coredumps or hard disk SMART
states to events where this applies. To keep this blog story from
getting even longer, we won't go into detail about how to do this
here; please check out <a
href="http://0pointer.de/public/systemd-man/sd_journal_print.html"><tt>sd_journal_send(3)</tt></a>
for further information.</p>

<h4>Python</h4>

<p>The examples above focus on C. Structured logging to the Journal is
also available from other languages. Along with systemd itself we ship
bindings for Python. Here's an example of how to use them:</p>

<pre>from systemd import journal
journal.send('Hello world')
journal.send('Hello, again, world', FIELD2='Greetings!', FIELD3='Guten tag')</pre>

<p>Other bindings exist for <a
href="http://fourkitchens.com/blog/2012/09/25/nodejs-extension-systemd">Node.js</a>,
<a href="https://github.com/systemd/php-systemd">PHP</a>, and <a
href="https://github.com/philips/luvit-systemd-journal">Lua</a>.</p>

<h4>Portability</h4>

<p>Generating structured data is a very useful feature for services to
make their logs more accessible both for administrators and other
programs. In addition to the <i>implicit</i> structure the Journal
adds to all logged messages, it is highly beneficial if the various
components of our stack also provide <i>explicit</i> structure
in their messages, coming from within the processes themselves.</p>

<p>Porting an existing program to the Journal's logging APIs comes
with one pitfall though: the Journal is Linux-only. If non-Linux
portability matters for your project it's a good idea to provide an
alternative log output, and make it selectable at compile-time.</p>
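<p>One way to make this selectable at compile time is a thin wrapper that calls into the Journal only when the build system says it is available. In this sketch, <tt>HAVE_SD_JOURNAL</tt> is a hypothetical macro your build system would define on Linux builds with libsystemd. Without it, the wrapper falls back to writing a <tt>&lt;5&gt;</tt>-prefixed line to a stream of your choice. Falling back to <tt>syslog(3)</tt> would be an equally reasonable choice. The wrapper's name is hypothetical, too.</p>

```c
#include <stdio.h>
#include <syslog.h>

#ifdef HAVE_SD_JOURNAL            /* hypothetical build-system macro */
#include <systemd/sd-journal.h>
#endif

/* Hypothetical wrapper: log msg at notice priority. With HAVE_SD_JOURNAL
 * defined this goes to the journal; otherwise a "<5>"-prefixed line is
 * written to the fallback stream. Returns 0 on success, negative on
 * failure. */
int log_notice_portable(FILE *fallback, const char *msg) {
#ifdef HAVE_SD_JOURNAL
        (void) fallback;
        return sd_journal_print(LOG_NOTICE, "%s", msg);
#else
        if (fprintf(fallback, "<%d>%s\n", LOG_NOTICE, msg) < 0)
                return -1;
        return fflush(fallback) == 0 ? 0 : -1;
#endif
}
```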

<p>Regardless of which way you choose to log, in all cases we'll forward
the message to a classic syslog daemon running side-by-side with the
Journal, if there is one. However, much of the structured meta data of
the message is not forwarded, since the classic syslog protocol simply
has no generally accepted way to encode it, and we shouldn't attempt
to serialize meta data into classic syslog messages, which might turn
<tt>/var/log/messages</tt> into an unreadable dump of machine
data. Anyway, to summarize: regardless of whether you log with
<tt>syslog()</tt>, <tt>printf()</tt>, <tt>sd_journal_print()</tt> or
<tt>sd_journal_send()</tt>, the message will be stored and indexed by
the journal and it will also be forwarded to classic syslog.</p>

<p>And that's it for today. In a follow-up episode we'll focus on
retrieving messages from the Journal using the C API, possibly
filtering for a specific subset of messages. Later on, I hope to give
a real-life example of how to port an existing service to the Journal's
logging APIs. Stay tuned!</p>

<p><small><b>Footnotes</b></small></p>

<p><small>[1] This can be changed with the <tt>SyslogLevel=</tt> service
setting. See <a
href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a>
for details.</small></p>

<p><small>[2] Interpretation of the &lt; &gt; prefixes of logged lines
may be disabled with the <tt>SyslogLevelPrefix=</tt> service setting. See <a
href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a>
for details.</small></p>

<p><small>[3] Appending the code location to the log messages can be
turned off at compile time by defining
<tt>SD_JOURNAL_SUPPRESS_LOCATION</tt> (e.g. with <tt>-DSD_JOURNAL_SUPPRESS_LOCATION</tt>).</small></p>

]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XVIII</title>
  <link>http://0pointer.de/blog/projects/resources.html</link>
  <description><![CDATA[

<p><a href="http://0pointer.de/blog/projects/journalctl.html">Hot
on</a> <a
href="http://0pointer.de/blog/projects/serial-console.html">the
heels</a> <a href="http://0pointer.de/blog/projects/watchdog.html">of
the </a> <a
href="http://0pointer.de/blog/projects/self-documented-boot.html">previous
story</a>, <a
href="http://0pointer.de/blog/projects/systemctl-journal.html">here's</a>
<a href="http://0pointer.de/blog/projects/security.html">now</a> <a
href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">eighteenth</a>
<a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>Managing Resources</h4>

<p>An important facet of modern computing is resource management: if
you run more than one program on a single machine you want to assign
the available resources to them enforcing particular policies. This is
particularly crucial on smaller, embedded or mobile systems where the
scarce resources are the main constraint, but equally for large
installations such as cloud setups, where resources are plenty, but
the number of programs/services/containers on a single node is
drastically higher.</p>

<p>Traditionally, on Linux only one policy was really available: all
processes got about the same CPU time, or IO bandwidth, modulated a bit
via the process <i>nice</i> value. This approach is very simple and
covered the various uses for Linux quite well for a long
time. However, it has drawbacks: not all processes deserve to be
treated equally, and services involving lots of processes (think: Apache with a
lot of CGI workers) would this way get more resources than services
with very few (think: syslog).</p>

<p>When thinking about service management for systemd, we quickly
realized that resource management must be core functionality of it. In
a modern world (regardless of whether server or embedded), controlling CPU,
Memory, and IO resources of the various services cannot be an
afterthought, but must be built-in as first-class service settings. And
it must be per-service and not per-process as the traditional nice
values or <a href="http://linux.die.net/man/2/setrlimit">POSIX
Resource Limits</a> were.</p>

<p>In this story I want to shed some light on what you can do to
enforce resource policies on systemd services. Resource Management in
one way or another has been available in systemd for a while already,
so it's really time we introduce this to the broader audience.</p>

<p><a
href="http://0pointer.de/blog/projects/cgroups-vs-cgroups.html">In an
earlier blog post</a> I highlighted the difference between Linux
Control Groups (cgroups) as a labelled, hierarchical grouping mechanism,
and Linux cgroups as a resource controlling subsystem. While systemd
requires the former, the latter is optional. And this optional latter
part is now what we can make use of to manage per-service
resources. (At this point, it's probably a good idea to read up on <a
href="https://en.wikipedia.org/wiki/Cgroups">cgroups</a> before
reading on, to get at least a basic idea what they are and what they
accomplish. Even though the explanations below will be pretty
high-level, it all makes a lot more sense if you grok the background a
bit.)</p>

<p>The main Linux cgroup controllers for resource management are <a
href="http://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt">cpu</a>,
<a
href="http://www.kernel.org/doc/Documentation/cgroups/memory.txt">memory</a>
and <a
href="http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt">blkio</a>. To
make use of these, they need to be enabled in the kernel, which many
distributions (including Fedora) do. systemd exposes a couple of high-level service
settings to make use of these controllers without requiring too much
knowledge of the gory kernel details. </p>

<h4>Managing CPU</h4>

<p>As a nice default, if the <tt>cpu</tt> controller is enabled in the
kernel, systemd will create a cgroup for each service when starting
it. Without any further configuration this already has one nice
effect: on a systemd system every system service will get an even
amount of CPU, regardless of how many processes it consists of. Or in
other words: on your web server MySQL will get roughly the same amount
of CPU as Apache, even if the latter consists of 1000 CGI script
processes and the former of only a few worker tasks. (This behavior can
be turned off, see <a
href="http://0pointer.de/public/systemd-man/systemd.conf.html">DefaultControllers=</a>
in <tt>/etc/systemd/system.conf</tt>.)</p>

<p>On top of this default, it is possible to explicitly configure the
CPU shares a service gets with the <a
href="http://0pointer.de/public/systemd-man/systemd.exec.html">CPUShares=</a>
setting. The default value is 1024. If you increase this number, you'll
assign more CPU to a service than to an unaltered one at 1024; if you decrease it, less.</p>
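<p>What do the numbers mean in practice? Suppose several sibling services all want CPU at the same time. The <tt>cpu</tt> controller then divides CPU time among them roughly in proportion to their shares. Each service gets about its shares divided by the sum of all competing shares, and may use more than that when its siblings are idle. The following sketch computes this ratio under that simplified model (flat hierarchy, everyone busy); the function name is hypothetical.</p>

```c
#include <stddef.h>

/* Hypothetical: expected fraction of CPU time for service i when all n
 * sibling services are busy: shares[i] / sum(shares). Returns -1.0 for
 * an invalid index or an all-zero share list. */
double cpu_share_fraction(const unsigned long *shares, size_t n, size_t i) {
        unsigned long total = 0;
        if (i >= n)
                return -1.0;
        for (size_t k = 0; k < n; k++)
                total += shares[k];
        if (total == 0)
                return -1.0;
        return (double) shares[i] / (double) total;
}
```

<p>Take Apache at 1500 shares and MySQL left at the default 1024, with both busy. Apache would then get roughly 1500 / 2524, i.e. about 59%, of the CPU time instead of 50%.</p>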

<p>Let's see in more detail how we can make use of this. Let's say we
want to assign Apache 1500 CPU shares instead of the default of
1024. For that, let's create a new administrator service file for
Apache in <tt>/etc/systemd/system/httpd.service</tt>, overriding the
vendor-supplied one in <tt>/usr/lib/systemd/system/httpd.service</tt>
but with a changed <tt>CPUShares=</tt> parameter:</p>

<pre>.include /usr/lib/systemd/system/httpd.service

[Service]
CPUShares=1500</pre>

<p>The first line will pull in the vendor service file. Now, let's
reload systemd's configuration and restart Apache so that the new
service file is taken into account:</p>

<pre>systemctl daemon-reload
systemctl restart httpd.service</pre>

<p>And yeah, that's already it, you are done!</p>

<p>(Note that setting <tt>CPUShares=</tt> in a unit file will cause the
specific service to get its own cgroup in the <tt>cpu</tt> hierarchy,
even if <tt>cpu</tt> is not included in
<tt>DefaultControllers=</tt>.)</p>

<h4>Analyzing Resource Usage</h4>

<p>Of course, changing resource assignments without actually
understanding the resource usage of the services in question is like
flying blind. To help you understand the resource usage of all
services, we created the tool <a
href="http://www.freedesktop.org/software/systemd/man/systemd-cgtop.html">systemd-cgtop</a>,
which will enumerate all cgroups of the system, determine their
resource usage (CPU, Memory, and IO) and present them in a <a
href="http://linux.die.net/man/1/top">top</a>-like fashion. Building
on the fact that systemd services are managed in cgroups, this tool
can present to you for services what top shows you for
processes.</p>

<p>Unfortunately, by default <tt>cgtop</tt> will only be able to chart
CPU usage per-service for you; IO and Memory are only tracked as totals
for the entire machine. The reason for this is simply that by default
there are no per-service cgroups in the <tt>blkio</tt> and
<tt>memory</tt> controller hierarchies, but that's what we need to
determine the resource usage. The best way to get this data for all
services is to simply add the <tt>memory</tt> and <tt>blkio</tt>
controllers to the aforementioned <tt>DefaultControllers=</tt> setting
in <tt>system.conf</tt>.</p>

<h4>Managing Memory</h4>

<p>To enforce limits on memory, systemd provides the
<tt>MemoryLimit=</tt> and <tt>MemorySoftLimit=</tt> settings for
services, summing up the memory of all their processes. These settings
take memory sizes in bytes that are the total memory limit for the
service. They understand the usual K, M, G, T suffixes for
Kilobyte, Megabyte, Gigabyte, Terabyte (to the base of 1024).</p>

<pre>.include /usr/lib/systemd/system/httpd.service

[Service]
MemoryLimit=1G</pre>

<p>(Analogous to <tt>CPUShares=</tt> above, setting this option will cause
the service to get its own cgroup in the <tt>memory</tt> cgroup
hierarchy.)</p>
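<p>The suffixes above are powers of 1024, so <tt>MemoryLimit=1G</tt> amounts to 1073741824 bytes. The sketch below shows that conversion. It assumes a plain decimal integer followed by at most one of the uppercase suffixes K, M, G or T, and it does not claim to match systemd's own parser in every detail. The function name <tt>parse_size()</tt> is hypothetical.</p>

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical: parse "1G", "512M", "4096", ... into bytes using base-1024
 * suffixes. Returns 0 and stores the result in *ret, or -1 on malformed
 * input or overflow. */
int parse_size(const char *s, unsigned long long *ret) {
        char *end;
        unsigned long long v, mult = 1;

        /* Reject leading whitespace and signs, which strtoull() would accept. */
        if (!isdigit((unsigned char) *s))
                return -1;
        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno != 0)
                return -1;
        switch (*end) {
        case '\0': break;
        case 'K':  mult = 1ULL << 10; break;
        case 'M':  mult = 1ULL << 20; break;
        case 'G':  mult = 1ULL << 30; break;
        case 'T':  mult = 1ULL << 40; break;
        default:   return -1;
        }
        /* Only a single trailing suffix character is accepted. */
        if (*end != '\0' && end[1] != '\0')
                return -1;
        if (v > ULLONG_MAX / mult)
                return -1;
        *ret = v * mult;
        return 0;
}
```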

<h4>Managing Block IO</h4>

<p>To control block IO multiple settings are available. First of all
<tt>BlockIOWeight=</tt> may be used which assigns an IO <i>weight</i>
to a specific service. In behaviour the <i>weight</i> concept is not
unlike the <i>shares</i> concept of CPU resource control (see
above). However, the default weight is 1000, and the valid range is
from 10 to 1000:</p>

<pre>.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=500</pre>

<p>Optionally, per-device weights can be specified:</p>

<pre>.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/dev/disk/by-id/ata-SAMSUNG_MMCRE28G8MXP-0VBL1_DC06K01009SE009B5252 750</pre>

<p>Instead of specifying an actual device node, you can also specify any
path in the file system:</p>

<pre>.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/home/lennart 750</pre>

<p>If the specified path does not refer to a device node, systemd will
determine the block device <tt>/home/lennart</tt> is on, and assign
the bandwidth weight to it.</p>

<p>You can even add per-device and normal lines at the same time,
which will set the per-device weight for the device, and the other
value as default for everything else.</p>
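<p>A <tt>BlockIOWeight=</tt> value is thus either a bare weight or a path followed by a weight. The following sketch splits such a value and checks the 10 to 1000 range mentioned above. It assumes the weight is the last space-separated token, and the struct and function names are hypothetical. It leaves out resolving the path to the block device it lives on, which systemd does as described above.</p>

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one BlockIOWeight= value. */
struct blkio_weight {
        char path[256];        /* device node or file system path; "" = default */
        unsigned long weight;  /* 10..1000 */
};

/* Hypothetical: parse "WEIGHT" or "PATH WEIGHT". Returns 0 on success,
 * -1 on malformed input or a weight outside 10..1000. */
int parse_blkio_weight(const char *value, struct blkio_weight *out) {
        const char *num = value;
        const char *space = strrchr(value, ' ');
        char *end;
        unsigned long w;

        out->path[0] = '\0';
        if (space != NULL) {
                size_t len = (size_t) (space - value);
                if (len == 0 || len >= sizeof out->path)
                        return -1;
                memcpy(out->path, value, len);
                out->path[len] = '\0';
                num = space + 1;
        }
        if (*num < '0' || *num > '9')
                return -1;
        errno = 0;
        w = strtoul(num, &end, 10);
        if (errno != 0 || *end != '\0' || w < 10 || w > 1000)
                return -1;
        out->weight = w;
        return 0;
}
```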

<p>Alternatively, one may control explicit bandwidth limits with the
<tt>BlockIOReadBandwidth=</tt> and <tt>BlockIOWriteBandwidth=</tt>
settings. These settings take a pair of device node and bandwidth rate
(in bytes per second) or of a file path and bandwidth rate:</p>

<pre>.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOReadBandwidth=/var/log 5M</pre>

<p>This sets the maximum read bandwidth on the block device backing
<tt>/var/log</tt> to 5 MB/s.</p>

<p>(Analogous to <tt>CPUShares=</tt> and <tt>MemoryLimit=</tt>, using
any of these three settings will result in the service getting its own
cgroup in the <tt>blkio</tt> hierarchy.)</p>

<h4>Managing Other Resource Parameters</h4>

<p>The options described above cover only a small subset of the
available controls the various Linux control group controllers
expose. We picked these and added high-level options for them since we
assumed that these are the most relevant for most folks, and that they
really needed a nice interface that can handle units properly and
resolve block device names.</p>

<p>In many cases the options explained above might not be sufficient
for your use case, but a low-level kernel cgroup setting might help. It
is easy to make use of these options from systemd unit files, without
having them covered by a high-level setting. For example, sometimes
it might be useful to set the <i>swappiness</i> of a service. The
kernel makes this controllable via the <tt>memory.swappiness</tt>
cgroup attribute, but systemd does not expose it as a high-level
option. Here's how you use it nonetheless, using the low-level
<tt>ControlGroupAttribute=</tt> setting:</p>

<pre>.include /usr/lib/systemd/system/httpd.service

[Service]
ControlGroupAttribute=memory.swappiness 70</pre>

<p>(As in the other cases, this too causes the service to be
added to the memory hierarchy.)</p>

<p>Later on we might add more high-level controls for the
various cgroup attributes. In fact, please ping us if you frequently
use one and believe it deserves more focus. We'll consider adding a
high-level option for it then. (Even better: send us a patch!)</p>

<p><i>Disclaimer:</i> note that making use of the various resource
controllers does have a runtime impact on the system. Enforcing
resource limits comes at a price. If you do use them, certain
operations do get slower. Especially the <tt>memory</tt> controller
has (or used to have?) a bad reputation for coming at a performance
cost.</p>

<p>For more details on all of this, please have a look at the
documentation of the <a
href="http://0pointer.de/public/systemd-man/systemd.exec.html">mentioned
unit settings</a>, and of the <a
href="http://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt">cpu</a>,
<a
href="http://www.kernel.org/doc/Documentation/cgroups/memory.txt">memory</a>
and <a
href="http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt">blkio</a>
controllers.</p>

<p>And that's it for now. Of course, this blog story only focussed on
the per-<i>service</i> resource settings. On top of this, you can also
set the more traditional, well-known per-<i>process</i> resource
settings, which will then be inherited by the various subprocesses,
but always only be enforced per-process. More specifically that's
<tt>IOSchedulingClass=</tt>, <tt>IOSchedulingPriority=</tt>,
<tt>CPUSchedulingPolicy=</tt>, <tt>CPUSchedulingPriority=</tt>,
<tt>CPUAffinity=</tt>, <tt>LimitCPU=</tt> and related. These do not
make use of cgroup controllers and have a much lower performance
cost. We might cover those in a later article in more detail.</p>

]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XVII</title>
  <link>http://0pointer.de/blog/projects/journalctl.html</link>
  <description><![CDATA[

<p><a href="http://0pointer.de/blog/projects/serial-console.html">It's</a>
<a href="http://0pointer.de/blog/projects/watchdog.html">that</a>
<a
href="http://0pointer.de/blog/projects/self-documented-boot.html">time again</a>,
<a
href="http://0pointer.de/blog/projects/systemctl-journal.html">here's</a>
<a href="http://0pointer.de/blog/projects/security.html">now</a> <a
href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">seventeenth</a>
<a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>Using the Journal</h4>

<p><a href="http://0pointer.de/blog/projects/systemctl-journal.html">A
while back I already</a> posted a blog story introducing some
functionality of the journal, and how it is exposed in
<tt>systemctl</tt>. In this episode I want to explain a few more uses
of the journal, and how you can make it work for you.</p>

<p>If you are wondering what the journal is, here's an explanation in
a few words to get you up to speed: the journal is a component of <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>,
that captures Syslog messages, Kernel log messages, initial RAM disk
and early boot messages as well as messages written to STDOUT/STDERR
of all services, indexes them and makes this available to the user. It
can be used in parallel, or in place of a traditional syslog daemon,
such as rsyslog or syslog-ng. For more information, see <a
href="http://0pointer.de/blog/projects/the-journal.html">the initial
announcement</a>.</p>

<p>The journal has been part of Fedora since F17. With Fedora 18 it
now has grown into a reliable, powerful tool to handle your logs. Note
however, that on F17 and F18 the journal is configured by default to
store logs only in a small ring-buffer in <tt>/run/log/journal</tt>,
i.e. not persistent. This of course limits its usefulness quite
drastically but is sufficient to show a bit of recent log history in
<tt>systemctl status</tt>. For Fedora 19, we plan to change this, and
enable persistent logging by default. Then, journal files will be
stored in <tt>/var/log/journal</tt> and can grow much larger, thus
making the journal a lot more useful.</p>

<h4>Enabling Persistency</h4>

<p>In the meantime, on F17 or F18, you can enable journald's persistent storage manually:</p>

<pre># mkdir -p /var/log/journal</pre>

<p>After that, it's a good idea to reboot, to get some useful
structured data into your journal to play with. Oh, and since you have
the journal now, you don't need syslog anymore (unless having
<tt>/var/log/messages</tt> as a text file is a necessity for you), so
you can choose to deinstall rsyslog:</p>

<pre># yum remove rsyslog</pre>

<h4>Basics</h4>

<p>Now we are ready to go. The following text shows a lot of features
of systemd 195 as it will be included in Fedora 18<sup>[1]</sup>, so
if your F17 can't do the tricks you see, please wait for F18. First,
let's start with some basics. To access the logs of the journal use
the <a
href="http://www.freedesktop.org/software/systemd/man/journalctl.html">journalctl(1)</a>
tool. To have a first look at the logs, just type in:</p>

<pre># journalctl</pre>

<p>If you run this as root you will see all logs generated on the
system, from system components the same way as for logged in
users. The output you will get looks like a pixel-perfect copy of the
traditional <tt>/var/log/messages</tt> format, but actually has a
couple of improvements over it:</p>

<ul>
<li>Lines of error priority (and higher) will be highlighted red.</li>
<li>Lines of notice/warning priority will be highlighted bold.</li>
<li>The timestamps are converted into your local time-zone.</li>
<li>The output is auto-paged with your pager of choice (defaults to <tt>less</tt>).</li>
<li>This will show <i>all</i> available data, including rotated logs.</li>
<li>Between the output of each boot we'll add a line clarifying that a new boot begins now.</li>
</ul>

<p>Note that in this blog story I will not actually show you any of
the output this generates, I cut that out for brevity -- and to give
you a reason to try it out yourself with a current image for F18's
development version with systemd 195. But I do hope you get the idea
anyway.</p>

<h4>Access Control</h4>

<p>Browsing logs this way is already pretty nice. But requiring to be
root sucks of course, even administrators tend to do most of their
work as unprivileged users these days. By default, Journal users can
only watch their own logs, unless they are root or in the <tt>adm</tt>
group. To make watching system logs more fun, let's add ourselves to
<tt>adm</tt>:</p>

<pre># usermod -a -G adm lennart</pre>

<p>After logging out and back in as <tt>lennart</tt> I now have access
to the full journal of the system and all users:</p>

<pre>$ journalctl</pre>

<h4>Live View</h4>

<p>If invoked without parameters journalctl will show me the current
log database. Sometimes one needs to watch logs as they grow, where
one previously used <tt>tail -f /var/log/messages</tt>:</p>

<pre>$ journalctl -f</pre>

<p>Yes, this does exactly what you expect it to do: it will show you
the last ten log lines and then wait for changes and show them as
they take place.</p>

<h4>Basic Filtering</h4>

<p>When invoking <tt>journalctl</tt> without parameters you'll see the
whole set of logs, beginning with the oldest message stored. That of
course, can be a lot of data. Much more useful is just viewing the
logs of the current boot:</p>

<pre>$ journalctl -b</pre>

<p>This will show you only the logs of the current boot, with all the
aforementioned gimmicks. But sometimes even this is way too
much data to process. So what about just listing all the real issues
to care about: all messages of priority levels ERROR and worse, from
the current boot:</p>

<pre>$ journalctl -b -p err</pre>

<p>If you reboot only rarely, <tt>-b</tt> makes little sense;
filtering based on time is much more useful:</p>

<pre>$ journalctl --since=yesterday</pre>

<p>And there you go, all log messages from the day before at 00:00 in
the morning until right now. Awesome! Of course, we can combine this with
<tt>-p err</tt> or a similar match. But humm, we are looking for
something that happened on the 15th of October, or was it the 16th?</p>

<pre>$ journalctl --since=2012-10-15 --until="2012-10-16 23:59:59"</pre>

<p>Yupp, there we go, we found what we were looking for. But humm, I
noticed that some CGI script in Apache was acting up earlier today,
let's see what Apache logged at that time:</p>

<pre>$ journalctl -u httpd --since=00:00 --until=9:30</pre>

<p>Oh, yeah, there we found it. But hey, wasn't there an issue with
that disk <tt>/dev/sdc</tt>? Let's figure out what was going on there:</p>

<pre>$ journalctl /dev/sdc</pre>

<p>OMG, a disk error!<sup>[2]</sup> Hmm, let's quickly replace the
disk before we lose data. Done! Next! -- Hmm, didn't I see that the vpnc binary made a booboo? Let's
check for that:</p>

<pre>$ journalctl /usr/sbin/vpnc</pre>

<p>Hmm, I don't get this, this seems to be some weird interaction with
<tt>dhclient</tt>, let's see both outputs, interleaved:</p>

<pre>$ journalctl /usr/sbin/vpnc /usr/sbin/dhclient</pre>

<p>That did it! Found it!</p>

<h4>Advanced Filtering</h4>

<p>Whew! That was awesome already, but let's turn this up a
notch. Internally systemd stores each log entry with a set of
<i>implicit</i> meta data. This meta data looks a lot like an
environment block, but actually is a bit more powerful: fields can
hold binary or large values (though this is the exception, and usually
they just contain UTF-8), and fields can have multiple values assigned
(an exception too, usually they only have one value). This implicit
meta data is collected for each and every log message, without user
intervention. The data will be there, and wait to be used by
you. Let's see how this looks:</p>

<pre>$ journalctl -o verbose -n
[...]
Tue, 2012-10-23 23:51:38 CEST [s=ac9e9c423355411d87bf0ba1a9b424e8;i=4301;b=5335e9cf5d954633bb99aefc0ec38c25;m=882ee28d2;t=4ccc0f98326e6;x=f21e8b1b0994d7ee]
        PRIORITY=6
        SYSLOG_FACILITY=3
        _MACHINE_ID=a91663387a90b89f185d4e860000001a
        _HOSTNAME=epsilon
        _TRANSPORT=syslog
        SYSLOG_IDENTIFIER=avahi-daemon
        _COMM=avahi-daemon
        _EXE=/usr/sbin/avahi-daemon
        _SYSTEMD_CGROUP=/system/avahi-daemon.service
        _SYSTEMD_UNIT=avahi-daemon.service
        _SELINUX_CONTEXT=system_u:system_r:avahi_t:s0
        _UID=70
        _GID=70
        _CMDLINE=avahi-daemon: registering [epsilon.local]
        MESSAGE=Joining mDNS multicast group on interface wlan0.IPv4 with address 172.31.0.53.
        _BOOT_ID=5335e9cf5d954633bb99aefc0ec38c25
        _PID=27937
        SYSLOG_PID=27937
        _SOURCE_REALTIME_TIMESTAMP=1351029098747042
</pre>

<p>(I cut out a lot of noise here, since I don't want to make this story
overly long. <tt>-n</tt> without a parameter shows you the last 10 log
entries, but I cut out all but the last.)</p>

<p>With the <tt>-o verbose</tt> switch we enabled verbose
output. Instead of showing a pixel-perfect copy of classic
<tt>/var/log/messages</tt> that only includes a minimal subset of
what is available, we now see all the gory details the journal has
about each entry. And it's highly interesting: there is user credential
information, SELinux bits, machine information and more. For a full
list of common, well-known fields, see <a
href="http://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html">the
man page</a>.</p>

<p>Now, as it turns out, the journal database is indexed by <i>all</i>
of these fields, out-of-the-box! Let's try this out:</p>

<pre>$ journalctl _UID=70</pre>

<p>And there you go, this will show all log messages logged from Linux
user ID 70. As it turns out one can easily combine these matches:</p>

<pre>$ journalctl _UID=70 _UID=71</pre>

<p>Specifying two matches for the same field will result in a logical
OR combination of the matches. All entries matching either will be
shown, i.e. all messages from either UID 70 or 71.</p>

<pre>$ journalctl _HOSTNAME=epsilon _COMM=avahi-daemon</pre>

<p>You guessed it: if you specify two matches for different field
names, they will be combined with a logical AND. All entries matching
both will be shown now, i.e. all messages from processes named
<tt>avahi-daemon</tt> <i>and</i> on host <tt>epsilon</tt>.</p>

<p>But of course, that's
not fancy enough for us. We are computer nerds after all, we live off
logical expressions. We must go deeper!</p>

<pre>$ journalctl _HOSTNAME=theta _UID=70 + _HOSTNAME=epsilon _COMM=avahi-daemon</pre>

<p>The <tt>+</tt> is an explicit OR you can use in addition to the implied OR when
you match the same field twice. The line above hence means: show me
everything from host <tt>theta</tt> with UID 70, or from host
<tt>epsilon</tt> with a process name of <tt>avahi-daemon</tt>.</p>
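<p>The combination rules above can be modelled in a few lines. The following C sketch is a hypothetical illustration of the semantics, not journalctl's actual implementation: it treats an entry as a plain array of <tt>FIELD=value</tt> strings (ignoring binary values), and <tt>journal_match()</tt> and its helpers are made-up names.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model: an entry is a NULL-terminated array of "FIELD=value"
 * strings. A field may appear more than once (multiple values). */

/* Length of the FIELD part of a "FIELD=value" match. */
static size_t field_len(const char *match) {
    const char *eq = strchr(match, '=');
    return eq ? (size_t)(eq - match) : strlen(match);
}

/* True if the entry carries exactly this "FIELD=value" pair. */
static bool entry_has(const char *const *entry, const char *match) {
    for (; *entry; entry++)
        if (strcmp(*entry, match) == 0)
            return true;
    return false;
}

/* One group (no "+" inside): matches on the same field are ORed,
 * matches on different fields are ANDed. */
static bool group_matches(const char *const *entry,
                          const char *const *m, size_t n) {
    for (size_t i = 0; i < n; i++) {
        size_t len = field_len(m[i]);
        bool any = false;
        /* Is at least one match for m[i]'s field satisfied? */
        for (size_t j = 0; j < n && !any; j++)
            if (field_len(m[j]) == len && strncmp(m[i], m[j], len) == 0)
                any = entry_has(entry, m[j]);
        if (!any)
            return false;
    }
    return true;
}

/* Whole argument list: groups separated by "+" are ORed together. */
bool journal_match(const char *const *entry,
                   const char *const *args, size_t nargs) {
    assert(entry != NULL);
    size_t start = 0;
    for (size_t i = 0; i <= nargs; i++) {
        if (i == nargs || strcmp(args[i], "+") == 0) {
            if (group_matches(entry, args + start, i - start))
                return true;
            start = i + 1;
        }
    }
    return false;
}
```

<p>Fed the argument list of the last command, this model accepts an entry carrying <tt>_HOSTNAME=epsilon</tt> and <tt>_COMM=avahi-daemon</tt> through the second group, even though the first group fails for it.</p>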

<h4>And now, it becomes magic!</h4>

<p>That was already pretty cool, right? Right! But heck, who can
remember all those values a field can take in the journal, I mean,
seriously, who has thaaaat kind of photographic memory? Well, the
journal has:</p>

<pre>$ journalctl -F _SYSTEMD_UNIT</pre>

<p>This will show us all values the field <tt>_SYSTEMD_UNIT</tt> takes in the
database, or in other words: the names of all systemd services which
ever logged into the journal. This makes it super-easy to build nice
matches. But wait, it turns out this is all actually hooked up with bash shell
completion! This gets even more awesome: as you type your
match expression you will get a list of well-known field names, and of
the values they can take! Let's figure out how to filter for SELinux
labels again. We remember the field name was something with SELINUX in
it; let's try that:</p>

<pre>$ journalctl _SE<b>&lt;TAB&gt;</b></pre>

<p>And yupp, it's immediately completed:</p>

<pre>$ journalctl _SELINUX_CONTEXT=</pre>

<p>Cool, but what's the label again we wanted to match for?</p>

<pre>$ journalctl _SELINUX_CONTEXT=<b>&lt;TAB&gt;&lt;TAB&gt;</b>
kernel                                                       system_u:system_r:local_login_t:s0-s0:c0.c1023               system_u:system_r:udev_t:s0-s0:c0.c1023
system_u:system_r:accountsd_t:s0                             system_u:system_r:lvm_t:s0                                   system_u:system_r:virtd_t:s0-s0:c0.c1023
system_u:system_r:avahi_t:s0                                 system_u:system_r:modemmanager_t:s0-s0:c0.c1023              system_u:system_r:vpnc_t:s0
system_u:system_r:bluetooth_t:s0                             system_u:system_r:NetworkManager_t:s0                        system_u:system_r:xdm_t:s0-s0:c0.c1023
system_u:system_r:chkpwd_t:s0-s0:c0.c1023                    system_u:system_r:policykit_t:s0                             unconfined_u:system_r:rpm_t:s0-s0:c0.c1023
system_u:system_r:chronyd_t:s0                               system_u:system_r:rtkit_daemon_t:s0                          unconfined_u:system_r:unconfined_t:s0-s0:c0.c1023
system_u:system_r:crond_t:s0-s0:c0.c1023                     system_u:system_r:syslogd_t:s0                               unconfined_u:system_r:useradd_t:s0-s0:c0.c1023
system_u:system_r:devicekit_disk_t:s0                        system_u:system_r:system_cronjob_t:s0-s0:c0.c1023            unconfined_u:unconfined_r:unconfined_dbusd_t:s0-s0:c0.c1023
system_u:system_r:dhcpc_t:s0                                 system_u:system_r:system_dbusd_t:s0-s0:c0.c1023              unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
system_u:system_r:dnsmasq_t:s0-s0:c0.c1023                   system_u:system_r:systemd_logind_t:s0
system_u:system_r:init_t:s0                                  system_u:system_r:systemd_tmpfiles_t:s0</pre>

<p>Ah! Right! We wanted to see everything logged under PolicyKit's security label:</p>

<pre>$ journalctl _SELINUX_CONTEXT=system_u:system_r:policykit_t:s0</pre>

<p>Wow! That was easy! I didn't know anything related to SELinux could
be thaaat easy! ;-) Of course this kind of completion works with any
field, not just SELinux labels.</p>

<p>So much for now. There's a lot more cool stuff in <a
href="http://www.freedesktop.org/software/systemd/man/journalctl.html">journalctl(1)</a>
than this. For example, it generates JSON output for you! You can match
against kernel fields! You can get simple
<tt>/var/log/messages</tt>-like output but with <i>relative</i> timestamps!
And so much more!</p>

<p>Anyway, in the coming weeks I hope to post more stories about all the
cool things the journal can do for you. This is just the beginning,
stay tuned.</p>

<p><small>Footnotes</small></p>

<p><small>[1] systemd 195 is currently still in <a
href="https://admin.fedoraproject.org/updates/FEDORA-2012-16709/systemd-195-1.fc18">Bodhi</a>
but hopefully will get into F18 proper soon, and definitely before the
release of Fedora 18.</small></p>

<p><small>[2] OK, I cheated here, indexing by block device is not in
the kernel yet, but on its way due to <a
href="http://www.spinics.net/lists/linux-scsi/msg62499.html">Hannes'
fantastic work</a>, and I hope it will make an appearance in
F18.</small></p>

]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XVI</title>
  <link>http://0pointer.de/blog/projects/serial-console.html</link>
  <description><![CDATA[

<p><a href="http://0pointer.de/blog/projects/watchdog.html">And,</a>
<a
href="http://0pointer.de/blog/projects/self-documented-boot.html">yes,</a>
<a
href="http://0pointer.de/blog/projects/systemctl-journal.html">here's</a>
<a href="http://0pointer.de/blog/projects/security.html">now</a> <a
href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">sixteenth</a>
<a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>Gettys on Serial Consoles (and Elsewhere)</h4>

<p><i>TL;DR: To make use of a serial console, just use
<tt>console=ttyS0</tt> on the kernel command line, and systemd will
automatically start a getty on it for you.</i></p>

<p>While physical <a
href="https://en.wikipedia.org/wiki/RS-232">RS232</a> serial ports
have become exotic in today's PCs, they play an important role in
modern servers and embedded hardware. They provide a relatively robust
and minimalistic way to access the console of your device, which works
even when the network is hosed or the primary UI is unresponsive. VMs
frequently emulate a serial port as well.</p>

<p>Of course, Linux has always had good support for serial consoles,
but with <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> we
tried to make serial console support even simpler to use. In the
following text I'll try to give an overview of how serial console <a
href="https://en.wikipedia.org/wiki/Getty_%28Unix%29">gettys</a> on
systemd work, and how TTYs of any kind are handled.</p>

<p>Let's start with the key take-away: in most cases, to get a login
prompt on your serial port you don't need to do anything. systemd
checks the kernel configuration for the selected kernel console and
will simply spawn a serial getty on it. That way it is entirely
sufficient to configure your kernel console properly (for example, by
adding <tt>console=ttyS0</tt> to the kernel command line) and that's
it. But let's have a look at the details:</p>

<p>In systemd, two template units are responsible for bringing up a
login prompt on text consoles:</p>

<ol>

<li><tt>getty@.service</tt> is responsible for <a
href="https://en.wikipedia.org/wiki/Virtual_console">virtual
terminal</a> (VT) login prompts, i.e. those on your VGA screen as
exposed in <tt>/dev/tty1</tt> and similar devices.</li>

<li><tt>serial-getty@.service</tt> is responsible for all other
terminals, including serial ports such as <tt>/dev/ttyS0</tt>. It
differs in a couple of ways from <tt>getty@.service</tt>: among other
things the <tt>$TERM</tt> environment variable is set to
<tt>vt102</tt> (hopefully a good default for most serial terminals)
rather than <tt>linux</tt> (which is the right choice for VTs only),
and special logic that clears the VT scrollback buffer (and only
works on VTs) is skipped.</li>

</ol>

<h5>Virtual Terminals</h5>

<p>Let's have a closer look at how <tt>getty@.service</tt> is started,
i.e. how login prompts on virtual terminals (non-serial TTYs)
work. Traditionally, the init system on Linux machines was configured
to spawn a fixed number of login prompts at boot. In most cases six
instances of the getty program were spawned, on the first six VTs,
<tt>tty1</tt> to <tt>tty6</tt>.</p>

<p>In a systemd world we made this more dynamic: in order to make
things more efficient, login prompts are now started on demand only. As
you switch to a VT, the getty service is instantiated as
<tt>getty@tty2.service</tt>, <tt>getty@tty5.service</tt> and so
on. Since we no longer have to start the getty processes unconditionally,
this saves a few resources and makes start-up
a bit faster. This behaviour is mostly transparent to the user: if the
user activates a VT the getty is started right away, so that the user
will hardly notice that it wasn't running all the time. If he then
logs in and types <tt>ps</tt>, however, he'll notice that getty
instances are only running for the VTs he has switched to so far.</p>

<p>By default this automatic spawning is done for the VTs up to VT6
only (in order to be close to the traditional default configuration of
Linux systems)<sup>[1]</sup>.  Note that the auto-spawning of gettys
is only attempted if no other subsystem has taken possession of the VTs
yet. For example, if a user makes frequent use of <a
href="https://en.wikipedia.org/wiki/Fast_user_switching">fast user
switching</a> via GNOME, he'll get his X sessions on the first six VTs,
too, since the lowest available VT is allocated for each session.</p>

<p>Two VTs are handled specially by the auto-spawning logic. Firstly,
<tt>tty1</tt>: if we boot into graphical mode, the display manager
takes possession of this VT. If we boot into multi-user (text) mode,
a getty is started on it, unconditionally and without any on-demand
logic<sup>[2]</sup>.</p>

<p>Secondly, <tt>tty6</tt> is
specifically reserved for auto-spawned gettys and unavailable to other
subsystems such as X<sup>[3]</sup>. This is done in order to ensure
that there's always a way to get a text login, even if X has taken
possession of more than 5 VTs due to fast user switching.</p>
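<p>Put together, the VT rules above amount to a small decision function. The C sketch below is a simplified, hypothetical model, not logind's code: its two fields stand in for the <tt>NAutoVTs=</tt> and <tt>ReserveVT=</tt> settings from logind.conf (both effectively 6 by default, matching the behaviour described here), and tty1's special boot-time treatment is left out.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for logind.conf's NAutoVTs= and ReserveVT=. */
struct vt_policy {
    unsigned n_auto_vts;  /* auto-spawn gettys on VT1..VTn only (6 here) */
    unsigned reserve_vt;  /* VT kept free for a text login (6 here) */
};

/* Should a getty be spawned when the user switches to `vt`? `taken` says
 * whether another subsystem (e.g. an X session) already owns that VT.
 * tty1's special treatment at boot is deliberately left out. */
bool should_autospawn(const struct vt_policy *p, unsigned vt, bool taken) {
    assert(p != NULL);
    if (vt == 0 || vt > p->n_auto_vts)
        return false;
    if (vt == p->reserve_vt)
        return true;   /* never handed out to X, so always ours */
    return !taken;
}

/* Build the instance name, e.g. "getty@tty2.service". Returns what
 * snprintf returns, i.e. the length of the full name. */
int getty_unit_name(char *buf, size_t size, unsigned vt) {
    return snprintf(buf, size, "getty@tty%u.service", vt);
}
```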

<h5>Serial Terminals</h5>

<p>Handling of login prompts on serial terminals (and all other kinds
of non-VT terminals) is different from that of VTs. By default systemd
will instantiate one <tt>serial-getty@.service</tt> on the main
kernel<sup>[4]</sup> console, if it is not a virtual terminal. The
kernel console is where the kernel outputs its own log messages and is
usually configured on the kernel command line in the boot loader via
an argument such as <tt>console=ttyS0</tt><sup>[5]</sup>. This logic ensures that
when the user asks the kernel to redirect its output onto a certain
serial terminal, he will automatically also get a login prompt on it
as the boot completes<sup>[6]</sup>. systemd will also spawn a login
prompt on the first special VM console (that's <tt>/dev/hvc0</tt>,
<tt>/dev/xvc0</tt>, <tt>/dev/hvsi0</tt>), if the system is run in a VM
that provides these devices. This logic is implemented in a <a
href="http://www.freedesktop.org/wiki/Software/systemd/Generators">generator</a>
called <a
href="http://www.freedesktop.org/software/systemd/man/systemd-getty-generator.html">systemd-getty-generator</a>
that is run early at boot and pulls in the necessary services
depending on the execution environment.</p>

<p>In many cases, this automatic logic should already suffice to get
you a login prompt when you need one, without any specific
configuration of systemd. However, sometimes there's the need to
manually configure a serial getty, for example, if more than one
serial login prompt is needed or the kernel console should be
redirected to a different terminal than the login prompt. To
facilitate this it is sufficient to instantiate
<tt>serial-getty@.service</tt> once for each serial port you want it
to run on<sup>[7]</sup>:</p>

<pre># systemctl enable serial-getty@ttyS2.service
# systemctl start serial-getty@ttyS2.service</pre>

<p>And that's it. This will make sure you get the login prompt on the
chosen port on all subsequent boots, and starts it right-away
too.</p>

<p>Sometimes, there's the need to configure the login prompt in even
more detail, for example if the default baud rate configured by the
kernel is not correct or other <tt>agetty</tt> parameters need to
be changed. In such a case simply copy the default unit template to
<tt>/etc/systemd/system</tt> and edit it there:</p>

<pre># cp /usr/lib/systemd/system/serial-getty@.service /etc/systemd/system/serial-getty@ttyS2.service
# vi /etc/systemd/system/serial-getty@ttyS2.service
 .... now make your changes to the agetty command line ...
# ln -s /etc/systemd/system/serial-getty@ttyS2.service /etc/systemd/system/getty.target.wants/
# systemctl daemon-reload
# systemctl start serial-getty@ttyS2.service</pre>

<p>This creates a unit file that is specific to serial port
<tt>ttyS2</tt>, so that you can make specific changes to this port and
this port only.</p>

<p>And this is pretty much all there is to say about serial ports, VTs
and login prompts on them. I hope this was interesting, and please
come back soon for the next installment of this series!</p>

<p><small><b>Footnotes</b></small></p>

<p><small>[1] You can easily modify this by changing
<tt>NAutoVTs=</tt> in <a
href="http://www.freedesktop.org/software/systemd/man/logind.conf.html">logind.conf</a>.</small></p>

<p><small>[2] Note that whether the getty on VT1 is started on-demand
or not hardly makes a difference, since VT1 is the default active VT,
so the demand is there at boot anyway.</small></p>

<p><small>[3] You can easily change this special reserved VT by
modifying <tt>ReserveVT=</tt> in <a
href="http://www.freedesktop.org/software/systemd/man/logind.conf.html">logind.conf</a>.</small></p>

<p><small>[4] If multiple kernel consoles are used simultaneously, the
<i>main</i> console is the one listed <i>first</i> in
<tt>/sys/class/tty/console/active</tt>, which is the <i>last</i> one
listed on the kernel command line.</small></p>

<p><small>[5] See <a
href="https://www.kernel.org/doc/Documentation/kernel-parameters.txt">kernel-parameters.txt</a>
for more information on this kernel command line
option.</small></p>

<p><small>[6] Note that <tt>agetty -s</tt> is used here so that the
baud rate configured on the kernel command line is not altered and
continues to be used by the login prompt.</small></p>

<p><small>[7] Note that this <tt>systemctl enable</tt> syntax only
works with systemd 188 and newer (i.e. F18). On older versions use
<tt>ln -s /usr/lib/systemd/system/serial-getty@.service
/etc/systemd/system/getty.target.wants/serial-getty@ttyS2.service ; systemctl
daemon-reload</tt> instead.</small></p>

]]></description>
</item>

<item>
  <title>Berlin Open Source Meetup</title>
  <link>http://0pointer.de/blog/projects/berlin-open-source-meetup.html</link>
  <description><![CDATA[

<p><a href="http://blixtra.org/blog/2012/08/06/berlin-open-source-meetup/"><img src="http://blixtra.org/blog/wp-content/uploads/2012/08/Prater.jpg" width="500" height="375" alt="Prater"/></a></p>

<p>Chris K&uuml;hl and I are organizing a <a
href="http://blixtra.org/blog/2012/08/06/berlin-open-source-meetup/">Berlin
Open Source Meetup</a> on Aug 19th at the Prater Biergarten in Prenzlauer Berg.
If you live in Berlin (or are passing by) and are involved in or interested in
Open Source then you are invited!</p>

<p><a href="https://plus.google.com/u/0/events/c9ffkptmk6kbjkgn7nb7bh5i1ek/107949128852701224835">There's also a Google+ event for the meetup.</a></p>

<p>It's a public event, so everybody is welcome, and please feel free to invite others!</p>

<p>See you at the Prater!</p>

]]></description>
</item>

<item>
  <title>Upcoming Hackfests/Sprints</title>
  <link>http://0pointer.de/blog/hackfests.html</link>
  <description><![CDATA[

<p>The <a href="http://www.linuxplumbersconf.org/2012/">Linux Plumbers
Conference 2012</a> will take place August 29th to 31st in San Diego,
California. We, the <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>
developers, would like to invite you to two hackfests/sprints that will happen
around LPC:</p>

<h4>San Diego: libvirt/LXC/systemd/SELinux Integration Hackfest</h4>

<p>On <b>28th of August</b> we'll have a hackfest on the topic of closer
integration of libvirt, LXC, systemd and SELinux, colocated with LPC in
San Diego, California. We'll have a number of key people from these projects
participating, including Dan Walsh, Eric Paris, Daniel P. Berrange, Kay
Sievers and myself.</p>

<p>Topics we'll cover: making Fedora/Linux boot entirely cleanly in
normal containers, teaching systemd's control tools minimal
container-awareness (such as being able to list all services of all
containers in one go, in addition to those running on the host
system), unified journal logging across multiple containers, the <a
href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface">systemd
container interface</a>, auditing and containers, running multiple
instances from the same <tt>/usr</tt> tree, and a lot more...</p>

<p><b>Who should attend?</b> Everybody hacking on the mentioned
projects who wants to help integrate them with the
goal of turning them into a secure, reliable, powerful container
solution for Linux.</p>

<p><b>Who should not attend?</b> Anybody who doesn't hack on any of these
projects, or who is not interested in closer integration of at
least two of them.</p>

<p><b>How to register?</b> Just show up. You get extra points however
for letting us know in advance (just send us an email). Attendance is
free.</p>

<p>&#10149; See also: <a href="https://plus.google.com/u/0/events/cvs9oi2q802vh57o1vr9le7tsjc/115547683951727699051">Google+ Event</a></p>

<h4>San Francisco: systemd Journal Sprint</h4>

<p>On <b>September 3-7</b> we'll have a sprint on the topic of the systemd
Journal. It's going to take place at the <a
href="https://www.getpantheon.com/">Pantheon</a> headquarters in San
Francisco, California. Among others, Kay Sievers, David Strauss and I will participate.</p>

<p><b>Who should attend?</b> Everybody who wants to help improve the
systemd Journal, whether in its core itself, in client software
for it, hooking up other projects or writing library bindings for
it. Also, if you are using or planning to use the journal for a
project, we'd be very interested in high-bandwidth face-to-face
feedback regarding what you are missing, what you don't like so much, and what
you find awesome in the Journal.</p>

<p><b>How to register?</b> Please sign up at <a
href="http://systemd.eventbrite.com/">EventBrite</a>. Attendance is
free. For more information see the <a
href="http://lists.freedesktop.org/archives/systemd-devel/2012-July/005803.html">invitation
mail</a>.</p>

<p>&#10149; See also: <a href="https://plus.google.com/u/0/events/cee28a21tk5lfv0u224kj6pa930/115547683951727699051">Google+ Event</a></p>

<p><i>See you in California!</i></p>

]]></description>
</item>

<item>
  <title>foss.in 2012 CFP Ends in a Few Hours</title>
  <link>http://0pointer.de/blog/projects/fossin2012.html</link>
  <description><![CDATA[

<p><a href="http://foss.in/">foss.in 2012 in Bangalore</a> takes place again after a
hiatus of some years. It has always been a fantastic conference, and a great opportunity to
visit Bangalore and India. I just submitted my talk proposals, so, hurry up, and <a href="http://foss.in/participate/call-for-participation">submit yours</a>!</p>

]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XV</title>
  <link>http://0pointer.de/blog/projects/watchdog.html</link>
  <description><![CDATA[

<p><a
href="http://0pointer.de/blog/projects/self-documented-boot.html">Quickly
following the previous iteration</a>, <a
href="http://0pointer.de/blog/projects/systemctl-journal.html">here's</a>
<a href="http://0pointer.de/blog/projects/security.html">now</a> <a
href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">fifteenth</a>
<a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>Watchdogs</h4>

<p>There are three big target audiences we try to cover with <a
href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a>:
the embedded/mobile folks, the desktop people and the server
folks. While the systems used by embedded/mobile tend to be
underpowered and have few resources available, desktops tend to be
much more powerful machines -- but still much less resourceful than
servers. Nonetheless there are surprisingly many features that matter
to both extremes of this axis (embedded and servers), but not the
center (desktops). One of them is support for <a
href="https://en.wikipedia.org/wiki/Watchdog_timer">watchdogs</a> in
hardware and software.</p>

<p>Embedded devices frequently rely on watchdog hardware that resets
them automatically if software stops responding (more specifically,
stops signalling the hardware in fixed intervals that it is still
alive). This is required to increase reliability and to make sure that,
whatever happens, every effort is made to get the system
working again. Functionality like this makes little sense on the
desktop<sup>[1]</sup>. However, on
high-availability servers watchdogs are again frequently used.</p>

<p>Starting with version 183 systemd provides full support for
hardware watchdogs (as exposed in <tt>/dev/watchdog</tt> to
userspace), as well as supervisor (software) watchdog support for
individual system services. The basic idea is the following: if enabled,
systemd will regularly ping the watchdog hardware. If systemd or the
kernel hangs, this ping will not happen anymore and the hardware will
automatically reset the system. This way systemd and the kernel are
protected from boundless hangs -- by the hardware. To make the chain
complete, systemd then exposes a software watchdog interface for
individual services so that they can also be restarted (or some other
action taken) if they begin to hang. The ping frequency and the action
to take can be configured individually for each service. Putting both
parts together (i.e. hardware watchdogs supervising systemd and the
kernel, as well as systemd supervising all other services) we have a
reliable way to watchdog every single component of the system.</p>

<p>To make use of the hardware watchdog it is sufficient to set the
<tt>RuntimeWatchdogSec=</tt> option in
<tt>/etc/systemd/system.conf</tt>. It defaults to 0 (i.e. no hardware
watchdog use). Set it to a value like 20s and the watchdog is
enabled. After 20s of no keep-alive pings the hardware will reset
itself. Note that systemd will send a ping to the hardware at half the
specified interval, i.e. every 10s. And that's already all there is to
it. By enabling this single, simple option you have turned on
supervision by the hardware of systemd and the kernel beneath
it.<sup>[2]</sup></p>
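<p>To give an idea of how little "pinging the hardware" involves, here is a C sketch using the Linux watchdog ioctls from <tt>&lt;linux/watchdog.h&gt;</tt> (<tt>WDIOC_SETTIMEOUT</tt>, <tt>WDIOC_KEEPALIVE</tt>). It illustrates the idea only: it is not systemd's code, the function names are hypothetical, and since <tt>/dev/watchdog</tt> is armed as soon as it is opened, it should not be run alongside <tt>RuntimeWatchdogSec=</tt> or on a machine you care about.</p>

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ping at half the timeout, as described above for RuntimeWatchdogSec=
 * (clamped to at least 1s in this sketch). */
static unsigned ping_interval_sec(int timeout_sec) {
    return timeout_sec > 1 ? (unsigned)timeout_sec / 2 : 1;
}

/* Arm the watchdog at `dev` with `timeout_sec` and ping it forever.
 * Returns -1 only if the device could not be set up. */
int run_watchdog(const char *dev, int timeout_sec) {
    assert(timeout_sec > 0);
    int fd = open(dev, O_WRONLY);      /* opening arms the watchdog */
    if (fd < 0)
        return -1;
    /* The driver may round the timeout and writes back the value in use. */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_sec) < 0) {
        /* "Magic close": writing 'V' before close() disarms drivers that
         * support it, so a failed setup does not reset the machine. */
        if (write(fd, "V", 1) < 0) { /* nothing more we can do */ }
        close(fd);
        return -1;
    }
    for (;;) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);  /* the actual "ping" */
        sleep(ping_interval_sec(timeout_sec));
    }
}
```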

<p>Note that the hardware watchdog device (<tt>/dev/watchdog</tt>) is
single-user only. That means that you can either enable this
functionality in systemd, or use a separate external watchdog daemon,
such as the aptly named <a
href="http://linux.die.net/man/8/watchdog">watchdog</a>.</p>

<p><tt>ShutdownWatchdogSec=</tt> is another option that can be
configured in <tt>/etc/systemd/system.conf</tt>. It controls the
watchdog interval to use during reboots. It defaults to 10min, and
adds extra reliability to the system reboot logic: if a clean reboot
is not possible and shutdown hangs, we rely on the watchdog hardware
to reset the system abruptly, as an extra safety net.</p>

<p>So much for the hardware watchdog logic. These two options are
really everything that is necessary to make use of the hardware
watchdogs. Now, let's have a look at how to add watchdog logic to
individual services.</p>

<p>First of all, to make a piece of software watchdog-supervisable, it
needs to be patched to send out "I am alive" signals in regular
intervals in its event loop. Patching this is relatively easy. First,
a daemon needs to read the <tt>WATCHDOG_USEC=</tt> environment
variable. If it is set, it will contain the watchdog interval
configured for the service, in usec, formatted as an ASCII text
string. The daemon should then issue <tt><a
href="http://www.freedesktop.org/software/systemd/man/sd_notify.html">sd_notify</a>("WATCHDOG=1")</tt>
calls at half of that interval. A daemon patched this way should
transparently support watchdog functionality by checking whether the
environment variable is set and honouring the value it is set to.</p>
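<p>A patched daemon could look roughly like the C sketch below. Real daemons would simply call <tt>sd_notify(0, "WATCHDOG=1")</tt> from libsystemd; to keep the example self-contained, the hypothetical <tt>notify_watchdog()</tt> instead sends the same datagram to the AF_UNIX socket named in <tt>$NOTIFY_SOCKET</tt>, which is the protocol documented for sd_notify. The equally hypothetical <tt>watchdog_ping_usec()</tt> turns <tt>WATCHDOG_USEC</tt> into the ping period, i.e. half the interval.</p>

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Ping period in usec (half of WATCHDOG_USEC), or 0 if the watchdog is
 * not enabled or the value cannot be parsed. */
uint64_t watchdog_ping_usec(const char *watchdog_usec) {
    if (watchdog_usec == NULL || *watchdog_usec == '\0')
        return 0;
    char *end;
    errno = 0;
    unsigned long long usec = strtoull(watchdog_usec, &end, 10);
    if (errno != 0 || *end != '\0' || usec == 0)
        return 0;
    return usec / 2;
}

/* Self-contained stand-in for sd_notify(0, "WATCHDOG=1"): one datagram to
 * the AF_UNIX socket named in $NOTIFY_SOCKET ('@' = abstract namespace).
 * Returns 1 if sent, 0 if not supervised, -1 on error. */
int notify_watchdog(void) {
    const char *path = getenv("NOTIFY_SOCKET");
    if (path == NULL || *path == '\0')
        return 0;
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    size_t len = strlen(path);
    if (len >= sizeof(sa.sun_path))
        return -1;
    memcpy(sa.sun_path, path, len);
    if (sa.sun_path[0] == '@')
        sa.sun_path[0] = '\0';             /* abstract socket address */
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    static const char msg[] = "WATCHDOG=1";
    ssize_t n = sendto(fd, msg, sizeof(msg) - 1, 0,
                       (const struct sockaddr *)&sa,
                       (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len));
    close(fd);
    return n < 0 ? -1 : 1;
}
```

<p>In an event loop, one would arm a timer for <tt>watchdog_ping_usec(getenv("WATCHDOG_USEC"))</tt> microseconds and call <tt>notify_watchdog()</tt> each time it fires, skipping both when the period is 0.</p>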

<p>To enable the software watchdog logic for a service (which has been
patched to support the logic pointed out above) it is sufficient to
set <tt>WatchdogSec=</tt> to the desired failure latency. See <a
href="http://www.freedesktop.org/software/systemd/man/systemd.service.html">systemd.service(5)</a>
for details on this setting. This causes <tt>WATCHDOG_USEC=</tt> to be
set for the service's processes and will cause the service to enter a
failure state as soon as no keep-alive ping is received within the
configured interval.</p>

<p>Merely entering a failure state when the watchdog logic detects a
hang is hardly sufficient to build a reliable
system. The next step is to configure whether the service shall be
restarted and how often, and what to do if it then still fails. To
enable automatic service restarts on failure, set
<tt>Restart=on-failure</tt> for the service. To configure how many
restart attempts are made, use the combination
of <tt>StartLimitBurst=</tt> and <tt>StartLimitInterval=</tt>, which
allow you to configure how often a service may restart within a time
interval. If that limit is reached, a special action can be
taken. This action is configured with <tt>StartLimitAction=</tt>. The
default is <tt>none</tt>, i.e. no further action is taken and
the service simply remains in the failure state without any further
restart attempts. The other three possible values are
<tt>reboot</tt>, <tt>reboot-force</tt> and
<tt>reboot-immediate</tt>. <tt>reboot</tt> attempts a clean reboot,
going through the usual, clean shutdown logic. <tt>reboot-force</tt>
is more abrupt: it will not actually try to cleanly shut down any
services, but immediately kills all remaining services, unmounts
all file systems and then forcibly reboots (this way all file systems
will be clean but the reboot will still be very fast). Finally,
<tt>reboot-immediate</tt> does not attempt to kill any process or
unmount any file systems. Instead it just hard reboots the machine
without delay. <tt>reboot-immediate</tt> hence comes closest to a
reboot triggered by a hardware watchdog. All these settings are
documented in <a
href="http://www.freedesktop.org/software/systemd/man/systemd.service.html">systemd.service(5)</a>.</p>

<p>Putting this all together we now have pretty flexible options to
watchdog-supervise a specific service and configure automatic restarts
of the service if it hangs, plus take ultimate action if that doesn't
help.</p>

<p>Here's an example unit file:</p>

<pre>[Unit]
Description=My Little Daemon
Documentation=man:mylittled(8)

[Service]
ExecStart=/usr/bin/mylittled
WatchdogSec=30s
Restart=on-failure
StartLimitInterval=5min
StartLimitBurst=4
StartLimitAction=reboot-force
</pre>

<p>This service will automatically be restarted if it hasn't pinged
the system manager for longer than 30s or if it fails otherwise. If it
is restarted this way more than 4 times in 5min, action is taken
and the system is quickly rebooted, with all file systems being clean
when it comes up again.</p>
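<p>To illustrate how such a limit plays out, here is a simplified C model of a start rate limit as a fixed-window counter. It is a sketch of the idea rather than systemd's exact accounting (for instance, whether the very first start counts against the burst is glossed over), and the struct and function names are hypothetical.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of StartLimitInterval= / StartLimitBurst= as a
 * fixed-window counter. Times are in seconds. */
struct start_limit {
    double interval;  /* e.g. 300 for StartLimitInterval=5min */
    unsigned burst;   /* e.g. 4 for StartLimitBurst=4 */
    double begin;     /* when the current window opened */
    unsigned count;   /* starts counted in the current window */
};

/* Record a start at time `now`. Returns false once more than `burst`
 * starts fall into one window, which is the point where
 * StartLimitAction= would be taken. */
bool start_limit_test(struct start_limit *l, double now) {
    assert(l != NULL && l->burst > 0);
    if (l->count == 0 || now - l->begin >= l->interval) {
        l->begin = now;   /* open a new window */
        l->count = 0;
    }
    return ++l->count <= l->burst;
}
```

<p>With the values from the unit file above, four starts within five minutes pass, the fifth is refused, and a start after the window has expired opens a fresh window.</p>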

<p>And that's already all I wanted to tell you about! With hardware
watchdog support right in PID 1, as well as supervisor watchdog
support for individual services, we should provide everything you need
for most watchdog use cases. Whether you are building an embedded
or mobile appliance, or are working with high-availability
servers, please give this a try!</p>

<p>(Oh, and if you wonder why in heaven PID 1 needs to deal with
<tt>/dev/watchdog</tt>, and why this shouldn't be kept in a separate
daemon, then please read this again and try to understand that this is
all about the supervisor chain we are building here, where the hardware watchdog
supervises systemd, and systemd supervises the individual
services. Also, we believe that a service not responding should be
treated in a similar way as any other service error. Finally, pinging
<tt>/dev/watchdog</tt> is one of the most trivial operations in the OS
(basically little more than an ioctl() call), so the support for this
is no more than a handful of lines of code. Maintaining this externally
with complex IPC between PID 1 (and the daemons) and this watchdog
daemon would be drastically more complex, error-prone and resource
intensive.)</p>
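<p>To illustrate how little code the hardware side needs, here is a Python sketch of the Linux watchdog driver API from <tt>linux/watchdog.h</tt>. You open the device and ping it with the <tt>WDIOC_KEEPALIVE</tt> ioctl. You can optionally change the timeout with <tt>WDIOC_SETTIMEOUT</tt>. The sketch assumes the generic ioctl number encoding used on x86 and ARM, since a few architectures encode the direction bits differently. The helper names are made up. Be careful when trying it: opening <tt>/dev/watchdog</tt> arms the timer, so the machine reboots if the pings stop.</p>

<pre>
import fcntl
import os
import struct

# ioctl direction bits in the generic Linux encoding (x86, ARM, ...).
_IOC_WRITE = 1
_IOC_READ = 2
WATCHDOG_IOCTL_BASE = "W"
SIZEOF_INT = 4


def _ioc(direction, type_char, nr, size):
    # Same layout as the kernel's _IOC() macro: direction in bits 30-31,
    # argument size in bits 16-29, type in bits 8-15, number in bits 0-7.
    return direction * 2**30 + size * 2**16 + ord(type_char) * 2**8 + nr


WDIOC_KEEPALIVE = _ioc(_IOC_READ, WATCHDOG_IOCTL_BASE, 5, SIZEOF_INT)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, WATCHDOG_IOCTL_BASE, 6, SIZEOF_INT)
WDIOC_GETTIMEOUT = _ioc(_IOC_READ, WATCHDOG_IOCTL_BASE, 7, SIZEOF_INT)


def keepalive(fd):
    """Send a keep-alive ping to an open watchdog device."""
    # Pass a real int-sized buffer so a driver touching the argument is safe.
    fcntl.ioctl(fd, WDIOC_KEEPALIVE, struct.pack("i", 0))


def set_timeout(fd, seconds):
    """Request a new timeout; returns the value the driver actually applied."""
    buf = fcntl.ioctl(fd, WDIOC_SETTIMEOUT, struct.pack("i", seconds))
    return struct.unpack("i", buf)[0]


def disarm_and_close(fd):
    """'Magic close': drivers that support it stop the timer after a 'V'
    is written, unless the kernel was built with nowayout."""
    os.write(fd, b"V")
    os.close(fd)
</pre>

<p>A ping loop would open the device with <tt>os.open("/dev/watchdog", os.O_WRONLY)</tt> and call <tt>keepalive(fd)</tt> well within the configured timeout.</p>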

<p>Note that the built-in hardware watchdog support of systemd does
not conflict with other watchdog software by default. systemd does not
make use of <tt>/dev/watchdog</tt> by default, and you are welcome to
use external watchdog daemons in conjunction with systemd, if this
better suits your needs.</p>

<p>And one last thing: if you wonder whether your hardware has a
watchdog, then the answer is: almost certainly yes, if it is less than
a few years old. If you want to verify this, try the <a
href="http://karelzak.blogspot.de/2012/05/eject1-sulogin1-wdctl1.html">wdctl</a>
tool from recent util-linux, which shows you everything you need to
know about your watchdog hardware.</p>

<p>I'd like to thank the great folks from <a
href="http://www.pengutronix.de/">Pengutronix</a> for contributing
most of the watchdog logic. Thank you!</p>

<p><small><b>Footnotes</b></small></p>

<p><small>[1] Though actually most desktops tend to include watchdog
hardware these days too, as this is cheap to build and available in
most modern PC chipsets.</small></p>

<p><small>[2] So, here's a free tip for you if you hack on the core
OS: don't enable this feature while you hack. Otherwise your system
might suddenly reboot if you are in the middle of tracing through PID
1 with gdb and cause it to be stopped for a moment, so that no
hardware ping can be done...</small></p>

]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XIV</title>
  <link>http://0pointer.de/blog/projects/self-documented-boot.html</link>
  <description><![CDATA[

<p><a
href="http://0pointer.de/blog/projects/systemctl-journal.html">And</a>
<a href="http://0pointer.de/blog/projects/security.html">here's</a> <a
href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">fourteenth</a>
<a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>The Self-Explanatory Boot</h4>

<p>One complaint we often hear about <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> is
that its boot process is hard to understand, even
incomprehensible. In general I disagree with this sentiment; I
even believe quite the opposite: in comparison to what we had
before -- where to even remotely understand what was going on you had
to have a decent comprehension of the programming language that is
Bourne Shell<sup>[1]</sup> -- understanding systemd's boot process is
substantially easier. However, as with many complaints, there is some
truth in this frequently heard discomfort: for a seasoned Unix
administrator there indeed is a bit of learning to do when the switch
to <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> is
made. And as systemd developers it is our duty to make the learning
curve shallow, introduce as few surprises as we can, and provide
good documentation where that is not possible.</p>

<p>systemd has always had a huge body of documentation, <a
href="http://www.freedesktop.org/software/systemd/man/">as manual
pages</a> (nearly 100 individual pages now!), in the <a
href="http://www.freedesktop.org/wiki/Software/systemd">Wiki</a> and
in the various blog stories I posted. However, any amount of
documentation alone is not enough to make software easily
understood. In fact, thick manuals sometimes appear intimidating and
make the reader wonder where to start reading, if all he was
interested in was this one simple concept of the whole system.</p>

<p>Acknowledging all this we have now added a new, neat, little
feature to systemd: the self-explanatory boot process. What do we mean
by that? Simply that each and every single component of our boot comes
with documentation and that this documentation is closely linked to
its component, so that it is easy to find.</p>

<p>More specifically, all units in systemd (which are what
encapsulate the components of the boot) now include references to
their documentation, the documentation of their configuration files
and further applicable manuals. A user who is trying to understand the
purpose of a unit, how it fits into the boot process and how to
configure it can now easily look up this documentation with the
well-known <tt>systemctl status</tt> command. Here's an example of how
this looks for <tt>systemd-logind.service</tt>:</p>

<pre>
$ systemctl status systemd-logind.service
systemd-logind.service - Login Service
	  Loaded: loaded (/usr/lib/systemd/system/systemd-logind.service; static)
	  Active: active (running) since Mon, 25 Jun 2012 22:39:24 +0200; 1 day and 18h ago
	    Docs: <a href="http://www.freedesktop.org/software/systemd/man/systemd-logind.service.html">man:systemd-logind.service(7)</a>
	          <a href="http://www.freedesktop.org/software/systemd/man/logind.conf.html">man:logind.conf(5)</a>
	          <a href="http://www.freedesktop.org/wiki/Software/systemd/multiseat">http://www.freedesktop.org/wiki/Software/systemd/multiseat</a>
	Main PID: 562 (systemd-logind)
	  CGroup: name=systemd:/system/systemd-logind.service
		  └ 562 /usr/lib/systemd/systemd-logind

Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event2 (Power Button)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event6 (Video Bus)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event0 (Lid Switch)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event1 (Sleep Button)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event7 (ThinkPad Extra Buttons)
Jun 25 22:39:25 epsilon systemd-logind[562]: New session 1 of user gdm.
Jun 25 22:39:25 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/42/X11-display.
Jun 25 22:39:32 epsilon systemd-logind[562]: New session 2 of user lennart.
Jun 25 22:39:32 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/500/X11-display.
Jun 25 22:39:54 epsilon systemd-logind[562]: Removed session 1.
</pre>

<p>At first glance this output has changed very little. If you look
closer, however, you will find that it now includes one new field:
<tt>Docs</tt> lists references to the documentation of this
service. In this case there are two man page URIs and one web URL
specified. The man pages describe the purpose and configuration of
this service, the web URL includes an introduction to the basic
concepts of this service.</p>

<p>If the user uses a recent graphical terminal implementation it is
sufficient to click on the URIs shown to get the respective
documentation<sup>[2]</sup>. In other words: it has never been this
easy to figure out what a specific component of our boot is about:
just use <tt>systemctl status</tt> to get more information about it
and click on the links shown to find the documentation.</p>

<p>Over the past few days I have written man pages and added these references
for every single unit we ship with systemd. This means, with
<tt>systemctl status</tt> you now have a very easy way to find out
more about every single service of the core OS.</p>

<p>If you are not using a graphical terminal (where you can just click
on URIs), a man page URI in the middle of the output of <tt>systemctl
status</tt> is not the most useful thing to have. To make reading the
referenced man pages easier we have also added a new command:</p>

<pre>systemctl help systemd-logind.service</pre>

<p>This opens the listed man pages right away, without the need
to click anything or copy/paste a URI.</p>

<p>The URIs are in the formats documented by the <a
href="https://www.kernel.org/doc/man-pages/online/pages/man7/url.7.html">uri(7)</a>
man page. Units may reference http and https URLs, as well as man and
info pages.</p>
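<p>Here is a small Python sketch that makes these URI forms concrete. It maps a <tt>Documentation=</tt> entry to a viewer command line, roughly the way one would want <tt>systemctl help</tt> to open man pages. The function name and the exact mapping are hypothetical illustrations, not systemctl's actual code. Web URLs are simply left for a browser.</p>

<pre>
import re
from urllib.parse import urlsplit

# The URI schemes the text lists as valid in Documentation=.
SUPPORTED_SCHEMES = ("http", "https", "man", "info")

# "man:NAME(SECTION)" or just "man:NAME"
MAN_RE = re.compile(r"^([^()\s]+)(?:\((\w+)\))?$")


def viewer_argv(uri):
    """Return an argv list that displays a man/info reference.

    Hypothetical helper: returns None for http/https URLs (open those in a
    browser) and raises ValueError for unsupported or malformed URIs.
    """
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported documentation URI: " + uri)
    if parts.scheme == "man":
        match = MAN_RE.match(parts.path)
        if match is None:
            raise ValueError("malformed man URI: " + uri)
        name, section = match.groups()
        return ["man", section, name] if section else ["man", name]
    if parts.scheme == "info":
        return ["info", parts.path]
    return None  # http/https: hand over to a web browser instead
</pre>

<p>For example, <tt>man:logind.conf(5)</tt> becomes <tt>man 5 logind.conf</tt>, the same command you would type by hand.</p>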

<p>Of course all this doesn't make everything self-explanatory, simply
because the user still has to find out about <tt>systemctl status</tt>
(and even about <tt>systemctl</tt> in the first place, so that he knows
what units there are); however with this basic knowledge further
help on specific units is in very easy reach.</p>

<p>We hope that this kind of interlinking of runtime behaviour and the
matching documentation is a big step towards making our boot easier
to understand.</p>

<p>This functionality is partially already available in Fedora 17, and
will show up in complete form in Fedora 18.</p>

<p>That all said, credit where credit is due: references of this kind
to documentation within the service descriptions are not new; Solaris'
SMF has had similar functionality for quite some time. However, we believe
this new systemd feature is certainly a novelty on Linux, and with
systemd we now offer you the best documented and most self-explanatory
init system.</p>

<p>Of course, if you are writing unit files for your own packages,
please consider also including references to the documentation of your
services and their configuration. This is really easy to do: just list
the URIs in the new <tt>Documentation=</tt> field in the
<tt>[Unit]</tt> section of your unit files. For details see <a
href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html">systemd.unit(5)</a>. The
more comprehensively we include links to documentation in our OS
services the easier the work of administrators becomes. (To make sure
Fedora makes comprehensive use of this functionality <a
href="https://fedorahosted.org/fpc/ticket/192">I filed a bug on
FPC</a>).</p>
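<p>A unit may carry several <tt>Documentation=</tt> lines, so a tool that wants to list them has to merge them. The Python sketch below does that for the <tt>[Unit]</tt> section of a unit file. It also treats an empty assignment as resetting the list, which is how current versions of systemd.unit(5) describe it. It is a deliberately simplified, hypothetical parser that handles no line continuations, quoting or drop-in files. It is not systemd's own parser.</p>

<pre>
def documentation_uris(unit_text):
    """Collect the Documentation= URIs from the [Unit] section of a unit file.

    Values are space-separated lists; repeated assignments are merged and an
    empty assignment ("Documentation=") clears everything collected so far.
    """
    uris = []
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key != "Documentation":
            continue
        if value:
            uris.extend(value.split())
        else:
            uris.clear()
    return uris
</pre>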

<p>Oh, and BTW: if you are looking for a rough overview of systemd's
boot process, <a
href="http://www.freedesktop.org/software/systemd/man/bootup.html">here's
another new man page we recently added</a>, which includes a pretty
ASCII flow chart of the boot process and the units involved.</p>

<p><small><b>Footnotes</b></small></p>

<p><small>[1] Which TBH is a pretty crufty, strange one on top.</small></p>

<p><small>[2] Well, <a
href="https://bugzilla.gnome.org/show_bug.cgi?id=676452">a terminal
where this bug is fixed</a> (used together with <a
href="https://bugzilla.gnome.org/show_bug.cgi?id=676482">a help
browser where this one is fixed</a>).</small></p>

]]></description>
</item>

<item>
  <title>Presentation in Warsaw</title>
  <link>http://0pointer.de/blog/projects/warsaw.html</link>
  <description><![CDATA[

<p>I recently had the chance to speak about <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>
and other projects, as well as the politics behind them at a <a
href="http://osec.pl/barcamp/lennart">Bar Camp in Warsaw</a>,
organized by the fine people of <a
href="http://osec.pl/">OSEC</a>. The presentation has been recorded,
and has now been posted online. It's a very long recording (1:43h),
but it's quite interesting (as I'd like to believe) and contains a bit
of background on where we are coming from and where we are going. Anyway,
please have a look. Enjoy!</p>

<iframe width="560" height="315" src="http://www.youtube.com/embed/9UnEV9SPuw8" frameborder="0" allowfullscreen="1"></iframe>

<p>I'd like to thank the organizers for this great event and for
publishing the recording online.</p>

]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XIII</title>
  <link>http://0pointer.de/blog/projects/systemctl-journal.html</link>
  <description><![CDATA[

<p><a href="http://0pointer.de/blog/projects/security.html">Here's</a>
<a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">thirteenth</a> <a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>Log and Service Status</h4>

<p>This one is a short episode. One of the most commonly used commands
on a <a
href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>
system is <tt>systemctl status</tt> which may be used to determine the
status of a service (or other unit). It has always been a valuable
tool to figure out the processes, runtime information and other
metadata of a daemon running on the system.</p>

<p>With Fedora 17 we introduced <a
href="http://0pointer.de/blog/projects/the-journal.html">the
journal</a>, our new logging scheme that provides structured, indexed
and reliable logging on systemd systems, while providing a certain
degree of compatibility with classic syslog implementations. The
original reason we started to work on the journal was one specific
feature idea that might appear simple to the outsider, but is
difficult and inefficient to implement without the journal: along with the
output of <tt>systemctl status</tt> we wanted to show the last 10 log
messages of the daemon. Log data is some of the most essential
information we have on the status of a service. Hence it is an
obvious choice to show it next to the general status of the
service.</p>

<p>And now to make it short: at the same time as we integrated the
journal into <tt>systemd</tt> and Fedora we also hooked up
<tt>systemctl</tt> with it. Here's an example output:</p>

<pre>$ systemctl status avahi-daemon.service
avahi-daemon.service - Avahi mDNS/DNS-SD Stack
	  Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled)
	  Active: active (running) since Fri, 18 May 2012 12:27:37 +0200; 14s ago
	Main PID: 8216 (avahi-daemon)
	  Status: "avahi-daemon 0.6.30 starting up."
	  CGroup: name=systemd:/system/avahi-daemon.service
		  ├ 8216 avahi-daemon: running [omega.local]
		  └ 8217 avahi-daemon: chroot helper

May 18 12:27:37 omega avahi-daemon[8216]: Joining mDNS multicast group on interface eth1.IPv4 with address 172.31.0.52.
May 18 12:27:37 omega avahi-daemon[8216]: New relevant interface eth1.IPv4 for mDNS.
May 18 12:27:37 omega avahi-daemon[8216]: Network interface enumeration completed.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for fd00::e269:95ff:fe87:e282 on eth1.*.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 172.31.0.52 on eth1.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering HINFO record with values 'X86_64'/'LINUX'.
May 18 12:27:38 omega avahi-daemon[8216]: Server startup complete. Host name is omega.local. Local service cookie is 3555095952.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/ssh.service) successfully established.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/sftp-ssh.service) successfully established.</pre>

<p>This, of course, shows the status of everybody's favourite
mDNS/DNS-SD daemon with a list of its processes, along with -- as
promised -- the 10 most recent log lines. Mission accomplished!</p>

<p>There are a couple of switches available to alter the output
slightly and adjust it to your needs. The two most interesting
switches are <tt>-f</tt> to enable follow mode (as in <tt>tail
-f</tt>) and <tt>-n</tt> to change the number of lines to show (you
guessed it, as in <tt>tail -n</tt>).</p>

<p>The log data shown comes from three sources: everything any of the
daemon's processes logged with libc's <tt>syslog()</tt> call,
everything submitted using the native Journal API, plus everything any
of the daemon's processes logged to STDOUT or STDERR. In short:
everything the daemon generates as log data is collected, properly
interleaved and shown in the same format.</p>
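<p>Conceptually, the interleaving described above merges several time-ordered streams. It then keeps only the last <i>n</i> entries, which is the number <tt>-n</tt> controls. The following Python sketch is only a toy model of that idea, using a made-up <tt>Entry</tt> record. It is not how journald or systemctl are implemented.</p>

<pre>
import heapq
from collections import deque
from typing import NamedTuple


class Entry(NamedTuple):
    timestamp: float  # seconds since the epoch
    source: str       # hypothetical tag, e.g. "syslog", "journal" or "stdout"
    message: str


def last_lines(sources, n=10):
    """Interleave entries from several sources by timestamp; keep the newest n.

    Each source must already be sorted by timestamp, as heapq.merge expects.
    A deque with maxlen=n discards older entries as newer ones arrive,
    much like tail -n.
    """
    merged = heapq.merge(*sources, key=lambda entry: entry.timestamp)
    return list(deque(merged, maxlen=n))
</pre>

<p>With the default <tt>n=10</tt> this mirrors the ten log lines <tt>systemctl status</tt> shows.</p>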

<p>And that's it already for today. It's a very simple feature, but an
immensely useful one for every administrator: one of the "Why didn't
we already do this 15 years ago?" kind.</p>

<p>Stay tuned for the next installment!</p>

]]></description>
</item>

<item>
  <title>Boot &amp; Base OS Miniconf at Linux Plumbers Conference 2012, San Diego</title>
  <link>http://0pointer.de/blog/projects/lpc2012.html</link>
  <description><![CDATA[

<p style="text-align: center"><a href="http://www.linuxplumbersconf.org/2012/"><img
src="http://www.linuxplumbersconf.org/2012/style/tagline.png" width="493"
height="90" alt="Linux Plumbers Conference Logo"/></a></p>

<p>We are working on putting together <a
href="http://wiki.linuxplumbersconf.org/2012:boot_and_base_os">a miniconf on
the topic of Boot &amp; Base OS</a> for the Linux Plumbers Conference 2012 in San
Diego (Aug 29-31). And we need your submission!</p>

<p>Are you working on some exciting project related to Boot and Base OS and
would like to present your work? Then please submit something <a
href="http://www.linuxplumbersconf.org/2012/2012-lpc-call-for-proposals-take-2/">following
these guidelines</a>, and please CC Kay Sievers and Lennart Poettering.</p>

<p>I hope that at this point the Linux Plumbers Conference
needs little introduction, so I will spare you further prose on how great and
useful it is, and how it is the best conference ever for everybody who works on the plumbing
layer of Linux. However, there's one conference co-located with
LPC that is still little known, because it happens for the first time: <a
href="http://www.cconf.org/">The C Conference</a>, organized by Brandon Philips
and friends. It covers all things C, and they are still looking for more
topics, in a <a href="http://www.cconf.org/pfc/">reverse CFP</a>. Please
consider submitting a proposal and registering for the conference!</p>

<p style="text-align: center"><a href="http://www.cconf.org/"><img
src="http://www.cconf.org/assets/cconf.png" width="270" height="270" alt="C
Conference Logo"/></a></p>


]]></description>
</item>

<item>
  <title>The Most Awesome, Least-Advertised Fedora 17 Feature</title>
  <link>http://0pointer.de/blog/projects/multi-seat.html</link>
  <description><![CDATA[

<p>There's one feature in the upcoming Fedora 17 release that is
immensely useful but very little known, since its <a
href="https://fedoraproject.org/wiki/Features/ckremoval">feature page
'ckremoval'</a> does not explicitly refer to it in its name: true
<i>automatic multi-seat</i> support for Linux.</p>

<p>A multi-seat computer is a system that offers not only one local
seat for a user, but multiple, at the same time. A seat refers to a
combination of a screen, a set of input devices (such as mice and
keyboards), and maybe an audio card or webcam, forming an individual local
workplace for a user. A multi-seat computer can drive an entire class
room of seats with only a fraction of the cost in hardware, energy,
administration and space: you only have one PC, which usually has more
than enough CPU power to drive 10 or more workplaces. (In fact, even a
Netbook is fast enough to drive a couple of seats!) <i>Automatic
multi-seat</i> refers to an entirely automatically managed seat setup:
whenever a new seat is plugged in a new login screen immediately
appears -- without any manual configuration --, and when the seat is
unplugged all user sessions on it are removed without delay.</p>

<p>In Fedora 17 we added this functionality to the low-level user and
device tracking of systemd, replacing the previous ConsoleKit logic
that lacked support for automatic multi-seat. With all the ground work
done in systemd, udev and the other components of our plumbing layer
the last remaining bits were surprisingly easy to add.</p>

<p>Currently, the automatic multi-seat logic works best with the USB
multi-seat hardware from <a
href="http://www.amazon.com/Plugable-Universal-DisplayLink-1920x1080-High-Speed/dp/B002PONXAI/ref=sr_1_3?ie=UTF8&amp;qid=1335904746&amp;sr=8-3">Plugable</a>
you can buy cheaply on <a
href="http://www.amazon.com/Plugable-DC-125-Docking-Station-Multiseat/dp/B004PXPPNA/ref=sr_1_10?ie=UTF8&amp;qid=1335904746&amp;sr=8-10">Amazon
(US)</a>. These devices require exactly zero configuration with the
new scheme implemented in Fedora 17: just plug them in at any time,
login screens pop up on them, and you have your additional
seats. Alternatively you can also assemble your seat manually with a
few easy <a
href="http://www.freedesktop.org/software/systemd/man/loginctl.html">loginctl
attach</a> commands, from any kind of hardware you might have lying
around. To get a full seat you need multiple graphics cards, keyboards
and mice: one set for each seat. (Later on we'll probably have a graphical
setup utility for additional seats, but we believe that's not a pressing
issue, as the plug-n-play multi-seat support with the Plugable
devices is so awesomely nice.)</p>

<p>Plugable provided us with free hardware for testing
multi-seat. They are also involved with the upstream development of
the USB DisplayLink driver for Linux. Due to their positive
involvement with Linux we can only recommend buying their
hardware. They are good guys, and support Free Software the way all
hardware vendors should! (And besides that, their hardware is also
nicely put together. For example, in contrast to most similar vendors
they actually assign proper vendor/product IDs to their USB hardware
so that we can easily recognize their hardware when plugged in to set
up automatic seats.)</p>

<p>Currently, all this magic is only implemented in the GNOME stack,
with the GNOME Display Manager being the biggest component that was
updated. On the Plugable USB hardware you get a full GNOME Shell
session with all the usual graphical gimmicks, the same way as on any
other hardware. (Yes, GNOME 3 works perfectly fine on simpler graphics
cards such as these USB devices!) If you are hacking on a different
desktop environment, or on a different display manager, please have a
look at <a
href="http://www.freedesktop.org/wiki/Software/systemd/multiseat">the
multi-seat documentation</a> we put together, and particularly at
our short piece about <a
href="http://www.freedesktop.org/wiki/Software/systemd/writing-display-managers">writing
display managers</a> which are multi-seat capable.</p>

<p>If you work on a major desktop environment or display manager and
would like to implement multi-seat support for it, but lack the
aforementioned Plugable hardware, we might be able to provide you with
the hardware for free. Please contact us directly, and we might be
able to send you a device. Note that we don't have unlimited devices
available, hence we'll probably not be able to pass hardware to
everybody who asks, and we will pass the hardware preferably to people
who work on well-known software or otherwise have contributed good
code to the community already. Anyway, if in doubt, ping us, and
explain to us why you should get the hardware, and we'll consider you!
(Oh, and this doesn't only apply to display managers: if you hack on some other
software where multi-seat awareness would be truly useful, then don't
hesitate and ping us!)</p>

<p>Phoronix has <a
href="http://www.phoronix.com/scan.php?page=article&amp;item=plugable_multiseat_kick">this
story about this new multi-seat</a> support which is quite interesting and
full of pictures. Please have a look.</p>

<p>Plugable started a <a
href="http://www.kickstarter.com/projects/1666707630/plugable-thin-client-the-50-computer">Pledge
drive</a> to lower the price of the Plugable USB multi-seat terminals
further. It's full of pictures (<a href="http://www.kickstarter.com/projects/1666707630/plugable-thin-client-the-50-computer/widget/video.html"><b>and a video showing all this in action!</b></a>), and uses the code we now make
available in Fedora 17 as its base. Please consider pledging a few
bucks.</p>

<p>Recently David Zeuthen <a
href="https://plus.google.com/110773474140772402317/posts/NqPUifsFUYH">added
multi-seat support to udisks</a> as well. With this in place, a user
logged in on a specific seat can only see the USB storage plugged into
his individual seat, but not any USB storage plugged into any
other local seat. This closes the last missing bit of
multi-seat support in our desktop stack.</p>
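<p>How does a device end up on a particular seat? The Python sketch below models the idea under three simplifying assumptions. First, every device has udev properties keyed by its sysfs path. Second, a device takes its seat from the <tt>ID_SEAT</tt> property set on itself or on its nearest ancestor, for example a multi-seat USB hub. Third, anything left unassigned belongs to the default <tt>seat0</tt>. Filtering by seat then gives the udisks behaviour described above. The paths and seat names are hypothetical, and the real udev rules and logind logic are more involved.</p>

<pre>
import posixpath


def seat_of(syspath, properties):
    """Return the seat a device belongs to.

    properties maps sysfs device paths to dicts of udev properties.
    Simplified model: use the device's own ID_SEAT, else the nearest
    ancestor's, else the default seat "seat0".
    """
    path = syspath
    while path not in ("", "/"):
        seat = properties.get(path, {}).get("ID_SEAT")
        if seat:
            return seat
        path = posixpath.dirname(path)  # walk up to the parent device
    return "seat0"


def visible_to(seat, properties):
    """Devices a user logged in on the given seat should get to see."""
    return sorted(p for p in properties if seat_of(p, properties) == seat)
</pre>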

<p>With this code in Fedora 17 we cover the big use cases of
multi-seat already: internet cafes, class rooms and similar
installations can provide PC workplaces cheaply and easily without any
manual configuration. Later on we want to build on this and make this
useful for different uses too: for example, the ability to get a login
screen as easily as plugging in a USB connector makes this useful not
only for saving money in setups for many people, but also in embedded
environments (consider monitoring/debugging screens made available via
this hotplug logic) or servers (get trivially quick local access to
your otherwise head-less server). To be truly useful in these areas we
need one more thing though: the ability to run a simple getty
(i.e. text login) on the seat, without necessarily involving a
graphical UI.</p>

<p>The well-known X successor Wayland already comes out of the box with multi-seat
support based on this logic.</p>

<p>Oh, and BTW, as Ubuntu appears to be "<i>focussing</i>" on "<i>clarity</i>" in the
"<i>cloud</i>" now ;-), and chose Upstart instead of systemd, this feature
won't be available in Ubuntu any time soon. That's (one detail of) the
price Ubuntu has to pay for choosing to maintain its own (largely
legacy, such as ConsoleKit) plumbing stack.</p>

<p>Multi-seat has a long history on Unix. Since the earliest days Unix
systems could be accessed by multiple local terminals at the same
time. Since then local terminal support (and hence multi-seat)
gradually moved out of view in computing. Very few machines these
days have more than one seat; the concept of terminals survived almost
exclusively in the context of PTYs (i.e. fully virtualized API
objects, disconnected from any real hardware seat) and VCs (i.e. a
single virtualized local seat), but hardly in any other way (well,
server setups still use serial terminals for emergency remote access,
but they almost never have more than one serial terminal). Everything we
do in systemd is based on the ideas originally brought forward in
Unix; with systemd we now try to bring back a number of the good ideas
of Unix that were lost by the roadside since the old times. For
example, in true Unix style we already started to expose the concept
of a service in the file system (in
<tt>/sys/fs/cgroup/systemd/system/</tt>), something where on Linux the
(often misunderstood) "<i>everything is a file</i>" mantra previously
fell short. With automatic multi-seat support we bring back support
for terminals, but updated with all the features of today's desktops:
plug and play, zero configuration, full graphics, and not limited to
input devices and screens, but extending to all kinds of devices, such
as audio, webcams or USB memory sticks.</p>

<p>Anyway, this is all for now; I'd like to thank everybody who was
involved with making multi-seat work so nicely and natively on the
Linux platform. You know who you are! Thanks a ton!</p>

]]></description>
</item>

<item>
  <title>systemd Status Update</title>
  <link>http://0pointer.de/blog/projects/systemd-update-3.html</link>
  <description><![CDATA[

<p><a href="http://0pointer.de/blog/projects/systemd-update-2.html">It
has been way too long since my last status update on
systemd</a>. Here's another short, incomplete status update on
what we worked on for <a
href="http://freedesktop.org/wiki/Software/systemd">systemd</a> since
then.</p>

<p>We have been working hard to turn systemd into the most viable set
of components to build operating systems, appliances and devices from,
and make it the best choice for servers, for desktops and for embedded
environments alike. I think we have a really convincing set of
features now, but we are actively working on making it even
better.</p>

<p>Here's a list of some more and some less interesting features, in
no particular order:</p>

<ol>

<li>We added an automatic pager to <tt>systemctl</tt> (and related tools), similar
to how <tt>git</tt> has it.</li>

<li><tt>systemctl</tt> learnt a new switch <tt>--failed</tt>, to show only
failed services.</li>

<li>You may now start services immediately, overriding all dependency
logic by passing <tt>--ignore-dependencies</tt> to
<tt>systemctl</tt>. This is mostly a debugging tool and nothing people
should use in real life.</li>

<li>Sending <tt>SIGKILL</tt> as final part of the implicit shutdown
logic of services is now optional and may be configured with the
<tt>SendSIGKILL=</tt> option individually for each service.</li>

<li>We split off the Vala/Gtk tools into their own project <tt>systemd-ui</tt>.</li>

<li><tt>systemd-tmpfiles</tt> learnt file globbing and creating FIFO
special files as well as character and block device nodes, and
symlinks. It also is capable of relabelling certain directories at
boot now (in the SELinux sense).</li>

<li>Immediately before shutting down we will now invoke all binaries
found in <tt>/lib/systemd/system-shutdown/</tt>, which is useful for
debugging late shutdown.</li>

<li>You may now globally control where STDOUT/STDERR of services goes
(unless individual service configuration overrides it).</li>

<li>There's a new <tt>ConditionVirtualization=</tt> option that makes
systemd skip a specific service if a certain virtualization technology
is found or not found. Similarly, we now have a new option to detect
whether a certain security technology (such as SELinux) is available,
called <tt>ConditionSecurity=</tt>. There's also
<tt>ConditionCapability=</tt> to check whether a certain process
capability is in the capability bounding set of the system. Finally, there
are the new <tt>ConditionFileIsExecutable=</tt>,
<tt>ConditionPathIsMountPoint=</tt>,
<tt>ConditionPathIsReadWrite=</tt> and
<tt>ConditionPathIsSymbolicLink=</tt> options.</li>

<li>The file system condition directives now support globbing.</li>

<li>Service conditions may now be "triggering" and "mandatory", meaning that
they can be a necessary requirement to hold for a service to start, or
simply one trigger among many.</li>

<li>At boot time we now print warnings if: <a
href="http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken"><tt>/usr</tt>
is on a split-off partition but not already mounted by an initrd</a>;
if <tt>/etc/mtab</tt> is not a symlink to <tt>/proc/mounts</tt>; <a
href="http://0pointer.de/blog/projects/cgroups-vs-cgroups.html">CONFIG_CGROUPS
is not enabled in the kernel</a>. We'll also expose this as
<i>tainted</i> flag on the bus.</li>

<li>You may now boot the same OS image on a bare metal machine and in
Linux namespace containers and will get a clean boot in both
cases. This is more complicated than it sounds since device management
with udev or write access to <tt>/sys</tt>, <tt>/proc/sys</tt> or
things like <tt>/dev/kmsg</tt> is not available in a container. This
makes systemd a first-class choice for managing thin container
setups. This is all tested with systemd's own <tt>systemd-nspawn</tt>
tool but should work fine in LXC setups, too. Basically this means
that you do not have to adjust your OS manually to make it work in a
container environment; it will just work out of the box. It also
makes it easier to convert real systems into containers.</li>

<li>We now automatically spawn gettys on HVC ttys when booting in VMs.</li>

<li>We introduced <tt>/etc/machine-id</tt> as a generalization of
the D-Bus machine ID logic. See <a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">this
blog story for more information</a>. On stateless/read-only systems
the machine ID is initialized randomly at boot. In virtualized
environments it may be passed in from the machine manager (with qemu's
<tt>-uuid</tt> switch, or via the <a
href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface">container
interface</a>).</li>

<li>All of the systemd-specific <tt>/etc/fstab</tt> mount options are
now in the <tt>x-systemd-<i>xyz</i></tt> format.</li>

<li>To make it easy to find non-converted services we will now
implicitly prefix all LSB and SysV init script descriptions with the
strings "<tt>LSB:</tt>" and "<tt>SYSV:</tt>", respectively.</li>

<li>We introduced <tt>/run</tt> and made it a hard dependency of
systemd. This directory is now widely accepted and implemented on all
relevant Linux distributions.</li>

<li>systemctl can now execute all its operations remotely too (<tt>-H</tt> switch).</li>

<li>We now ship <a
href="http://0pointer.de/blog/projects/changing-roots.html">systemd-nspawn</a>,
a really powerful tool that can be used to start containers for
debugging, building and testing, much like chroot(1). It is useful for
just getting a shell inside a build tree, but is powerful enough to boot
up a full system in it, too.</li>

<li>If we query the user for a hard disk password at boot he may hit
TAB to hide the asterisks we normally show for each key that is
entered, for extra paranoia.</li>

<li>We don't enable <tt>udev-settle.service</tt> anymore, which is
only required for certain legacy software that still hasn't been
updated to follow devices coming and going cleanly.</li>

<li>We now include a tool that can plot boot speed graphs, similar to
bootchartd, called <a href="http://0pointer.de/blog/projects/blame-game.html"><tt>systemd-analyze</tt></a>.</li>

<li>At boot, we now initialize the kernel's <tt>binfmt_misc</tt> logic with the data from <tt>/etc/binfmt.d</tt>.</li>

<li><tt>systemctl</tt> now recognizes if it is run in a <tt>chroot()</tt>
environment and will work accordingly (i.e. apply changes to the tree
it is run in, instead of talking to the actual PID 1 for this). It also has a new <tt>--root=</tt> switch to work on an OS tree from outside of it.</li>

<li>There's a new unit setting <tt>OnFailureIsolate=</tt> that,
together with <tt>OnFailure=</tt>, allows entering a different target
whenever a certain unit fails. This is useful, for example, to enter
emergency mode if checks of crucial file systems fail.</li>

<li>Socket units may now listen on Netlink sockets, special files
from <tt>/proc</tt> and POSIX message queues, too.</li>

<li>There's a new <tt>IgnoreOnIsolate=</tt> flag which may be used to
ensure certain units are left untouched by isolation requests. There's
a new <tt>IgnoreOnSnapshot=</tt> flag which may be used to exclude
certain units from snapshot units when they are created.</li>

<li>There are now small mechanism services <a
href="http://www.freedesktop.org/wiki/Software/systemd/hostnamed">for
changing the local hostname and other host meta data</a>, <a
href="http://www.freedesktop.org/wiki/Software/systemd/localed">changing
the system locale and console settings</a> and the <a
href="http://www.freedesktop.org/wiki/Software/systemd/timedated">system
clock</a>.</li>

<li>We now limit the capability bounding set for a number of our
internal services by default.</li>

<li>Plymouth may now be disabled globally with
<tt>plymouth.enable=0</tt> on the kernel command line.</li>

<li>We now deallocate VTs when a getty finishes running (and
optionally when other tools run on VTs finish). This adds extra security
since it clears the scrollback buffer so that subsequent users cannot get
access to a user's session output.</li>

<li>In socket units there are now options to control the
<tt>IP_TRANSPARENT</tt>, <tt>SO_BROADCAST</tt>, <tt>SO_PASSCRED</tt> and
<tt>SO_PASSSEC</tt> socket options.</li>

<li>The receive and send buffers of socket units may now be set larger
than the default system settings if needed, by using
<tt>SO_RCVBUFFORCE</tt>/<tt>SO_SNDBUFFORCE</tt>.</li>

<li>We now set the hardware timezone as one of the first things in PID
1, in order to avoid time jumps during normal userspace operation, and
to guarantee sensible times on all generated logs. We also no longer
save the system clock to the RTC on shutdown, assuming that this is
done by the clock control tool when the user modifies the time, or
automatically by the kernel if NTP is enabled.</li>

<li>The SELinux directory got moved from <tt>/selinux</tt> to
<tt>/sys/fs/selinux</tt>.</li>

<li>We added a small service <tt>systemd-logind</tt> that keeps track
of logged-in users and their sessions. It creates control groups for
them, implements the <a
href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html">XDG_RUNTIME_DIR
specification</a> for them, maintains seats and device node ACLs and
implements shutdown/idle inhibiting for clients. It auto-spawns gettys
on all local VTs when the user switches to them (instead of starting
six of them unconditionally), thus reducing the resource footprint by
default. It has a D-Bus interface as well as <a
href="http://www.freedesktop.org/software/systemd/man/sd-login.html">a
simple synchronous library interface</a>. This mechanism obsoletes
ConsoleKit which is now deprecated and should no longer be used.</li>

<li>There's now full, automatic multi-seat support, and this is
enabled in GNOME 3.4. Just by plugging in new seat hardware you get a
new login screen on your seat's screen.</li>

<li>There is now an option <tt>ControlGroupModify=</tt> to allow
services to change the properties of their control groups dynamically,
and one to make control groups persistent in the tree
(<tt>ControlGroupPersistent=</tt>) so that they can be created and
maintained by external tools.</li>

<li>We now jump back into the <tt>initrd</tt> in shutdown, so that it can
detach the root file system and the storage devices backing it. This
makes it possible (for the first time!) to reliably undo complex storage setups
on shutdown and leave them in a clean state.</li>

<li><tt>systemctl</tt> now supports <i>presets</i>, a way for distributions and
administrators to define their own policies on whether services should
be enabled or disabled by default on package installation.</li>

<li><tt>systemctl</tt> now has high-level verbs for masking/unmasking
units. There's also a new command (<tt>systemctl list-unit-files</tt>)
for determining the list of all installed unit files and whether
they are enabled or not.</li>

<li>We now apply <tt>sysctl</tt> variables to each new network device, as it
appears. This makes <tt>/etc/sysctl.d</tt> compatible with hot-plug
network devices.</li>

<li>There's limited profiling for SELinux start-up performance built
into PID 1.</li>

<li>There's a new switch <a
href="http://0pointer.de/blog/projects/security.html"><tt>PrivateNetwork=</tt></a>
to turn off any network access for a specific service.</li>

<li>Service units may now include configuration for control group
parameters. A few (such as <tt>MemoryLimit=</tt>) are exposed with
high-level options, and all others are available via the generic
<tt>ControlGroupAttribute=</tt> setting.</li>

<li>There's now the option to mount certain cgroup controllers
jointly at boot. We do this now for <tt>cpu</tt> and
<tt>cpuacct</tt> by default.</li>

<li>We added <a
href="https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTs">the
journal</a> and turned it on by default.</li>

<li>All service output is now written to the Journal by default,
regardless of whether it is sent via syslog or simply written to
stdout/stderr. Both message streams end up in the same location and
are interleaved the way they should be. All log messages, even those from
the kernel and from early boot, end up in the journal. Now no service
output goes unnoticed; it is all saved and indexed in the same
location.</li>

<li><tt>systemctl status</tt> will now show the last 10 log lines for
each service, directly from the journal.</li>

<li>We now show the progress of <tt>fsck</tt> at boot on the console,
again. We also show the much loved colorful <tt>[ OK ]</tt> status
messages at boot again, as known from most SysV implementations.</li>

<li>We merged udev into systemd.</li>

<li>We implemented and documented interfaces to <a
href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface">container
managers</a> and <a
href="http://www.freedesktop.org/wiki/Software/systemd/InitrdInterface">initrds</a>
for passing execution data to systemd. We also implemented and
documented <a
href="http://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons">an
interface for storage daemons that are required to back the root file
system</a>.</li>

<li>There are two new options in service files to propagate reload requests between several units.</li>

<li><tt>systemd-cgls</tt> no longer shows kernel threads or empty control groups by default.</li>

<li>We added a new tool <tt>systemd-cgtop</tt> that shows resource
usage of whole services in a top(1)-like fashion.</li>

<li>systemd may now supervise services in watchdog style. If enabled
for a service, the daemon has to ping PID 1 at regular intervals
or is otherwise considered failed (which might then result in
restarting it, or even rebooting the machine, as configured). Also,
PID 1 is capable of pinging a hardware watchdog. Putting this
together, the hardware watchdogs PID 1 and PID 1 then watchdogs
specific services. This is highly useful for high-availability servers
as well as embedded machines. Since watchdog hardware is nowadays
built into all modern chipsets (including desktop chipsets), this
should hopefully help to make this a more widely used
functionality.</li>

<li>We added support for a new kernel command line option
<tt>systemd.setenv=</tt> to set an environment variable
system-wide.</li>

<li>By default services which are started by systemd will have SIGPIPE
set to ignored. The Unix SIGPIPE logic is used to reliably implement
shell pipelines and when left enabled in services is usually just a
source of bugs and problems.</li>

<li>You may now configure the rate limiting that is applied to
restarts of specific services. Previously the rate limiting parameters
were hard-coded (similar to SysV).</li>

<li>There's now support for loading the IMA integrity policy into the
kernel early in PID 1, similar to how we already did it with the
SELinux policy.</li>

<li>There's now an official API to schedule and query scheduled shutdowns.</li>

<li>We changed the license from GPL2+ to LGPL2.1+.</li>

<li>We made <a
href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"><tt>systemd-detect-virt</tt></a>
an official tool in the tool set. Since we already had code to detect
certain VM and container environments we now added an official tool
for administrators to make use of in shell scripts and suchlike.</li>

<li>We documented <a
href="http://www.freedesktop.org/wiki/Software/systemd/InterfacePortabilityAndStabilityChart">numerous
interfaces</a> systemd introduced.</li>

</ol>
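<p>To make the watchdog item above a bit more concrete: a service normally pings PID 1 by calling <tt>sd_notify(0, "WATCHDOG=1")</tt> from the sd-daemon code. The minimal C sketch below speaks the underlying datagram protocol directly instead, to show what actually goes over the wire. It assumes systemd has set <tt>$NOTIFY_SOCKET</tt> (a leading <tt>@</tt> denoting an abstract socket) and <tt>$WATCHDOG_USEC</tt>; the function names are hypothetical, and pinging at half the timeout is the commonly recommended choice, not something this post specifies.</p>

```c
#include <stddef.h>      /* offsetof */
#include <stdlib.h>      /* getenv, strtoull */
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>      /* close */

/* Send one datagram such as "WATCHDOG=1" to the socket named in
 * $NOTIFY_SOCKET. Returns 1 if sent, 0 if $NOTIFY_SOCKET is unset,
 * -1 on error. (Hypothetical helper, not the sd-daemon implementation.) */
static int notify_raw(const char *msg) {
    const char *path = getenv("NOTIFY_SOCKET");
    if (!path || !*path)
        return 0;

    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    size_t len = strlen(path);
    if (len >= sizeof sa.sun_path)
        return -1;
    memcpy(sa.sun_path, path, len);
    if (sa.sun_path[0] == '@')           /* '@' marks an abstract socket */
        sa.sun_path[0] = '\0';

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    socklen_t salen = (socklen_t) (offsetof(struct sockaddr_un, sun_path) + len);
    ssize_t n = sendto(fd, msg, strlen(msg), 0, (struct sockaddr *) &sa, salen);
    close(fd);
    return n < 0 ? -1 : 1;
}

/* Ping interval in microseconds: half of $WATCHDOG_USEC, 0 if unset/invalid. */
static unsigned long long watchdog_ping_interval_usec(void) {
    const char *s = getenv("WATCHDOG_USEC");
    if (!s || !*s)
        return 0;
    char *end;
    unsigned long long usec = strtoull(s, &end, 10);
    if (*end != '\0' || usec == 0)
        return 0;
    return usec / 2;
}
```

<p>A daemon's main loop would then call <tt>notify_raw("WATCHDOG=1")</tt> at least once per <tt>watchdog_ping_interval_usec()</tt> microseconds. When the process was not started by systemd (no <tt>$NOTIFY_SOCKET</tt>), <tt>notify_raw()</tt> is a no-op that returns 0.</p>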

<p>Much of the stuff above is already available in Fedora 15 and 16,
or will be made available in the upcoming Fedora 17.</p>
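<p>To make the <tt>/etc/machine-id</tt> item from the list above concrete: the file contains a single line with 32 lowercase hexadecimal characters. The short C sketch below reads and validates such a file; the helper name <tt>read_machine_id()</tt> is hypothetical and this is not systemd's own code.</p>

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read the machine ID from `path` into out (33 bytes incl. the NUL).
 * Returns false unless the first line is exactly 32 lowercase hex digits. */
static bool read_machine_id(const char *path, char out[33]) {
    FILE *f = fopen(path, "r");
    if (!f)
        return false;
    char buf[64];
    bool got = fgets(buf, sizeof buf, f) != NULL;
    fclose(f);
    if (!got)
        return false;

    buf[strcspn(buf, "\n")] = '\0';      /* drop the trailing newline */
    if (strlen(buf) != 32)
        return false;
    for (size_t i = 0; i < 32; i++) {
        unsigned char c = (unsigned char) buf[i];
        if (!isxdigit(c) || isupper(c))  /* only 0-9 and a-f allowed */
            return false;
    }
    memcpy(out, buf, 33);                /* 32 digits plus terminating NUL */
    return true;
}
```

<p>Called as <tt>read_machine_id("/etc/machine-id", id)</tt>, it fills <tt>id</tt> with the 32-character string, or returns false if the file is missing or malformed.</p>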

<p>And that's it for now. There's a lot of other stuff in the git commits, but
most of it is minor, so I'll spare you the details.</p>

<p>I'd like to thank everybody who contributed to systemd over the past years.</p>

<p>Thanks for your interest!</p>

]]></description>
</item>

<item>
  <title>Control Groups vs. Control Groups</title>
  <link>http://0pointer.de/blog/projects/cgroups-vs-cgroups.html</link>
  <description><![CDATA[

<p><i>TL;DR: <a
href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a> does not
require the performance-sensitive bits of Linux control groups enabled in the kernel.
However, it does require some non-performance-sensitive bits of the control
group logic.</i></p>

<p>In some areas of the community there's still some confusion about Linux
control groups and their performance impact, and what precisely it is that
systemd requires of them. In the hope of clearing this up a bit, I'd like to point
out a few things:</p>

<p>Control Groups are two things: <b>(A)</b> <i>a way to hierarchically group and
label processes</i>, and <b>(B)</b> <i>a way to then apply resource limits</i>
to these groups. systemd only requires the former (A), and not the latter (B).
That means you can compile your kernel without any control group resource
controllers (B) and systemd will work perfectly on it. However, if you in
addition disable the grouping feature entirely (A) then systemd will loudly
complain at boot and proceed only reluctantly with a big warning and in a
limited functionality mode.</p>

<p>At compile time, the grouping/labelling feature in the kernel is enabled by
CONFIG_CGROUPS=y, the individual controllers by CONFIG_CGROUP_FREEZER=y,
CONFIG_CGROUP_DEVICE=y, CONFIG_CGROUP_CPUACCT=y, CONFIG_CGROUP_MEM_RES_CTLR=y,
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y, CONFIG_CGROUP_MEM_RES_CTLR_KMEM=y,
CONFIG_CGROUP_PERF=y, CONFIG_CGROUP_SCHED=y, CONFIG_BLK_CGROUP=y,
CONFIG_NET_CLS_CGROUP=y, CONFIG_NETPRIO_CGROUP=y. And since (as mentioned) we
only need the former (A), not the latter (B) you may disable all of the latter
options while enabling CONFIG_CGROUPS=y, if you want to run systemd on your
system.</p>

<p>What about the performance impact of these options? Well, every bit of code
comes at some price, so none of these options come entirely for free. However,
the grouping feature (A) alters the general logic very little; it just sticks
hierarchical labels on processes, and its impact is minimal since that is
usually not in any hot path of the OS. This is different for the various
controllers (B), which have a much bigger impact since they influence the resource
management of the OS and are full of hot paths. This means that the kernel
feature that systemd mandatorily requires (A) has a minimal effect on system
performance, while the actually performance-sensitive features of control groups
(B) are entirely optional.</p>

<p>On boot, systemd will mount all controller hierarchies it finds enabled
in the kernel to individual directories below <tt>/sys/fs/cgroup/</tt>. This is
the official mount point for kernel controllers these days. The
<tt>/sys/fs/cgroup/</tt> mount point in the kernel was created precisely for
this purpose. Since the control group controllers are a shared facility that
might be used by a number of different subsystems, <a
href="http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups">a few
projects have agreed on a set of rules in order to prevent the various bits
of code from stepping on each other's toes when using these directories</a>.</p>

<p>systemd will also maintain its own, private, controller-less, named control
group hierarchy which is mounted to <tt>/sys/fs/cgroup/systemd/</tt>.  This
hierarchy is private property of systemd, and other software should not try to
interfere with it. This hierarchy is how systemd makes use of the naming and
grouping feature of control groups (A) without actually requiring any kernel
controller enabled for that.</p>
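<p>One way to see this private hierarchy in action is <tt>/proc/self/cgroup</tt>, which lists one <tt>hierarchy-ID:controller-list:path</tt> line per mounted hierarchy; the named systemd hierarchy shows up with the controller list <tt>name=systemd</tt>. The C sketch below extracts the path for a given controller list from such contents. The function name <tt>cgroup_path_for()</tt> is hypothetical.</p>

```c
#include <string.h>

/* Given the text of /proc/<pid>/cgroup, whose lines look like
 * "hierarchy-ID:controller-list:path", copy the path in the hierarchy whose
 * controller list equals `controllers` (e.g. "name=systemd") into out.
 * Returns 0 on success, -1 if not found or if out is too small. */
static int cgroup_path_for(const char *contents, const char *controllers,
                           char *out, size_t outlen) {
    size_t clen = strlen(controllers);
    const char *line = contents;

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t linelen = eol ? (size_t) (eol - line) : strlen(line);

        /* Skip the hierarchy ID up to the first ':'. */
        const char *c1 = memchr(line, ':', linelen);
        if (c1) {
            const char *list = c1 + 1;
            size_t rest = linelen - (size_t) (list - line);
            const char *c2 = memchr(list, ':', rest);   /* end of controller list */
            if (c2 && (size_t) (c2 - list) == clen &&
                memcmp(list, controllers, clen) == 0) {
                size_t plen = rest - (size_t) (c2 + 1 - list);
                if (plen + 1 > outlen)
                    return -1;
                memcpy(out, c2 + 1, plen);
                out[plen] = '\0';
                return 0;
            }
        }
        if (!eol)
            break;
        line = eol + 1;
    }
    return -1;
}
```

<p>For example, after reading <tt>/proc/self/cgroup</tt> into a buffer, <tt>cgroup_path_for(buf, "name=systemd", path, sizeof path)</tt> yields the process's cgroup in systemd's own hierarchy, independently of which kernel controllers are enabled.</p>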

<p>Now, you might notice that by default systemd does create per-service
cgroups in the "cpu" controller if it finds it enabled in the kernel. This is
entirely optional, however. We chose to make use of it by default to even out
CPU usage between system services. Example: On a traditional web server machine
Apache might end up having 100 CGI worker processes around, while MySQL only
has 5 processes running. Without the use of the "cpu" controller this means
that Apache altogether ends up having 20x more CPU available than MySQL since
the kernel tries to provide every process with the same amount of CPU time. On
the other hand, if we add these two services to the "cpu" controller in
individual groups by default, Apache and MySQL get the same amount of CPU,
which we think is a good default.</p>
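<p>The arithmetic behind this example, assuming all 105 processes are CPU-bound, always runnable and of equal priority, and that the two per-service groups get equal weight:</p>

```latex
\begin{aligned}
\text{no ``cpu'' groups:}\quad & s_{\text{Apache}} = \frac{100}{105} \approx 95.2\%, \qquad
  s_{\text{MySQL}} = \frac{5}{105} \approx 4.8\%, \qquad
  \frac{s_{\text{Apache}}}{s_{\text{MySQL}}} = \frac{100}{5} = 20 \\
\text{one group per service:}\quad & s_{\text{Apache}} = s_{\text{MySQL}} = \frac{1}{2} = 50\%
  \quad\Rightarrow\quad \frac{1}{200} \text{ per Apache worker}, \quad \frac{1}{10} \text{ per MySQL process}
\end{aligned}
```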

<p>Note that if the CPU controller is not enabled in the kernel systemd will not
attempt to make use of the "cpu" hierarchy as described above. Also, even if it is enabled in the kernel it
is trivial to tell systemd not to make use of it: Simply edit
<tt>/etc/systemd/system.conf</tt> and set <tt>DefaultControllers=</tt> to the
empty string.</p>

<p>Let's discuss a few frequently heard complaints regarding systemd's use of control groups:</p>

<ul>

<li><b>systemd mounts all controllers to <tt>/sys/fs/cgroup/</tt> even though
my software requires it at <tt>/dev/cgroup/</tt> (or some other place)!</b> The
standardization of <tt>/sys/fs/cgroup/</tt> as mount point of the hierarchies
is a relatively recent change in the kernel. Some software has not been updated
yet for it. If you cannot change the software in question you are welcome to
unmount the hierarchies from <tt>/sys/fs/cgroup/</tt> and mount them wherever
you need them instead. However, make sure to leave
<tt>/sys/fs/cgroup/systemd/</tt> untouched.</li>

<li><b>systemd makes use of the "cpu" hierarchy, but it should keep its dirty
fingers off it!</b> As mentioned above, just set the
<tt>DefaultControllers=</tt> option of systemd to the empty string.</li>

<li><b>I need my two controllers "foo" and "bar" mounted into one hierarchy,
but systemd mounts them in two!</b> Use the <tt>JoinControllers=</tt> setting
in <tt>/etc/systemd/system.conf</tt> to mount several controllers into a single
hierarchy.</li>

<li><b>Control groups are evil and they make everything slower!</b> Well,
please read the text above and understand the difference between
"control-groups-as-in-naming-and-grouping" (A) and "cgroups-as-in-controllers"
(B). Then, please turn off all controllers in your kernel build (B) but leave
CONFIG_CGROUPS=y (A) enabled.</li>

<li><b>I have heard <i>some</i> kernel developers really hate control groups
and think systemd is evil because it requires them!</b> Well, there are a
couple of things behind the dislike of control groups by some folks.
Primarily, this is probably because the hackers in question do not
distinguish between the naming-and-grouping bits of the control group logic (A) and the
controllers that are based on it (B). Mainly, their beef is with the latter
(which systemd does not require, which is the key point I am trying to make in
the text above), but there are other issues as well: for example, the code of
the grouping logic is not the most beautiful bit of code ever written by man
(which is thankfully likely to get better now, since the control groups
subsystem now has an active maintainer again). And then for some
developers it is important that they can compare the runtime behaviour of many
historic kernel versions in order to find bugs (git bisect).  Since systemd
requires kernels with basic control group support enabled, and this is a
relatively recent feature addition to the kernel, this makes it difficult for
them to use a newer distribution with all these old kernels
that predate cgroups. Anyway, the summary is probably that what matters to
developers is different from what matters to users and
administrators.</li>

</ul>

<p>I hope this explanation was useful for a reader or two! Thank you for your time!</p>

]]></description>
</item>

<item>
  <title>GUADEC 2012 CFP Ending Soon!</title>
  <link>http://0pointer.de/blog/projects/guadec-2012-cfp.html</link>
  <description><![CDATA[

<p>In case you haven't submitted your talk proposal for GUADEC 2012 in A
Coru&ntilde;a, Spain yet, hurry: the deadline is April 14th, i.e. this
Saturday! <a href="http://www.guadec.org/cfp">Read the Call for
Participation!</a> <a
href="https://www.gpul.org/indico/abstractSubmission.py?confId=0">Submit a
proposal!</a></p>

]]></description>
</item>

<item>
  <title>/tmp or not /tmp?</title>
  <link>http://0pointer.de/blog/projects/tmp.html</link>
  <description><![CDATA[

<p>A number of Linux distributions have recently switched (or started
switching) to <tt>/tmp</tt> on tmpfs by default (ArchLinux, Debian among
others). Other distributions have plans/are discussing doing the same (Ubuntu, OpenSUSE).
Since we believe this is a good idea, and it's good to keep the delta between
the distributions minimal, <a
href="https://fedoraproject.org/wiki/Features/tmp-on-tmpfs">we are proposing
the same for Fedora 18, too</a>. On Solaris a similar change was implemented
as early as 1994 (and other Unixes made a similar change long ago,
too). Yet, not all of our software is written to work nicely
with <tt>/tmp</tt> on tmpfs.</p>

<p>Another <a
href="https://fedoraproject.org/wiki/Features/ServicesPrivateTmp">Fedora
feature (for Fedora 17)</a> changed the semantics of <tt>/tmp</tt> for many
system services to make them more secure, by isolating the /tmp namespaces of the
various services. Handling of temporary files in <tt>/tmp</tt> has been
security-sensitive ever since it was introduced, because it has traditionally been
a world-writable, shared namespace, and unless all user code safely uses randomized file names
it is vulnerable to DoS attacks and worse.</p>

<p>In this blog story I'd like to shed some light on proper usage of
<tt>/tmp</tt> and what your Linux application should use for what purpose. We won't
discuss why <tt>/tmp</tt> on tmpfs is a good idea; for that, refer to the <a
href="https://fedoraproject.org/wiki/Features/tmp-on-tmpfs">Fedora feature
page</a>. Here we'll just discuss what <tt>/tmp</tt> should be used for and for
what it shouldn't be, as well as what should be used instead. All of this is meant
to make sure your application remains compatible with these new features
introduced in many newer Linux distributions.</p>

<p><tt>/tmp</tt> is (as the name suggests) an area where temporary files
applications require during operation may be placed. Of course, temporary files
differ very much in their properties:</p>

<ul>
<li>They can be large, or very small</li>
<li>They might be used for sharing between users, or be private to users</li>
<li>They might need to be persistent across boots, or very volatile</li>
<li>They might need to be machine-local or shared on the network</li>
</ul>

<p>Traditionally, <tt>/tmp</tt> has not only been the place where actual
temporary files are stored, but some software used to place (and often still
continues to place) communication primitives such as sockets, FIFOs, shared
memory there as well: notably X11, but many others too. Usage of world-writable
shared namespaces for communication purposes has always been problematic, since
to establish communication you need stable names, but stable names open the
door to DoS attacks. This can be partially corrected by establishing
protected per-app directories for certain services during early boot (like we
do for X11), but this is only a partial fix, since it only works
correctly if every package installation is followed by a reboot.</p>

<p>Besides <tt>/tmp</tt> there are various other places where temporary files
(or other files that traditionally have been stored in <tt>/tmp</tt>) can be
stored. Here's a quick overview of the candidates:</p>

<ul>

<li><tt>/tmp</tt>, POSIX suggests this is flushed at boot, FHS says that files
do not need to be persistent between two runs of the application. Old files are
often cleaned up automatically after a time ("aging"). Usually it is
recommended to use $TMPDIR if it is set before falling back to <tt>/tmp</tt>
directly. As mentioned, this is a tmpfs on many Linuxes/Unixes (and most likely
will be for most soon), and hence should be used only for small files. It's
generally a shared namespace, hence the only APIs for using it should be <a
href="http://linux.die.net/man/3/mkstemp"><tt>mkstemp()</tt></a>, <a
href="http://linux.die.net/man/3/mkdtemp"><tt>mkdtemp()</tt></a> (and friends)
to be entirely safe.<sup>[1]</sup> Recently, improvements have been made to
turn this shared namespace into a private namespace (see above), but that doesn't
relieve developers from writing secure code that is also safe if <tt>/tmp</tt> is a shared
namespace. Because <tt>/tmp</tt> is no longer necessarily a shared namespace it
is generally unsuitable as a location for communication primitives. It is
machine-private and local. It's usually fully featured (locking, ...). This
directory is world writable and thus available for both privileged and
unprivileged code.</li>

<li><tt>/var/tmp</tt>, according to FHS "more persistent" than <tt>/tmp</tt>,
and is less often cleaned up (it's persistent across reboots, for example). It's not on a tmpfs, but on a real disk, and
hence can be used to store much larger files. The same namespace problems apply
as with <tt>/tmp</tt>, hence also exclusively use
<tt>mkstemp()</tt>/<tt>mkdtemp()</tt> for this directory. It is also
automatically cleaned up by time. It is machine-private. It's not necessarily
fully featured (no locking, ...). This directory is world writable and thus
available for both privileged and unprivileged code. We suggest also checking
<tt>$TMPDIR</tt> before falling back to <tt>/var/tmp</tt>. That way if
<tt>$TMPDIR</tt> is set this overrides usage of both <tt>/tmp</tt> and
<tt>/var/tmp</tt>.</li>

<li><tt>/run</tt> (traditionally <tt>/var/run</tt>) where privileged daemons
can store runtime data, such as communication primitives. This is where your
daemon should place its sockets. It's guaranteed to be a shared namespace, but
is only writable by privileged code and hence very safe to use. This file
system is guaranteed to be a tmpfs and is hence automatically flushed at boot.
No automatic clean-up is done beyond that. It is machine-private and local. It
is fully-featured, and provides all functionality the local OS can provide
(locking, sockets, ...).</li>

<li><tt><a
href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html">$XDG_RUNTIME_DIR</a></tt>
where unprivileged user software can store runtime data, such as communication
primitives. This is similar to <tt>/run</tt> but for user applications. It's a
user private namespace, and hence very safe to use. It's cleaned up
automatically at logout and also is cleaned up by time via "aging". It is
machine-private and fully featured. In GLib applications use
<tt>g_get_user_runtime_dir()</tt> to query the path of this directory.</li>

<li><tt><a
href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html">$XDG_CACHE_HOME</a></tt>
where unprivileged user software can store non-essential data. It's a private
namespace of the user. It might be shared between machines. It is not
automatically cleaned up, and not fully featured (no locking, and so on, due to
NFS). In GLib applications use <tt>g_get_user_cache_dir()</tt> to query this
directory.</li>

<li><tt><a
href="http://freedesktop.org/wiki/Software/xdg-user-dirs">$XDG_DOWNLOAD_DIR</a></tt>
where unprivileged user software can store downloads and downloads in progress.
It should only be used for downloads, and is a private namespace of the user,
but might be shared between machines. It is not automatically cleaned up and
not fully featured. In GLib applications use <tt>g_get_user_special_dir()</tt>
to query the path of this directory.</li>

</ul>
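<p>The recommendation above for <tt>/tmp</tt> and <tt>/var/tmp</tt> (check <tt>$TMPDIR</tt> first, then let <tt>mkstemp()</tt> create the file) can be sketched in C as follows. The helper <tt>make_temp_file()</tt> and its policy of ignoring a non-absolute <tt>$TMPDIR</tt> are our own assumptions, not part of any specification.</p>

```c
#include <stdio.h>
#include <stdlib.h>

/* Create a private temporary file: use $TMPDIR if it is set to an absolute
 * path, otherwise `fallback` ("/tmp" or "/var/tmp"). mkstemp() picks an
 * unpredictable name and creates the file exclusively (mode 0600), so a
 * pre-existing file or symlink of the same name cannot be hijacked.
 * The final name is left in path[]; returns the open fd, or -1. */
static int make_temp_file(const char *fallback, const char *prefix,
                          char *path, size_t pathlen) {
    const char *dir = getenv("TMPDIR");
    if (!dir || dir[0] != '/')
        dir = fallback;
    int n = snprintf(path, pathlen, "%s/%sXXXXXX", dir, prefix);
    if (n < 0 || (size_t) n >= pathlen)
        return -1;
    return mkstemp(path);    /* replaces the trailing XXXXXX in place */
}
```

<p>Pass <tt>"/tmp"</tt> as the fallback for small, volatile files and <tt>"/var/tmp"</tt> for larger ones. For a temporary directory, build the same template and hand it to <tt>mkdtemp()</tt> instead.</p>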

<p>Now that we have introduced the contestants, here's a rough guide to how we
suggest you (a Linux application developer) pick the right directory:</p>

<ol>

<li>You need a place to put your socket (or other communication primitive) and your code runs privileged: use a subdirectory beneath <tt>/run</tt>. (Or beneath <tt>/var/run</tt> for extra compatibility.)</li>

<li>You need a place to put your socket (or other communication primitive) and your code runs unprivileged: use a subdirectory beneath <tt>$XDG_RUNTIME_DIR</tt>.</li>

<li>You need a place to put your larger downloads and downloads in progress and run unprivileged: use <tt>$XDG_DOWNLOAD_DIR</tt>.</li>

<li>You need a place to put cache files which should be persistent and run unprivileged: use <tt>$XDG_CACHE_HOME</tt>.</li>

<li>None of the above applies and you need to place a small file that needs no persistence: use <tt>$TMPDIR</tt> with a fallback on <tt>/tmp</tt>. And use <tt>mkstemp()</tt> and <tt>mkdtemp()</tt>, and nothing homegrown.</li>

<li>Otherwise use <tt>$TMPDIR</tt> with a fallback on <tt>/var/tmp</tt>. Also use <tt>mkstemp()</tt>/<tt>mkdtemp()</tt>.</li>

</ol>
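<p>The first two rules of this guide, for sockets and other communication primitives, could look like this in C. Treating "privileged" as "effective UID 0" is a simplifying assumption of this sketch, and <tt>runtime_socket_dir()</tt> is a hypothetical name; the caller still has to create the subdirectory itself.</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Choose where to put a socket for application `app`: privileged code uses
 * /run/<app>, unprivileged code $XDG_RUNTIME_DIR/<app>. "Privileged" is
 * approximated as effective UID 0 (an assumption). Returns 0, or -1 if no
 * suitable directory is known or out is too small. */
static int runtime_socket_dir(const char *app, char *out, size_t outlen) {
    const char *base;
    if (geteuid() == 0) {
        base = "/run";
    } else {
        base = getenv("XDG_RUNTIME_DIR");
        if (!base || base[0] != '/')
            return -1;        /* deliberately no fallback to /tmp */
    }
    int n = snprintf(out, outlen, "%s/%s", base, app);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

<p>The sketch deliberately fails instead of falling back to <tt>/tmp</tt> when <tt>$XDG_RUNTIME_DIR</tt> is unset, since a shared <tt>/tmp</tt> is unsuitable for communication primitives.</p>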

<p>Note that the rules above are merely our suggestions. These rules
take into account everything we know about this topic and avoid problems with
current and future distributions, as far as we can see them. Please consider
updating your projects to follow these rules, and keep them in mind if you
write new code.</p>

<p><b>One thing we'd like to stress is that <tt>/tmp</tt> and <tt>/var/tmp</tt>
more often than not are actually not the right choice for your use case. There
are valid uses of these directories, but quite often another directory might
actually be the better place. So, be careful, consider the other options, but
if you do go for <tt>/tmp</tt> or <tt>/var/tmp</tt> then at least make sure to
use <tt>mkstemp()</tt>/<tt>mkdtemp()</tt>.</b></p>

<p>Thank you for your interest!</p>

<p>Oh, and if you now complain that we don't understand Unix, and that we are
morons and worse, then please read this again, and you might notice that this
is just a best practice guide, not a specification we have written. It
introduces nothing new; it just explains how things are.</p>

<p>If you want to complain about the <tt>tmp-on-tmpfs</tt> or
<tt>ServicesPrivateTmp</tt> feature, then this is not the right place either,
because this blog post is not really about that. Please direct this to
<tt>fedora-devel</tt> instead. Thank you very much.</p>

<p><b><small>Footnotes</small></b></p>

<p><small>[1] Well, or to turn this around: unless you have a PhD in advanced
Unixology and are not using <tt>mkstemp()</tt>/<tt>mkdtemp()</tt> but use
<tt>/tmp</tt> nonetheless it's very likely you are writing vulnerable
code.</small></p>

]]></description>
</item>

<item>
  <title>/etc/os-release</title>
  <link>http://0pointer.de/blog/projects/os-release.html</link>
  <description><![CDATA[

<p><a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">One of
the new configuration files systemd introduced is <tt>/etc/os-release</tt></a>.
It replaces the multitude of per-distribution release files<sup>[1]</sup> with
a single one. Yesterday we <a
href="http://lists.freedesktop.org/archives/systemd-devel/2012-February/004475.html">decided
to drop</a> support for systems lacking <a
href="http://www.freedesktop.org/software/systemd/man/os-release.html"><tt>/etc/os-release</tt></a>
in systemd since recently the majority of the big distributions adopted
<tt>/etc/os-release</tt> and many small ones did, too<sup>[2]</sup>.  It's our
hope that by dropping support for non-compliant distributions we gently put
some pressure on the remaining hold-outs to adopt this scheme as well.</p>

<p>I'd like to take the opportunity to explain a bit what the new file offers,
why application developers should care, and why the distributions should adopt
it. Of course, this file is pretty much a triviality in many ways,
but I guess it's still one that deserves explanation.</p>

<p>So why all this, you ask?</p>

<ul> 

<li>It relieves application developers who just want to know the
distribution they are running on from having to check for a multitude of individual release files.</li>

<li>It provides both a "pretty" name (i.e. one to show to the user), and
machine-parsable version/OS identifiers (i.e. for use in build systems).</li>

<li>It is extensible and can easily learn new fields if needed. For example, since
we want to print a welcome message in the color of your distribution at boot,
we make it possible to configure the ANSI color for that in the file.</li>

</ul>
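<p>To make the "machine-parsable" point concrete, here is a minimal C sketch of reading a single field from such a file. It assumes the simple <tt>KEY=value</tt> format with optional quoting described in the documentation, deliberately skips backslash escape handling for brevity, and the function names <tt>os_release_get()</tt> and <tt>format_welcome()</tt> are hypothetical, not part of any existing library:</p>

```c
#include <stdio.h>
#include <string.h>

/* Look up KEY in an os-release style file at PATH and copy its value,
 * with one pair of surrounding single or double quotes removed, into buf.
 * Simplifications: backslash escapes are not processed and lines longer
 * than the line buffer are not handled.
 * Returns 0 on success, -1 if the file can't be read, KEY is absent, or
 * the value does not fit into buf. */
static int os_release_get(const char *path, const char *key,
                          char *buf, size_t bufsz) {
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char line[1024];
    size_t keylen = strlen(key);
    int ret = -1;

    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\r\n")] = '\0';       /* strip the newline */
        if (line[0] == '#' || line[0] == '\0')
            continue;                             /* comment or blank line */
        if (strncmp(line, key, keylen) != 0 || line[keylen] != '=')
            continue;                             /* some other key */

        const char *val = line + keylen + 1;
        size_t len = strlen(val);
        if (len >= 2 && (val[0] == '"' || val[0] == '\'') &&
            val[len - 1] == val[0]) {
            val++;                                /* drop the quotes */
            len -= 2;
        }
        if (len < bufsz) {
            memcpy(buf, val, len);
            buf[len] = '\0';
            ret = 0;
        }
        break;                                    /* first match wins */
    }
    fclose(f);
    return ret;
}

/* Hypothetical helper: wrap TEXT in the ANSI_COLOR escape sequence, the way
 * a colored boot-time welcome message could be produced. */
static void format_welcome(const char *ansi_color, const char *text,
                           char *out, size_t outsz) {
    snprintf(out, outsz, "\033[%sm%s\033[0m", ansi_color, text);
}
```

<p>Calling <tt>os_release_get("/etc/os-release", "PRETTY_NAME", buf, sizeof buf)</tt> would yield the string meant for users, while fields such as <tt>ID</tt> and <tt>VERSION_ID</tt> are the ones intended for programmatic checks.</p>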

<p><b>FAQs</b></p>

<p><b>There's already the <tt>lsb_release</tt> tool for this, why don't you
just use that?</b> Well, it's a very strange interface: a shell script you have
to invoke (and hence spawn asynchronously from your C code), and it's not
written to be extensible. It's an optional package in many distributions, and
nothing we'd be happy to invoke as part of early boot in order to show a
welcome message. (In times with sub-second userspace boot times we really don't
want to invoke a huge shell script for a triviality like showing the welcome
message). To us, the <tt>lsb_release</tt> tool appears to be an attempt at
abstracting distribution checks, where what is actually needed is to
standardize them. It's simply a badly designed interface. In our opinion, it
has its use as an interface to determine the LSB version itself, but not for
checking the distribution or version.</p>

<p><b>Why haven't you adopted one of the generic release files, such as
Fedora's <tt>/etc/system-release</tt>?</b> True, they are much nicer than
<tt>lsb_release</tt>. However, they are not extensible and are not really
parsable when the distribution needs to be identified programmatically or a
specific version needs to be verified.</p>

<p><b>Why didn't you call this file <tt>/etc/bikeshed</tt> instead? The name
<tt>/etc/os-release</tt> sucks!</b> In a way, I think you kind of answered your
own question there already.</p>

<p><b>Does this mean my distribution can now drop our equivalent of
<tt>/etc/fedora-release</tt>?</b> Unlikely: too much code still checks for
the individual release files, and you probably shouldn't break that.
This new file makes things easy for applications, not for distributions:
applications can now rely on a single file only, and use it in a nice way.
Distributions will have to continue to ship the old files unless they are
willing to break compatibility here.</p>

<p><b>This is so useless! My application needs to be compatible with distros
from 1998, so how could I ever make use of the new file? I will have to
continue using the old ones!</b> True, if you need compatibility with really
old distributions you do. But for new code this might not be an issue, and in
general new APIs are new APIs. So if you decide to depend on it, you add a
dependency on it. However, even if you need to stay compatible it might make
sense to check <tt>/etc/os-release</tt> first and just fall back to the old
files if it doesn't exist. At the very least, this saves you 25+ <tt>open()</tt>
attempts on modern distributions, where a single one suffices.</p>
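<p>A minimal C sketch of that fallback, assuming a NULL-terminated list of candidate paths ordered by preference. Only a handful of the legacy files from the footnote are listed, and <tt>open_first_release_file()</tt> is a hypothetical helper name:</p>

```c
#include <stdio.h>

/* Candidates in order of preference: the new generic file first, then a few
 * of the legacy per-distribution files (an incomplete, illustrative list). */
static const char *const release_files[] = {
    "/etc/os-release",
    "/etc/redhat-release",
    "/etc/SuSE-release",
    "/etc/debian_version",
    NULL,
};

/* Open the first readable file from the NULL-terminated CANDIDATES list.
 * Stores the stream in *out and returns its index, or returns -1 (with
 * *out set to NULL) if none of them can be opened. */
static int open_first_release_file(const char *const *candidates, FILE **out) {
    for (int i = 0; candidates[i] != NULL; i++) {
        FILE *f = fopen(candidates[i], "r");
        if (f) {
            *out = f;
            return i;               /* on modern systems: i == 0, one fopen() */
        }
    }
    *out = NULL;
    return -1;
}
```

<p>On a distribution that ships <tt>/etc/os-release</tt> the loop stops after the first <tt>fopen()</tt>; the legacy files are only probed on older systems.</p>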

<p><b>You evil people are forcing my beloved distro $XYZ to adopt your awful
systemd schemes. I hate you!</b> You hate too much, my friend. Also, I am
pretty sure it's not difficult to see the benefit of this new file
independently of systemd, and it's truly useful on systems without systemd,
too.</p>

<p><b>I hate what you people do, can I just ignore this?</b> Well, you really
need to work on your constant feelings of hate, my friend. But, to a certain
degree yes, you can ignore this for a while longer. But already, there are a
number of applications making use of this file.  You lose compatibility with
those. Also, you are kinda working towards the further balkanization of the
Linux landscape, but maybe that's your intention?</p>

<p><b>You guys add a new file because you think there are already too many? You
guys are so confused!</b> None of the existing files is generic and extensible
enough to do what we want it to do. Hence we had to introduce a new one. We
acknowledge the irony, however.</p>

<p><b>The file is extensible? Awesome! I want a new field XYZ= in it!</b> Sure,
it's extensible, and we are happy if distributions extend it. Please prefix
your keys with your distribution's name however. Or even better: talk to us and
we might be able to update the documentation and make your field standard, if you
convince us that it makes sense.</p>

<p>Anyway, to summarize all this: if you work on an application that needs to
identify the OS it is being built on or is being run on, please consider making
use of this new file; we created it for you. If you work on a distribution and
your distribution doesn't support this file yet, please consider adopting this
file, too.</p>

<p>If you are working on a small/embedded distribution, or a legacy-free
distribution, we encourage you to adopt only this file and not establish any
other per-distro release file.</p>

<p><a href="http://www.freedesktop.org/software/systemd/man/os-release.html">Read the documentation for <tt>/etc/os-release</tt>.</a></p>

<p><small><b>Footnotes</b></small></p>

<p><small>[1] Yes, multitude, there's at least: <tt>/etc/redhat-release</tt>,
<tt>/etc/SuSE-release</tt>, <tt>/etc/debian_version</tt>,
<tt>/etc/arch-release</tt>, <tt>/etc/gentoo-release</tt>,
<tt>/etc/slackware-version</tt>, <tt>/etc/frugalware-release</tt>,
<tt>/etc/altlinux-release</tt>, <tt>/etc/mandriva-release</tt>,
<tt>/etc/meego-release</tt>, <tt>/etc/angstrom-version</tt>,
<tt>/etc/mageia-release</tt>. And some distributions even have several; for
example, Fedora already has four different files.</small></p>

<p><small>[2] To our knowledge at least OpenSUSE, Fedora, ArchLinux, Angstrom,
Frugalware have adopted this. (This list is not comprehensive, there are
probably more.)</small></p>

]]></description>
</item>

<item>
  <title>The Case for the /usr Merge</title>
  <link>http://0pointer.de/blog/projects/the-usr-merge.html</link>
  <description><![CDATA[

<p>One of the features of Fedora 17 is <a
href="https://fedoraproject.org/wiki/Features/UsrMove">the /usr merge</a>, put
forward by Harald Hoyer and Kay Sievers<sup>[1]</sup>. Since this feature was
proposed, repetitive discussions have taken place all over the various Free
Software communities, and usually the same questions were asked: what the reasons
behind this feature were, and whether it makes sense to adopt the same scheme for
distribution XYZ, too.</p>

<p>Especially in the non-Fedora world it appears to be socially unacceptable to
actually have a look at the <a
href="https://fedoraproject.org/wiki/Features/UsrMove">Fedora feature page</a>
(where many of the questions are already brought up and answered), which is very
unfortunate. To improve the situation I spent some time today writing up the
reasons for the /usr merge independently. I'd hence like to direct you to this
new page I put up, which summarizes them with an emphasis on the
compatibility point of view:</p>

<p><a href="http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge">The Case for the /usr Merge</a></p>

<p>Note that even though this page is in the systemd wiki, what it covers is
mostly orthogonal to systemd. systemd supports both systems with a merged /usr
and with a split /usr, and the /usr merge should be interesting for non-systemd
distributions as well.</p>

<p>Primarily I put this together to have a nice place to point all those folks
who continue to write me annoyed emails, even though I am actually not even
working on all of this...</p>

<p>Enjoy the read!</p>

<p><b><small>Footnotes:</small></b></p>

<p><small>[1] And not actually by me, I am just a supportive spectator and am
not doing any work on it. Unfortunately some tech press folks created the false
impression I was behind this. But credit where credit is due, this is all
Harald's and Kay's work.</small></p>

]]></description>
</item>

<item>
  <title>Plumbers Wishlist, The Third Edition, a.k.a. &quot;The Thank You Edition&quot;</title>
  <link>http://0pointer.de/blog/projects/plumbers-wishlist-3.html</link>
  <description><![CDATA[

<p>Last October <a
href="http://0pointer.de/blog/projects/plumbers-wishlist-2.html">we published a
wishlist for plumbing related features</a> we'd like to see added to the Linux
kernel. Three months later it's time to publish a short update, and explain
what has been implemented in the kernel, what people have started working on,
and what's still missing.</p>

<p>The full, updated list is <a
href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8">available
on Google Docs</a>.</p>

<p>In general, I must say that the list turned out to be a great success. It
shows how awesome the Open Source community is: Just ask nicely and there's a
good chance they'll fulfill your wishes! Thank you very much, Linux
community!</p>

<p>We'd like to thank everybody who worked on any of the features on that list:
Lucas De Marchi, Andi Kleen, Dan Ballard, Li Zefan, Kirill A. Shutemov,
Davidlohr Bueso, Cong Wang, Lennart Poettering, Kay Sievers.</p>

<p>Five of the items on the list have been fully implemented and are either
already part of a released kernel or already merged for inclusion in the next
kernel releases.</p>

<p>For four further items patches have been posted, and I am hoping they'll get
merged eventually. Davidlohr, Wang, Zefan, Kirill, it would be great if you'd
continue working on your patches, as we think they are following the right
approach<sup>[1]</sup> even if there was some opposition to them on LKML. So,
please keep pushing to solve the outstanding issues and thanks for your work so far!</p>

<p><b><small>Footnotes</small></b></p>

<p><small>[1] Yes, I still believe that tmpfs quota should be implemented via
resource limits, since nothing else would work: we don't want to
implement complex and fragile userspace infrastructure to racily upload complex
quota data for all current and future UIDs ever used on the system into each
tmpfs mount point at mount time.</small></p>


]]></description>
</item>

<item>
  <title>systemd for Administrators, Part XII</title>
  <link>http://0pointer.de/blog/projects/security.html</link>
  <description><![CDATA[

<p>Here's <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a
href="http://0pointer.de/blog/projects/instances.html">twelfth</a> <a
href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a>
<a
href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a>

<a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a
href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a>
<a
href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p>

<h4>Securing Your Services</h4>

<p>One of the core features of Unix systems is the idea of privilege separation
between the different components of the OS. Many system services run under
their own user IDs thus limiting what they can do, and hence the impact they
may have on the OS in case they get exploited.</p>

<p>This kind of privilege separation only provides very basic protection,
however, since system services run this way can generally still do at least as
much as a normal local user, though not as much as root. For security purposes
it is very interesting to limit even further what services can do, and to
shut them off from a couple of things that normal users are allowed to do.</p>

<p>A great way to limit the impact of services is by employing MAC technologies
such as SELinux. If you are interested in locking down your server, running
SELinux is a very good idea. systemd enables developers and administrators to
apply additional restrictions to local services independently of a MAC. Thus,
regardless of whether you are able to make use of SELinux, you may still enforce
certain security limits on your services.</p>

<p>In this iteration of the series we want to focus on a couple of these
security features of systemd and how to make use of them in your services.
These features take advantage of a couple of Linux-specific technologies that have
been available in the kernel for a long time, but have never been exposed in a
widely usable fashion. These systemd features have been designed to be as easy to use
as possible, in order to make them attractive to administrators and upstream
developers:</p>

<ul>
<li>Isolating services from the network</li>
<li>Service-private <tt>/tmp</tt></li>
<li>Making directories appear read-only or inaccessible to services</li>
<li>Taking away capabilities from services</li>
<li>Disallowing forking, limiting file creation for services</li>
<li>Controlling device node access of services</li>
</ul>

<p>All options described here are documented in systemd's man pages, notably <a
href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a>.
Please consult these man pages for further details.</p>

<p>All these options are available on all systemd systems, regardless of whether
SELinux or any other MAC is enabled.</p>

<p>All these options are relatively cheap, so if in doubt use them. Even if you
might think that your service doesn't write to <tt>/tmp</tt> and hence enabling
<tt>PrivateTmp=yes</tt> (as described below) might not be necessary, due to
today's complex software it's still beneficial to enable this feature, simply
because libraries you link to (and plug-ins to those libraries) which you do
not control might need temporary files after all. Example: you never know what
kind of NSS module your local installation has enabled, and what that NSS module
does with <tt>/tmp</tt>.</p>

<p>These options are hopefully interesting both for administrators to secure
their local systems, and for upstream developers to ship their services secure
by default.  We strongly encourage upstream developers to consider using these
options by default in their upstream service units. They are very easy to make
use of and have major benefits for security.</p>

<h4>Isolating Services from the Network</h4>

<p>A very simple but powerful configuration option you may use in systemd
service definitions is <tt>PrivateNetwork=</tt>:</p>

<pre>...
[Service]
ExecStart=...
PrivateNetwork=yes
...</pre>

<p>With this simple switch a service and all the processes it consists of are
entirely disconnected from any kind of networking. Network interfaces become
unavailable to the processes; the only one they'll see is the loopback device
"lo", and even that is isolated from the real host loopback. This is a very
powerful protection from network attacks.</p>

<p><b>Caveat:</b> Some services require the network to be operational. Of
course, nobody would consider using <tt>PrivateNetwork=yes</tt> on a
network-facing service such as Apache. However even for non-network-facing
services network access might be necessary, in ways that are not always obvious. Example: if
the local system is configured for an LDAP-based user database, glibc name
lookups with calls such as <tt>getpwnam()</tt> might end up resulting in network access.
That said, even in those cases it is more often than not OK to use
<tt>PrivateNetwork=yes</tt>, since user IDs of system service users are required to
be resolvable even without any network around. That means as long as the only
user IDs your service needs to resolve are below the magic 1000 boundary, using
<tt>PrivateNetwork=yes</tt> should be OK.</p>

<p>Internally, this feature makes use of the kernel's network namespaces. When
enabled, a new network namespace is created and only the loopback device is
configured in it.</p>
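<p>As a rough illustration of that mechanism (not systemd's actual code), the following C sketch moves the calling process into a new network namespace with <tt>unshare(CLONE_NEWNET)</tt> and then lists the interfaces it can still see. It needs <tt>CAP_SYS_ADMIN</tt>, and unlike systemd it does not configure the loopback device; the function names are hypothetical:</p>

```c
#define _GNU_SOURCE
#include <net/if.h>
#include <sched.h>
#include <stdio.h>

/* Print and count the network interfaces visible to the calling process.
 * Returns -1 if the list cannot be obtained. */
static int list_interfaces(void) {
    struct if_nameindex *ifs = if_nameindex();
    if (!ifs)
        return -1;
    int n = 0;
    for (struct if_nameindex *i = ifs; i->if_index != 0; i++) {
        printf("%u: %s\n", i->if_index, i->if_name);
        n++;
    }
    if_freenameindex(ifs);
    return n;
}

/* Move the calling process into a fresh network namespace, roughly the
 * kernel mechanism behind PrivateNetwork=yes. Requires CAP_SYS_ADMIN.
 * Returns the number of interfaces visible afterwards, or -1 on failure. */
static int enter_private_network(void) {
    if (unshare(CLONE_NEWNET) < 0)
        return -1;                  /* typically EPERM when unprivileged */
    return list_interfaces();       /* a fresh namespace only contains "lo" */
}
```

<p>Run as root, the listing after <tt>unshare()</tt> should show only <tt>lo</tt>; run unprivileged, <tt>unshare()</tt> fails (typically with <tt>EPERM</tt>) and the sketch returns -1.</p>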

<h4>Service-Private /tmp</h4>

<p>Another very simple but powerful configuration switch is
<tt>PrivateTmp=</tt>:</p>

<pre>...
[Service]
ExecStart=...
PrivateTmp=yes
...</pre>

<p>If enabled, this option will ensure that the <tt>/tmp</tt> directory the
service will see is private and isolated from the host system's <tt>/tmp</tt>.
<tt>/tmp</tt> traditionally has been a shared space for all local services and
users. Over the years it has been a major source of security problems for a
multitude of services. Symlink attacks and DoS vulnerabilities due to guessable
<tt>/tmp</tt> temporary files are common. By isolating the service's
<tt>/tmp</tt> from the rest of the host, such vulnerabilities become moot.</p>
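<p>For code that does keep using a shared <tt>/tmp</tt>, the standard defense against these attacks is to let the C library pick an unguessable name and create the file exclusively, which is what <tt>mkstemp()</tt> does (<tt>mkdtemp()</tt> is the equivalent for directories). A minimal sketch, where the <tt>myservice-</tt> prefix and the helper name are hypothetical:</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a new temporary file below DIR with an unguessable name.
 * mkstemp() replaces the trailing XXXXXX and opens the file with
 * O_CREAT|O_EXCL and mode 0600, so an existing file or symlink planted at
 * that name is never opened or followed. Stores the final path in path_out
 * and returns the open descriptor, or -1 on error. */
static int make_private_tmpfile(const char *dir, char *path_out, size_t sz) {
    int n = snprintf(path_out, sz, "%s/myservice-XXXXXX", dir);
    if (n < 0 || (size_t) n >= sz)
        return -1;                  /* template does not fit into path_out */
    return mkstemp(path_out);
}
```

<p>Because the file is created with <tt>O_EXCL</tt>, an attacker who pre-creates a file or symlink under a guessed name cannot redirect the write; with <tt>PrivateTmp=yes</tt> there is simply nobody else in that <tt>/tmp</tt> to attempt it.</p>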

<p>For Fedora 17 a <a
href="https://fedoraproject.org/wiki/Features/ServicesPrivateTmp">feature has
been accepted</a> in order to enable this option across a large number of
services.</p>

<p><b>Caveat:</b> Some services actually misuse <tt>/tmp</tt> as a location
for IPC sockets and other communication primitives, even though this is almost
always a vulnerability (simply because if you use it for communication you need
guessable names, and guessable names make your code vulnerable to DoS and symlink
attacks) and <tt>/run</tt> is the much safer replacement for this, simply
because it is not a location writable to unprivileged processes. For example,
X11 places its communication sockets below <tt>/tmp</tt> (which in this case is
actually secure, though still not ideal, since it does so in a safe subdirectory
which is created at early boot). Services which need to communicate via such
communication primitives in <tt>/tmp</tt> are not candidates for
<tt>PrivateTmp=</tt>. Thankfully these days only very few
services misusing <tt>/tmp</tt> like this remain.</p>

<p>Internally, this feature makes use of the kernel's file system namespaces.
When enabled, a new file system namespace is created, inheriting most of the host
hierarchy with the exception of <tt>/tmp</tt>.</p>

<h4>Making Directories Appear Read-Only or Inaccessible to Services</h4>

<p>With the <tt>ReadOnlyDirectories=</tt> and <tt>InaccessibleDirectories=</tt>
options it is possible to make the specified directories read-only or entirely
inaccessible (neither readable nor writable) to the service, respectively:</p>

<pre>...
[Service]
ExecStart=...
InaccessibleDirectories=/home
ReadOnlyDirectories=/var
...
</pre>

<p>With these two configuration lines the whole tree below <tt>/home</tt>
becomes inaccessible to the service (i.e. the directory will appear empty and
with 000 access mode), and the tree below <tt>/var</tt> becomes read-only.</p>

<p><b>Caveat:</b> Note that <tt>ReadOnlyDirectories=</tt> currently is not
recursively applied to submounts of the specified directories (i.e. mounts below
<tt>/var</tt> in the example above stay writable). This is likely to get fixed
soon.</p>

<p>Internally, this is also implemented based on file system namespaces.</p>

<h4>Taking Away Capabilities From Services</h4>

<p>Another very powerful security option in systemd is
<tt>CapabilityBoundingSet=</tt>, which allows you to limit, in a relatively
fine-grained fashion, which kernel capabilities a started service retains:</p>

<pre>...
[Service]
ExecStart=...
CapabilityBoundingSet=CAP_CHOWN CAP_KILL
...
</pre>

<p>In the example above only the CAP_CHOWN and CAP_KILL capabilities are
retained by the service, and the service and any processes it might create have
no chance to ever acquire any other capabilities again, not even via setuid
binaries. The list of currently defined capabilities is available in <a
href="http://linux.die.net/man/7/capabilities">capabilities(7)</a>.
Unfortunately some of the defined capabilities are overly generic (such as
CAP_SYS_ADMIN), however they are still a very useful tool, in particular for
services that otherwise run with full root privileges.</p>

<p>To identify precisely which capabilities are necessary for a service to run
cleanly is not always easy and requires a bit of testing. To simplify this
process a bit, it is possible to blacklist certain capabilities that are
definitely not needed instead of whitelisting all that might be needed. Example:
CAP_SYS_PTRACE is a particularly powerful and security-relevant capability
needed for the implementation of debuggers, since it allows introspecting and
manipulating any local process on the system. A service like Apache obviously
has no business being a debugger for other processes, hence it is safe to
remove the capability from it:</p>

<pre>...
[Service]
ExecStart=...
CapabilityBoundingSet=~CAP_SYS_PTRACE
...</pre>

<p>Prefixing the value with the <tt>~</tt> character inverts the meaning of
the option: instead of listing all capabilities the service will retain, you
list the ones it will not retain.</p>
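<p>Both forms operate on the kernel's per-process capability bounding set. The following C sketch, a rough approximation rather than systemd's implementation, uses <tt>prctl(PR_CAPBSET_READ)</tt> and <tt>prctl(PR_CAPBSET_DROP)</tt> to implement the plain (whitelist) and the <tt>~</tt> (blacklist) variant; dropping requires <tt>CAP_SETPCAP</tt>, and the helper names are hypothetical:</p>

```c
#include <errno.h>
#include <linux/capability.h>
#include <stddef.h>
#include <sys/prctl.h>

/* 1 if CAP is in the calling process' bounding set, 0 if it is not,
 * -1 (errno EINVAL) if the kernel does not know this capability. */
static int capbset_contains(int cap) {
    return prctl(PR_CAPBSET_READ, (unsigned long) cap, 0, 0, 0);
}

/* The "~CAP_..." form: remove only the listed capabilities.
 * Needs CAP_SETPCAP; otherwise prctl() fails with EPERM. */
static int capbset_drop(const int *caps, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (prctl(PR_CAPBSET_DROP, (unsigned long) caps[i], 0, 0, 0) < 0)
            return -1;
    return 0;
}

/* The plain form: keep the listed capabilities and drop all others,
 * probing capability numbers upward until the kernel reports EINVAL. */
static int capbset_keep_only(const int *keep, size_t n) {
    for (int cap = 0; ; cap++) {
        int present = capbset_contains(cap);
        if (present < 0)
            return errno == EINVAL ? 0 : -1;   /* past the last capability */
        int wanted = 0;
        for (size_t i = 0; i < n; i++)
            if (keep[i] == cap)
                wanted = 1;
        if (present && !wanted &&
            prctl(PR_CAPBSET_DROP, (unsigned long) cap, 0, 0, 0) < 0)
            return -1;
    }
}
```

<p>Calling <tt>capbset_keep_only()</tt> with <tt>CAP_CHOWN</tt> and <tt>CAP_KILL</tt> corresponds to the first example above, and <tt>capbset_drop()</tt> with <tt>CAP_SYS_PTRACE</tt> to the second. Note that shrinking the bounding set by itself does not remove capabilities already in the process' permitted set; it limits what can be gained later, for example when executing setuid binaries.</p>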

<p><b>Caveat:</b> Some services might get confused if certain capabilities are
made unavailable to them. Thus when determining the right set of capabilities
to keep around you need to do this carefully, and it might be a good idea to talk
to the upstream maintainers since they should know best which operations a
service might need to run successfully.</p>

<p><b>Caveat 2:</b> <a
href="https://forums.grsecurity.net/viewtopic.php?f=7&amp;t=2522">Capabilities are
not a magic wand.</a> You probably want to combine them and use them in
conjunction with other security options in order to make them truly useful.</p>

<p>To easily check which processes on your system retain which capabilities use
the <tt>pscap</tt> tool from the <tt>libcap-ng-utils</tt> package.</p>

<p>Making use of systemd's <tt>CapabilityBoundingSet=</tt> option is often a
simple, discoverable and cheap replacement for patching all system daemons
individually to control the capability bounding set on their own.</p>

<h4>Disallowing Forking, Limiting File Creation for Services</h4>

<p>Resource limits may be used to apply certain security limits on services
being run. Primarily, resource limits are useful for resource control (as the
name suggests...), not so much for access control. However, two of them can be
useful to disable certain OS features: RLIMIT_NPROC and RLIMIT_FSIZE may be
used to disable forking and to disable writing any files with a size > 0,
respectively:</p>

<pre>...
[Service]
ExecStart=...
LimitNPROC=1
LimitFSIZE=0
...</pre>

<p>Note that this will work only if the service in question drops privileges
and runs under a (non-root) user ID of its own or drops the CAP_SYS_RESOURCE
capability, for example via <tt>CapabilityBoundingSet=</tt> as discussed above.
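Without that, a process could simply increase the resource limit again, thus
voiding any effect. Also note that RLIMIT_NPROC counts all processes running
under the same user ID, which is another reason to give the service a user ID
of its own.</p>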

<p><b>Caveat:</b> <tt>LimitFSIZE=</tt> is pretty brutal. If the service
attempts to write a file with a size > 0, it will immediately be sent
SIGXFSZ, which terminates the process unless it is caught or ignored. Also,
creating files with size 0 is still allowed, even if this option is used.</p>
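<p>The behavior described in this caveat can be observed with a small C sketch: a child process lowers <tt>RLIMIT_FSIZE</tt> to 0 via <tt>setrlimit()</tt>, the limit that <tt>LimitFSIZE=0</tt> corresponds to, and then tries to write a single byte. The helper name is hypothetical:</p>

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that lowers RLIMIT_FSIZE to 0 and then tries to write one
 * byte to an empty temporary file. If ignore_sigxfsz is zero the child keeps
 * the default SIGXFSZ disposition, otherwise it ignores the signal.
 * Returns the child's wait status, or -1 on error. */
static int probe_fsize_zero(int ignore_sigxfsz) {
    FILE *tmp = tmpfile();                 /* created before the limit applies */
    if (!tmp)
        return -1;
    int fd = fileno(tmp);

    pid_t pid = fork();
    if (pid < 0) {
        fclose(tmp);
        return -1;
    }
    if (pid == 0) {
        struct rlimit zero = { .rlim_cur = 0, .rlim_max = 0 };
        if (setrlimit(RLIMIT_FSIZE, &zero) < 0)
            _exit(2);
        setrlimit(RLIMIT_CORE, &zero);     /* avoid leaving a core file behind */
        signal(SIGXFSZ, ignore_sigxfsz ? SIG_IGN : SIG_DFL);
        if (write(fd, "x", 1) < 0 && errno == EFBIG)
            _exit(0);                      /* write refused, child survived */
        _exit(1);                          /* unexpected outcome */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        status = -1;
    fclose(tmp);
    return status;
}
```

<p>With the default disposition the child is terminated by <tt>SIGXFSZ</tt>; only when the signal is ignored does <tt>write()</tt> return -1 with <tt>errno</tt> set to <tt>EFBIG</tt>.</p>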

<p>For more information on these and other resource limits, see <a
href="http://linux.die.net/man/2/setrlimit">setrlimit(2)</a>.</p>

<h4>Controlling Device Node Access of Services</h4>

<p>Device nodes are an important interface to the kernel and its drivers.
Since drivers tend to get much less testing and security checking than the core
kernel they often are a major entry point for security hacks. systemd allows
you to control access to devices individually for each service:</p>

<pre>...
[Service]
ExecStart=...
DeviceAllow=/dev/null rw
...</pre>

<p>This will limit access to <tt>/dev/null</tt> and only this device node,
disallowing access to any other device nodes.</p>

<p>The feature is implemented on top of the <tt>devices</tt> cgroup controller.</p>
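<p>Rules for the <tt>devices</tt> controller (in its cgroup v1 form) consist of a device type, a <tt>major:minor</tt> pair and an access string made of <tt>r</tt>, <tt>w</tt> and <tt>m</tt> (mknod), for example <tt>c 1:3 rw</tt> for <tt>/dev/null</tt>. The hypothetical C helper below sketches how a path-based setting like the one above could be translated into such a rule; actually writing it to the cgroup's <tt>devices.allow</tt> file is left out:</p>

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Build a devices-controller rule such as "c 1:3 rw" for the device node at
 * PATH. ACCESS must be a non-empty combination of 'r' (read), 'w' (write)
 * and 'm' (mknod). Returns 0 on success, -1 on error. */
static int device_allow_entry(const char *path, const char *access,
                              char *out, size_t outsz) {
    struct stat st;
    char type;

    if (access[0] == '\0' || strspn(access, "rwm") != strlen(access))
        return -1;                        /* invalid access string */
    if (stat(path, &st) < 0)
        return -1;
    if (S_ISCHR(st.st_mode))
        type = 'c';
    else if (S_ISBLK(st.st_mode))
        type = 'b';
    else
        return -1;                        /* not a device node */

    int n = snprintf(out, outsz, "%c %u:%u %s", type,
                     major(st.st_rdev), minor(st.st_rdev), access);
    return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```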

<h4>Other Options</h4>

<p>Besides the easy-to-use options above, there are a number of other security
relevant options available. However they usually require a bit of preparation
in the service itself and hence are probably primarily useful for upstream
developers. These options are <tt>RootDirectory=</tt> (to set up
<tt>chroot()</tt> environments for a service) as well as <tt>User=</tt> and
<tt>Group=</tt> to drop privileges to the specified user and group. These
options are particularly useful to greatly simplify writing daemons, where all
the complexities of securely dropping privileges can be left to systemd, and
kept out of the daemons themselves.</p>
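<p>To give an idea of what is being spared here, this is a minimal sketch of the privilege drop a daemon started as root would otherwise have to perform itself. It is not systemd's code; the order of the calls is the important part, and <tt>drop_privileges()</tt> is a hypothetical name:</p>

```c
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Permanently switch a root process to UID/GID. Returns 0 on success and
 * -1 on any failure, in which case the caller should exit rather than
 * continue with partially dropped privileges. */
static int drop_privileges(uid_t uid, gid_t gid) {
    if (setgroups(1, &gid) < 0)      /* replace root's supplementary groups */
        return -1;
    if (setgid(gid) < 0)             /* as root: real, effective, saved GID */
        return -1;
    if (setuid(uid) < 0)             /* last: afterwards GIDs can't change */
        return -1;
    if (uid != 0 && setuid(0) == 0)  /* sanity check: root must be gone */
        return -1;
    return 0;
}
```

<p>Forgetting <tt>setgroups()</tt> leaves the process in root's supplementary groups, and calling <tt>setuid()</tt> before <tt>setgid()</tt> makes the group change fail; these are exactly the kinds of mistakes that <tt>User=</tt> and <tt>Group=</tt> help avoid.</p>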

<p>If you are wondering why these options are not enabled by default: some of
them simply break the semantics of traditional Unix, and to maintain compatibility
we cannot enable them by default. For example, since traditional Unix enforced that
<tt>/tmp</tt> was a shared namespace, and processes could use it for IPC, we
cannot just go and turn that off globally, just because <tt>/tmp</tt>'s role in
IPC is now taken over by <tt>/run</tt>.</p>

<p>And that's it for now. If you are working on unit files for upstream or in
your distribution, please consider using one or more of the options listed
above. If your service is secure by default by taking advantage of these options
this will help not only your users but also make the Internet a safer
place.</p>

]]></description>
</item>

<item>
  <title>PulseAudio vs. AudioFlinger</title>
  <link>http://0pointer.de/blog/projects/aruns-numbers.html</link>
  <description><![CDATA[

<p><a
href="http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/">Arun
put an awesome article up</a>, detailing how PulseAudio compares to Android's
AudioFlinger in terms of power consumption and suchlike. Suffice it to say,
PulseAudio rocks, but go and read the whole thing, it's worth it.</p>

<p>Apparently, AudioFlinger is a great choice if you want to shorten your
battery life.</p>

]]></description>
</item>

<item>
  <title>Introducing the Journal</title>
  <link>http://0pointer.de/blog/projects/the-journal.html</link>
  <description><![CDATA[

<p>In the past weeks we have been working on a major new addition to systemd
that will hopefully positively change the Linux ecosystem in a number of ways.
But see for yourself, check out the full explanation on what we have
implemented on the <a
href="https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTs">design
document we put up on Google Docs</a>.</p>

]]></description>
</item>

<item>
  <title>Kernel Hackers Panel</title>
  <link>http://0pointer.de/blog/projects/linuxcon-kernel-panel.html</link>
  <description><![CDATA[

<p>At LinuxCon Europe/ELCE I had the chance to moderate the <a href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel">kernel hackers
panel with Linus Torvalds, Alan Cox, Paul McKenney and Thomas Gleixner on
stage</a>. I like to believe it went quite well, but check it out for yourself, as
a video recording is now available online:</p>

<video width="800" height="450" controls="1">
  <source src="http://free-electrons.com/pub/video/2011/elce/elce-2011-torvalds-cox-gleixner-mackenney-kernel-developer-panel-450p.webm"/>
</video>

<p>For me personally I think the most notable topic covered was Control Groups,
and the clarification that they are something that is needed even though their
implementation right now is in many ways less than perfect. But in the end there is no
reasonable way around them: much like SMP, they are a technology that complicates
things substantially but is ultimately unavoidable.</p>

<p><a href="http://free-electrons.com/blog/elce-2011-videos/">Other videos from ELCE are online now, too.</a></p>

]]></description>
</item>

<item>
  <title>libabc</title>
  <link>http://0pointer.de/blog/projects/libabc.html</link>
  <description><![CDATA[

<p>At the Kernel Summit in Prague last week Kay Sievers and I led a session on
developing shared userspace libraries, for kernel hackers. More and more
userspace interfaces of the kernel (for example many which deal with storage,
audio, resource management, security, file systems or a number of other
subsystems) nowadays rely on a dedicated userspace component. As people who
work primarily in the plumbing layer of the Linux OS we noticed over and over
again that these libraries written by people who usually are at home on the
kernel side of things make the same mistakes repeatedly, thus making life for
the users of the libraries unnecessarily difficult. In our session we tried to
point out a number of these things, and in particular places where the usual
kernel hacking style translates badly into userspace shared library hacking.
Our hope is that maybe a few kernel developers have a look at our list of
recommendations and consider the points we are raising.</p>

<p>To make things easy we have put together an example skeleton library we
dubbed <tt>libabc</tt>, whose <a
href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git;a=blob_plain;f=README">README</a>
file includes all our points in terse form. It's available on kernel.org:</p>

<p><a href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git">The git repository</a> and the <a href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git;a=blob_plain;f=README">README</a>.</p>

<p>This list of recommendations draws inspiration from David Zeuthen's and
Ulrich Drepper's well known papers on the topic of writing shared libraries. In
the README linked above we try to distill this wealth of information into a
terse list of recommendations, with a couple of additions and with a strict
focus on a kernel hacker background.</p>

<p>Please have a look, and even if you are not a kernel hacker there might be
something useful to know in it, especially if you work on the lower layers of
our stack.</p>

<p>If you have any questions or additions, just ping us, or comment below!</p>

]]></description>
</item>

<item>
  <title>Prague</title>
  <link>http://0pointer.de/blog/projects/linuxcon-europe.html</link>
  <description><![CDATA[

<p>If you make it to Prague the coming week for the LinuxCon/ELCE/GStreamer/Kernel Summit/... superconference, make sure not to miss:</p>

<ul>

<li>The Linux Audio BoF with numerous Linux audio hackers, 5pm, on Sunday (23rd, i.e. today).</li>

<li><a
href="http://gstreamer.freedesktop.org/conference/speakers.html#raghavan">Latest
developments in PulseAudio</a> by Arun Raghavan. 4pm, on Tuesday, GStreamer
Summit</li>

<li><a
href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel">Linux
Kernel Developer Panel</a>, a shared session of LinuxCon and ELCE. Panelists
are Linus Torvalds, Alan Cox, Thomas Gleixner and Paul McKenney. Moderated by
yours truly. 9:30am, on Wednesday</li>

<li><a
href="https://events.linuxfoundation.org/events/linuxcon-europe/poettering-sievers">systemd
Administration in the Enterprise</a> by Kay Sievers and yours truly. 4:15pm, on
Wednesday, LinuxCon</li>

<li><a
href="https://events.linuxfoundation.org/events/embedded-linux-conference-europe/kooi">Integrating
systemd: Booting Userspace in Less Than 1 Second</a> by Koen Kooi. 11:15am, on
Friday, ELCE</li>

</ul>

<p>All of that at the Clarion Hotel. See you in Prague!</p>

]]></description>
</item>

<item>
  <title>Plumbers Wishlist, The Second Edition</title>
  <link>http://0pointer.de/blog/projects/plumbers-wishlist-2.html</link>
  <description><![CDATA[

<p>Two weeks ago we published a <a
href="http://0pointer.de/blog/projects/plumbers-wishlist.html">Plumber's
Wishlist for Linux</a>. So far, this has already created lively discussions in
the community (as reported on LWN among others), and patches for a few of the
items listed have already been posted (thanks a lot to those who worked on
this, your contributions are much appreciated!).</p>

<p><a
href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8">We
have now prepared a second version of the wish list.</a> It includes a number
of additions (tmpfs quota! hostname change notifications! and more!) and
updates to the previous items, including links to patches, and references to
other interesting material.</p>

<p>We hope to update this wishlist from time to time, so stay tuned!</p>

<p><a href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8">And now, go and read the new wishlist!</a></p>

]]></description>
</item>

<item>
  <title>Google doesn&apos;t like my name</title>
  <link>http://0pointer.de/blog/projects/google-doesnt-like-my-name.html</link>
  <description><![CDATA[

<p>Nice one: Google suspended my Google+ account because I created it under,
well, my name, which is "Lennart Poettering". Google+ thinks that isn't my
name, even though it says so in my passport and on almost every document I own,
and I was never aware I had any other name. This is ridiculous. Google, give me
my name back! This is a really uncool move.</p>

]]></description>
</item>

<item>
  <title>Your Questions for the Kernel Developer Panel at LinuxCon in Prague</title>
  <link>http://0pointer.de/blog/projects/kernel-hacker-panel.html</link>
  <description><![CDATA[

<p><a href="https://plus.google.com/115547683951727699051/posts/SuTUvbcJ6p9">I
am currently collecting</a> questions for the <a
href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel">kernel
developer panel at LinuxCon in Prague</a>. If there's something you'd like the
panelists to respond to, please post it on <a
href="https://plus.google.com/115547683951727699051/posts/SuTUvbcJ6p9">the
thread</a>, and I'll see what I can do. Thank you!</p>

]]></description>
</item>

<item>
  <title>A Big Loss</title>
  <link>http://0pointer.de/blog/projects/a-big-loss.html</link>
  <description><![CDATA[

<p><a
href="http://googleblog.blogspot.com/2011/10/fall-sweep.html">Google
announced today that they'll be shutting down Google Code Search in
January</a>. I am quite sure that this will be a massive loss for the Free
Software community. The ability to learn from other people's code is a key
idea of Free Software. There's simply no better way to do that than with a
source code search engine. The day Google Code Search shuts down will be a sad
day for the Free Software community.</p>

<p>Of course, there are a couple of alternatives around, but they all have one
thing in common: they, uh, don't even remotely compare to the completeness,
performance and simplicity of the Google Code Search interface, and have
serious usability issues. (For example: koders.com is really, really slow, and
splits up identifiers you search for at underscores, which kinda makes it
useless for looking for almost any kind of code.)</p>

<p>I think it must be of genuine interest to the Free Software community to
have a capable replacement for Google Code Search, for the day it is turned
off. In fact, it is probably something the various foundations that promote
Free Software, like the FSF or the Linux Foundation, should be looking into.
There are very few better ways to get Free Software into the heads and minds
of engineers than examples: real-life code they can find with a source code
search engine. I believe a source code search engine is probably among the
best vehicles to promote Free Software towards engineers, particularly if it
were itself Free Software (in contrast to Google Code Search).</p>

<p>Ideally, all software available on web sites like SourceForge, Freshmeat, or
github should be indexed. But there's also a chance for distributions here:
indexing the sources of all packages that a distribution like Debian or Fedora
includes would be a great tool for developers. In fact, a distribution offering
this functionality might benefit from it, as it would attract developer
interest in the distribution.</p>

<p>It's sad that Google Code Search will be gone soon. But maybe there's
something positive in the bad news here, and a chance to create something better,
more comprehensive, that is free, and promotes our ideals better than Google
ever could. Maybe there's a chance here for the Open Source foundations, for
the distributions and for the communities to create a better replacement!</p>

]]></description>
</item>

<item>
  <title>Dresden, California, Poznan</title>
  <link>http://0pointer.de/blog/photos/california.html</link>
  <description><![CDATA[

<p><a href="http://0pointer.de/static/dresden.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/dresden-small.jpeg" width="1024" height="291" alt="Hofkirche, Dresden, Saxony, Germany"/></a></p>

<p><i>Hofkirche, Dresden, Saxony, Germany</i></p>

<p><a href="http://0pointer.de/static/bastei.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/bastei-small.jpeg" width="1024" height="260" alt="Bastei, Saxon Switzerland, Saxony, Germany"/></a></p>

<p><i>Bastei, Saxon Switzerland, Saxony, Germany</i></p>

<p><a href="http://0pointer.de/static/dresden2.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/dresden2-small.jpeg" width="1024" height="370" alt="F&uuml;rstenzug, Dresden, Saxony, Germany"/></a></p>

<p><i>F&uuml;rstenzug, Dresden, Saxony, Germany</i></p>

<p><a href="http://0pointer.de/static/california.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/california-small.jpeg" width="1024" height="120" alt="Near California State Route 46, California, USA"/></a></p>

<p><i>Near California State Route 46, California, USA</i></p>

<p><a href="http://0pointer.de/static/california2.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/california2-small.jpeg" width="1024" height="122" alt="Near Generals Highway, California, USA"/></a></p>

<p><i>Near Generals Highway, California, USA</i></p>

<p><a href="http://0pointer.de/static/california3.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/california3-small.jpeg" width="1024" height="230" alt="Near Generals Highway, California, USA"/></a></p>

<p><i>Near Generals Highway, California, USA</i>, a bit further down the road.</p>

<p><a href="http://0pointer.de/static/poznan.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/poznan-small.jpeg" width="1024" height="183" alt="Parish Church in Poznan, Poland"/></a></p>

<p><i>Parish Church in Poznan, Poland</i></p>



]]></description>
</item>

<item>
  <title>A Plumber&apos;s Wish List for Linux</title>
  <link>http://0pointer.de/blog/projects/plumbers-wishlist.html</link>
  <description><![CDATA[

<p>Here's a <a href="http://thread.gmane.org/gmane.linux.kernel/1200272">mail
we just sent to LKML</a>, for your consideration. Enjoy:</p>

<pre><b>Subject: A Plumber’s Wish List for Linux</b>

We’d like to share our current wish list of plumbing layer features we
are hoping to see implemented in the near future in the Linux kernel and
associated tools. Some items we can implement on our own, others are not
our area of expertise, and we will need help getting them implemented.

Acknowledging that this wish list of ours only gets longer and not
shorter, even though we have implemented a number of other features on
our own in the previous years, we are posting this list here, in the
hope of finding some help.

If you happen to be interested in working on something from this list or
able to help out, we’d be delighted. Please ping us in case you need
clarifications or more information on specific items.

Thanks,
Kay, Lennart, Harald, in the name of all the other plumbers


And here’s the wish list, in no particular order:

* (ioctl based?) interface to query and modify the label of a mounted
FAT volume:
A FAT label is implemented as a hidden directory entry in the file
system which needs to be renamed when changing the file system label.
This is impossible to do from userspace without unmounting. Hence we’d
like to see a kernel interface that is available on the mounted file
system mount point itself. Of course, bonus points if this new interface
can be implemented for other file systems as well, and also covers fs
UUIDs in addition to labels.

* CPU modaliases in /sys/devices/system/cpu/cpuX/modalias:
useful to allow module auto-loading of e.g. cpufreq drivers and KVM
modules. Andi Kleen has a patch to create the alias file itself. CPU
‘struct sysdev’ needs to be converted to ‘struct device’ and a ‘struct
bus_type cpu’ needs to be introduced to allow proper CPU coldplug event
replay at bootup. This is one of the last remaining places where
automatic hardware-triggered module auto-loading is not available. And
we’d like to see that fixed, to make the numerous ugly userspace
work-arounds that achieve the same go away.

* expose CAP_LAST_CAP somehow in the running kernel at runtime:
Userspace needs to know the highest valid capability of the running
kernel, which right now cannot reliably be retrieved from header files
only. The fact that this value cannot be detected properly right now
creates various problems for libraries compiled against newer header files
which are run on older kernels. They assume capabilities are available
which actually aren’t. Specifically, libcap-ng claims that all running
processes retain the higher capabilities in this case due to the
“inverted” semantics of CapBnd in /proc/$PID/status.

* export ‘struct device_type fb/fbcon’ of ‘struct class graphics’
Userspace wants to easily distinguish ‘fb’ and ‘fbcon’ from each other
without the need to match on the device name.

* allow changing argv[] of a process without mucking with environ[]:
Something like setproctitle() or a prctl() would be ideal. Of course it
is questionable if services like sendmail make use of this, but otoh for
services which fork but do not immediately exec() another binary being
able to rename these child processes in ps is important.

* module-init-tools: provide a proper libmodprobe.so from
module-init-tools:
Early boot tools, installers and driver install disks want to access
information about available modules to optimize bootup handling.

* fork throttling mechanism as basic cgroup functionality that is
available in all hierarchies independent of the controllers used:
This is important to implement race-free killing of all members of a
cgroup, so that cgroup member processes cannot fork faster than a cgroup
supervisor process could kill them. This needs to be recursive, so that
not only a cgroup but all its subgroups are covered as well.

* proper cgroup-is-empty notification interface:
The current call_usermodehelper() interface is an inefficient and
ugly hack. Tools would prefer anything more lightweight like a netlink,
poll() or fanotify interface.

* allow user xattrs to be set on files in the cgroupfs (and maybe
procfs?)

* simple, reliable and future-proof way to detect whether a specific pid
is running in a CLONE_NEWPID container, i.e. not in the root PID
namespace. Currently, a few ugly hacks are available to detect
this (for example a process wanting to know whether it is running in a
PID namespace could just look for a PID 2 being around and named
kthreadd which is a kernel thread only visible in the root namespace),
however all these solutions encode information and expectations that
better shouldn’t be encoded in a namespace test like this. This
functionality is needed in particular since the removal of the ns
cgroup controller which provided the namespace membership information to
user code.

* allow making use of the “cpu” cgroup controller by default without
breaking RT. Right now creating a cgroup in the “cpu” hierarchy that
shall be able to take advantage of RT is impossible for the generic case
since it needs an RT budget configured which is from a limited resource
pool. What we want is the ability to create cgroups in “cpu” whose
processes get a non-RT weight applied, but for RT take advantage of the
parent’s RT budget. We want the separation of RT and non-RT budget
assignment in the “cpu” hierarchy, because right now, you lose RT
functionality in it unless you assign an RT budget. This issue severely
limits the usefulness of “cpu” hierarchy on general purpose systems
right now.

* Add a timerslack cgroup controller, to allow increasing the timer
slack of user session cgroups when the machine is idle.

* An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or
something like that), i.e. a way to attach sender cgroup membership to
messages sent via AF_UNIX. This is useful in case services such as
syslog shall be shared among various containers (or service cgroups),
and the syslog implementation needs to be able to distinguish the
sending cgroup in order to separate the logs on disk. Of course,
SCM_CREDENTIALS can be used to look up the PID of the sender followed by
a check in /proc/$PID/cgroup, but that is necessarily racy, and actually
a very real race in real life.

* SCM_COMM, with a similar use case as SCM_CGROUPS. This auxiliary
control message should carry the process name as available
in /proc/$PID/comm.</pre>
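<p>To make the PID namespace item above more concrete, here is a minimal C sketch of the "PID 2 named kthreadd" heuristic that the mail calls an ugly hack. It is an illustration under stated assumptions, not a recommended check: it assumes /proc is mounted and exposes /proc/$PID/comm, and both function names are hypothetical. Notice how much it silently depends on: a mounted /proc, kthreadd keeping its name, and kthreadd staying at PID 2.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a comm string (as read from
 * /proc/$PID/comm, trailing newline allowed) names kthreadd, else 0. */
static int comm_is_kthreadd(const char *comm) {
    assert(comm != NULL);
    size_t n = strcspn(comm, "\n");   /* length without trailing newline */
    return n == strlen("kthreadd") && strncmp(comm, "kthreadd", n) == 0;
}

/* Hypothetical helper implementing the heuristic from the wish list:
 * kthreadd is PID 2 and only visible in the root PID namespace, so if
 * /proc/2/comm does not say "kthreadd" we guess that we run inside a
 * CLONE_NEWPID container. Returns 1 for "probably root namespace",
 * 0 otherwise. This is the fragile kind of test the mail wants replaced
 * by a proper kernel interface. */
static int probably_in_root_pid_namespace(void) {
    char comm[64];
    FILE *f = fopen("/proc/2/comm", "r");
    if (!f)
        return 0;                     /* no visible PID 2 (or no /proc) */
    int ok = fgets(comm, sizeof(comm), f) != NULL && comm_is_kthreadd(comm);
    fclose(f);
    return ok;
}
```

<p>Note that the sketch also answers 0 when /proc simply isn't mounted, so it cannot tell "inside a container" apart from "early boot": one more reason why a simple, reliable kernel interface for this question would be preferable.</p>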

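<p>The SCM_CGROUPS item says that today's workaround, receiving SCM_CREDENTIALS and then reading /proc/$PID/cgroup, is racy. The following C sketch shows that workaround. For self-containment it uses a socketpair() and sends a datagram to itself (an assumption; a syslog daemon would use its own listening socket), and all helper names are hypothetical. The race sits between recvmsg() returning and the fopen() of /proc/$PID/cgroup: if the sender exits in between and its PID is reused, the lookup quietly reports another process's cgroups.</p>

```c
#define _GNU_SOURCE /* exposes struct ucred in <sys/socket.h> */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: receive one datagram on fd and return the sender
 * PID from its SCM_CREDENTIALS control message, or -1 if none arrived.
 * SO_PASSCRED must be enabled on fd for the kernel to attach it. */
static pid_t recv_sender_pid(int fd) {
    char data[256];
    union {                       /* correctly aligned control buffer */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(struct ucred))];
    } control;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = control.buf, .msg_controllen = sizeof(control.buf),
    };
    if (recvmsg(fd, &msg, 0) < 0)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            struct ucred cred;
            memcpy(&cred, CMSG_DATA(c), sizeof(cred));
            return cred.pid;
        }
    }
    return -1;
}

/* Hypothetical helper: copy /proc/$PID/cgroup into out (NUL-terminated,
 * possibly truncated). RACY by design: the PID may already belong to a
 * different process by the time this runs. Returns 0 or -1. */
static int read_cgroup_of(pid_t pid, char *out, size_t len) {
    char path[64];
    assert(len > 0);
    snprintf(path, sizeof(path), "/proc/%d/cgroup", (int) pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(out, 1, len - 1, f);
    out[n] = '\0';
    fclose(f);
    return 0;
}

/* Demo: send a datagram to ourselves over a socketpair and return the
 * PID the kernel attached as SCM_CREDENTIALS, or -1 on error. */
static pid_t demo_own_pid_via_scm_credentials(void) {
    int sv[2], on = 1;
    pid_t pid = -1;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    /* Ask the kernel to attach sender credentials on the receiving end. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
        send(sv[0], "hello", 5, 0) == 5)
        pid = recv_sender_pid(sv[1]);
    close(sv[0]);
    close(sv[1]);
    return pid;
}
```

<p>In a real service, recv_sender_pid() and read_cgroup_of() would run back to back for every incoming message. The receiver cannot close the window between the two calls on its own, because the PID is only a name that the kernel may hand out again; that is why the mail asks for the kernel to attach the cgroup membership to the message itself.</p>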
]]></description>
</item>

</channel>
</rss>
