Posted on Wed 24 October 2012

systemd for Administrators, Part XVIII

Hot on the heels of the previous story, here's now the eighteenth installment of my ongoing series on systemd for Administrators:

Managing Resources

An important facet of modern computing is resource management: if you run more than one program on a single machine, you want to divide the available resources among them according to particular policies. This is particularly crucial on smaller, embedded or mobile systems where scarce resources are the main constraint, but equally so for large installations such as cloud setups, where resources are plentiful but the number of programs/services/containers on a single node is drastically higher.

Traditionally, on Linux only one policy was really available: all processes got about the same CPU time, or IO bandwidth, modulated a bit via the process nice value. This approach is very simple and covered the various uses for Linux quite well for a long time. However, it has drawbacks: not all processes deserve equal treatment, and services involving lots of processes (think: Apache with a lot of CGI workers) would this way get more resources than services with very few (think: syslog).

When thinking about service management for systemd, we quickly realized that resource management must be a core part of it. In a modern world -- regardless if server or embedded -- controlling the CPU, Memory, and IO resources of the various services cannot be an afterthought, but must be built in as first-class service settings. And it must be per-service and not per-process, as the traditional nice values or POSIX Resource Limits were.

In this story I want to shed some light on what you can do to enforce resource policies on systemd services. Resource Management in one way or another has been available in systemd for a while already, so it's really time we introduce this to a broader audience.

In an earlier blog post I highlighted the difference between Linux Control Groups (cgroups) as a labelled, hierarchical grouping mechanism, and Linux cgroups as a resource controlling subsystem. While systemd requires the former, the latter is optional. And this optional latter part is now what we can make use of to manage per-service resources. (At this point, it's probably a good idea to read up on cgroups before reading on, to get at least a basic idea of what they are and what they accomplish. Even though the explanations below will be pretty high-level, it all makes a lot more sense if you grok the background a bit.)

The main Linux cgroup controllers for resource management are cpu, memory and blkio. To make use of these, they need to be enabled in the kernel, which many distributions (including Fedora) do. systemd exposes a couple of high-level service settings to make use of these controllers without requiring too much knowledge of the gory kernel details.

Managing CPU

As a nice default, if the cpu controller is enabled in the kernel, systemd will create a cgroup for each service when starting it. Without any further configuration this already has one nice effect: on a systemd system every system service will get an even amount of CPU, regardless of how many processes it consists of. Or in other words: on your web server MySQL will get roughly the same amount of CPU as Apache, even if the latter consists of 1000 CGI script processes, but the former only of a few worker tasks. (This behavior can be turned off, see DefaultControllers= in /etc/systemd/system.conf.)

On top of this default, it is possible to explicitly configure the CPU shares a service gets with the CPUShares= setting. The default value is 1024. If you increase this number you'll assign more CPU to a service than to an unaltered one at 1024; if you decrease it, less.
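To get a feeling for what these numbers mean, here's a minimal sketch in Python. It assumes a simplified model in which all services are busy at the same time and sit next to each other in the cpu hierarchy, so that each one's portion of the CPU is its shares value divided by the sum of all shares. The function name cpu_fractions and the service names are made up for illustration, and 1500 is just an example of a raised value.

```python
def cpu_fractions(shares):
    """Return each service's fraction of CPU time under full contention.

    Simplified model: fraction = own shares / sum of all sibling shares.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one service needs a positive share value")
    return {name: value / total for name, value in shares.items()}


# Two services at the default of 1024 split the CPU evenly.
print(cpu_fractions({"httpd.service": 1024, "mysqld.service": 1024}))

# Raising one service to 1500 shifts the split in its favour.
print(cpu_fractions({"httpd.service": 1500, "mysqld.service": 1024}))
```

With the second set of inputs httpd.service ends up with roughly 59% of the CPU when both services compete for it. Shares only matter under contention: on an otherwise idle machine a service may use more than its fraction.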

Let's see in more detail how we can make use of this. Let's say we want to assign Apache 1500 CPU shares instead of the default of 1024. For that, let's create a new administrator service file for Apache in /etc/systemd/system/httpd.service. It overrides the vendor-supplied one in /usr/lib/systemd/system/httpd.service, but pulls in its contents and changes only the CPUShares= parameter:

.include /usr/lib/systemd/system/httpd.service

[Service]
CPUShares=1500

The first line will pull in the vendor service file. Now, let's reload systemd's configuration and restart Apache so that the new service file is taken into account:

systemctl daemon-reload
systemctl restart httpd.service

And yeah, that's already it, you are done!

(Note that setting CPUShares= in a unit file will cause the specific service to get its own cgroup in the cpu hierarchy, even if cpu is not included in DefaultControllers=.)
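If you want to double-check that the new value arrived in the kernel, you can read the cpu.shares attribute of the service's cgroup. Here's a small sketch for that. Note that the directory layout used here (a system/ directory below /sys/fs/cgroup/cpu, with one sub-directory per service) is an assumption and may differ on your system, and that read_cpu_shares is a hypothetical helper name, not something systemd ships.

```python
from pathlib import Path


def read_cpu_shares(service, cpu_root="/sys/fs/cgroup/cpu", parent="system"):
    """Read the cpu.shares value of a service's cgroup.

    cpu_root and parent describe an ASSUMED hierarchy layout; adjust them
    to wherever the cpu controller is mounted on your machine.
    """
    path = Path(cpu_root) / parent / service / "cpu.shares"
    return int(path.read_text().strip())
```

Calling read_cpu_shares("httpd.service") would then be expected to return 1500 for the service file shown above, provided the assumed path matches your system.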

Analyzing Resource usage

Of course, changing resource assignments without actually understanding the resource usage of the services in question is like flying blind. To help you understand the resource usage of all services, we created the tool systemd-cgtop, which will enumerate all cgroups of the system, determine their resource usage (CPU, Memory, and IO) and present them in a top-like fashion. Building on the fact that systemd services are managed in cgroups, this tool can hence present to you for services what top shows you for processes.

Unfortunately, by default cgtop will only be able to chart CPU usage per-service for you; IO and Memory are only tracked as totals for the entire machine. The reason for this is simply that by default there are no per-service cgroups in the blkio and memory controller hierarchies, but those are what we need to determine the resource usage. The best way to get this data for all services is to simply add the memory and blkio controllers to the aforementioned DefaultControllers= setting in system.conf.
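To see for yourself in which hierarchies a service has its own cgroup, you can look at /proc/PID/cgroup of one of its processes. Each line there has the form "ID:controllers:path". The following sketch parses that format; the sample text is made up for illustration and only mimics what such a file might look like for an Apache process with the default settings.

```python
def parse_proc_cgroup(text):
    """Map each controller name to the cgroup path it is attached to.

    Expects the content of /proc/<pid>/cgroup, one "ID:controllers:path"
    entry per line, with co-mounted controllers separated by commas.
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            result[controller] = path
    return result


# Hypothetical sample: a per-service group in cpu, but not in memory or blkio.
SAMPLE = """\
4:blkio:/
3:memory:/
2:cpu,cpuacct:/system/httpd.service
1:name=systemd:/system/httpd.service
"""

for controller, path in sorted(parse_proc_cgroup(SAMPLE).items()):
    print(controller, path)
```

A path of "/" means the process lives in the root group of that hierarchy, which is exactly the case where per-service numbers cannot be determined.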

Managing Memory

To enforce limits on memory, systemd provides the MemoryLimit= and MemorySoftLimit= settings for services. They apply to the combined memory of all of a service's processes. Both take the total memory limit for the service as a size in bytes, and understand the usual K, M, G, T suffixes for Kilobyte, Megabyte, Gigabyte, Terabyte (to the base of 1024).

.include /usr/lib/systemd/system/httpd.service

[Service]
# example value
MemoryLimit=1G

(Analogous to CPUShares= above, setting this option will cause the service to get its own cgroup in the memory cgroup hierarchy.)
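If you are unsure what a given size works out to in bytes, the suffix rule described above is easy to sketch. This is only an illustration of the base-1024 arithmetic, not systemd's actual parser; parse_size is a hypothetical name, and details such as lowercase suffixes or fractional values are deliberately left out.

```python
# Multipliers for the K, M, G, T suffixes, to the base of 1024.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value):
    """Convert a size such as "512M" or "4096" into a number of bytes."""
    value = value.strip()
    if not value:
        raise ValueError("empty size")
    if value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


for example in ("4096", "512M", "1G"):
    print(example, parse_size(example))
```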

Managing Block IO

To control block IO, multiple settings are available. First of all, BlockIOWeight= may be used to assign an IO weight to a specific service. In behaviour, the weight concept is not unlike the shares concept of CPU resource control (see above). However, the default weight is 1000, and the valid range is from 10 to 1000:

.include /usr/lib/systemd/system/httpd.service

[Service]
# example value
BlockIOWeight=500

Optionally, per-device weights can be specified:

.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/dev/disk/by-id/ata-SAMSUNG_MMCRE28G8MXP-0VBL1_DC06K01009SE009B5252 750

Instead of specifying an actual device node, you can also specify any path in the file system:

.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/home/lennart 750

If the specified path does not refer to a device node, systemd will determine the block device /home/lennart is on, and assign the bandwidth weight to it.
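In case you wonder which device a given path ends up on, here's a sketch of one way to find out with stat(): for a block device node the device number is taken from the node itself, for any other path from the file system the path lives on. This is not necessarily how systemd does the lookup internally, and backing_device is a hypothetical helper name.

```python
import os
import stat


def backing_device(path):
    """Return the (major, minor) device number relevant for a path."""
    info = os.stat(path)
    if stat.S_ISBLK(info.st_mode):
        # The path is a block device node: use the device it represents.
        number = info.st_rdev
    else:
        # Any other path: use the device of the file system it is stored on.
        number = info.st_dev
    return os.major(number), os.minor(number)


major, minor = backing_device("/")
print(f"/ is backed by device {major}:{minor}")
```

Note that for file systems without a real block device behind them (tmpfs, for example) the number returned does not refer to an actual disk.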

You can even add per-device and normal lines at the same time, which will set the per-device weight for the device, and the other value as default for everything else.
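The combination of a default weight and per-device weights can be sketched as follows. The helper names are made up, and the sketch simply compares path strings, while the real thing works on the devices behind the paths. It also assumes that paths contain no whitespace.

```python
def parse_weights(values):
    """Split BlockIOWeight= values into a default and per-path weights.

    A value is either a bare weight ("500") or a path followed by a
    weight ("/home/lennart 750"). Weights must be within 10..1000.
    """
    default = None
    per_path = {}
    for value in values:
        parts = value.split()
        if len(parts) == 1:
            path, weight = None, int(parts[0])
        elif len(parts) == 2:
            path, weight = parts[0], int(parts[1])
        else:
            raise ValueError(f"unexpected value: {value!r}")
        if not 10 <= weight <= 1000:
            raise ValueError(f"weight out of range: {weight}")
        if path is None:
            default = weight
        else:
            per_path[path] = weight
    return default, per_path


def effective_weight(path, default, per_path):
    """Pick the per-path weight if one was given, else the default."""
    return per_path.get(path, default)


default, per_path = parse_weights(["500", "/home/lennart 750"])
print(effective_weight("/home/lennart", default, per_path))
print(effective_weight("/var/log", default, per_path))
```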

Alternatively, one may control explicit bandwidth limits with the BlockIOReadBandwidth= and BlockIOWriteBandwidth= settings. These settings take a pair of device node and bandwidth rate (in bytes per second) or of a file path and bandwidth rate:

.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOReadBandwidth=/var/log 5M

This sets the maximum read bandwidth on the block device backing /var/log to 5 megabytes per second.

(Analogous to CPUShares= and MemoryLimit=, using any of these three settings will result in the service getting its own cgroup in the blkio hierarchy.)
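On the kernel side, the blkio controller takes throttling rules of the form "major:minor bytes_per_second" (for reads in the blkio.throttle.read_bps_device attribute). The sketch below only shows how such a rule string could be put together for the 5M example, assuming the same base-1024 suffixes as for memory sizes. The device number 8:0 is purely hypothetical, and throttle_rule is a made-up helper name.

```python
def throttle_rule(major, minor, bytes_per_second):
    """Format a blkio throttle rule in "major:minor rate" form."""
    if bytes_per_second < 0:
        raise ValueError("rate must not be negative")
    return f"{major}:{minor} {bytes_per_second}"


# 5M, read as 5 * 1024 * 1024 bytes per second, on a hypothetical device 8:0.
FIVE_M = 5 * 1024 * 1024
print(throttle_rule(8, 0, FIVE_M))  # prints "8:0 5242880"
```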

Managing Other Resource Parameters

The options described above cover only a small subset of the available controls the various Linux control group controllers expose. We picked these and added high-level options for them since we assumed that these are the most relevant for most folks, and that they really needed a nice interface that can handle units properly and resolve block device names.

In many cases the options explained above might not be sufficient for your use case, but a low-level kernel cgroup setting might help. It is easy to make use of these options from systemd unit files, without having them covered with a high-level setting. For example, sometimes it might be useful to set the swappiness of a service. The kernel makes this controllable via the memory.swappiness cgroup attribute, but systemd does not expose it as a high-level option. Here's how you use it nonetheless, using the low-level ControlGroupAttribute= setting:

.include /usr/lib/systemd/system/httpd.service

[Service]
ControlGroupAttribute=memory.swappiness 70

(Analogous to the other cases, this too causes the service to be added to the memory hierarchy.)
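The reason a single attribute name is enough to pick the right hierarchy is that cgroup attributes are prefixed with the name of their controller: memory.swappiness belongs to the memory controller. Here's a sketch that splits such a setting and works out which file it would correspond to. As before, the directory layout below /sys/fs/cgroup is an assumption, and both function names are hypothetical.

```python
from pathlib import Path


def parse_attribute(value):
    """Split "memory.swappiness 70" into ("memory.swappiness", "70")."""
    attribute, setting = value.split(None, 1)
    return attribute, setting.strip()


def attribute_path(service, attribute, root="/sys/fs/cgroup", parent="system"):
    """Guess the file an attribute maps to, under an ASSUMED layout."""
    controller, dot, name = attribute.partition(".")
    if not dot or not name:
        raise ValueError(f"expected controller.name, got {attribute!r}")
    return Path(root) / controller / parent / service / attribute


attribute, setting = parse_attribute("memory.swappiness 70")
print(attribute_path("httpd.service", attribute), "<-", setting)
```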

Later on we might add more high-level controls for the various cgroup attributes. In fact, please ping us if you frequently use one and believe it deserves more focus. We'll consider adding a high-level option for it then. (Even better: send us a patch!)

Disclaimer: note that making use of the various resource controllers does have a runtime impact on the system. Enforcing resource limits comes at a price: if you use them, certain operations get slower. The memory controller especially has (used to have?) a reputation for coming at a performance cost.

For more details on all of this, please have a look at the documentation of the mentioned unit settings, and of the cpu, memory and blkio controllers.

And that's it for now. Of course, this blog story only focussed on the per-service resource settings. On top of this, you can also set the more traditional, well-known per-process resource settings, which will then be inherited by the various subprocesses, but always only be enforced per-process. More specifically that's IOSchedulingClass=, IOSchedulingPriority=, CPUSchedulingPolicy=, CPUSchedulingPriority=, CPUAffinity=, LimitCPU= and related. These do not make use of cgroup controllers and have a much lower performance cost. We might cover those in a later article in more detail.
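To illustrate what "per-process" means here: the kernel keeps things like the nice value, the CPU affinity mask and the CPU time resource limit for each process individually, and any process can query its own values. A small sketch (Linux only, and per_process_settings is a hypothetical name):

```python
import os
import resource


def per_process_settings():
    """Collect a few per-process scheduling and limit values of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    return {
        # Nice value of the calling process (0 means "this process").
        "nice": os.getpriority(os.PRIO_PROCESS, 0),
        # CPUs this process is allowed to run on.
        "cpu_affinity": sorted(os.sched_getaffinity(0)),
        # Soft and hard CPU time limit in seconds (RLIM_INFINITY if unset).
        "cpu_time_limit": (soft, hard),
    }


print(per_process_settings())
```

A child process starts out with a copy of these values, but from then on they are tracked and enforced for each process on its own, not for the service as a whole.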

© Lennart Poettering.