Fight for resources, part 1: Basics of Cgroups

    Computers are hardware. And today we are, in a sense, back at the starting point: you will now rarely find a physical host that performs one single task. Even if only one application is running on a server, it most likely consists of several processes, containers, or even virtual machines (VMs), and they all run on the same hardware. Red Hat Enterprise Linux 7 does a good job of allocating system resources in such situations, but by default it behaves like a kind grandmother sharing home-made cake among her grandchildren: "Everybody gets an equal piece."

    In theory, the "divide equally" principle is, of course, beautiful, but in practice some processes, containers, or VMs are more important than others and therefore should receive more.

    Linux has long had resource management tools (nice, ulimit, and so on), but with the arrival of Red Hat Enterprise Linux 7 and systemd we finally have a powerful set of such tools built into the OS itself. The point is that a key component of systemd is a ready-made, pre-configured set of cgroups that is fully integrated at the OS level.

    So what exactly are these cgroups, and what do they have to do with resource management and performance?

    Kernel level control

    Beginning with version 2.6.24, released in January 2008, the Linux kernel includes what was originally designed and built at Google under the name "process containers" and became known in Linux as "control groups", or cgroups for short. In short, cgroups is a kernel mechanism that lets you limit, account for, and isolate the consumption of system resources (CPU, memory, disk I/O, network, and so on) by collections of processes. Cgroups can also freeze groups of processes for checkpointing and restarting. Cgroup controllers first appeared in Red Hat Enterprise Linux 6, but there they had to be configured manually. Only with Red Hat Enterprise Linux 7 and systemd do pre-configured cgroups come bundled with the OS.

    All of this works at the kernel level of the OS and therefore guarantees strict control over every process. So it is now extremely difficult for some piece of malware to load the system to the point where it stops responding and hangs. (Although buggy code with direct access to the hardware, such as drivers, is of course still capable of that.) At the same time, Red Hat Enterprise Linux 7 provides an interface for interacting with cgroups, and almost all work with them is done through systemd commands.

    Your own piece of the cake

    In the diagram below, resembling a sliced cake, there are three cgroups that are present by default on a Red Hat Enterprise Linux 7 server: System, User, and Machine. Each of these groups is called a "slice". As the figure shows, each slice can have child slices. And, as with the cake, all the slices together add up to 100% of the corresponding resource.

    Now let's look at several cgroup concepts using CPU resources as an example.

    The figure above shows that CPU time is divided equally between the three top-level slices (System, User, and Machine). But this happens only under load: if a process from the User slice asks for 100% of the CPU and nobody else needs the CPU at that moment, it will get all 100% of the CPU time.

    Each of the three top-level slices is intended for its own type of workload, and child slices are carved out within each parent slice:

    • System - daemons and services.
    • User - user sessions. Each user gets a child slice, and all sessions with the same UID "live" in the same slice, so clever users cannot grab more resources than they are entitled to.
    • Machine - virtual machines, such as KVM guests.

    In addition, the concept of a "share" is used to control resource consumption. A share is a relative numeric value; it is meaningful only in comparison with the other shares in the same cgroup. By default, every slice has a share of 1024. In the System slice, httpd, sshd, crond, and gdm each have a CPU share of 1024, and the System, User, and Machine slices themselves also have shares of 1024. A little confusing? It is easier to picture as a tree:

    • System - 1024
      • httpd - 1024
      • sshd - 1024
      • crond - 1024
      • gdm - 1024
    • User - 1024
      • bash (mrichter) - 1024
      • bash (dorf) - 1024
    • Machine - 1024
      • testvm - 1024

    In this list, we have several running daemons, a couple of users, and one virtual machine. Now imagine that they all simultaneously request all the CPU time they can get.


    • The System slice gets 33.333% of the CPU time and divides it equally between four daemons, which gives each of them about 8.33% of the CPU resources.
    • The User slice gets 33.333% of the CPU time and divides it between two users, each of whom gets about 16.67% of the CPU resources. If the user mrichter logs out or stops all of his running processes, the full 33.333% becomes available to the user dorf.
    • The Machine slice gets 33.333% of the CPU time. If you shut the VM down or put it into idle mode, the System and User slices will each receive approximately 50% of the CPU resources, which is then divided between their child slices.
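    The arithmetic above can be sketched in a few lines of Python. The slice tree and the default share of 1024 come from the article; the `allocate` helper itself is purely illustrative and is not a real systemd API:

    ```python
    # Toy model of how cgroups divide CPU time among siblings
    # in proportion to their shares (default share: 1024).

    def allocate(shares, budget=100.0):
        """Split `budget` percent of CPU among sibling slices by share weight."""
        total = sum(shares.values())
        return {name: budget * s / total for name, s in shares.items()}

    # Top-level slices: three equal shares -> three equal thirds.
    top = allocate({"System": 1024, "User": 1024, "Machine": 1024})

    # Child slices split their parent's budget the same way.
    system = allocate({"httpd": 1024, "sshd": 1024, "crond": 1024, "gdm": 1024},
                      budget=top["System"])
    user = allocate({"mrichter": 1024, "dorf": 1024}, budget=top["User"])

    print(round(top["System"], 2))    # 33.33
    print(round(system["httpd"], 2))  # 8.33
    print(round(user["dorf"], 2))     # 16.67
    ```

    Note that doubling one daemon's share (say, httpd to 2048) would shift the split within the System slice only; the other top-level slices would be unaffected.
    
    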

    In addition, for any daemon, user, or virtual machine you can set not only relative but also absolute limits on CPU time consumption, and not just for one but for several CPUs. For example, the slice of the user mrichter has a CPUQuota property. If you set it to 20%, mrichter will under no circumstances receive more than 20% of the resources of one CPU. On multi-core servers, CPUQuota can exceed 100% so that a slice can use the resources of more than one processor. For example, with CPUQuota=200% a slice can fully utilize two processor cores. It is important to understand that CPUQuota does not reserve anything; in other words, it does not guarantee a given percentage of CPU time under any system load. It is only an upper limit.
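    To make the quota semantics concrete, here is a toy model (again, not a systemd API) showing that CPUQuota is a ceiling, never a reservation. We express CPU usage in percent, where 100.0 means one full core:

    ```python
    def effective_cpu(requested, quota):
        """CPUQuota caps usage but never guarantees it.
        100.0 == one full core, 200.0 == two full cores."""
        return min(requested, quota)

    print(effective_cpu(requested=80.0, quota=20.0))    # 20.0  - capped at the quota
    print(effective_cpu(requested=5.0, quota=20.0))     # 5.0   - quota is not a reservation
    print(effective_cpu(requested=250.0, quota=200.0))  # 200.0 - at most two cores
    ```
    
    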

    Cranking it up to full!

    How do you change slice settings?

    Each slice has configurable properties. And since this is Linux, we can set them either manually in configuration files or from the command line.

    In the second case, the systemctl set-property command is used. This is what appears on the screen if you type this command, add a slice name at the end (in our case, User), and then press the Tab key to display the available options:

    Not all of the properties in this screenshot are cgroup settings. We are mainly interested in those that start with Block, CPU, and Memory.

    If you prefer config files to the command line (for example, for automated deployment across several hosts), then you will be working with files in the /etc/systemd/system directory. These files are created automatically when you set properties with the systemctl command, but they can also be written in a text editor, stamped out with Puppet, or even generated by scripts on the fly.
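    As a sketch, a drop-in file for the mrichter example above might look like this (the file name and the UID 1000 are assumptions for illustration; adjust them for your system):

    ```ini
    # /etc/systemd/system/user-1000.slice.d/20-cpu.conf
    # Cap the slice at 20% of one CPU; keep the default relative share.
    [Slice]
    CPUQuota=20%
    CPUShares=1024
    ```

    systemd picks the file up after systemctl daemon-reload. The one-liner systemctl set-property user-1000.slice CPUQuota=20% creates an equivalent drop-in automatically.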

    So, the basic concepts of cgroups should now be clear. Next time we will walk through some scenarios and see how changing certain properties affects performance.

    And literally tomorrow we invite everyone to Red Hat Forum Russia 2018, where you will be able to ask questions directly to Red Hat engineers.

    Other posts on cgroups from our series “Fight for resources” are available at the links:
