Controlling and limiting system resources is a big deal for sysadmins, especially on heavily loaded machines. When it comes to limiting or prioritization, tools like nice and ulimit come to mind; that was before cgroups.

CGroups (Control Groups) is a Linux kernel feature, written by Paul Menage and Rohit Seth in 2006, that limits and allocates system resources among groups of processes.

Cgroups are considered the base technology behind Linux Containers (LXC), Google's lmctfy ("Let Me Contain That For You"), a container solution that uses only cgroups to implement OS-level virtualization, and, of course, Docker.

In cgroups, system resources are managed by what are called subsystems (or resource controllers). Each subsystem is responsible for a single kind of resource. You can list the available subsystems with the lssubsys command, which is provided by the cgroup-bin package:

# lssubsys -a
cpuset
cpuacct
memory
devices
freezer
blkio
perf_event
hugetlb
cpu
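If the cgroup-bin package is not installed, a rough equivalent of lssubsys -a can be sketched by reading /proc/cgroups, which lists every subsystem the kernel knows about together with its hierarchy ID, number of cgroups, and an enabled flag. The function name and the optional file argument (added so the sketch can be tried on a sample file) are assumptions, not part of any package.

```shell
# Hypothetical helper: print the names of the enabled cgroup subsystems.
# /proc/cgroups columns: subsys_name  hierarchy  num_cgroups  enabled
list_subsystems() {
    # Skip the header line (it starts with '#') and keep rows whose
    # "enabled" column is 1. $1 optionally overrides the input file.
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```

Unlike lssubsys -a, this skips subsystems that are compiled in but disabled (for example via the cgroup_disable= boot parameter).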

Using CGroups

Cgroups are exposed through a hierarchical virtual filesystem. Each hierarchy has a set of subsystems attached to it, and each subsystem is controlled by a set of parameter files. Each cgroup can also contain nested cgroups.

For example, let's assume that /cgroup is a cgroup filesystem mounted with the cpu and memory subsystems; these are some of the files inside this directory:

# ls /cgroup/
cgroup.clone_children  memory.kmem.limit_in_bytes  memory.numa_stat
cgroup.event_control  memory.kmem.max_usage_in_bytes  memory.oom_control
cgroup.procs  memory.kmem.slabinfo  memory.pressure_level
.......

These files control the CPU and memory behavior of the processes attached to this cgroup.

Cgroups can be used in several ways: manually through the cgroup filesystem, with the commands and tools in the cgroup-bin package (which use the libcgroup API), or indirectly through LXC or Docker.

To create a cgroup hierarchy manually, mount the cgroup filesystem with the desired subsystems:

# mkdir /cgroup
# mount -t cgroup -o cpu,memory binbash /cgroup

Now the /cgroup directory is automatically populated with the parameter files responsible for controlling the behavior of the cpu and memory subsystems.
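To check where a given subsystem's hierarchy is mounted, one option is to scan /proc/mounts: cgroup v1 mounts appear there with the filesystem type cgroup and the subsystem names among the comma-separated mount options. This is a minimal sketch; the function name and the optional mounts-file argument are hypothetical.

```shell
# Hypothetical helper: print the mount point(s) of the hierarchy that has
# subsystem $1 attached. $2 optionally overrides /proc/mounts.
find_cgroup_mount() {
    awk -v s="$1" '
        $3 == "cgroup" {
            # Field 4 holds the mount options, e.g. rw,relatime,cpu,memory.
            # Compare whole options so "cpu" does not match "cpuset".
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == s) { print $2; break }
        }' "${2:-/proc/mounts}"
}
```

After the mount above, find_cgroup_mount memory would be expected to print /cgroup.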

A common recommendation is to mount a tmpfs that holds the different cgroup hierarchies, each in its own directory; this is just a matter of organization:

# mkdir /cgroups
# mount -t tmpfs rootcg /cgroups/
# mkdir /cgroups/memorysubs
# mount -t cgroup -o memory memorysubs /cgroups/memorysubs/

Let's play with some subsystems.

Cpuset Subsystem

This subsystem determines which CPU cores the group's processes may run on. To bind a cgroup to certain cores, write the core number or range of cores into the cpuset.cpus file.

Note that the files inside each cgroup are pseudo-files, which can't be edited with a regular text editor like vim; instead, echo the desired value and redirect it into the file directly.

# mkdir /cgroups/cpuset
# mount -t cgroup -o cpuset cpusetcg /cgroups/cpuset/
# mkdir /cgroups/cpuset/cg1
# echo "0-3" > /cgroups/cpuset/cg1/cpuset.cpus
# echo 0 > /cgroups/cpuset/cg1/cpuset.mems

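The previous commands bind the first four cores (0 to 3) to the cg1 cgroup. In cgroup v1, cpuset.mems (the allowed memory nodes, 0 on a non-NUMA machine) must also be set before any task can be attached to the group.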
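The cpuset steps can be wrapped in a small helper. This is a sketch under the assumption of a cgroup v1 cpuset hierarchy; the function name and its default memory node of 0 are hypothetical choices. It writes both cpuset.cpus and cpuset.mems, since cgroup v1 refuses to attach tasks while either is empty.

```shell
# Hypothetical helper: create a cpuset cgroup and bind it to CPUs and memory nodes.
make_cpuset() {
    # $1 = cgroup directory, $2 = CPU list (e.g. 0-3),
    # $3 = memory node list (defaults to 0, the only node on non-NUMA machines)
    mkdir -p "$1" || return 1
    echo "$2" > "$1/cpuset.cpus" || return 1
    echo "${3:-0}" > "$1/cpuset.mems"
}
```

For example, make_cpuset /cgroups/cpuset/cg1 0-3 would reproduce the commands above.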

CPU Subsystem

This subsystem controls access to the CPU. Note that there is a difference between capping the actual CPU time a group may use and setting the relative priority (shares) between process groups.

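The CPU time available to real-time tasks (those scheduled with SCHED_FIFO or SCHED_RR) in a cgroup is controlled by cpu.rt_period_us and cpu.rt_runtime_us. For normally scheduled tasks, the equivalent hard cap is set with cpu.cfs_period_us and cpu.cfs_quota_us.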

cpu.rt_period_us sets the length of a period and cpu.rt_runtime_us sets how much of each period the group's real-time tasks may run, both in microseconds. For example, with cpu.rt_runtime_us set to 200000 and cpu.rt_period_us set to 5000000, the real-time tasks in the cgroup can run for at most 0.2 seconds in every 5 seconds. However, misconfiguring the real-time parameters can result in an unstable system and unexpected behavior.
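The 0.2-seconds-every-5-seconds example could be applied with a helper like the one below. This is a sketch that assumes a freshly created group (whose cpu.rt_runtime_us is still 0) and a kernel built with real-time group scheduling, so the cpu.rt_* files exist. The function name is hypothetical. It refuses a runtime larger than the period, which the kernel would reject anyway, and writes the period before the runtime.

```shell
# Hypothetical helper: set real-time bandwidth for a cgroup (values in microseconds).
set_rt_bandwidth() {
    # $1 = cgroup directory, $2 = period (us), $3 = runtime (us)
    if [ "$3" -gt "$2" ]; then
        echo "runtime ($3) must not exceed period ($2)" >&2
        return 1
    fi
    # Period first: on a new group the runtime is 0, so any period is valid.
    echo "$2" > "$1/cpu.rt_period_us" || return 1
    echo "$3" > "$1/cpu.rt_runtime_us"
}
```

For example, set_rt_bandwidth /cgroups/cputest/cgroup1 5000000 200000 corresponds to 0.2 seconds every 5 seconds.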

The priority, or weight, given to a group is controlled by the cpu.shares file of the cpu subsystem. Note that the value in cpu.shares is a ratio relative to the other groups, not an absolute amount, and it only matters when the groups compete for the CPU:

# mkdir /cgroups/cputest
# mount -t cgroup -o cpu cpusubs /cgroups/cputest/
# mkdir /cgroups/cputest/cgroup1
# mkdir /cgroups/cputest/cgroup2
# echo 750 > /cgroups/cputest/cgroup1/cpu.shares
# echo 250 > /cgroups/cputest/cgroup2/cpu.shares

cgroup1 ( cpu.shares = 750 ) ===>  75% cpu time 
cgroup2 ( cpu.shares = 250 ) ===>  25% cpu time
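The percentages come from dividing a group's shares by the sum of the shares of all busy sibling groups: 750 / (750 + 250) = 75%. A minimal sketch of that arithmetic follows; the function name is hypothetical, and it uses integer math, so results are rounded down.

```shell
# Hypothetical helper: expected CPU percentage of a group under full contention.
cpu_share_percent() {
    # $1 = this group's cpu.shares; remaining args = shares of every
    # competing group, including this one.
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    [ "$total" -gt 0 ] || return 1
    echo $((mine * 100 / total))
}
```

For example, cpu_share_percent 750 750 250 prints 75, and three groups left at the default of 1024 shares get 33% each.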

Memory Subsystem

The memory subsystem limits the amount of RAM and swap each cgroup can use. It also collects statistics about each group's memory usage: memory.usage_in_bytes reports the current usage, and memory.stat includes further details such as cache usage.

To limit the amount of user memory for a group, use memory.limit_in_bytes; to limit memory plus swap for a group, use memory.memsw.limit_in_bytes. For more information, see the kernel's memory.txt documentation.

# mkdir /cgroups/memory
# mount -t cgroup -o memory memory /cgroups/memory
# mkdir /cgroups/memory/test1
# echo "256M" > /cgroups/memory/test1/memory.limit_in_bytes
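When setting both limits, memory.memsw.limit_in_bytes may never be lower than memory.limit_in_bytes. The sketch below converts sizes such as 256M to bytes, checks that rule before writing anything, and, assuming a new group whose limits are still unlimited, writes the RAM limit first. Both function names are hypothetical. The memsw file only exists when the kernel has swap accounting enabled.

```shell
# Hypothetical helper: convert a size with an optional k/m/g suffix to bytes.
to_bytes() {
    num=${1%[kKmMgG]}
    case "$num" in
        ''|*[!0-9]*) echo "invalid size: $1" >&2; return 1 ;;
    esac
    # Drop leading zeros so shell arithmetic does not treat the value as octal.
    while [ "${num#0}" != "$num" ] && [ -n "${num#0}" ]; do num=${num#0}; done
    case "$1" in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1048576)) ;;
        *[gG]) echo $((num * 1073741824)) ;;
        *)     echo "$num" ;;
    esac
}

# Hypothetical helper: $1 = cgroup dir, $2 = RAM limit, $3 = optional RAM+swap limit.
set_mem_limits() {
    mem=$(to_bytes "$2") || return 1
    if [ -n "$3" ]; then
        memsw=$(to_bytes "$3") || return 1
        if [ "$memsw" -lt "$mem" ]; then
            echo "memory+swap limit must be >= memory limit" >&2
            return 1
        fi
    fi
    echo "$mem" > "$1/memory.limit_in_bytes" || return 1
    [ -z "$3" ] || echo "$memsw" > "$1/memory.memsw.limit_in_bytes"
}
```

For example, set_mem_limits /cgroups/memory/test1 256M 512M caps the group at 256 MiB of RAM and 512 MiB of RAM plus swap.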

Network Priority Subsystem

Network priority is controlled by the net_prio subsystem. Unlike the subsystems above, it may not be available in the kernel by default; if it is built as a module, load the netprio_cgroup module:

# modprobe netprio_cgroup

The net_prio subsystem controls the priority of network traffic generated by different processes. It provides the net_prio.ifpriomap file, which maps a network interface to a priority; once a process is attached to the group, its traffic on that interface is given that priority:

# mkdir /cgroups/netsubs
# mount -t cgroup -o net_prio netprio /cgroups/netsubs/
# echo "eth0 10" > /cgroups/netsubs/net_prio.ifpriomap
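Each write to net_prio.ifpriomap takes a single "interface priority" pair and updates only that interface's entry, while reading the file lists one pair per line. A small sketch for setting and reading back one entry follows; the function names are hypothetical.

```shell
# Hypothetical helper: map interface $2 to priority $3 in cgroup dir $1.
set_ifprio() {
    echo "$2 $3" > "$1/net_prio.ifpriomap"
}

# Hypothetical helper: print the priority mapped to interface $2, if any.
get_ifprio() {
    awk -v ifc="$2" '$1 == ifc { print $2 }' "$1/net_prio.ifpriomap"
}
```

For example, set_ifprio /cgroups/netsubs eth0 10 matches the command above, and get_ifprio /cgroups/netsubs eth0 would then print 10.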

Attach Processes to CGroups

To attach a process to a cgroup, write its PID into the tasks file in that cgroup's directory. Any children the process forks after that will also belong to the group.

You can also use cgexec, one of the tools shipped with the cgroup-bin package, to start a new process in a group. For example:

# cgexec -g cpu:cgroup1 /bin/sh
#

Alternatively, you can use the manual method:

# sh
# echo $$ > /cgroups/cputest/cgroup1/tasks
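The manual method can be generalized into a rough stand-in for cgexec: a child shell writes its own PID into the tasks file and then execs the command, so the command keeps that PID and everything it forks inherits the group. This is a sketch, not how cgexec itself is implemented, and both function names are hypothetical.

```shell
# Hypothetical helper: attach an existing PID ($2) to cgroup dir $1.
attach_pid() {
    echo "$2" > "$1/tasks"
}

# Hypothetical helper: run a command inside cgroup dir $1.
run_in_cgroup() {
    dir=$1
    shift
    # Inside sh -c: $1 is the cgroup dir, the rest is the command.
    # exec replaces the shell, so the PID written to tasks is the command's PID.
    sh -c 'echo $$ > "$1/tasks" && shift && exec "$@"' sh "$dir" "$@"
}
```

For example, run_in_cgroup /cgroups/cputest/cgroup1 /bin/sh starts a shell whose CPU weight comes from cgroup1.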

CGroups with Docker

Docker uses cgroups to limit the resources of its containers. Limits can be set when creating a new container with the docker run command:

# docker run --cpu-shares=250 --memory=100m -i -t ubuntu /bin/bash

--memory limits the memory and swap usage of the container.

--cpu-shares sets the CPU shares (relative weight) of the container.

To see what happens under the hood, first get the container's ID:

# docker ps
CONTAINER ID  IMAGE         COMMAND     CREATED       STATUS        PORTS  NAMES
f9b804ced1a8  ubuntu:latest "/bin/bash" 4 minutes ago Up 4 minutes         cgrp-test

# cat /sys/fs/cgroup/cpu/docker/f9b804ced1a8..../cpu.shares
250

Docker automatically creates a cgroup called docker inside each subsystem's hierarchy, with a child cgroup for each container, and the container's cpu.shares is set to 250 as specified.
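Because docker ps shows only a short ID while the cgroup directory uses the full one, a helper can match the directory by prefix. This sketch assumes the cgroup v1 layout shown above, root/subsystem/docker/full-id/, as used by Docker's cgroupfs driver; other setups (for example the systemd cgroup driver or cgroup v2) use different paths. The function name is hypothetical, and an ambiguous prefix simply returns the first match.

```shell
# Hypothetical helper: read a cgroup parameter file of a Docker container.
container_cgroup_value() {
    # $1 = cgroup root (e.g. /sys/fs/cgroup), $2 = subsystem (e.g. cpu),
    # $3 = container ID or ID prefix, $4 = parameter file (e.g. cpu.shares)
    for d in "$1/$2/docker/$3"*; do
        if [ -f "$d/$4" ]; then
            cat "$d/$4"
            return 0
        fi
    done
    echo "no cgroup found for container $3" >&2
    return 1
}
```

For example, container_cgroup_value /sys/fs/cgroup cpu f9b804ced1a8 cpu.shares would be expected to print 250 for the container started above.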