Controlling and limiting system resources is a big deal for sysadmins, especially on heavily loaded machines. When limiting or prioritization came up, terms like nice and ulimit used to pop into my head; that was before cgroups.
CGroups (Control Groups) is a Linux kernel feature, written by Paul Menage and Rohit Seth in 2006, that limits and allocates system resources to different groups of processes.
Cgroups are considered the base technology behind Linux containers (LXC), Google’s lmctfy (“Let Me Contain That For You”), a container solution that uses only cgroups to implement OS-level virtualization, and of course Docker.
System resources are managed in cgroups by what are called subsystems (or controllers). Each subsystem is responsible for one and only one resource in the system. You can list the available subsystems with the lssubsys command, which is provided by the cgroup-bin package:
    # lssubsys -a
    cpuset
    cpuacct
    memory
    devices
    freezer
    blkio
    perf_event
    hugetlb
    cpu
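If cgroup-bin is not installed, the kernel lists the same subsystems in /proc/cgroups. The following is a minimal sketch that filters that table for enabled subsystems. Note that enabled_subsystems is a hypothetical helper name, and the sample rows fed to it here are made up for illustration.

```shell
#!/bin/sh
# enabled_subsystems: read /proc/cgroups-style rows on stdin
# (columns: subsys_name hierarchy num_cgroups enabled) and print
# the name of every subsystem whose "enabled" column is 1.
# Hypothetical helper, not part of cgroup-bin.
enabled_subsystems() {
    while read -r name hierarchy num enabled; do
        case $name in
            \#*) continue ;;   # skip the header line
        esac
        if [ "$enabled" = "1" ]; then
            echo "$name"
        fi
    done
}

# Illustrative sample input; prints cpuset and memory, one per line.
printf '#subsys_name\thierarchy\tnum_cgroups\tenabled\ncpuset\t0\t1\t1\nmemory\t0\t1\t1\nnet_prio\t0\t1\t0\n' | enabled_subsystems
```

On a real machine the same function would be fed from the kernel directly: enabled_subsystems < /proc/cgroups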
Cgroups are implemented as a hierarchical virtual filesystem. Each cgroup contains a set of subsystems, and each subsystem is controlled by a set of parameter files. Each cgroup can also contain other, nested cgroups.
For example, let's assume that /cgroup is a cgroup filesystem mounted with the cpu and memory subsystems. These are the files inside that directory:
    # ls /cgroup/
    cgroup.clone_children  memory.kmem.limit_in_bytes      memory.numa_stat
    cgroup.event_control   memory.kmem.max_usage_in_bytes  memory.oom_control
    cgroup.procs           memory.kmem.slabinfo            memory.pressure_level
    .......
As you can see, these files control the cpu and memory behavior of the processes attached to this cgroup.
Cgroups can be used in several ways: manually through the cgroup filesystem, through the commands and tools provided by the cgroup-bin package (which uses the libcgroup API), or through LXC or Docker directly.
To create a cgroup hierarchy manually, mount the cgroup filesystem with the desired subsystems:
    # mkdir /cgroup
    # mount -t cgroup -o cpu,memory binbash /cgroup
Now the /cgroup directory is automatically populated with the parameter files responsible for controlling the behavior of the cpu and memory subsystems.
A common recommendation is to mount a tmpfs filesystem that holds the different cgroup hierarchies; this is purely a matter of organization:
    # mkdir /cgroups
    # mount -t tmpfs rootcg /cgroups/
    # mkdir /cgroups/memorysubs
    # mount -t cgroup -o memory memorysubs /cgroups/memorysubs/
Let's play with some subsystems:
The cpuset subsystem determines which CPU cores the group may use. To bind a cgroup to one or more cores, write the core number or a range of cores into the cpuset.cpus file.
Note that the files inside each cgroup are pseudo-files, which can't be edited with a regular text editor like vim. Instead, we echo the desired values and redirect them into the files directly.
    # mkdir /cgroups/cpuset
    # mount -t cgroup -o cpuset cpusetcg /cgroups/cpuset/
    # mkdir /cgroups/cpuset/cg1
    # echo "0-3" > /cgroups/cpuset/cg1/cpuset.cpus
The previous commands bind the cg1 cgroup to the first four cores (0 through 3). Before any process can be attached to cg1, its cpuset.mems file must be set as well.
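The value written to cpuset.cpus is a comma-separated list of core numbers and ranges. As a rough illustration of how such a list reads, here is a small sketch that expands it into individual core numbers. The function expand_cpu_list is a hypothetical helper written for this post, not a cgroup tool.

```shell
#!/bin/sh
# expand_cpu_list: expand a cpuset-style list such as "0-3,6"
# into individual core numbers separated by spaces.
# Hypothetical helper for illustration only.
expand_cpu_list() {
    list=$1
    out=""
    old_ifs=$IFS
    IFS=,
    for part in $list; do
        case $part in
            *-*)
                # A range "first-last": emit every core in it.
                first=${part%-*}
                last=${part#*-}
                i=$first
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single core number.
                out="$out $part"
                ;;
        esac
    done
    IFS=$old_ifs
    # Drop the leading space before printing.
    echo "${out# }"
}

expand_cpu_list "0-3"     # prints: 0 1 2 3
expand_cpu_list "0-1,4"   # prints: 0 1 4
```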
The cpu subsystem controls access to CPU time. Note that there is a difference between the actual CPU time the processes in a group get and the priority shares among process groups.
The CPU time available to real-time tasks in each cgroup is controlled by cpu.rt_period_us and cpu.rt_runtime_us.
cpu.rt_period_us sets the length of a period, in microseconds, and cpu.rt_runtime_us sets how long within each period the processes in the cgroup may run. For example, the processes in a cgroup may run continuously for 0.2 seconds, and this repeats every 5 seconds. However, messing with the real-time CPU parameters can result in an unstable system and unexpected behavior.
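Both files take values in microseconds, so the example above (0.2 seconds out of every 5 seconds) corresponds to a runtime of 200000 and a period of 5000000. The sketch below only does that arithmetic and writes nothing to the cgroup filesystem; rt_percent is a hypothetical helper name.

```shell
#!/bin/sh
# rt_percent RUNTIME_US PERIOD_US
# Print the whole-number percentage of each period covered by the
# runtime. Pure arithmetic; hypothetical helper for illustration.
rt_percent() {
    echo $(( $1 * 100 / $2 ))
}

# 0.2 s of runtime in every 5 s period, both in microseconds.
rt_percent 200000 5000000   # prints: 4
```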
The priority or weight given to a group is controlled by the cpu.shares file in the cpu subsystem. Note that the value in cpu.shares is a ratio relative to the other groups, not an absolute value:
    # mkdir /cgroups/cputest
    # mount -t cgroup -o cpu cpusubs /cgroups/cputest/
    # mkdir /cgroups/cputest/cgroup1
    # mkdir /cgroups/cputest/cgroup2
    # echo 750 > /cgroups/cputest/cgroup1/cpu.shares
    # echo 250 > /cgroups/cputest/cgroup2/cpu.shares

    cgroup1 ( cpu.shares = 750 ) ===> 75% cpu time
    cgroup2 ( cpu.shares = 250 ) ===> 25% cpu time
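Because cpu.shares is a ratio, a group's percentage depends on the shares of every sibling group competing for the CPU at the same time. A minimal sketch of that calculation follows; share_percent is a hypothetical helper, and the third call assumes an extra sibling group with 500 shares that is not part of the example above.

```shell
#!/bin/sh
# share_percent MINE ALL_SHARES...
# MINE is one group's cpu.shares value; the remaining arguments are
# the shares of all competing sibling groups, including MINE itself.
# Prints the whole-number percentage of CPU time the group gets when
# every group is busy. Hypothetical helper for illustration.
share_percent() {
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    echo $((mine * 100 / total))
}

share_percent 750 750 250       # prints: 75
share_percent 250 750 250       # prints: 25
# Assumed third group with 500 shares: the same 750 now yields less.
share_percent 750 750 250 500   # prints: 50
```

The percentages only apply while the groups are actually competing; a group alone on an idle machine can use the whole CPU regardless of its shares.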
The memory subsystem limits the amount of RAM and swap each cgroup uses. It can also be used to collect statistics about each group's memory usage through the memory.usage_in_bytes file, and information about cache usage through the memory.stat file.
To limit the amount of user memory for a group, use memory.limit_in_bytes. You can also use memory.memsw.limit_in_bytes to limit the combined memory and swap for a specific group. For more information, see the memory.txt documentation.
    # mount -t cgroup -o memory memory /cgroups/memory
    # mkdir /cgroups/memory/test1
    # echo "256M" > /cgroups/memory/test1/memory.limit_in_bytes
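The limit file accepts a K, M, or G suffix (powers of 1024), but reading it back shows the value in plain bytes. The sketch below performs that conversion so you know what number to expect; to_bytes is a hypothetical helper and does not touch the cgroup filesystem.

```shell
#!/bin/sh
# to_bytes VALUE
# Convert a size such as 256M into bytes, treating K, M and G
# (upper or lower case) as powers of 1024. A value without a suffix
# is printed unchanged. Hypothetical helper for illustration.
to_bytes() {
    value=$1
    number=${value%[KkMmGg]}   # strip one trailing suffix, if any
    case $value in
        *[Kk]) echo $((number * 1024)) ;;
        *[Mm]) echo $((number * 1024 * 1024)) ;;
        *[Gg]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;
    esac
}

to_bytes 256M   # prints: 268435456
```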
Network Priority Subsystem
Network priority is controlled by the net_prio subsystem. Unlike the other subsystems, it is not enabled in the kernel by default; to use it, load the netprio_cgroup module:
    # modprobe netprio_cgroup
The net_prio subsystem controls the priority of network traffic generated by different processes. It provides the net_prio.ifpriomap file, which maps a network interface to a specific priority. When a process is attached to the group, its network traffic on that interface is governed by that priority:
    # mount -t cgroup -o net_prio netprio /cgroups/netsubs/
    # echo "eth0 10" > /cgroups/netsubs/net_prio.ifpriomap
Attach Processes to CGroups
To attach a process to a cgroup, add its process ID to the tasks file that is generated under each cgroup directory. After the PID is added to the tasks file, every child of that process is attached to the group as well.
You can also use cgexec to start a new process in a group; cgexec is one of the tools shipped with the cgroup-bin package. For example:
    # cgexec -g cpu:cpus1 /bin/sh
    #
Alternatively, you can use the manual method:
    # sh
    # echo $$ > /cgroups/cputest/cpus1/tasks
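To check that a process really landed in the intended group, you can read /proc/PID/cgroup, where each line has the form hierarchy-ID:controller-list:path. The following is a minimal sketch that extracts the path for one controller; cgroup_of is a hypothetical helper, and the sample lines are made up to match the cpus1 example.

```shell
#!/bin/sh
# cgroup_of CONTROLLER
# Read /proc/PID/cgroup-style lines on stdin and print the cgroup
# path of the hierarchy that CONTROLLER is attached to.
# Hypothetical helper for illustration.
cgroup_of() {
    wanted=$1
    while IFS=: read -r id controllers path; do
        # Wrap the list in commas so "cpu" does not match "cpuacct".
        case ",$controllers," in
            *",$wanted,"*) echo "$path" ;;
        esac
    done
}

# Illustrative sample input; prints: /cpus1
printf '%s\n' '3:memory:/' '2:cpu,cpuacct:/cpus1' | cgroup_of cpu
```

For the current shell, the same function would be used as: cgroup_of cpu < /proc/$$/cgroup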
CGroups with Docker
Docker uses cgroups to limit the resources available to its containers. Limits can be set when creating a new container with the docker run command:
    # docker run --cpu-shares=250 --memory=100m -i -t ubuntu /bin/bash
--memory limits the memory and swap usage for the container.
--cpu-shares specifies the CPU shares for this container.
We can get the container’s ID to see what happened under the hood:
    # docker ps
    CONTAINER ID   IMAGE           COMMAND       CREATED         STATUS         PORTS   NAMES
    f9b804ced1a8   ubuntu:latest   "/bin/bash"   4 minutes ago   Up 4 minutes           cgrp-test
    # cat /sys/fs/cgroup/cpu/docker/f9b804ced1a8..../cpu.shares
    250
We can see that a new cgroup called docker is created automatically inside each subsystem, and that cpu.shares is set to 250 as we specified.