The /bin/bash Theory


K3S SELinux

In the previous post I talked about how K3S manages bootstrap data in multi-server clusters and how it keeps that data secure throughout the start and reconciliation process. Today's post is a little different: it sheds some light on a fairly complex subject, namely how k3s starts on systems that support SELinux and what kind of policy k3s installs.

First I would like to thank Jacob Blain for guiding me through the k3s-selinux code, and helping me understand how the policy works with K3S and how SELinux works in general.

Container SELinux Policy

container-selinux is the custom SELinux policy for container runtimes. This post and Dan Walsh's other posts explain in detail how the container SELinux policy works; I highly recommend reading them for more information about container labeling and SELinux.

Container engines run with the type container_runtime_t; for a process, SELinux calls this type its domain. SELinux controls changes of process type through what it calls transitions. For example, K3S runs containerd as the container engine, and when containerd starts a pod, the pod's process transitions to the type container_t:

# ps auxZ | grep containerd
system_u:system_r:container_runtime_t:s0 root 5721 11.3  7.9 903080 160848 ?     Sl   21:47   0:11 containerd

When you inspect the traefik pod that K3S creates by default, you can see that the pod's process has transitioned to the container_t type:

# ps auxZ | grep traefik
system_u:system_r:container_t:s0:c245,c621 65532 7879 0.7  3.8 798060 78384 ?    Ssl  21:48   0:00 traefik...

By default, container_t is allowed to read and execute files with certain labels, but it can write only to files labeled container_file_t, which is the label of the container's writable layer. For example, when you create a file inside the pod, in the container layer, the file gets the label container_file_t in the active snapshot that is created on top of the final layer of the image's content:

# crictl -r /var/run/k3s/containerd/containerd.sock exec -it 9a3f80a344a31 touch test-file

# ls /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/48/fs/test-file -Zl
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0:c492,c760 0 Mar 14 22:17 /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/48/fs/test-file
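The labels in the ps and ls output above share one layout: user, role, type, sensitivity level, and an optional list of MCS categories. As a minimal sketch, assuming only the simple form seen in this post (no sensitivity ranges), the helper below splits a label into those fields. The names parse_context and SELinuxContext are made up for illustration and are not part of any SELinux library.

```python
from typing import NamedTuple


class SELinuxContext(NamedTuple):
    user: str
    role: str
    type: str
    sensitivity: str
    categories: tuple


def parse_context(label: str) -> SELinuxContext:
    """Split a label of the form user:role:type:sensitivity[:categories]."""
    parts = label.split(":")
    if len(parts) < 4:
        raise ValueError(f"not a full SELinux context: {label!r}")
    user, role, type_, sensitivity = parts[:4]
    # The optional fifth field holds the MCS categories, e.g. "c245,c621".
    categories = tuple(parts[4].split(",")) if len(parts) > 4 else ()
    return SELinuxContext(user, role, type_, sensitivity, categories)


# Labels copied from the traefik process and the test file shown above.
process = parse_context("system_u:system_r:container_t:s0:c245,c621")
new_file = parse_context("system_u:object_r:container_file_t:s0:c492,c760")
print(process.type, process.categories)
print(new_file.type, new_file.categories)
```

The type field is the part that type enforcement rules refer to, which is why the rest of this post talks almost exclusively about types such as container_t and container_file_t.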

The container domain includes several other container types as well, which you can see by running the following command:

# seinfo -acontainer_domain -x

Type Attributes: 1
   attribute container_domain;
	container_engine_t
	container_init_t
	container_kvm_t
	container_logreader_t
	container_t
	container_userns_t
	spc_t

These types represent the different ways a container can run. For example, when you create a privileged pod, its process transitions to the type spc_t (Super Privileged Container), which has more permissions than the normal container type container_t.

K3S Installation With SELinux

K3S is installed with the installation script hosted on the k3s site. To install k3s you can run:

curl https://get.k3s.io | sh

The installation script automatically detects the host system and chooses the right method to install the k3s server on it. On RHEL systems the script automatically sets up and installs the k3s-selinux RPM package, unless you opt out by passing the INSTALL_K3S_SKIP_SELINUX_RPM=true environment variable when running the script.

k3s-selinux

The k3s-selinux RPM package contains the custom policy needed to run k3s on SELinux-enabled systems. If you inspect the repo you will find the policy directory, which contains three subdirectories with policies for different systems:

policy
├── centos7
├── centos8
└── microos

The centos7 and centos8 dirs hold the policies for RHEL 7 and 8 systems, and the microos policy is defined for SUSE systems including SLES and SLE Micro.

In this post I will try to explain the centos8 policy, although the three policies are largely the same. The policy consists of three main files:

tree -L 1 policy/centos8 
policy/centos8
├── k3s.fc
├── k3s.if
├── k3s-selinux.spec
├── k3s.te
└── scripts

We can ignore the k3s-selinux.spec file, since it is just the RPM package spec file. The three main files are k3s.te, k3s.if, and k3s.fc.

k3s.te

A .te file is called a Type Enforcement file. This TE file is made up of several code blocks, so let's go through it block by block:

module block

The policy_module macro identifies the module by name and version, and the name must be unique. In k3s.te it looks like this:

policy_module(k3s, 1.0.0)

type definition block

The k3s-selinux policy defines three new file types, as you can see in the snippet:

##### type: k3s_data_t
type k3s_data_t;
files_type(k3s_data_t);

##### type: k3s_lock_t
type k3s_lock_t;
files_lock_file(k3s_lock_t)

##### type: k3s_root_t, attr: k3s_root_domain
k3s_runtime_domain_template(k3s_root)

The three types are:

  • k3s_data_t: This type is intended for the directory /var/lib/rancher/k3s/data and its subdirectories, which hold all the binaries and the containerd config that k3s needs to start:
ls -ldZ /var/lib/rancher/k3s/data/
drwxr-xr-x. 3 root root unconfined_u:object_r:k3s_data_t:s0 106 Nov 25 23:27 /var/lib/rancher/k3s/data/
  • k3s_lock_t: This type is intended only for the lock files created by k3s:
ls -lZ /var/lib/rancher/k3s/data/.lock
-rw-------. 1 root root system_u:object_r:k3s_lock_t:s0 0 Nov 25 23:27 /var/lib/rancher/k3s/data/.lock
  • k3s_root_t: This is the most important type definition in the policy and is used for basically all files related to k3s. It is defined by the line:
k3s_runtime_domain_template(k3s_root)

This line calls a template that defines both the k3s_root_t type and the k3s_root_domain attribute. An attribute groups types so that a rule can be written once against the attribute: when access is checked for a particular subject, rules written against any attribute that its type belongs to apply as well. The k3s.if section later in this post explains in detail how the k3s_root_t type and the attribute are defined.

require block

The next block of the k3s.te file is the require block. It informs the policy loader which types, classes and roles must already exist in the system policy before this module can be installed. You can see that the policy requires types and attributes from the container-selinux policy, which is therefore a prerequisite for loading this module:

gen_require(`
    attribute container_runtime_domain;
    type container_runtime_exec_t, container_runtime_t;
    type container_file_t, container_share_t;
    type container_var_lib_t, var_lib_t;
    type container_log_t, var_log_t;
')

permissions block

The first two lines in this block grant container_runtime_domain admin permissions on the newly defined types:

admin_pattern(container_runtime_domain, k3s_data_t)
admin_pattern(container_runtime_domain, k3s_lock_t)

If you would like to know more about container_runtime_domain, I highly recommend reading the following article, which eloquently describes the container runtime domain attribute defined in the container-selinux policy.

The admin_pattern macro grants a set of rules to container_runtime_domain, which, as you may know by now, is the attribute the k3s binary runs under. To break that down and explain it more clearly:

admin_pattern(container_runtime_domain, k3s_data_t)

The previous line calls admin_pattern (we will get to its definition in a minute) with two parameters, container_runtime_domain and k3s_data_t.

container_runtime_domain is a policy attribute that groups the following types:

# seinfo -acontainer_runtime_domain -x

Type Attributes: 1
   attribute container_runtime_domain;
	container_runtime_t
	kubelet_t

The k3s process itself runs with the type (domain) container_runtime_t:

ps auxZ | grep k3s 
system_u:system_r:container_runtime_t:s0 root 5921 3.7  6.3 1266968 499484 ?     Ssl  23:27   1:03 /usr/local/bin/k3s server

This means that admin_pattern gives the k3s process admin privileges on the type k3s_data_t, which is the second parameter passed to admin_pattern.
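To make the role of the attribute concrete, here is a small sketch of how a rule written against an attribute ends up applying to every type it groups. It is a toy model rather than how the kernel evaluates policy: the rule list is reduced to a few (source, target, class) triples chosen for illustration, and the names sources_for and is_allowed are hypothetical.

```python
# Attribute membership as reported by `seinfo -acontainer_runtime_domain -x` above.
ATTRIBUTES = {
    "container_runtime_domain": {"container_runtime_t", "kubelet_t"},
}

# Simplified rules: (source type or attribute, target type, object class).
# Only a few object classes are listed; the real macros cover more.
RULES = [
    ("container_runtime_domain", "k3s_data_t", "dir"),
    ("container_runtime_domain", "k3s_data_t", "file"),
    ("container_runtime_domain", "k3s_lock_t", "file"),
]


def sources_for(domain):
    """Return the domain itself plus every attribute that groups it."""
    return {domain} | {attr for attr, members in ATTRIBUTES.items() if domain in members}


def is_allowed(domain, target, obj_class):
    """True if some rule names the domain, or one of its attributes, as source."""
    names = sources_for(domain)
    return any(
        src in names and tgt == target and cls == obj_class
        for src, tgt, cls in RULES
    )


print(is_allowed("container_runtime_t", "k3s_data_t", "dir"))  # k3s itself
print(is_allowed("kubelet_t", "k3s_data_t", "dir"))            # same attribute
print(is_allowed("container_t", "k3s_data_t", "dir"))          # not in the attribute
```

The point of the design is that the policy author writes the rule once against container_runtime_domain, and any type added to that attribute later picks up the same access.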

To look further into the admin_pattern definition:

define(`admin_pattern',`
        manage_dirs_pattern($1,$2,$2)
        manage_files_pattern($1,$2,$2)
        manage_lnk_files_pattern($1,$2,$2)
        manage_fifo_files_pattern($1,$2,$2)
        manage_sock_files_pattern($1,$2,$2)

        relabel_dirs_pattern($1,$2,$2)
        relabel_files_pattern($1,$2,$2)
        relabel_lnk_files_pattern($1,$2,$2)
        relabel_fifo_files_pattern($1,$2,$2)
        relabel_sock_files_pattern($1,$2,$2)
')

You can see that the definition of admin_pattern calls several other macros. As an example of what they do, let's look at manage_dirs_pattern($1,$2,$2), which takes three parameters:

define(`manage_dirs_pattern',`
	allow $1 $2:dir rw_dir_perms;
	allow $1 $3:dir manage_dir_perms;
')

As you can see, it has two allow statements. The first gives the first parameter, in our case container_runtime_domain (i.e. the k3s process), the rw_dir_perms permissions on directories of the second parameter's type, k3s_data_t here, which covers reading and writing the directory. The second grants manage_dir_perms on directories of the third parameter's type, which is also k3s_data_t in this call; in a nutshell, that adds the permissions to create, delete, and otherwise manage directories.
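The expansion can be followed step by step with a short sketch that mimics the two macro definitions quoted above using plain string formatting. It reproduces only what the snippets in this post show: the permission sets rw_dir_perms and manage_dir_perms are left as names rather than expanded, and the Python function names are illustrative stand-ins for the m4 macros.

```python
OBJECT_CLASSES = ["dirs", "files", "lnk_files", "fifo_files", "sock_files"]


def admin_pattern_calls(source, target):
    """List the macro calls that admin_pattern(source, target) expands to."""
    calls = [f"manage_{cls}_pattern({source},{target},{target})" for cls in OBJECT_CLASSES]
    calls += [f"relabel_{cls}_pattern({source},{target},{target})" for cls in OBJECT_CLASSES]
    return calls


def manage_dirs_pattern(source, parent, target):
    """Mirror the two allow statements of the manage_dirs_pattern macro."""
    return [
        f"allow {source} {parent}:dir rw_dir_perms;",
        f"allow {source} {target}:dir manage_dir_perms;",
    ]


# First level: the ten macro calls behind admin_pattern.
for call in admin_pattern_calls("container_runtime_domain", "k3s_data_t"):
    print(call)

# Second level: the allow rules behind the first of those calls.
for rule in manage_dirs_pattern("container_runtime_domain", "k3s_data_t", "k3s_data_t"):
    print(rule)
```

Because admin_pattern passes the same type as both the second and third parameter, both allow rules end up targeting k3s_data_t directories.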

I won't go any further into the individual macros, but if you are interested in what they mean, you can refer to the following link.

Transitions block

The final block of the TE file holds the file transition patterns:

files_lock_filetrans(container_runtime_domain, k3s_lock_t, { dir file })
filetrans_pattern(container_runtime_t, container_var_lib_t, k3s_data_t, dir, "data")
filetrans_pattern(container_runtime_t, k3s_data_t, k3s_lock_t, file, ".lock")
filetrans_pattern(container_runtime_t, k3s_data_t, k3s_root_t, dir, "bin")
filetrans_pattern(container_runtime_t, k3s_root_t, k3s_data_t, file, ".links")
filetrans_pattern(container_runtime_t, k3s_root_t, k3s_data_t, file, ".sha256sums")
filetrans_pattern(container_runtime_t, k3s_root_t, container_runtime_exec_t, file, "cni")
filetrans_pattern(container_runtime_t, k3s_root_t, container_runtime_exec_t, file, "containerd")
filetrans_pattern(container_runtime_t, k3s_root_t, container_runtime_exec_t, file, "containerd-shim")
filetrans_pattern(container_runtime_t, k3s_root_t, container_runtime_exec_t, file, "containerd-shim-runc-v1")
filetrans_pattern(container_runtime_t, k3s_root_t, container_runtime_exec_t, file, "containerd-shim-runc-v2")
filetrans_pattern(container_runtime_t, k3s_root_t, container_runtime_exec_t, file, "runc")
filetrans_pattern(container_runtime_t, container_var_lib_t, container_file_t, dir, "storage")
filetrans_pattern(container_runtime_t, container_var_lib_t, container_share_t, dir, "snapshots")
filetrans_pattern(container_runtime_t, var_lib_t, container_var_lib_t, dir, "kubelet")
filetrans_pattern(container_runtime_t, container_var_lib_t, container_file_t, dir, "pods")
filetrans_pattern(container_runtime_t, var_log_t, container_log_t, dir, "containers")
filetrans_pattern(container_runtime_t, var_log_t, container_log_t, dir, "pods")

The Red Hat guide describes the file name transition feature as follows:

The file name transition feature allows policy writers to specify the file name when writing policy transition rules. It is possible to write a rule that states: If a process labeled A_t creates a specified object class in a directory labeled B_t and the specified object class is named objectname, it gets the label C_t. This mechanism provides more fine-grained control over processes on the system.

To understand this better, let's take an example:

filetrans_pattern(container_runtime_t, k3s_data_t, k3s_root_t, dir, "bin")

This means that when a process running with the type container_runtime_t creates a directory named bin inside a directory labeled k3s_data_t, the new directory gets the label k3s_root_t. To confirm this, note that k3s creates /var/lib/rancher/k3s/data, which has the label k3s_data_t, and then creates a directory called bin underneath it, inside a subdirectory named after a hash:

# ls -lZ /var/lib/rancher/k3s/data/7c994f47fd344e1637da337b92c51433c255b387d207b30b3e0262779457afe4/
drwxr-xr-x. 3 root root system_u:object_r:k3s_root_t:s0 8192 Nov 25 23:27 bin
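The lookup that a named file transition performs can be modelled as a table keyed by process type, parent directory type, object class, and file name. The sketch below copies four of the rules from the block above and falls back to the parent directory's type when no rule matches, which is the default SELinux behaviour for newly created files. It ignores transitions that do not name a file, and label_for_new_object is a hypothetical helper, not a real API.

```python
# (process type, parent dir type, object class, name) -> type of the new object.
# A subset of the filetrans_pattern lines shown above.
NAME_TRANSITIONS = {
    ("container_runtime_t", "container_var_lib_t", "dir", "data"): "k3s_data_t",
    ("container_runtime_t", "k3s_data_t", "file", ".lock"): "k3s_lock_t",
    ("container_runtime_t", "k3s_data_t", "dir", "bin"): "k3s_root_t",
    ("container_runtime_t", "k3s_root_t", "file", "runc"): "container_runtime_exec_t",
}


def label_for_new_object(process_type, parent_type, obj_class, name):
    """Return the type given to a newly created object.

    If no named transition matches, the object inherits the type of the
    directory it is created in.
    """
    key = (process_type, parent_type, obj_class, name)
    return NAME_TRANSITIONS.get(key, parent_type)


# Walk down /var/lib/rancher/k3s/data/<hash>/bin/runc as k3s creates it.
data = label_for_new_object("container_runtime_t", "container_var_lib_t", "dir", "data")
hash_dir = label_for_new_object("container_runtime_t", data, "dir", "7c994f47")
bin_dir = label_for_new_object("container_runtime_t", hash_dir, "dir", "bin")
runc = label_for_new_object("container_runtime_t", bin_dir, "file", "runc")
print(data, hash_dir, bin_dir, runc)
```

The hash-named directory has no rule of its own, so it simply inherits k3s_data_t, and that is what lets the "bin" rule fire one level further down.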

As you can see, the transitions make sure that the containerd-related binaries are labeled container_runtime_exec_t and that the log dirs are labeled container_log_t.

k3s.if

The k3s interface file defines the k3s_runtime_domain_template template that we discussed in the previous section:

template(`k3s_runtime_domain_template',`
	gen_require(`
		attribute container_runtime_domain, exec_type;
		role system_r, sysadm_r;
	')

	attribute $1_domain;
	type $1_t, $1_domain;
	role system_r types $1_t;
	role sysadm_r types $1_t;

	can_exec($1_t, exec_type)
	domain_type($1_t)
	domain_entry_file($1_domain, $1_t)

	admin_pattern(container_runtime_domain, $1_t)
')

The template begins with gen_require, which requires certain attributes and roles to be present before the policy can be loaded. It then defines the attribute k3s_root_domain and the type k3s_root_t, and declares the type as a member of that attribute.

The line can_exec($1_t, exec_type) then allows processes running as k3s_root_t to execute files whose type carries the exec_type attribute, without a domain transition.

The next lines are domain_type, which makes the specified type usable as a domain, and domain_entry_file, which makes the specified type usable as an entry point for the domain. Together with the type declaration, the result is that the k3s_root_domain attribute groups k3s_root_t, which you can confirm by running:

$ seinfo -ak3s_root_domain -x

Type Attributes: 1
   attribute k3s_root_domain;
	k3s_root_t

The final line calls the admin_pattern() macro that we talked about in the previous section and grants the container_runtime_domain types admin permissions on the k3s_root_t type. In other words, any process running with a type that belongs to the container_runtime_domain attribute, such as the k3s process, is allowed admin privileges on k3s_root_t files.

k3s.fc

Finally, the file context file contains patterns that assign contexts to files and dirs according to the policy. This file should be straightforward:

/etc/systemd/system/k3s.*                                       --  gen_context(system_u:object_r:container_unit_file_t,s0)
/usr/lib/systemd/system/k3s.*                                   --  gen_context(system_u:object_r:container_unit_file_t,s0)
/usr/local/lib/systemd/system/k3s.*                             --  gen_context(system_u:object_r:container_unit_file_t,s0)
/usr/s?bin/k3s                                                  --  gen_context(system_u:object_r:container_runtime_exec_t,s0)
/usr/local/s?bin/k3s                                            --  gen_context(system_u:object_r:container_runtime_exec_t,s0)
/var/lib/rancher/k3s(/.*)?                                          gen_context(system_u:object_r:container_var_lib_t,s0)
/var/lib/rancher/k3s/agent/containerd/[^/]*/snapshots           -d  gen_context(system_u:object_r:container_share_t,s0)
/var/lib/rancher/k3s/agent/containerd/[^/]*/snapshots/[^/]*     -d  gen_context(system_u:object_r:container_share_t,s0)
/var/lib/rancher/k3s/agent/containerd/[^/]*/snapshots/[^/]*/.*      <<none>>
/var/lib/rancher/k3s/agent/containerd/[^/]*/sandboxes(/.*)?         gen_context(system_u:object_r:container_share_t,s0)
/var/lib/rancher/k3s/data(/.*)?                                     gen_context(system_u:object_r:k3s_data_t,s0)
/var/lib/rancher/k3s/data/.lock                                 --  gen_context(system_u:object_r:k3s_lock_t,s0)
/var/lib/rancher/k3s/data/[^/]*/bin(/.*)?                           gen_context(system_u:object_r:k3s_root_t,s0)
/var/lib/rancher/k3s/data/[^/]*/bin/[.]links                    --  gen_context(system_u:object_r:k3s_data_t,s0)
/var/lib/rancher/k3s/data/[^/]*/bin/[.]sha256sums               --  gen_context(system_u:object_r:k3s_data_t,s0)
/var/lib/rancher/k3s/data/[^/]*/bin/cni                         --  gen_context(system_u:object_r:container_runtime_exec_t,s0)
/var/lib/rancher/k3s/data/[^/]*/bin/containerd                  --  gen_context(system_u:object_r:container_runtime_exec_t,s0)
/var/lib/rancher/k3s/data/[^/]*/bin/containerd-shim             --  gen_context(system_u:object_r:container_runtime_exec_t,s0)
/var/lib/rancher/k3s/data/[^/]*/bin/containerd-shim-runc-v[12]  --  gen_context(system_u:object_r:container_runtime_exec_t,s0)
/var/lib/rancher/k3s/data/[^/]*/bin/runc                        --  gen_context(system_u:object_r:container_runtime_exec_t,s0)
/var/lib/rancher/k3s/data/[^/]*/etc(/.*)?                           gen_context(system_u:object_r:container_config_t,s0)
/var/lib/rancher/k3s/storage(/.*)?                                  gen_context(system_u:object_r:container_file_t,s0)
/var/run/k3s(/.*)?                                                  gen_context(system_u:object_r:container_var_run_t,s0)
/var/run/k3s/containerd/[^/]*/sandboxes/[^/]*/shm(/.*)?             gen_context(system_u:object_r:container_runtime_tmpfs_t,s0)

As explained, the file sets the patterns used to label the directories and files that k3s uses, for example:

/var/lib/rancher/k3s/data/.lock                                 --  gen_context(system_u:object_r:k3s_lock_t,s0)

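This line labels the lock file /var/lib/rancher/k3s/data/.lock with the context system_u:object_r:k3s_lock_t:s0. The -- in the middle column restricts the entry to regular files, just as -d restricts an entry to directories.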
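Each entry in the file pairs a regular expression that must match the whole path with an optional file type flag. As a rough sketch of that matching step, the snippet below tests a path against five entries copied from k3s.fc and lists every entry that applies. It deliberately does not model how libselinux chooses a single winner among overlapping entries; on a real host, a tool such as matchpathcon reports the context that actually wins. The function name matching_entries is made up for this example.

```python
import re

# (path regex, file type flag, type). Flag "--" means regular file,
# "-d" means directory, and "" means any kind of file.
FILE_CONTEXTS = [
    (r"/var/lib/rancher/k3s(/.*)?", "", "container_var_lib_t"),
    (r"/var/lib/rancher/k3s/data(/.*)?", "", "k3s_data_t"),
    (r"/var/lib/rancher/k3s/data/.lock", "--", "k3s_lock_t"),
    (r"/var/lib/rancher/k3s/data/[^/]*/bin(/.*)?", "", "k3s_root_t"),
    (r"/var/lib/rancher/k3s/data/[^/]*/bin/runc", "--", "container_runtime_exec_t"),
]


def matching_entries(path, kind):
    """Return the types of all entries that match the path and file kind.

    kind is "--" for a regular file or "-d" for a directory.
    """
    return [
        selinux_type
        for pattern, flag, selinux_type in FILE_CONTEXTS
        if flag in ("", kind) and re.fullmatch(pattern, path)
    ]


lock_matches = matching_entries("/var/lib/rancher/k3s/data/.lock", "--")
runc_matches = matching_entries("/var/lib/rancher/k3s/data/abc123/bin/runc", "--")
print(lock_matches)
print(runc_matches)
```

Several entries match each path, from the broad /var/lib/rancher/k3s(/.*)? pattern down to the narrow one, which is why the file layers general labels first and then carves out exceptions such as the lock file and the runtime binaries.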

Conclusion

K3S installs a custom SELinux policy that defines new types, which are used by the various files that k3s installs and manages. Hopefully this post has explained how k3s manages these labels and which files are defined in the custom policy. For more information on k3s, please refer to the official documentation: https://docs.k3s.io/