A Case Study in Security Hardening of a Red Hat OpenShift Operator

Pradipta Banerjee
Apr 28, 2022

Applying the principle of least privilege when developing Operators

Let me start by asking you a few questions.

  1. Are you developing Operators for Kubernetes?
  2. Are you the kind of person who yearns for a safe environment to run everything as a privileged user (‘root’) during development?
  3. Is bypassing security checks, like running as the privileged root user or disabling SELinux, your Swiss Army knife for root-causing any runtime issue with your software?

If you have answered “yes” to any of the above questions, this article might help you.

This entire work wouldn’t have been possible without contributions from my teammates Jens Freimann, Francesco Giudici, Julien Rope and Udica developer Juan Osorio Robles. My heartfelt thanks!!

Before proceeding, a disclaimer!!

Disclaimer: I’m guilty of turning off security checks as a default mechanism to root cause scenarios where a functional issue happens only in specific environments :-(. And this might seriously make Dan Walsh weep. I’m still learning and trying to get better.

So let’s begin by explaining the problem and understanding how we arrived at the eventual solution.

Problem

One of the Kubernetes daemonsets (kata-monitor) managed by the sandboxed containers operator required access to the following host paths:

/run/crio/crio.sock
/run/vc/sbs

If you are curious about the actual daemonset code and the sandboxed containers operator, you can jump to the following links:

Easy Solution

To access the required host paths, we either had to run the daemonset as a privileged container or modify the file permissions accordingly.

Modifying the file permissions was impossible because other processes also used these host paths.

So the easiest way to get this working was to run the daemonset as a privileged container, with the container process itself running as the root user. And so we did, and everything worked functionally.

In Red Hat OpenShift, running privileged containers requires a suitable SecurityContextConstraints (SCC) object. A good article explaining SCCs — https://cloud.redhat.com/blog/managing-sccs-in-openshift

Issues with the easy solution

Using the privileged SCC to provide host path access to a container defies the principle of least privilege. It increases the possibility of misuse, and it’s a terrible idea to use the most relaxed of all the SCCs in OpenShift without careful consideration. Also, running the container process as root should be avoided.

So we had a solution that was working, but it had to be improved. We had to do better.

And as with any improvement, it’s a journey with great learnings on the way.

Optimal Solution

An optimal solution should use the least privileges required to access the host paths and nothing more.

We needed a way to find out the required least privileges to implement this.

We approached the problem in the following way.

  1. Identify the issues when running the container as a non-root user.
  2. Identify the minimum required Linux system capabilities (CAP_*) for a non-root process to access files and directories (e.g., CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH).
  3. Identify the SELinux requirements.
  4. Determine how to fit into the OpenShift security model with respect to using SCCs.

Investigation

Running the container with non-root user-id

This was straightforward and required setting the USER directive in the Dockerfile and the securityContext in the container manifest (YAML).

Example container manifest

securityContext:
  runAsUser: 1001
  runAsGroup: 1001
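For completeness, the corresponding Dockerfile change is a one-liner. This is a sketch (the UID/GID values simply match the manifest above):

# Run the container process as a non-root user;
# the UID:GID must match runAsUser/runAsGroup in the manifest
USER 1001:1001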

Linux system capabilities required for accessing host files

From the capabilities manpage, the following were likely candidates:

CAP_DAC_OVERRIDE or CAP_DAC_READ_SEARCH

Digging further into these capabilities led to a good StackOverflow explanation, which indicated that the CAP_DAC_READ_SEARCH capability was sufficient for the container to read the files. CAP_DAC_OVERRIDE is too permissive.

However, in our experiments, we found CAP_DAC_READ_SEARCH insufficient for kata-monitor. We had to use CAP_DAC_OVERRIDE.
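One way to probe this kind of question on the node is a sketch like the following (not the exact commands we ran; it assumes root on the host and the setpriv tool from util-linux):

# Retry the read as UID 1001 with only CAP_DAC_READ_SEARCH granted.
# If this fails but the same command with +dac_override succeeds,
# CAP_DAC_OVERRIDE is the capability actually needed.
setpriv --reuid 1001 --regid 1001 --clear-groups \
        --inh-caps +dac_read_search --ambient-caps +dac_read_search \
        ls -l /run/vc/sbs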

The program binary (kata-monitor) needs to have the specific capability set.

Following is the updated Dockerfile used to build the container image with the binary having the required capability.

FROM quay.io/openshift_sandboxed_containers/openshift-sandboxed-containers-monitor
RUN chmod u-s /usr/bin/kata-monitor
RUN setcap "cap_dac_override+eip" /usr/bin/kata-monitor
CMD ["-h"]
ENTRYPOINT ["/usr/bin/kata-monitor"]
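After building the image, the file capability can be verified with getcap. A sketch (the exact output format varies across libcap versions):

$ getcap /usr/bin/kata-monitor
/usr/bin/kata-monitor cap_dac_override=eip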

SELinux requirements

SELinux type for the required host paths:

# ls -lZd /run/vc
drwxr-xr-x. 3 root root system_u:object_r:container_var_run_t:s0 60 Nov 24 17:19 /run/vc
# ls -lZd /run/crio
drwxr-xr-x. 4 root root system_u:object_r:container_var_run_t:s0 1880 Nov 25 07:10 /run/crio

From the above, we can see that the SELinux type is container_var_run_t.

Now the question is: which SELinux domain will allow access to the container_var_run_t type?

Note that the default domain used by container processes is container_t. And it doesn't allow access to the container_var_run_t type.
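One way to check which domain a container process actually runs in is to inspect its SELinux label on the node. A sketch (the context shown is illustrative; a privileged container will show a different domain, such as spc_t):

# ps -Z -p <container-process-pid>
LABEL                                       PID TTY     TIME     CMD
system_u:system_r:container_t:s0:c218,c998  12345 ?     00:00:01 kata-monitor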

The following blog from Dan Walsh describes our problem and possible solutions — https://danwalsh.livejournal.com/81756.html

We had to create a custom SELinux policy and Udica came to our rescue.

Creating custom SELinux policy using Udica

The steps below describe the policy generation process using Udica.

  • Retrieve all the details of the running container

Udica requires the container.json file to create the policy.

We retrieved the container.json from an existing setup (running the easy solution mentioned earlier)

crictl inspect <container-id> > container.json

Another approach is to just run a sample container simulating the required behaviour.

For example, mounting the exact host paths and performing read operation.

podman run -it -v /run/crio:/run/crio quay.io/fedora/fedora:35 bash
<Run some operations on the dir /run/crio, e.g. "ls". You'll hit permission denied errors>
podman inspect <container-id> > container.json
  • Generate policy

udica -j container.json my_policy
Policy my_policy created!
Please load these modules using:
semodule -i my_policy.cil /usr/share/udica/templates/base_container.cil
Restart the container with: "--security-opt label=type:my_policy.process" parameter

An alternative approach to identifying the required SELinux policy is using the security profiles operator — https://github.com/kubernetes-sigs/security-profiles-operator

Udica Caveats

While Udica can create a good baseline policy from the container.json file, sometimes it might not be sufficient, and you will need customisations.

For example, in our case, when we deployed the policy, we figured out it was insufficient from the SELinux denial messages in the log.

The kata-monitor daemonset required permissions to listen on a TCP socket and connect to a Unix socket.

We had to redo the policy generation and enable network access.

udica --full-network-access -j container.json my_policy

The --full-network-access option adds suitable rules to allow network connections.

For Unix socket access, we added the following custom rule.

(allow process container_runtime_t (unix_stream_socket (connectto)))

The SELinux denial messages in the system logs are very helpful in finding out what additional rules are needed. Please refer to the following article for learning about denial messages — https://www.redhat.com/sysadmin/selinux-denial2
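As a sketch of that workflow (assuming auditd and the policycoreutils tooling are installed on the node), recent AVC denials can be turned into candidate rules:

# Collect recent AVC denials and propose allow rules for them;
# review the output carefully rather than applying it blindly
ausearch -m AVC -ts recent | audit2allow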

Following is the complete SELinux custom policy for the kata-monitor daemonset.

(block container
(type process)
(type socket)
(roletype system_r process)
(typeattributeset domain (process ))
(typeattributeset container_domain (process ))
(typeattributeset svirt_sandbox_domain (process ))
(typeattributeset mcs_constrained_type (process ))
(typeattributeset file_type (socket ))
(allow process socket (sock_file (create open getattr setattr read write rename link unlink ioctl lock append)))
(allow process proc_type (file (getattr open read)))
(allow process cpu_online_t (file (getattr open read)))
(allow container_runtime_t process (key (create link read search setattr view write)))
)
(block net_container
(optional net_container_optional
(typeattributeset sandbox_net_domain (process))
)
)
(block osc_monitor
(blockinherit container)
(blockinherit net_container)
(allow process process ( capability ( chown dac_override setfcap setgid setpcap setuid )))
(allow process container_runtime_t (unix_stream_socket (connectto)))
(allow process container_var_run_t ( dir ( add_name create getattr ioctl lock open read remove_name rmdir search setattr write )))
(allow process container_var_run_t ( file ( append create getattr ioctl lock map open read rename setattr unlink write )))
(allow process container_var_run_t ( fifo_file ( getattr read write append ioctl lock open )))
(allow process container_var_run_t ( sock_file ( ioctl append getattr open read write )))
(allow process container_var_run_t ( lnk_file ( ioctl lock append getattr open read write )))
)

Create Custom OpenShift SCC Policy

Finally, to get all of these working with OpenShift, we had to create a custom SCC policy and use the same with our operator.

allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowedCapabilities:
- DAC_OVERRIDE
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
kind: SecurityContextConstraints
metadata:
  name: sandboxed-containers-operator-scc
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
- MKNOD
- FSETID
- KILL
- FOWNER
runAsUser:
  type: MustRunAsNonRoot
seLinuxContext:
  seLinuxOptions:
    type: osc_monitor.process
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:openshift-sandboxed-containers-operator:monitor
volumes:
- '*'

Note that setting allowHostIPC to false takes care of disabling CAP_IPC_*. Likewise setting allowHostNetwork to false takes care of disabling CAP_NET_* and so on.

The custom SCC and the kata-monitor binary with the required capabilities set ensured we used the least privileges principle for our operator.
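For the SCC to take effect, the daemonset pod spec must also request the capability and run as non-root. A minimal sketch of the container securityContext (field values assumed to match the rest of this article):

securityContext:
  runAsUser: 1001
  runAsGroup: 1001
  capabilities:
    add:
    - DAC_OVERRIDE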

Summary

I hope the information presented here helps you make informed decisions about the tradeoffs between different options for hardening operators.

Remember, it’s a journey, and there are numerous tools to help you out. I would love to hear your experiences with hardening Kubernetes operators.
