Enforcing Security in Confidential Containers with Kata Agent Policies
How to control Pod execution inside a TEE
The CNCF Confidential Containers (CoCo) project enables deploying a Kubernetes pod inside a Trusted Execution Environment (TEE). This approach makes confidential computing easier to adopt, but it also comes with a few challenges, especially regarding the pod’s dynamic nature. Containers in the pod can be created, deleted, and updated through Remote Procedure Calls (RPCs) to the container runtime. So, how can we ensure that nothing unintended is executed inside the pod that would undermine confidentiality?
Magnus’s blog, “Policing the Sandbox”, discusses this topic in detail. The recommended solution is to utilise the Kata agent policy feature. In this blog, I’ll take a deep dive into the agent policy feature.
Let’s look at the high-level architecture of a typical CoCo deployment in Kubernetes.
The diagram shows that a CoCo pod uses a Linux Confidential Virtual Machine (CVM) executing inside a TEE (the secure enclave) as the foundation. Using a CVM enables the lift-and-shift of existing container workloads.
The container images related to the pod are downloaded and kept inside the CVM and can be signed and/or encrypted.
The components within the CVM — namely image-rs, kata-agent, the Confidential Data Hub (CDH), and the Attestation Agent (AA) — are collectively referred to as the enclave software stack.
An external entity, the “Relying Party Services,” is responsible for attesting the CVM and releasing secret resources. For detailed information related to attestation, refer to the following documentation.
Our focus for this article is the communication between the kata runtime and the kata-agent.
The kata runtime makes remote procedure calls (RPCs) to the kata-agent inside the CVM to create containers and manage their life cycle. The kata-agent, in turn, lets you define an allowlist of kata-agent RPC API requests. This is the agent policy feature: a set of rules that define which RPC API requests may execute. The policies are written in the Rego language, and the kata-agent uses the regorus library to evaluate them.
The diagram below illustrates the key components, with some omitted for brevity.
Structure of an Agent Policy file
An agent policy document is typically structured into three main components:
package agent_policy
# 1. defaults
# 2. rules
# 3. policy data
- Defaults: These define the default allow or deny values for different RPC API requests.
- Rules: These are Rego rules that can override the defaults for individual RPC API requests.
- Policy Data: This section provides configuration values for the rules
Defaults
Here is an example default list. Note that CreateContainerRequest and ExecProcessRequest are disabled (set to false).
default CreateContainerRequest := false
default ExecProcessRequest := false
default CopyFileRequest := true
Rules
Rules enable overriding the default allow or deny values for the RPC requests. The rules typically compare the input parameters of an RPC API request with values from the policy data. Based on this comparison, a rule can either allow or deny the request by returning true or false. Policy rules are optional.
Examples of rules corresponding to the kata-agent’s CreateContainer and ExecProcess requests:
import future.keywords.in
import future.keywords.if
import future.keywords.every
CreateContainerRequest if {
    every storage in input.storages {
        some allowed_image in policy_data.allowed_images
        storage.source == allowed_image
    }
}
ExecProcessRequest if {
    input_command = concat(" ", input.process.Args)
    some allowed_command in policy_data.allowed_commands
    input_command == allowed_command
}
The input data in the rules is provided by the kata-agent to the regorus library as a JSON representation of the API request parameters. The rules compare the input data against the reference values provided by the policy data (policy_data).
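To make the rule logic concrete, here is a small Python model of the CreateContainerRequest rule. It is only an illustration of how the comparison reads, not kata-agent or regorus code. The input shape is an assumption limited to the fields the rule references (`storages` and `source`), and the function name is hypothetical.

```python
# Illustrative Python model of the CreateContainerRequest rule; the real
# evaluation happens in Rego via regorus inside the kata-agent.
# Assumption: the input has a "storages" list whose items carry a "source"
# field, because those are the only fields the rule reads.

POLICY_DATA = {
    "allowed_images": ["pause", "quay.io/fedora/fedora:latest"],
}


def create_container_allowed(request: dict, policy_data: dict = POLICY_DATA) -> bool:
    """Mirror `every storage in input.storages { some allowed_image ... }`."""
    storages = request.get("storages")
    if storages is None:
        # Modelled as "rule not satisfied": with no matching rule, the
        # default value (false in this post) decides the request.
        return False
    # `every` maps to all(), `some ... in` maps to any().
    return all(
        any(storage.get("source") == image for image in policy_data["allowed_images"])
        for storage in storages
    )


print(create_container_allowed({"storages": [{"source": "pause"}]}))
print(create_container_allowed({"storages": [{"source": "docker.io/library/nginx"}]}))
```

One consequence worth noticing in this model: an empty `storages` list passes, because a check over every element of an empty list is trivially true. If that matters in your environment, the rule needs an additional condition.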
Policy data
Policy data contains the reference values that the policy rules compare with the input parameters of the RPC API request. Policy data is optional.
Example of Policy data:
policy_data := {
  "allowed_commands": [
    "ps"
  ],
  "allowed_images": [
    "pause",
    "quay.io/fedora/fedora:latest"
  ]
}
A complete rego policy document looks like this:
package agent_policy
import future.keywords.in
import future.keywords.if
import future.keywords.every
default AddARPNeighborsRequest := true
default AddSwapRequest := true
default CloseStdinRequest := true
default CopyFileRequest := true
default CreateSandboxRequest := true
default DestroySandboxRequest := true
default GetMetricsRequest := true
default GetOOMEventRequest := true
default GuestDetailsRequest := true
default ListInterfacesRequest := true
default ListRoutesRequest := true
default MemHotplugByProbeRequest := true
default OnlineCPUMemRequest := true
default PauseContainerRequest := true
default PullImageRequest := true
default ReadStreamRequest := true
default RemoveContainerRequest := true
default RemoveStaleVirtiofsShareMountsRequest := true
default ReseedRandomDevRequest := true
default ResumeContainerRequest := true
default SetGuestDateTimeRequest := true
default SetPolicyRequest := true
default SignalProcessRequest := true
default StartContainerRequest := true
default StartTracingRequest := true
default StatsContainerRequest := true
default StopTracingRequest := true
default TtyWinResizeRequest := true
default UpdateContainerRequest := true
default UpdateEphemeralMountsRequest := true
default UpdateInterfaceRequest := true
default UpdateRoutesRequest := true
default WaitProcessRequest := true
default WriteStreamRequest := true
default CreateContainerRequest := false
default ExecProcessRequest := false
CreateContainerRequest if {
    every storage in input.storages {
        some allowed_image in policy_data.allowed_images
        storage.source == allowed_image
    }
}
ExecProcessRequest if {
    input_command = concat(" ", input.process.Args)
    some allowed_command in policy_data.allowed_commands
    input_command == allowed_command
}
policy_data := {
  "allowed_commands": [
    "/bin/bash -c ls"
  ],
  "allowed_images": [
    "pause",
    "quay.io/fedora/fedora@sha256:4d29104e4d6f0fb6fad0792e1cab6c44f574f2f3d6ff9e0de7737ab9c86b9d94"
  ]
}
You can use a Rego linter such as Regal to check the policy file for errors.
You can also use the genpolicy tool to autogenerate a policy based on the pod manifest file. You’ll need to adapt the generated policy file to suit your environment.
Using agent policy file with pods
You can embed the agent policy file in the VM root filesystem or provide it during pod creation via an annotation in the pod manifest.
When using an annotation, you need to base64 encode the policy. Two annotations are available for specifying the policy:
- io.katacontainers.config.runtime.cc_init_data
- io.katacontainers.config.agent.policy
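As a rough sketch of the encoding step, the snippet below base64-encodes a policy string and pairs it with the io.katacontainers.config.agent.policy annotation key. The helper name and the shortened sample policy are hypothetical and only serve the illustration.

```python
import base64

POLICY_ANNOTATION = "io.katacontainers.config.agent.policy"


def policy_annotation(policy_text: str) -> dict[str, str]:
    """Return the annotation key with the base64-encoded policy as its value."""
    # b64encode emits a single line without wrapping, which is what a
    # one-line annotation value needs.
    encoded = base64.b64encode(policy_text.encode("utf-8")).decode("ascii")
    return {POLICY_ANNOTATION: encoded}


# Shortened sample policy, for illustration only.
sample_policy = "package agent_policy\n\ndefault ExecProcessRequest := false\n"
for key, value in policy_annotation(sample_policy).items():
    print(f"{key}: {value}")
```

The resulting key and value can then be placed under metadata.annotations in the pod manifest.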
The following doc describes the format of the cc_init_data annotation. The preferred way is to provide the policy file as part of the cc_init_data annotation since this is measured via remote attestation.
The policy file, when provided as part of cc_init_data annotation, looks like this:
algorithm = "sha384"
version = "0.1.0"
[data]
"policy.rego" = '''
package agent_policy
import future.keywords.in
import future.keywords.if
import future.keywords.every
default AddARPNeighborsRequest := true
default AddSwapRequest := true
default CloseStdinRequest := true
default CopyFileRequest := true
default CreateSandboxRequest := true
default DestroySandboxRequest := true
default GetMetricsRequest := true
default GetOOMEventRequest := true
default GuestDetailsRequest := true
default ListInterfacesRequest := true
default ListRoutesRequest := true
default MemHotplugByProbeRequest := true
default OnlineCPUMemRequest := true
default PauseContainerRequest := true
default PullImageRequest := true
default ReadStreamRequest := true
default RemoveContainerRequest := true
default RemoveStaleVirtiofsShareMountsRequest := true
default ReseedRandomDevRequest := true
default ResumeContainerRequest := true
default SetGuestDateTimeRequest := true
default SetPolicyRequest := true
default SignalProcessRequest := true
default StartContainerRequest := true
default StartTracingRequest := true
default StatsContainerRequest := true
default StopTracingRequest := true
default TtyWinResizeRequest := true
default UpdateContainerRequest := true
default UpdateEphemeralMountsRequest := true
default UpdateInterfaceRequest := true
default UpdateRoutesRequest := true
default WaitProcessRequest := true
default WriteStreamRequest := true
default CreateContainerRequest := false
default ExecProcessRequest := false
CreateContainerRequest if {
    every storage in input.storages {
        some allowed_image in policy_data.allowed_images
        storage.source == allowed_image
    }
}
ExecProcessRequest if {
    input_command = concat(" ", input.process.Args)
    some allowed_command in policy_data.allowed_commands
    input_command == allowed_command
}
policy_data := {
  "allowed_commands": [
    "/bin/bash -c ls"
  ],
  "allowed_images": [
    "pause",
    "quay.io/fedora/fedora@sha256:4d29104e4d6f0fb6fad0792e1cab6c44f574f2f3d6ff9e0de7737ab9c86b9d94"
  ]
}
'''
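The following sketch wraps a policy in the init data layout shown above and recovers the policy from an encoded value, which can be handy for checking what a manifest actually carries. It assumes the annotation value is the plain base64 encoding of the TOML text, as in the example in this post. The helper names are hypothetical, and the extraction uses simple string search rather than a TOML parser.

```python
import base64

# Layout copied from the init data example above; {policy} is the only placeholder.
INITDATA_TEMPLATE = (
    'algorithm = "sha384"\n'
    'version = "0.1.0"\n'
    "\n"
    "[data]\n"
    "\"policy.rego\" = '''\n"
    "{policy}\n"
    "'''\n"
)
POLICY_START = "\"policy.rego\" = '''\n"


def build_initdata(policy: str) -> str:
    """Embed the policy text in the init data document."""
    if "'''" in policy:
        # A TOML literal string would end early at this sequence.
        raise ValueError("policy must not contain three consecutive single quotes")
    return INITDATA_TEMPLATE.format(policy=policy.strip("\n"))


def encode_initdata(initdata: str) -> str:
    """Base64-encode the init data for use as an annotation value."""
    return base64.b64encode(initdata.encode("utf-8")).decode("ascii")


def extract_policy(annotation_value: str) -> str:
    """Decode an annotation value and return the embedded policy text."""
    text = base64.b64decode(annotation_value).decode("utf-8")
    start = text.index(POLICY_START) + len(POLICY_START)
    end = text.index("'''", start)
    return text[start:end].strip("\n")


policy = "package agent_policy\n\ndefault ExecProcessRequest := false"
encoded = encode_initdata(build_initdata(policy))
print(extract_policy(encoded) == policy)  # True
```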
Example pod YAML:
apiVersion: v1
kind: Pod
metadata:
  name: test
  labels:
    app: test
  annotations:
io.katacontainers.config.runtime.cc_init_data: YWxnb3JpdGhtID0gInNoYTM4NCIKdmVyc2lvbiA9ICIwLjEuMCIKCltkYXRhXQoicG9saWN5LnJlZ28iID0gJycnCnBhY2thZ2UgYWdlbnRfcG9saWN5CgppbXBvcnQgZnV0dXJlLmtleXdvcmRzLmluCmltcG9ydCBmdXR1cmUua2V5d29yZHMuaWYKaW1wb3J0IGZ1dHVyZS5rZXl3b3Jkcy5ldmVyeQoKZGVmYXVsdCBBZGRBUlBOZWlnaGJvcnNSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBBZGRTd2FwUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgQ2xvc2VTdGRpblJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IENvcHlGaWxlUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgQ3JlYXRlU2FuZGJveFJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IERlc3Ryb3lTYW5kYm94UmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgR2V0TWV0cmljc1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IEdldE9PTUV2ZW50UmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgR3Vlc3REZXRhaWxzUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgTGlzdEludGVyZmFjZXNSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBMaXN0Um91dGVzUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgTWVtSG90cGx1Z0J5UHJvYmVSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBPbmxpbmVDUFVNZW1SZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBQYXVzZUNvbnRhaW5lclJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFB1bGxJbWFnZVJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFJlYWRTdHJlYW1SZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBSZW1vdmVDb250YWluZXJSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBSZW1vdmVTdGFsZVZpcnRpb2ZzU2hhcmVNb3VudHNSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBSZXNlZWRSYW5kb21EZXZSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBSZXN1bWVDb250YWluZXJSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBTZXRHdWVzdERhdGVUaW1lUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgU2V0UG9saWN5UmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgU2lnbmFsUHJvY2Vzc1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFN0YXJ0Q29udGFpbmVyUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgU3RhcnRUcmFjaW5nUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgU3RhdHNDb250YWluZXJSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBTdG9wVHJhY2luZ1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFR0eVdpblJlc2l6ZVJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFVwZGF0ZUNvbnRhaW5lclJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFVwZGF0ZUVwaGVtZXJhbE1vdW50c1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFVwZGF0ZUludGVyZmFjZVJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFVwZGF0ZVJvdXRlc1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFdhaXRQcm9jZXNzUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgV3JpdGVTdHJlYW1SZXF1ZXN0IDo9IHRydWUKCmRlZmF1bHQgQ3JlYXRlQ29udGFpbmVyUmVxdWVzd
CA6PSBmYWxzZQpkZWZhdWx0IEV4ZWNQcm9jZXNzUmVxdWVzdCA6PSBmYWxzZQoKCkNyZWF0ZUNvbnRhaW5lclJlcXVlc3QgaWYgewoJZXZlcnkgc3RvcmFnZSBpbiBpbnB1dC5zdG9yYWdlcyB7CiAgICAgICAgc29tZSBhbGxvd2VkX2ltYWdlIGluIHBvbGljeV9kYXRhLmFsbG93ZWRfaW1hZ2VzCiAgICAgICAgc3RvcmFnZS5zb3VyY2UgPT0gYWxsb3dlZF9pbWFnZQogICAgfQp9CgpFeGVjUHJvY2Vzc1JlcXVlc3QgaWYgewogICAgaW5wdXRfY29tbWFuZCA9IGNvbmNhdCgiICIsIGlucHV0LnByb2Nlc3MuQXJncykKCXNvbWUgYWxsb3dlZF9jb21tYW5kIGluIHBvbGljeV9kYXRhLmFsbG93ZWRfY29tbWFuZHMKCWlucHV0X2NvbW1hbmQgPT0gYWxsb3dlZF9jb21tYW5kCn0KCnBvbGljeV9kYXRhIDo9IHsgIAogICJhbGxvd2VkX2NvbW1hbmRzIjogWyAgIAoJIi9iaW4vYmFzaCAtYyBscyIKICBdLAogICJhbGxvd2VkX2ltYWdlcyI6IFsKICAgICJwYXVzZSIsCgkicXVheS5pby9mZWRvcmEvZmVkb3JhQHNoYTI1Njo0ZDI5MTA0ZTRkNmYwZmI2ZmFkMDc5MmUxY2FiNmM0NGY1NzRmMmYzZDZmZjllMGRlNzczN2FiOWM4NmI5ZDk0IgogIF0KfQoKJycnCg==
spec:
  runtimeClassName: kata-remote
  containers:
    - name: test
      image: quay.io/fedora/fedora@sha256:4d29104e4d6f0fb6fad0792e1cab6c44f574f2f3d6ff9e0de7737ab9c86b9d94
      command:
        - sleep
        - "36000"
      securityContext:
        privileged: false
        seccompProfile:
          type: RuntimeDefault
Shown here are two exec invocations, where one is blocked due to the policy.
# Successful exec
$ kubectl exec -it test -- /bin/bash -c ls
afs boot etc lib media opt root sbin sys usr
bin dev home lib64 mnt proc run srv tmp var
# Blocked exec
$ kubectl exec -it test -- /bin/bash -c ps
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "f76adbce6154b5bafe38daac7a1019bc3dc6c6af75f0f24d867caeac2ff7b21b": cannot enter container 2a36831a32e51b9e87bce5efb5759b9a1c87f0e3a1dcbea1fb66bb93389d3c46, with err rpc error: code = PermissionDenied desc = "ExecProcessRequest is blocked by policy: ": unknown
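To see why one invocation passes and the other is blocked, here is a small Python model of the ExecProcessRequest comparison. It assumes the arguments after kubectl's `--` separator arrive unchanged as `input.process.Args`, which matches the allowed command in the policy above. The function names are hypothetical, and this is not the agent's code.

```python
import shlex

# Reference value from the complete policy shown earlier in this post.
ALLOWED_COMMANDS = ["/bin/bash -c ls"]


def args_after_separator(kubectl_command: str) -> list[str]:
    """Return the argv that follows the `--` separator of a kubectl exec call."""
    argv = shlex.split(kubectl_command)
    return argv[argv.index("--") + 1:]


def exec_allowed(args: list[str], allowed: list[str] = ALLOWED_COMMANDS) -> bool:
    """Mirror the rule: join Args with a single space, then look for an exact match."""
    return " ".join(args) in allowed


print(exec_allowed(args_after_separator("kubectl exec -it test -- /bin/bash -c ls")))  # True
print(exec_allowed(args_after_separator("kubectl exec -it test -- /bin/bash -c ps")))  # False
```

Because the rule compares the joined string, argument boundaries are not part of the check: in this model, the two-element list `["/bin/bash", "-c ls"]` produces the same string as the three-element one. A stricter policy could compare the Args array element by element.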
Likewise, you can try other scenarios. For example, you can change the container image and verify that container creation fails with the new image.
Conclusion
The kata-agent policy feature provides a flexible mechanism to enforce security for the Kata API and is an essential building block for confidential containers. There is a bit of a learning curve involved when working with the policies, but it’s a powerful mechanism to secure the communication channel between the kata runtime (untrusted) and the kata-agent (trusted). If you have suggestions on usability, new features, etc., please do not hesitate to reach out.