CRI-O: Applying seccomp profiles from OCI registries



  • Author: Sascha Grunert

    Seccomp stands for secure computing mode and has been a feature of the Linux kernel since version 2.6.12. It can be used to sandbox the privileges of a process, restricting the calls it is able to make from userspace into the kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a node to your Pods and containers.

    But distributing those seccomp profiles is a major challenge in Kubernetes, because the JSON files have to be available on all nodes where a workload can possibly run. Projects like the Security Profiles Operator solve that problem by running as a daemon within the cluster, which makes me wonder which part of that distribution could be done by the container runtime.

    Runtimes usually apply the profiles from a local path, for example:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod
    spec:
      containers:
        - name: container
          image: nginx:1.25.3
          securityContext:
            seccompProfile:
              type: Localhost
              localhostProfile: nginx-1.25.3.json
    

    The profile nginx-1.25.3.json has to be available in the seccomp directory below the root directory of the kubelet. This means the default on-disk location for the profile is /var/lib/kubelet/seccomp/nginx-1.25.3.json. If the profile is not available, then runtimes will fail on container creation like this:

    kubectl get pods
    
    NAME   READY   STATUS                 RESTARTS   AGE
    pod    0/1     CreateContainerError   0          38s
    
    kubectl describe pod/pod | tail
    
    Tolerations:  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason     Age                 From               Message
      ----     ------     ----                ----               -------
      Normal   Scheduled  117s                default-scheduler  Successfully assigned default/pod to 127.0.0.1
      Normal   Pulling    117s                kubelet            Pulling image "nginx:1.25.3"
      Normal   Pulled     111s                kubelet            Successfully pulled image "nginx:1.25.3" in 5.948s (5.948s including waiting)
      Warning  Failed     7s (x10 over 111s)  kubelet            Error: setup seccomp: unable to load local profile "/var/lib/kubelet/seccomp/nginx-1.25.3.json": open /var/lib/kubelet/seccomp/nginx-1.25.3.json: no such file or directory
      Normal   Pulled     7s (x9 over 111s)   kubelet            Container image "nginx:1.25.3" already present on machine
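
    The path lookup described above can be illustrated with a minimal Python sketch. It is not kubelet or runtime code: the function name localhost_profile_path is hypothetical, and rejecting absolute paths and ".." segments is an assumption made here to keep the result inside the seccomp directory.

```python
from pathlib import PurePosixPath

# Default kubelet root directory, as stated in the text above.
KUBELET_ROOT = PurePosixPath("/var/lib/kubelet")


def localhost_profile_path(
    localhost_profile: str,
    kubelet_root: PurePosixPath = KUBELET_ROOT,
) -> PurePosixPath:
    """Return the on-disk location expected for a Localhost seccomp profile."""
    relative = PurePosixPath(localhost_profile)
    # Assumption of this sketch: only relative paths without ".." are accepted.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"invalid localhostProfile: {localhost_profile!r}")
    return kubelet_root / "seccomp" / relative


print(localhost_profile_path("nginx-1.25.3.json"))
# prints /var/lib/kubelet/seccomp/nginx-1.25.3.json
```

    The printed path is the one the error message above reports as missing, which is why the file has to exist on every node that may run the pod.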
    

    The major obstacle of having to manually distribute the Localhost profiles will lead many end-users to fall back to RuntimeDefault or even running their workloads as Unconfined (with disabled seccomp).

    CRI-O to the rescue

    The Kubernetes container runtime CRI-O provides various features using custom annotations. The v1.30 release adds support for a new set of annotations called seccomp-profile.kubernetes.cri-o.io/POD and seccomp-profile.kubernetes.cri-o.io/<CONTAINER>. Those annotations allow you to specify:

    • a seccomp profile for a specific container, when used as: seccomp-profile.kubernetes.cri-o.io/<CONTAINER> (example: seccomp-profile.kubernetes.cri-o.io/webserver: 'registry.example/example/webserver:v1')
    • a seccomp profile for every container within a pod, when used without the container name suffix but the reserved name POD: seccomp-profile.kubernetes.cri-o.io/POD
    • a seccomp profile for a whole container image, if the image itself contains the annotation seccomp-profile.kubernetes.cri-o.io/POD or seccomp-profile.kubernetes.cri-o.io/<CONTAINER>.

    CRI-O will only respect the annotation if the runtime is configured to allow it and the workload runs as Unconfined. All other workloads will still use the value from the securityContext, which takes priority.
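
    The rules above can be read as a small decision function. The following Python sketch is not CRI-O's implementation: the name select_profile_ref and its parameters are hypothetical, a pod without a seccompProfile is treated as Unconfined, and checking the container-specific key before the POD key is an assumption, because the text does not state an order.

```python
from typing import Mapping, Optional, Sequence

ANNOTATION = "seccomp-profile.kubernetes.cri-o.io"


def select_profile_ref(
    pod_annotations: Mapping[str, str],
    container_name: str,
    allowed_annotations: Sequence[str],
    seccomp_type: str = "Unconfined",
) -> Optional[str]:
    """Return the OCI reference to use for a container, or None."""
    # The annotation is ignored unless the runtime is configured to allow it.
    if ANNOTATION not in allowed_annotations:
        return None
    # Any other seccomp type keeps the value from the securityContext.
    if seccomp_type != "Unconfined":
        return None
    # Assumption: the container-specific key wins over the pod-wide POD key.
    for key in (f"{ANNOTATION}/{container_name}", f"{ANNOTATION}/POD"):
        if key in pod_annotations:
            return pod_annotations[key]
    return None


annotations = {f"{ANNOTATION}/POD": "quay.io/crio/seccomp:v2"}
print(select_profile_ref(annotations, "webserver", [ANNOTATION]))
print(select_profile_ref(annotations, "webserver", [ANNOTATION], "RuntimeDefault"))
```

    The first call resolves to the artifact reference, while the second returns None because a RuntimeDefault workload keeps its securityContext setting.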

    The annotations alone will not help much with the distribution of the profiles, but the way they can be referenced will! For example, you can now specify seccomp profiles like regular container images by using OCI artifacts:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod
      annotations:
        seccomp-profile.kubernetes.cri-o.io/POD: quay.io/crio/seccomp:v2
    spec: …
    

    The image quay.io/crio/seccomp:v2 ships a seccomp.json file, which holds the actual profile content. Tools like ORAS or Skopeo can be used to inspect the contents of the image:

    oras pull quay.io/crio/seccomp:v2
    
    Downloading 92d8ebfa89aa seccomp.json
    Downloaded 92d8ebfa89aa seccomp.json
    Pulled [registry] quay.io/crio/seccomp:v2
    Digest: sha256:f0205dac8a24394d9ddf4e48c7ac201ca7dcfea4c554f7ca27777a7f8c43ec1b
    
    jq . seccomp.json | head
    
    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "defaultErrnoRet": 38,
      "defaultErrno": "ENOSYS",
      "archMap": [
        {
          "architecture": "SCMP_ARCH_X86_64",
          "subArchitectures": [
            "SCMP_ARCH_X86",
            "SCMP_ARCH_X32"
    
    # Inspect the plain manifest of the image
    skopeo inspect --raw docker://quay.io/crio/seccomp:v2 | jq .
    
    {
      "schemaVersion": 2,
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "config": {
        "mediaType": "application/vnd.cncf.seccomp-profile.config.v1+json",
        "digest": "sha256:ca3d163bab055381827226140568f3bef7eaac187cebd76878e0b63e9e442356",
        "size": 3
      },
      "layers": [
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar",
          "digest": "sha256:92d8ebfa89aa6dd752c6443c27e412df1b568d62b4af129494d7364802b2d476",
          "size": 18853,
          "annotations": {
            "org.opencontainers.image.title": "seccomp.json"
          }
        }
      ],
      "annotations": {
        "org.opencontainers.image.created": "2024-02-26T09:03:30Z"
      }
    }
    

    The image manifest contains a reference to a specific required config media type (application/vnd.cncf.seccomp-profile.config.v1+json) and a single layer (application/vnd.oci.image.layer.v1.tar) pointing to the seccomp.json file. But now, let's give that new feature a try!
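
    The two properties of the manifest named above, the required config media type and the layer holding seccomp.json, can be checked with a few lines of Python. This is a hedged sketch and not how CRI-O validates artifacts internally; the function name profile_layer is hypothetical, and the embedded manifest is a trimmed copy of the skopeo output shown earlier.

```python
import json
from typing import Any, Dict

CONFIG_MEDIA_TYPE = "application/vnd.cncf.seccomp-profile.config.v1+json"


def profile_layer(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return the descriptor of the layer expected to hold the profile."""
    media_type = manifest.get("config", {}).get("mediaType")
    if media_type != CONFIG_MEDIA_TYPE:
        raise ValueError(f"unexpected config media type: {media_type!r}")
    layers = manifest.get("layers", [])
    if not layers:
        raise ValueError("artifact has no layers")
    # The artifact from the article carries exactly one layer.
    return layers[0]


# Trimmed copy of the manifest printed by skopeo above.
MANIFEST = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.cncf.seccomp-profile.config.v1+json",
    "digest": "sha256:ca3d163bab055381827226140568f3bef7eaac187cebd76878e0b63e9e442356",
    "size": 3
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:92d8ebfa89aa6dd752c6443c27e412df1b568d62b4af129494d7364802b2d476",
      "size": 18853,
      "annotations": {"org.opencontainers.image.title": "seccomp.json"}
    }
  ]
}
""")

layer = profile_layer(MANIFEST)
print(layer["annotations"]["org.opencontainers.image.title"], layer["size"])
# prints seccomp.json 18853
```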

    Using the annotation for a specific container or whole pod

    CRI-O needs to be configured appropriately before it can utilize the annotation. To do this, add the annotation to the allowed_annotations array for the runtime. This can be done by using a drop-in configuration /etc/crio/crio.conf.d/10-crun.conf like this:

    [crio.runtime]
    default_runtime = "crun"
    
    [crio.runtime.runtimes.crun]
    allowed_annotations = [
     "seccomp-profile.kubernetes.cri-o.io",
    ]
    

    Now, let's run CRI-O from the latest main commit. This can be done by either building it from source, using the static binary bundles or the prerelease packages.

    To demonstrate this, I ran the crio binary from my command line using a single node Kubernetes cluster via local-up-cluster.sh. Now that the cluster is up and running, let's try a pod without the annotation running as seccomp Unconfined:

    cat pod.yaml
    
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod
    spec:
      containers:
        - name: container
          image: nginx:1.25.3
          securityContext:
            seccompProfile:
              type: Unconfined
    
    kubectl apply -f pod.yaml
    

    The workload is up and running:

    kubectl get pods
    
    NAME   READY   STATUS    RESTARTS   AGE
    pod    1/1     Running   0          15s
    

    And no seccomp profile got applied if I inspect the container using crictl:

    export CONTAINER_ID=$(sudo crictl ps --name container -q)
    sudo crictl inspect $CONTAINER_ID | jq .info.runtimeSpec.linux.seccomp
    
    null
    

    Now, let's modify the pod to apply the profile quay.io/crio/seccomp:v2 to the container:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod
      annotations:
        seccomp-profile.kubernetes.cri-o.io/container: quay.io/crio/seccomp:v2
    spec:
      containers:
        - name: container
          image: nginx:1.25.3
    

    I have to delete and recreate the Pod, because only recreation will apply a new seccomp profile:

    kubectl delete pod/pod
    
    pod "pod" deleted
    
    kubectl apply -f pod.yaml
    
    pod/pod created
    

    The CRI-O logs will now indicate that the runtime pulled the artifact:

    WARN[…] Allowed annotations are specified for workload [seccomp-profile.kubernetes.cri-o.io]
    INFO[…] Found container specific seccomp profile annotation: seccomp-profile.kubernetes.cri-o.io/container=quay.io/crio/seccomp:v2 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer
    INFO[…] Pulling OCI artifact from ref: quay.io/crio/seccomp:v2 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer
    INFO[…] Retrieved OCI artifact seccomp profile of len: 18853 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer
    

    And the container is finally using the profile:

    export CONTAINER_ID=$(sudo crictl ps --name container -q)
    sudo crictl inspect $CONTAINER_ID | jq .info.runtimeSpec.linux.seccomp | head
    
    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "defaultErrnoRet": 38,
      "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
      ],
      "syscalls": [
        {
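
    The same check can be scripted instead of reading the jq output by eye. The Python sketch below walks the path .info.runtimeSpec.linux.seccomp used in the crictl commands above and reports a short summary; the function name seccomp_summary and the sample documents are hypothetical stand-ins for real crictl inspect output.

```python
import json
from typing import Any, Dict, Optional


def seccomp_summary(inspect_json: str) -> Optional[Dict[str, Any]]:
    """Summarize .info.runtimeSpec.linux.seccomp, or return None if unset."""
    node: Any = json.loads(inspect_json)
    for key in ("info", "runtimeSpec", "linux", "seccomp"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    if not isinstance(node, dict):
        return None  # corresponds to the `null` seen for the Unconfined pod
    return {
        "defaultAction": node.get("defaultAction"),
        "architectures": node.get("architectures", []),
        "syscallRules": len(node.get("syscalls", [])),
    }


# Hypothetical, heavily trimmed stand-ins for `crictl inspect` output.
confined = json.dumps({"info": {"runtimeSpec": {"linux": {"seccomp": {
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": 38,
    "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}}}}})
unconfined = json.dumps({"info": {"runtimeSpec": {"linux": {}}}})

print(seccomp_summary(confined))
print(seccomp_summary(unconfined))
```

    Returning None for the second document mirrors the null result from the earlier run without the annotation.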
    

    The same would work for every container in the pod, if users replace the /container suffix with the reserved name /POD, for example:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod
      annotations:
        seccomp-profile.kubernetes.cri-o.io/POD: quay.io/crio/seccomp:v2
    spec:
      containers:
        - name: container
          image: nginx:1.25.3
    

    Using the annotation for a container image

    While specifying seccomp profiles as OCI artifacts on certain workloads is a cool feature, the majority of end users would like to link seccomp profiles to published container images. This can be done by using a container image annotation; instead of being applied to a Kubernetes Pod, the annotation is metadata attached to the container image itself. For example, Podman can be used to add the image annotation directly during image build:

    podman build \
     --annotation seccomp-profile.kubernetes.cri-o.io=quay.io/crio/seccomp:v2 \
     -t quay.io/crio/nginx-seccomp:v2 .
    

    The pushed image then contains the annotation:

    skopeo inspect --raw docker://quay.io/crio/nginx-seccomp:v2 |
     jq '.annotations."seccomp-profile.kubernetes.cri-o.io"'
    
    "quay.io/crio/seccomp:v2"
    

    If I now use that image in a CRI-O test pod definition:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod
      # no Pod annotations set
    spec:
      containers:
        - name: container
          image: quay.io/crio/nginx-seccomp:v2
    

    Then the CRI-O logs will indicate that the image annotation got evaluated and the profile got applied:

    kubectl delete pod/pod
    
    pod "pod" deleted
    
    kubectl apply -f pod.yaml
    
    pod/pod created
    
    INFO[…] Found image specific seccomp profile annotation: seccomp-profile.kubernetes.cri-o.io=quay.io/crio/seccomp:v2 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer
    INFO[…] Pulling OCI artifact from ref: quay.io/crio/seccomp:v2 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer
    INFO[…] Retrieved OCI artifact seccomp profile of len: 18853 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer
    INFO[…] Created container 116a316cd9a11fe861dd04c43b94f45046d1ff37e2ed05a4e4194fcaab29ee63: default/pod/container id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer
    
    export CONTAINER_ID=$(sudo crictl ps --name container -q)
    sudo crictl inspect $CONTAINER_ID | jq .info.runtimeSpec.linux.seccomp | head
    
    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "defaultErrnoRet": 38,
      "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
      ],
      "syscalls": [
        {
    

    For container images, the annotation seccomp-profile.kubernetes.cri-o.io will be treated in the same way as seccomp-profile.kubernetes.cri-o.io/POD and applies to the whole pod. In addition, the feature also works when using the container-specific annotation on an image, for example if a container is named container1:

    skopeo inspect --raw docker://quay.io/crio/nginx-seccomp:v2-container |
     jq '.annotations."seccomp-profile.kubernetes.cri-o.io/container1"'
    
    "quay.io/crio/seccomp:v2"
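
    How the image annotation keys described above map to a scope can be sketched in a few lines of Python. The helper name annotation_scope is hypothetical and the code is only an illustration of the stated rule, namely that the bare key behaves like the /POD variant; it is not taken from CRI-O.

```python
from typing import Optional

ANNOTATION = "seccomp-profile.kubernetes.cri-o.io"


def annotation_scope(key: str) -> Optional[str]:
    """Map an image annotation key to "POD", a container name, or None."""
    if key == ANNOTATION:
        # On images, the bare key is treated like the /POD variant.
        return "POD"
    prefix = ANNOTATION + "/"
    if key.startswith(prefix) and len(key) > len(prefix):
        return key[len(prefix):]
    return None  # unrelated annotation


for key in (
    ANNOTATION,
    ANNOTATION + "/POD",
    ANNOTATION + "/container1",
    "org.opencontainers.image.created",
):
    print(key, "->", annotation_scope(key))
```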
    

    The cool thing about this whole feature is that users can now create seccomp profiles for specific container images and store them side by side in the same registry. Linking the images to the profiles provides great flexibility to maintain them over the application's whole life cycle.

    Pushing profiles using ORAS

    The actual creation of the OCI object that contains a seccomp profile requires a bit more work when using ORAS. I hope that tools like Podman will simplify the overall process in the future. Right now, the container registry needs to be OCI compatible, which is the case for Quay.io. CRI-O expects the seccomp profile object to have a specific config media type (application/vnd.cncf.seccomp-profile.config.v1+json), while ORAS uses application/vnd.oci.empty.v1+json by default. To achieve all of that, the following commands can be executed:

    echo "{}" > config.json
    oras push \
     --config config.json:application/vnd.cncf.seccomp-profile.config.v1+json \
     quay.io/crio/seccomp:v2 seccomp.json
    

    The resulting image contains the mediaType that CRI-O expects. ORAS pushes a single layer seccomp.json to the registry. The name of the profile does not matter much. CRI-O will pick the first layer and check if that can act as a seccomp profile.
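
    To see what the config file from the commands above turns into, the following Python sketch builds an OCI-style content descriptor (media type, SHA-256 digest, size) for the bytes that echo "{}" writes. The helper name descriptor is hypothetical, and the sketch only mirrors the descriptor fields visible in the manifest; it does not talk to a registry or call ORAS.

```python
import hashlib
from typing import Dict, Union


def descriptor(content: bytes, media_type: str) -> Dict[str, Union[str, int]]:
    """Build a content descriptor: media type, sha256 digest, and byte size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }


# `echo "{}" > config.json` writes the two braces plus a trailing newline.
config = descriptor(
    b"{}\n",
    "application/vnd.cncf.seccomp-profile.config.v1+json",
)
print(config["mediaType"])
print(config["digest"])
print(config["size"])
```

    The size of 3 bytes matches the config entry in the manifest shown earlier, and the printed digest can be compared against that entry as well.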

    Future work

    CRI-O internally manages the OCI artifacts like regular files. This makes it possible to move them around, remove them when they are no longer used, or store data other than seccomp profiles. It enables future enhancements in CRI-O on top of OCI artifacts, and also allows thinking about stacking seccomp profiles by having multiple layers in an OCI artifact. The limitation that the feature only works for Unconfined workloads in v1.30.x releases is something CRI-O would like to address in the future. Simplifying the overall user experience without compromising security seems to be the key for a successful future of seccomp in container workloads.

    The CRI-O maintainers will be happy to listen to any feedback or suggestions on the new feature! Thank you for reading this blog post. Feel free to reach out to the maintainers via the Kubernetes Slack channel #crio or create an issue in the GitHub repository.



    https://kubernetes.io/blog/2024/03/07/cri-o-seccomp-oci-artifacts/
