Defining Privileges and Access Control Settings for Pods and Containers in Kubernetes

In a recent tutorial, we discussed the Secrets API, which is designed to encode sensitive data and expose it to pods in a controlled way, enabling secrets to be encapsulated and shared between containers.

However, Secrets are only one component of pod- and container-level security in Kubernetes. Another important dimension is the security context, which lets you manage access rights, privileges, and permissions for processes and filesystems in Kubernetes.

In this tutorial, we’ll discuss how to set access rights and privileges for container processes within a pod using discretionary access control (DAC) and how to ensure proper isolation of container processes from the host using Linux capabilities. By the end of this tutorial, you’ll know how to limit the ability of containers to negatively impact your infrastructure and other containers, and how to limit user access to sensitive data and mission-critical programs in your Kubernetes environment. Let’s get started!

Defining Security Context

A security context can be defined as a set of constraints applied to a container in order to achieve the following goals:

  • Enforce clear isolation between a container and the host/node it runs on. Many container users underestimate this task and assume that containers are isolated from their hosts as thoroughly as virtual machines (VMs). The reality is different. Privileged processes (e.g., those running as root) in a container are identical to privileged processes that run on the host. Therefore, running an application in a container does not by itself isolate it from the host. Running containers as root can cause serious problems if Docker images from untrusted sources are used.
  • Prevent containers from negatively impacting the infrastructure or other containers.

These basic goals necessitate the following best practices for using security contexts in Kubernetes:

  • Drop process privileges in containers as early as possible, and be aware of any privileges that remain.
  • Run services as non-root whenever possible.
  • Don’t use random Docker images in your system.

Security contexts in Kubernetes make these practices easier to implement and help protect your system against various security risks. Below, we’ll discuss how to achieve the goals outlined above by using PodSecurityContext and SecurityContext in your pods and containers.

Tutorial

To complete examples in this tutorial, you’ll need:

  • A running Kubernetes cluster. See Supergiant documentation for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • The kubectl command-line tool installed and configured to communicate with the cluster. See the official Kubernetes documentation for kubectl installation instructions.

Using Security Contexts in Pods and Containers

Security context settings implement the basic philosophy of discretionary access control (DAC). This is a type of access control in which a given user has complete control over all the files and programs it owns and executes. This user can also determine the permissions other users have for accessing and modifying these files or programs. DAC contrasts with mandatory access control (MAC), in which the operating system (OS) constrains the ability of a subject or initiator (e.g., a process) to access or perform operations on computing objects (e.g., files).

In Kubernetes, using DAC implies that you, as a user or administrator, can set access and permission constraints on files and processes run in your pods and containers. Security contexts can be specified for entire pods and/or for individual containers.

Let’s start with the pod-level security context. To specify security settings for a pod, you need to include the securityContext field in the pod manifest. This field is a PodSecurityContext object that holds the pod-level security settings in the Kubernetes API. Let’s create a pod with a security context. The pod runs a simple Node.js application that we wrote and published to a public Docker Hub repository.
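A minimal manifest consistent with the settings discussed in this section might look like the following sketch. The UID 2500, the fsGroup 2000, and the emptyDir volume mounted at /data/test come from the text; the pod, container, and volume names, the image reference (a placeholder standing in for the Node.js image mentioned above), and the choice of allowPrivilegeEscalation as the container-level setting are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-pod          # hypothetical pod name
spec:
  securityContext:                    # pod-level PodSecurityContext
    runAsUser: 2500                   # UID for the processes of all containers
    fsGroup: 2000                     # GID applied to supported volumes
  volumes:
  - name: security-context-vol        # hypothetical volume name
    emptyDir: {}
  containers:
  - name: security-context-cont       # hypothetical container name
    image: example/node-app:latest    # placeholder; substitute your own image
    volumeMounts:
    - name: security-context-vol
      mountPath: /data/test
    securityContext:                  # container-level SecurityContext (assumed field)
      allowPrivilegeEscalation: false
```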

As you can see, we have two security contexts in this pod. The first one is a pod-level security context defined by the PodSecurityContext object, and the second one is a SecurityContext defined for the individual container. The pod-level security context applies to all containers in the pod, but field values of container.securityContext take precedence over field values of PodSecurityContext. In other words, when the same field is set at both levels, the container-level value overrides the pod-level value.

You now have a basic understanding of how security contexts work, so let’s discuss key settings available for the PodSecurityContext :

.spec.securityContext.runAsUser: This field specifies the user ID (UID) with which to run the entrypoint (the default executable of the image) of the container process. If the field is not specified, it defaults to the UID defined in the image metadata. The field can also be set in spec.containers[].securityContext, in which case it takes precedence over the same field in the PodSecurityContext. In our example, the field specifies that the container process of every container in the pod runs with user ID 2500.

.spec.securityContext.fsGroup: This field defines a special supplemental group whose group ID (GID) applies to all containers in the pod. In our example, this GID is also associated with the emptyDir volume mounted at /data/test and with any files created in that volume. Remember that only certain volume types allow the kubelet to change the ownership of a volume so that it is owned by the pod. If the volume type allows this (as emptyDir does), the owning GID will be the fsGroup.

.spec.securityContext.runAsGroup: This field specifies the primary group ID (GID) with which to run the entrypoint of the container process. If the field is not set, the runtime default is used. If the field is set in both SecurityContext and PodSecurityContext, the value specified in the container’s SecurityContext takes precedence over the one specified in the PodSecurityContext.

.spec.securityContext.runAsNonRoot: This field determines whether the pod’s containers must run as a non-root user. If set to true, the kubelet validates the image at runtime to make sure that it does not run as UID 0 (root) and refuses to start the container if it does. If the field is set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. This field is very important for preventing privileged processes in containers from accessing the system and the host.
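The four pod-level fields described above can be combined in a single securityContext block. The sketch below is a separate illustration, not the tutorial pod: the names, the image reference, and the UID/GID values are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-fields-demo         # hypothetical pod name
spec:
  securityContext:
    runAsNonRoot: true                # kubelet refuses to start containers that would run as UID 0
    runAsUser: 1000                   # UID for container processes (illustrative value)
    runAsGroup: 3000                  # primary GID for container processes (illustrative value)
    fsGroup: 2000                     # supplemental GID, also applied to supported volumes
  containers:
  - name: demo-cont                   # hypothetical container name
    image: example/node-app:latest    # placeholder; substitute your own image
```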

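Now that you understand the key options for PodSecurityContext, save the pod manifest for this example (the one that sets runAsUser to 2500) in security-context-demo.yaml and create the pod by running kubectl create -f security-context-demo.yaml.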

Now, verify that the pod is running with kubectl get pod <pod-name>, where <pod-name> is the metadata.name value from your manifest.

Next, we will check the ownership of processes run within the Node.js container. First, get a shell to the running container with kubectl exec -it <pod-name> -- sh.

Inside the container, list all running processes with ps aux.

The output should show that all processes in the container are run by UID 2500, as expected.

Remember that we set the GID for all containers and volumes in our pod? Let’s check how it worked out. Go to the /data directory under the container’s filesystem root and list the permissions of the test directory inside it by running cd /data followed by ls -l.

The output should show that the /data/test directory has group ID 2000, which is the value of fsGroup.

All new files and directories created in the volume should also receive the GID defined by fsGroup. To check this, create a file in the test directory, for example by running cd test followed by echo hello > demofile.

Now, check the file’s ownership with ls -l.

The output should show that demofile has group ID 2000, which is the value of fsGroup. As simple as that!

Overriding Pod Security Context in the Container

As we’ve already mentioned, a container’s SecurityContext  takes precedence over the PodSecurityContext . Therefore, you can set a pod-level security context for all containers in the pod and override it if needed by modifying a SecurityContext  for individual containers. Let’s create a new pod to see how this works:
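A manifest for such a pod might look like the sketch below. The UID values 3000 (pod level) and 2000 (container level) follow the discussion in this section; the pod and container names and the image reference are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: override-security-pod         # hypothetical pod name
spec:
  securityContext:
    runAsUser: 3000                   # pod-level UID
  containers:
  - name: override-security-cont      # hypothetical container name
    image: example/node-app:latest    # placeholder; substitute your own image
    securityContext:
      runAsUser: 2000                 # container-level UID; overrides the pod-level value
      allowPrivilegeEscalation: false
```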

This pod runs a container with the same Docker image as in the example above, but this time the UID to run the process with is specified both for the pod and for the container inside it.

Before creating this Pod, let’s discuss key options available in the container’s SecurityContext :

.spec.containers[].securityContext.runAsUser: The same as in the PodSecurityContext.

.spec.containers[].securityContext.runAsGroup: The same as in the PodSecurityContext.

.spec.containers[].securityContext.runAsNonRoot: The same as in the PodSecurityContext.

.spec.containers[].securityContext.allowPrivilegeEscalation: This field controls whether a process can gain more privileges than its parent process. More specifically, it controls whether the no_new_privs flag will be set on the container process. AllowPrivilegeEscalation is always true when the container (1) is run as privileged or (2) has the CAP_SYS_ADMIN Linux capability enabled.

.spec.containers[].securityContext.privileged: This field tells the kubelet to run the container in privileged mode. Processes in privileged containers are essentially identical to root processes on the host. The default value is false.

.spec.containers[].securityContext.readOnlyRootFilesystem: This field defines whether a container has a read-only root filesystem. The default value is false.

.spec.containers[].securityContext.seLinuxOptions: The SELinux context to be applied to the container. If the value is unspecified, the container runtime (e.g., Docker) will assign a random SELinux context for each container in a pod. If the value is set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.
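To show how the container-level options above fit together, here is a hedged sketch of a container that sets several of them at once. It is a separate illustration, not the tutorial pod: the names, the image reference, the UID, and the SELinux level are hypothetical, and seLinuxOptions only has an effect on nodes where SELinux is enabled.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: container-fields-demo         # hypothetical pod name
spec:
  containers:
  - name: demo-cont                   # hypothetical container name
    image: example/node-app:latest    # placeholder; substitute your own image
    securityContext:
      runAsNonRoot: true              # refuse to start if the process would run as UID 0
      runAsUser: 2000                 # illustrative UID
      allowPrivilegeEscalation: false # sets no_new_privs on the container process
      privileged: false               # the default, shown explicitly
      readOnlyRootFilesystem: true    # the application must write only to mounted volumes
      seLinuxOptions:
        level: "s0:c123,c456"         # illustrative SELinux level
```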

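Save the pod spec for this example (the one that sets runAsUser for both the pod and the container) in override-security-demo.yaml and create the pod by running kubectl create -f override-security-demo.yaml.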

Next, verify that the pod is running with kubectl get pod <pod-name>, where <pod-name> is the metadata.name value from your manifest.

Then, as in the first example, get a shell to the running container with kubectl exec -it <pod-name> -- sh to check the ownership of container processes.

Inside the container, show the list of running processes with ps aux.

The output should show that all the processes run with UID 2000, which is the value of runAsUser specified for the container. It overrides the UID value 3000 specified for the pod.

Using Linux Capabilities

If you want fine-grained control over process privileges, you can use Linux capabilities. To understand how they work, we need a basic introduction to Unix/Linux processes. In a nutshell, traditional Unix implementations have two classes of processes: (1) privileged processes (whose user ID is 0, referred to as root or superuser) and (2) unprivileged processes (which have a non-zero UID).

In contrast to privileged processes, which bypass all kernel permission checks, unprivileged processes are subject to full permission checking based on the process’s credentials (such as effective UID, GID, and supplementary group list). Starting with kernel 2.2, Linux divides the privileges traditionally associated with the superuser into distinct units known as capabilities. These units can be independently enabled and disabled, so a process can be given a specific root privilege without receiving the rest. Kubernetes users can use Linux capabilities to grant certain privileges to a process without giving it all the privileges of the root user. This improves container isolation from the host, since containers no longer need to run as root: you grant them only the specific root privileges they need.

To add or remove Linux capabilities for a container, you can include the capabilities  field in the securityContext section of the container manifest. Let’s see an example:
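A manifest that adds a single capability might look like the following sketch. The NET_ADMIN capability matches the one discussed in this section; the pod and container names and the image reference are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: linux-cpb-demo                # hypothetical pod name
spec:
  containers:
  - name: linux-cpb-cont              # hypothetical container name
    image: example/node-app:latest    # placeholder; substitute your own image
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]            # grants CAP_NET_ADMIN to the container
```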

In this example, we assigned a CAP_NET_ADMIN  capability to the container. This Linux capability allows a process to perform various network-related operations such as interface configuration, administration of IP firewall, modifying routing tables, enabling multicasting, etc. For the full list of available capabilities, see the official Linux documentation.

Note: Linux capabilities have the form CAP_XXX. However, when you list capabilities in your container manifest, you must omit the CAP_ part of the constant. For example, to add the CAP_NET_ADMIN capability, include NET_ADMIN in your list of capabilities.
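The same naming rule applies when removing capabilities. As a sketch of a least-privilege setup, the container below drops every capability and then adds back only the one it needs; the names and the image reference are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: drop-caps-demo                # hypothetical pod name
spec:
  containers:
  - name: drop-caps-cont              # hypothetical container name
    image: example/node-app:latest    # placeholder; substitute your own image
    securityContext:
      capabilities:
        drop: ["ALL"]                 # remove all capabilities first
        add: ["NET_ADMIN"]            # CAP_NET_ADMIN, written without the CAP_ prefix
```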

Cleaning Up

Now that this tutorial is over, let’s clean up after ourselves.

Don’t forget to delete all the pods you created by running kubectl delete pod <pod-name> for each of them.

Also, you may wish to delete all files with the pod manifests if you don’t need them anymore.

Conclusion

In this article, we have discussed how to use Kubernetes security contexts in your pods and containers. Security contexts are a powerful tool for controlling the access rights and privileges of processes running in a pod’s containers. Kubernetes allows you to set a pod-level security context for all containers and to override it for individual containers using the container’s securityContext field.

Kubernetes security contexts are also helpful if you want to isolate container processes from the host. In particular, you learned how to use Linux capabilities to grant processes only the specific root privileges they need, so that they can run as non-root. All these features make security contexts a powerful complement to Kubernetes Secrets, improving the security of your Kubernetes applications and the isolation of the container environment from other users and the underlying nodes.