Kubernetes Security Blog | RAD Security

What is Defense in Depth & How Is it Implemented? KSPM Best Practices Part III

Written by Jimmy Mesta | Aug 22, 2023 4:01:14 PM

Intro

This article continues our in-depth look at Kubernetes Security Posture Management (KSPM). In the first post of this series, we discussed basic Kubernetes hardening with KSPM to protect against the most common attack vectors. In the second post, we examined ways to augment those measures so that you can detect and respond to actual incidents and emerging threats in Kubernetes. In this post, we examine Kubernetes security posture measures that provide “defense-in-depth” for your cluster.

 

What is defense in depth?

Defense-in-depth is a cyber security concept adapted from military strategy. A defending force doesn’t amass all its defenses in one place or along a single line; it develops “layers” of defense designed to slow down an attacking force and make forward advance too difficult or costly to continue. In a cyber security context, we can think of defense-in-depth similarly: our goal is to slow down a cyber adversary, making them work hard for each new “exploit” they might attempt in our environment. The more difficult it is for an adversary to maneuver in our environment, the more likely we are to detect, contain, and evict them before they can do real damage. The measures outlined in this post are designed for exactly that purpose: slow the adversary down and give defenders the maximum chance to respond.

 

Common Kubernetes Misconfigurations

 

Protect Your East-West Traffic

While a Kubernetes cluster presents a logical security boundary, it’s important to remember that a cluster is an abstraction over a collection of hosts and the network topologies linking them together. This means that the attack surface for your cluster and the workloads running on it is not limited to the exposed interfaces of those workloads or the Kubernetes API; it also includes the attack surfaces of the underlying hosts and networks on which they run. These present both potential initial access vectors and opportunities for lateral movement. We can substantially decrease this attack surface by using a service mesh to (a) encrypt traffic between services in our cluster (protecting against man-in-the-middle and replay attacks), (b) mutually authenticate those services to one another (protecting against spoofing attacks), and (c) define limits on which services can communicate with one another (cutting down the potential avenues for lateral movement). A service mesh also gives us another window of visibility into the traffic across our cluster, improving our odds of spotting attempts at lateral movement.

Crawl: Go ahead and install that service mesh: The basic deployment of a service mesh will generally bring you encrypted East-West traffic and mutual authentication right out of the box. This isn’t quite as easy as hitting deploy: the services running on your cluster may need some tweaking to work well with the service mesh, but the mesh itself shouldn’t need any modification to bring you these benefits.

Walk: Collect service mesh logs: One benefit of the service mesh is that it provides network log visibility for your cluster. This can be invaluable both in real-time detection and in investigating incidents (i.e., the responsiveness we discussed in the last post).

Run: Require apps to define/restrict network connections: The real defense-in-depth benefit from a service mesh comes from using the mesh to restrict allowed network connections on an app-by-app or service-by-service basis. This means that a compromised service can only establish connections with a defined list of other services, limiting the blast radius and the opportunities an attacker has for lateral movement. It also increases the likelihood of detection by creating lots of opportunities for attackers to get it wrong, trying to access services they aren’t allowed to touch and generating noise in the process.
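To make the "Run" step concrete, here is a minimal sketch of the default-deny logic a mesh authorization policy expresses. It is not the configuration format of any particular service mesh; the service names and the `ALLOWED_CONNECTIONS` table are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical allow-list: each service maps to the services it may call.
ALLOWED_CONNECTIONS: Dict[str, Set[str]] = {
    "frontend": {"orders", "catalog"},
    "orders": {"payments", "orders-db"},
    "catalog": {"catalog-db"},
}


def is_connection_allowed(source: str, destination: str,
                          policy: Dict[str, Set[str]] = ALLOWED_CONNECTIONS) -> bool:
    """Default-deny: a connection is allowed only if it is explicitly listed."""
    return destination in policy.get(source, set())


def denied_attempts(observed: Iterable[Tuple[str, str]],
                    policy: Dict[str, Set[str]] = ALLOWED_CONNECTIONS) -> List[Tuple[str, str]]:
    """Return the observed (source, destination) pairs the policy would block."""
    return [(s, d) for s, d in observed if not is_connection_allowed(s, d, policy)]
```

Each entry returned by `denied_attempts` is exactly the kind of "noise" described above: a compromised `catalog` service probing `payments` would show up here even though the connection itself fails.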

 

Protect Key Configuration Files

Kubernetes orchestrates workloads by maintaining a list of API objects representing the “desired state” of your cluster. Then, by constantly comparing that desired state with your cluster’s actual state, it “orchestrates” the various systems under its control (container runtimes, networking, etc.) to achieve (or at least move closer to) that desired state. Kubernetes uses various configuration files on both control plane and worker nodes to link the cluster together and manage the underlying systems it is orchestrating. Protecting those files from tampering is therefore an important defense-in-depth move: it makes it harder for an attacker who already has a foothold in your cluster to escalate their privileges or modify the cluster’s intended state or behavior. The recommended way to do this is to restrict write access on those critical configuration files to the root user.

Crawl: Manually harden critical files: You can literally do this by hand on each node, or you can use a configuration management system like Ansible to apply this hardening across your whole cluster.

Walk: Use hardened node images: Move the process of hardening these files back a layer by baking the hardening of critical files into your image generation process. This ensures the files are hardened from the start when new nodes are deployed.

Run: Set up monitoring for attempts to modify critical files: A clumsy attacker who attempts to modify these files should be easy to detect. A more clever attacker will notice the restricted permissions first and attempt an escalation to root. Ideally, a host-based detection system (e.g., an EDR) will detect either.
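As a rough illustration of the "Crawl" step, the sketch below checks that a file is owned by root and is not writable by group or others. It assumes a Linux node. The paths in `EXAMPLE_PATHS` are illustrative assumptions based on a typical kubeadm layout; the files that matter on your nodes depend on how the cluster was provisioned.

```python
import os
import stat
from typing import Iterable, List

# Assumed example paths (typical for kubeadm); adjust for your distribution.
EXAMPLE_PATHS = [
    "/etc/kubernetes/manifests/kube-apiserver.yaml",
    "/etc/kubernetes/admin.conf",
    "/var/lib/kubelet/config.yaml",
]


def is_write_restricted(path: str, owner_uid: int = 0) -> bool:
    """True if `path` is owned by `owner_uid` and not writable by group or others."""
    info = os.stat(path)
    writable_by_non_owner = info.st_mode & (stat.S_IWGRP | stat.S_IWOTH)
    return info.st_uid == owner_uid and not writable_by_non_owner


def find_unrestricted(paths: Iterable[str], owner_uid: int = 0) -> List[str]:
    """Return the existing paths that fail the check; missing files are skipped."""
    return [p for p in paths
            if os.path.exists(p) and not is_write_restricted(p, owner_uid)]
```

Running `find_unrestricted(EXAMPLE_PATHS)` on a node returns the files that still need hardening. The same check could be run from a configuration management tool or baked into an image build.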

 

How KSOC can help

This kind of issue will be picked up by KSOC’s Detection & Response capabilities, which combine runtime events with other Kubernetes risks to identify active exploitation.  

[Image: example of a rule that would pick up this kind of runtime event]

 

Kubernetes RBAC

 

Use an External Secret Store

The most common end goal of an attacker is to access sensitive information. Whether they intend to lock up that sensitive information (i.e., a ransomware attack) or steal it (a data disclosure or theft incident), they ultimately need access to the information first. An important way you can further insulate the data in your cluster from such access is to ensure the keys, passwords, and certificates that protect that data are themselves protected. Kubernetes has a native object for “secrets” of this kind to prevent them from being committed openly in your code. However, by default those Secret objects are only base64-encoded, not encrypted, so they are still effectively available in plaintext in the cluster, and you need to take additional steps to protect them from prying eyes.

Crawl: Use K8s Secrets: The first step is to actually use Kubernetes Secret objects for storing secrets rather than simply writing those secret values into a ConfigMap or other configuration file. This might sound like it doesn’t need to be said; we promise you it does.

Walk: Protect secrets through encryption and RBAC: You can use RBAC to restrict which Kubernetes users can view and/or modify secrets (ideally, there should be no modification because you’re using GitOps). However, remember (see “Protect Key Configuration Files” above) that Kubernetes is only a logical security boundary. If your secrets aren’t encrypted, a user on the host can simply cat the files and view the contents. So you should also encrypt those secrets in the cluster to ensure your RBAC rules hold water.

Run: Use an external secrets operator: Even if you are encrypting and restricting access to secrets, they have to get into your cluster somehow. This provides another possible line of attack to extract your secrets. A further step you can take to defend secrets in your cluster is to store them in a secure secrets backend (the best-known option probably being HashiCorp’s Vault). Then you can use an operator to inject those (encrypted) values into your (RBAC-protected) Secret objects in the cluster.
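For the "Crawl" step, one way to find secrets that ended up in a ConfigMap is a simple name-based scan. The sketch below is a heuristic only: the `SUSPECT_KEY` pattern is a hypothetical starting point, it inspects key names rather than values, and it will produce both false positives and false negatives.

```python
import re
from typing import List

# Hypothetical heuristic: key names that often indicate a credential.
SUSPECT_KEY = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|private[_-]?key)",
    re.IGNORECASE,
)


def suspect_configmap_keys(manifest: dict) -> List[str]:
    """Return keys in a ConfigMap's `data` whose names look like credentials."""
    if manifest.get("kind") != "ConfigMap":
        return []
    data = manifest.get("data") or {}
    return sorted(key for key in data if SUSPECT_KEY.search(key))
```

The manifest is expected as a dictionary, for example one parsed from JSON produced by `kubectl get configmap <name> -o json`.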

 

How KSOC can help: RBAC for secrets

With KSOC, in seconds you can get a view of RBAC permissions, organized by ‘who can access secrets’ (and other top of mind questions like ‘who can exec into pods’).
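Independently of any product, the "who can access secrets" question can be approximated from the RBAC objects themselves. The sketch below is not how KSOC implements it; it assumes Role or ClusterRole objects already loaded as dictionaries (for example from `kubectl get clusterroles -o json`) and relies only on the standard `apiGroups`, `resources`, and `verbs` fields of RBAC rules.

```python
from typing import List

# Verbs that let a subject read secret contents ("*" matches every verb).
READ_VERBS = {"get", "list", "watch", "*"}


def rule_grants_secret_read(rule: dict) -> bool:
    """True if a single RBAC rule allows reading Secrets in the core API group."""
    groups = rule.get("apiGroups", [])
    resources = rule.get("resources", [])
    verbs = rule.get("verbs", [])
    in_core_group = "" in groups or "*" in groups
    names_secrets = "secrets" in resources or "*" in resources
    return in_core_group and names_secrets and bool(READ_VERBS & set(verbs))


def roles_with_secret_access(roles: List[dict]) -> List[str]:
    """Return the names of roles with at least one rule that can read Secrets."""
    return [role["metadata"]["name"] for role in roles
            if any(rule_grants_secret_read(r) for r in role.get("rules") or [])]
```

This only answers which roles grant the permission. Finding out who holds it also requires following the RoleBindings and ClusterRoleBindings that reference those roles, which this sketch leaves out.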

In KSOC, you can also set policies to receive alerts and notifications when RBAC is over-permissioned to allow access to secrets.

[Images: example of a violated policy limiting RBAC permissions, followed by the detailed information to help with remediation]

 

Don’t Use Default Service Accounts

If you don’t specify a service account when creating a deployment, Kubernetes will run it under the “default” service account of its namespace. The security problem this presents is that, if every deployment is done this way, there is one service account to rule them all. This means that an attacker who successfully gets access to the service account for one deployment now has the same access as every workload sharing that account, and any permission granted to it for one workload’s benefit applies to all of them. That’s not good! Instead, each deployment should specify its own service account so that the blast radius of a compromised service account is as small as possible.

Crawl: Take an audit of service accounts: If it has not already been your practice to use separate service accounts for different workloads, you’ll want to start by performing an audit. This may help you identify the highest priorities for remediation.

Walk: Use your admission controller to enforce a rule: Having a defined service account is something you can check for with your admission controller. This prevents new workloads from being deployed with the default service account.

Run: Create a CI/CD check: The thing about using an admission controller as the only enforcement point is that it’s very annoying for developers to be blocked during deployment. Shifting left by providing a CI/CD check for the default service account helps developers address this problem well before it’s time to deploy.
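A CI/CD check of this kind can be quite small. The sketch below assumes Deployment manifests already parsed into dictionaries (for example from JSON manifests, which Kubernetes accepts alongside YAML) and reads the standard `spec.template.spec.serviceAccountName` field. The function names are illustrative.

```python
from typing import List


def service_account_of(manifest: dict) -> str:
    """Return the service account a Deployment's pods will run under."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    # When the field is absent or empty, Kubernetes falls back to "default".
    return pod_spec.get("serviceAccountName") or "default"


def uses_default_service_account(manifest: dict) -> bool:
    """True if the workload would run as the namespace's default service account."""
    return service_account_of(manifest) == "default"


def failing_manifests(manifests: List[dict]) -> List[str]:
    """Return the names of manifests that should fail the CI check."""
    return [m.get("metadata", {}).get("name", "<unnamed>")
            for m in manifests if uses_default_service_account(m)]
```

A pipeline step could load every manifest in the repository, call `failing_manifests`, and exit non-zero when the list is not empty, so developers see the problem in their pull request rather than at admission time.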

How KSOC can help with default service accounts 

KSOC admission control can be used to enforce a policy that prevents workloads from being deployed with a default service account. For example, an admission control policy can detect whether a default service account is used in a role binding.

[Image: example admission control policy that detects a default service account in a role binding]

 

Don’t Use Default Namespaces

As with service accounts, if no namespace is specified, Kubernetes will put deployed resources into the default namespace. The implications of this one are a bit more nuanced, however. If all workloads are deployed in the default namespace, users with roles provisioned in the default namespace may have access to all your workloads, which is less than ideal from a defense-in-depth perspective. There are two ways to address this.

First, you can add additional restrictions to roles limiting which resources they can access. This is a viable strategy for maintaining RBAC, but it can be tedious to maintain the right lists of restrictions in those roles. 

A second option is to deploy workloads in different namespaces. This effectively forces the deployment of additional roles on a namespace-by-namespace (or workload-by-workload) basis, but those roles might be easier to write. However, the namespace boundary is a somewhat permeable one. For example, some cluster resources (e.g., Nodes) always exist globally across the cluster and cannot be “namespaced.” This is because namespaces in Kubernetes were not meant to function as a true security boundary but rather as a way of avoiding naming collisions. For example, if every app you deploy has a mysql database service, separating deployments into different namespaces prevents failures due to two teams both naming theirs simply mysql. Using namespaces to create a security boundary works to a point (with the caveats we’ve given) and is a reasonable defense-in-depth strategy.

That said, if you need iron-clad divisions between certain workloads, you’re going to want to run them in separate clusters, not just separate namespaces. Our recommendations here are basically the same as we gave regarding service accounts above:

Crawl: Take an audit of namespaces: This is also a good opportunity to identify the security boundaries you want to enforce.

Walk: Use your admission controller to enforce a rule: See above.

Run: Create a CI/CD check: See above.
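The namespace audit in the "Crawl" step can be sketched in the same spirit. The example below assumes a list of workload objects as dictionaries, such as the `items` array from `kubectl get deployments --all-namespaces -o json`, and simply counts workloads per namespace and lists those in `default`.

```python
from collections import Counter
from typing import List


def namespace_of(item: dict) -> str:
    """Return an object's namespace.

    Objects read from the API always carry a namespace. A manifest that omits
    it is treated here as "default", which is where it lands unless the
    deploying context selects another namespace.
    """
    return item.get("metadata", {}).get("namespace") or "default"


def workloads_per_namespace(items: List[dict]) -> Counter:
    """Count workloads by namespace to show where they are concentrated."""
    return Counter(namespace_of(item) for item in items)


def workloads_in_default(items: List[dict]) -> List[str]:
    """Return the names of workloads that live in the default namespace."""
    return [item["metadata"]["name"] for item in items
            if namespace_of(item) == "default"]
```

The per-namespace counts are a quick way to spot the boundaries you already have, and the `default` list is the backlog of workloads to move.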

 

How KSOC can help with default namespaces

KSOC RBAC Explorer provides an easy way to audit namespaces for RBAC permissions. Simply find the namespace you want to look at to see the kinds of RBAC permissions available in it.

[Images: choosing a namespace in RBAC Explorer, then the resources and permissions associated with that namespace]

With KSOC, you can also use GitHub Actions to enforce the use of namespaces as part of a CI check.

 

Container Runtime Security

 

Pull Images Only From Trusted Registries

Kubernetes runs workloads in containers, including many of the “workloads” that make up the cluster’s own control plane. While some things can be done cluster-wide to secure these containers (such as not running containers in “privileged” mode), ultimately how secure any given workload is will largely depend on how secure its underlying container image is. Trusting the security of those container images is a supply chain problem, and solving it largely requires trusting the sources of those container images. This becomes a defense in depth measure because it is much harder for an attacker to deploy a malicious container into your cluster if that container must originate from particular places (especially if those sources are internal to your organization). Setting limits on what registries your cluster can pull images from substantially reduces an adversary’s options.

Crawl: Identify the registries you trust: The prerequisite to establishing limits is knowing what those limits are going to be. This may require auditing existing images in your cluster and their sources.

Walk: Verify image signatures on admission: While not a direct limitation on the registries images can be sourced from, this does limit the images from those registries that can be pulled to ones that are validly signed. Requiring that images be signed and then verifying those signatures on admission cuts out certain classes of supply chain attacks, but it can be “defeated” if the malicious code an attacker wants is itself signed (or included in a validly signed image).

Run: Limit registries via admission control or network controls: You can write an admission control rule that checks which registries images are pulled from. This could either (a) block images from certain “untrusted” registries or (b) only allow images from certain trusted registries (the preferable state). The regex for these rules can be tricky, so another approach (particularly when allow-listing) can be to use network controls (e.g., a hosts file or an egress proxy) to limit what registries the cluster can connect with.
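Because the regex for registry rules can be tricky, one alternative is to parse the registry host out of the image reference and compare it exactly against an allow-list. The sketch below follows the common container image naming convention: the first path component is treated as a registry only if it contains a dot or a colon or is `localhost`, and otherwise Docker Hub (`docker.io`) is implied. The `TRUSTED_REGISTRIES` set is a hypothetical example, and this is not an admission controller by itself, only the decision logic one might apply.

```python
from typing import List, Set

# Hypothetical allow-list of registry hosts.
TRUSTED_REGISTRIES: Set[str] = {"registry.example.com", "ghcr.io"}


def registry_of(image: str) -> str:
    """Extract the registry host from an image reference."""
    first, sep, _rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    # No explicit registry: the reference resolves to Docker Hub.
    return "docker.io"


def untrusted_images(manifest: dict,
                     trusted: Set[str] = TRUSTED_REGISTRIES) -> List[str]:
    """Return images in a Deployment-style manifest that come from other registries."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [c["image"] for c in containers if registry_of(c["image"]) not in trusted]
```

Comparing the whole host avoids a classic allow-list mistake: an unanchored pattern for `registry.example.com` would also match `registry.example.com.evil.io`, whereas the exact comparison here rejects it.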

How KSOC can help 

KSOC admission control can limit the usage of unauthorized registries, for example with a Rego policy, compatible with OPA, that controls the registry used.

[Image: example Rego policy controlling the registry used]

 

Defense in depth using KSOC threat vectors

Threat vectors and automated risk triage are the ultimate defense-in-depth tools for Kubernetes environments, allowing you to get an understanding of risk across runtime, RBAC, image CVEs, Kubernetes misconfigurations, the cloud environment, and more, all in one view. This gives you clear guidance on how to reduce your blast radius and slow down or altogether stop an attacker.

For the Protect Key Configuration Files guidance above, threat vectors would show runtime alerts around privilege escalation attempts, or attempts to modify critical files in combination with the presence of other risks, like a manifest misconfiguration. So it would be very easy to identify where this has potentially happened in your environment.

For the External Secret Store guidance above, a threat vector can show that a cluster role with access to secrets is exposed and clearly identify the associated pod.

[Images: threat vector showing an exposed cluster role with access to secrets, and the associated pod]

Conclusion

Defense in depth with KSPM clearly involves Kubernetes hardening, as well as elements of detection and response for Kubernetes. Combined with admission control and policy changes through GitHub Actions in CI, threat vectors and automated risk triage allow you to address Kubernetes security at scale, turning an overwhelming security task into a manageable one so you can cover your KSPM bases. For more KSPM guidance, we recommend learning about how real-time KSPM makes configuration findings actionable through a connection to the Kubernetes lifecycle.

 

To discuss implementing defense in depth with KSPM, contact us for a demo today!