Stuart Barnett
Cloud Architect Lead
Well, there are probably better opening lines at parties (remember them?) – but it’s hard to disagree with the success of the Kubernetes platform in the modern IT landscape (and likely to cause fewer disagreements than debating the headline of this blog). Since Google first open-sourced it in 2014, Kubernetes has gone from strength to strength, becoming the de facto container orchestration platform of choice, available in multiple distributions and supported by all of the major CSPs.
Its flexible resource model has seen the ecosystem expand with a dizzying array of tools and custom resources – indeed there’s not much you can’t do with k8s these days – but then, that’s part of the problem. The open nature of the platform means that, out of the box, it’s not very proscriptive – you want to pull in a random container image from the interweb? Help yourself. A pod that can read settings in another pod? Sure, why not. A pod that accesses the node file system? Yes, ok then. A pod that can elevate its own privileges and do what it darn well pleases, even spin up other processes…? – well, you get the idea. It’s the kind of thing that is the stuff of nightmares to platform operators and CISOs alike. (Note – there may be very valid reasons for some special workloads to have these privileges – e.g. monitoring DaemonSets, cluster processes and the like – but generally not tenant workloads, i.e. your teams’ apps).
As we move to a world where organizations may be operating many, many clusters across multiple cloud environments, with hundreds of different tenants deploying thousands of pods, not knowing and controlling what these workloads might be capable of doing becomes a very big problem indeed. You can mitigate some of these issues with Pod Security Policies*, Network Policies etc. (you could start by reading this post by my colleague Josh Hill), but it can all get rather complicated very quickly. Ideally, you want to be working alongside the security pros in your org to make sure they’re happy with what workloads are allowed to do on your clusters. And you want those controls to be in place across all of your clusters as you spin them up – which will hopefully stop your InfoSec team wanting to shut them down once they hear about another k8s CVE…
[* STOP PRESS – as of Kubernetes 1.21 – PodSecurityPolicies will be deprecated and subsequently removed – so we’d better start looking at some replacements!]
So how do you manage a platform that allows developers to make the most of all the advantages of Kubernetes, whilst ensuring we have the necessary guard rails in place; how can we ensure we give visibility to our DevSecOps colleagues of what controls we have in place and keep the CISOs happy? The answer is – use Kubernetes! Specifically, by utilizing a policy engine.
Yes, many of us with an ill-spent youth may have fallen victim to the enforcement of a dress code as an admission policy (e.g. “sorry, no trainers”, normally translated as “you’re not cool/rich/beautiful/sober enough to come in here tonight”. But I digress…).
In Kubernetes, we can implement a similar mechanism by “looking” at how a pod is specified and intends to act before deciding whether or not to create it within our cluster. Because everything in Kubernetes is declarative – i.e. all k8s objects can be defined as YAML – any pod that we wish to create is declared in a pod specification, i.e. the YAML that defines the pod. Creating an object like a pod means sending that declared configuration to the Kubernetes API – but before anything is created, the request typically has to pass through a series of standard Kubernetes admission controllers, which check it for compliance against criteria specified in the cluster configuration.
We can extend this behaviour by using an admission webhook to check the pod specification against a set of custom constraints before any deployment takes place – if these are violated, we can log the reasons and block the creation of the non-compliant pod – hence we can subject all pods being deployed to our desired controls.
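To make that a little more concrete, here is a minimal Python sketch of the decision a validating webhook makes. It is not a full webhook server, and the single rule it checks (containers must explicitly disable privilege escalation) is just an illustrative choice. The function names violations and review are made up for this example; the request and response fields follow the AdmissionReview objects that the Kubernetes API server exchanges with webhooks.

```python
# Sketch only: the decision logic of a validating webhook, without the HTTPS
# server that would normally wrap it. The rule itself is an assumed example.

def violations(pod):
    """Return a list of human-readable policy violations for a pod object."""
    found = []
    for container in pod.get("spec", {}).get("containers", []):
        security = container.get("securityContext") or {}
        # An unset value does not prevent escalation, so this sketch insists
        # on an explicit allowPrivilegeEscalation: false.
        if security.get("allowPrivilegeEscalation") is not False:
            found.append(
                f"container {container.get('name')!r} must set "
                "allowPrivilegeEscalation: false"
            )
    return found


def review(admission_review):
    """Build the AdmissionReview response for an incoming AdmissionReview."""
    request = admission_review["request"]
    problems = violations(request["object"])
    # The response must echo the request uid and say whether to admit the pod.
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"message": "; ".join(problems)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

A policy engine does essentially this for you, except that the rules are loaded from policy objects in the cluster rather than hard-coded.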
A policy is simply a declaration of permitted behaviour for pods being created within a cluster – e.g. “ensure the pod has a label”, “don’t allow pods to escalate their privileges”, “only use https for ingress” etc. Provided we have a way of defining our required policies, we can get our validating webhook to check the specs against these policies, and permit/block pod creation appropriately. As this is Kubernetes, it makes sense that we define these policies as… custom Kubernetes objects! (The types of these objects are registered using Custom Resource Definitions, or CRDs.) The process that implements constraints from policy definitions and validates workload specifications against the defined policies is known as a Policy Engine.
There are a couple of open source alternatives out there which you can deploy in your k8s clusters: Open Policy Agent Gatekeeper (open sourced by Styra) has been around longest, and was recently joined by Kyverno (open sourced by Nirmata). They have slightly different feature sets (you can read a nice comparison here) but operate in a very similar manner – you write the policies you want applied as YAML files, and use them to create the appropriate custom resources for the Policy Engine to validate any new admissions to the cluster, as detected by the admission controller.
Importantly, as policies are written in plain text, they can be stored and versioned in git, just like any other code or configuration, whereupon they can be reviewed, audited, subject to PRs etc. Even better, as they are Kubernetes objects defined in YAML and reference k8s primitives like pods and namespaces, they will be familiar to anyone who’s used to dealing with Kubernetes. We can even use tools such as Kustomize to maintain base configurations or patch them if we wish.
In Gatekeeper, these policies are implemented in two parts – a constraint template and a constraint. The former defines a generic parameterized policy on the pod spec, with logic written in a special DSL known as Rego. The actual policy is applied by creating a constraint that refers to the template with the appropriate parameters (if you’re a programmer – think of it like an object instance of a class).
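If the class/instance analogy helps, here it is spelled out as a small Python sketch. This is only an analogy and not how Gatekeeper is implemented: real templates carry Rego, not Python, and the names ConstraintTemplate, Constraint, missing_labels and must_have_owner are all invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ConstraintTemplate:
    """The generic, parameterized policy (in Gatekeeper, the logic is Rego)."""
    kind: str
    check: Callable[[dict, dict], list]  # (pod, parameters) -> violations

    def constraint(self, name, **parameters):
        """Create a concrete policy by binding parameters to this template."""
        return Constraint(name=name, template=self, parameters=parameters)


@dataclass
class Constraint:
    """One applied policy: a template plus the parameters chosen for it."""
    name: str
    template: ConstraintTemplate
    parameters: dict = field(default_factory=dict)

    def evaluate(self, pod):
        return self.template.check(pod, self.parameters)


def missing_labels(pod, parameters):
    """Template logic: report each required label the pod does not carry."""
    labels = pod.get("metadata", {}).get("labels") or {}
    return [
        f"missing required label {key!r}"
        for key in parameters.get("labels", [])
        if key not in labels
    ]


# One template, reusable with different parameters...
required_labels = ConstraintTemplate(kind="RequiredLabels", check=missing_labels)
# ...and one constraint that applies it with a specific label list.
must_have_owner = required_labels.constraint("must-have-owner", labels=["owner"])
```

The same template could back a second constraint requiring, say, a cost-centre label, without anyone touching the policy logic again.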
Fortunately for those of us who don’t know Rego yet, there are example libraries of constraint templates available for use – including this one which checks for disallowed image tags for containers (K8sDisallowedTags). So assuming we have this template available in our policy engine library, say we want to prevent containers using images tagged as ‘latest’; we need to write a constraint that refers to the template, supplying the appropriate parameters.
Our constraint refers to the installed template, and specifies the tags we wish to disallow (i.e. “latest”) – we create this resource in our cluster et voilà – we have created an admission policy! Now, any time someone tries to deploy a pod using an image with a ‘latest’ tag, the deployment will be blocked and an appropriate event logged within Kubernetes.
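As a rough illustration of what such a constraint says and does, the sketch below writes it out as the Python equivalent of its YAML and pairs it with a simplified tag check. Treat the details as assumptions: the metadata name is made up, the field layout follows the usual Gatekeeper constraint shape and should be checked against the template you actually install, and the Python check only approximates the behaviour described here, it is not the template's Rego.

```python
# Hypothetical constraint, shown as a Python dict rather than YAML.
constraint = {
    "apiVersion": "constraints.gatekeeper.sh/v1beta1",
    "kind": "K8sDisallowedTags",  # must match the installed template
    "metadata": {"name": "no-latest-tag"},  # made-up name
    "spec": {
        "match": {"kinds": [{"apiGroups": [""], "kinds": ["Pod"]}]},
        "parameters": {"tags": ["latest"]},
    },
}


def effective_tag(image):
    """Return the tag an image reference resolves to, or None if pinned by digest."""
    if "@" in image:
        return None  # pinned by digest, so no mutable tag is involved
    last_segment = image.rsplit("/", 1)[-1]  # skip any registry host:port
    if ":" in last_segment:
        return last_segment.split(":", 1)[1]
    return "latest"  # an untagged image is pulled as :latest


def disallowed_containers(constraint, pod):
    """Names of containers whose image tag is listed in the constraint parameters."""
    banned = set(constraint["spec"]["parameters"]["tags"])
    return [
        c["name"]
        for c in pod.get("spec", {}).get("containers", [])
        if effective_tag(c["image"]) in banned
    ]
```

Note that the sketch treats an image with no tag at all as ‘latest’, since that is what it resolves to when pulled.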
Of course – it’s still on you to create these policies in git, and subsequently pull and apply these to the clusters you are managing – what would be really nice would be if you could somehow guarantee that these could be rolled out to all of your current clusters as soon as you update them in Git – well, step forward ACM Policy Controller. This forms part of Google’s Anthos stack, specifically as a component of Anthos Config Management (ACM), a solution for providing GitOps-driven configuration across fleets of managed k8s clusters. It provides an implementation of the aforementioned OPA Gatekeeper, integrated with ACM, so you can deploy your policies automatically to your clusters simply by committing them to a configured git repository, utilising the integrated Google Config Sync. (In the interest of fairness, I should point out that Azure provides a Gatekeeper integration for their AKS offerings, but IMHO nowhere near as neat as ACM).
Whichever solution you choose, using a policy engine is a quantum leap in terms of maintaining a consistent security posture across your cluster fleet. Teams can use the same policies in their CI pipelines to “shift left” and detect any potential issues in their deployments before they deploy to production. DevSecOps teams can review the current controls and compare them against known and breaking vulnerabilities to ensure their security posture is up to date, updating/adding new controls as necessary.
This really happened recently! A leading financial institution was looking to deploy their first k8s workloads to a public cloud when, just before launch, the following CVE was detected – with no upgrade available at that time, the rollout was blocked. Because they were using Anthos Policy Controller, they were able to mitigate the CVE almost immediately by applying an appropriate policy requiring pods to drop the NET_RAW capability, and rolling it out to the clusters. Result? InfoSec satisfied, release saved!
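For a feel of what that kind of rule checks, here is a small Python sketch of the logic. It is an assumed approximation, not the policy that was actually deployed: the function name is invented, and the sketch simply flags any container that does not drop NET_RAW (or ALL), or that adds NET_RAW back.

```python
def containers_keeping_net_raw(pod):
    """Names of containers (init and regular) that would still hold NET_RAW."""
    spec = pod.get("spec", {})
    offenders = []
    for group in ("initContainers", "containers"):
        for container in spec.get(group) or []:
            security = container.get("securityContext") or {}
            caps = security.get("capabilities") or {}
            dropped = {cap.upper() for cap in caps.get("drop") or []}
            added = {cap.upper() for cap in caps.get("add") or []}
            # Dropping ALL also removes NET_RAW, unless it is added back.
            drops_it = bool(dropped & {"NET_RAW", "ALL"})
            if not drops_it or "NET_RAW" in added:
                offenders.append(container.get("name"))
    return offenders
```

A pod would be admitted only when this returns an empty list; the same check could be run against manifests in a CI pipeline before they ever reach a cluster.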
We’ll be talking more about configuration management, “Configuration as Data” and all things Anthos in future blogs, so stay tuned – but, for now… if you’re running k8s clusters in production, and you want to have more control over what capabilities you’re allowing (and, more importantly, not allowing) for your workloads, you might want to start looking into policy engines. You might end up being better friends with your developers AND your DevSecOps/InfoSec teams… and you might even get an invite to one of their parties. Well, maybe.