How To Implement DevSecOps in AKS

Security is no longer something you only think about after development, right before a release. Especially with containers and Kubernetes in cloud-native stacks, it’s become an integral part of the entire development process. This shift is often called “Shift Left”: building security into your process as early as possible. How do you actually achieve that shift in a CI/CD pipeline, though? In this blog post, we’ll explain how to handle each phase and which tools you’ll need.

Scan code before deployment
Enforce policies on the cluster
Continuously monitor at runtime

Yves Van Stappen - Cloud Architect

Phase 1: Scanning before deployment (pre-deployment analysis)

The first, and arguably most important, step in your DevSecOps pipeline is analyzing your code and artifacts before they ever hit your AKS cluster. Catching issues early saves both time and money down the line.

The pre-deployment phase includes several types of scans:

IaC static analysis
Manifests and Helm charts
Container image scans
Dependency scanning

Start with static analysis of your Infrastructure as Code (IaC). Tools such as Checkov scan your IaC (Terraform, Bicep, ARM) for known misconfigurations and security risks before any infrastructure is provisioned. That way, you make sure your foundation is secure from the very start.

Up next, it’s time to take a closer look at your Kubernetes manifests and Helm charts. Unsafe settings can creep in here too. Think of containers that are unnecessarily running as root. Tools such as Kubesec or Trivy (and Checkov as well) can help detect those issues. They’ll also flag secrets that are accidentally stored in plain text or Base64 in your manifests. That is something you definitely want to avoid, since Base64 is an encoding, not encryption. We dive deeper into this in our blog on secret management in AKS.
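To make the "running as root" check concrete, here is a minimal sketch in Python of the kind of rule such a scanner applies. It is not how Kubesec, Trivy or Checkov are implemented. The function name and the sample pod spec are hypothetical, and the manifest is assumed to be already parsed into a dictionary.

```python
from typing import Any


def containers_missing_non_root(pod_spec: dict[str, Any]) -> list[str]:
    """Return the names of containers that do not guarantee a non-root user."""
    # Pod-level default; Kubernetes lets a container-level value override it.
    pod_level = (pod_spec.get("securityContext") or {}).get("runAsNonRoot", False)
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    flagged = []
    for container in containers:
        ctx = container.get("securityContext") or {}
        if not ctx.get("runAsNonRoot", pod_level):
            flagged.append(container["name"])
    return flagged


# Hypothetical pod spec: "api" inherits the pod-level setting, "debug" overrides it.
example_spec = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [
        {"name": "api"},
        {"name": "debug", "securityContext": {"runAsNonRoot": False}},
    ],
}
print(containers_missing_non_root(example_spec))  # ['debug']
```

In a real pipeline you would rely on the scanner itself. The sketch only shows why a single missing or overridden field is enough to get a manifest flagged.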

Another important check is the container image scan. Your images are built from multiple layers, including OS packages and application libraries that could contain known vulnerabilities (CVEs). By scanning your images with tools like Trivy or Snyk before pushing them to a registry, you’ll know what risks you’re inheriting. You can then decide whether to patch or block the image.
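The "patch or block" decision can be automated as a gate in your pipeline. The sketch below is one possible approach in Python, not an official integration. It assumes the scan report is a JSON document shaped like Trivy's JSON output (a "Results" list whose entries may hold a "Vulnerabilities" list with "VulnerabilityID", "PkgName" and "Severity" fields), so verify the field names against the scanner version you use. The severity threshold, the function name and the sample report with placeholder IDs are assumptions for illustration.

```python
# Assumed policy: anything HIGH or CRITICAL blocks the push.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_findings(report: dict) -> list[tuple[str, str, str]]:
    """Collect (id, package, severity) for every finding that should block the image."""
    findings = []
    for result in report.get("Results", []):
        # A target without findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                findings.append(
                    (vuln.get("VulnerabilityID", "?"), vuln.get("PkgName", "?"), vuln["Severity"])
                )
    return findings


# Hypothetical report with placeholder IDs, not real CVEs.
sample_report = {
    "Results": [
        {
            "Target": "app-image",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother", "Severity": "LOW"},
            ],
        },
        {"Target": "clean-layer"},
    ]
}

blocking = blocking_findings(sample_report)
decision = "block" if blocking else "push"
print(decision, blocking)
```

In CI, you would load the report the scanner wrote to disk with json.load and fail the job when the decision is "block".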

Finally, there’s dependency scanning. Modern apps use tons of external libraries and packages, which may themselves have vulnerabilities. Trivy can help here by generating a Software Bill of Materials (SBOM): a full list of your direct and indirect dependencies. Tools such as Dependabot can alert you to known issues in those dependencies and even propose safer versions.

In an ideal world, you’d run all of those scans during the CI phase of your pipeline. Even better: integrate some checks directly into the developers’ IDE using plugins. In any case, the goal is clear: detect issues as early as possible, raise developer awareness, and fix problems while it’s still relatively easy and cheap to do so.

Phase 2: Enforcing policies on the cluster (admission control)

Static analysis is a great first step, but it won’t catch everything. Sometimes, risky configurations can still sneak through, or someone might manually try to deploy something in an unsafe way. That’s where phase two comes in: enforcing policies directly on the cluster by using admission control. Think of it as a set of rules that defines what is and isn’t allowed in your AKS environment.

The de facto standard in the Kubernetes world is Open Policy Agent (OPA), typically implemented via Gatekeeper, which is built specifically for Kubernetes. Azure Policy for Kubernetes, which we've discussed in detail in our dedicated AKS policies blog, also uses OPA.

Gatekeeper integrates with the Kubernetes API server as an admission controller. Before a resource (a Pod or Service, for instance) is created or modified, Gatekeeper will check whether it complies with active policies.

Good to know: those policies have to be written in Rego, a domain-specific language. Depending on the individual, Rego can have a mild to steep learning curve. However, it gives you fine-grained control over what your resources should look like. If a resource doesn’t meet the policy, the API server denies the request.

What types of rules can you define?

Prohibit containers from running as root (require securityContext.runAsNonRoot: true).
Require resource requests and limits for CPU and memory on all pods to avoid resource hogging.
Only allow container images from specific, trusted registries (like your own ACR).
Enforce labels or annotations for governance or cost allocation.
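To show the logic behind rules like these, here is a minimal sketch in Python. Real Gatekeeper policies are written in Rego, so treat this purely as an illustration of what an admission check evaluates. The registry name, the label key, the function name and the choice of CPU and memory as the checked resources are all assumptions.

```python
from typing import Any

TRUSTED_REGISTRIES = ("myregistry.azurecr.io/",)  # hypothetical ACR name
REQUIRED_LABELS = ("cost-center",)  # hypothetical label key


def violations(pod: dict[str, Any]) -> list[str]:
    """Return human-readable policy violations; an empty list means 'admit'."""
    found = []
    labels = pod.get("metadata", {}).get("labels", {})
    for key in REQUIRED_LABELS:
        if key not in labels:
            found.append(f"missing label: {key}")
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        # str.startswith accepts a tuple, so any trusted prefix passes.
        if not container.get("image", "").startswith(TRUSTED_REGISTRIES):
            found.append(f"{name}: image not from a trusted registry")
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                if resource not in resources.get(section, {}):
                    found.append(f"{name}: missing {section}.{resource}")
    return found


# Hypothetical pod that breaks three rules at once.
example_pod = {
    "metadata": {"labels": {}},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "docker.io/library/nginx:latest",
                "resources": {
                    "requests": {"cpu": "100m", "memory": "128Mi"},
                    "limits": {"cpu": "200m"},
                },
            }
        ]
    },
}
for problem in violations(example_pod):
    print(problem)
```

An admission controller works the same way in spirit: it receives the resource, evaluates every active rule, and the request is denied if any violation comes back.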

OPA and Gatekeeper offer powerful control over what’s allowed to run on your cluster. The trade-off, unfortunately, is that setting up and maintaining those policies (and learning Rego, for that matter) takes time. Still, it’s a crucial layer for maintaining consistency and security across your cluster.

Phase 3: Keep watching while it’s running (runtime monitoring & feedback)

By now, you’ve scanned your code and images, and policies are governing what gets deployed on your cluster. All set? Not quite. Security never stops. That’s why a third phase is essential: continuous runtime monitoring.

Here’s why: new CVEs are discovered constantly, even in software that passed your scans just yesterday. In fact, an image that’s safe today might contain a critical vulnerability tomorrow. Runtime scanning helps you detect those new risks in actively running containers. For example, Trivy can periodically scan your live workloads in ‘cluster mode’.

Second, attackers are always finding new exploits. In other words: you also need to monitor behavior (both of your applications and your cluster) with threat detection and anomaly detection. Microsoft Defender for Containers, for instance, monitors cluster activity and alerts you to suspicious behavior: unexpected pod-to-pod network connections, unusual processes being started, or privilege escalation attempts.

Just as important as detection is the feedback loop. If runtime monitoring flags an issue, that information must make its way back to your dev and platform teams. They can then analyze the root cause, update the code or configuration (back to phase 1), and roll out a better version.

Without this feedback loop, the same problems will recur (probably silently). So, runtime monitoring isn’t just reactive; it also gives you the information you need to keep improving.

Security is a shared (and ongoing) responsibility

An effective DevSecOps pipeline for AKS is not a one-time checklist but a continuous cycle. By scanning beforehand, enforcing policies at deployment, and monitoring constantly at runtime, you’re layering your security approach.

What you learn during monitoring should feed back into your code and configurations. That doesn’t just require smart tooling, but true collaboration between developers, security engineers, and operations as well. The results? Fewer risks, more control, and greater confidence!
