Trivy, the popular open-source vulnerability scanner, can be integrated into an Argo CD-based GitOps workflow to act as a security gate, preventing vulnerable container images from being deployed.
Here’s how it works in action:
Imagine you have a Kubernetes cluster managed by Argo CD. Your application’s desired state, including the container image to be deployed, is defined in a Git repository. When a developer pushes a new commit with an updated image tag to Git, Argo CD detects the change and attempts to synchronize the cluster with the new state.
Before this synchronization happens, we want to scan the container image for vulnerabilities. If Trivy finds any critical or high-severity issues, we want to halt the deployment.
Here’s a simplified example of a Kubernetes Deployment manifest that Argo CD might be managing:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: my-app-container
              image: my-docker-registry/my-app:v1.2.3 # This is the image we'll scan
              ports:
                - containerPort: 8080
In a GitOps pipeline, this Deployment manifest would reside in your Git repository. When this manifest changes (e.g., to image: my-docker-registry/my-app:v1.2.4), Argo CD would pick up the change.
Now, let’s integrate Trivy. Argo CD supports resource hooks: ordinary Kubernetes manifests that live alongside your application manifests in Git and carry an argocd.argoproj.io/hook annotation. The Application spec itself has no field for hooks, so the Application stays a plain pointer to the repository, and the security gate is a Kubernetes Job annotated as a PreSync hook.
Here’s the Application manifest:
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app-application
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: <your-git-repo-url>
        targetRevision: HEAD
        path: <path-to-your-manifests>
      destination:
        server: https://kubernetes.default.svc
        namespace: my-app-namespace
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
And here’s a conceptual outline of the hook Job, stored in the same path as the Deployment manifest:
    apiVersion: batch/v1
    kind: Job
    metadata:
      generateName: trivy-scan- # Argo CD creates a uniquely named Job for each sync
      annotations:
        argocd.argoproj.io/hook: PreSync
        argocd.argoproj.io/hook-delete-policy: BeforeHookCreation # Remove the previous run before creating a new one
    spec:
      backoffLimit: 2 # Retry the job a couple of times on failure
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trivy-scanner
              image: aquasec/trivy:0.47.0 # Use a specific Trivy version
              command: ["trivy", "image", "--severity", "HIGH,CRITICAL", "--exit-code", "1", "my-docker-registry/my-app:v1.2.4"] # Scan and fail if HIGH/CRITICAL found
The Problem This Solves:
The fundamental problem is ensuring that only secure container images are deployed to production. Without a security gate, a developer could accidentally (or intentionally) push an image with known, exploitable vulnerabilities, leading to security breaches. Traditional CI/CD pipelines often scan images as part of the build process, but in a GitOps model, the source of truth is Git, and Argo CD reconciles the cluster state. This integration places the security scan directly within the GitOps reconciliation loop.
How It Works Internally:
- Argo CD Detects Change: Argo CD monitors the configured Git repository. When it finds a new commit that changes the manifests under the Application’s source (e.g., an updated image tag in a Deployment manifest), it starts a sync operation.
- PreSync Hook Execution: Before Argo CD applies any of the regular Kubernetes resources defined in the Git repository, it runs the resources annotated with argocd.argoproj.io/hook: PreSync. In our example, the PreSync hook is a Kubernetes Job.
- Trivy Job Runs: Argo CD creates this Job in the cluster. The Job’s pod starts up, pulling the specified Trivy image (aquasec/trivy:0.47.0 in the example).
- Vulnerability Scan: The Trivy container executes its command. The command trivy image --severity HIGH,CRITICAL --exit-code 1 my-docker-registry/my-app:v1.2.4 tells Trivy to:
  - Scan the specified container image (my-docker-registry/my-app:v1.2.4).
  - Report vulnerabilities with a severity of HIGH or CRITICAL.
  - Exit with status code 1 (--exit-code 1) if any vulnerabilities matching the severity criteria are found.
- Job Success/Failure:
  - If Trivy finds no HIGH or CRITICAL vulnerabilities, it exits with status code 0. The Job is marked as successful. Argo CD proceeds with the sync, applying the updated Deployment manifest to the cluster.
  - If Trivy does find HIGH or CRITICAL vulnerabilities, it exits with status code 1. The Job fails. Argo CD, seeing that the PreSync hook failed, aborts the sync operation. The Deployment is not updated, and the vulnerable image is not deployed.
- Retry Mechanism: backoffLimit: 2 on the Job means that if the scan pod fails, Kubernetes creates a replacement pod up to two times before marking the Job as failed. This helps with transient errors such as network issues reaching the image registry. Note that a pod that fails because Trivy found vulnerabilities is retried as well, so a genuine rejection only surfaces once the retries are exhausted.
The Exact Levers You Control:
- repoURL, targetRevision, path: These define where Argo CD finds your application manifests.
- destination.server, destination.namespace: These define where Argo CD deploys your application.
- syncPolicy.automated: Controls whether Argo CD automatically syncs changes from Git.
- argocd.argoproj.io/hook annotation (on the Job): Determines when the hook runs. PreSync is crucial for a security gate. Other options include Sync and PostSync.
- apiVersion, kind, metadata (on the hook resource): These define the Kubernetes resource Argo CD will create for the hook (here, a Job). Setting metadata.generateName instead of a fixed name gives every run a unique name.
- containers[0].image (within the Job spec): This is the Trivy image itself. Pinning to a specific version (e.g., aquasec/trivy:0.47.0) is essential for reproducible scans.
- containers[0].command (within the Job spec): This is the heart of the security gate.
  - trivy image: Selects the scan mode, here a container image.
  - --severity HIGH,CRITICAL: Defines the vulnerability thresholds that will cause a failure. You can adjust this (e.g., LOW,MEDIUM,HIGH,CRITICAL to be more strict, or just CRITICAL to be less strict).
  - --exit-code 1: The critical flag that makes Trivy exit with status 1 when matching vulnerabilities are found, which in turn fails the Job.
  - <image-name>:<tag>: The actual container image to scan. Crucially, this must be the same image and tag as in the Deployment manifest being synced. Argo CD does not template plain manifests itself, so in a real-world scenario you’d render the hook Job with the same tool that renders the Deployment, for example a Helm chart in which both read the image name and tag from the same values.
- spec.backoffLimit: Controls how many times the Job pod will be retried if it fails.
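To make the command-line levers concrete, here is a minimal sketch of a looser gate. It is a fragment of the hook Job's container spec, not a complete manifest. The --ignore-unfixed flag is not part of the example above; it is a Trivy option that skips findings with no fixed version available, and whether that trade-off is acceptable depends on your policy.

```yaml
# Fragment of the hook Job's pod spec; only the command differs from the example above.
containers:
  - name: trivy-scanner
    image: aquasec/trivy:0.47.0
    command:
      - trivy
      - image
      - --severity
      - CRITICAL          # looser gate: only CRITICAL findings block the sync
      - --ignore-unfixed  # assumption: unfixable findings should not block deployments
      - --exit-code
      - "1"
      - my-docker-registry/my-app:v1.2.4
```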
The one thing most people don’t know:
The hook Job must have a way to dynamically reference the image being deployed. Hardcoding the image name and tag in the Job’s command would mean the security gate only ever scans one static image, defeating the purpose of scanning new images. Argo CD does not template plain manifests itself, so the substitution has to come from the config management tool the Application points to, such as a Helm chart or a Kustomize overlay. A common pattern is to define the image in the chart’s values.yaml, ship the hook Job as a chart template, and reference {{ .Values.image.repository }}:{{ .Values.image.tag }} in both the Deployment and the Job’s command, so the scanned image and the deployed image cannot drift apart.
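As a rough sketch of that Helm pattern, the hook Job could live in the chart as a template like the one below. The file name is hypothetical, and the sketch assumes your values.yaml defines image.repository and image.tag and that the Deployment template reads the same two values.

```yaml
# templates/trivy-presync-job.yaml (hypothetical file name inside your chart)
apiVersion: batch/v1
kind: Job
metadata:
  generateName: trivy-scan-
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trivy-scanner
          image: aquasec/trivy:0.47.0
          command:
            - trivy
            - image
            - --severity
            - HIGH,CRITICAL
            - --exit-code
            - "1"
            # Assumes values.yaml defines image.repository and image.tag,
            # the same values the Deployment template uses.
            - "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```

Because Helm renders the Job and the Deployment from the same values, bumping image.tag in Git changes what is scanned and what is deployed in a single commit.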
The next thing you’ll run into is managing the output and reporting. If a sync fails due to Trivy, Argo CD will show a sync failure, but the detailed Trivy report might be lost unless you configure logging or artifact storage for the Job’s pod.
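One possible way to keep the report, sketched under assumptions: have Trivy write JSON to a mounted volume instead of relying on pod logs. The --format and --output flags are standard Trivy options; the PersistentVolumeClaim name trivy-reports and the /reports path are hypothetical and would need to exist in your cluster.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: trivy-scan-
  annotations:
    argocd.argoproj.io/hook: PreSync
    # Keeps the finished Job (and its pod logs) until the next sync creates a new one.
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trivy-scanner
          image: aquasec/trivy:0.47.0
          command:
            - trivy
            - image
            - --severity
            - HIGH,CRITICAL
            - --exit-code
            - "1"
            - --format
            - json
            - --output
            - /reports/trivy-report.json  # hypothetical path on the mounted volume
            - my-docker-registry/my-app:v1.2.4
          volumeMounts:
            - name: reports
              mountPath: /reports
      volumes:
        - name: reports
          persistentVolumeClaim:
            claimName: trivy-reports  # hypothetical PVC; create it separately
```

With --output set, the report goes to the file rather than to stdout, so the pod logs will no longer contain the findings table. Each run also overwrites the same file in this sketch.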