Trivy’s Kubernetes cluster scan can reveal RBAC issues, misconfigurations, and CVEs, but when the scan itself fails, the underlying problem is usually that the cluster’s API server is denying Trivy access to the information it needs.

Common Causes of Trivy Kubernetes Cluster Scan Failures

  1. Insufficient RBAC Permissions: Trivy, running as a Pod within the cluster, needs specific permissions to list and get resources like Pods, Deployments, Nodes, and Secrets. Without these, it can’t gather the necessary data.

    • Diagnosis: Check the ClusterRole and ClusterRoleBinding associated with the Trivy service account. You’re looking for verbs like list, get, watch on resources such as pods, nodes, deployments, replicasets, statefulsets, daemonsets, configmaps, secrets, clusterroles, clusterrolebindings, roles, rolebindings, serviceaccounts, and namespaces.
    • Fix: Ensure the ClusterRole bound to the Trivy service account includes permissions for get, list, and watch on relevant Kubernetes API resources. For example, to scan all pods and nodes, you’d need:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: trivy-scan-role
      rules:
      - apiGroups: [""]
        resources: ["pods", "nodes", "namespaces", "configmaps", "secrets", "serviceaccounts"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["apps"]
        resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
        verbs: ["get", "list", "watch"]
      # Add more resources as needed for comprehensive scanning
      
      Then, bind this ClusterRole to the Trivy service account:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: trivy-scan-binding
      subjects:
      - kind: ServiceAccount
        name: trivy # Assuming your Trivy SA is named 'trivy'
        namespace: trivy # Namespace where Trivy is deployed
      roleRef:
        kind: ClusterRole
        name: trivy-scan-role
        apiGroup: rbac.authorization.k8s.io
      
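    • Why it works: This grants Trivy the read-only access it needs to query the Kubernetes API server for your cluster’s state. Keep in mind that get and list on secrets expose secret contents, so include that resource only if you want secrets scanned.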
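One way to confirm that the binding took effect is to ask the API server directly with kubectl auth can-i while impersonating the service account. The sketch below assumes the service account is trivy in the trivy namespace, as in the binding above. check_trivy_rbac is a hypothetical helper name, and its resource list simply mirrors the example ClusterRole. Impersonation with --as only works if your own user is allowed to impersonate service accounts.

```shell
# Hypothetical helper: report which read verbs the Trivy service account lacks.
# Usage: check_trivy_rbac [namespace] [serviceaccount]
check_trivy_rbac() {
  sa="system:serviceaccount:${1:-trivy}:${2:-trivy}"
  missing=0
  for resource in pods nodes namespaces configmaps secrets serviceaccounts \
                  deployments.apps replicasets.apps statefulsets.apps daemonsets.apps; do
    for verb in get list watch; do
      # kubectl auth can-i exits non-zero when the action is not allowed
      # (and also when the API server cannot be reached at all).
      if ! kubectl auth can-i "$verb" "$resource" --as="$sa" --all-namespaces >/dev/null 2>&1; then
        echo "missing: $verb $resource"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Running check_trivy_rbac trivy trivy prints one line per denied verb and resource pair and returns non-zero if anything is missing, so it can gate a scan job.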
  2. Incorrect Kubernetes Context/Configuration: If Trivy is not configured to point to the correct Kubernetes API server endpoint or uses outdated credentials, it won’t be able to connect.

    • Diagnosis: When running trivy k8s --cluster, Trivy uses the kubeconfig file of the environment it’s running in. Check if this kubeconfig is valid and points to the correct cluster.
    • Fix: Ensure the KUBECONFIG environment variable is set correctly, or that the default ~/.kube/config file is accurate for the cluster you intend to scan. You can explicitly pass a kubeconfig file using the --kubeconfig flag:
      trivy k8s --cluster --kubeconfig /path/to/your/cluster.kubeconfig
      
    • Why it works: This explicitly tells Trivy which Kubernetes API server to communicate with and how to authenticate.
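Before blaming Trivy, it can help to check what the kubeconfig actually points at. The following is a minimal sketch: check_kubeconfig is a hypothetical helper name, and listing namespaces is used as the probe on the assumption that the credentials are expected to have that permission (Trivy needs it anyway).

```shell
# Hypothetical helper: show which context a kubeconfig selects and whether
# the API server accepts its credentials.
# Usage: check_kubeconfig /path/to/your/cluster.kubeconfig
check_kubeconfig() {
  cfg="${1:-$HOME/.kube/config}"
  if [ ! -r "$cfg" ]; then
    echo "kubeconfig not readable: $cfg"
    return 1
  fi
  echo "context: $(kubectl --kubeconfig "$cfg" config current-context)"
  # Listing namespaces needs a reachable endpoint, accepted credentials,
  # and RBAC permission to list namespaces.
  if kubectl --kubeconfig "$cfg" get namespaces >/dev/null 2>&1; then
    echo "api server: reachable"
  else
    echo "api server: unreachable or unauthorized"
    return 1
  fi
}
```

If the printed context is not the cluster you meant to scan, fix the kubeconfig first; the same file can then be passed to Trivy with --kubeconfig.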
  3. Network Policies Blocking Trivy’s Access: Kubernetes NetworkPolicy objects can restrict traffic between pods. If a NetworkPolicy is in place that prevents the Trivy pod from reaching the Kubernetes API server (usually kube-apiserver on port 443), the scan will fail.

    • Diagnosis: Examine NetworkPolicy resources in the namespace where Trivy is running. Look for policies that select the Trivy pod and restrict its egress, in particular a default-deny egress policy with no rule allowing traffic to the API server. Ingress policies in kube-system rarely matter here, because the API server typically runs on the host network (or outside the cluster entirely in managed offerings).
    • Fix: Create or modify a NetworkPolicy to allow egress from the Trivy pod to the Kubernetes API server. For instance, if Trivy is in the trivy namespace and reaches the API server through the kubernetes ClusterIP service (which lives in the default namespace):
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-trivy-to-apiserver
        namespace: trivy # Namespace of the Trivy pod
      spec:
        podSelector:
          matchLabels:
            app.kubernetes.io/name: trivy # Label of your Trivy pod
        policyTypes:
        - Egress
        egress:
        - to:
          - ipBlock:
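              # CIDR for your cluster's service IPs (e.g., 10.96.0.0/12)
              # Or target the specific IP of the kube-apiserver service.
              # Some CNI plugins evaluate policy after the service IP is translated,
              # so you may need the API server's endpoint IPs (often port 6443) instead.
              cidr: "10.96.0.0/12"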
          ports:
          - protocol: TCP
            port: 443
        # No ingress rule is needed for the replies: Trivy initiates the connection,
        # and NetworkPolicy implicitly allows return traffic on an allowed connection.
      
    • Why it works: This explicitly permits outbound TCP traffic from the Trivy pod on port 443, which is how it communicates with the Kubernetes API.
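To fill in the cidr value with something narrower than the whole service range, you can look up the addresses the API server is actually reachable on. This is a sketch under assumptions: apiserver_targets is a hypothetical helper name, the /32 suffix assumes IPv4, and whether your CNI plugin matches the service IP or the endpoint IPs is something to verify for your cluster.

```shell
# Hypothetical helper: print the addresses a Trivy egress rule may need to allow.
apiserver_targets() {
  # ClusterIP of the built-in "kubernetes" Service in the "default" namespace.
  svc_ip=$(kubectl get service kubernetes -n default -o jsonpath='{.spec.clusterIP}')
  echo "service ${svc_ip}/32"
  # Addresses of the API server instances behind that Service.
  for ip in $(kubectl get endpoints kubernetes -n default -o jsonpath='{.subsets[*].addresses[*].ip}'); do
    echo "endpoint ${ip}/32"
  done
}
```

Each printed line is a candidate ipBlock entry for the egress rule.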
  4. Trivy Pod Not Starting or Crashing: The Trivy pod itself might be failing to start due to resource constraints, image pull issues, or misconfigurations in its own deployment manifest.

    • Diagnosis: Use kubectl get pods -n trivy (or your Trivy namespace) to check the status. If it’s CrashLoopBackOff or Error, investigate with kubectl logs <trivy-pod-name> -n trivy and kubectl describe pod <trivy-pod-name> -n trivy.
    • Fix:
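      • Resource Limits: If kubectl describe pod shows a last state of Terminated with reason OOMKilled, increase the memory resources.requests and resources.limits in the Trivy deployment manifest (and CPU as well if the pod is being throttled). OOMKilled is reported in the pod status, not in the container logs.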
      • Image Pull: Ensure the image ghcr.io/aquasecurity/trivy:latest (or your specified version) can be pulled from your cluster’s network. Check ImagePullSecrets if using a private registry.
      • Mounts: Verify any persistent volume claims or config map mounts are correctly defined and accessible.
    • Why it works: A healthy Trivy pod is fundamental; these steps ensure it can start, run, and access necessary configurations or resources to perform its scan.
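The termination reason can also be pulled out of the pod status in one pass instead of reading kubectl describe output pod by pod. A minimal sketch, assuming Trivy runs in the trivy namespace; trivy_restart_reasons is a hypothetical helper name.

```shell
# Hypothetical helper: list pods whose containers were previously terminated,
# together with the recorded reason (for example OOMKilled or Error).
# Usage: trivy_restart_reasons [namespace]
trivy_restart_reasons() {
  ns="${1:-trivy}"
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' |
    awk -F '\t' '$2 != "" { print $1 ": " $2 }'
}
```

Pods that have never been restarted produce no output, so an empty result means the problem lies elsewhere, for example in image pulls or scheduling.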
  5. Cluster-scoped Resources Not Scanned Due to Missing Permissions: While basic RBAC covers Pods and Deployments, scanning further resource types (such as Ingresses, or cluster-scoped objects like ClusterRoles and CustomResourceDefinitions) requires broader permissions.

    • Diagnosis: The scan might complete but report "0 misconfigurations" or miss certain types of resources. Check the ClusterRole for specific resource types like ingresses, clusterroles, clusterrolebindings, customresourcedefinitions.
    • Fix: Add permissions for these specific resources to the trivy-scan-role ClusterRole defined in cause #1. For example:
      # ... within the rules section of your ClusterRole ...
      - apiGroups: ["networking.k8s.io"]
        resources: ["ingresses"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["rbac.authorization.k8s.io"]
        resources: ["clusterroles", "clusterrolebindings"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["apiextensions.k8s.io"]
        resources: ["customresourcedefinitions"]
        verbs: ["get", "list", "watch"]
      
    • Why it works: These permissions allow Trivy to query the API server for cluster-wide resources, providing a more complete security posture assessment.
  6. API Server Rate Limiting: In very large or highly active clusters, the Kubernetes API server might start rate-limiting requests from Trivy if it makes too many calls too quickly.

    • Diagnosis: Look for 429 Too Many Requests errors in Trivy logs or in the kube-apiserver logs if you have access.
    • Fix: The options depend on your Trivy version and how much control you have over the cluster:
      • Check trivy k8s --help for client-side throttling flags. Recent releases document --qps and --burst; older releases have no such flags.
      • Run scans less frequently, or outside peak hours.
      • If possible, adjust the API server’s rate-limiting configuration, for example its API Priority and Fairness settings (advanced and usually not recommended without deep understanding).
      • Ensure your Trivy deployment has adequate CPU and memory, so that a throttled scan is not slowed further by its own resource limits.
    • Why it works: Spreading out or slowing down requests keeps Trivy under the API server’s throttling thresholds, allowing it to complete its queries.

After resolving these, you’ll likely encounter issues with Trivy not being able to pull container images from private registries for vulnerability scanning, requiring ImagePullSecrets configuration.
