Trivy's Kubernetes cluster scan can reveal RBAC issues, misconfigurations, and CVEs, but when the scan itself fails or comes back empty, the underlying problem is usually that the scanned cluster's API server is denying Trivy access to the information it needs.
## Common Causes of Trivy Kubernetes Cluster Scan Failures
1. **Insufficient RBAC Permissions**: Trivy, running as a Pod within the cluster, needs specific permissions to list and get resources such as Pods, Deployments, Nodes, and Secrets. Without them, it can't gather the necessary data.
   - Diagnosis: Check the `ClusterRole` and `ClusterRoleBinding` associated with the Trivy service account. You're looking for verbs like `get`, `list`, and `watch` on resources such as `pods`, `nodes`, `deployments`, `replicasets`, `statefulsets`, `daemonsets`, `configmaps`, `secrets`, `clusterroles`, `clusterrolebindings`, `roles`, `rolebindings`, `serviceaccounts`, and `namespaces`.
   - Fix: Ensure the `ClusterRole` bound to the Trivy service account includes `get`, `list`, and `watch` permissions on the relevant Kubernetes API resources. For example, to scan all pods and nodes, you'd need:

     ```yaml
     apiVersion: rbac.authorization.k8s.io/v1
     kind: ClusterRole
     metadata:
       name: trivy-scan-role
     rules:
     - apiGroups: [""]
       resources: ["pods", "nodes", "namespaces", "configmaps", "secrets", "serviceaccounts"]
       verbs: ["get", "list", "watch"]
     - apiGroups: ["apps"]
       resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
       verbs: ["get", "list", "watch"]
     # Add more resources as needed for comprehensive scanning
     ```

     Then, bind this `ClusterRole` to the Trivy service account:

     ```yaml
     apiVersion: rbac.authorization.k8s.io/v1
     kind: ClusterRoleBinding
     metadata:
       name: trivy-scan-binding
     subjects:
     - kind: ServiceAccount
       name: trivy       # Assuming your Trivy SA is named 'trivy'
       namespace: trivy  # Namespace where Trivy is deployed
     roleRef:
       kind: ClusterRole
       name: trivy-scan-role
       apiGroup: rbac.authorization.k8s.io
     ```

   - Why it works: This grants Trivy the read-only access it needs to query the Kubernetes API server for information about your cluster's state.
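To confirm the binding took effect, you can impersonate the service account and test each permission with `kubectl auth can-i`. The sketch below only generates the check commands, one per verb/resource pair; the service account name and namespace (`trivy`/`trivy`) are assumptions matching the binding example.

```shell
# Print one "kubectl auth can-i" check per verb/resource pair from the
# example ClusterRole. The SA name and namespace ('trivy'/'trivy') are
# assumptions; adjust them to match your deployment.
gen_rbac_checks() {
  sa="system:serviceaccount:trivy:trivy"
  for res in pods nodes namespaces configmaps secrets serviceaccounts \
             deployments.apps replicasets.apps statefulsets.apps daemonsets.apps; do
    for verb in get list watch; do
      echo "kubectl auth can-i $verb $res --as=$sa"
    done
  done
}
gen_rbac_checks
```

Pipe the output to `sh` against the target cluster (`gen_rbac_checks | sh`); every line should answer `yes` before you expect a full scan to succeed.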
2. **Incorrect Kubernetes Context/Configuration**: If Trivy is not configured to point to the correct Kubernetes API server endpoint, or uses outdated credentials, it won't be able to connect.
   - Diagnosis: When running `trivy k8s --cluster`, Trivy uses the `kubeconfig` file of the environment it's running in. Check that this `kubeconfig` is valid and points to the correct cluster.
   - Fix: Ensure the `KUBECONFIG` environment variable is set correctly, or that the default `~/.kube/config` file is accurate for the cluster you intend to scan. You can explicitly pass a kubeconfig file using the `--kubeconfig` flag:

     ```shell
     trivy k8s --cluster --kubeconfig /path/to/your/cluster.kubeconfig
     ```

   - Why it works: This explicitly tells Trivy which Kubernetes API server to communicate with and how to authenticate.
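Before pointing Trivy at a kubeconfig, it helps to confirm which context that file will actually select. A minimal sketch, assuming the standard kubeconfig layout with a top-level `current-context` key:

```shell
# Print the active context from a kubeconfig file, i.e. which cluster
# Trivy will talk to. Assumes the standard top-level 'current-context' key.
current_context() {
  sed -n 's/^current-context:[[:space:]]*//p' "$1"
}
# Example: current_context "${KUBECONFIG:-$HOME/.kube/config}"
```

If the printed context is not the cluster you intend to scan, switch it with `kubectl config use-context` or pass the right file via `--kubeconfig`.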
3. **Network Policies Blocking Trivy's Access**: Kubernetes `NetworkPolicy` objects can restrict traffic between pods. If a `NetworkPolicy` prevents the Trivy pod from reaching the Kubernetes API server (usually `kube-apiserver` on port 443), the scan will fail.
   - Diagnosis: Examine `NetworkPolicy` resources in the namespace where Trivy is running and in the `kube-system` namespace (or wherever the API server is exposed). Look for policies that might restrict egress traffic from the Trivy pod or ingress traffic to the API server.
   - Fix: Create or modify a `NetworkPolicy` to allow egress from the Trivy pod to the Kubernetes API server. For instance, if Trivy is in the `trivy` namespace and the API server is accessible via a ClusterIP service in `kube-system`:

     ```yaml
     apiVersion: networking.k8s.io/v1
     kind: NetworkPolicy
     metadata:
       name: allow-trivy-to-apiserver
       namespace: trivy  # Namespace of the Trivy pod
     spec:
       podSelector:
         matchLabels:
           app.kubernetes.io/name: trivy  # Label of your Trivy pod
       policyTypes:
       - Egress
       egress:
       - to:
         - ipBlock:
             # CIDR for your cluster's service IPs (e.g., 10.96.0.0/12),
             # or target the specific IP of the kube-apiserver service
             cidr: "10.96.0.0/12"
         ports:
         - protocol: TCP
           port: 443
     # If you have a default-deny ingress policy, you might also need to allow
     # ingress for the API server to reach Trivy if it needs to send data back.
     # However, for a cluster scan, Trivy initiates the connection.
     ```

   - Why it works: This explicitly permits outbound TCP traffic from the Trivy pod on port 443, which is how it communicates with the Kubernetes API.
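A quick way to verify that egress is actually open is to run a throwaway pod carrying the same label the policy selects, so it is subject to the same rules as the Trivy pod, and probe the API server from it. This manifest is a sketch; the namespace and label are assumptions matching the policy example.

```yaml
# One-off probe pod: carries the label the NetworkPolicy selects, so it
# is governed by the same egress rules as the Trivy pod.
apiVersion: v1
kind: Pod
metadata:
  name: trivy-egress-probe
  namespace: trivy
  labels:
    app.kubernetes.io/name: trivy
spec:
  restartPolicy: Never
  containers:
  - name: probe
    image: busybox:1.36
    command: ["sh", "-c", "nc -zv -w 5 kubernetes.default.svc.cluster.local 443"]
```

Apply it with `kubectl apply -f`, then inspect `kubectl logs trivy-egress-probe -n trivy`: an "open" result means the policy permits the connection; a timeout means egress is still blocked.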
4. **Trivy Pod Not Starting or Crashing**: The Trivy pod itself might fail to start due to resource constraints, image pull issues, or misconfigurations in its own deployment manifest.
   - Diagnosis: Use `kubectl get pods -n trivy` (or your Trivy namespace) to check the status. If it's `CrashLoopBackOff` or `Error`, investigate with `kubectl logs <trivy-pod-name> -n trivy` and `kubectl describe pod <trivy-pod-name> -n trivy`.
   - Fix:
     - Resource limits: If `kubectl describe pod` shows the container was `OOMKilled`, increase `resources.requests` and `resources.limits` for CPU and memory in the Trivy deployment manifest.
     - Image pull: Ensure the image `ghcr.io/aquasecurity/trivy:latest` (or your specified version) can be pulled from within your cluster's network. Check `imagePullSecrets` if using a private registry.
     - Mounts: Verify any persistent volume claims or config map mounts are correctly defined and accessible.
   - Why it works: A healthy Trivy pod is fundamental; these steps ensure it can start, run, and access the configuration and resources it needs to perform its scan.
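For the `OOMKilled` case, the fix is a `resources` block on the Trivy container. The values below are illustrative starting points, not recommendations from the Trivy project; tune them to your cluster size and scan scope.

```yaml
# ... inside the Trivy Deployment's container spec ...
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "2Gi"  # raise this first if the pod is repeatedly OOMKilled
```

Larger clusters produce more API objects and bigger vulnerability databases in memory, so memory is usually the limit to raise first.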
5. **Cluster-Scoped Resources Not Scanned Due to Missing Permissions**: While basic RBAC covers Pods and Deployments, scanning for cluster-level misconfigurations (such as Ingresses or ClusterRoles) requires broader permissions.
   - Diagnosis: The scan might complete but report "0 misconfigurations" or miss certain types of resources. Check the `ClusterRole` for specific resource types like `ingresses`, `clusterroles`, `clusterrolebindings`, and `customresourcedefinitions`.
   - Fix: Add permissions for these resources to the `trivy-scan-role` `ClusterRole` defined in cause #1. For example:

     ```yaml
     # ... within the rules section of your ClusterRole ...
     - apiGroups: ["networking.k8s.io"]
       resources: ["ingresses"]
       verbs: ["get", "list", "watch"]
     - apiGroups: ["rbac.authorization.k8s.io"]
       resources: ["clusterroles", "clusterrolebindings"]
       verbs: ["get", "list", "watch"]
     - apiGroups: ["apiextensions.k8s.io"]
       resources: ["customresourcedefinitions"]
       verbs: ["get", "list", "watch"]
     ```

   - Why it works: These permissions let Trivy query the API server for cluster-wide resources, providing a more complete security posture assessment.
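The same impersonation trick works for spot-checking these cluster-scoped permissions. The sketch below prints one check per fully-qualified resource; as before, the `trivy`/`trivy` service account is an assumption.

```shell
# Generate "kubectl auth can-i" checks for the cluster-scoped resources
# added above. SA name/namespace ('trivy'/'trivy') are assumptions.
gen_cluster_scope_checks() {
  sa="system:serviceaccount:trivy:trivy"
  for res in ingresses.networking.k8s.io \
             clusterroles.rbac.authorization.k8s.io \
             clusterrolebindings.rbac.authorization.k8s.io \
             customresourcedefinitions.apiextensions.k8s.io; do
    echo "kubectl auth can-i list $res --as=$sa"
  done
}
gen_cluster_scope_checks
```

Any `no` answer explains a scan that silently skips that resource type.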
6. **API Server Rate Limiting**: In very large or highly active clusters, the Kubernetes API server might start rate-limiting requests from Trivy if it makes too many calls too quickly.
   - Diagnosis: Look for `429 Too Many Requests` errors in the Trivy logs, or in the `kube-apiserver` logs if you have access to them.
   - Fix: While Trivy doesn't have explicit rate-limiting flags for API server calls, you can mitigate this by:
     - Running scans less frequently.
     - Adjusting the API server's rate-limiting configuration, if possible (advanced, and usually not recommended without a deep understanding of the implications).
     - Ensuring your Trivy deployment has adequate CPU and memory so it can process data efficiently, potentially reducing the perceived load.
   - Why it works: Reducing the frequency or intensity of requests helps prevent the API server from throttling Trivy, allowing it to complete its queries.
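One practical way to "run scans less frequently" is to drive Trivy from a CronJob scheduled off-peak instead of scanning continuously. A sketch, assuming a `trivy` service account with the permissions from cause #1 and the upstream image; the exact `trivy k8s` arguments vary between Trivy versions, so check yours.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: trivy-cluster-scan
  namespace: trivy
spec:
  schedule: "0 3 * * *"        # nightly at 03:00, off-peak
  concurrencyPolicy: Forbid    # never let two scans overlap
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: trivy
          restartPolicy: Never
          containers:
          - name: trivy
            image: ghcr.io/aquasecurity/trivy:latest
            args: ["k8s", "--report", "summary"]  # adjust flags to your Trivy version
```

`concurrencyPolicy: Forbid` also protects the API server: even if a scan overruns its window, a second scan won't start and double the request load.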
After resolving these, the next hurdle is often that Trivy cannot pull container images from private registries for vulnerability scanning, which requires configuring `imagePullSecrets`.