Tekton’s docker-in-docker (dind) setup is surprisingly fragile: the inner Docker daemon must run as root inside a container that Kubernetes, by default, starts without the privileges a full daemon needs. Here’s how to diagnose and fix the most common failures when building container images with Tekton’s dind.

## Common Causes and Fixes
- **Insufficient Privileges for the `dind` Container:** The `dind` container needs elevated privileges to start its own Docker daemon.
  - **Diagnosis:** Check the logs of your `dind` pod. You’ll likely see errors related to mounting `/var/lib/docker` or starting the Docker daemon.
  - **Fix:** Ensure the `securityContext` for your `dind` container in the `Task` or `Pipeline` grants `privileged: true`.

    ```yaml
    - name: docker-daemon
      image: docker:20.10.17-dind
      securityContext:
        privileged: true
      script: |
        #!/bin/sh
        dockerd-entrypoint.sh
      volumeMounts:
        - name: docker-storage
          mountPath: /var/lib/docker
    ```

  - **Why it works:** `privileged: true` allows the container to perform host-level operations, including starting a full Docker daemon with its own device access and kernel capabilities, bypassing many standard container restrictions.
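  Rather than running `dockerd-entrypoint.sh` as a regular step, many Tekton setups run the daemon as a sidecar and point the build step at it over TCP. A sketch of that pattern, assuming the same image versions as above; the task name, volume name, and certificate paths are illustrative choices:

  ```yaml
  apiVersion: tekton.dev/v1beta1
  kind: Task
  metadata:
    name: build-with-dind-sidecar
  spec:
    steps:
      - name: build
        image: docker:20.10.17
        env:
          - name: DOCKER_HOST          # talk to the sidecar daemon over TCP
            value: tcp://localhost:2376
          - name: DOCKER_TLS_VERIFY
            value: "1"
          - name: DOCKER_CERT_PATH
            value: /certs/client
        script: |
          #!/bin/sh
          docker build -t my-image:latest .
        volumeMounts:
          - name: dind-certs
            mountPath: /certs/client
    sidecars:
      - name: docker-daemon
        image: docker:20.10.17-dind
        securityContext:
          privileged: true
        env:
          - name: DOCKER_TLS_CERTDIR   # dind writes client TLS certs under here
            value: /certs
        volumeMounts:
          - name: dind-certs
            mountPath: /certs/client
    volumes:
      - name: dind-certs
        emptyDir: {}
  ```

  The sidecar keeps daemon lifecycle out of your step ordering: Tekton starts sidecars before steps run and tears them down when the `TaskRun` finishes.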
- **Incorrectly Mounted Docker Storage:** The `dind` daemon needs storage for its images, containers, and build cache.
  - **Diagnosis:** Look for `volume` definitions in your `Task` or `Pipeline` that mount `/var/lib/docker` within the `dind` container. If this is missing or incorrectly configured, the dind daemon won’t start or will lose state between runs.
  - **Fix:** Add a `volume` and `volumeMount` for `/var/lib/docker`. A `hostPath` volume is common for dind, but be mindful of the security implications.

    ```yaml
    apiVersion: tekton.dev/v1beta1
    kind: Task
    metadata:
      name: build-docker-image
    spec:
      params:
        - name: IMAGE_URL
          description: URL of the image to build
          type: string
      volumes:
        - name: docker-storage
          emptyDir: {} # Or use hostPath for persistence across pod restarts
      steps:
        - name: build-and-push
          image: docker:20.10.17
          command: ["/bin/sh", "-c"]
          args:
            - docker build -t $(params.IMAGE_URL) . && docker push $(params.IMAGE_URL)
          volumeMounts:
            - name: docker-storage
              mountPath: /var/lib/docker
    ```

    Note: for true persistence of the Docker image cache across pipeline runs, you’d typically use a `hostPath` volume pointing to a specific directory on the Kubernetes node, e.g., `/mnt/docker-data`. `emptyDir` is sufficient for a single pipeline run.
  - **Why it works:** This provides the `dockerd` process inside the container with a dedicated directory to store its state, allowing it to function as a full Docker daemon.
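  If you do want the build cache to survive across runs, the `emptyDir` above can be swapped for a `hostPath` volume. A sketch, where `/mnt/docker-data` is an illustrative path; note the cache then only benefits runs scheduled onto the same node, and `hostPath` widens the blast radius of a compromised build:

  ```yaml
  volumes:
    - name: docker-storage
      hostPath:
        path: /mnt/docker-data    # node-local directory, shared by all runs on this node
        type: DirectoryOrCreate   # create the directory if it doesn’t exist
  ```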
- **Service Account Lacking Permissions for the Docker Registry:** If you’re pushing images, the Kubernetes ServiceAccount used by your Tekton `PipelineRun` needs credentials to authenticate with your Docker registry.
  - **Diagnosis:** Check your pipeline logs for `denied: requested access to the resource is denied` or similar authentication errors when `docker push` is executed.
  - **Fix:** Create a Kubernetes `Secret` containing your Docker registry credentials, list it in the `secrets` of a ServiceAccount, and reference that ServiceAccount in your `PipelineRun`.

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: docker-creds
    type: kubernetes.io/dockerconfigjson
    data:
      .dockerconfigjson: <base64-encoded-docker-config-json>
    ```

    Then, in your `PipelineRun`:

    ```yaml
    apiVersion: tekton.dev/v1beta1
    kind: PipelineRun
    metadata:
      name: my-pipeline-run
    spec:
      pipelineRef:
        name: my-pipeline
      serviceAccountName: tekton-pipelines-service-account # This SA must exist and list docker-creds in its secrets
    ```

    The `docker` client in your steps will then pick up these credentials without an explicit `docker login`.
  - **Why it works:** Tekton automatically makes secrets of type `kubernetes.io/dockerconfigjson` that are attached to the run’s ServiceAccount available to task steps, and the `docker` client within the task uses them for authentication.
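  Producing the `.dockerconfigjson` value by hand looks like this; a minimal sketch in which `myuser`/`mypassword` and the Docker Hub URL are placeholder assumptions, not real credentials:

  ```shell
  # Placeholder credentials; substitute your real registry login.
  auth=$(printf '%s:%s' myuser mypassword | base64)
  printf '{"auths":{"https://index.docker.io/v1/":{"auth":"%s"}}}' "$auth" > config.json
  base64 -w0 config.json  # paste this output into the Secret’s .dockerconfigjson
  ```

  In practice `kubectl create secret docker-registry` assembles the same Secret for you. Either way, the Secret must also appear in the `secrets` list of the ServiceAccount named by `serviceAccountName`, or Tekton’s credential initialization won’t inject it into the task’s steps.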
- **Resource Constraints on the `dind` Pod:** The Docker daemon can be resource-intensive, especially during image builds. If the `dind` pod doesn’t have enough CPU or memory, it can crash or become unresponsive.
  - **Diagnosis:** Monitor the `dind` pod’s resource utilization in Kubernetes. Look for OOMKilled events or high CPU usage leading to timeouts.
  - **Fix:** Increase the resource requests and limits for the `dind` container in your `Task` definition.

    ```yaml
    - name: docker-daemon
      image: docker:20.10.17-dind
      securityContext:
        privileged: true
      resources:
        requests:
          cpu: "1000m"
          memory: "1Gi"
        limits:
          cpu: "2000m"
          memory: "2Gi"
      script: |
        #!/bin/sh
        dockerd-entrypoint.sh
      volumeMounts:
        - name: docker-storage
          mountPath: /var/lib/docker
    ```

  - **Why it works:** Adequate CPU and memory let the Docker daemon manage its processes without being CPU-throttled or OOM-killed by the kubelet when it exceeds its memory limit.
- **Network Issues or Firewall Blocking Docker Daemon Access:** The `dind` container needs to communicate with the Docker registry and potentially other services.
  - **Diagnosis:** Check the `dind` container logs for network-related errors (e.g., `connection refused`, `timeout`, `name resolution failed`).
  - **Fix:** Ensure your Kubernetes network policies allow egress traffic from the `dind` pod to your Docker registry (e.g., `docker.io`, `gcr.io`, `quay.io`) on port 443. Also verify that the Kubernetes node itself has proper network connectivity.
  - **Why it works:** Network policies can restrict pod-to-pod and pod-to-external communication. Allowing the necessary egress traffic ensures the `dind` daemon can reach external services like registries.
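  With a default-deny egress policy in place, the fix above can be expressed as a `NetworkPolicy`. A sketch, assuming the build pods carry a hypothetical `app: dind-build` label in a `tekton-builds` namespace; adjust both to your setup:

  ```yaml
  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: allow-dind-egress
    namespace: tekton-builds
  spec:
    podSelector:
      matchLabels:
        app: dind-build
    policyTypes:
      - Egress
    egress:
      - to:                     # HTTPS to any registry endpoint
          - ipBlock:
              cidr: 0.0.0.0/0
        ports:
          - protocol: TCP
            port: 443
      - ports:                  # DNS, so registry hostnames resolve
          - protocol: UDP
            port: 53
          - protocol: TCP
            port: 53
  ```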
- **Using an Outdated or Incompatible Docker-in-Docker Image:** The `dind` image version might have bugs or incompatibilities with your Kubernetes environment or Tekton version.
  - **Diagnosis:** Examine the `dind` container logs for specific error messages that might indicate a version mismatch or known issues with that Docker version.
  - **Fix:** Pin your `dind` image to a known stable version, or try upgrading/downgrading to a different patch release. For example, `docker:20.10.17-dind` is a specific, tested version.

    ```yaml
    - name: docker-daemon
      image: docker:20.10.17-dind # Pin to a specific, known-good version
      # ... rest of the configuration
    ```

  - **Why it works:** Different versions of Docker have different behaviors and dependencies. Using a specific version ensures predictable behavior and avoids introducing unexpected bugs from newer or older releases.
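  For stricter reproducibility you can pin by digest instead of tag, since tags are mutable but digests are not. The digest below is a placeholder, not a real value; resolve the actual digest from your registry:

  ```yaml
  - name: docker-daemon
    image: docker:20.10.17-dind@sha256:<digest> # digest pins the exact image bytes
  ```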
The next errors you’re likely to hit after fixing these come from the build itself: Dockerfile syntax errors or files missing from the build context.