Deploying Vector as a DaemonSet gives you a robust, efficient way to collect logs from every node in your Kubernetes cluster without manual intervention.
Here’s how it works in the wild. Imagine you have a Kubernetes cluster with three nodes: node-1, node-2, and node-3. You’ve just deployed your vector-collector DaemonSet. What happens? Kubernetes schedules one vector-collector pod onto each of those nodes.
# Example DaemonSet spec snippet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector-collector
  namespace: logging
spec:
  selector:
    matchLabels:
      app: vector-collector
  template:
    metadata:
      labels:
        app: vector-collector
    spec:
      containers:
        - name: vector
          image: timberio/vector:latest # pin a specific version in production
          ports:
            - containerPort: 8686 # Vector's internal API/metrics port
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: containers
              mountPath: /var/lib/docker/containers # or your container runtime's log path
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
Each of these pods, running directly on its host node, can then access that node's filesystem. Specifically, it mounts /var/log and the directory where container logs are stored (/var/lib/docker/containers for Docker's json-file driver; containerd and CRI-O write under /var/log/pods, with per-container symlinks in /var/log/containers). This allows the Vector agent on node-1 to read log files generated by applications running on node-1, the agent on node-2 to read logs from node-2, and so on.
The core problem this solves is distributed log aggregation. In a microservices architecture, applications are ephemeral and can run on any node. You can’t just SSH into each node and tail logs; the logs would be scattered, and nodes might disappear. A DaemonSet ensures a log collector is always present on every node, providing a consistent collection point.
Vector’s internal architecture is key here. Each Vector agent pod, deployed via the DaemonSet, acts as an independent collector. It uses Vector’s kubernetes_logs source, which queries the Kubernetes API for the pods running on its local node and tails their log files on disk. With Docker’s json-file driver those files live under /var/lib/docker/containers/&lt;container_id&gt;/ and are named &lt;container_id&gt;-json.log; containerd and CRI-O write under /var/log/pods. The kubernetes_logs source also enriches each event with Kubernetes metadata (pod name, namespace, labels), and Vector then forwards the logs to a central destination like Elasticsearch, Loki, or S3.
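To make this concrete, here is a minimal Vector configuration sketch built on the kubernetes_logs source. The Loki endpoint and label template are assumptions for illustration; you would mount this file into the pod (for example via a ConfigMap) and point the sink at your own backend.

```yaml
# vector.yaml -- minimal sketch; the endpoint and label values are assumptions
sources:
  k8s_logs:
    type: kubernetes_logs        # discovers and tails this node's pod logs

sinks:
  loki:
    type: loki
    inputs: [k8s_logs]
    endpoint: http://loki.logging.svc:3100   # hypothetical in-cluster Loki address
    encoding:
      codec: json
    labels:
      namespace: "{{ kubernetes.pod_namespace }}"  # templated from attached metadata
```

The kubernetes_logs source learns which node it is on from the VECTOR_SELF_NODE_NAME environment variable, which is typically injected from spec.nodeName via the downward API in the DaemonSet spec.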
The hostPath volume mounts are the critical piece that gives the Vector pod direct access to the host’s filesystem. Without them, the pod would be isolated within its own container filesystem and unable to see the node’s logs. The selector and matchLabels tell the DaemonSet controller which pods it manages: the matchLabels must match the labels on the pod template. They do not influence node placement; a DaemonSet runs on every eligible node by default, and restricting it to a subset of nodes is done with a nodeSelector or node affinity in the pod spec.
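If you do want the collector on only some nodes, the standard mechanism is a nodeSelector in the pod template. A sketch, assuming the target nodes carry a hypothetical logging=enabled label:

```yaml
# Pod-template fragment; the logging=enabled node label is an assumption
spec:
  template:
    spec:
      nodeSelector:
        logging: "enabled"   # schedule only onto nodes carrying this label
```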
The metadata enrichment built into Vector’s kubernetes_logs source is incredibly powerful. The source watches the Kubernetes API server and, for each log line it reads, looks up the associated pod’s metadata (namespace, name, labels, annotations) and attaches it to the log event. This turns raw container logs into context-rich events that are searchable and filterable by Kubernetes attributes.
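Once that metadata is attached, downstream components can key off it. As an illustrative sketch (the transform name, source name, and label key are assumptions), a remap transform can derive application-level fields from the Kubernetes attributes:

```yaml
transforms:
  tag_app:
    type: remap
    inputs: [k8s_logs]
    source: |
      # Read fields attached by the kubernetes_logs source
      .app = .kubernetes.pod_labels.app ?? "unknown"
      .env = if .kubernetes.pod_namespace == "production" { "prod" } else { "nonprod" }
```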
One of the most overlooked aspects of DaemonSet log collection is how Vector handles log rotation. When a container logs to a file, the container runtime (or the kubelet) will eventually rotate that file, creating a new one and archiving the old. Vector is designed to detect these new files and resume tailing from the correct position, preventing log loss during rotation events. It achieves this by fingerprinting each file (by default a checksum of its first bytes, or optionally its device and inode numbers) and checkpointing its read offset, so even if a file is renamed or recreated, Vector can distinguish old from new and re-acquire it.
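For Vector’s generic file source, the fingerprinting strategy is configurable. A sketch, with a hypothetical log path:

```yaml
sources:
  app_files:
    type: file
    include:
      - /var/log/app/*.log          # hypothetical path
    fingerprint:
      strategy: checksum            # default: checksum the file's first bytes
      # strategy: device_and_inode  # alternative: identify files by inode
```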
The next step you’ll likely encounter is configuring Vector’s output sinks to send these collected and enriched logs to your chosen observability platform.
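When you wire up that sink, it is worth adding a disk buffer so collected logs survive pod restarts and backend outages. A sketch, assuming a hypothetical in-cluster Elasticsearch service:

```yaml
sinks:
  es:
    type: elasticsearch
    inputs: [k8s_logs]
    endpoints: ["http://elasticsearch.logging.svc:9200"]  # hypothetical address
    buffer:
      type: disk
      max_size: 536870912   # 512 MiB on-disk buffer
      when_full: block      # apply backpressure instead of dropping events
```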