Vector’s Kubernetes integration can inject pod metadata into your logs, turning a stream of cryptic events into rich, searchable insights.
Let’s see it in action. Imagine you have a Kubernetes cluster and you’re sending logs from your applications to Vector. Without any enrichment, a log line might look like this:
{"message": "Request processed successfully", "timestamp": "2023-10-27T10:00:00Z"}
This tells you what happened, but not where in your cluster it happened or which workload produced it. Now, let’s configure Vector to add that context.
Here’s a simplified vector.toml configuration:
[sources.kube_logs]
type = "kubernetes_logs"
# Assumes Vector runs as a DaemonSet, tailing /var/log/containers.
# This source is the magic: it watches the Kubernetes API for pod
# changes and automatically annotates every log event with metadata
# such as the pod name, namespace, UID, IP, labels, and node name.

[sinks.my_logging_service]
type = "http"
inputs = ["kube_logs"]
# Replace with your actual logging service endpoint
uri = "http://your-logging-service.example.com/ingest"
encoding.codec = "json"
With this configuration, the same log event, when processed by Vector, will look like this:
{
  "message": "Request processed successfully",
  "timestamp": "2023-10-27T10:00:00Z",
  "kubernetes": {
    "pod_name": "my-app-deployment-7d9b8d5f9f-abcde",
    "pod_namespace": "default",
    "pod_labels": {
      "app": "my-app",
      "tier": "backend"
    },
    "pod_uid": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
    "pod_ip": "10.244.0.10",
    "pod_node_name": "worker-node-1"
  }
}
You now have instant context: which pod (my-app-deployment-7d9b8d5f9f-abcde) generated the log, in which namespace (default), on which node (worker-node-1), and with what labels (app: my-app, tier: backend). This is crucial for debugging, auditing, and understanding application behavior in a dynamic Kubernetes environment.
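Once the metadata is attached, any downstream consumer can filter on it like a normal field. A quick Python sketch (not part of Vector; the field paths are illustrative and depend on how the metadata fields are configured) that picks backend-tier events out of a stream of enriched JSON lines:

```python
import json

# Two enriched events, as a sink might emit them (field paths are
# illustrative and depend on your metadata configuration).
enriched = [
    '{"message": "Request processed successfully",'
    ' "kubernetes": {"pod_labels": {"tier": "backend"}}}',
    '{"message": "Probe OK",'
    ' "kubernetes": {"pod_labels": {"tier": "infra"}}}',
]

def backend_messages(lines, meta_key="kubernetes"):
    """Yield messages from events whose pod carries the label tier=backend."""
    for line in lines:
        event = json.loads(line)
        labels = event.get(meta_key, {}).get("pod_labels", {})
        if labels.get("tier") == "backend":
            yield event["message"]

print(list(backend_messages(enriched)))  # only the backend event survives
```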
Under the hood, the kubernetes_logs source first establishes a connection to the Kubernetes API server. When Vector runs as a DaemonSet inside the cluster (the most common deployment pattern for log collection), it discovers the API server automatically through the standard in-cluster configuration; running outside the cluster, you would point the source at a kubeconfig file instead.
Once connected, Vector watches for changes to Pod objects. Whenever a pod starts, stops, or is updated, Vector receives an event, and it maintains an in-memory cache of all known pods and their associated metadata. When a log line is read from /var/log/containers/, Vector identifies the pod that produced it from the log file path itself, which encodes the pod name, namespace, and container name, and then merges the relevant metadata from its pod cache into the log event.
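That join can be sketched in a few lines of Python (hypothetical names; the pod cache here is a hard-coded stand-in for the cache Vector builds from its API watch). The one real convention it relies on is that the kubelet names container log files `<pod>_<namespace>_<container>-<container-id>.log`:

```python
# Sketch of a path-based metadata join (hypothetical names, not Vector's code).
import os

def pod_identity(log_path: str) -> tuple:
    """Recover (pod, namespace, container) from a container log path."""
    base = os.path.basename(log_path).rsplit(".log", 1)[0]
    pod, namespace, rest = base.split("_", 2)
    container, _, _container_id = rest.rpartition("-")
    return pod, namespace, container

# Stand-in for the in-memory pod cache fed by the API watch.
pod_cache = {
    ("my-app-deployment-7d9b8d5f9f-abcde", "default"): {
        "pod_labels": {"app": "my-app", "tier": "backend"},
        "pod_node_name": "worker-node-1",
    },
}

def enrich(event: dict, log_path: str) -> dict:
    """Merge cached pod metadata into a log event."""
    pod, namespace, container = pod_identity(log_path)
    meta = {"pod_name": pod, "pod_namespace": namespace,
            "container_name": container}
    meta.update(pod_cache.get((pod, namespace), {}))
    return {**event, "kubernetes": meta}

path = ("/var/log/containers/my-app-deployment-7d9b8d5f9f-abcde_default_"
        "my-app-1a2b3c4d.log")
print(enrich({"message": "Request processed successfully"}, path))
```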
The key levers you control are the annotation fields. Vector pulls a wide range of information directly from the Pod object: its name, namespace, UID, IP address, the node it’s running on, and, crucially, its labels and annotations. Labels are particularly powerful because they are often used for application identification and routing within Kubernetes. Container-level fields such as the container name and image are annotated as well, and the source’s pod_annotation_fields options let you customize where each piece of metadata is written.
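As a sketch of that customization in recent Vector releases (the target paths here are arbitrary examples; check your version’s docs for the exact option names):

```toml
# Remap where the kubernetes_logs source writes pod metadata.
[sources.kube_logs.pod_annotation_fields]
pod_name      = "k8s.pod"
pod_namespace = "k8s.ns"
pod_labels    = "k8s.labels"
```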
The "magic" here is that Vector handles the complexity of watching the Kubernetes API, maintaining the cache, and performing the join between log events and pod metadata. You don’t need to write a separate service to query the Kubernetes API and correlate logs; Vector does it for you, which makes enriching logs at scale straightforward and efficient.
A common pitfall is assuming that pod metadata is available instantly. If a pod is just starting up, or if Vector starts after a pod has already emitted logs, there can be a brief window before the metadata is populated in Vector’s cache. This delay is usually negligible, but it’s worth remembering if you see logs missing metadata immediately after a pod deployment.
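If that window matters for your pipeline, one option is to hold back events that arrived without metadata. A sketch using a filter transform with a VRL condition (the transform name and field path are illustrative):

```toml
[transforms.only_enriched]
type = "filter"
inputs = ["kube_logs"]
# Pass through only events that were successfully annotated.
condition = 'exists(.kubernetes.pod_name)'
```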
The next step is often to learn how to use this enriched metadata in your sinks for powerful filtering and routing.
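As a small taste of that, a route transform can fan events out by pod label (names and field paths are illustrative):

```toml
[transforms.by_tier]
type = "route"
inputs = ["kube_logs"]

[transforms.by_tier.route]
backend  = '.kubernetes.pod_labels.tier == "backend"'
frontend = '.kubernetes.pod_labels.tier == "frontend"'

# Downstream, a sink consumes one branch, e.g.:
#   inputs = ["by_tier.backend"]
```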