Grafana Tempo’s performance is surprisingly sensitive to how you configure its resource limits, and the default settings are often a ticking time bomb, primed to go off on your first busy day.
Let’s see Tempo in action. Imagine we’re ingesting traces from a few microservices. We’ll hit the Tempo API with some sample traces and then query them back.
Here’s a simple example using curl to send a trace to Tempo’s OTLP HTTP receiver (this assumes the otlp receiver is enabled in the distributor configuration, as it is in Tempo’s example configs; port 4318 is the OTLP HTTP default):

curl -H "Content-Type: application/json" -X POST -d '{
  "resourceSpans": [{
    "resource": {
      "attributes": [{ "key": "service.name", "value": { "stringValue": "user-service" } }]
    },
    "scopeSpans": [{
      "spans": [{
        "traceId": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
        "spanId": "0987654321fedcba",
        "name": "http_request",
        "kind": 2,
        "startTimeUnixNano": "1678886400000000000",
        "endTimeUnixNano": "1678886400100000000",
        "attributes": [
          { "key": "http.method", "value": { "stringValue": "GET" } },
          { "key": "http.url", "value": { "stringValue": "/users" } },
          { "key": "http.status_code", "value": { "intValue": "200" } }
        ]
      }]
    }]
  }]
}' http://localhost:4318/v1/traces

And then querying it back via Tempo’s HTTP API, which listens on port 3200 by default in recent releases:

curl http://localhost:3200/api/traces/a1b2c3d4e5f67890a1b2c3d4e5f67890
This is Tempo’s core job: receive, store, and retrieve distributed tracing data. It achieves this by breaking down traces into individual spans and storing them in object storage like S3, GCS, or MinIO. When you query, it reconstructs the traces from these spans. Tempo is designed to be horizontally scalable, meaning you can run multiple instances to handle increasing load. However, each instance has its own appetite for CPU and memory, and mismanaging these can lead to performance degradation or outright failures.
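The object-storage side of that pipeline is configured in Tempo’s storage block. A minimal sketch for an S3-compatible backend follows; the bucket name and endpoint are placeholders for your own environment:

```yaml
storage:
  trace:
    backend: s3                        # alternatives: gcs, azure, or local for testing
    s3:
      bucket: tempo-traces             # hypothetical bucket name
      endpoint: s3.us-east-1.amazonaws.com
    wal:
      path: /var/tempo/wal             # write-ahead log for spans not yet flushed to the backend
```

The same structure works for MinIO by pointing endpoint at your MinIO service.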
The primary levers you control are the resource requests and limits you set for the Tempo pods in your Kubernetes deployment. These are defined in the pod’s YAML manifest.
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1"
    memory: "2Gi"
Here, requests are what Kubernetes guarantees for the pod, and limits are the hard ceiling. If a pod exceeds its CPU limit, it gets throttled. If it exceeds its memory limit, it gets OOMKilled (Out Of Memory Killed).
The surprising thing about Tempo’s resource usage is how much it fluctuates. Ingesting a large number of short traces might consume less CPU but spike memory due to internal data structures. Conversely, querying a very large, complex trace can temporarily hog CPU as Tempo reconstructs and processes the span data. This makes setting static limits a delicate balancing act.
Let’s look at a typical Tempo configuration and how to tune it. The tempo-distributed Helm chart is a common way to deploy Tempo. Within its values.yaml, you’ll find sections for ingester, querier, distributor, and compactor. Each of these components has its own resource needs.
For the Ingester, which receives spans and writes them to object storage:
- CPU: During high ingest rates, the ingester needs CPU to buffer, process, and write spans. A common starting point for a moderately busy system might be requests: { cpu: "1" } and limits: { cpu: "2" }.
- Memory: The ingester uses memory for buffering incoming spans before writing them in batches. If you see frequent OOMKills on the ingester, this is often the culprit. A good starting point is requests: { memory: "2Gi" } and limits: { memory: "4Gi" }.
For the Querier, which handles trace retrieval requests:
- CPU: Queries can be CPU-intensive, especially for large traces or when scanning across many spans. requests: { cpu: "500m" } and limits: { cpu: "1" } is a reasonable start.
- Memory: Queriers also use memory to reconstruct traces from object storage. requests: { memory: "1Gi" } and limits: { memory: "2Gi" } is typical.
The Distributor acts as a load balancer and fan-out for incoming traces. Its resource needs are generally lower than the ingester or querier but can spike with very high ingest rates.
- CPU: requests: { cpu: "250m" } and limits: { cpu: "500m" }.
- Memory: requests: { memory: "512Mi" } and limits: { memory: "1Gi" }.
The Compactor is responsible for managing data in object storage (e.g., merging small files). It’s less sensitive to real-time load but needs sufficient resources when it runs its periodic tasks.
- CPU: requests: { cpu: "250m" } and limits: { cpu: "1" }.
- Memory: requests: { memory: "1Gi" } and limits: { memory: "2Gi" }.
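Putting those starting points together, a values.yaml for the tempo-distributed chart might look like the sketch below. The top-level key names match recent versions of the chart, but they have shifted between releases, so verify against the chart’s own values.yaml:

```yaml
ingester:
  resources:
    requests: { cpu: "1", memory: "2Gi" }
    limits: { cpu: "2", memory: "4Gi" }
querier:
  resources:
    requests: { cpu: "500m", memory: "1Gi" }
    limits: { cpu: "1", memory: "2Gi" }
distributor:
  resources:
    requests: { cpu: "250m", memory: "512Mi" }
    limits: { cpu: "500m", memory: "1Gi" }
compactor:
  resources:
    requests: { cpu: "250m", memory: "1Gi" }
    limits: { cpu: "1", memory: "2Gi" }
```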
When tuning, observe your system’s metrics. If you see high CPU utilization on the ingester consistently, increase its CPU limit. If ingester pods are OOMKilled, increase their memory limit. For queriers, observe query latency; high latency might indicate CPU starvation, while OOMKills point to insufficient memory.
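If you scrape cAdvisor metrics with Prometheus, two queries cover most of this: one for CPU throttling and one for memory pressure. The pod-name regex here is an assumption about your naming; adjust it to match your deployment:

```promql
# Fraction of CPU scheduling periods in which the ingester was throttled
rate(container_cpu_cfs_throttled_periods_total{pod=~"tempo-ingester.*"}[5m])
  / rate(container_cpu_cfs_periods_total{pod=~"tempo-ingester.*"}[5m])

# Working-set memory as a fraction of the memory limit (values near 1.0 mean OOMKill risk)
container_memory_working_set_bytes{pod=~"tempo-ingester.*"}
  / container_spec_memory_limit_bytes{pod=~"tempo-ingester.*"}
```

Sustained throttling above a few percent suggests raising the CPU limit; a memory ratio that regularly climbs toward 1.0 suggests raising the memory limit before the OOMKills start.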
A common pitfall is setting memory limits too close to the requests. The Kubernetes scheduler places pods based on their requests, so a pod whose actual usage spikes well above its request can hit its limit and be OOMKilled even when the node still has free memory. For components with spiky resource usage, a larger gap between requests and limits gives them headroom to absorb bursts, at the cost of overcommitting the node.
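For a spiky component like the ingester, that might mean something like the following; the 2x–3x gap is a rule of thumb, not an official Tempo recommendation:

```yaml
resources:
  requests:
    cpu: "1"
    memory: "2Gi"      # what the scheduler reserves on the node
  limits:
    cpu: "2"
    memory: "6Gi"      # headroom for ingest bursts: 3x the request
```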
When you start seeing context deadline exceeded errors on your trace queries, and your querier pods are showing high CPU, you might be tempted to just crank up the CPU. However, Tempo also has internal worker pools for processing queries. If max_concurrent_queries is set too low in your Tempo configuration, requests queue up into a backlog and eventually time out, even when CPU isn’t maxed out. This is configured in Tempo’s main configuration file (e.g., tempo-distributed.yaml):
querier:
  # ... other settings
  max_concurrent_queries: 100
Increasing this value, in conjunction with sufficient CPU resources, can resolve those specific timeout errors.