Tempo in distributed mode lets you scale its components like ingesters, distributors, and queriers independently, which is pretty neat for optimizing resource usage and handling varying loads.

Let’s see it in action. Imagine you’re getting a massive influx of traces from a new microservice, but your overall query load hasn’t changed much. Normally, you’d have to scale your entire tracing system, potentially over-provisioning resources for parts that aren’t seeing the increased load. With Tempo’s distributed mode, you can just scale up the ingester and distributor components specifically.

Here’s a simplified tempo.yaml configuration illustrating this:

distributor:
  receivers:                  # protocols Tempo will accept traces over
    jaeger:
      protocols:
        thrift_http:
    zipkin:
    otlp:
      protocols:
        grpc:

ingester:
  trace_idle_period: 5m       # flush a trace after it has been idle this long
  max_block_duration: 1h      # cut a new block after at most this duration
  lifecycler:
    ring:
      replication_factor: 3   # replicate each trace across 3 ingesters

query_frontend: {}            # queues and shards queries before they reach queriers

querier:
  frontend_worker:
    frontend_address: query-frontend:9095 # placeholder; the query-frontend's gRPC address in your setup

storage:
  trace:
    backend: local            # use object storage (s3, gcs, azure) in production
    local:
      path: /var/tempo/traces # placeholder; real path depends on your setup

In this setup:

  • Distributor: This is the entry point for traces. It receives traces over various protocols (Jaeger, Zipkin, OTLP), hashes each trace ID, and routes the trace to the appropriate ingesters. If your trace volume spikes, you’d increase the number of distributor instances.
  • Ingester: This component receives traces from the distributors, batches them into blocks, and writes them to object storage. Tunable parameters like trace_idle_period and max_block_duration control when idle traces are flushed and when blocks are cut, while the replication factor determines how many ingesters hold a copy of each trace. If you’re seeing more traces coming in, you’d scale up ingesters.
  • Querier: This component handles trace queries. It interacts with the query frontend (if enabled) and fetches data from object storage. If your query load increases, you’d scale up queriers.
  • Query Frontend: An optional but recommended component that aggregates queries from multiple queriers, improving performance and preventing individual queriers from being overwhelmed.

The key here is that each of these components can be deployed as separate services, often running in different Kubernetes deployments or as distinct processes on different machines. This means you can monitor the resource utilization of each component independently and scale them up or down based on their specific load.
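As a sketch, running one component as its own Kubernetes Deployment looks like this (the names, labels, and image tag are illustrative assumptions; the `-target` flag is what selects which Tempo component a process runs):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo-distributor   # hypothetical name; adjust to your setup
spec:
  replicas: 3               # scale this component independently of the others
  selector:
    matchLabels:
      app: tempo
      component: distributor
  template:
    metadata:
      labels:
        app: tempo
        component: distributor
    spec:
      containers:
        - name: tempo
          image: grafana/tempo:latest   # pin a specific version in production
          args:
            - -target=distributor       # run only the distributor component
            - -config.file=/etc/tempo/tempo.yaml
```

Each component (ingester, querier, query-frontend) gets its own Deployment of the same shape, differing only in the `-target` value and its resource requests.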

For example, if your distributor pods are consistently hitting 90% CPU due to a surge in incoming traces, you’d adjust the replica count for your distributor deployment. Similarly, if your ingester pods are struggling with disk I/O or memory due to large trace volumes, you’d scale those up. The querier and query-frontend scaling would be driven by metrics like query latency or CPU usage on those specific services.

This independent scaling is fundamental to achieving cost-efficiency and performance resilience. You’re not paying for more query capacity when all you need is more ingestion capacity, and vice-versa. It allows Tempo to adapt dynamically to the ever-changing demands of your observability data.

The actual mechanism for scaling often involves Kubernetes Horizontal Pod Autoscaler (HPA) configured with CPU or custom metrics specific to each Tempo component’s workload. For instance, you might autoscale distributors based on the rate of incoming spans, or queriers based on query latency.
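A minimal CPU-based HPA for a distributor Deployment might look like the following (the Deployment name `tempo-distributor` is an assumption about your setup; scaling on span-rate or latency metrics additionally requires a custom-metrics adapter such as prometheus-adapter):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tempo-distributor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tempo-distributor   # hypothetical Deployment to scale
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75   # add replicas when average CPU exceeds 75%
```

The same pattern applies per component, with thresholds tuned to each workload: CPU for distributors, memory for ingesters, latency-derived custom metrics for queriers.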

When you’re troubleshooting query performance, remember that a slow query might not be a slow querier; it could be a bottleneck in the object storage itself, or even the network path between the querier and storage.
