Grafana Tempo’s query frontend shards incoming requests across multiple queriers, but it doesn’t cache query results itself; caching is delegated to the backend.

Let’s see Tempo’s query frontend in action. Imagine you have a distributed tracing system and you want to find all traces that contain a specific operation name, say http.request. You’d typically send a query to Tempo.

Here’s a simplified example of what a search might look like hitting the Tempo query frontend (q is a TraceQL query; start and end are Unix timestamps in seconds):

curl -G 'http://tempo-query-frontend.example.com:3200/api/search' \
  --data-urlencode 'q={ name = "http.request" }' \
  --data-urlencode 'start=1678886400' \
  --data-urlencode 'end=1678890000'

The query frontend receives this request, but it doesn’t execute the query itself. Instead, it breaks the work up: the query’s time range is split into smaller sub-queries, and each sub-query becomes a job on an internal queue (partitioned per tenant in multi-tenant setups, so one noisy tenant can’t starve the others). Queriers pull jobs from that queue and execute them.
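The splitting step can be pictured with a small sketch. This cuts a query's time window into fixed-size sub-jobs; the interval size and function name are illustrative assumptions, not Tempo's actual implementation.

```python
# Illustrative sketch: split one query's time window into smaller
# sub-jobs, in the spirit of a query frontend sharding work by time.

def split_by_interval(start_ms: int, end_ms: int, interval_ms: int):
    """Cut [start_ms, end_ms) into sub-ranges of at most interval_ms."""
    jobs = []
    cursor = start_ms
    while cursor < end_ms:
        jobs.append((cursor, min(cursor + interval_ms, end_ms)))
        cursor += interval_ms
    return jobs

# A one-hour query cut into 15-minute jobs yields four sub-queries,
# each of which can be handed to a different worker:
jobs = split_by_interval(1678886400000, 1678890000000, 15 * 60 * 1000)
```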

The problem Tempo solves is how to scale trace ingestion and querying across many nodes without a single point of failure or bottleneck. Traces can be massive, and querying across them needs to be efficient.

Internally, the query frontend acts as a smart scheduler rather than a load balancer in the traditional sense. It doesn’t push work at random workers; queriers pull jobs from its queue as they free up, which balances load naturally. Consistent hashing does appear in Tempo, but on the write path: distributors use a hash ring to assign incoming spans to ingesters, so that if you have, say, 5 ingesters and add a 6th, most keys still map to their original ingester, minimizing disruption.
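To make the "add a 6th shard and most keys stay put" property concrete, here is a toy consistent-hash ring. It is a sketch of the general technique, not Tempo's actual ring implementation; names like worker-0 are invented.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring (illustrative only)."""

    def __init__(self, shards, vnodes=100):
        # Each shard owns many points on the ring ("virtual nodes")
        # so keys spread evenly across shards.
        self._points = sorted(
            (self._hash(f"{shard}-{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def lookup(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect_left(self._points, (self._hash(key), "")) % len(self._points)
        return self._points[idx][1]

shards = [f"worker-{i}" for i in range(5)]
keys = [f"trace-{i}" for i in range(1000)]
before = HashRing(shards)
after = HashRing(shards + ["worker-5"])
# Only the keys whose ring segments the new shard took over move --
# roughly 1/6 of them, not all of them.
moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
```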

The actual query execution happens on the queriers. Each job covers a subset of the trace data, following how Tempo partitions its storage into blocks (by time, and by tenant ID in multi-tenant setups). A querier that picks up a job fetches recent traces from the ingesters and older traces from backend object storage (such as S3 or GCS), using the blocks’ index and bloom-filter metadata to narrow down what it actually has to read.
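The time-based partitioning above boils down to an overlap test: a worker only needs the blocks whose time span intersects the query window. A minimal sketch, with invented block fields rather than Tempo's real block metadata schema:

```python
from dataclasses import dataclass

@dataclass
class Block:
    block_id: str
    start_ms: int  # earliest span in the block
    end_ms: int    # latest span in the block

def blocks_for_range(blocks, query_start_ms, query_end_ms):
    """Two ranges overlap iff each one starts before the other ends."""
    return [
        b for b in blocks
        if b.start_ms <= query_end_ms and query_start_ms <= b.end_ms
    ]

blocks = [
    Block("block-a", 1678880000000, 1678886000000),
    Block("block-b", 1678886000000, 1678892000000),
    Block("block-c", 1678892000000, 1678898000000),
]
# A query over 1678886400000..1678890000000 only needs block-b;
# the other blocks are skipped entirely.
hits = blocks_for_range(blocks, 1678886400000, 1678890000000)
```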

The query frontend’s role is critical for performance because it distributes the query load. If you have 100 queriers and a complex query comes in, the frontend can split the work of retrieving the relevant data into many jobs and have the queriers chew through them simultaneously. Each querier performs its portion independently; the frontend then merges the partial results and returns a single, consolidated response to the user.
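This fan-out-then-merge flow is the classic scatter-gather pattern. A minimal sketch, where query_shard is a stand-in for the network call to one worker and the string results are placeholders for real trace matches:

```python
from concurrent.futures import ThreadPoolExecutor

def query_shard(shard_id, query):
    # Pretend each shard finds two matching traces for this query.
    return [f"trace-{shard_id}-{i}" for i in range(2)]

def scatter_gather(shard_ids, query):
    # Fan the query out to every shard in parallel...
    with ThreadPoolExecutor(max_workers=len(shard_ids)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, query), shard_ids))
    # ...then flatten the partial results into one consolidated,
    # deterministic response.
    return sorted(t for partial in partials for t in partial)

results = scatter_gather(["a", "b", "c"], '{ name = "http.request" }')
```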

The "sharding" here refers to how the query processing workload is distributed across the queriers, not to caching of query results. Tempo doesn’t keep a dedicated query result cache at the frontend layer. Any caching happens at the backend storage level (e.g., an object-storage cache, or an external cache such as Memcached for bloom filters and index pages) or in systems Tempo integrates with. Run the exact same query twice and Tempo will likely re-fetch the data from backend storage each time, unless an underlying cache has already materialized it.

The most surprising thing here is how little the query frontend does with the data itself: it keeps no query result cache, and the only processing it performs is merging the responses that come back from the workers. Splitting, queueing, and stitching are its whole job. This design pushes the complexity of data retrieval and filtering down to the queriers, which sit closest to the actual trace data.

The next concept you’ll likely encounter is how Tempo handles indexing and its impact on query performance, especially with high-cardinality labels.

Want structured learning?

Take the full Tempo course →