Grafana Tempo’s caching layer is surprisingly effective at reducing load on its backend storage, and understanding its configuration is key to unlocking significant performance gains.

Let’s see it in action. Imagine a steady stream of traces hitting Tempo. Without caching, every single trace ID lookup would go directly to the object store (like S3, GCS, or Azure Blob Storage), which is relatively slow and expensive. With a cache, Tempo first checks if it already has the trace ID’s location information. If it does, it serves it straight from the cache, bypassing the object store entirely.

Here’s a simplified, illustrative view of a Tempo configuration using Memcached as a cache (exact key names vary by Tempo version, so check the official configuration reference for the precise schema):

caching:
  memcached:
    - host: memcached-01.example.com
      port: 11211
      timeout: 100ms
      max_idle_connections: 10

And a similarly simplified view for Redis:

caching:
  redis:
    - host: redis-01.example.com
      port: 6379
      username: tempo_cache_user
      password: <your-redis-password>
      db: 0
      pool_size: 10
      timeout: 100ms

The problem Tempo solves with caching is the inherent latency of fetching trace metadata from object storage. Traces are often accessed by their ID, and the object store is not designed for the kind of random, high-volume lookups that a tracing backend experiences. Tempo’s cache acts as a high-speed lookup table, storing the mapping between trace IDs and the object store locations where the actual trace data resides. This dramatically speeds up trace retrieval and reduces the number of API calls to your object store, saving money and improving user experience.

Internally, when Tempo receives a request for a trace ID, it first queries its configured cache. If the trace ID is found in the cache, Tempo retrieves the object store path from the cache and proceeds to fetch the trace data from there. If the trace ID is not found in the cache, Tempo then queries the object store to find the location of the trace data, retrieves that location, stores it in the cache for future requests, and then fetches the trace data.
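This read-through flow can be sketched in a few lines of Python. Everything here (InMemoryCache, FakeObjectStore, TraceLocator) is a hypothetical stand-in for illustration, not Tempo’s actual Go internals:

```python
class InMemoryCache:
    """Stand-in for Memcached/Redis: a plain dict."""
    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class FakeObjectStore:
    """Stand-in for S3/GCS/Azure; find_trace is the slow, expensive path."""
    def __init__(self, index):
        self.index = index
        self.lookups = 0  # counts how often the slow path is taken

    def find_trace(self, trace_id):
        self.lookups += 1
        return self.index.get(trace_id)


class TraceLocator:
    """Read-through lookup: cache first, object store only on a miss."""
    def __init__(self, cache, object_store):
        self.cache = cache
        self.object_store = object_store

    def locate(self, trace_id):
        location = self.cache.get(trace_id)                 # 1. try the cache
        if location is not None:
            return location                                 #    hit: skip the object store
        location = self.object_store.find_trace(trace_id)   # 2. miss: slow path
        if location is not None:
            self.cache.set(trace_id, location)              # 3. populate for next time
        return location


store = FakeObjectStore({"abc123": "s3://tempo/blocks/block-1"})
locator = TraceLocator(InMemoryCache(), store)
locator.locate("abc123")   # miss: goes to the object store
locator.locate("abc123")   # hit: served from the cache
print(store.lookups)       # prints 1: the object store was queried only once
```

The second lookup never touches the object store, which is exactly the API-call reduction described above.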

The exact levers you control are primarily the connection details for your cache instance(s) plus a few performance-tuning parameters. For Memcached, timeout dictates how long Tempo waits for a cache response before falling back to the object store, and max_idle_connections caps the number of idle connections kept in the pool to the Memcached server. For Redis, you have similar controls: pool_size for connection pooling, timeout, plus authentication parameters (username, password) and db for selecting the Redis logical database.
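The timeout behavior is worth making concrete: a cache that is slow or unreachable should degrade into a cache miss, not a stalled query. A minimal sketch of that fallback, with hypothetical function names (real clients such as redis-py or pymemcache express the timeout as a client-level connect/read setting rather than a per-call exception, but the effect is the same):

```python
def locate_with_timeout(cache_get, object_store_find, trace_id):
    """Treat a cache timeout exactly like a cache miss."""
    try:
        location = cache_get(trace_id)  # may raise TimeoutError if the cache is slow
        if location is not None:
            return location
    except TimeoutError:
        pass  # slow/unreachable cache: fall through to the object store
    return object_store_find(trace_id)


def unreachable_cache(trace_id):
    # Simulates a cache that never answers within the configured 100ms budget.
    raise TimeoutError("cache did not respond in time")


location = locate_with_timeout(
    unreachable_cache,
    lambda trace_id: "s3://tempo/blocks/block-7",
    "abc123",
)
print(location)  # the lookup still succeeds via the object store
```

The point of the sketch: a misbehaving cache costs you latency (the timeout) but never correctness, because the object store remains the source of truth.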

A common point of confusion is how Tempo handles cache invalidation. It doesn’t, in the traditional sense. Tempo’s cache is primarily for mapping trace IDs to object store locations. When a trace is ingested, Tempo writes its metadata (including the object store path) to the object store and then immediately updates its cache with this mapping. If a trace is deleted or overwritten in the object store, the cache entry simply becomes stale. Tempo relies on the fact that trace IDs are generally immutable and that it will eventually discover a missing object during retrieval, at which point it can refresh or evict the stale entry. This write-through strategy, where the cache is updated alongside the primary store, is what makes it so effective for read-heavy workloads.
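The write-through ingest path can be sketched the same way; again, every name here is an illustrative stand-in rather than Tempo’s real code:

```python
def ingest_trace(cache, object_store, trace_id, trace_data):
    """Write-through: persist the trace first, then update the cache in lockstep."""
    location = object_store.write(trace_id, trace_data)  # 1. primary store first
    cache.set(trace_id, location)                        # 2. cache updated alongside it
    return location


# Dict-backed stand-ins for the cache and object store.
cache = {}
blocks = {}


class DictStore:
    def write(self, trace_id, trace_data):
        location = f"s3://tempo/blocks/{trace_id}"
        blocks[location] = trace_data
        return location


class DictCache:
    def set(self, key, value):
        cache[key] = value


loc = ingest_trace(DictCache(), DictStore(), "abc123", b"encoded spans")
print(cache["abc123"] == loc)  # prints True: the cache knows the location before any read
```

Because the cache is populated at write time, the very first read of a freshly ingested trace is already a cache hit, which is why this pattern pays off so well for read-heavy workloads.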

The next concept you’ll likely explore is how to scale your Tempo deployment, including the implications of distributed tracing and high availability for your caching layer.
