Tempo’s sampling isn’t a yes-or-no decision about whether an individual trace gets stored; it’s a mechanism for controlling how much of your total trace volume gets stored when there is far too much of it to keep everything.
Let’s see Tempo sampling in action with a hypothetical scenario. Imagine you have a high-traffic service that generates millions of traces daily. Without sampling, your storage would explode. Tempo, by default, uses a probabilistic approach. If a trace has a 1 in 10 chance of being sampled, and 100 million traces are generated, approximately 10 million will be kept. This is configured in tempo.yaml under traces.sample:
traces:
  sample:
    # Probabilistic sampling: sample 10% of traces
    probabilistic:
      sampling_rate: 0.1
This sampling_rate is the primary lever. A value of 0.1 means 10% of traces are kept.
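The math behind probabilistic sampling is easy to check empirically. The sketch below is illustrative, not Tempo source code; `sample_trace` is a hypothetical helper showing the core coin-flip decision: keep a trace when a uniform random number falls below the configured rate.

```python
import random

def sample_trace(sampling_rate: float, rng: random.Random) -> bool:
    """Keep a trace with probability equal to sampling_rate."""
    return rng.random() < sampling_rate

rng = random.Random(42)       # seeded for reproducibility
total = 1_000_000
kept = sum(sample_trace(0.1, rng) for _ in range(total))
ratio = kept / total          # converges to the sampling_rate (~0.1)
```

At this scale the kept fraction lands within a fraction of a percent of the configured 10%, which is why the "100 million in, roughly 10 million out" estimate above holds.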
Now, what if your service is too successful, and even 10% of traces flood your backend? That’s where rate limiting comes in. Tempo can also limit the number of traces it accepts per second. This is configured with rate_limiting:
traces:
  sample:
    # ... probabilistic config ...
    rate_limiting:
      # Limit to 1000 traces per second
      max_traces_per_second: 1000
Here, max_traces_per_second caps the ingestion rate. If your probabilistic sampling would result in more than 1000 traces/sec, Tempo will drop the excess. This is crucial for protecting your storage and processing infrastructure from sudden spikes.
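This section doesn’t specify how the limiter is implemented internally, but a token bucket is a common way to sketch the behavior: tokens refill at the configured rate, each accepted trace spends one, and anything arriving with an empty bucket is dropped. The class and parameter names below are illustrative assumptions, not Tempo API.

```python
import time

class TokenBucket:
    """Illustrative limiter: refill at `rate` tokens/sec, allow bursts up to `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A burst of 5000 traces against a 1000/sec limit: roughly 1000 get through.
limiter = TokenBucket(rate=1000, burst=1000)
accepted = sum(limiter.allow() for _ in range(5000))
```

The excess 4000 traces are simply dropped, which is exactly the "protect the backend from spikes" behavior described above.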
Internally, when a trace arrives, Tempo first checks if it exceeds a certain size threshold (configurable, but defaults to something reasonable like 1MB). If it’s too large, it triggers sampling. The probabilistic sampler assigns a random number to the trace. If this number falls within the sampling_rate, the trace is marked for potential storage. The rate limiter then acts as a gatekeeper, ensuring that even if many traces are marked for storage, the overall ingestion rate doesn’t exceed the configured max_traces_per_second.
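The pipeline described above can be sketched end to end. Everything here is a simplified model of the prose, not Tempo’s actual implementation: the 1 MB threshold, function names, and the single-second rate window are all assumptions for illustration.

```python
import random

SIZE_THRESHOLD = 1_000_000   # assumed ~1 MB threshold from the text
SAMPLING_RATE = 0.1
MAX_PER_SECOND = 1000

rng = random.Random(0)
accepted_this_second = 0     # sketch covers a single one-second window

def ingest(trace_size_bytes: int) -> bool:
    """Model of the described flow: size check -> probabilistic sample -> rate cap."""
    global accepted_this_second
    if trace_size_bytes > SIZE_THRESHOLD:
        # Oversized trace: triggers sampling instead of being stored outright.
        if rng.random() >= SAMPLING_RATE:
            return False
    if accepted_this_second >= MAX_PER_SECOND:
        return False          # rate limiter drops the excess
    accepted_this_second += 1
    return True

# 2000 small traces in one second: the first 1000 pass, the rest hit the cap.
accepted = sum(ingest(10_000) for _ in range(2000))
```

Note the ordering: the sampler thins the stream first, and the rate limiter acts as the final gatekeeper regardless of what the sampler decided.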
The most surprising thing about Tempo’s sampling is that the decision is not made once for the whole trace: each component that generates a span makes its own independent sampling decision at the point the span is created. If you have a single, long-running trace that spans many requests, each individual span within that trace has an independent chance of being kept based on the sampling_rate. This can lead to situations where a trace appears "incomplete" because many of its constituent spans were dropped, even though other parts of the same trace were stored. This is a direct consequence of the distributed nature of trace collection: there is no single decision for the entire trace, but a series of decisions at each span’s generation point.
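A quick simulation shows why independent per-span decisions make complete traces vanishingly rare at low rates. For a trace with $n$ spans sampled independently at rate $p$, the chance all spans survive is $p^n$; at 10% and 20 spans that is $0.1^{20}$, effectively zero. The helper below is purely illustrative.

```python
import random

def sampled_spans(n_spans: int, rate: float, rng: random.Random) -> int:
    """Count how many of a trace's spans survive independent sampling."""
    return sum(rng.random() < rate for _ in range(n_spans))

rng = random.Random(7)
# 10,000 traces of 20 spans each, every span independently sampled at 10%:
trials = [sampled_spans(20, 0.1, rng) for _ in range(10_000)]
complete_fraction = sum(k == 20 for k in trials) / len(trials)
avg_spans_kept = sum(trials) / len(trials)   # expected value: 20 * 0.1 = 2
```

On average only about 2 of the 20 spans survive, and essentially no trace comes through intact, matching the "incomplete trace" behavior described above.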
The real power of this system is its layered approach: probabilistic sampling gives you a broad, statistically representative overview, and rate limiting acts as a safety net on top of it. Together they protect your storage from overruns while still retaining a representative slice of traffic, including rare but important traces.
Understanding the interplay between sampling_rate and max_traces_per_second is key to optimizing your Tempo setup for both cost and observability.
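That interplay is simple arithmetic worth working through once. The traffic figure below is a hypothetical, chosen to show the case where the sampler alone is not enough and the cap kicks in.

```python
# Expected ingest: probabilistic sampling thins first, then the rate limiter caps.
traffic_per_second = 50_000          # hypothetical raw trace volume
sampling_rate = 0.1
max_traces_per_second = 1000

after_sampling = traffic_per_second * sampling_rate      # 5,000 traces/sec
stored = min(after_sampling, max_traces_per_second)      # capped at 1,000
drop_fraction = 1 - stored / traffic_per_second          # 98% dropped overall
```

Here the sampler alone would still send 5x the cap, so the rate limiter, not the sampling_rate, determines storage cost; tuning sampling_rate below 0.02 would put the sampler back in control.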
The next challenge will be understanding how to effectively query sampled traces when you need to reconstruct a specific request flow.