Tempo’s span metrics are a surprisingly effective way to get latency percentiles without needing to sample every trace.
Let’s say you’re trying to understand the P99 latency of your checkout service. You’ve got Tempo collecting traces, and you want to see how long that operation really takes for the slowest users. Instead of trying to pull and process every single checkout trace (which would drown your system), Tempo can aggregate this for you.
Here’s a trace of a checkout request as seen by Tempo:
{
  "traceID": "a1b2c3d4e5f67890",
  "spans": [
    {
      "traceID": "a1b2c3d4e5f67890",
      "spanID": "1111111111111111",
      "parentSpanID": "0000000000000000",
      "operationName": "checkout",
      "startTimeUnixNano": 1678886400123456789,
      "durationNano": 500000000, // 500ms
      "tags": [
        {"key": "service.name", "value": "checkout-service"},
        {"key": "http.method", "value": "POST"},
        {"key": "http.status_code", "value": "200"}
      ]
    },
    {
      "traceID": "a1b2c3d4e5f67890",
      "spanID": "2222222222222222",
      "parentSpanID": "1111111111111111",
      "operationName": "payment_processing",
      "startTimeUnixNano": 1678886400150000000,
      "durationNano": 200000000, // 200ms
      "tags": [
        {"key": "service.name", "value": "payment-service"}
      ]
    }
  ]
}
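Before getting to histograms, it's worth seeing how little work is needed to pull timing out of a span. A minimal Python sketch, working on a comment-free copy of the trace above:

```python
import json

# The checkout trace from above (the // comments removed so it parses as JSON).
TRACE_JSON = """
{
  "traceID": "a1b2c3d4e5f67890",
  "spans": [
    {"spanID": "1111111111111111", "operationName": "checkout",
     "startTimeUnixNano": 1678886400123456789, "durationNano": 500000000},
    {"spanID": "2222222222222222", "operationName": "payment_processing",
     "startTimeUnixNano": 1678886400150000000, "durationNano": 200000000}
  ]
}
"""

trace = json.loads(TRACE_JSON)
for span in trace["spans"]:
    # end time = startTimeUnixNano + durationNano
    end_ns = span["startTimeUnixNano"] + span["durationNano"]
    print(f'{span["operationName"]}: {span["durationNano"] / 1e6:.0f} ms, '
          f'ends at {end_ns}')
```

Each span already carries everything needed for a latency measurement; the rest is a matter of aggregating those durations at scale.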
Tempo doesn’t just store these raw spans; it can also build summary metrics from them. When you query span metrics, you’re effectively asking Tempo to act like a specialized Prometheus: instead of scraping endpoints, it aggregates timing data from the spans flowing through it.
The core idea is that Tempo extracts durationNano (with startTimeUnixNano as the timestamp) for spans matching certain criteria (like operationName: checkout and service.name: checkout-service), and buckets those durations into histograms. From these histograms, you can derive percentiles like P99.
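To make the histogram-to-percentile step concrete, here's a small Python sketch. The bucket boundaries are illustrative, and the interpolation mirrors what Prometheus's histogram_quantile function does with cumulative buckets:

```python
import bisect

# Illustrative bucket upper bounds (the "le" labels), in seconds.
BOUNDS = [0.1, 0.25, 0.5, 1.0, 2.5, float("inf")]

def to_cumulative_buckets(durations, bounds=BOUNDS):
    """Count durations into cumulative buckets, the way *_bucket series work."""
    counts = [0] * len(bounds)
    for d in durations:
        # first bucket whose upper bound is >= d
        counts[bisect.bisect_left(bounds, d)] += 1
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]  # make counts cumulative
    return counts

def quantile(q, counts, bounds=BOUNDS):
    """Estimate a quantile by linear interpolation inside the target bucket,
    mirroring Prometheus's histogram_quantile."""
    rank = q * counts[-1]
    for i, cum in enumerate(counts):
        if cum >= rank:
            lower = bounds[i - 1] if i else 0.0
            prev = counts[i - 1] if i else 0
            width = counts[i] - prev
            if bounds[i] == float("inf") or width == 0:
                return lower
            return lower + (bounds[i] - lower) * (rank - prev) / width
    return bounds[-1]

# 100 checkout durations: mostly fast, a few slow.
durations = [0.05] * 50 + [0.2] * 30 + [0.4] * 15 + [0.8] * 4 + [2.0]
counts = to_cumulative_buckets(durations)
print(quantile(0.99, counts))  # 1.0 (interpolated inside the 0.5–1.0s bucket)
```

Note that the answer is an estimate whose accuracy depends on bucket boundaries, which is exactly the trade-off you accept with Prometheus histograms too.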
To get this working, you need Tempo's metrics-generator enabled and pointed at a Prometheus remote-write endpoint. In your tempo.yaml, turn on the span-metrics processor and configure remote write (the field names below follow the Tempo 2.x layout; check the docs for your exact version):

metrics_generator:
  registry:
    external_labels:
      region: "us-east-1"
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: "http://prometheus-server:9090/api/v1/write" # your Prometheus remote-write endpoint
        send_exemplars: true

overrides:
  metrics_generator_processors: [span-metrics]
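One easy-to-miss prerequisite sits on the receiving side: Prometheus only accepts remote-write requests when its receiver is enabled, which since v2.33 is a built-in flag. A minimal sketch, assuming a local Prometheus install:

```shell
# Start Prometheus with the remote-write receiver enabled;
# without it, Tempo's pushes to /api/v1/write are rejected.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --web.enable-remote-write-receiver
```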
This tells Tempo to send aggregated metrics about its traces to your Prometheus instance. Matching spans and their durations are transformed into Prometheus series — most importantly a latency histogram, traces_spanmetrics_latency, labeled with dimensions such as service and span_name (alongside a traces_spanmetrics_calls_total counter). Tempo creates the histogram buckets for these durations automatically.

Once configured, you can query these metrics in Grafana. For example, to get the P99 latency for the checkout operation, you'd run a query like this against your Prometheus data source:

histogram_quantile(0.99, sum by (le) (rate(traces_spanmetrics_latency_bucket{service="checkout-service", span_name="checkout"}[5m])))

Here's what's happening:

- traces_spanmetrics_latency_bucket: the histogram metric Tempo exports, carrying labels such as service and span_name.
- rate(...[5m]): the per-second rate of observations over the last 5 minutes, the standard basis for Prometheus histogram percentiles.
- sum by (le): sums the buckets across instances while keeping the le ("less than or equal") label that defines the bucket boundaries; add service or span_name to the by clause if you want results broken out rather than filtered.
- histogram_quantile(0.99, ...): estimates the 99th percentile from the bucketed data.
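The same series are reachable programmatically through Prometheus's HTTP instant-query API, which is handy for runbooks or ad-hoc checks. A stdlib-only Python sketch — the endpoint URL and label values are assumptions from this setup, and the metric name follows Tempo's span-metrics defaults (substitute whatever your generator actually emits):

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint from this setup; adjust to your environment.
PROM_URL = "http://prometheus-server:9090"

# P99 checkout latency, using Tempo's default span-metrics naming.
P99_QUERY = (
    "histogram_quantile(0.99, sum by (le) ("
    'rate(traces_spanmetrics_latency_bucket{service="checkout-service", '
    'span_name="checkout"}[5m])))'
)

def build_query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for Prometheus's HTTP API."""
    return f"{base}/api/v1/query?{urllib.parse.urlencode({'query': promql})}"

def parse_instant_value(body: str) -> float:
    """Pull the scalar value out of an instant-query response;
    data.result is a list of {"metric": {...}, "value": [ts, "val"]}."""
    payload = json.loads(body)
    return float(payload["data"]["result"][0]["value"][1])

def p99_checkout_latency() -> float:
    with urllib.request.urlopen(build_query_url(PROM_URL, P99_QUERY)) as resp:
        return parse_instant_value(resp.read().decode())
```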
The real magic is that Tempo is doing the heavy lifting of extracting and bucketing the durations from the traces it stores. You don’t need to parse trace data in Prometheus or another system. Tempo’s internal processing pipeline identifies spans, extracts their timing information, and emits it as Prometheus-compatible metrics.
What most people miss is that Tempo doesn’t just store traces; its metrics-generator actively processes them in the background to produce these derived metrics. traces_spanmetrics_latency isn’t something you instrument your application to send — it’s a metric Tempo generates from the traces it receives. The configuration in tempo.yaml simply tells Tempo where to send the generated series.
With this setup, you can now visualize and alert on the P99 latency of your critical operations directly from Tempo’s trace data, without the performance hit of processing every trace individually.
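And because these are ordinary Prometheus series, alerting is plain PromQL. A sketch of a Prometheus rule file — the group name, alert name, and 1s threshold are illustrative, and the metric name again assumes Tempo's span-metrics defaults:

```yaml
groups:
  - name: checkout-latency            # hypothetical group name
    rules:
      - alert: CheckoutP99LatencyHigh # hypothetical alert name
        expr: |
          histogram_quantile(0.99,
            sum by (le) (
              rate(traces_spanmetrics_latency_bucket{service="checkout-service", span_name="checkout"}[5m])
            )
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Checkout P99 latency has been above 1s for 10 minutes"
```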
The next step is to explore how to filter these span metrics by additional span attributes beyond the operation and service name.