Tempo can drop spans based on attributes during ingestion, saving storage and improving query performance by only keeping what’s relevant.
Let’s see this in action. Imagine we’re ingesting traces for a web application, and we want to filter out all spans related to health checks, as they’re noisy and irrelevant to user-facing performance.
Here’s a snippet of a Tempo configuration file (tempo.yaml) demonstrating this:
    # ... other tempo configurations ...
    ingester:
      # ... other ingester configurations ...
      # This section configures how spans are filtered *before* they are
      # written to storage.
      span_filter:
        # 'mode' can be 'keep' or 'drop'. 'drop' means we discard spans
        # that match the criteria.
        mode: drop
        # 'expression' uses Tempo's attribute filtering language.
        # This expression targets spans where the 'name' attribute is
        # exactly 'health_check'.
        expression: '{name="health_check"}'
    # ... rest of tempo configuration ...
In this example, Tempo’s ingester component is configured with a span_filter. The mode is set to drop, meaning any span that matches the expression will be discarded. The expression itself is a simple selector: {name="health_check"}. This tells Tempo to look for spans where the name attribute is precisely "health_check".
This filtering happens before spans are even written to the underlying object storage (like S3, GCS, or MinIO). When a trace is sent to Tempo, the ingester processes each span. If a span’s attributes match the expression and the mode is drop, that span is simply not persisted. This means it won’t consume storage space and won’t appear in search results or trace views.
The expression language is powerful and flexible. It’s based on OpenTelemetry’s attribute syntax, allowing you to filter on any attribute present in your spans. For instance, you could filter out spans based on:
- Service name: expression: '{service.name="internal-cron-job"}' (drop spans from a specific background service)
- HTTP status code: expression: '{http.status_code=~"5…"}' (drop spans for all 5xx server errors)
- Span kind: expression: '{span.kind="server"}' (if you only wanted client-side spans)
- Combined attributes: expression: '{service.name="user-api", http.method="OPTIONS"}' (drop OPTIONS requests from the user-API service)
The mode can also be set to keep, which is the inverse. If mode is keep, only spans matching the expression will be persisted. This is useful for highly targeted tracing, where you only want to store specific types of operations.
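As a rough sketch of keep mode, the fragment below reuses the span_filter keys from the snippet earlier in this article and stores only spans from a single service. The service name checkout-api is a hypothetical placeholder chosen for illustration, not a value from a real deployment.

```yaml
ingester:
  span_filter:
    # 'keep' inverts the earlier example: only spans that match the
    # expression are persisted.
    mode: keep
    # Hypothetical service name, used purely for illustration.
    expression: '{service.name="checkout-api"}'
```

With this in place, any span that does not match the expression is discarded, so the expression has to cover everything you intend to store.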
The core problem this solves is the volume of trace data that modern distributed systems generate. Without filtering, a busy system can produce terabytes of trace data per day. Storing all of it is expensive, and searching it gets slower as it grows. Dropping irrelevant spans at ingest reduces storage costs and keeps trace analysis focused on the spans that matter. It’s a proactive approach to cost and performance management for your observability data.
The expression syntax supports various operators:
- = for exact string match
- != for not equal
- =~ for regex match
- !~ for not regex match
- = for numeric equality
- != for numeric inequality
- >, <, >=, <= for numeric comparison
You can also combine conditions with commas, which act as an AND operator.
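To illustrate how the comma combines conditions, here is a small sketch that assumes the same span_filter layout used elsewhere in this article. The attribute http.route and its pattern are hypothetical and serve only as an example of mixing an exact match with a regex match.

```yaml
ingester:
  span_filter:
    mode: drop
    # Both conditions must hold for a span to be dropped (comma = AND):
    #   1. service.name is exactly "user-api"
    #   2. http.route matches the regex "/internal/.*" (hypothetical attribute and pattern)
    expression: '{service.name="user-api", http.route=~"/internal/.*"}'
```

A span from user-api on any other route, or a span on an internal route from a different service, satisfies only one condition and is therefore kept.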
A common misconception is that filtering happens only at query time. Tempo does support query-time filtering, but attribute filtering at ingest is a storage and cost optimization. Its main purpose is not to speed up queries over data that is already stored, but to ensure that only the data you care about is stored in the first place.
If you configure span_filter with mode: drop and an expression like {http.status_code=~"5.."} to drop all 5xx errors, and later decide you do want to see those errors, you will find that the spans were never stored and cannot be recovered.