The Tempo compactor is the unsung hero of your distributed tracing storage, diligently merging small trace blocks into larger ones and purging old data, a process surprisingly complex and often misunderstood.

Let’s see it in action. Imagine you’ve just ingested a batch of traces. Each ingester flush writes a small block to Tempo’s object storage backend (S3, GCS, or Azure), so after a busy hour the bucket can hold hundreds of small blocks, each stored as its own directory of objects.

// Example: object storage before compaction (one directory per block,
// one small block per ingester flush; block IDs shortened here)
s3://my-tempo-bucket/tempo/single-tenant/1b3e.../meta.json
s3://my-tempo-bucket/tempo/single-tenant/1b3e.../data.parquet
s3://my-tempo-bucket/tempo/single-tenant/9f2c.../meta.json
s3://my-tempo-bucket/tempo/single-tenant/9f2c.../data.parquet
// ... many more small blocks

The compactor’s job is to find these small blocks, group them by tenant and time window, and rewrite them into larger, more efficient blocks.

// Example: object storage after compaction
s3://my-tempo-bucket/tempo/single-tenant/4a7d.../meta.json
s3://my-tempo-bucket/tempo/single-tenant/4a7d.../data.parquet
// ... fewer, larger blocks
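Conceptually, the selection step is a grouping problem: collect the small blocks that belong to the same tenant and the same time window, since only those can be merged. Here is a minimal sketch of that idea; this is a simplification, not Tempo’s code, and the block metadata fields are illustrative:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustrative: a 1h compaction window.
WINDOW_SECONDS = 3600

def window_of(ts: datetime) -> int:
    """Bucket a timestamp into a fixed-size time window."""
    return int(ts.timestamp()) // WINDOW_SECONDS

def group_candidates(blocks):
    """Group blocks by (tenant, window); only groups with two or
    more blocks have anything worth merging."""
    groups = defaultdict(list)
    for b in blocks:
        groups[(b["tenant"], window_of(b["end"]))].append(b)
    return {k: v for k, v in groups.items() if len(v) > 1}

blocks = [
    {"tenant": "single-tenant", "end": datetime(2023, 10, 27, 15, 5, tzinfo=timezone.utc)},
    {"tenant": "single-tenant", "end": datetime(2023, 10, 27, 15, 40, tzinfo=timezone.utc)},
    {"tenant": "single-tenant", "end": datetime(2023, 10, 27, 16, 10, tzinfo=timezone.utc)},
]
print(len(group_candidates(blocks)))  # 1: the two blocks in the 15:00 UTC hour
```

The 16:10 block lands in a different window, so it is left alone until other blocks from its hour arrive.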

This process directly impacts two critical aspects: retention and query performance.

Retention: The compactor is responsible for deleting data that has outlived its retention period. It doesn’t delete individual traces; it deletes entire blocks. Once a block is older than the configured block_retention duration, the compactor removes it. This is crucial for managing storage costs and complying with data policies.
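The retention check itself is simple in principle: compare the age of a block’s newest data against the retention duration. A minimal sketch, with an illustrative function name rather than Tempo’s internals:

```python
from datetime import datetime, timedelta, timezone

def is_past_retention(block_end: datetime, retention: timedelta,
                      now: datetime) -> bool:
    """A block is eligible for deletion once its newest data is
    older than the retention period."""
    return now - block_end > retention

now = datetime(2023, 10, 27, 15, 0, tzinfo=timezone.utc)
old_block_end = now - timedelta(days=8)
fresh_block_end = now - timedelta(days=2)

print(is_past_retention(old_block_end, timedelta(days=7), now))    # True
print(is_past_retention(fresh_block_end, timedelta(days=7), now))  # False
```

Because the check uses the block’s newest data, a block straddling the retention boundary survives until all of its traces have expired.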

Block Merging: Why merge small files into larger blocks? Efficiency. Reading thousands of tiny files is significantly slower and more resource-intensive than reading a few large ones. Compaction reduces the number of I/O operations your Tempo querier needs to perform, leading to faster trace retrieval. It also reduces the "metadata overhead" in your object storage.
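To put rough, purely illustrative numbers on the I/O savings: suppose a query must touch at least two objects per block (say, a bloom filter and an index page) before fetching any trace data. The request counts diverge quickly:

```python
# Illustrative request amplification from many small blocks.
# Assumption: each block costs at least 2 object-store reads per
# query before any trace data is fetched.
reads_per_block = 2

small_blocks = 5000      # before compaction
compacted_blocks = 50    # after compaction

print(small_blocks * reads_per_block)      # 10000 reads
print(compacted_blocks * reads_per_block)  # 100 reads
```

Two orders of magnitude fewer requests, before even counting per-request latency and object-store API costs.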

The core configuration for the compactor resides within your Tempo configuration file, typically tempo.yaml.

compactor:
  compaction:
    # Duration to keep blocks before deletion.
    # This is the primary driver of data deletion.
    # Default: 336h (14 days).
    block_retention: 168h

    # How long to keep a block after its contents have been
    # compacted into a larger block, so queriers holding a stale
    # blocklist can still read it. Default: 1h.
    compacted_block_retention: 1h

    # Blocks whose time ranges fall in the same window are
    # eligible to be compacted together. Default: 1h.
    compaction_window: 1h

    # Caps on compacted blocks: an output block is finalized and
    # a new one started once either limit would be exceeded.
    max_compaction_objects: 6000000   # traces per output block
    max_block_bytes: 107374182400     # bytes per output block (100 GB)

# Per-tenant retention (if you use multi-tenancy) is set through
# overrides, not the compactor block. For example, in a
# per-tenant overrides file:
# overrides:
#   "tenant-a":
#     block_retention: 72h
#   "tenant-b":
#     block_retention: 336h

How often does the compactor run? Continuously: it sweeps each tenant’s blocklist every compaction_cycle (default 30s) rather than waking up on a long interval. The knob that most often causes trouble is compaction_window. Shrinking it, e.g. to 5m, produces many small windows and therefore many small output blocks, and floods object storage with list, read, and write operations. For most deployments the defaults are a reasonable starting point.

The block_retention is the absolute cutoff. Once a block’s data is older than this duration, the compactor deletes the block. This is a hard limit: set it to 168h, and traces older than 7 days will eventually disappear.

compaction_window, max_block_bytes, and max_compaction_objects govern how blocks are formed. The compactor partitions each tenant’s blocks into windows of compaction_window and only merges blocks that fall in the same window. Within a window, it keeps folding small input blocks into an output block until the result would exceed max_block_bytes or max_compaction_objects; at that point the output is finalized and a new one is started. A block that is already large, or alone in its window, is left as it is.
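The merging logic within a window can be sketched as a greedy packer that respects both caps. This is a simplification of the real algorithm, and the names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Block:
    bytes: int
    objects: int

def plan_compaction(blocks, max_bytes, max_objects):
    """Greedily group input blocks into outputs that respect both
    caps. Real Tempo additionally partitions inputs by tenant and
    time window before this step."""
    outputs, current = [], []
    size = objs = 0
    for b in blocks:
        # Finalize the current output if adding b would exceed a cap.
        if current and (size + b.bytes > max_bytes or
                        objs + b.objects > max_objects):
            outputs.append(current)
            current, size, objs = [], 0, 0
        current.append(b)
        size += b.bytes
        objs += b.objects
    if current:
        outputs.append(current)
    return outputs

# Ten 40 MiB blocks of 10k traces each, packed under a 100 MiB /
# 50k-trace cap: they pair up into five output blocks.
small = [Block(bytes=40 * 2**20, objects=10_000) for _ in range(10)]
plan = plan_compaction(small, max_bytes=100 * 2**20, max_objects=50_000)
print(len(plan))  # 5
```

Note that the byte cap binds first here: a third 40 MiB block would push an output past 100 MiB long before the trace-count cap is reached.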

A common, subtle point is that neither compaction nor deletion is instantaneous. When the compactor merges blocks, it does not remove the inputs right away: it marks them as compacted and keeps them for compacted_block_retention, so queriers still holding a stale blocklist can read them, and only afterwards deletes the underlying objects. Retention deletion likewise runs on the compactor’s own cadence, so you won’t see space freed the moment you lower block_retention. Resist the temptation to “help” with bucket lifecycle rules: the compactor owns the block lifecycle, and deleting objects out from under it leaves dangling blocklist entries and failing queries.
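The lag can be modeled as a tiny state machine. The states and the one-hour grace period here are illustrative, not Tempo internals:

```python
from datetime import datetime, timedelta, timezone

# Illustrative block lifecycle after compaction:
# live -> marked compacted (still readable) -> objects deleted
# once the compacted-block grace period elapses.
COMPACTED_BLOCK_RETENTION = timedelta(hours=1)

def block_state(marked_at: datetime, now: datetime) -> str:
    if now < marked_at:
        return "live"
    if now - marked_at < COMPACTED_BLOCK_RETENTION:
        return "compacted (still readable)"
    return "deleted"

marked = datetime(2023, 10, 27, 15, 0, tzinfo=timezone.utc)
print(block_state(marked, marked + timedelta(minutes=30)))  # compacted (still readable)
print(block_state(marked, marked + timedelta(hours=2)))     # deleted
```

The grace period is what lets an in-flight query finish against a block that has technically already been replaced.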

The compactor actively manages the lifecycle of your trace data by merging small, inefficient files into larger, more query-friendly blocks and by enforcing your configured retention policies.

The next thing you’ll likely encounter is managing the compactor’s resource usage, particularly CPU and network bandwidth, on the Tempo instances where it runs.

Want structured learning?

Take the full Tempo course →