Tempo’s vParquet block format is a game-changer for trace storage, offering significant performance gains by fundamentally rethinking how trace data is organized and accessed.

Let’s see Tempo ingest and query traces with vParquet in action. Imagine we have a high-throughput microservice environment. Tempo has no CLI “ingest” command — traces arrive over OTLP — so for a quick demo you can generate load with a tool like telemetrygen (from opentelemetry-collector-contrib) and then fetch a trace back through Tempo’s HTTP API:

# Generate test traces and send them to Tempo's OTLP gRPC receiver (default port 4317)
telemetrygen traces --otlp-endpoint localhost:4317 --otlp-insecure --traces 10

# Query a trace by ID through the query-frontend HTTP API (default port 3200)
curl http://localhost:3200/api/traces/1234567890abcdef

Now, let’s dive into what makes vParquet so special.

Tempo’s default storage format, especially in older versions, often involved storing traces as individual objects or in formats that weren’t optimized for analytical queries. This meant that retrieving a specific trace, or aggregating data across many traces, could involve a lot of I/O and deserialization overhead. vParquet, by adopting the Apache Parquet format, introduces columnar storage. Instead of reading an entire trace object, Tempo can now read only the specific columns (like service name, operation name, duration, or timestamps) that are relevant to a query. This drastically reduces the amount of data that needs to be read from disk or object storage, leading to much faster query times, especially for large datasets.

The problem vParquet solves is the inefficient retrieval of trace data when performing analytical queries or looking for specific traces within massive datasets. Traditional trace storage often treated traces as monolithic blobs, making it hard to efficiently filter, aggregate, or even find specific spans without reading the entire trace. This leads to slow dashboards, long query execution times, and high I/O costs.

Internally, vParquet stores trace data according to the Parquet specification. Each Parquet file is composed of multiple row groups, and within each row group the data is laid out column by column. When a query arrives, Tempo’s query engine determines which columns it needs and reads only those column chunks from the relevant row groups and files — much like column pruning in an analytical database, applied to trace data. Each column is compressed and encoded independently, further reducing storage size and I/O.

The exact levers you control live in your Tempo configuration, specifically the storage.trace settings for your object store and the block format version. You don’t directly "write" Parquet files; you instruct Tempo to use a vParquet block version as its underlying storage format.

# Example tempo.yaml snippet
storage:
  trace:
    backend: s3 # or gcs, azure, local
    s3:
      endpoint: "localhost:9000"
      bucket: "tempo-traces"
      region: "us-east-1"
      # ... other S3-specific settings
    block:
      # Select the vParquet block format explicitly.
      # Recent Tempo releases already default to a vParquet version.
      version: vParquet3

# Retention is configured on the compactor, not under storage
compactor:
  compaction:
    block_retention: 168h # keep blocks for 7 days

The most surprising aspect of vParquet’s performance gains isn’t just the columnar nature, but how it interacts with object storage’s inherent latency. Because Parquet files are structured and Tempo can read arbitrary byte ranges, it can fetch only the metadata and specific column chunks it needs for a query, minimizing the number of API calls and the total data transferred. This means even on high-latency object stores, query performance sees a dramatic improvement because the effective data read is so much smaller. It’s not just about reading less data; it’s about reading smarter data.

The next challenge you’ll likely encounter is optimizing your sampling strategies to ensure you’re capturing the most critical traces without overwhelming your storage and query capabilities, especially as vParquet makes querying larger datasets more feasible.

Want structured learning?

Take the full Tempo course →