OpenTelemetry data ingestion isn’t just about getting data in; it’s about transforming raw telemetry into a structured, queryable format that unlocks observability.

Let’s look at this from the perspective of receiving OTLP (OpenTelemetry Protocol) data. Imagine a stream of incoming requests, each containing traces, metrics, and logs. Vector’s `opentelemetry` source acts as the entry point, listening for this data and preparing it for further processing.

Here’s how it looks in action. We’ll set up a simple Vector configuration to listen on a specific port and forward the ingested data to stdout for inspection.

[sources.otlp_listener]
type = "opentelemetry"
grpc.address = "0.0.0.0:4317" # Default OTLP/gRPC port
http.address = "0.0.0.0:4318" # Default OTLP/HTTP port

[sinks.stdout_sink]
type = "blackhole" # Discards events; swap in a console sink to actually inspect them
# The opentelemetry source exposes one named output per signal
inputs = ["otlp_listener.logs", "otlp_listener.traces", "otlp_listener.metrics"]

When you send OTLP data to port 4317 (from an OpenTelemetry SDK configured to export to this address, or via curl with a JSON OTLP payload against the HTTP port, 4318), Vector will receive it. If you replace `blackhole` with a `console` sink, you’ll see the structured events printed to stdout.
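As a sketch, a debugging sink that prints each parsed event as JSON might look like this (the sink name is illustrative):

```toml
# Hypothetical debugging sink: prints parsed OTLP events to stdout as JSON
[sinks.console_debug]
type = "console"
inputs = ["otlp_listener.logs"] # the source exposes named outputs per signal
encoding.codec = "json"
```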

The core problem the OTLP source solves is bridging the gap between the OTLP wire format and Vector’s internal event structure. It handles the deserialization of Protobuf messages (for gRPC) or Protobuf/JSON payloads (for HTTP) into a common schema that Vector can then manipulate. This includes parsing trace spans, metric data points, log records, and associated metadata.

Internally, the opentelemetry source is a sophisticated parser. For gRPC, it leverages the tonic library to handle the OTLP Protobuf definitions. For HTTP, it parses the incoming request bodies. The key is that it doesn’t just dump the raw Protobuf or JSON; it maps these fields onto Vector’s event types. Traces become trace events, metrics become metric events, and log records become log events. Each event type has specific fields for attributes, timestamps, names, and values, which are populated from the OTLP data.
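As a sketch of what this mapping enables downstream, a remap transform can read and set fields on the parsed events just like on any other Vector event (the transform name and field are illustrative):

```toml
[transforms.tag_env]
type = "remap"
inputs = ["otlp_listener.logs"]
source = '''
# OTLP attributes arrive as ordinary event fields,
# so VRL can read or extend them like any other event.
.environment = "staging"
'''
```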

The primary levers you control are the listen addresses: `grpc.address` (conventionally 0.0.0.0:4317) and `http.address` (conventionally 0.0.0.0:4318). Both protocols can be served simultaneously, and you can bind either to any address and port you choose. Beyond that, the source is largely self-configuring with respect to the OTLP specification.

What most people overlook is where batching happens. The source itself processes data as it arrives; in Vector, batching is configured on sinks, via options such as `batch.timeout_secs` and `batch.max_events`. These group many events into larger payloads before they are shipped downstream. For high-throughput OTLP pipelines, tuning these sink-side batch settings significantly reduces per-request overhead, improving overall throughput and efficiency.
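A sink-side batching configuration might look like the following sketch (the sink name and endpoint are illustrative):

```toml
# Hypothetical downstream sink with explicit batch tuning
[sinks.downstream]
type = "http"
inputs = ["otlp_listener.logs"]
uri = "https://example.com/ingest"
encoding.codec = "json"
batch.max_events = 1000  # flush once 1,000 events accumulate...
batch.timeout_secs = 5   # ...or after 5 seconds, whichever comes first
```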

Once data is ingested and parsed by the OTLP source, the next logical step is to transform it further or route it to different backends.
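As one sketch of routing, a route transform can split the stream by severity before fanning events out to different backends (the names and condition are illustrative):

```toml
[transforms.by_severity]
type = "route"
inputs = ["otlp_listener.logs"]
route.errors = '.severity_text == "ERROR"'

# Route outputs are addressed as <transform>.<route>;
# events matching no route appear on the _unmatched output.
[sinks.error_store]
type = "blackhole"
inputs = ["by_severity.errors"]
```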

Want structured learning?

Take the full Vector course →