Vector’s multiline aggregation is how it pieces fragmented log events back together, like a detective reconstructing a shattered vase, into one coherent narrative.

Let’s watch it in action with a common scenario: Java stack traces. Imagine a web server spitting out errors. Each line of a stack trace might reach Vector as a separate log event, because file-based log collection typically reads one line at a time and has no notion of where a logical message ends.

Here’s a snippet of what that might look like in a raw log file:

2023-10-27 10:00:01 ERROR com.example.MyApp - Unhandled exception
java.lang.NullPointerException: Cannot invoke "String.length()" because "myString" is null
	at com.example.MyApp.processRequest(MyApp.java:45)
	at com.example.MyApp.handleRequest(MyApp.java:28)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:501)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:383)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:668)
	at org.eclipse.jetty.util.thread.ScheduledExecutorScheduler$1.run(ScheduledExecutorScheduler.java:62)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

If each of those lines were a separate event, you’d lose the context. Vector’s multiline aggregation is the solution.

The Mental Model: Building the Stack

Multiline aggregation works by looking for patterns to decide whether a new log event has started or whether the current line continues the previous one. Vector maintains a "current event" in memory. When a new line arrives:

  1. Does it start a new event? If the line matches the "start" pattern, the current event (if any) is considered complete, and this line begins a fresh event.
  2. Is it a continuation? If the line doesn’t start a new event, the configured mode and continuation pattern decide whether it is appended to the current event.
  3. Is it the end? Depending on the mode, a matching line can also explicitly terminate the current event, and a timeout flushes a partially built event when no further lines arrive.

The key is defining these patterns. For Java stack traces, the common pattern is that a new error event starts with a timestamp and an ERROR or WARN level, while subsequent lines of the stack trace typically don’t start with such a prefix and often begin with whitespace or the at keyword.
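These rules can be sketched in a few lines of Python. This is a simplified model of start-pattern aggregation, not Vector’s actual implementation, and the sample lines are abbreviated from the log above:

```python
import re

# The "start of a new event" pattern described above: a timestamp
# followed by a log level. Anything else is a continuation line.
START = re.compile(
    r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?:ERROR|WARN|INFO|DEBUG|TRACE) '
)

def aggregate(lines):
    """Group raw lines into multiline events using a start pattern."""
    current = []
    for line in lines:
        if START.match(line):
            if current:                # flush the event we were building
                yield "\n".join(current)
            current = [line]           # this line starts a fresh event
        elif current:
            current.append(line)       # continuation of the current event
        else:
            yield line                 # stray line before any start marker
    if current:                        # flush whatever is left at EOF
        yield "\n".join(current)

lines = [
    "2023-10-27 10:00:01 ERROR com.example.MyApp - Unhandled exception",
    'java.lang.NullPointerException: Cannot invoke "String.length()"',
    "\tat com.example.MyApp.processRequest(MyApp.java:45)",
    "\tat com.example.MyApp.handleRequest(MyApp.java:28)",
    "2023-10-27 10:00:02 INFO com.example.MyApp - Recovered",
]
events = list(aggregate(lines))
print(len(events))  # 2: the full stack trace, then the INFO line
```

Five raw lines come out as two events: the ERROR line plus its three continuation lines fold into one, and the INFO line stands alone.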

Configuration in Action

Let’s configure Vector to handle this. In Vector, multiline aggregation is configured on the file source itself rather than as a standalone transform. We’ll pair the source with a console sink for demonstration.

# vector.toml

[sources.my_log_files]
type = "file"
include = ["/var/log/my_app.log"]

# This regex defines what constitutes the START of a new log event:
# a timestamp (YYYY-MM-DD HH:MM:SS) followed by a log level.
multiline.start_pattern = '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?:ERROR|WARN|INFO|DEBUG|TRACE) '
# halt_before: keep appending lines to the current event until a line
# matching condition_pattern arrives; that line starts the next event.
multiline.mode = "halt_before"
multiline.condition_pattern = '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?:ERROR|WARN|INFO|DEBUG|TRACE) '
# Flush a partially built event if no new line arrives within this window.
multiline.timeout_ms = 1000

[sinks.output_logs]
type = "console"
inputs = ["my_log_files"]
encoding.codec = "json"

In this configuration:

  • sources.my_log_files: Reads from /var/log/my_app.log and performs the multiline aggregation before emitting events. This is where the magic happens.
    • multiline.start_pattern: This regular expression is crucial. It tells Vector that any line starting with a date-time pattern followed by a log level and a space marks the beginning of a new log event.
    • multiline.mode = "halt_before" with the matching condition_pattern: lines that don’t match the pattern are appended to the event currently being built; the next matching line closes that event and starts a new one.
    • multiline.timeout_ms = 1000: if the stream goes quiet, the current event is flushed after one second rather than held indefinitely.
  • sinks.output_logs: Prints the aggregated events to standard output as JSON.

When Vector processes the log file with this configuration, it will see the first line (2023-10-27 10:00:01 ERROR com.example.MyApp - Unhandled exception) as the start of a new event. The subsequent lines (java.lang.NullPointerException: ..., \tat com.example.MyApp..., etc.) do not match condition_pattern, so Vector appends them to that event. The result is a single log event containing the entire stack trace.

Here’s what the output in stdout would look like (simplified for clarity):

{
  "file": "/var/log/my_app.log",
  "message": "2023-10-27 10:00:01 ERROR com.example.MyApp - Unhandled exception\njava.lang.NullPointerException: Cannot invoke \"String.length()\" because \"myString\" is null\n\tat com.example.MyApp.processRequest(MyApp.java:45)\n\tat com.example.MyApp.handleRequest(MyApp.java:28)\n\t... (rest of the stack trace)",
  "source_type": "file",
  "timestamp": "2023-10-27T10:00:01.000Z"
}

Notice how the message field now contains the full, multi-line stack trace, with newline characters (\n) preserving the original formatting. The file source does not parse the message, though: the timestamp field is the ingestion time, and the log’s own timestamp and level are still embedded in message.
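If you want those embedded fields lifted out into their own event fields, a remap transform can follow the source. This is a sketch, not a drop-in recipe: the transform name is illustrative, and the VRL patterns assume the log format shown above.

```toml
[transforms.parse_fields]
type = "remap"
inputs = ["my_log_files"]
source = '''
# Pull the leading timestamp and level out of the aggregated message.
parsed = parse_regex!(.message, r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) ')
.level = downcase(string!(parsed.level))
.timestamp = parse_timestamp!(string!(parsed.ts), "%Y-%m-%d %H:%M:%S")
'''
```

The fallible functions (the ! variants) abort the event on a parse failure, which is usually what you want while debugging a new pattern.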

The multiline options are highly flexible. Beyond halt_before, mode can be set to continue_through (every continuation line matches condition_pattern, e.g. leading whitespace), continue_past (a matching line signals that the next line also belongs to the event, e.g. a trailing backslash), or halt_with (a matching line is the last line of the current event). For instance, if your logs had a specific footer that indicated the end of a multiline block, halt_with with a footer-matching condition_pattern would handle it.
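As a concrete illustration of the footer case, consider a hypothetical log format where every block is wrapped in explicit BEGIN/END marker lines (the markers and file path here are made up):

```toml
[sources.report_logs]
type = "file"
include = ["/var/log/reports.log"]
# A new event starts at the BEGIN marker...
multiline.start_pattern = '^=== BEGIN'
# ...and ends with (and includes) the first line matching END.
multiline.mode = "halt_with"
multiline.condition_pattern = '^=== END'
multiline.timeout_ms = 1000
```

Unlike halt_before, the matching footer line is included in the event it terminates.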

The subtle but powerful aspect of multiline is that it doesn’t just concatenate lines; it intelligently reconstructs events based on defined boundaries. This allows you to ingest logs that are inherently multi-line in nature, like stack traces, thread dumps, or even multipart HTTP request/response bodies, into a single, meaningful unit for analysis.

Once you’ve successfully aggregated your stack traces, the next challenge is often enriching them with contextual information, like Kubernetes pod names or IP addresses.

Want structured learning?

Take the full Vector course →