Vector’s rate-limiting transform is surprisingly effective at preventing downstream systems from being overwhelmed by log volume, but its core mechanism is often misunderstood. It doesn’t sample or average traffic; it enforces a hard cap per key, letting events through until a threshold is reached within a time window and discarding everything beyond it.
Let’s see it in action. Imagine you have a noisy application generating tons of identical "user login failed" messages.
[sources.my_app_logs]
type = "file"
include = ["/var/log/my_app.log"]

[transforms.throttle_logins]
type = "throttle"
inputs = ["my_app_logs"]
# Allow at most 10 events per key within each 1-second window;
# events beyond the threshold are discarded.
threshold = 10
window_secs = 1
# Bucket events by message content, so each distinct message
# is rate-limited independently.
key_field = "{{ message }}"

[sinks.stdout]
type = "console"
inputs = ["throttle_logins"]
encoding.codec = "json"
Here’s what happens when your my_app.log starts spamming:
2023-10-27T10:00:00.123Z INFO user 123 login failed
2023-10-27T10:00:00.124Z INFO user 123 login failed
2023-10-27T10:00:00.125Z INFO user 123 login failed
... (many more) ...
2023-10-27T10:00:00.130Z INFO user 123 login failed
Without throttling, you’d see every one of those lines as its own JSON event in your sink. With the config above, the first 10 events for that message within the one-second window pass through untouched and the rest are discarded, so the sink prints at most 10 lines like this (fields abridged; the file source puts the raw log line in message and stamps timestamp at ingest time):
{
  "file": "/var/log/my_app.log",
  "message": "2023-10-27T10:00:00.123Z INFO user 123 login failed",
  "source_type": "file",
  "timestamp": "2023-10-27T10:00:01.000Z"
}
Notice that the events which do pass through are unmodified; throttle never rewrites an event, it only decides whether to forward or discard it. For each distinct key_field value (in our case, the message content), Vector tracks how many events have arrived within the window defined by window_secs. Once a key’s count exceeds threshold, further events for that key are discarded until its rate falls back under the limit. There is no built-in option to replace the discarded events with an aggregate summary; if you need that, you combine throttle with other transforms such as reduce.
The key_field option is crucial. If you omit it, Vector rate-limits all events flowing through the transform globally, which is rarely what you want. By keying on {{ message }}, you’re saying, "If I see too many of the exact same log message within the window, cap them." Because key_field is a template, you can key on other fields like host or level, or combine several for more granular control. For example, key_field = "{{ host }}{{ message }}" rate-limits identical messages on a per-host basis.
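On current Vector releases the keying option is the key_field template string; a per-host variant of the transform might look like the following sketch (the transform name is hypothetical, and host assumes the field the file source adds by default):

```toml
[transforms.throttle_logins_per_host]
type = "throttle"
inputs = ["my_app_logs"]
threshold = 10
window_secs = 1
# One rate-limit budget per (host, message) pair instead of per message globally.
key_field = "{{ host }}{{ message }}"
```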
The exclude option adds another dimension of control: it takes a condition (a VRL expression), and events matching it bypass the rate limiter entirely. That makes it easy to guarantee that, say, error-level events are never dropped no matter how fast they arrive.
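Here is a minimal sketch of exempting high-severity events from throttling, under the assumptions that your events carry a level field and that your Vector version accepts the shorthand VRL condition syntax:

```toml
[transforms.throttle_logins_safe]
type = "throttle"
inputs = ["my_app_logs"]
threshold = 10
window_secs = 1
key_field = "{{ message }}"
# Events matching this VRL condition bypass the rate limiter entirely.
exclude = '.level == "error"'
```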
The most important thing to internalize is that throttle is lossy by design, and that is precisely what makes it effective: you trade completeness of high-volume, repetitive data for a stream your downstream analysis tools (like Elasticsearch, Splunk, or a SIEM) can ingest and process without getting overloaded. This prevents alert storms and reduces storage costs while still letting the first events of each burst through, so you retain visibility into the fact that something is happening frequently.
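If dropping the repeats outright loses too much, one pattern is to summarize them with Vector’s reduce transform instead of (or alongside) throttle. A sketch, under the assumption that you first stamp each event with a numeric count via remap (transform names are hypothetical):

```toml
# Give every event a count of 1 so reduce can sum them.
[transforms.add_count]
type = "remap"
inputs = ["my_app_logs"]
source = ".count = 1"

# Collapse a burst of identical messages into one event carrying the total.
[transforms.summarize_logins]
type = "reduce"
inputs = ["add_count"]
group_by = ["message"]
expire_after_ms = 1000          # close a group 1s after its last event arrives
merge_strategies.count = "sum"  # total number of occurrences in the burst
```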
Discards aren’t silent, either: throttled-away events are counted in Vector’s internal telemetry (the component_discarded_events_total counter), so you can monitor which pipelines are shedding load rather than discovering gaps in your logs later.
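To watch how much a pipeline is shedding, you can tap Vector’s own telemetry with the internal_metrics source; a sketch (the exact counter names vary across Vector versions, so treat the specific metric name as an assumption):

```toml
# Expose Vector's internal telemetry (including discard counters) as events.
[sources.vector_metrics]
type = "internal_metrics"

[sinks.metrics_console]
type = "console"
inputs = ["vector_metrics"]
encoding.codec = "json"
```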
The next challenge will be correlating these rate-limited (or summarized) streams back to the original, individual events when debugging.