Vector’s Lua transform lets you inject arbitrary processing logic into your data pipelines. Its real strength is that the logic runs inline, inside the Vector process, instead of in a separate service that every event has to visit and come back from.
Let’s see it in action. Imagine you’ve got incoming logs from a web server, and you want to enrich them with some custom geolocation data based on IP addresses. Normally, you’d pull that data into a separate service, do the lookup, and then merge it back. With Vector, you can do it all inline.
Here’s a simplified example of a Vector configuration that uses a Lua transform to achieve this:
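```toml
[sources.apache_logs]
type = "file"
include = ["/var/log/apache2/access.log"]

# The file source delivers each raw line in the `message` field, so parse it first.
# parse_apache_log stores the client address in `host`; move it to `remote_addr`.
[transforms.parse_apache]
type = "remap"
inputs = ["apache_logs"]
source = '''
. = parse_apache_log!(.message, format: "combined")
.remote_addr = del(.host)
'''

[transforms.enrich_ip]
type = "lua"
inputs = ["parse_apache"]
version = "2"
hooks.process = "process"
source = """
function process(event, emit)
  local ip = event.log.remote_addr
  if ip then
    -- In a real scenario, this would be a lookup against a GeoIP database
    local geo_data = {
      country = "Unknown",
      city = "Unknown"
    }
    if ip == "192.168.1.100" then
      geo_data.country = "Canada"
      geo_data.city = "Toronto"
    elseif ip == "10.0.0.5" then
      geo_data.country = "USA"
      geo_data.city = "New York"
    end
    event.log.geo = geo_data
  end
  -- Pass the (possibly enriched) event downstream
  emit(event)
end
"""

[sinks.console]
type = "console"
inputs = ["enrich_ip"]
encoding.codec = "json"
```

In this setup, apache_logs is our source, reading from an Apache access log file. Because the file source delivers each line as a raw string in the message field, the parse_apache transform (a remap transform written in VRL) splits the line into fields and stores the client address as remote_addr. The enrich_ip transform is where the magic happens. It’s a lua transform using the version 2 API: Vector loads the Lua code in the source field once, then calls the function named by hooks.process for each incoming event.
The process(event, emit) function is the core. Vector calls it for every log line it receives, passing the event as a Lua table whose log fields live under event.log. Inside, we read event.log.remote_addr. If an IP address is present, we simulate a geolocation lookup; in a production system, you’d replace this hardcoded logic with a lookup against a GeoIP database or an in-memory cache. We then add a new field, geo, containing the country and city. Finally, emit(event) sends the modified event downstream; a hook that never calls emit drops the event. The console sink then prints the enriched events as JSON.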
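Because the version 2 API hands the hook a plain Lua table and an emit callback, the enrichment logic can be exercised outside Vector. The sketch below is a hypothetical test harness, not part of Vector: it builds fake events shaped like Vector's (`{ log = { ... } }`), passes a collecting function as `emit`, and inspects what comes out. The names `emitted` and `collect` are illustrative assumptions.

```lua
-- The enrichment hook, written against Vector's version 2 Lua API.
function process(event, emit)
  local ip = event.log.remote_addr
  if ip then
    -- Hardcoded stand-in for a real GeoIP lookup
    local geo_data = { country = "Unknown", city = "Unknown" }
    if ip == "192.168.1.100" then
      geo_data.country = "Canada"
      geo_data.city = "Toronto"
    elseif ip == "10.0.0.5" then
      geo_data.country = "USA"
      geo_data.city = "New York"
    end
    event.log.geo = geo_data
  end
  emit(event)
end

-- Hypothetical harness standing in for Vector: collect emitted events in a list.
local emitted = {}
local function collect(e)
  emitted[#emitted + 1] = e
end

process({ log = { remote_addr = "192.168.1.100", message = "GET /" } }, collect)
process({ log = { message = "line without a parsed address" } }, collect)

for i, e in ipairs(emitted) do
  local geo = e.log.geo
  print(i, geo and (geo.city .. ", " .. geo.country) or "no geo")
end
```

Testing the hook this way is cheap because nothing in it depends on Vector itself; only the shape of the event table and the emit contract are borrowed from the API.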
This inline processing keeps your data flow contiguous. You avoid a network round trip to another service and back for every event, which can become a significant bottleneck for high-volume streaming data. Vector runs the Lua code in an embedded interpreter within its own process, while its Rust core handles reading, buffering, routing, and encoding. Keep in mind that the Lua code itself is interpreted, so for simple field manipulation Vector’s native remap transform is usually the faster choice.
Inside the hook, the event is an ordinary Lua table, and log events expose their fields under event.log. You read a field with event.log.name (an absent field yields nil), add or overwrite one by plain assignment (the value may itself be a nested table), and remove one by assigning nil. For example, event.log.geo = { country = "Canada", city = "Toronto" } adds a new, structured field.
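Those three operations are just table reads and writes, so they can be shown in plain Lua. This is a minimal sketch: the event table is hand-built to mimic what the version 2 API passes in, and the `internal_id` and `user_agent` fields are hypothetical.

```lua
-- Hand-built event mimicking the table Vector's version 2 API passes to hooks.
local event = {
  log = {
    message = "GET /index.html HTTP/1.1",
    remote_addr = "10.0.0.5",
    internal_id = "abc123", -- hypothetical field we want to drop
  },
}

-- Read: a present field returns its value, an absent one returns nil.
local ip = event.log.remote_addr
local missing = event.log.user_agent

-- Insert: assign a new key; a table value becomes a nested field.
event.log.geo = { country = "USA", city = "New York" }
local city = event.log.geo.city

-- Remove: assigning nil deletes the key from the table.
event.log.internal_id = nil

print(ip, missing, city, event.log.internal_id)
```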
The most surprising thing about Vector’s Lua transform is how it handles state. The Lua environment is not recreated for each event: the transform keeps a single Lua state for its whole lifetime, so variables set during one call to the process hook are still there on the next. That lets you build aggregations or maintain caches directly in your Lua scripts without querying an external database for every event. For instance, you could load a large GeoIP lookup table into a Lua variable once, either in top-level code that runs when the script loads or in an init hook (configured with hooks.init), and then consult it inside process for each event, which is far faster than hitting a database every time. This state lives only in memory, so it is lost when Vector restarts.
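A minimal sketch of that pattern, assuming the lookup table fits in memory: the table is built once at load time and captured by the hook, and a counter shows that state carries over between calls. The table entries are placeholders rather than real GeoIP data, and `geo_table`, `events_seen`, and the `seq` field are hypothetical names. In Vector, everything above the harness would go in the transform's source field with hooks.process = "process".

```lua
-- Top-level code runs once when the script loads, not once per event.
-- Placeholder entries; a real deployment might load these from a file.
local geo_table = {
  ["192.168.1.100"] = { country = "Canada", city = "Toronto" },
  ["10.0.0.5"] = { country = "USA", city = "New York" },
}

-- Survives between calls because the same Lua state is reused.
local events_seen = 0

function process(event, emit)
  events_seen = events_seen + 1
  local ip = event.log.remote_addr
  if ip then
    local hit = geo_table[ip]
    if hit then
      -- Copy the entry so later edits to event.log.geo cannot mutate the shared table.
      event.log.geo = { country = hit.country, city = hit.city }
    else
      event.log.geo = { country = "Unknown", city = "Unknown" }
    end
  end
  event.log.seq = events_seen
  emit(event)
end

-- Hypothetical harness standing in for Vector.
local out = {}
local function collect(e) out[#out + 1] = e end

process({ log = { remote_addr = "10.0.0.5" } }, collect)
process({ log = { remote_addr = "203.0.113.7" } }, collect)
process({ log = {} }, collect)
```

The per-event cost is a single table index instead of a database query; the trade-off is memory, plus the need to reload the table (for example by restarting Vector) when the underlying data changes.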
Once you’ve mastered custom logic with Lua, you’ll likely want to explore how to combine multiple Lua transforms or integrate them with other transform types for more complex data manipulation.