You’re trying to understand how to transform log data using VRL (Vector Remap Language), specifically when you need to remap specific values. It’s not just about simple replacements; VRL offers a powerful, structured way to manipulate log events before they’re sent off.

Let’s imagine you’re getting logs from a web server, and you want to normalize the http.status_code field. Sometimes it’s a string like "200", other times it might be an integer 200. You also want to group common client errors (4xx) into a single category and server errors (5xx) into another.

Here’s what a raw log line might look like:

{
  "timestamp": "2023-10-27T10:00:00Z",
  "message": "GET /index.html HTTP/1.1",
  "http": {
    "status_code": 200,
    "request_method": "GET",
    "request_url": "/index.html"
  },
  "client": {
    "ip": "192.168.1.100"
  }
}

And another one:

{
  "timestamp": "2023-10-27T10:01:00Z",
  "message": "POST /submit HTTP/1.1",
  "http": {
    "status_code": "404",
    "request_method": "POST",
    "request_url": "/submit"
  },
  "client": {
    "ip": "10.0.0.5"
  }
}

Now, let’s craft a VRL transform to handle this. We’ll use the remap transform, whose source option holds a VRL program. A small lookup table inside that program is a good fit for this kind of conditional value assignment.

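[transforms.log_remap]
type = "remap"
inputs = ["raw_logs"] # Assuming your input source is named "raw_logs"
source = '''
# Lookup table from status code (as a string) to status class
mapping = {
  "200": "success",
  "201": "success",
  "204": "success",
  "400": "client_error",
  "401": "client_error",
  "403": "client_error",
  "404": "client_error",
  "500": "server_error",
  "502": "server_error",
  "503": "server_error"
}

# Convert the code to a string so that 200 and "200" hit the same key
code = to_string(.http.status_code) ?? ""

# get returns null when the key is missing from the table
status_class = get!(mapping, [code])

# Default value if no match is found in mapping
.http.status_class = if status_class == null { "unknown" } else { status_class }
'''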

When this transform runs, the first log line, with http.status_code: 200, will have a new field http.status_class: "success" added. The second log line, with http.status_code: "404", will also get http.status_class: "client_error".

Notice that remap does not coerce types on its own: in VRL the integer 200 and the string "200" are distinct values, so the status code has to be converted to a string (for example with to_string) before it is compared against the keys of the mapping table. Any code without a matching key should fall back to a default such as "unknown".
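The lookup table only knows the codes it lists, so a 409 or a 504 would end up as "unknown". Since the stated goal is to group all 4xx and all 5xx responses, a range-based variant may be closer to what you want. The following is a minimal sketch, not a drop-in replacement: the transform name status_class_by_range is made up for illustration, and it assumes that unparsable status codes should be treated as "unknown".

```toml
[transforms.status_class_by_range]
type = "remap"
inputs = ["raw_logs"] # Same assumed input source as above
source = '''
# to_int accepts both 200 and "200"; fall back to 0 if the value cannot be parsed
code = to_int(.http.status_code) ?? 0

.http.status_class = if code >= 200 && code < 300 {
  "success"
} else if code >= 400 && code < 500 {
  "client_error"
} else if code >= 500 && code < 600 {
  "server_error"
} else {
  "unknown"
}
'''
```

The trade-off is that the explicit table documents exactly which codes you expect, while the range check keeps working when a new code shows up.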

This remap transform is powerful because it allows you to abstract away specific numerical codes into more meaningful, categorized labels. This is incredibly useful for metrics and dashboards where you want to see trends like "percentage of client errors" rather than "percentage of 404s and 401s."

You can also chain VRL transforms. For example, after remapping the status code, you might want to extract the user agent string into its own field.

[transforms.user_agent_extract]
type = "remap"
inputs = ["log_remap"] # The output of the remap transform above
source = '''
# Capture everything after a literal "User-Agent: " prefix in the message
parsed, err = parse_regex(.message, r'User-Agent: (?P<user_agent>.*)')
if err == null {
  .http.user_agent = parsed.user_agent
}
'''

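This pattern matches a message that contains a literal User-Agent: prefix, for example ... User-Agent: Mozilla/5.0 ..., and puts everything after the prefix into http.user_agent. A combined-format access line like ... - - "GET /" 200 - "Mozilla/5.0 ..." "..." carries the user agent only as a quoted field with no such prefix, so this pattern would not match it.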
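For a message that holds a complete combined-format access log line, VRL’s parse_apache_log function may be a more direct route than a hand-written regex. The sketch below assumes that .message contains the full line, not the shortened request line from the sample events earlier; the transform name user_agent_from_access_log is hypothetical.

```toml
[transforms.user_agent_from_access_log]
type = "remap"
inputs = ["log_remap"] # The output of the status class transform
source = '''
# Parse the whole access log line; err is set if the line is not in combined format
parsed, err = parse_apache_log(.message, format: "combined")
if err == null {
  # The combined format exposes the user agent as the "agent" field
  .http.user_agent = parsed.agent
}
'''
```

Events whose message does not parse pass through unchanged, because the assignment only happens when err is null.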

That explicit conversion is a subtle but crucial aspect. VRL never casts values implicitly, and a program will not compile until the errors of fallible calls, such as to_string on a value of unknown type, are handled. In practice a single conversion with a fallback covers every input format of your status codes, so the VRL stays concise.

Once you’ve categorized your HTTP status codes, the next logical step is often to aggregate these categories into metrics for easier analysis.
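As a rough sketch of that next step, Vector’s log_to_metric transform can turn each categorized event into a counter increment. The transform name status_class_metrics and the metric name http_responses_total are made up for illustration, and the sketch assumes every event reaching it already has http.status_class set.

```toml
[transforms.status_class_metrics]
type = "log_to_metric"
inputs = ["log_remap"] # Events that already carry http.status_class

[[transforms.status_class_metrics.metrics]]
type = "counter"
field = "http.status_class" # Count events in which this field is present
name = "http_responses_total"
tags.status_class = "{{ http.status_class }}" # One time series per class
```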
