Grafana Loki, the log aggregation system, can ingest logs from various sources, but a common challenge is efficiently forwarding logs from applications that might not have direct Loki integrations. This is where Vector, a high-performance observability data pipeline, shines. By configuring Vector to act as an intermediary, you can reliably send logs from almost any application to your Loki instance.

Let’s see Vector in action, collecting logs from a simple Nginx access log file and sending them to Loki.

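First, make sure Vector is available. A quick way to get started is with Docker. Create the vector.toml described below before running this command, because Docker creates an empty directory at the mount source if the file does not exist yet. Passing --config explicitly avoids relying on the image's default config path, which may differ between Vector versions:

docker run -d --name vector \
  -v "$(pwd)/vector.toml:/etc/vector/vector.toml" \
  timberio/vector:latest \
  --config /etc/vector/vector.toml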

Now, create a vector.toml file in the same directory with the following configuration:

[sources.nginx_logs]
type = "file"
include = ["/var/log/nginx/access.log"] # This path is inside the container

[transforms.parse_nginx]
type = "remap"
inputs = ["nginx_logs"]
source = '''
  # Basic parsing for common Nginx log format
  # Example log: 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  # VRL regex literals are written r'...'; quotes and slashes need no escaping.
  log = parse_regex!(.message, r'^(?P<remote_addr>[\d\.]+) - (?P<remote_user>[\w\.-]+) \[(?P<timestamp>.*?)\].*?"(?P<method>\w+) (?P<path>.*?) (?P<protocol>HTTP/\d\.\d)" (?P<status>\d+) (?P<body_bytes_sent>\d+)$')
  .timestamp = parse_timestamp!(log.timestamp, "%d/%b/%Y:%H:%M:%S %z")
  # VRL has no format! function; rebuild the request line by concatenating the captures.
  .message = log.method + " " + log.path + " " + log.protocol
  .http.status_code = parse_int!(log.status)
  .url.path = log.path
  .client.address = log.remote_addr
  .log_level = "info" # Default log level if not parsed
'''

[sinks.loki]
type = "loki"
inputs = ["parse_nginx"]
endpoint = "http://localhost:3100" # Base URL only; the sink appends /loki/api/v1/push itself. Assumes Loki is reachable at localhost:3100
labels = { job = "nginx", instance = "docker-nginx" }
encoding = { codec = "json" }

To make this work with the docker run command, we need to simulate an Nginx log file. You can create a dummy access.log file:

mkdir -p ./nginx_logs
echo '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326' > ./nginx_logs/access.log

The source path in vector.toml already matches the location where this directory will be mounted inside the container, so the source and sink sections stay as they are:

# ...
[sources.nginx_logs]
type = "file"
include = ["/var/log/nginx/access.log"] # This path is inside the container
# ...

[sinks.loki]
type = "loki"
inputs = ["parse_nginx"]
endpoint = "http://localhost:3100" # Base URL only; the sink appends /loki/api/v1/push itself
labels = { job = "nginx", instance = "docker-nginx" }
encoding = { codec = "json" }

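Then remove the earlier container (its name is already taken) and rerun Docker with the log directory mounted as well. Keep in mind that inside the container, localhost refers to the container itself: on Linux you can add --network host so the sink reaches a Loki listening on the host, or change the sink endpoint to an address that is reachable from the container.

docker rm -f vector
docker run -d --name vector \
  -v "$(pwd)/vector.toml:/etc/vector/vector.toml" \
  -v "$(pwd)/nginx_logs:/var/log/nginx" \
  timberio/vector:latest \
  --config /etc/vector/vector.toml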

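Now, if you have Loki running (e.g., via Docker Compose), Vector will pick up the log line from your access.log file, parse it into structured data, and send it to Loki. You can then query this log in Grafana with the selector {job="nginx", instance="docker-nginx"}. Note that the sample line is dated October 2000, and Loki by default rejects entries older than its configured maximum age, so if nothing shows up, replace the date in the sample line with a recent one.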
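Labels do not have to be static. As a hedged sketch, label values in the loki sink can be templates that are filled in from event fields. The sink name loki_with_status and the status label are hypothetical, the endpoint is the assumed local Loki base URL, and the template relies on the .http.status_code field set by the parse step above.

```toml
# Hypothetical variant of the sink above: same destination, one extra label.
[sinks.loki_with_status]
type = "loki"
inputs = ["parse_nginx"]
endpoint = "http://localhost:3100"  # assumed base URL of a local Loki
encoding = { codec = "json" }

[sinks.loki_with_status.labels]
job = "nginx"
instance = "docker-nginx"
# Template: rendered per event from the .http.status_code field.
status = "{{ http.status_code }}"
```

With this in place, a selector such as {job="nginx", status="200"} narrows the stream by status code. Status codes suit this because they take few distinct values; fields with many distinct values (paths, client addresses) are usually better left in the log line than promoted to labels.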

Vector’s power lies in its flexible configuration, which lets you define complex data pipelines from three kinds of components. The sources section defines where data comes from (files, network ports, Kafka, etc.). The transforms section is where you manipulate the data: parsing, filtering, enriching, or aggregating. Finally, the sinks section defines where the processed data goes (Loki, Prometheus, Kafka, files, etc.). Components are wired together through each component's inputs list, which names the upstream components it reads from.
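The three component kinds can be seen in isolation in a minimal sketch like the one below, which reads lines from standard input, adds one field, and prints JSON to the console. The component names demo_in, demo_tag, and demo_out and the .pipeline field are hypothetical and exist only for illustration.

```toml
# Source: where events enter the pipeline.
[sources.demo_in]
type = "stdin"

# Transform: reads from the source named in `inputs` and modifies each event.
[transforms.demo_tag]
type = "remap"
inputs = ["demo_in"]
source = '''
  .pipeline = "demo"
'''

# Sink: reads from the transform and writes each event out as JSON.
[sinks.demo_out]
type = "console"
inputs = ["demo_tag"]
encoding = { codec = "json" }
```

Swapping the stdin source for a file source, or the console sink for a loki sink, leaves the wiring unchanged, which is what makes it easy to try a transform locally before pointing it at Loki.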

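The remap transform, which runs programs written in Vector Remap Language (VRL), is particularly potent. It allows for programmatic manipulation of log events. In our Nginx example, parse_regex! extracts fields, parse_timestamp! converts the timestamp string into a real timestamp value, and the message is rebuilt from the method, path, and protocol. The trailing ! on a function call makes the program abort if that call fails. We also copy the extracted values into conventional dotted field names (http.status_code, url.path, client.address); this is a naming choice, not a schema that Loki requires.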
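One way to check a remap program before any log reaches Loki is Vector's built-in unit testing. The sketch below is an assumption about how such a test could look for this pipeline: the test name is made up, and the assertions cover only the three fields mentioned above. It would be appended to the same vector.toml.

```toml
[[tests]]
name = "parse_nginx extracts request fields"  # hypothetical test name

# Feed one sample access-log line directly into the transform.
[[tests.inputs]]
insert_at = "parse_nginx"
type = "log"

[tests.inputs.log_fields]
message = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# Inspect what the transform emits.
[[tests.outputs]]
extract_from = "parse_nginx"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
  assert_eq!(.http.status_code, 200)
  assert_eq!(.url.path, "/apache_pb.gif")
  assert_eq!(.client.address, "127.0.0.1")
'''
```

Running vector test vector.toml should then report whether the transform produces the expected fields, without needing a Loki instance or a log file.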

A subtle but critical aspect of Vector’s operation, especially when dealing with structured logs or complex parsing, is its internal event representation. Vector uses a structured event format that can include arbitrary key-value pairs, nested structures, and different data types. When you parse a log line, you’re not just getting a string; you’re populating this structured event. VRL lets you access and manipulate these fields directly, and it distinguishes two kinds of names. A path with a leading dot, such as .timestamp or .http.status_code, refers to a field on the event itself. A name without a leading dot is a local variable: log.timestamp refers to the timestamp key of the variable log, which holds the object returned by parse_regex! and is never sent downstream. Also keep in mind how Loki treats the result: it indexes only labels, not the event's fields. With the json codec, the fields are serialized into the log line, where they can be extracted at query time (for example with LogQL's json parser).
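The difference between local variables and event fields, and the fact that fields carry types, can be sketched in a small remap program. The transform name vrl_scope_demo and the .demo fields are hypothetical; only the nginx_logs source comes from the example above.

```toml
[transforms.vrl_scope_demo]  # hypothetical transform name
type = "remap"
inputs = ["nginx_logs"]
source = '''
  # No leading dot: `parts` is a local variable holding the object
  # returned by parse_regex!. It is not part of the event.
  parts = parse_regex!(.message, r'^(?P<remote_addr>\S+) ')

  # Leading dot: these paths write to the event. Intermediate objects
  # such as .client are created as needed.
  .client.address = parts.remote_addr

  # Event fields are typed values, not only strings.
  .demo.sampled = true               # boolean
  .demo.tags = ["nginx", "access"]   # array of strings
  .demo.processed_at = now()         # timestamp
'''
```

After this program runs, the event contains .client.address and the .demo object, while parts is gone because it was never assigned to an event path.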

The next step is usually exploring more advanced Vector transformations like filtering or aggregation before sending to Loki.
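As a first taste of filtering, the sketch below keeps only requests that ended in a client or server error. The transform name only_errors and the 400 threshold are assumptions chosen for illustration.

```toml
# Hypothetical filter placed between parse_nginx and the Loki sink.
[transforms.only_errors]
type = "filter"
inputs = ["parse_nginx"]
# Keep events whose status code is 400 or higher. If the field cannot be
# converted to an integer, the fallback value 0 is used and the event is dropped.
condition = '(to_int(.http.status_code) ?? 0) >= 400'
```

To put it into effect, the Loki sink's inputs would be changed from ["parse_nginx"] to ["only_errors"], so that only the filtered events are forwarded.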
