Vector’s aws_s3, gcp_cloud_storage, and azure_blob sinks archive logs to object storage on AWS, Google Cloud, and Azure respectively. Together they enable a unified, cost-effective, and scalable log-archiving strategy across multiple cloud providers.
Let’s see what this looks like in practice. Imagine we have Vector collecting logs from various sources and we want to archive them. Here’s a simplified vector.toml configuration:
[sources.my_app_logs]
type = "file"
include = ["/var/log/my_app/*.log"]

[sinks.archive_to_s3]
type = "aws_s3"
inputs = ["my_app_logs"]
bucket = "my-log-archive-bucket-s3"
key_prefix = "app_logs/{{ host }}/%Y/%m/%d/%H/"
encoding.codec = "json"
compression = "gzip"

[sinks.archive_to_gcs]
type = "gcp_cloud_storage"
inputs = ["my_app_logs"]
bucket = "my-log-archive-bucket-gcs"
key_prefix = "app_logs/{{ host }}/%Y/%m/%d/%H/"
encoding.codec = "json"
compression = "gzip"

[sinks.archive_to_azure]
type = "azure_blob"
inputs = ["my_app_logs"]
connection_string = "${AZURE_STORAGE_CONNECTION_STRING}"
container_name = "my-log-archive-container-azure"
blob_prefix = "app_logs/{{ host }}/%Y/%m/%d/%H/"
encoding.codec = "json"
compression = "gzip"
In this configuration, we’re taking logs from /var/log/my_app/*.log and sending them to three different sinks: S3, GCS, and Azure Blob Storage. The key_prefix (or blob_prefix for Azure) uses templating to organize the logs by host, and the strftime specifiers (%Y/%m/%d/%H) then group them by date and hour, a common and effective pattern for log archiving. The encoding and compression settings ensure that the logs are stored in a compact, consistently formatted way.
The core problem Vector’s cloud sinks solve is efficient and reliable log retention. Instead of managing your own complex log storage infrastructure, you offload it to battle-tested, highly available, and cost-effective object storage services. Vector handles the batching, retries, and formatting, ensuring that logs are delivered and stored without manual intervention. This frees up your infrastructure and operational teams to focus on core application development.
Internally, each of these sinks operates by collecting events from its inputs, buffering them, and then periodically flushing them to the configured object storage. The encoding codec (such as json or text) dictates how the raw event data is serialized, and compression (such as gzip or zstd, depending on the sink) reduces the data size before upload. The key_prefix or blob_prefix template is crucial for organizing your data; it lets you define a logical structure within your bucket, making it easier to retrieve specific logs later. For example, {{ host }} injects the host field from each event, and strftime specifiers such as %Y/%m/%d/%H are expanded from the event’s timestamp to create a time-based directory structure.
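The prefix is only half of the final object key: the sink appends a filename after it. As a sketch, the aws_s3 sink exposes filename-related options like these (values are illustrative; check your Vector version’s reference for defaults):

```toml
[sinks.archive_to_s3]
type = "aws_s3"
inputs = ["my_app_logs"]
bucket = "my-log-archive-bucket-s3"
# "{{ host }}" reads the event's host field; the strftime
# specifiers expand from the event's timestamp.
key_prefix = "app_logs/{{ host }}/%Y/%m/%d/%H/"
# Appended after key_prefix to form the final object key:
filename_time_format = "%s"    # Unix-epoch filename component
filename_append_uuid = true    # avoid collisions between concurrent uploads
filename_extension = "log"     # file extension for uploaded objects
encoding.codec = "json"
compression = "gzip"
```

The UUID suffix matters when several Vector instances (or several in-flight batches) write to the same prefix, since otherwise two uploads in the same second could overwrite each other.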
The specific levers you control are primarily the bucket (or container_name), the key_prefix/blob_prefix template, and the encoding/compression settings. Beyond these, there are crucial configuration options related to performance and reliability. Under each sink’s batch table, max_bytes and max_events control how much data is grouped into a single file upload, and timeout_secs governs how long Vector waits before flushing a partial batch. For S3, you can also specify acl, storage_class, and server_side_encryption; the GCS and Azure Blob sinks expose analogous provider-specific options (for example, acl, storage_class, and metadata on GCS).
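As a concrete sketch, here is how batching and the S3-specific options might be tuned for an archiving workload (option names follow Vector’s sink reference; the values are illustrative, not recommendations):

```toml
[sinks.archive_to_s3]
type = "aws_s3"
inputs = ["my_app_logs"]
bucket = "my-log-archive-bucket-s3"
key_prefix = "app_logs/{{ host }}/%Y/%m/%d/%H/"
encoding.codec = "json"
compression = "gzip"
storage_class = "STANDARD_IA"        # cheaper tier for rarely read archives
server_side_encryption = "aws:kms"   # or "AES256" for SSE-S3
acl = "private"

[sinks.archive_to_s3.batch]
max_bytes = 10485760   # flush once ~10 MiB of events accumulate...
timeout_secs = 300     # ...or after 5 minutes, whichever comes first
```

Larger batches mean fewer, bigger objects (cheaper per-request costs, better compression ratios) at the price of higher end-to-end latency before logs appear in the bucket.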
A subtle but critical aspect of these sinks is how they handle retries and error conditions. When an upload fails (e.g., due to network issues, temporary API throttling, or credential problems), Vector doesn’t immediately give up. It employs a configurable retry mechanism with exponential backoff to attempt the upload again. This resilience is paramount for log archiving, as losing even a small amount of log data can be detrimental. The request.retry_attempts and request.retry_initial_backoff_secs parameters are key here. However, it’s important to monitor for persistent failures, as these can indicate underlying issues that won’t resolve themselves.
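A minimal retry-tuning sketch, using the request table that these sinks share (values are illustrative):

```toml
[sinks.archive_to_s3.request]
retry_attempts = 10              # give up on a batch after 10 tries
retry_initial_backoff_secs = 1   # first retry after ~1 s, then back off
retry_max_duration_secs = 300    # cap the backoff interval at 5 minutes
```

Pairing a generous retry budget with a disk buffer on the sink keeps logs safe across provider outages that outlast in-memory buffering.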
If you configure multiple cloud sinks for the same input, Vector will dutifully send the same logs to each destination. This is a powerful pattern for multi-cloud disaster recovery or for satisfying compliance requirements that mandate data redundancy across different providers. The cost implications, of course, must be carefully considered.
The next step after ensuring reliable log archiving is to think about how you’ll process those archived logs.