The most surprising thing about Splunk Forwarders is that they’re not just about sending data; they’re about intelligently curating data before it even leaves the source.
Let’s see this in action. Imagine you have a web server generating massive access logs.
# On the web server (simulated log entry)
echo "192.168.1.100 - - [10/Oct/2023:10:00:01 -0700] \"GET /index.html HTTP/1.1\" 200 1234 \"-\" \"Mozilla/5.0\"" >> /var/log/apache2/access.log
A Splunk Heavy Forwarder (HF) sitting on that server, configured with an inputs.conf like this:
[monitor:///var/log/apache2/access.log]
disabled = false
index = web_logs
sourcetype = apache_access
crcSalt = <SOURCE>
and a props.conf:
[apache_access]
TIME_PREFIX = \[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 30
SHOULD_LINEMERGE = false
TRUNCATE = 10000
will ingest that log, parse the timestamp, and send it to a Splunk Indexer. But it can do much more. It can transform the data before sending. For instance, if you only care about 200 status codes and want to drop the rest, you’d add this to props.conf on the HF:
[apache_access]
# ... previous settings ...
TRANSFORMS-filter = drop_non_200_status
And define the transform in transforms.conf on the HF:
[drop_non_200_status]
# Match any event whose HTTP status code is not 200 and route it to the
# null queue, discarding it before it ever reaches the indexer
REGEX = HTTP/[^"]+"\s(?!200\b)\d{3}
DEST_KEY = queue
FORMAT = nullQueue
Now, only lines with status=200 make it to the indexer. This is data curation at the edge.
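To see what that filter effectively does, here is a rough Python sketch, outside Splunk, of the same keep-only-200 idea. The regex below is an illustration of the matching logic, not Splunk's internal implementation:

```python
import re

# Flags any event whose HTTP status code is something other than 200
DROP_NON_200 = re.compile(r'HTTP/[^"]+"\s(?!200\b)\d{3}')

events = [
    '192.168.1.100 - - [10/Oct/2023:10:00:01 -0700] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '192.168.1.101 - - [10/Oct/2023:10:00:02 -0700] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]

# Keep only events the "drop" pattern does NOT match
kept = [e for e in events if not DROP_NON_200.search(e)]
for e in kept:
    print(e)  # only the 200 event survives
```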
The Universal Forwarder (UF), on the other hand, is designed for pure, lightweight data collection and forwarding. Its inputs.conf looks the same, but the UF has no local parsing pipeline, so props.conf and transforms.conf rules like the ones above do not run there:
[monitor:///var/log/apache2/access.log]
disabled = false
index = web_logs
sourcetype = apache_access
crcSalt = <SOURCE>
The UF simply tails the file, reads new lines, and sends them. Any parsing, indexing-time transformations, or filtering happens on the indexer. This is crucial because UFs are typically deployed on thousands of endpoints where you don’t want the overhead of heavy processing.
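Conceptually, the UF's monitor input boils down to "remember how far into the file you've read, and emit only what was appended since." A minimal Python sketch of that idea (real UFs also checkpoint this state on disk and fingerprint files, as discussed under Internal Mechanics below):

```python
import os
import tempfile

def read_new_lines(path, offset):
    """Return lines appended since `offset`, plus the new offset."""
    with open(path, "r") as f:
        f.seek(offset)
        lines = f.readlines()
        return lines, f.tell()

# Simulate a growing log file in a temp directory
log = os.path.join(tempfile.mkdtemp(), "access.log")
with open(log, "w") as f:
    f.write("first event\n")

lines, pos = read_new_lines(log, 0)    # picks up "first event"
with open(log, "a") as f:
    f.write("second event\n")

lines, pos = read_new_lines(log, pos)  # picks up only "second event"
print(lines)
```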
The Problem They Solve:
Splunk’s architecture relies on separating data collection from data processing.
- Universal Forwarders (UFs): Act as efficient, low-impact agents on your data sources (servers, endpoints, network devices). They collect raw data, add minimal metadata (like host, source, sourcetype), and forward it reliably to intermediate forwarders or directly to indexers. Their primary job is to get data into Splunk without consuming significant resources on the source machine.
- Heavy Forwarders (HFs): Are more powerful. They can receive data from multiple UFs, perform local parsing and filtering (using props.conf and transforms.conf), route data to different indexers based on rules, and even act as load balancers for indexers. They are often used in larger deployments to centralize data collection, apply initial data enrichment, and reduce the load on indexers.
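Rule-based routing on an HF works through the _TCP_ROUTING destination key: a transform matches events and stamps them with the name of a tcpout group defined in outputs.conf. A sketch, with hypothetical group and host names:

```ini
# props.conf on the HF (hypothetical sourcetype)
[apache_access]
TRANSFORMS-routing = route_errors

# transforms.conf -- send 5xx events to a dedicated indexer group
[route_errors]
REGEX = HTTP/[^"]+"\s5\d\d
DEST_KEY = _TCP_ROUTING
FORMAT = error_indexers

# outputs.conf -- define the target group (hypothetical host)
[tcpout:error_indexers]
server = idx-err.example.com:9997
```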
Internal Mechanics:
- UFs: Send data over the Splunk-to-Splunk (S2S) protocol, by default to port 9997 on the receiver. They track how far they have read into each file in a local checkpoint database (the "fishbucket"), keyed by a CRC fingerprint of the file's leading bytes; crcSalt = <SOURCE> mixes the file's path into that fingerprint so files with identical headers aren't mistaken for the same file. This state is how they avoid sending duplicate data. They are written in C++ for minimal footprint.
- HFs: Are essentially full Splunk Enterprise instances configured to forward. They have the full Splunk processing pipeline available locally, including parsing, filtering, and routing capabilities. They can also function as indexers, though this is typically discouraged for performance reasons if they are also doing heavy forwarding.
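Why salting the fingerprint matters can be shown with a toy Python illustration (this is not Splunk's actual implementation, just the collision argument): two log files that start with the same boilerplate header produce the same plain CRC, but mixing in the path disambiguates them.

```python
import zlib

# Two different files that happen to begin with identical bytes
header = b"#Version: 1.0\n#Fields: date time\n"

crc_plain_a = zlib.crc32(header)
crc_plain_b = zlib.crc32(header)  # identical -> forwarder would confuse the files

# Salting with each file's path (the effect of crcSalt = <SOURCE>)
crc_salted_a = zlib.crc32(b"/var/log/a.log" + header)
crc_salted_b = zlib.crc32(b"/var/log/b.log" + header)

print(crc_plain_a == crc_plain_b)    # True  (collision without salt)
print(crc_salted_a == crc_salted_b)  # False (salt keeps them distinct)
```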
Choosing the Right One:
- Use UFs when: You have many data sources, need minimal impact on the source machine, and want to centralize processing on your indexers or HFs. This is the most common scenario for log collection from application servers, workstations, and network devices.
- Use HFs when: You need to pre-process, filter, or route data before it hits your indexers. This is common for:
- Consolidating data from many UFs into a smaller number of HFs before sending to indexers.
- Filtering out noisy or irrelevant data at the source to save network bandwidth and indexer resources.
- Enriching data with local context before forwarding.
- Acting as a load balancer for indexers.
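The load-balancing behavior in the last point comes from outputs.conf: listing several indexers in one tcpout group makes the forwarder rotate its connection among them. A sketch with hypothetical indexer addresses:

```ini
# outputs.conf on the forwarder (hypothetical hosts)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
# switch to another indexer in the list every 30 seconds
autoLBFrequency = 30
```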
The One Thing Most People Don’t Know:
Heavy Forwarders can actually index data locally if configured to do so, effectively acting as a combined forwarder and indexer. However, this is generally a bad idea in production environments because the processing demands of indexing can interfere with the forwarder’s ability to efficiently collect and forward data from other sources. It’s a capability that exists but should be avoided for performance and scalability reasons.
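For completeness, this combined index-and-forward mode is enabled in outputs.conf on the HF; whether to use it is the deployment decision cautioned against above:

```ini
# outputs.conf on the HF -- index a local copy while still forwarding
[indexAndForward]
index = true
```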
The next step after understanding forwarder types is often configuring index-time parsing and field extraction to make your data searchable.