Splunk’s Heavy Forwarder (HF) load balancing isn’t just about distributing data; it’s a sophisticated mechanism for ensuring data availability and routing it to the right indexers, not just any indexers.

Let’s see it in action. Imagine you have a single HF and two indexers, idx1.splunk.com and idx2.splunk.com. Your HF’s inputs.conf might look like this for receiving data:

[splunktcp://9997]
disabled = false

And its outputs.conf would define the target indexers:

[tcpout]
defaultGroup = production_indexers

[tcpout:production_indexers]
server = idx1.splunk.com:9997, idx2.splunk.com:9997
autoLBFrequency = 30
useACK = true

When data flows into the HF on port 9997, the tcpout:production_indexers stanza dictates where it goes. The server parameter lists the available indexers. autoLBFrequency = 30 (the default) means the HF sends to one indexer for 30 seconds, then switches to another indexer in the group, chosen at random. useACK = true guards against data loss if an indexer goes down mid-stream: the HF keeps a copy of the data it has sent in a wait queue until the indexer acknowledges that it has written the data, and resends it to another indexer if no acknowledgment arrives.

The core problem this solves is preventing data loss and ensuring continuous data ingestion even when individual indexers or network paths become unavailable. Without this, a single point of failure in the indexer tier would stop all data flow from the HF. The HF acts as a smart buffer and router, understanding the state of the downstream indexers.

Internally, the HF does not hold a connection open to every indexer listed in its outputs.conf. By default it sends to one indexer at a time and, every autoLBFrequency seconds, switches to another one chosen at random from the list. If the current indexer stops responding, or a connection attempt times out, the HF immediately switches to another indexer in the group. Once the failed indexer is reachable again, it simply becomes eligible for selection. This dynamic re-routing is what keeps the data flowing.
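
The timeouts behind this failover can be set per target group. The fragment below is a minimal sketch, assuming the connectionTimeout, readTimeout, and writeTimeout settings of outputs.conf. The values are illustrative assumptions, not tuning advice, so check the outputs.conf specification for your Splunk version before relying on them.

```ini
# outputs.conf on the HF (sketch; values are illustrative assumptions)
[tcpout:production_indexers]
server = idx1.splunk.com:9997, idx2.splunk.com:9997
# Seconds to wait while opening a connection to an indexer
connectionTimeout = 20
# Seconds to wait on reads from and writes to an established connection
readTimeout = 300
writeTimeout = 300
```

Comments sit on their own lines here because Splunk .conf files do not treat a trailing # on a setting line as a comment.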

There is no per-indexer "weight" setting in outputs.conf in this basic setup. At each switch the HF picks randomly among the listed indexers rather than following a strict round-robin order. If you have idx1, idx2, and idx3, and all are healthy, the HF will distribute data roughly equally over time. If idx2 becomes unavailable, the HF will send data only to idx1 and idx3 until idx2 recovers. autoLBFrequency determines how long the HF stays on one indexer before switching; a failed connection triggers a switch immediately.
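
Time-based switching can be combined with volume-based switching. The following is a minimal sketch, assuming three indexers (the idx3.splunk.com hostname is hypothetical, extended from the names used above) and the autoLBVolume setting, which triggers a switch after a given number of bytes. The 1 MiB figure is an arbitrary illustration.

```ini
# outputs.conf on the HF (sketch; hostnames and values are assumptions)
[tcpout:production_indexers]
server = idx1.splunk.com:9997, idx2.splunk.com:9997, idx3.splunk.com:9997
# Switch after 30 seconds on one indexer...
autoLBFrequency = 30
# ...or after roughly 1 MiB has been sent to it, whichever comes first
autoLBVolume = 1048576
```

A volume threshold mainly helps when a forwarder's data rate is bursty, since a busy 30-second window would otherwise land entirely on one indexer.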

A common misconception is that autoLBFrequency directly controls how often data is sent. It doesn’t. It controls how often the HF switches to a different indexer in the group. The actual data sending happens as data arrives at the HF. Another subtlety is that the HF does not spread each interval's data across the whole group. If you have 10 indexers, by default only one of them receives this HF's data during any given interval; the even spread emerges over many intervals and across many forwarders.

The order of servers in the server list in outputs.conf is sometimes said to matter during initial startup or after a significant network event, with the HF initially favoring indexers listed earlier. Because selection is random, any such effect is minor and transient, and list order should not be relied on to steer data.

The next concept you’ll encounter is how to configure different indexer groups for different data types or sources, and how to manage those groups using outputs.conf stanzas together with props.conf and transforms.conf routing.
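
As a preview, here is a minimal sketch of that routing, assuming a hypothetical sourcetype (linux_secure), a hypothetical target group (security_indexers), and hypothetical hostnames. It relies on the _TCP_ROUTING key, which works on an HF because the HF parses events before forwarding them.

```ini
# props.conf (HF): attach a routing transform to a hypothetical sourcetype
[linux_secure]
TRANSFORMS-routing = route_to_security

# transforms.conf (HF): send every matching event to the named target group
[route_to_security]
REGEX = .
DEST_KEY = _TCP_ROUTING
FORMAT = security_indexers

# outputs.conf (HF): the target group referenced by FORMAT above
[tcpout:security_indexers]
server = sec-idx1.splunk.com:9997, sec-idx2.splunk.com:9997
```

Events that match no routing transform continue to go to the defaultGroup defined in the [tcpout] stanza.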
