TCP load balancing is less about distributing traffic and more about managing state across a fleet of services, often in ways that break common assumptions.

Let’s watch a request flow through a common setup: a client, an HAProxy load balancer, and a backend web server farm.

# Client initiates a TCP connection to HAProxy's public IP and port 80.
# HAProxy receives the SYN packet.
# HAProxy has a backend pool configured: 'webservers' with IPs 10.0.0.1:80, 10.0.0.2:80, 10.0.0.3:80.
# HAProxy, using round-robin by default, selects 10.0.0.1:80.
# HAProxy initiates a *new* TCP connection to 10.0.0.1:80.
# HAProxy forwards the client's data (HTTP request) to 10.0.0.1:80.
# 10.0.0.1 processes the request and sends a response back to HAProxy.
# HAProxy forwards the response back to the original client.
# The client receives the response.
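The setup walked through above corresponds to a minimal HAProxy configuration. A sketch (the `myapp` name and the 10.0.0.x addresses are the illustrative values from the steps, not a real deployment):

```haproxy
# Listen on the public side; mode tcp keeps HAProxy at L4 (raw bytes).
listen myapp
    bind 0.0.0.0:80
    mode tcp
    balance roundrobin
    server web1 10.0.0.1:80 check
    server web2 10.0.0.2:80 check
    server web3 10.0.0.3:80 check
```

The `check` keyword enables the health checks that let HAProxy stop routing to a dead backend.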

This looks simple, but notice that HAProxy establishes its own connection to the backend. It’s not just a pass-through. This is the core of proxy-based load balancing: HAProxy is a stateful proxy, maintaining the client’s connection state and the backend’s connection state independently.

The primary problem this solves is scalability and availability. If one web server goes down, HAProxy stops sending traffic to it and the other servers handle the load. If traffic spikes, you can add more web servers to the backend pool, and HAProxy will start distributing to them.

Internally, HAProxy (or any L4 load balancer) works by listening on a virtual IP address. When a SYN packet arrives, it consults its configuration to pick a backend server; the selection algorithm (round-robin, least connections, source IP hashing, etc.) is configurable. Once a backend is chosen, it creates a new TCP socket to that server and bridges the data between the client socket and the backend socket. In TCP mode this means forwarding raw bytes; an L7 load balancer additionally parses those bytes as HTTP (which runs over TCP) to make more intelligent routing decisions.
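That bridging loop can be sketched in a few lines of Python. This is a toy illustration of the mechanism, not HAProxy’s implementation; the backend addresses are the hypothetical pool from the walkthrough:

```python
import itertools
import socket
import threading

# Hypothetical backend pool (addresses are illustrative).
BACKENDS = [("10.0.0.1", 80), ("10.0.0.2", 80), ("10.0.0.3", 80)]
_rr = itertools.cycle(BACKENDS)

def pick_backend():
    """Round-robin selection: each call returns the next server in the pool."""
    return next(_rr)

def pipe(src, dst):
    """Copy raw bytes one way until the source closes, then half-close."""
    while (chunk := src.recv(4096)):
        dst.sendall(chunk)
    dst.shutdown(socket.SHUT_WR)

def handle(client):
    """Bridge one client: the proxy opens its *own* TCP connection to
    the chosen backend and holds two independent connection states."""
    backend = socket.create_connection(pick_backend())
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    pipe(backend, client)

# A real proxy would wrap this in an accept() loop on the listening
# socket, calling handle() once per incoming client connection.
```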

The levers you control are primarily in the HAProxy configuration:

  • listen or frontend/backend blocks: Define the listening IP/port and the backend server pools.
    • listen myapp 0.0.0.0:80 defines a listening endpoint.
    • backend webservers defines a pool of servers.
    • server web1 10.0.0.1:80 check adds a server to the pool with a health check.
  • balance directive: The algorithm for selecting a backend server.
    • balance roundrobin (default)
    • balance leastconn
    • balance source (hashes client IP to pick a server, ensuring a client always hits the same backend server)
  • mode directive: tcp for L4 (raw TCP) or http for L7 (HTTP awareness).
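Put together in the split frontend/backend form, those levers might look like this (a sketch; server names and IPs are illustrative):

```haproxy
frontend myapp
    bind 0.0.0.0:80
    mode tcp
    default_backend webservers

backend webservers
    mode tcp
    balance source        # pin each client IP to one server
    server web1 10.0.0.1:80 check
    server web2 10.0.0.2:80 check
    server web3 10.0.0.3:80 check
```

Note that `mode` must match between the frontend and the backend it routes to.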

The most surprising thing about TCP load balancing is that balance source is often the only way to make sticky sessions work reliably at L4, and it still has limitations. If your backend application expects a client to maintain a connection to the same server for its entire lifecycle (e.g., WebSockets, long-polling, or applications with in-memory session state tied to a specific server instance), a simple round-robin or least-connections approach will break it because the client’s subsequent packets might go to a different backend server. balance source hashes the client’s IP address to consistently pick the same backend server. However, if multiple clients are behind a single NAT gateway (common in enterprise networks), they will all share the same source IP and thus be directed to the same backend server, negating the load-balancing effect for those clients.
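The NAT caveat is easy to see with a toy model of source hashing. This is an illustration of the idea, not HAProxy’s actual hash function, and the pool is hypothetical:

```python
import hashlib

# Hypothetical pool for illustration.
BACKENDS = ["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"]

def pick_by_source(client_ip: str) -> str:
    """Deterministically map a client source IP to a backend."""
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return BACKENDS[digest % len(BACKENDS)]

# Every client behind one NAT gateway shares a source IP,
# so all 100 of them land on the same single backend:
nat_clients = [pick_by_source("203.0.113.5") for _ in range(100)]
assert len(set(nat_clients)) == 1
```

Stickiness comes from determinism: the same input always yields the same server, which is exactly why a shared NAT address defeats the balancing.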

The next concept you’ll run into is handling SSL/TLS termination at the load balancer versus passing it through to the backend.

Want structured learning?

Take the full TCP course →