Load balancers are the unsung heroes of scalable systems, but thinking of them as just "distributors" misses the fundamental trade-off they make between network visibility and distribution intelligence.
Let’s see this in action. Imagine a simple web application. We’ve got a load balancer sitting in front of three identical web servers (10.0.0.10, 10.0.0.11, 10.0.0.12).
Client -> Load Balancer (10.0.0.1) -> Web Server 1 (10.0.0.10)
-> Web Server 2 (10.0.0.11)
-> Web Server 3 (10.0.0.12)
When a client requests http://example.com, the load balancer intercepts that request.
L4 Load Balancing: The Network-Level Switch
At Layer 4 (Transport Layer), the load balancer looks at the IP address and port of the incoming request. It doesn’t care what’s inside the packet, only where it’s going.
How it works:
- Packet Arrives: A TCP SYN packet for 10.0.0.1:80 arrives at the load balancer.
- Decision: The load balancer picks a backend server (say, 10.0.0.10) based on a simple algorithm (like round-robin).
- Rewriting: It changes the destination IP address from 10.0.0.1 to 10.0.0.10.
- Forwarding: The packet is sent to 10.0.0.10.
- Return Traffic: Crucially, the return traffic from 10.0.0.10 is often configured to go directly back to the client, bypassing the load balancer. This is known as Direct Server Return (DSR), typically implemented with a floating IP, and it’s a performance optimization. If not using DSR, the backend server sends traffic back to the load balancer, which rewrites the source IP to 10.0.0.1 before sending it to the client.
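The rewrite-and-forward steps above can be sketched in Python (a toy model with a hypothetical packet dict; a real L4 balancer does this in the kernel or in hardware):

```python
from itertools import cycle

# Hypothetical backend pool sitting behind the virtual IP 10.0.0.1:80
BACKENDS = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
_rr = cycle(BACKENDS)  # round-robin iterator over the pool

def l4_rewrite(packet: dict) -> dict:
    """Pick the next backend round-robin and rewrite the destination IP.
    The payload is never inspected: that is what makes this L4."""
    backend = next(_rr)
    return {**packet, "dst_ip": backend}

pkt = {"src_ip": "203.0.113.5", "dst_ip": "10.0.0.1", "dst_port": 80, "payload": b"GET /"}
print(l4_rewrite(pkt)["dst_ip"])  # 10.0.0.10 (first pick)
```

Note that the payload passes through untouched; only the packet headers change.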
What this solves: Distributes incoming network connections across a pool of servers. It’s fast because it does minimal per-packet processing.
Levers:
- Algorithm: Round Robin, Least Connections, IP Hash.
- Backend Pool: Which servers are available.
Example Config (Conceptual - iptables on Linux):
# Assume 10.0.0.1 is the load balancer's IP, 10.0.0.10-12 are backends
# This is a simplified example; real L4 LB software is more complex.
# For DSR, the backend servers would need to "own" the Virtual IP (10.0.0.1).
# Forward traffic to backend servers
iptables -t nat -A PREROUTING -d 10.0.0.1 -p tcp --dport 80 -m statistic --mode random --probability 0.33 -j DNAT --to-destination 10.0.0.10:80
iptables -t nat -A PREROUTING -d 10.0.0.1 -p tcp --dport 80 -m statistic --mode random --probability 0.5 -j DNAT --to-destination 10.0.0.11:80
iptables -t nat -A PREROUTING -d 10.0.0.1 -p tcp --dport 80 -j DNAT --to-destination 10.0.0.12:80
# If NOT using DSR, you'd need SNAT rules for return traffic
# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
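Why probabilities 0.33, 0.5, and (implicitly) 1.0? iptables evaluates rules in order, so the second rule only sees traffic the first one passed over. A quick simulation (plain Python, illustrative only) shows the cascade works out to roughly a third per backend:

```python
import random

def pick_backend(rng: random.Random) -> str:
    """Mirror the iptables cascade: rules fire in order, each with its
    own match probability against the traffic that reaches it."""
    if rng.random() < 0.33:          # rule 1: ~33% of all traffic
        return "10.0.0.10"
    if rng.random() < 0.5:           # rule 2: half of the remaining ~67%
        return "10.0.0.11"
    return "10.0.0.12"               # rule 3: everything left over

rng = random.Random(42)              # fixed seed for reproducibility
counts = {"10.0.0.10": 0, "10.0.0.11": 0, "10.0.0.12": 0}
for _ in range(100_000):
    counts[pick_backend(rng)] += 1
print(counts)  # each backend lands near 33,000
```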
L7 Load Balancing: The Application-Aware Proxy
Layer 7 (Application Layer) load balancers are more sophisticated. They understand the content of the traffic, like HTTP headers, URLs, cookies, and even application-specific data.
How it works:
- Full Connection: The load balancer establishes a full TCP connection with the client. It then inspects the application-level data (e.g., the HTTP request).
- Decision: Based on rules defined in its configuration, it decides which backend server to send the request to. This could be based on the URL path (/api goes to API servers, /images goes to image servers), a specific cookie (stickiness), or even the payload content.
- New Connection: The load balancer establishes a new TCP connection to the chosen backend server and forwards the request.
- Return Traffic: All return traffic from the backend server goes back through the load balancer, which then forwards it to the client. This proxying behavior allows for more advanced features but adds latency.
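The decision step is essentially a rule match over the parsed request. A minimal sketch (the pool names and path prefixes here are illustrative, not a real API):

```python
# Hypothetical routing table: longest matching path prefix wins,
# roughly how an L7 proxy applies its location rules.
RULES = [("/api", "api_servers"), ("/images", "image_servers"), ("/", "backend_servers")]

def route(request_path: str) -> str:
    """Return the upstream pool for a request path."""
    # Try the most specific (longest) prefixes first
    for prefix, pool in sorted(RULES, key=lambda r: -len(r[0])):
        if request_path.startswith(prefix):
            return pool
    return "backend_servers"  # fallback pool

print(route("/api/users"))   # api_servers
print(route("/index.html"))  # backend_servers
```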
What this solves: Enables intelligent routing based on application context, provides features like SSL termination, request/response modification, and granular traffic control.
Levers:
- Rules/Conditions: URL path, host header, cookies, request method, header values.
- SSL Termination: Offload SSL/TLS encryption/decryption.
- Content Switching: Route based on request content.
- Health Checks: More sophisticated checks (e.g., expecting a 200 OK from /healthz).
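An L7 health check like this can be sketched with the standard library (base_url stands in for whatever address a backend listens on; /healthz is the example path from above):

```python
import urllib.request
import urllib.error

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Healthy only if GET /healthz answers 200. A bare L4 check would
    stop at 'the TCP port accepts connections', which is weaker."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or a 4xx/5xx status
```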
Example Config (Conceptual - Nginx):
http {
    upstream backend_servers {
        server 10.0.0.10:80;
        server 10.0.0.11:80;
        server 10.0.0.12:80;
    }

    upstream api_servers {
        server 10.0.0.20:8080;
        server 10.0.0.21:8080;
    }

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://backend_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /api/ {
            proxy_pass http://api_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}
Consistent Hashing: The Smarter Distribution
Both L4 and L7 load balancers need a strategy to pick a backend server. Simple round-robin can lead to uneven distribution if requests are very different in size or processing time. IP hashing distributes based on the client’s IP, which is good for stickiness but can lead to uneven loads if many clients share a single IP (like behind a corporate NAT).
Consistent hashing is a technique that aims to minimize remapping when the set of backend servers changes. Instead of a simple modulo operation (e.g., hash(key) % num_servers), it maps both servers and keys onto a conceptual ring. A key is assigned to the first server encountered clockwise on the ring.
What this solves: When a server is added or removed, only a small fraction of keys need to be remapped, unlike traditional hashing where almost all keys would need remapping. This is crucial for high-availability systems where servers might fail or be added/removed frequently.
How it works:
- Ring Creation: A hash function (e.g., SHA-1) is used to map each server’s identifier (IP address and port) to a point on a virtual ring.
- Key Mapping: When a request comes in (identified by a user ID, session ID, or any other key), its hash is also calculated and mapped to a point on the same ring.
- Server Assignment: The request is assigned to the first server found clockwise from the request’s point on the ring.
- Adding/Removing Servers: If a server is added, it takes responsibility for keys between its point and the next server clockwise. If a server is removed, its responsibilities are taken over by the next server clockwise. This localized change is the key benefit.
Levers:
- Number of Virtual Nodes: To improve distribution, each physical server can be represented by multiple "virtual nodes" on the ring. This spreads the load more evenly.
- Hashing Algorithm: The choice of hash function impacts the distribution and collision probabilities.
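Putting the ring and the virtual-node lever together, here is a compact sketch (SHA-1 for hashing and 100 virtual nodes per server are arbitrary choices; toy code, not production):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, server)
        for s in servers:
            self.add(s)

    def _point(self, key: str) -> int:
        # SHA-1 maps any key to a point on the (very large) ring
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            self._ring.append((self._point(f"{server}#vn{i}"), server))
        self._ring.sort()

    def remove(self, server: str) -> None:
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def get(self, key: str) -> str:
        # First server clockwise from the key's point (wrapping at the top)
        idx = bisect.bisect_left(self._ring, (self._point(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["10.0.0.10", "10.0.0.11", "10.0.0.12"])
before = {f"user_{i}": ring.get(f"user_{i}") for i in range(1000)}
ring.remove("10.0.0.11")
moved = sum(1 for k, v in before.items() if ring.get(k) != v)
print(moved)  # only the removed server's share of keys remaps (~1/3)
```

Removing a server deletes only its virtual nodes; every other point on the ring stays put, which is exactly why only that server's keys move.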
Example (Conceptual; note that Redis Cluster actually shards with fixed hash slots rather than a consistent-hash ring, while Cassandra and memcached’s Ketama clients are real-world users of consistent hashing):
Imagine a ring from 0 to 1000.
Servers: S1 (10.0.0.10), S2 (10.0.0.11), S3 (10.0.0.12)
Virtual Nodes: For simplicity, we skip virtual nodes here and place each server on the ring once, keyed by its IP.
hash(S1) -> 250
hash(S2) -> 500
hash(S3) -> 750
Now, a request with key user_abc comes in:
hash(user_abc) -> 400.
On the ring, 400 is between 250 (S1) and 500 (S2). The next server clockwise from 400 is S2. So, user_abc is routed to S2.
If S2 fails, its keys (those between 250 and 500) are now routed to S3, the next server clockwise. Keys belonging to S1 and S3 are untouched; with simple modulo hashing, removing a server would remap almost every key.
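The walk-through can be reproduced in a few lines (same 0-1000 ring with the same made-up hash points):

```python
import bisect

def lookup(point: int, ring: dict) -> str:
    """First server clockwise from `point`, wrapping past the top of the ring."""
    points = sorted(ring)
    return ring[points[bisect.bisect_left(points, point) % len(points)]]

ring = {250: "S1", 500: "S2", 750: "S3"}
print(lookup(400, ring))  # S2: first server clockwise from 400
del ring[500]             # S2 fails
print(lookup(400, ring))  # S3 picks up S2's keys; S1's keys don't move
```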
The act of picking a backend server, especially in L7, involves parsing HTTP headers, potentially decrypting SSL, and then applying a routing rule. This processing overhead is why L4 is generally faster, but L7 offers unparalleled control and insight into application traffic.