ZeroMQ’s load balancing isn’t about a central server doling out work; it’s about clients and workers finding each other and distributing tasks dynamically.

Let’s watch this in action. Imagine a simple scenario: a task producer (pusher) and multiple task consumers (workers).

Producer (Push)

import zmq

context = zmq.Context()
pusher = context.socket(zmq.PUSH)
pusher.bind("tcp://*:5557")

for i in range(100):
    message = f"Task {i}"
    print(f"Sending: {message}")
    pusher.send_string(message)
    # Simulate some work or delay
    # time.sleep(0.01)

pusher.close()
context.term()

Worker (Pull)

import zmq
import time

context = zmq.Context()
worker = context.socket(zmq.PULL)
worker.connect("tcp://localhost:5557") # Connect to the pusher's address

while True:
    message = worker.recv_string()
    print(f"Received: {message}")
    # Simulate processing the task
    time.sleep(0.1)

worker.close()
context.term()

If you start multiple instances of the worker (say, three) and then run the pusher, you’ll see the messages from the pusher being distributed among the workers. (Start the workers first: a PUSH socket only distributes to peers that are already connected, so a late-joining worker may miss the early tasks.) Each worker gets a roughly equal share. This is the fundamental PUSH/PULL pattern: the PUSH socket distributes messages to available PULL sockets in a round-robin fashion, and each PULL socket simply receives its share.

The problem this solves is straightforward: how do you scale out your task processing? If one worker isn’t enough, just add more workers. The PUSH socket inherently handles the distribution. You don’t need to write any explicit load balancing logic in your producer.

Internally, when you bind a PUSH socket and connect multiple PULL sockets, ZeroMQ establishes connections. The PUSH socket maintains a list of connected peers. When it has a message to send, it picks one of these peers (using a round-robin algorithm) and sends the message. The PULL socket simply waits for a message to arrive. If multiple PULL sockets are connected, the PUSH socket will cycle through them.
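You can watch the round-robin distribution directly in a single process. In this sketch (the inproc endpoint name is illustrative), one PUSH socket sends four tasks to two PULL sockets, and each PULL ends up with every other task:

```python
import zmq

ctx = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.bind("inproc://tasks")

# Two PULL "workers" in the same process, just to make the distribution visible
pulls = [ctx.socket(zmq.PULL) for _ in range(2)]
for p in pulls:
    p.connect("inproc://tasks")

for i in range(4):
    push.send_string(f"Task {i}")

# The PUSH socket cycled through its peers, so each PULL holds two tasks
received = [[p.recv_string(), p.recv_string()] for p in pulls]
print(received)  # e.g. [['Task 0', 'Task 2'], ['Task 1', 'Task 3']]

for p in pulls:
    p.close()
push.close()
ctx.term()
```

Over TCP the exact interleaving depends on connection timing, but with both peers connected before any send, each one receives half the tasks.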

The DEALER/ROUTER pattern is where things get more sophisticated, especially for request-reply scenarios where you need bidirectional communication and explicit routing.

Worker (DEALER)

import time
import uuid

import zmq

context = zmq.Context()
worker = context.socket(zmq.DEALER)
worker.setsockopt_string(zmq.IDENTITY, f"Worker-{str(uuid.uuid4())[:8]}")  # Unique identity
worker.connect("tcp://localhost:5560")  # Connect to the broker's backend

print("Worker started, waiting for tasks...")

while True:
    # Frames arrive as [client identity, empty delimiter, task];
    # the broker's DEALER backend passes this envelope through unchanged
    client_id, empty, task = worker.recv_multipart()
    print(f"Received task: {task.decode()}")
    # Simulate work
    time.sleep(1)
    reply = f"Processed {task.decode()}"
    # Return the client's identity so the broker can route the reply
    worker.send_multipart([client_id, b'', reply.encode()])

Broker (ROUTER)

import zmq
import time

context = zmq.Context()
# Frontend for clients
frontend = context.socket(zmq.ROUTER)
frontend.bind("tcp://*:5559")

# Backend for workers
backend = context.socket(zmq.DEALER)
backend.bind("tcp://*:5560")

print("Broker started...")

# Workers run as separate processes and connect to the backend on port 5560

poller = zmq.Poller()
poller.register(frontend, zmq.POLLIN)
poller.register(backend, zmq.POLLIN)

while True:
    socks = dict(poller.poll(100)) # Poll with a timeout

    if frontend in socks and socks[frontend] == zmq.POLLIN:
        client_id, empty, request = frontend.recv_multipart()
        print(f"Received request from client {client_id.decode()}: {request.decode()}")
        # Send request to a worker, ROUTER automatically adds sender identity
        backend.send_multipart([client_id, b'', request])

    if backend in socks and socks[backend] == zmq.POLLIN:
        # The worker echoed back the envelope: [client identity, empty, reply]
        client_id, empty, reply = backend.recv_multipart()
        print(f"Received reply for client {client_id.decode()}: {reply.decode()}")
        # Send reply back to the originating client
        frontend.send_multipart([client_id, b'', reply])

frontend.close()
backend.close()
context.term()

Client (REQ)

import zmq
import time

context = zmq.Context()
client = context.socket(zmq.REQ)
client.connect("tcp://localhost:5559") # Connect to the frontend of the broker

print("Sending request...")
client.send_string("Get Status")

message = client.recv_string()
print(f"Received reply: {message}")

client.close()
context.term()

In this DEALER/ROUTER setup, the broker acts as a smart proxy. It receives requests from clients (REQ sockets, which expect a reply) and forwards them to available workers; workers send replies back through the broker, which delivers each reply to the client that sent the original request. The frontend ROUTER socket is crucial here: it automatically prepends the sender’s identity to every incoming message, and that identity frame is what lets the broker route the reply back to the right client. The DEALER sockets, by contrast, do no routing of their own. The broker’s backend DEALER round-robins outgoing messages among connected workers and fair-queues their replies, passing all frames through unchanged, so the client’s identity travels with the request, and the worker must return that identity as the first frame of its reply.

Perhaps the most surprising thing about ZeroMQ’s load balancing is that there is no single point of control: distribution is decentralized, handled by the sockets themselves according to their type and connection state.

The PUSH/PULL pattern is the simplest form of load balancing, where the PUSH socket distributes messages to connected PULL sockets in a round-robin fashion. The PULL sockets don’t need to do anything special; they just receive messages. This is ideal for fire-and-forget tasks where you don’t need to know if a task was completed, or if you have a separate mechanism for tracking completion.

The DEALER/ROUTER pattern is more powerful for request-reply scenarios. The ROUTER acts as a smart proxy, capable of receiving messages from multiple clients and routing them to multiple workers. Crucially, it remembers which client sent which message. The DEALER socket acts as the worker’s interface to this routing, allowing it to send replies back to the ROUTER without needing to know the client’s identity directly. The ROUTER socket is the key component that enables this intelligent routing by preserving and re-attaching client identities.
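Incidentally, pyzmq ships a built-in helper, zmq.proxy, that implements exactly the poll-and-forward loop the broker above writes by hand. The sketch below is a minimal single-process version (the inproc endpoint names are illustrative); it uses a REP worker rather than the DEALER worker above, since REP strips and restores the identity envelope automatically:

```python
import threading
import zmq

ctx = zmq.Context()

# Broker sockets, bound up front so the inproc peers can connect safely
frontend = ctx.socket(zmq.ROUTER)
frontend.bind("inproc://frontend")
backend = ctx.socket(zmq.DEALER)
backend.bind("inproc://backend")

def run_proxy():
    # zmq.proxy shuttles frames both ways until the context is terminated
    try:
        zmq.proxy(frontend, backend)
    except zmq.ContextTerminated:
        frontend.close()
        backend.close()

def run_worker():
    # REP handles the routing envelope itself: it strips the identity and
    # delimiter on receive and reattaches them on send
    worker = ctx.socket(zmq.REP)
    worker.connect("inproc://backend")
    try:
        while True:
            task = worker.recv_string()
            worker.send_string(f"Processed {task}")
    except zmq.ContextTerminated:
        worker.close()

threading.Thread(target=run_proxy, daemon=True).start()
threading.Thread(target=run_worker, daemon=True).start()

client = ctx.socket(zmq.REQ)
client.connect("inproc://frontend")
client.send_string("Get Status")
reply = client.recv_string()
print(reply)  # Processed Get Status

client.close()
ctx.term()
```

The hand-rolled poll loop is still worth understanding, though: once you need per-worker bookkeeping (heartbeats, retries, a ready-worker queue), you outgrow zmq.proxy and write the loop yourself.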

One thing that trips many people up is how DEALER and ROUTER handle message framing for routing. When a ROUTER receives a message, it prepends the sender’s identity as the first frame of the multipart message. When you send through a ROUTER, the first frame must be the identity of the destination peer; the ROUTER strips that frame and uses it to route the message, so the destination never sees it. A DEALER, by contrast, passes frames through untouched in both directions. That is why, in the broker above, the client’s identity frame (added by the frontend ROUTER) travels through the backend DEALER to the worker intact, and why the worker must send that identity back as the first frame of its reply.
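This framing is easy to verify in a single process. The sketch below (the inproc endpoint and identity names are illustrative) connects a DEALER with a fixed identity to a ROUTER and prints the frames each side actually sees:

```python
import zmq

ctx = zmq.Context()
router = ctx.socket(zmq.ROUTER)
router.bind("inproc://demo")

dealer = ctx.socket(zmq.DEALER)
dealer.setsockopt(zmq.IDENTITY, b"worker-1")  # fixed identity, so frames are predictable
dealer.connect("inproc://demo")

dealer.send_multipart([b"", b"hello"])  # DEALER sends delimiter + payload
frames = router.recv_multipart()        # ROUTER prepends the sender's identity
print(frames)                           # [b'worker-1', b'', b'hello']

router.send_multipart([b"worker-1", b"", b"world"])  # first frame = destination identity
reply = dealer.recv_multipart()         # ROUTER strips the identity before delivery
print(reply)                            # [b'', b'world']

dealer.close()
router.close()
ctx.term()
```

Note the asymmetry: the identity frame appears on the ROUTER side of the exchange but never on the DEALER side, because the ROUTER adds it on receive and consumes it on send.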

The next concept you’ll likely encounter is how to handle worker failures and ensure tasks are re-processed, leading into patterns like the "Majordomo" broker.

Want structured learning?

Take the full ZeroMQ course →