ZeroMQ’s heartbeat is less about keeping a connection alive and more about detecting when the other side has irrevocably died.

Let’s see it in action. Imagine two ZeroMQ processes, a publisher and a subscriber, talking over a TCP socket. The publisher is sending messages, and the subscriber is receiving them.

# Publisher side
import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")

print("Publisher started, sending messages...")

for i in range(1000):
    message = f"Message {i}"
    print(f"Sending: {message}")
    socket.send_string(message)
    time.sleep(0.1)

print("Publisher finished.")
socket.close()
context.term()

# Subscriber side
import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.setsockopt_string(zmq.SUBSCRIBE, "") # Subscribe to all topics
socket.connect("tcp://localhost:5556")

print("Subscriber started, waiting for messages...")

message_count = 0
while message_count < 10: # Let's only receive a few for this example
    try:
        message = socket.recv_string(zmq.NOBLOCK) # Use NOBLOCK to avoid hanging indefinitely
        print(f"Received: {message}")
        message_count += 1
    except zmq.Again:
        # No message yet, do something else or wait
        time.sleep(0.05)
    except Exception as e:
        print(f"An error occurred: {e}")
        break

print("Subscriber finished.")
socket.close()
context.term()

Now, let’s introduce a scenario where the subscriber process abruptly terminates without closing its ZeroMQ socket gracefully. If the publisher is configured with a heartbeat, it will eventually realize the subscriber is gone.

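ZeroMQ’s heartbeat mechanism is configured using zmq_setsockopt with ZMQ_HEARTBEAT_IVL (interval) and ZMQ_HEARTBEAT_TIMEOUT (timeout). In pyzmq these are socket.setsockopt(zmq.HEARTBEAT_IVL, ...) and socket.setsockopt(zmq.HEARTBEAT_TIMEOUT, ...). Both options only take effect for connections created by subsequent bind or connect calls, so set them before binding or connecting.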

ZMQ_HEARTBEAT_IVL defines how often the ZeroMQ socket sends a heartbeat to each connected peer. The heartbeat is a ZMTP PING command, which the peer’s ZeroMQ library answers with a PONG; neither is ever delivered to the application as a message. The default is 0, meaning no heartbeats are sent. If you set ZMQ_HEARTBEAT_IVL to 5000 (milliseconds), ZeroMQ will send a PING every 5 seconds.

ZMQ_HEARTBEAT_TIMEOUT defines how long ZeroMQ will wait, after sending a PING, for any traffic from the peer. The reply does not have to be a PONG: any received data cancels the timeout. If the duration passes with nothing received, ZeroMQ closes the connection, which a socket monitor reports as a disconnect event. The option only has an effect when ZMQ_HEARTBEAT_IVL is greater than 0; if you leave it unset, it defaults to the value of ZMQ_HEARTBEAT_IVL. If you set ZMQ_HEARTBEAT_TIMEOUT to 15000 (milliseconds), the connection is dropped once 15 seconds pass after a PING without any incoming traffic.

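Crucially, the ZMQ_HEARTBEAT_TIMEOUT clock starts at the first PING that goes unanswered and is not restarted by later PINGs. If ZMQ_HEARTBEAT_IVL is 5000ms and ZMQ_HEARTBEAT_TIMEOUT is 15000ms, ZeroMQ will send a PING every 5 seconds. Once a PING has gone out and nothing (data or PONG) comes back for 15 seconds, it declares the peer dead. Roughly three PINGs (at 0s, 5s, and 10s into the window) go unanswered before the timeout fires. Because the first of them can be sent up to 5 seconds after the peer’s last traffic, detection takes somewhere between 15 and 20 seconds.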
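To make the timing concrete, here is a small arithmetic sketch. It assumes a simplified model in which the timeout timer is armed when a PING is sent and cancelled by any incoming traffic; the function names are made up for illustration and are not part of ZeroMQ or pyzmq.

```python
def detection_window(ivl_ms, timeout_ms):
    """Return (best, worst) delay in ms between the peer's last traffic
    and the moment the heartbeat timeout fires, under the simplified model."""
    if ivl_ms <= 0 or timeout_ms <= 0:
        raise ValueError("both values must be positive for heartbeats to run")
    # Best case: a PING goes out right after the peer's last traffic.
    # Worst case: the next PING is a full interval away.
    return timeout_ms, ivl_ms + timeout_ms


def pings_before_timeout(ivl_ms, timeout_ms):
    """Count PINGs sent from the one that arms the timer up to,
    but not including, the instant the timer fires."""
    return -(-timeout_ms // ivl_ms)  # ceiling division


best, worst = detection_window(5000, 15000)
print(f"peer declared dead after {best}..{worst} ms of silence")
print(f"unanswered PINGs in that window: {pings_before_timeout(5000, 15000)}")
```

With the values used in this article, the sketch gives a window of 15000 to 20000 ms and three unanswered PINGs.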

The problem this solves is detecting silent failures. Network glitches, process crashes, or even a firewall silently dropping packets can leave a connection in a state where no new application data is flowing, but the underlying TCP connection might still appear "open" from the perspective of the operating system. Without heartbeats, your application might keep trying to send data into a black hole, never realizing the other end is gone, or it might wait indefinitely for a reply that will never come.

Here’s how you’d set it up on the publisher side:

# Publisher side with Heartbeat
import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PUB)
# Heartbeat options must be set before bind() so they apply to this endpoint
# Set heartbeat interval to 5 seconds (5000 milliseconds)
socket.setsockopt(zmq.HEARTBEAT_IVL, 5000)
# Set heartbeat timeout to 15 seconds (15000 milliseconds)
socket.setsockopt(zmq.HEARTBEAT_TIMEOUT, 15000)

socket.bind("tcp://*:5556")

print("Publisher started with heartbeat enabled...")

for i in range(1000):
    message = f"Message {i}"
    print(f"Sending: {message}")
    socket.send_string(message)
    time.sleep(0.1)

print("Publisher finished.")
socket.close()
context.term()

And on the subscriber side:

# Subscriber side with Heartbeat
import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.setsockopt_string(zmq.SUBSCRIBE, "")
# Heartbeat options must be set before connect() so they apply to this connection
# Set heartbeat interval to 5 seconds (5000 milliseconds)
socket.setsockopt(zmq.HEARTBEAT_IVL, 5000)
# Set heartbeat timeout to 15 seconds (15000 milliseconds)
socket.setsockopt(zmq.HEARTBEAT_TIMEOUT, 15000)

socket.connect("tcp://localhost:5556")

print("Subscriber started with heartbeat enabled, waiting for messages...")

# A 1-second receive timeout lets the loop wake up regularly instead of blocking forever
socket.setsockopt(zmq.RCVTIMEO, 1000)

message_count = 0
while True: # Keep running until an error or termination
    try:
        message = socket.recv_string()
        print(f"Received: {message}")
        message_count += 1
    except zmq.Again:
        # Receive timeout expired with no message. This is normal if no data is sent.
        # The heartbeat mechanism is still active in the background.
        pass
    except zmq.ContextTerminated as e:
        print(f"Context terminated: {e}")
        break
    except zmq.ZMQError as e:
        # recv() does not report heartbeat timeouts: ZeroMQ drops the dead
        # connection and keeps reconnecting in the background. Use a socket
        # monitor to observe ZMQ_EVENT_DISCONNECTED.
        print(f"An unexpected ZMQ error occurred: {e}")
        break
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        break

print("Subscriber finished.")
socket.close()
context.term()

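When the subscriber vanishes without the TCP connection being torn down (for example, its host loses power or a firewall starts silently dropping packets), the publisher keeps sending as if nothing happened. Note that a plain kill -9 <pid> is usually not enough to reproduce this: the kernel closes the dead process’s TCP sockets, so the publisher sees the disconnect almost immediately, with or without heartbeats. In the silent case, the publisher’s PINGs go unanswered, and after roughly 15 to 20 seconds without any traffic from the subscriber, the publisher’s socket declares the peer dead and closes the connection. If you have a socket monitor attached, it reports a ZMQ_EVENT_DISCONNECTED event. send() on a PUB socket does not raise an error; messages for the vanished subscriber are simply no longer queued. The same works in the other direction: a subscriber with heartbeats enabled detects a silently vanished publisher, reports ZMQ_EVENT_DISCONNECTED on its monitor, and keeps trying to reconnect in the background.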
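One way to observe the disconnect is a socket monitor. The sketch below uses pyzmq’s get_monitor_socket and zmq.utils.monitor.recv_monitor_message. The function name watch_peer_disconnect, the loopback address, and the short heartbeat values are assumptions chosen for illustration. To keep the demo fast, the subscriber is closed cleanly, so the event arrives at once; with a silently vanished peer the same event would only show up after the heartbeat timeout.

```python
import zmq
from zmq.utils.monitor import recv_monitor_message


def watch_peer_disconnect(poll_ms=5000):
    """Return the monitor events seen on a PUB socket when its peer goes away."""
    ctx = zmq.Context()
    seen = []
    try:
        pub = ctx.socket(zmq.PUB)
        # Heartbeat options go on before bind() so they apply to the endpoint.
        pub.setsockopt(zmq.HEARTBEAT_IVL, 1000)
        pub.setsockopt(zmq.HEARTBEAT_TIMEOUT, 3000)
        # The monitor is a PAIR socket that delivers connection events.
        monitor = pub.get_monitor_socket(zmq.EVENT_ACCEPTED | zmq.EVENT_DISCONNECTED)
        port = pub.bind_to_random_port("tcp://127.0.0.1")

        sub = ctx.socket(zmq.SUB)
        sub.setsockopt_string(zmq.SUBSCRIBE, "")
        sub.connect(f"tcp://127.0.0.1:{port}")

        # Wait for the connection, then drop the subscriber and wait again.
        while len(seen) < 2 and monitor.poll(poll_ms):
            event = recv_monitor_message(monitor)["event"]
            seen.append(int(event))
            if event == zmq.EVENT_ACCEPTED:
                sub.close(linger=0)
    finally:
        # Close every socket in the context and terminate it.
        ctx.destroy(linger=0)
    return seen
```

Calling watch_peer_disconnect() should return the integer codes for zmq.EVENT_ACCEPTED followed by zmq.EVENT_DISCONNECTED. In a long-running publisher you would typically poll the monitor socket alongside your other sockets instead of blocking on it.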

The most surprising thing about ZeroMQ heartbeats is that they are not a simple "ping" mechanism; they are tightly integrated with the underlying transport and the ZMQ_HEARTBEAT_TIMEOUT. The timeout is not tied to PONG replies specifically, but to any traffic received from the peer. If the peer is sending data frequently, the timeout never fires, because each incoming message cancels it. The heartbeat only does real work when the peer is otherwise silent. In PUB/SUB that is the normal situation from the publisher’s point of view, since subscribers send little beyond their subscriptions.
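The interaction between incoming traffic and the timeout can be sketched as a tiny simulation. This is a simplified model, not ZeroMQ code: it assumes PINGs go out at every multiple of the interval, the timeout is armed by a PING when no timeout is pending, and any received traffic cancels it. The function name time_declared_dead is hypothetical.

```python
def time_declared_dead(ivl_ms, timeout_ms, rx_times_ms, horizon_ms):
    """Return the time (ms) at which the peer is declared dead,
    or None if that does not happen within horizon_ms."""
    events = [(t, "rx") for t in rx_times_ms if t <= horizon_ms]
    events += [(t, "ping") for t in range(ivl_ms, horizon_ms + 1, ivl_ms)]
    deadline = None  # time at which the pending timeout would fire
    for t, kind in sorted(events):
        if deadline is not None and t >= deadline:
            return deadline  # the timeout fired before this event
        if kind == "rx":
            deadline = None  # any incoming traffic cancels the timeout
        elif deadline is None:
            deadline = t + timeout_ms  # a PING arms the timeout
    if deadline is not None and deadline <= horizon_ms:
        return deadline
    return None


# Peer sends something every second: the timeout never fires.
print(time_declared_dead(5000, 15000, range(0, 60001, 1000), 60000))  # None
# Peer goes silent after 12 s: the next PING at 15 s arms the timer.
print(time_declared_dead(5000, 15000, range(0, 12001, 1000), 60000))  # 30000
```

In the second call the peer is declared dead 18 seconds after its last message, inside the 15 to 20 second window implied by a 5000 ms interval and a 15000 ms timeout.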

The next logical step after detecting a dead peer is to implement reconnection logic.
