ZeroMQ’s unbounded queues are a hidden memory leak waiting to happen, silently consuming RAM until your application grinds to a halt.

Let’s see this in action. Imagine a simple publisher-subscriber setup where the publisher is a bit too enthusiastic and the subscriber can’t keep up.

# Publisher (runs fast)
import zmq

context = zmq.Context()
socket = context.socket(zmq.PUB)
# Disable the send high water mark (0 = unlimited) so the outgoing queue
# can grow without bound. ZeroMQ 3.x and later default SNDHWM to 1000,
# so this line is what makes the runaway growth reproducible.
socket.setsockopt(zmq.SNDHWM, 0)
socket.bind("tcp://*:5556")

message_count = 0
while True:
    socket.send_string(f"Message {message_count}")
    message_count += 1
    # No sleep here, the publisher is eager

# Subscriber (runs slow)
import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.setsockopt_string(zmq.SUBSCRIBE, "") # Subscribe to everything
socket.connect("tcp://localhost:5556")

while True:
    message = socket.recv_string()
    # Simulate slow processing
    time.sleep(0.1)
    print(f"Received: {message}")

If you run these two, you’ll notice the publisher’s memory usage (check with ps aux | grep python) starts climbing. The publisher keeps sending, and ZeroMQ holds the messages in its internal per-subscriber queue until the subscriber is ready to receive them. If the subscriber is consistently slower, that queue grows without bound.
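
If you’d rather watch the growth from inside the process than via ps, the standard-library resource module exposes the process’s peak resident set size (a Unix-only sketch; Linux reports kilobytes, macOS reports bytes):

```python
import resource

def peak_rss():
    """Peak resident set size of this process.
    Linux reports kilobytes; macOS reports bytes."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Call this periodically inside the publisher loop to log memory growth.
print(f"peak RSS: {peak_rss()}")
```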

The core problem is ZeroMQ’s high water mark. When a sender outpaces a receiver, ZeroMQ buffers the outgoing messages in memory, one queue per connected peer. With the high water mark set to 0 (unlimited, which was also the default in ZeroMQ 2.x), this buffer is unbounded: it keeps growing as long as the sender produces messages faster than the receiver consumes them. That isn’t a bug; it’s a deliberate design for workloads with temporary traffic bursts, where the receiver eventually catches up. In a persistent rate mismatch, though, the same behavior becomes a liability.
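
The arithmetic of a persistent mismatch is easy to sketch without ZeroMQ at all (a toy simulation, not pyzmq): if each tick the producer enqueues more than the consumer dequeues, the backlog grows linearly and never recovers.

```python
def backlog_after(ticks, produce_rate, consume_rate):
    """Simulate an unbounded queue: each tick the producer enqueues
    produce_rate messages and the consumer drains up to consume_rate."""
    queue = []
    for i in range(ticks):
        queue.extend(f"Message {i}-{j}" for j in range(produce_rate))
        del queue[:consume_rate]  # consumer takes from the front
    return len(queue)

# 10 in / 1 out per tick: the backlog grows by 9 every tick, without bound
print(backlog_after(1000, 10, 1))  # → 9000
```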

The fix is to impose a limit on this buffer using the zmq.SNDHWM (send high water mark) socket option, which caps the number of messages that can be queued per peer. What happens when the cap is reached depends on the socket type: PUB sockets silently drop further messages destined for the slow subscriber, while blocking types such as PUSH or DEALER make send() block (or raise zmq.Again if zmq.DONTWAIT is passed) until space becomes available in the queue.
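
The block-or-error behavior at the limit is the same idea as Python’s own bounded queue.Queue, which makes a quick mental model (an analogy only; ZeroMQ’s queues live in its I/O threads, not in Python):

```python
import queue

q = queue.Queue(maxsize=2)  # a "high water mark" of 2
q.put("msg 0")
q.put("msg 1")

try:
    q.put_nowait("msg 2")   # like send(..., zmq.DONTWAIT) at the HWM
except queue.Full:
    print("queue full")     # the stdlib analogue of zmq.Again
# A plain q.put("msg 2") here would block until a consumer calls q.get(),
# mirroring a blocking send() on a PUSH or DEALER socket.
```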

For the publisher, you’d add:

# Publisher (with HWM)
import zmq

context = zmq.Context()
socket = context.socket(zmq.PUB)

# Set the send high water mark to 1000 messages. Socket options only
# apply to connections made after they are set, so set the HWM before
# bind() rather than after it.
socket.setsockopt(zmq.SNDHWM, 1000)
socket.bind("tcp://*:5556")

message_count = 0
while True:
    # A PUB socket never blocks: once the HWM is reached for a slow
    # subscriber, ZeroMQ silently drops that subscriber's messages
    # instead of queueing them.
    socket.send_string(f"Message {message_count}")
    message_count += 1

By setting zmq.SNDHWM to 1000, you’re telling ZeroMQ, "Don’t queue more than 1000 messages per subscriber." When the queue fills, a PUB socket drops new messages for that subscriber rather than queueing them; blocking socket types such as PUSH would instead block in send() (or raise zmq.Again under zmq.DONTWAIT). Either way, unbounded memory growth is prevented: usage now stabilizes around the HWM times the typical message size, plus whatever the network stack is currently buffering.
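
Because a PUB socket’s drops are silent, you won’t see an error when they happen. A common way to make them visible is to embed a sequence number in each message, as the examples above already do, and count gaps on the subscriber side. A minimal detector, assuming messages of the form "Message N":

```python
def count_dropped(messages):
    """Count sequence numbers missing from a stream of 'Message N'
    strings, i.e. messages dropped upstream at the high water mark."""
    dropped = 0
    last = None
    for msg in messages:
        seq = int(msg.rsplit(" ", 1)[1])
        if last is not None and seq > last + 1:
            dropped += seq - last - 1
        last = seq
    return dropped

print(count_dropped(["Message 0", "Message 1", "Message 5", "Message 6"]))  # → 3
```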

Common Causes and Fixes:

  1. Publisher too fast, Subscriber too slow (most common):

    • Diagnosis: Monitor publisher’s memory usage (ps aux | grep python) and observe it growing. Check network traffic (iftop).
    • Fix: Set socket.setsockopt(zmq.SNDHWM, 1000) on the publisher. Adjust 1000 based on your needs.
    • Why it works: Limits the number of messages waiting in the publisher’s outgoing queue.
  2. Network congestion or latency:

    • Diagnosis: High packet loss in ping, retransmissions in netstat -s output, slow recv times on the subscriber.
    • Fix: Apply zmq.SNDHWM on the publisher and zmq.RCVHWM on the subscriber. For example, socket.setsockopt(zmq.RCVHWM, 500) on the subscriber, before connect.
    • Why it works: RCVHWM bounds the subscriber’s incoming queue; once it fills, TCP backpressure propagates to the publisher, which then hits its own SNDHWM instead of both sides buffering without limit.
  3. Long-running message processing on subscriber:

    • Diagnosis: Subscriber prints messages with significant delays, or time.sleep() calls are long.
    • Fix: Optimize subscriber processing logic or increase subscriber concurrency (e.g., using threads or processes) to consume messages faster. If optimization isn’t possible, apply zmq.SNDHWM on the publisher to throttle it.
    • Why it works: Addresses the root cause of slow consumption or implements a throttling mechanism.
  4. Serialization/Deserialization overhead:

    • Diagnosis: Large message sizes are being sent, or complex objects are being serialized/deserialized. CPU usage on subscriber is high during recv.
    • Fix: Use more efficient serialization formats (e.g., Protocol Buffers, MessagePack instead of JSON/Pickle). Reduce message size by sending only necessary data. Apply zmq.SNDHWM on the publisher.
    • Why it works: Reduces the time spent processing each message, allowing the subscriber to keep up or reducing the rate at which the publisher needs to be throttled.
  5. ZeroMQ internal buffer limits (less common, but possible):

    • Diagnosis: Extremely high message rates causing even reasonable HWMs to be hit quickly, leading to perceived slowness or dropped messages.
    • Fix: Increase the OS-level socket buffer limits. On Linux, add net.core.rmem_max = 16777216 (receive) and net.core.wmem_max = 16777216 (send) to /etc/sysctl.conf, then run sysctl -p. ZeroMQ’s own zmq.SNDBUF and zmq.RCVBUF options set the per-socket kernel buffers within those limits. Also ensure zmq.SNDHWM and zmq.RCVHWM are set to sufficiently large values (e.g., 10000 or more if needed).
    • Why it works: Allows the underlying network stack to buffer more data, complementing ZeroMQ’s HWM.
  6. Using zmq.DONTWAIT incorrectly:

    • Diagnosis: send() calls immediately raise zmq.Again even with seemingly adequate buffer space. (Note that PUB sockets never raise zmq.Again; they drop instead, so this applies to blocking types such as PUSH, DEALER, or REQ.)
    • Fix: Remove zmq.DONTWAIT if you intend send to block until space is available, or implement proper retry/backoff logic if zmq.DONTWAIT is intentional. Ensure zmq.SNDHWM is set appropriately.
    • Why it works: Send operations then respect the high water mark and either succeed or block as expected, rather than failing prematurely.
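
For item 6, the retry/backoff logic can be factored into a small helper. This sketch is pure Python: pass in the socket’s send function and zmq.Again as the retriable exception; the helper name and its parameters (max_tries, base_delay) are illustrative, not pyzmq API.

```python
import time

def send_with_backoff(send, msg, retriable, max_tries=5, base_delay=0.01):
    """Try a non-blocking send; on a 'queue full' exception, sleep with
    exponential backoff and retry. Returns True on success, False if
    every attempt hit the high water mark."""
    for attempt in range(max_tries):
        try:
            send(msg)
            return True
        except retriable:
            time.sleep(base_delay * (2 ** attempt))
    return False  # caller decides: drop the message, log it, or escalate
```

With pyzmq this would be called as send_with_backoff(lambda m: socket.send_string(m, zmq.DONTWAIT), msg, zmq.Again).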

After fixing the unbounded queue, the next symptom you might encounter is zmq.Again (if you’re using zmq.DONTWAIT on a blocking socket type) or silently dropped messages on a PUB/SUB pair whose subscriber is still too slow and has filled its own receive queue (RCVHWM reached).

Want structured learning?

Take the full ZeroMQ course →