ZeroMQ’s real magic isn’t just in its message queuing; it’s in how it makes distributed systems feel like local ones, often with performance that rivals or beats traditional socket programming, especially under heavy load.

Let’s see ZeroMQ in action, measuring the fundamental performance characteristics: latency and throughput. We’ll use the ZMQ_PAIR socket type for simplicity, as it’s a direct, one-to-one connection, perfect for isolating the network and ZeroMQ overhead.

Scenario: Latency Measurement

We’ll set up a sender and a receiver. The sender will send a single, small message, and the receiver will immediately send it back. We’ll time this round trip.

Sender (latency_sender.py):

import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.bind("tcp://*:5555")

message = b"ping"
start_time = time.perf_counter_ns()
socket.send(message)
reply = socket.recv()
end_time = time.perf_counter_ns()
round_trip_time_ns = end_time - start_time

print(f"Round trip time: {round_trip_time_ns} ns")
socket.close()
context.term()

Receiver (latency_receiver.py):

import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.connect("tcp://localhost:5555")

while True:
    message = socket.recv()
    # Echo the message straight back to the sender
    socket.send(message)
    break # Only process one message for latency test

socket.close()
context.term()

To run this:

  1. Start latency_receiver.py in one terminal.
  2. Start latency_sender.py in another terminal.

You’ll see output like: Round trip time: 50000 ns (50 µs). This round trip includes serialization (negligible for small messages), ZeroMQ’s internal processing, the kernel network stack in both directions, and the receiver’s turnaround time. A single measurement is noisy; averaging over many round trips gives a far more stable figure.
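A sturdier variant averages over many round trips, with the echo peer running in a background thread. This sketch uses inproc:// so it is self-contained in one process (the endpoint name inproc://latency is illustrative):

```python
import threading
import time
import zmq

N = 10_000  # number of round trips to average over

def echo_server(context: zmq.Context, n: int) -> None:
    # Echo each message straight back over an inproc PAIR socket.
    sock = context.socket(zmq.PAIR)
    sock.connect("inproc://latency")
    for _ in range(n):
        sock.send(sock.recv())
    sock.close()

context = zmq.Context()
sender = context.socket(zmq.PAIR)
sender.bind("inproc://latency")  # inproc requires bind before connect

thread = threading.Thread(target=echo_server, args=(context, N))
thread.start()

start = time.perf_counter_ns()
for _ in range(N):
    sender.send(b"ping")
    sender.recv()
elapsed = time.perf_counter_ns() - start

print(f"Average round trip: {elapsed / N:.0f} ns")
thread.join()
sender.close()
context.term()
```

Swapping the inproc:// endpoint for tcp://localhost:5555 (with the echo loop in a separate process) measures the same thing across the network stack.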

Scenario: Throughput Measurement

Now, let’s push a lot of data. The sender will send as many messages as possible in a short period, and the receiver will just count them.

Sender (throughput_sender.py):

import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.bind("tcp://*:5555")

message = b"X" * 1024 # 1KB message
num_messages = 1_000_000
start_time = time.perf_counter_ns()

for _ in range(num_messages):
    socket.send(message)

end_time = time.perf_counter_ns()
duration_s = (end_time - start_time) / 1_000_000_000
throughput_mbps = (num_messages * len(message)) / (duration_s * 1024 * 1024)

print(f"Sent {num_messages} messages ({len(message)} bytes each) in {duration_s:.2f}s")
print(f"Throughput: {throughput_mbps:.2f} MB/s")

socket.close()
context.term()

Receiver (throughput_receiver.py):

import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.connect("tcp://localhost:5555")

num_received = 0
while num_received < 1_000_000: # Expecting the same number of messages
    socket.recv()
    num_received += 1

print(f"Received {num_received} messages.")
socket.close()
context.term()

Run the receiver first, then the sender. You might see output like: Throughput: 850.20 MB/s. Keep in mind that send() returns as soon as the message is handed to ZeroMQ’s internal queue, so the sender-side timer really measures the enqueue rate; over a long run the high water mark forces the sender down to the rate the receiver can actually sustain.
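Because send() only enqueues, the delivered rate is best clocked at the receiving end. The sketch below times arrival of the stream over an inproc:// PAIR (endpoint name and message count are illustrative), starting the clock at the first message so connection setup is excluded:

```python
import threading
import time
import zmq

NUM_MESSAGES = 100_000
MESSAGE = b"X" * 1024  # 1 KB payload

def sender(context: zmq.Context) -> None:
    sock = context.socket(zmq.PAIR)
    sock.connect("inproc://throughput")
    for _ in range(NUM_MESSAGES):
        sock.send(MESSAGE)
    sock.close()

context = zmq.Context()
receiver = context.socket(zmq.PAIR)
receiver.bind("inproc://throughput")  # bind before connect for inproc

thread = threading.Thread(target=sender, args=(context,))
thread.start()

receiver.recv()  # first message marks the start of the stream
start = time.perf_counter_ns()
for _ in range(NUM_MESSAGES - 1):
    receiver.recv()
duration_s = (time.perf_counter_ns() - start) / 1e9

mb = (NUM_MESSAGES - 1) * len(MESSAGE) / (1024 * 1024)
print(f"Delivered throughput: {mb / duration_s:.2f} MB/s")

thread.join()
receiver.close()
context.term()
```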

The Mental Model: Beyond Simple Queues

ZeroMQ isn’t a traditional message broker. It’s a library that provides sockets with advanced messaging patterns. The PAIR socket is the simplest: a direct, exclusive, bidirectional connection. You send and recv. Messages queue in memory up to the high water mark; once that queue fills (or no peer is connected), send blocks rather than failing. Other socket types behave differently — PUB, for example, silently drops messages for slow subscribers. This behavior is crucial: ZeroMQ prioritizes speed and bounded memory over guaranteed delivery in its basic forms.

The patterns (REQ/REP, PUB/SUB, PUSH/PULL, ROUTER/DEALER) are built on top of these primitives. They add semantics like request-reply synchronization, publish-subscribe fan-out, or load balancing. The performance characteristics you see in the benchmarks are the baseline ZeroMQ adds before pattern-specific logic kicks in.
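For instance, REQ/REP layers a strict send/recv lockstep on top of the raw socket: a REQ must send before it may recv, and a REP must recv before it may send. A minimal sketch over inproc:// (the endpoint name is illustrative):

```python
import threading
import zmq

def rep_server(context: zmq.Context) -> None:
    sock = context.socket(zmq.REP)
    sock.connect("inproc://reqrep")
    request = sock.recv()            # REP must recv before it may send
    sock.send(b"reply to " + request)
    sock.close()

context = zmq.Context()
req = context.socket(zmq.REQ)
req.bind("inproc://reqrep")          # either side may bind

thread = threading.Thread(target=rep_server, args=(context,))
thread.start()

req.send(b"hello")                   # REQ must send before it may recv
reply = req.recv()
print(reply)                         # b'reply to hello'

thread.join()
req.close()
context.term()
```

Calling send twice in a row on the REQ socket would raise an error: that enforced alternation is exactly the synchronization overhead the pattern adds over PAIR.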

Key Levers:

  • Socket Type: PAIR is the leanest but supports exactly one peer and is designed for inproc:// inter-thread use rather than reliable network links. REQ/REP adds synchronization overhead. PUB/SUB is for one-to-many fan-out. ROUTER/DEALER is the workhorse for complex routing and load balancing, offering more control but with more internal state.
  • Transport: inproc:// is for inter-thread communication and is blazingly fast. ipc:// is for inter-process on the same machine, very fast. tcp:// is for network communication, subject to network latency and bandwidth.
  • Message Size: Larger messages amortize per-message overhead and maximize throughput, at the cost of higher per-message latency. Small messages minimize latency, but at high send rates the fixed per-message cost (framing, syscalls, internal bookkeeping) dominates and caps the achievable message rate.
  • High Water Mark (HWM): This is a critical tuning parameter. socket.setsockopt(zmq.SNDHWM, value) and socket.setsockopt(zmq.RCVHWM, value) define how many messages the socket can queue internally before blocking or dropping. If your sender is faster than your receiver, messages will queue up. If the queue fills, send will block or drop (depending on socket type and flags), impacting throughput. Setting HWM too low can cause premature blocking; too high can lead to memory exhaustion or increased latency from long queues. libzmq’s default of 1000 is a reasonable starting point.
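A short sketch of these knobs in practice: setting the send-side HWM and using a non-blocking send to observe what happens when there is nowhere to queue. A PUSH socket with no connected peer refuses a ZMQ_DONTWAIT send with EAGAIN rather than buffering without bound:

```python
import zmq

context = zmq.Context()
push = context.socket(zmq.PUSH)
push.setsockopt(zmq.SNDHWM, 100)   # cap the per-peer send queue at ~100 messages
push.setsockopt(zmq.LINGER, 0)     # discard any unsent messages on close
port = push.bind_to_random_port("tcp://127.0.0.1")

# With no PULL peer connected, a non-blocking send cannot queue anywhere,
# so it fails immediately with EAGAIN (raised as zmq.Again in pyzmq).
try:
    push.send(b"data", flags=zmq.DONTWAIT)
    status = "queued"
except zmq.Again:
    status = "would block: no peer connected"

print(status)
push.close()
context.term()
```

The same zmq.Again path fires once a connected peer’s queue reaches the HWM, which is how backpressure surfaces to your application.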

ZeroMQ does not expose per-message timestamps, so the round-trip measurement above bundles your application logic, ZeroMQ’s internal processing, and network transit into one number. A practical way to separate these costs is to repeat the same benchmark over inproc://, ipc://, and tcp://: the inproc:// figure approximates ZeroMQ’s library overhead alone, and the gap to tcp:// is what the network stack adds. For connection-level visibility, libzmq’s socket monitoring (zmq_socket_monitor) reports lifecycle events such as connects, disconnects, and retries.
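Socket monitoring reports connection lifecycle events, not per-message timing. A minimal sketch using pyzmq’s monitor helpers, waiting for the CONNECTED event on a loopback tcp:// connection:

```python
import zmq
from zmq.utils.monitor import recv_monitor_message

context = zmq.Context()
server = context.socket(zmq.PAIR)
port = server.bind_to_random_port("tcp://127.0.0.1")

client = context.socket(zmq.PAIR)
monitor = client.get_monitor_socket()  # PAIR socket that receives events
client.connect(f"tcp://127.0.0.1:{port}")

# Drain events (e.g. CONNECT_DELAYED) until the session is established.
while True:
    event = recv_monitor_message(monitor)
    if event["event"] == zmq.EVENT_CONNECTED:
        print(f"connected to {event['endpoint'].decode()}")
        break

client.disable_monitor()
monitor.close()
client.close()
server.close()
context.term()
```

Timestamping these events yourself (e.g. with time.perf_counter_ns() around each recv_monitor_message) lets you measure connection setup and reconnect latency separately from message latency.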

Understanding these patterns and tuning parameters is how you achieve high performance with ZeroMQ. The next step is often exploring how to build resilient, scalable applications using the ROUTER/DEALER pattern for fault tolerance and load distribution.
