TCP_NODELAY is a socket option that, when enabled, tells the TCP stack to immediately send any data that’s ready to go, rather than waiting to see if more data will arrive soon to bundle into a larger packet.

Let’s see it in action. Imagine a simple ZeroMQ PUSH/PULL setup. The PUSH socket is sending small messages very rapidly, and the PULL socket is consuming them.

# PUSH side
import zmq

context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.bind("tcp://*:5557")

for i in range(1000000):
    message = f"Message {i}".encode('utf-8')
    socket.send(message)
    if i % 10000 == 0:
        print(f"Sent {i} messages")

print("Done sending")
socket.close()
context.term()

# PULL side
import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PULL)
socket.connect("tcp://localhost:5557")

start_time = time.time()
count = 0
while True:
    message = socket.recv()
    count += 1
    if count % 10000 == 0:
        elapsed = time.time() - start_time
        print(f"Received {count} messages. Average time per message: {elapsed/count:.6f}s")
    if count == 1000000:
        break

print("Done receiving")
socket.close()
context.term()

By default, TCP applies Nagle’s algorithm, which improves network efficiency by reducing the number of small packets on the wire. While previously sent data is still unacknowledged, TCP holds back any new segment smaller than the maximum segment size and appends further writes to it. The segment goes out once the outstanding data is acknowledged or a full segment has accumulated. This works well for bulk transfers, but for applications that send many small, time-sensitive messages, such as high-frequency trading or real-time gaming, the wait can add noticeable latency, particularly when the receiver delays its ACKs. In our example, the PUSH socket hands a message to TCP, which may hold it briefly before transmitting it, and the PULL socket cannot see the message until that delayed packet arrives.
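To make the rule concrete, here is a minimal sketch of the send decision described above. It is a simplified model, not kernel code: the function name and parameters are hypothetical, and it ignores receive-window and congestion-window limits.

```python
def nagle_can_send(pending_bytes: int, mss: int, unacked_bytes: int,
                   nodelay: bool = False) -> bool:
    """Simplified model of whether TCP may transmit buffered data now.

    pending_bytes: bytes the application has written but TCP has not sent
    mss:           maximum segment size for the connection
    unacked_bytes: bytes already sent but not yet acknowledged by the peer
    nodelay:       True models a socket with TCP_NODELAY set
    """
    if pending_bytes <= 0:
        return False  # nothing to send
    if nodelay:
        return True   # Nagle disabled: send whatever is pending
    if pending_bytes >= mss:
        return True   # a full segment is always allowed out
    # A small segment may go out only when nothing is in flight.
    return unacked_bytes == 0


# A 10-byte write on an idle connection goes out immediately...
first_write = nagle_can_send(10, mss=1460, unacked_bytes=0)
# ...but a second small write waits while the first is unacknowledged.
second_write = nagle_can_send(10, mss=1460, unacked_bytes=10)
# With TCP_NODELAY the second write is not held back.
second_write_nodelay = nagle_can_send(10, mss=1460, unacked_bytes=10, nodelay=True)
```

The second case is the one that hurts small-message workloads: the write is held until the peer acknowledges the first one.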

The examples in this article control this behavior through the zmq.TCP_NODELAY socket option: socket.setsockopt(zmq.TCP_NODELAY, 1) asks the underlying TCP socket to disable Nagle’s algorithm. Two caveats apply. First, stock libzmq already disables Nagle on the TCP connections it opens, and not every pyzmq build defines a zmq.TCP_NODELAY constant, so check hasattr(zmq, "TCP_NODELAY") before relying on it. Second, socket.send() in ZeroMQ only queues the message for a background I/O thread, which may itself batch several messages into one write. With Nagle disabled, whatever that thread writes is transmitted without waiting for outstanding ACKs, so in our PUSH/PULL example small messages reach the PULL socket sooner.

The primary benefit of TCP_NODELAY is reduced latency. When you’re sending individual, small messages where each message needs to be processed as quickly as possible, Nagle’s algorithm can introduce an unnecessary delay. Disabling it ensures that your messages are put onto the wire with minimal delay introduced by the TCP stack itself. This is crucial for applications where even a few milliseconds of latency can be detrimental.

The trade-off, however, is a potential increase in network overhead. Sending smaller packets more frequently consumes more bandwidth and adds processing load to the network stacks of both sender and receiver, because each packet carries its own headers regardless of how little payload it holds. For example, if you send 10 messages of 1 byte each with Nagle enabled, they might all go in one packet. With TCP_NODELAY enabled, they might each go in their own packet, each with its own IP and TCP headers (at least 40 bytes over IPv4), so the header bytes far outweigh the payload.
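The arithmetic behind that example can be sketched as follows. The figures assume IPv4 and TCP headers without options (20 bytes each) and ignore link-layer framing and ZeroMQ’s own message framing, so real totals will be somewhat higher. The helper name is hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def wire_bytes(payload_sizes, coalesced):
    """Total IP-level bytes needed to carry the given payloads.

    coalesced=True  models all payloads sharing a single packet.
    coalesced=False models one packet per payload.
    """
    per_packet = IPV4_HEADER + TCP_HEADER
    if coalesced:
        return per_packet + sum(payload_sizes)
    return sum(per_packet + size for size in payload_sizes)


sizes = [1] * 10  # ten 1-byte messages
one_packet = wire_bytes(sizes, coalesced=True)    # 40 + 10 = 50 bytes
ten_packets = wire_bytes(sizes, coalesced=False)  # 10 * 41 = 410 bytes
print(one_packet, ten_packets)
```

Under these assumptions the uncoalesced case puts roughly eight times as many bytes on the wire for the same 10 bytes of payload.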

You can set this option on any ZeroMQ socket type that uses TCP as its transport, which includes zmq.REQ, zmq.REP, zmq.ROUTER, zmq.DEALER, zmq.PUSH, zmq.PULL, zmq.PUB, and zmq.SUB when they are using a tcp:// address. Nagle’s algorithm acts only on the sending side of a connection, so the option matters on whichever peer transmits small messages. In a one-way PUSH/PULL pipeline that is the PUSH side; in request-reply patterns such as REQ/REP both peers send, so set it on both. Setting it on a pure receiver does not change how incoming data is delivered. The receive-side behavior that interacts with Nagle is delayed ACK, which TCP_NODELAY does not control.

The zmq.TCP_NODELAY option is a boolean flag. A value of 1 (or True) turns the option on, which disables Nagle. A value of 0 (or False) turns the option off, which leaves Nagle enabled (the default for a plain TCP socket).

# PUSH side with TCP_NODELAY enabled
import zmq

context = zmq.Context()
socket = context.socket(zmq.PUSH)
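# Assumption: zmq.TCP_NODELAY may be missing from your pyzmq build, so guard the call
if hasattr(zmq, "TCP_NODELAY"):
    socket.setsockopt(zmq.TCP_NODELAY, 1)  # Disable Nagle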
socket.bind("tcp://*:5557")

# ... rest of the sending code ...

# PULL side with TCP_NODELAY enabled
import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PULL)
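# Only affects data this side sends; guarded because the constant may be missing
if hasattr(zmq, "TCP_NODELAY"):
    socket.setsockopt(zmq.TCP_NODELAY, 1)  # Disable Nagle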
socket.connect("tcp://localhost:5557")

# ... rest of the receiving code ...

If your build honors zmq.TCP_NODELAY, rerun both scripts with the option set and compare the figures printed by the PULL side. Expect any difference to be clearest on a link with real latency and with very small messages; on loopback it may be too small to notice. Keep in mind that the PULL loop prints total elapsed time divided by message count, which reflects throughput more than the delay experienced by an individual message.

There’s a subtle point: even with TCP_NODELAY set to 1, the operating system’s TCP/IP stack still performs some level of packetization and flow control. ZeroMQ’s TCP_NODELAY option directly maps to the TCP_NODELAY socket option in POSIX systems (like Linux and macOS) and its equivalent on Windows. It’s a low-level instruction to the kernel. However, other factors like network congestion, MTU (Maximum Transmission Unit) size, and the receiver’s ability to process data can still influence the actual end-to-end latency.
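For reference, this is what the kernel-level option looks like on a plain TCP socket, using only Python’s standard socket module and no ZeroMQ. It is a minimal sketch that sets the flag on an unconnected socket and reads it back; the helper name is hypothetical.

```python
import socket


def nodelay_enabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    before = nodelay_enabled(sock)  # typically False: Nagle is on by default
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    after = nodelay_enabled(sock)   # True once the option is set
finally:
    sock.close()

print(f"TCP_NODELAY before: {before}, after: {after}")
```

Reading the flag back with getsockopt is a quick way to confirm that the setting took effect on a socket you control directly.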

The next logical step after optimizing for low latency with TCP_NODELAY is to consider message serialization and deserialization costs.
