Designing a custom UDP protocol for reliability and framing is more about carefully layering abstractions than about reinventing the wheel.

Let’s say we’re building a simple game server that needs to send player positions to clients. UDP is fast, but it’s unreliable. Packets can be lost, arrive out of order, or be duplicated. We need to add our own reliability and framing on top.

Here’s a basic setup for a UDP server listening on port 12345:

import socket

UDP_IP = "0.0.0.0"
UDP_PORT = 12345

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((UDP_IP, UDP_PORT))

print(f"Listening on UDP {UDP_IP}:{UDP_PORT}")

while True:
    data, addr = sock.recvfrom(1024) # Buffer size is 1024 bytes
    print(f"Received message: {data} from {addr}")
    # Process data...
    # Send a response (if needed)...

And a corresponding client:

import socket

UDP_IP = "127.0.0.1" # Or your server's IP
UDP_PORT = 12345
MESSAGE = b"Hello, Server!"

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(MESSAGE, (UDP_IP, UDP_PORT))

print(f"Sent message: {MESSAGE} to {UDP_IP}:{UDP_PORT}")

Now, how do we add reliability? The core idea is to acknowledge received packets.

Reliability Mechanism: Acknowledgements (ACKs)

We’ll assign a sequence number to each outgoing packet. The receiver will send back an ACK packet containing the sequence number of the packet it just received, so every packet is acknowledged individually.

On the server side, we’ll need to keep track of sent packets and their status (sent, ACKed).

import socket
import time

UDP_IP = "0.0.0.0"
UDP_PORT = 12345

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((UDP_IP, UDP_PORT))

# For reliability:
sent_packets = {} # {seq_num: (packet_data, addr, timestamp, resent_count)}
next_seq_num = 0
ACK_TIMEOUT = 1.0 # seconds
RESEND_LIMIT = 5

print(f"Listening on UDP {UDP_IP}:{UDP_PORT}")

def send_reliable(data, addr):
    global next_seq_num
    seq_num = next_seq_num
    # Simple framing: seq_num (4 bytes) + data
    packet = seq_num.to_bytes(4, 'big') + data
    sock.sendto(packet, addr)
    # Store the destination with the packet so a resend goes to the right peer
    sent_packets[seq_num] = (packet, addr, time.time(), 0)
    next_seq_num += 1
    print(f"Sent packet {seq_num} to {addr}")

def resend_unacked_packets():
    current_time = time.time()
    for seq_num, (packet, addr, timestamp, resent_count) in list(sent_packets.items()):
        if current_time - timestamp <= ACK_TIMEOUT:
            continue # Still within the ACK window
        if resent_count >= RESEND_LIMIT:
            # Give up so the packet does not stay in the buffer forever
            del sent_packets[seq_num]
            print(f"Giving up on packet {seq_num} to {addr}")
            continue
        sock.sendto(packet, addr)
        sent_packets[seq_num] = (packet, addr, time.time(), resent_count + 1)
        print(f"Resent packet {seq_num} to {addr}")

# This is a simplified loop. In a real app, you'd use threads or asyncio for non-blocking ops.
# last_ack_seq is the highest data sequence number we have received and ACKed.
# It is one global value here; a real server keeps one per client.
last_ack_seq = -1
sock.settimeout(0.1) # Short receive timeout so the resend check runs regularly
while True:
    # In a real app, you'd have a proper event loop and handle receive/resend concurrently.
    # For demonstration, we'll poll.
    try:
        data, addr = sock.recvfrom(1024)

        if data[:3] == b"ACK" and len(data) == 7: # Simple ACK framing: b"ACK" + 4-byte seq_num
            acked_seq_num = int.from_bytes(data[3:7], 'big')
            if acked_seq_num in sent_packets:
                print(f"Received ACK for {acked_seq_num} from {addr}")
                del sent_packets[acked_seq_num]
        elif len(data) >= 4:
            # Process incoming data
            seq_num = int.from_bytes(data[:4], 'big')
            payload = data[4:]
            print(f"Received packet {seq_num} from {addr} with payload: {payload}")

            # Always send an ACK back, even for duplicates: the earlier ACK may have been lost
            ack_packet = b"ACK" + seq_num.to_bytes(4, 'big')
            sock.sendto(ack_packet, addr)

            # Basic duplicate and stale-packet filtering (very simplified)
            if seq_num > last_ack_seq:
                # Process payload...
                print(f"Processing payload for packet {seq_num}")
                last_ack_seq = seq_num
            elif seq_num == last_ack_seq:
                print(f"Received duplicate packet {seq_num}, ignoring payload.")
            else: # seq_num < last_ack_seq (older than what we already processed)
                print(f"Received stale packet {seq_num}, already at {last_ack_seq}.")
                # A system that needs every packet in order would buffer instead of dropping.

    except socket.timeout:
        pass # No data received, continue to resend check

    # Runs at least every 0.1 s thanks to the receive timeout above.
    # In a real app, this would be in its own thread or timer.
    resend_unacked_packets()

The framing is crucial here. We’re prepending a 4-byte sequence number to our actual data. The ACK packet also has a specific format (b"ACK" followed by the sequence number).
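To make the wire format concrete, here is a minimal sketch of encode and decode helpers for the two packet shapes described above. The helper names (encode_data, encode_ack, decode) are hypothetical and not part of the original listings; the byte layout is the same 4-byte big-endian sequence number and b"ACK" prefix.

```python
import struct

SEQ_HEADER = struct.Struct("!I")  # 4-byte unsigned sequence number, network byte order
ACK_PREFIX = b"ACK"

def encode_data(seq_num: int, payload: bytes) -> bytes:
    """Data packet: seq_num (4 bytes) + payload."""
    return SEQ_HEADER.pack(seq_num) + payload

def encode_ack(seq_num: int) -> bytes:
    """ACK packet: b"ACK" + seq_num (4 bytes)."""
    return ACK_PREFIX + SEQ_HEADER.pack(seq_num)

def decode(packet: bytes):
    """Return ("ack", seq_num, b"") or ("data", seq_num, payload); None if too short."""
    if packet[:3] == ACK_PREFIX and len(packet) == 7:
        return ("ack", SEQ_HEADER.unpack(packet[3:])[0], b"")
    if len(packet) >= SEQ_HEADER.size:
        (seq_num,) = SEQ_HEADER.unpack_from(packet)
        return ("data", seq_num, packet[SEQ_HEADER.size:])
    return None
```

This format has a known ambiguity: a data packet whose sequence number happens to start with the bytes b"ACK" and that carries a 3-byte payload would be read as an ACK. An explicit packet-type byte avoids that.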

When a client sends data, it should also use sequence numbers and expect ACKs. The server needs to know which client sent which packet to send ACKs back to the correct address. In a real application, you’d manage per-client state.

The resend_unacked_packets function is the heart of the reliability. If a packet isn’t ACKed within ACK_TIMEOUT, we resend it up to RESEND_LIMIT times.
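The timeout rule can also be written as a pure function that takes the current time as an argument, which makes it easy to test without a socket. This is only a sketch: due_for_resend is a hypothetical helper, and the (packet, timestamp, resent_count) tuple layout is an assumption made for this example.

```python
ACK_TIMEOUT = 1.0   # seconds to wait for an ACK before resending
RESEND_LIMIT = 5    # maximum number of resends per packet

def due_for_resend(sent_packets, now):
    """Split timed-out packets into (to_resend, to_give_up) lists of sequence numbers.

    sent_packets maps seq_num -> (packet, timestamp, resent_count).
    """
    to_resend, to_give_up = [], []
    for seq_num, (_packet, timestamp, resent_count) in sent_packets.items():
        if now - timestamp <= ACK_TIMEOUT:
            continue  # still inside the ACK window
        if resent_count < RESEND_LIMIT:
            to_resend.append(seq_num)
        else:
            to_give_up.append(seq_num)  # retry budget exhausted
    return to_resend, to_give_up
```

The caller would resend the first list, and for the second list decide whether to drop the packet or treat the peer as disconnected.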

A significant challenge in implementing this correctly is managing the state for multiple clients. Each client needs its own sequence number, its own sent_packets tracking, and its own "last received" sequence number. This often leads to a connection-oriented layer on top of UDP, where you establish a "connection" (a mapping of client IP/port to state) and tear it down.
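One way to organize that per-client state is a dictionary keyed by the (ip, port) tuple that recvfrom returns. The sketch below is an assumption about how it could look; PeerState, get_peer, and next_packet_for are hypothetical names, not part of the listings above.

```python
from dataclasses import dataclass, field

@dataclass
class PeerState:
    next_seq_num: int = 0        # next sequence number to send to this peer
    last_received_seq: int = -1  # highest sequence number received from this peer
    sent_packets: dict = field(default_factory=dict)  # seq_num -> (packet, timestamp, resent_count)

peers = {}  # (ip, port) -> PeerState

def get_peer(addr):
    """Look up the state for addr, creating the virtual connection on first contact."""
    state = peers.get(addr)
    if state is None:
        state = peers[addr] = PeerState()
    return state

def next_packet_for(addr, payload, now):
    """Frame payload with this peer's own sequence number and record it as unACKed."""
    peer = get_peer(addr)
    seq_num = peer.next_seq_num
    packet = seq_num.to_bytes(4, "big") + payload
    peer.sent_packets[seq_num] = (packet, now, 0)
    peer.next_seq_num += 1
    return packet
```

Tearing a connection down is then a matter of deleting its entry from peers, for example after repeated resend failures.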

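The last_ack_seq logic is a very basic form of duplicate and stale-packet filtering rather than true in-order delivery. If we receive packet 5 while still expecting 3, the code above processes 5 right away and later discards 3 as stale. That is often fine for game state updates, where only the newest position matters. A protocol that must deliver every packet in order would instead hold onto 5 and wait for 3.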

Perhaps the most surprising thing about implementing custom UDP reliability is that you often end up building a subset of TCP’s features, but with more control over the specific behavior and overhead.

Here’s what a client would look like with basic reliability:

import socket
import time

UDP_IP = "127.0.0.1" # Server IP
UDP_PORT = 12345
MESSAGE = b"PlayerPos:10,20,30"

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.settimeout(1.0) # Timeout for receiving ACKs

sent_packets = {} # {seq_num: (packet_data, timestamp, resent_count)}
next_seq_num = 0
ACK_TIMEOUT = 1.0
RESEND_LIMIT = 5

def send_reliable(data):
    global next_seq_num
    seq_num = next_seq_num
    packet = seq_num.to_bytes(4, 'big') + data
    sock.sendto(packet, (UDP_IP, UDP_PORT))
    sent_packets[seq_num] = (packet, time.time(), 0)
    next_seq_num += 1
    print(f"Sent packet {seq_num} to {UDP_IP}:{UDP_PORT}")

def resend_unacked_packets():
    current_time = time.time()
    for seq_num, (packet, timestamp, resent_count) in list(sent_packets.items()):
        if current_time - timestamp > ACK_TIMEOUT and resent_count < RESEND_LIMIT:
            sock.sendto(packet, (UDP_IP, UDP_PORT))
            sent_packets[seq_num] = (packet, time.time(), resent_count + 1)
            print(f"Resent packet {seq_num}")

# In a real app, you'd have a proper event loop.
# For this example, we'll send one message and wait for its ACK.
send_reliable(MESSAGE)

# Simple ACK receiving loop for this specific message
start_time = time.time()
while time.time() - start_time < 5.0: # Give it 5 seconds to get an ACK
    try:
        sock.settimeout(0.1)
        data, server_addr = sock.recvfrom(1024)
        if data[:3] == b"ACK":
            acked_seq_num = int.from_bytes(data[3:7], 'big')
            if acked_seq_num in sent_packets:
                print(f"Received ACK for {acked_seq_num}")
                del sent_packets[acked_seq_num]
                break # Got ACK for our message
    except socket.timeout:
        pass
    resend_unacked_packets()

if sent_packets:
    print("Failed to get ACK for message.")

sock.close()

This example uses a simple sequence number prepended to the data. For more complex applications, you might need more sophisticated framing:

  • Packet Type: A byte indicating if it’s a data packet, ACK, control packet (e.g., connection request), etc.
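  • Length Field: UDP already preserves datagram boundaries, so a length field is not needed to find where a message ends. It is still useful if you pack several variable-length messages into one datagram, or want to detect truncated packets.
  • Checksum: UDP carries its own 16-bit checksum, but it is weak and optional over IPv4. To catch corruption it misses, you could add a CRC32 checksum.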
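The three fields above can be combined into one fixed header. The layout below (type, sequence number, payload length, CRC32) is an assumed example, not a standard, and pack_packet and unpack_packet are hypothetical helpers built on the standard struct and zlib modules.

```python
import struct
import zlib

# Assumed layout: type (1 byte) | seq_num (4 bytes) | payload length (2 bytes) | CRC32 (4 bytes)
HEADER = struct.Struct("!BIHI")
TYPE_DATA, TYPE_ACK = 0, 1

def pack_packet(ptype, seq_num, payload=b""):
    # The CRC covers the other header fields and the payload
    body = struct.pack("!BIH", ptype, seq_num, len(payload)) + payload
    return HEADER.pack(ptype, seq_num, len(payload), zlib.crc32(body)) + payload

def unpack_packet(packet):
    """Return (ptype, seq_num, payload), or None if the packet is malformed or corrupted."""
    if len(packet) < HEADER.size:
        return None
    ptype, seq_num, length, crc = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:]
    if len(payload) != length:
        return None  # truncated or padded
    body = struct.pack("!BIH", ptype, seq_num, length) + payload
    if zlib.crc32(body) != crc:
        return None  # corrupted in transit
    return ptype, seq_num, payload
```

With a type byte in place, an ACK no longer needs the b"ACK" prefix: it is simply a TYPE_ACK packet with an empty payload.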

The "connection" in UDP is virtual. You manage it by maintaining state for each client IP/port combination. This includes their current sequence number, expected sequence number, retransmission buffer, and perhaps a keep-alive timer.

If you need ordered delivery, the receiver must buffer out-of-order packets. When packet N+1 arrives but you’re still waiting for N, you store N+1 and any subsequent packets until N finally arrives. Then, you process N, N+1, etc., in order. This buffering adds latency.
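The buffering described above can be sketched as a small class. ReorderBuffer is a hypothetical name, and the sketch assumes sequence numbers start at 0 and never wrap around.

```python
class ReorderBuffer:
    """Hold out-of-order packets until the gap before them is filled."""

    def __init__(self, first_seq=0):
        self.next_expected = first_seq  # the sequence number we are waiting for (N)
        self.pending = {}               # seq_num -> payload, for packets that arrived early

    def receive(self, seq_num, payload):
        """Store one packet and return the payloads that are now deliverable, in order."""
        if seq_num < self.next_expected or seq_num in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seq_num] = payload
        ready = []
        while self.next_expected in self.pending:
            ready.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return ready
```

As written, pending can grow without bound if packet N never arrives, so a real implementation would cap its size or give up on N after a timeout.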

Tracking last_ack_seq on the server is what lets the receiver tell new packets from duplicates and stale ones. Keep in mind what an ACK actually proves: in the code above it is sent as soon as a packet arrives, so it confirms receipt, not that the application logic on the other side acted on the payload. If the sender needs that stronger guarantee, send the ACK only after the payload has been processed. An ACK can also be made cumulative, confirming that the receiver has processed everything up to a certain sequence number.
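An ACK that confirms everything up to a certain point is usually called a cumulative ACK. Here is a minimal sketch of how a sender might apply one, assuming the receiver reports the highest sequence number below which nothing is missing. apply_cumulative_ack is a hypothetical helper; the listings in this article use per-packet ACKs instead.

```python
def apply_cumulative_ack(sent_packets, acked_up_to):
    """Drop every unACKed packet with seq_num <= acked_up_to and return the freed numbers."""
    freed = sorted(seq for seq in sent_packets if seq <= acked_up_to)
    for seq in freed:
        del sent_packets[seq]
    return freed
```

The trade-off is that one lost ACK is covered by the next one, but a single missing packet holds back acknowledgement of everything after it.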

The next problem you’ll run into is managing multiple simultaneous "connections" and their associated states, especially when clients might change IP addresses or ports (e.g., behind NAT).
