UDP load testing isn’t about seeing how much UDP traffic your server can handle; it’s about seeing how much UDP traffic your network can handle before your server starts dropping packets.

Let’s say you’ve built a high-performance UDP server. Maybe it’s a game server, a real-time data ingestion service, or a custom protocol endpoint. You’ve tuned your application, optimized your code, and now you want to know its limits. You fire up netcat or iperf3 and blast it with UDP packets. Suddenly, your server seems to be dropping a significant percentage of them, even though your CPU is only at 30%. What gives?

The most common culprit is not your server’s application logic, but the kernel’s network stack and the underlying network hardware. When UDP packets arrive faster than the kernel can process them, they get queued. If these queues fill up, packets are dropped. This can happen at several points:

  1. Ingress Network Interface Card (NIC) Buffers: Your NIC has a small buffer to hold incoming packets before handing them off to the kernel. If packets arrive too rapidly, these buffers overflow.

    • Diagnosis: Use ethtool -S <interface_name> to inspect NIC statistics. Look for rx_dropped, rx_fifo_errors, or similar counters that indicate packets were dropped by the hardware.
    • Fix: This often points to a hardware limitation or a misconfiguration on the switch port. Ensure the NIC is properly configured (e.g., speed/duplex matching the switch port) and consider a NIC with larger buffers or higher throughput capabilities. For a quick test, try reducing the packet rate.
    • Why it works: Larger buffers or faster processing at the NIC level give the kernel more time to pull packets off, preventing hardware-level discards.
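A quick way to watch these counters during a test is to filter the ethtool -S output for drop-related names. Here is a minimal Python sketch; the counter-name patterns are an assumption, since names vary by NIC driver:

```python
import re

# Minimal sketch: pick drop-related counters out of `ethtool -S` output.
# Counter names vary by NIC driver, so these patterns are an assumption,
# not an exhaustive list.
DROP_PATTERN = re.compile(r"^\s*(\S*(?:drop|fifo_err|missed)\S*):\s*(\d+)\s*$")

def find_drop_counters(ethtool_output: str) -> dict:
    """Return {counter_name: value} for counters that look like drops."""
    return {m.group(1): int(m.group(2))
            for line in ethtool_output.splitlines()
            if (m := DROP_PATTERN.match(line))}

# Live usage (requires root and a real interface name):
#   import subprocess
#   out = subprocess.check_output(["ethtool", "-S", "eth0"], text=True)
#   print(find_drop_counters(out))
```

Sample the counters before and after a load test; only a delta during the test implicates the NIC.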
  2. Kernel Receive Buffers (Socket Buffers): Once the NIC hands packets to the kernel, they are placed in a receive buffer associated with the destination socket. Packets sit in this buffer until your application reads them off with recvfrom() or a similar call.

    • Diagnosis: Run netstat -su (or inspect /proc/net/snmp) and watch the UDP “receive buffer errors” counter (RcvbufErrors). If it climbs during your test, packets are being dropped because the socket buffer filled up. Generic summaries like ss -s show socket counts but not these overflow counters, so the UDP-specific statistics are the practical check.
    • Fix: Increase the UDP receive buffer size. This is done via sysctl:
      sudo sysctl -w net.core.rmem_max=16777216     # Max receive buffer an application may request
      sudo sysctl -w net.core.rmem_default=16777216 # Default receive buffer for new sockets
      
      Note that there is no net.ipv4.udp_rmem_max sysctl: UDP sockets are bounded by the net.core.rmem_* limits (net.ipv4.udp_rmem_min only sets a per-socket floor). Then, ensure your application requests a larger buffer via setsockopt(SO_RCVBUF), or that the system’s default is sufficient.
    • Why it works: A larger kernel receive buffer allows the system to queue more incoming UDP packets, giving your application more time to process them before they are dropped due to buffer exhaustion.
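On the application side, it’s worth verifying what the kernel actually granted when you asked for a bigger buffer. A sketch using Python’s standard socket module (the 16 MB figure mirrors the sysctl values above and is illustrative):

```python
import socket

# Sketch: request a 16 MB UDP receive buffer and check what the kernel
# actually granted. The grant is capped by net.core.rmem_max, and Linux
# reports roughly double the requested value (it accounts for bookkeeping
# overhead), so a much smaller return value means rmem_max is the cap.
REQUESTED = 16 * 1024 * 1024  # illustrative size, matching the sysctls above

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {REQUESTED} bytes, kernel granted {granted}")
sock.close()
```

If the granted value is far below the request, raise net.core.rmem_max first; setsockopt silently clamps rather than failing.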
  3. Network Interface (ifconfig/ip statistics): The operating system’s network interface driver also maintains statistics and can drop packets if its internal queues are overwhelmed.

    • Diagnosis: Use ip -s link show <interface_name> and look for dropped or errors counters.
    • Fix: If these counters increment, it’s often a sign that the kernel’s processing for that interface is saturated. This can be due to high interrupt rates or insufficient CPU resources allocated to network processing. Ensuring network processing is offloaded to the NIC (if supported) or dedicating CPU cores to network interrupts (e.g., via irqbalance or manual affinity) can help.
    • Why it works: By optimizing how the kernel handles incoming packets from the interface, you reduce the chances of the driver itself dropping packets before they even reach the socket buffer.
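The same counters that ip -s link prints are exposed as files under sysfs, which makes them easy to poll programmatically. A Linux-only sketch (the interface name is an assumption; substitute your own):

```python
from pathlib import Path

# Sketch: read the per-interface drop counters that `ip -s link` reports,
# straight from sysfs (Linux only). Poll this before and after a load
# test to see whether the interface itself is discarding packets.
def interface_drops(ifname: str) -> dict:
    stats = Path("/sys/class/net") / ifname / "statistics"
    wanted = ("rx_dropped", "rx_errors", "rx_missed_errors", "rx_fifo_errors")
    return {name: int((stats / name).read_text())
            for name in wanted if (stats / name).exists()}

# "lo" exists on any Linux host; use your real interface name instead.
print(interface_drops("lo"))
```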
  4. CPU Interrupt Saturation: Each incoming packet can trigger a hardware interrupt plus softirq processing (modern drivers mitigate this with NAPI polling and interrupt coalescing, but a per-packet cost remains). If your server is receiving millions of UDP packets per second, the CPU can spend a significant amount of time just servicing interrupts, leaving less time for processing the actual packet data.

    • Diagnosis: Use top or htop and observe %si (time spent in softirq context). If it’s consistently high on the cores handling network traffic (e.g., > 30-40%), interrupt saturation is likely. /proc/interrupts and /proc/softirqs show the per-CPU breakdown.
    • Fix:
      • Receive Side Scaling (RSS): Ensure RSS is enabled and configured on your NIC. This distributes interrupt processing across multiple CPU cores.
      • IRQ Affinity: Manually bind network interrupts to specific CPU cores by writing a CPU mask to /proc/irq/<irq_number>/smp_affinity. This prevents interrupts from migrating and ensures dedicated processing power; note that a running irqbalance daemon may override manual settings.
      • irqbalance: Use the irqbalance daemon, which attempts to automatically distribute interrupt load across available CPUs.
      • Jumbo Frames: If your network infrastructure supports it, enabling jumbo frames (MTU > 1500) can reduce the number of packets and thus the interrupt rate for the same amount of data.
    • Why it works: Distributing interrupt load or reducing the number of interrupts (via jumbo frames) frees up CPU cycles for the kernel and your application to process the packet data itself.
  5. Application Processing Bottleneck: If your application is slow to consume packets from the socket buffer, the buffer fills up and the kernel drops new incoming datagrams. The drop is recorded by the kernel, but the root cause is in your application.

    • Diagnosis: Monitor your application’s internal queue lengths or latency. If your application is slow to recvfrom or read from the socket, the backlog of queued data in the receive buffer will grow (visible in the Recv-Q column of ss -u -a). Use tools like perf to profile your application’s network receive path.
    • Fix: Optimize your application’s UDP receiving logic. This might involve:
      • Using non-blocking sockets and a polling loop.
      • Increasing the number of worker threads/processes consuming from the socket.
      • Batching recvfrom calls if your application logic allows.
      • Using SO_RCVBUFFORCE (requires CAP_NET_ADMIN) to set a receive buffer larger than net.core.rmem_max allows, if necessary (use with caution).
    • Why it works: A faster application consumes data from the kernel’s receive buffer more quickly, preventing it from overflowing and causing packet drops.
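The first two bullets above can be combined in one pattern: a non-blocking socket drained in batches from a polling loop. A minimal Python sketch; the loopback address, ephemeral port, and batch size are illustrative choices, not recommendations:

```python
import selectors
import socket

# Sketch: a non-blocking UDP reader that drains the socket in batches,
# so each wakeup empties more of the kernel's receive buffer.
sel = selectors.DefaultSelector()
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))   # ephemeral loopback port, fine for a sketch
sock.setblocking(False)
sel.register(sock, selectors.EVENT_READ)

def drain(s: socket.socket, max_batch: int = 64) -> list:
    """Read up to max_batch datagrams before yielding back to the loop."""
    batch = []
    for _ in range(max_batch):
        try:
            data, _addr = s.recvfrom(65535)
        except BlockingIOError:
            break                 # buffer empty: return to polling
        batch.append(data)
    return batch

# One iteration of the event loop:
for key, _events in sel.select(timeout=0):
    packets = drain(key.fileobj)
    # ... hand `packets` to worker threads or processing logic here ...
```

Batching amortizes the poll/wakeup overhead across many datagrams; on Linux, recvmmsg goes further by pulling multiple datagrams in a single syscall.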
  6. Network Congestion (External): If the traffic is coming from a remote network, intermediate routers or switches could be experiencing congestion, leading to packet loss before the packets even reach your server’s network interface.

    • Diagnosis: This is harder to diagnose directly on your server. Use tools like mtr or traceroute to identify high latency or packet loss on hops leading to your server. Monitor network traffic on upstream devices if possible.
    • Fix: Address the congestion on the intermediate network path. This might involve increasing bandwidth, implementing Quality of Service (QoS) on routers, or optimizing routing.
    • Why it works: Reducing congestion on the path to your server ensures packets arrive at your interface more reliably.

Once you’ve addressed these common causes and your server is no longer dropping packets due to network stack issues, you’ll likely encounter the next hurdle: your application’s own processing capacity. This might manifest as increased application-level latency, timeouts within your application’s logic, or simply the point where your application’s internal state management can’t keep up with the incoming data rate.

Want structured learning?

Take the full UDP course →