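TCP retransmissions are happening, and you need to figure out why. A retransmission means the sender has concluded that a segment was lost and is sending it again. It reaches that conclusion in one of two ways: no acknowledgment (ACK) arrived before its retransmission timeout (RTO) timer expired, or duplicate ACKs (or SACK blocks) from the receiver signaled a gap and triggered a fast retransmit.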
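To make the RTO concrete, here is a minimal Python sketch of the standard timer calculation from RFC 6298: a smoothed RTT plus four times the RTT variance, with a floor. It is an illustration, not any kernel’s code. The class name RtoEstimator is hypothetical, the 1 ms clock granularity is an assumed value, and real stacks add details this omits, such as Karn’s algorithm (ignoring RTT samples from retransmitted segments) and exponential backoff after a timeout. Linux, for example, uses a 200 ms minimum instead of the RFC’s 1 second.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8  # gain for the smoothed RTT (RFC 6298)
BETA = 1 / 4   # gain for the RTT variation (RFC 6298)
K = 4          # variation multiplier (RFC 6298)


@dataclass
class RtoEstimator:
    """Hypothetical helper: RFC 6298 retransmission-timeout calculation, in seconds."""
    min_rto: float = 1.0              # RFC 6298 floor; Linux uses about 0.2 s
    clock_granularity: float = 0.001  # G: assumed 1 ms timer granularity
    srtt: Optional[float] = None      # smoothed round-trip time
    rttvar: Optional[float] = None    # round-trip time variation
    rto: float = 1.0                  # initial RTO before any RTT sample

    def on_rtt_sample(self, r: float) -> float:
        """Fold one RTT measurement r into the estimate and return the new RTO."""
        if self.srtt is None or self.rttvar is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Later measurements: update RTTVAR first, using the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        # RTO = SRTT + max(G, K * RTTVAR), but never below the floor.
        self.rto = max(self.min_rto, self.srtt + max(self.clock_granularity, K * self.rttvar))
        return self.rto
```

With min_rto=0.2, feeding it two 100 ms samples and then one 500 ms sample moves the RTO from 0.3 s to 0.25 s and then up to about 0.66 s. The point is the gap in between: a delay spike that arrives while the RTO sits near 0.25 s expires the timer and causes a retransmission even though nothing was lost.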

Common Causes for TCP Retransmissions

  1. Network Congestion: The most frequent culprit. Routers along the path are overwhelmed, dropping packets because their buffers are full.

    • Diagnosis: tcpdump does not label retransmissions itself. Look for the same sequence number being sent more than once, especially in clusters (tshark’s tcp.analysis.retransmission display filter flags them automatically). On the host, netstat -s or nstat reports cumulative retransmitted-segment counters. Correlate spikes with high interface utilization on routers or switches in the path; on Linux, sar -n DEV 1 or iftop shows per-interface bandwidth usage.
    • Fix: This usually requires network-level intervention. If you control the network, you can implement Quality of Service (QoS) to prioritize critical traffic, tune router queue sizes (larger buffers absorb bursts but add latency, so bigger is not automatically better), or reduce the overall traffic load. In a client/server scenario where you don’t own the path, escalate to the network team.
    • Why it works: Fewer packets hitting the congested links means fewer drops, so data segments and their ACKs arrive before the sender’s RTO expires and no loss signal is raised.
  2. Packet Loss Due to Faulty Hardware: A bad cable, a failing NIC, or a misconfigured switch port can silently drop packets.

    • Diagnosis: tcpdump will show retransmissions. If the problem is localized to one path, check the interface error counters on the switches (e.g., show interface <interface_name> on Cisco-style CLIs) for CRC errors, discards, or input/output errors to pinpoint a physical fault. On the host, ip -s link show <interface_name> or ethtool -S <interface_name> reports NIC-level errors and drops, and netstat -s summarizes protocol-level drops, though less granularly.
    • Fix: Replace the faulty cable, switch port, or NIC. If it’s a misconfiguration, correct it (e.g., duplex mismatch, speed negotiation).
    • Why it works: Eliminating the physical or logical defect that’s causing packets to be dropped allows them to reach their destination and be acknowledged.
  3. Firewall/IDS Stateful Inspection Issues: State tables in firewalls or inline IDS/IPS devices can fill up or become corrupted, or their timeouts can be overly aggressive, leading them to drop legitimate packets or their ACKs.

    • Diagnosis: If retransmissions coincide with traffic passing through a firewall, check the firewall logs for dropped packets, state table exhaustion, or policy violations. On a Linux-based firewall, a full connection-tracking table logs "nf_conntrack: table full, dropping packet" to the kernel log. Temporarily disabling features like deep packet inspection (DPI) or IPS in a controlled test can also reveal whether the firewall is the cause.
    • Fix: Increase the state table size on the firewall, adjust session timeouts to be more generous (though this has security implications), or fine-tune IPS rules.
    • Why it works: With a healthy state table, the firewall keeps tracking the TCP connection correctly. When it loses a connection’s state or wrongly flags a packet as malicious, it drops segments or ACKs that belong to a legitimate flow, and the sender can only recover by retransmitting.
  4. Duplex Mismatch: A classic networking problem where one end of a link is set to full-duplex and the other to half-duplex. This leads to collisions and dropped packets.

    • Diagnosis: This is less common on modern auto-negotiating links, but it still occurs, typically when one side is hard-coded and the other is left on auto-negotiation (the auto side then falls back to half-duplex). tcpdump will show retransmissions. On the switches, the half-duplex side typically shows late collisions, while the full-duplex side shows CRC/FCS errors and runts. On Linux, ethtool <interface_name> shows the negotiated speed and duplex for the host’s NIC.
    • Fix: Configure both ends of the link identically: either both on auto-negotiation (usually preferred) or both explicitly set to the same speed and duplex (e.g., 1000/Full or 100/Full on each side).
    • Why it works: With mismatched duplex, the full-duplex side transmits whenever it has data, while the half-duplex side treats simultaneous traffic as a collision and aborts its own frame. The result is truncated or corrupted frames that get discarded, so segments are lost and must be retransmitted. Matching the settings removes those collisions.
  5. High Latency and Jitter with Small Receive Windows: TCP throughput is bounded by the receiver’s advertised window divided by the round-trip time (RTT). If the window is too small for the RTT, the sender stalls waiting for window updates and the pipe sits partly idle. A small window by itself limits throughput rather than causing loss, but high jitter can trigger spurious retransmissions when an RTT spike outlasts the sender’s current RTO estimate.

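    • Diagnosis: Capture the handshake as well as the data transfer. tcpdump prints each segment’s advertised window as win <N>; this is the raw 16-bit TCP header field (not a TCP option), so multiply it by 2^wscale, using the window-scale option that host sent in its SYN or SYN-ACK, to get the real window. The -s 0 flag isn’t needed just to read the window, since it lives in the header, and modern tcpdump captures full packets by default anyway. A consistently small effective window (a few KB) combined with a high RTT (measured from SYN to SYN-ACK, or from data-to-ACK timing) indicates a window bottleneck. ping -c 10 <destination> gives a rough view of RTT and its variance (jitter).
    • Fix: Increase the TCP receive buffer sizes on the receiving host (the server for uploads, the client for downloads). On Linux, receive-buffer autotuning is capped by the third value of net.ipv4.tcp_rmem, while net.core.rmem_max caps buffers that applications request explicitly with SO_RCVBUF. For example, sysctl -w net.core.rmem_max=16777216 and sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216". Windows above 64 KB also require window scaling (net.ipv4.tcp_window_scaling, enabled by default).
    • Why it works: A larger receive window lets the sender keep more unacknowledged data in flight, "filling the pipe" on a high-bandwidth, high-latency path. It does not change the RTO itself; spurious timeouts caused by jitter are addressed by reducing the queueing or jitter on the path.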
  6. Application-Level Issues (Less Common as a Direct Cause of Retransmissions): An application that doesn’t read data fast enough lets the socket receive buffer fill, so the OS advertises an ever-smaller window to the sender, eventually zero. The kernel still ACKs data it has buffered, so this mainly shows up as window stalls rather than loss.

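    • Diagnosis: Monitor application performance and CPU/memory usage on the receiving host. A Recv-Q value that keeps growing for the connection in ss -tn output means the application isn’t draining its socket, which points to the application as the root cause.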
    • Fix: Optimize the application’s performance or increase the resources available to it.
    • Why it works: A responsive application drains the socket buffer, so the OS can advertise a larger receive window and the sender can keep more data in flight, avoiding window stalls and zero-window probes.
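The "high rate of retransmissions" check from item 1 can be quantified using the host-wide TCP counters that Linux exposes in /proc/net/snmp (the same counters netstat -s and nstat report). The Python sketch below assumes a Linux host, and its function names are hypothetical. It reads OutSegs and RetransSegs twice and reports what fraction of the segments sent in between were retransmissions. These counters cover every connection on the host, so use ss -ti when you need per-connection detail.

```python
import time
from typing import Dict


def parse_tcp_counters(snmp_text: str) -> Dict[str, int]:
    """Parse the two 'Tcp:' lines (names, then values) from /proc/net/snmp text."""
    tcp_rows = [line.split()[1:] for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    if len(tcp_rows) < 2:
        raise ValueError("no Tcp: header/value pair found")
    names, values = tcp_rows[0], tcp_rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def retransmit_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of segments sent between two snapshots that were retransmissions."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0


def sample(path: str = "/proc/net/snmp", interval: float = 1.0) -> float:
    """Take two snapshots `interval` seconds apart (Linux only) and return the ratio."""
    with open(path) as f:
        first = parse_tcp_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_tcp_counters(f.read())
    return retransmit_ratio(first, second)
```

Calling sample() on a Linux host returns the retransmitted fraction for the last interval. What counts as "high" depends on the path, so compare against a baseline taken during a healthy period rather than a fixed threshold.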
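The window arithmetic behind item 5 fits in a few lines. This Python sketch (function names are hypothetical) converts tcpdump’s raw win value into bytes using the window-scale shift from the handshake (RFC 7323), computes the bandwidth-delay product a path needs, and shows the throughput ceiling a given window imposes on a single connection.

```python
def effective_window(raw_win: int, wscale: int) -> int:
    """Advertised window in bytes: the 16-bit header value shifted by the
    window-scale count the sender announced in its SYN/SYN-ACK (RFC 7323).
    Scaling does not apply to the window field of SYN segments themselves."""
    if not 0 <= wscale <= 14:
        raise ValueError("RFC 7323 limits the window scale shift to 0..14")
    return raw_win << wscale


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return int(bandwidth_bps * rtt_s / 8)


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput (bits/s) for a window-limited connection."""
    return window_bytes * 8 / rtt_s
```

For a 1 Gbit/s path with a 50 ms RTT, bdp_bytes(1e9, 0.05) gives 6,250,000 bytes (about 6 MB), while an unscaled 65,535-byte window caps a single connection at roughly 10.5 Mbit/s on that same path. That gap is what raising the receive buffer limits is meant to close.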

If you’ve addressed retransmissions but the receiver is still overwhelmed or slow to process data, the next symptom you’ll likely encounter is TCP Zero Window.
